Compare commits

51 Commits

Author SHA1 Message Date
Philipp Hagemeister
627a209f74 release 2014.03.20 2014-03-20 16:35:54 +01:00
Philipp Hagemeister
1a4895453a [YoutubeDL] Improve error message 2014-03-20 16:33:46 +01:00
Philipp Hagemeister
aab74fa106 [ted] Simplify embed code (#2587) 2014-03-20 16:33:23 +01:00
Philipp Hagemeister
2bd9efd4c2 Merge remote-tracking branch 'anovicecodemonkey/TEDIEimprovements' 2014-03-20 16:24:34 +01:00
Jaime Marquínez Ferrándiz
39a743fb9b [arte] Modernize tests and fix _VALID_REGEX 2014-03-20 09:14:43 +01:00
Jaime Marquínez Ferrándiz
4966a0b22d [arte] Add extractor for concert.arte.tv (closes #2588) 2014-03-20 09:11:47 +01:00
anovicecodemonkey
fc26023120 [TEDIE] Add support for embeded TED video URLs 2014-03-20 01:04:21 +10:30
anovicecodemonkey
8d7c0cca13 [generic] Add support for embeded TED videos 2014-03-20 00:56:32 +10:30
Sergey M․
f66ede4328 [arte.tv:+7] Fix _VALID_URL 2014-03-19 21:23:55 +07:00
Philipp Hagemeister
cc88b90ec8 [desvscripts/release] Bump the number of password tries to accomodate stubby-fingered @phihag 2014-03-18 15:02:37 +01:00
Philipp Hagemeister
b6c5fa9a0b release 2014.03.18.1 2014-03-18 14:42:59 +01:00
Philipp Hagemeister
dff10eaa77 release 2014.03.18 2014-03-18 14:31:03 +01:00
Philipp Hagemeister
4e6f9aeca1 Fix typo 2014-03-18 14:28:53 +01:00
Philipp Hagemeister
e68301af21 Fix getpass on Windows (Fixes #2547) 2014-03-18 14:27:42 +01:00
Sergey M․
17286a96f2 [iprima] Fix permission check regex 2014-03-18 19:33:28 +07:00
Jaime Marquínez Ferrándiz
0892363e6d Merge pull request #2580 from ericpardee/patch-1
Update to comedycentral.py (cc.com)
2014-03-18 08:14:39 +01:00
ericpardee
f102372b5f Update to comedycentral.py (cc.com)
Added cc.com as it's same as comedycentral.com and used, i.e. http://www.cc.com/video-clips/fmyq0m/broad-city-a-beautiful-railroad-style-apartment
2014-03-17 18:01:26 -07:00
Jaime Marquínez Ferrándiz
ecbe1ad207 [generic] Fix access to removed function in python 3.4
The `Request.get_origin_req_host` method was deprecated in 3.3, use the
 `origin_req_host` property if it's not available, see http://docs.python.org/3.3/library/urllib.request.html#urllib.request.Request.get_origin_req_host.
2014-03-17 21:59:21 +01:00
Philipp Hagemeister
9d840c43b5 release 2014.03.17 2014-03-17 14:49:02 +01:00
Philipp Hagemeister
6f50f63382 Merge remote-tracking branch 'origin/wheels' 2014-03-17 14:31:22 +01:00
Philipp Hagemeister
ff14fc4964 [test] Rename get_testcases to gettestcases
Apparently, newer versions of nosetests are somewhat over-eager in their test discovery.
2014-03-17 14:30:13 +01:00
Sergey M․
e125c21531 [vesti] Restore vesti extractor 2014-03-17 02:01:01 +07:00
Sergey M․
93d020dd65 [generic] Add support for embedded rutv player 2014-03-17 02:00:31 +07:00
Sergey M․
a7515ec265 [rutv] Refactor vgtrk/rutv extractor 2014-03-17 01:59:40 +07:00
Jaime Marquínez Ferrándiz
b6c1ceccc2 [ted] Add 'http://' to the thumbnail url if it's missing 2014-03-16 11:24:11 +01:00
Jaime Marquínez Ferrándiz
4056ad8f36 Build and upload universal wheels to pypi 2014-03-16 10:22:41 +01:00
Philipp Hagemeister
6563837ee1 [udemy] Make sure test case is not inherited 2014-03-16 07:09:10 +01:00
Philipp Hagemeister
fd5e6f7ef2 [vevo] Mark all test timestamps as approximate 2014-03-16 07:05:48 +01:00
Sergey M․
15fd51b37c [generic] More generic support for embedded vimeo player (#1602) 2014-03-16 00:47:04 +07:00
Sergey M․
f1cef7a9ff [iprima] Skip test 2014-03-15 01:39:42 +07:00
Sergey M․
8264223511 [iprima] Add access permission check 2014-03-15 01:38:44 +07:00
Jaime Marquínez Ferrándiz
bc6d597828 Add bestvideo and worstvideo to special format names (#2163) 2014-03-14 17:01:47 +01:00
Philipp Hagemeister
aba77bbfc2 [vevo] Adapt test to constantly changing timestamp 2014-03-13 18:45:14 +01:00
Philipp Hagemeister
955c451456 Rename upload_timestamp to timestamp 2014-03-13 18:45:14 +01:00
Sergey M․
e5de3f6c89 [udemy] Initial support for free courses (#1617) 2014-03-14 00:36:39 +07:00
Philipp Hagemeister
2a1db721d4 [test_download] Move assertions before debugging output 2014-03-13 17:05:51 +01:00
Philipp Hagemeister
1e0eb60f1a [videobam] Fix empty title handling 2014-03-13 17:03:43 +01:00
Philipp Hagemeister
87a29e6f25 [wdr] Add description to tests 2014-03-13 17:01:58 +01:00
Philipp Hagemeister
c3d36f134f [googlesearch] Fix next page indicator check 2014-03-13 16:52:13 +01:00
Philipp Hagemeister
84769e708c [ninegag] Fix extraction 2014-03-13 16:40:53 +01:00
Philipp Hagemeister
9d2ecdbc71 [vevo] Centralize timestamp handling 2014-03-13 15:30:25 +01:00
Philipp Hagemeister
9b69af5342 Merge remote-tracking branch 'soult/br' 2014-03-13 14:35:34 +01:00
David Triendl
c21215b421 [br] Allow '/' in URL, allow empty author + broadcastDate fields
* Allow URLs that have a 'subdirectory' before the actual program name, e.g.
  'xyz/xyz-episode-1'.
* The author and broadcastDate fields in the XML file may be empty.
* Add test case for the two problems above.
2014-03-13 14:08:34 +01:00
Philipp Hagemeister
cddcfd90b4 [funnyordie] Correct JSON interpretation 2014-03-13 00:53:19 +01:00
Sergey M․
f36aacba0f [collegehumor] Fix one more test 2014-03-13 06:25:12 +07:00
Sergey M․
355271fb61 [collegehumor] Extract like count 2014-03-13 06:12:39 +07:00
Sergey M․
2a5b502364 [collegehumor] Fix test 2014-03-13 06:09:21 +07:00
Philipp Hagemeister
98ff9d82d4 release 2014.03.12 2014-03-12 14:50:14 +01:00
Jaime Marquínez Ferrándiz
b1ff87224c [vimeo] Now VimeoIE doesn't match urls of channels with a numeric id (fixes #2552) 2014-03-12 14:23:06 +01:00
Sergey M․
b461641fb9 [wdr] Add support for WDR sites (Closes #1367) 2014-03-12 04:20:47 +07:00
Sergey M․
b047de6f6e Add format to unified_strdate 2014-03-12 04:18:43 +07:00
32 changed files with 752 additions and 256 deletions

@@ -191,9 +191,9 @@ which means you can modify it, redistribute it or use it however you like.
preference using slashes: "-f 22/17/18".
"-f mp4" and "-f flv" are also supported.
You can also use the special names "best",
"bestaudio", "worst", and "worstaudio". By
default, youtube-dl will pick the best
quality.
"bestvideo", "bestaudio", "worst",
"worstvideo" and "worstaudio". By default,
youtube-dl will pick the best quality.
--all-formats download all available video formats
--prefer-free-formats prefer free video formats unless a specific
one is requested

@@ -70,7 +70,7 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
git checkout HEAD -- youtube-dl youtube-dl.exe
/bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..."
for f in $RELEASE_FILES; do gpg --detach-sig "build/$version/$f"; done
for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/"
ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
@@ -97,7 +97,7 @@ rm -rf build
make pypi-files
echo "Uploading to PyPi ..."
python setup.py sdist upload
python setup.py sdist bdist_wheel upload
make clean
/bin/echo -e "\n### DONE!"

setup.cfg Normal file
@@ -0,0 +1,2 @@
[wheel]
universal = True

@@ -71,7 +71,7 @@ class FakeYDL(YoutubeDL):
old_report_warning(message)
self.report_warning = types.MethodType(report_warning, self)
def get_testcases():
def gettestcases():
for ie in youtube_dl.extractor.gen_extractors():
t = getattr(ie, '_TEST', None)
if t:

@@ -182,6 +182,24 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'vid-high')
def test_format_selection_video(self):
formats = [
{'format_id': 'dash-video-low', 'ext': 'mp4', 'preference': 1, 'acodec': 'none'},
{'format_id': 'dash-video-high', 'ext': 'mp4', 'preference': 2, 'acodec': 'none'},
{'format_id': 'vid', 'ext': 'mp4', 'preference': 3},
]
info_dict = {'formats': formats, 'extractor': 'test'}
ydl = YDL({'format': 'bestvideo'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'dash-video-high')
ydl = YDL({'format': 'worstvideo'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'dash-video-low')
def test_youtube_format_selection(self):
order = [
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '36', '17', '13',
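The `bestvideo` and `worstvideo` names exercised by the new test can be illustrated in isolation. The following is a minimal sketch, assuming the format list is ordered from worst to best and that video-only formats carry `'acodec': 'none'`, as in the test data above. `select_video_only` is a hypothetical helper for illustration, not a youtube-dl function.

```python
def select_video_only(formats, format_spec):
    """Hypothetical helper: pick a video-only format from a worst-to-best list."""
    # Video-only formats are the ones that declare no audio codec.
    video_formats = [f for f in formats if f.get('acodec') == 'none']
    if not video_formats:
        return None
    if format_spec == 'bestvideo':
        return video_formats[-1]  # last entry is the best one
    if format_spec == 'worstvideo':
        return video_formats[0]   # first entry is the worst one
    raise ValueError('unsupported format spec: %r' % format_spec)


# Same data as in test_format_selection_video.
formats = [
    {'format_id': 'dash-video-low', 'ext': 'mp4', 'preference': 1, 'acodec': 'none'},
    {'format_id': 'dash-video-high', 'ext': 'mp4', 'preference': 2, 'acodec': 'none'},
    {'format_id': 'vid', 'ext': 'mp4', 'preference': 3},
]
best = select_video_only(formats, 'bestvideo')
worst = select_video_only(formats, 'worstvideo')
```

Note that the combined format `vid` is never chosen, even though it has the highest preference, because it does not declare `'acodec': 'none'`.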

@@ -9,7 +9,7 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from test.helper import gettestcases
from youtube_dl.extractor import (
FacebookIE,
@@ -105,7 +105,7 @@ class TestAllURLsMatching(unittest.TestCase):
def test_no_duplicates(self):
ies = gen_extractors()
for tc in get_testcases():
for tc in gettestcases():
url = tc['url']
for ie in ies:
if type(ie).__name__ in ('GenericIE', tc['name'] + 'IE'):
@@ -124,6 +124,8 @@ class TestAllURLsMatching(unittest.TestCase):
def test_vimeo_matching(self):
self.assertMatch('http://vimeo.com/channels/tributes', ['vimeo:channel'])
self.assertMatch('http://vimeo.com/channels/31259', ['vimeo:channel'])
self.assertMatch('http://vimeo.com/channels/31259/53576664', ['vimeo'])
self.assertMatch('http://vimeo.com/user7108434', ['vimeo:user'])
self.assertMatch('http://vimeo.com/user7108434/videos', ['vimeo:user'])
self.assertMatch('https://vimeo.com/user21297594/review/75524534/3c257a1b5d', ['vimeo:review'])

@@ -8,7 +8,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
get_params,
get_testcases,
gettestcases,
try_rm,
md5,
report_warning
@@ -51,7 +51,7 @@ def _file_md5(fn):
with open(fn, 'rb') as f:
return hashlib.md5(f.read()).hexdigest()
defs = get_testcases()
defs = gettestcases()
class TestDownload(unittest.TestCase):
@@ -144,6 +144,10 @@ def generator(test_case):
self.assertTrue(
isinstance(got, compat_str) and match_rex.match(got),
u'field %s (value: %r) should match %r' % (info_field, got, match_str))
elif isinstance(expected, type):
got = info_dict.get(info_field)
self.assertTrue(isinstance(got, expected),
u'Expected type %r, but got value %r of type %r' % (expected, got, type(got)))
else:
if isinstance(expected, compat_str) and expected.startswith('md5:'):
got = 'md5:' + md5(info_dict.get(info_field))
@@ -152,19 +156,19 @@ def generator(test_case):
self.assertEqual(expected, got,
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
# If checkable fields are missing from the test case, print the info_dict
test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
for key, value in info_dict.items()
if value and key in ('title', 'description', 'uploader', 'upload_date', 'uploader_id', 'location'))
if not all(key in tc.get('info_dict', {}).keys() for key in test_info_dict.keys()):
sys.stderr.write(u'\n"info_dict": ' + json.dumps(test_info_dict, ensure_ascii=False, indent=4) + u'\n')
# Check for the presence of mandatory fields
for key in ('id', 'url', 'title', 'ext'):
self.assertTrue(key in info_dict.keys() and info_dict[key])
# Check for mandatory fields that are automatically set by YoutubeDL
for key in ['webpage_url', 'extractor', 'extractor_key']:
self.assertTrue(info_dict.get(key), u'Missing field: %s' % key)
# If checkable fields are missing from the test case, print the info_dict
test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
for key, value in info_dict.items()
if value and key in ('title', 'description', 'uploader', 'upload_date', 'timestamp', 'uploader_id', 'location'))
if not all(key in tc.get('info_dict', {}).keys() for key in test_info_dict.keys()):
sys.stderr.write(u'\n"info_dict": ' + json.dumps(test_info_dict, ensure_ascii=False, indent=4) + u'\n')
finally:
try_rm_tcs_files()
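The new `isinstance(expected, type)` branch lets a test case state only the type of a field (for example `"view_count": int`) instead of an exact value. A rough sketch of the comparison rules follows. `field_matches` is a hypothetical name, and stripping a `re:` prefix before matching is an assumption based on the `"re:^https?://"` value used in the ninegag test in this changeset.

```python
import re


def field_matches(expected, got):
    """Hypothetical helper: compare an extracted value with a test expectation."""
    # 're:<pattern>' means the value must be a string matching the pattern
    # (assumed convention, see lead-in).
    if isinstance(expected, str) and expected.startswith('re:'):
        pattern = expected[len('re:'):]
        return isinstance(got, str) and re.match(pattern, got) is not None
    # A bare type means only the type of the value is checked.
    if isinstance(expected, type):
        return isinstance(got, expected)
    # Anything else is compared for equality.
    return expected == got
```

Type expectations suit fields such as view counts that change between test runs.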

@@ -249,7 +249,7 @@ class TestPlaylists(unittest.TestCase):
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'python language')
self.assertEqual(result['title'], 'python language')
self.assertTrue(len(result['entries']) == 15)
self.assertEqual(len(result['entries']), 15)
def test_generic_rss_feed(self):
dl = FakeYDL()

@@ -4,6 +4,7 @@
from __future__ import absolute_import, unicode_literals
import collections
import datetime
import errno
import io
import json
@@ -532,7 +533,7 @@ class YoutubeDL(object):
else:
raise
else:
self.report_error('no suitable InfoExtractor: %s' % url)
self.report_error('no suitable InfoExtractor for URL %s' % url)
def process_ie_result(self, ie_result, download=True, extra_info={}):
"""
@@ -666,6 +667,18 @@ class YoutubeDL(object):
if f.get('vcodec') == 'none']
if audio_formats:
return audio_formats[0]
elif format_spec == 'bestvideo':
video_formats = [
f for f in available_formats
if f.get('acodec') == 'none']
if video_formats:
return video_formats[-1]
elif format_spec == 'worstvideo':
video_formats = [
f for f in available_formats
if f.get('acodec') == 'none']
if video_formats:
return video_formats[0]
else:
extensions = ['mp4', 'flv', 'webm', '3gp']
if format_spec in extensions:
@@ -688,6 +701,11 @@ class YoutubeDL(object):
if 'display_id' not in info_dict and 'id' in info_dict:
info_dict['display_id'] = info_dict['id']
if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
upload_date = datetime.datetime.utcfromtimestamp(
info_dict['timestamp'])
info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
# This extractors handle format selection themselves
if info_dict['extractor'] in ['Youku']:
if download:
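The hunk above derives `upload_date` from `timestamp` when an extractor sets only the latter. A self-contained sketch of that rule is shown here. `fill_upload_date` is a hypothetical name, and the sketch uses a timezone-aware `fromtimestamp` call rather than the `utcfromtimestamp` call in the diff, which yields the same UTC date.

```python
import datetime


def fill_upload_date(info_dict):
    """Hypothetical helper: set upload_date (YYYYMMDD, UTC) from a UNIX timestamp."""
    if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
        upload_date = datetime.datetime.fromtimestamp(
            info_dict['timestamp'], datetime.timezone.utc)
        info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
    return info_dict


info = fill_upload_date({'id': 'example', 'timestamp': 1395273600})
```

An explicitly set `upload_date` is left untouched, so extractors that already provide it keep their value.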

@@ -56,7 +56,6 @@ __authors__ = (
__license__ = 'Public Domain'
import codecs
import getpass
import io
import locale
import optparse
@@ -68,6 +67,7 @@ import sys
from .utils import (
compat_getpass,
compat_print,
DateRange,
decodeOption,
@@ -316,7 +316,7 @@ def parseOpts(overrideArguments=None):
video_format.add_option('-f', '--format',
action='store', dest='format', metavar='FORMAT', default=None,
help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported. You can also use the special names "best", "bestaudio", "worst", and "worstaudio". By default, youtube-dl will pick the best quality.')
help='video format code, specify the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported. You can also use the special names "best", "bestvideo", "bestaudio", "worst", "worstvideo" and "worstaudio". By default, youtube-dl will pick the best quality.')
video_format.add_option('--all-formats',
action='store_const', dest='format', help='download all available video formats', const='all')
video_format.add_option('--prefer-free-formats',
@@ -611,7 +611,7 @@ def _real_main(argv=None):
if opts.usetitle and opts.useid:
parser.error(u'using title conflicts with using video ID')
if opts.username is not None and opts.password is None:
opts.password = getpass.getpass(u'Type account password and press return:')
opts.password = compat_getpass(u'Type account password and press [Return]: ')
if opts.ratelimit is not None:
numeric_limit = FileDownloader.parse_bytes(opts.ratelimit)
if numeric_limit is None:

@@ -10,6 +10,7 @@ from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVConcertIE,
ArteTVFutureIE,
ArteTVDDCIE,
)
@@ -196,6 +197,7 @@ from .rutube import (
RutubeMovieIE,
RutubePersonIE,
)
from .rutv import RUTVIE
from .savefrom import SaveFromIE
from .servingsys import ServingSysIE
from .sina import SinaIE
@@ -242,13 +244,17 @@ from .tumblr import TumblrIE
from .tutv import TutvIE
from .tvigle import TvigleIE
from .tvp import TvpIE
from .udemy import (
UdemyIE,
UdemyCourseIE
)
from .unistra import UnistraIE
from .ustream import UstreamIE, UstreamChannelIE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vesti import VestiIE
from .vevo import VevoIE
from .vgtrk import VGTRKIE
from .vice import ViceIE
from .viddler import ViddlerIE
from .videobam import VideoBamIE
@@ -268,6 +274,7 @@ from .viki import VikiIE
from .vk import VKIE
from .vube import VubeIE
from .wat import WatIE
from .wdr import WDRIE
from .weibo import WeiboIE
from .wimp import WimpIE
from .wistia import WistiaIE

@@ -131,7 +131,7 @@ class ArteTvIE(InfoExtractor):
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
@classmethod
def _extract_url_info(cls, url):
@@ -202,6 +202,8 @@ class ArteTVPlus7IE(InfoExtractor):
re.match(r'VO-ST(F|A)', f.get('versionCode', '')) is None,
# The version with sourds/mal subtitles has also lower relevance
re.match(r'VO?(F|A)-STM\1', f.get('versionCode', '')) is None,
# Prefer http downloads over m3u8
0 if f['url'].endswith('m3u8') else 1,
)
formats = sorted(formats, key=sort_key)
def _format(format_info):
@@ -242,8 +244,9 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
_TEST = {
'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
'file': '050489-002.mp4',
'info_dict': {
'id': '050489-002',
'ext': 'mp4',
'title': 'Agentur Amateur / Agence Amateur #2 : Corporate Design',
},
}
@@ -255,8 +258,9 @@ class ArteTVFutureIE(ArteTVPlus7IE):
_TEST = {
'url': 'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
'file': '050940-003.mp4',
'info_dict': {
'id': '050940-003',
'ext': 'mp4',
'title': 'Les champignons au secours de la planète',
},
}
@@ -270,7 +274,7 @@ class ArteTVFutureIE(ArteTVPlus7IE):
class ArteTVDDCIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:ddc'
_VALID_URL = r'http?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'
_VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
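The one-character change to the `arte.tv:ddc` pattern matters because `http?` makes the `p` optional, not the `s`, so the old pattern rejects `https` URLs. A small check, using made-up example URLs that are not taken from the site:

```python
import re

OLD_VALID_URL = r'http?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'
NEW_VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'

# Hypothetical URLs, used only to exercise the two patterns.
http_url = 'http://ddc.arte.tv/folge/example'
https_url = 'https://ddc.arte.tv/folge/example'

old_https = re.match(OLD_VALID_URL, https_url)   # no match: 's' is not allowed
new_https = re.match(NEW_VALID_URL, https_url)   # matches
new_http = re.match(NEW_VALID_URL, http_url)     # still matches
```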
@@ -284,3 +288,19 @@ class ArteTVDDCIE(ArteTVPlus7IE):
javascriptPlayerGenerator = self._download_webpage(script_url, video_id, 'Download javascript player generator')
json_url = self._search_regex(r"json_url=(.*)&rendering_place.*", javascriptPlayerGenerator, 'json url')
return self._extract_from_json_url(json_url, video_id, lang)
class ArteTVConcertIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:concert'
_VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>de|fr)/(?P<id>.+)'
_TEST = {
'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
'md5': '9ea035b7bd69696b67aa2ccaaa218161',
'info_dict': {
'id': '186',
'ext': 'mp4',
'title': 'The Notwist im Pariser Konzertclub "Divan du Monde"',
'upload_date': '20140128',
},
}

@@ -9,21 +9,35 @@ from ..utils import ExtractorError
class BRIE(InfoExtractor):
IE_DESC = "Bayerischer Rundfunk Mediathek"
_VALID_URL = r"^https?://(?:www\.)?br\.de/mediathek/video/(?:sendungen/)?(?P<id>[a-z0-9\-]+)\.html$"
_VALID_URL = r"^https?://(?:www\.)?br\.de/mediathek/video/(?:sendungen/)?(?:[a-z0-9\-/]+/)?(?P<id>[a-z0-9\-]+)\.html$"
_BASE_URL = "http://www.br.de"
_TEST = {
"url": "http://www.br.de/mediathek/video/anselm-gruen-114.html",
"md5": "c4f83cf0f023ba5875aba0bf46860df2",
"info_dict": {
"id": "2c8d81c5-6fb7-4a74-88d4-e768e5856532",
"ext": "mp4",
"title": "Feiern und Verzichten",
"description": "Anselm Grün: Feiern und Verzichten",
"uploader": "BR/Birgit Baier",
"upload_date": "20140301"
_TESTS = [
{
"url": "http://www.br.de/mediathek/video/anselm-gruen-114.html",
"md5": "c4f83cf0f023ba5875aba0bf46860df2",
"info_dict": {
"id": "2c8d81c5-6fb7-4a74-88d4-e768e5856532",
"ext": "mp4",
"title": "Feiern und Verzichten",
"description": "Anselm Grün: Feiern und Verzichten",
"uploader": "BR/Birgit Baier",
"upload_date": "20140301"
}
},
{
"url": "http://www.br.de/mediathek/video/sendungen/unter-unserem-himmel/unter-unserem-himmel-alpen-ueber-den-pass-100.html",
"md5": "ab451b09d861dbed7d7cc9ab0be19ebe",
"info_dict": {
"id": "2c060e69-3a27-4e13-b0f0-668fac17d812",
"ext": "mp4",
"title": "Über den Pass",
"description": "Die Eroberung der Alpen: Über den Pass",
"uploader": None,
"upload_date": None
}
}
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -33,16 +47,21 @@ class BRIE(InfoExtractor):
r"return BRavFramework\.register\(BRavFramework\('avPlayer_(?:[a-f0-9-]{36})'\)\.setup\({dataURL:'(/mediathek/video/[a-z0-9/~_.-]+)'}\)\);", page, "XMLURL")
xml = self._download_xml(self._BASE_URL + xml_url, None)
videos = [{
"id": xml_video.get("externalId"),
"title": xml_video.find("title").text,
"formats": self._extract_formats(xml_video.find("assets")),
"thumbnails": self._extract_thumbnails(xml_video.find("teaserImage/variants")),
"description": " ".join(xml_video.find("shareTitle").text.splitlines()),
"uploader": xml_video.find("author").text,
"upload_date": "".join(reversed(xml_video.find("broadcastDate").text.split("."))),
"webpage_url": xml_video.find("permalink").text,
} for xml_video in xml.findall("video")]
videos = []
for xml_video in xml.findall("video"):
video = {
"id": xml_video.get("externalId"),
"title": xml_video.find("title").text,
"formats": self._extract_formats(xml_video.find("assets")),
"thumbnails": self._extract_thumbnails(xml_video.find("teaserImage/variants")),
"description": " ".join(xml_video.find("shareTitle").text.splitlines()),
"webpage_url": xml_video.find("permalink").text
}
if xml_video.find("author").text:
video["uploader"] = xml_video.find("author").text
if xml_video.find("broadcastDate").text:
video["upload_date"] = "".join(reversed(xml_video.find("broadcastDate").text.split(".")))
videos.append(video)
if len(videos) > 1:
self._downloader.report_warning(
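The `upload_date` expression reverses the dot-separated parts of `broadcastDate`, and the rewritten loop skips the field when it is empty. A minimal sketch of that conversion, assuming the XML value has the form `DD.MM.YYYY` (the input string below is an assumption chosen to reproduce the `20140301` value from the first test case); `broadcast_date_to_upload_date` is a hypothetical name:

```python
def broadcast_date_to_upload_date(text):
    """Hypothetical helper: turn 'DD.MM.YYYY' into 'YYYYMMDD', or None if empty."""
    if not text:
        # Empty or missing <broadcastDate> element text.
        return None
    return ''.join(reversed(text.split('.')))


upload_date = broadcast_date_to_upload_date('01.03.2014')
```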

@@ -17,8 +17,9 @@ class CollegeHumorIE(InfoExtractor):
'id': '6902724',
'ext': 'mp4',
'title': 'Comic-Con Cosplay Catastrophe',
'description': 'Fans get creative this year',
'description': "Fans get creative this year at San Diego. Too creative. And yes, that's really Joss Whedon.",
'age_limit': 13,
'duration': 187,
},
},
{
@@ -28,7 +29,7 @@ class CollegeHumorIE(InfoExtractor):
'id': '3505939',
'ext': 'mp4',
'title': 'Font Conference',
'description': 'This video wasn\'t long enough,',
'description': "This video wasn't long enough, so we made it double-spaced.",
'age_limit': 10,
'duration': 179,
},
@@ -87,6 +88,7 @@ class CollegeHumorIE(InfoExtractor):
self._sort_formats(formats)
duration = int_or_none(vdata.get('duration'), 1000)
like_count = int_or_none(vdata.get('likes'))
return {
'id': video_id,
@@ -96,4 +98,5 @@ class CollegeHumorIE(InfoExtractor):
'formats': formats,
'age_limit': age_limit,
'duration': duration,
'like_count': like_count,
}

@@ -14,7 +14,7 @@ from ..utils import (
class ComedyCentralIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?comedycentral\.com/
_VALID_URL = r'''(?x)https?://(?:www\.)?(comedycentral|cc)\.com/
(video-clips|episodes|cc-studios|video-collections)
/(?P<title>.*)'''
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'
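As a quick check of the widened pattern, the sketch below matches the `cc.com` URL quoted in the commit message against the new `_VALID_URL`. The pattern is copied from the diff; only the variable names are illustrative.

```python
import re

VALID_URL = r'''(?x)https?://(?:www\.)?(comedycentral|cc)\.com/
    (video-clips|episodes|cc-studios|video-collections)
    /(?P<title>.*)'''

url = 'http://www.cc.com/video-clips/fmyq0m/broad-city-a-beautiful-railroad-style-apartment'
mobj = re.match(VALID_URL, url)
```

Because of the `(?x)` flag, the line breaks and indentation inside the pattern are ignored by the regex engine.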

@@ -97,7 +97,9 @@ class InfoExtractor(object):
thumbnail: Full URL to a video thumbnail image.
description: One-line video description.
uploader: Full name of the video uploader.
timestamp: UNIX timestamp of the moment the video became available.
upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
location: Physical location of the video.
subtitles: The subtitle file contents as a dictionary in the format

@@ -34,12 +34,14 @@ class FunnyOrDieIE(InfoExtractor):
if mobj.group('type') == 'embed':
post_json = self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details')
post = json.loads(post_json)['attachment']
post = json.loads(post_json)
title = post['name']
description = post.get('description')
thumbnail = post.get('picture')
else:
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = None
return {
'id': video_id,
@@ -47,4 +49,5 @@ class FunnyOrDieIE(InfoExtractor):
'ext': 'mp4',
'title': title,
'description': description,
'thumbnail': thumbnail,
}

@@ -24,6 +24,7 @@ from ..utils import (
)
from .brightcove import BrightcoveIE
from .ooyala import OoyalaIE
from .rutv import RUTVIE
class GenericIE(InfoExtractor):
@@ -143,8 +144,34 @@ class GenericIE(InfoExtractor):
'ext': 'mp4',
'title': 'Between Two Ferns with Zach Galifianakis: President Barack Obama',
'description': 'Episode 18: President Barack Obama sits down with Zach Galifianakis for his most memorable interview yet.',
}
},
},
# RUTV embed
{
'url': 'http://www.rg.ru/2014/03/15/reg-dfo/anklav-anons.html',
'info_dict': {
'id': '776940',
'ext': 'mp4',
'title': 'Охотское море стало целиком российским',
'description': 'md5:5ed62483b14663e2a95ebbe115eb8f43',
},
'params': {
# m3u8 download
'skip_download': True,
},
},
# Embedded TED video
{
'url': 'http://en.support.wordpress.com/videos/ted-talks/',
'md5': 'deeeabcc1085eb2ba205474e7235a3d5',
'info_dict': {
'id': '981',
'ext': 'mp4',
'title': 'My web playroom',
'uploader': 'Ze Frank',
'description': 'md5:ddb2a40ecd6b6a147e400e535874947b',
}
}
]
def report_download_webpage(self, video_id):
@@ -170,9 +197,14 @@ class GenericIE(InfoExtractor):
newurl = newurl.replace(' ', '%20')
newheaders = dict((k,v) for k,v in req.headers.items()
if k.lower() not in ("content-length", "content-type"))
try:
# This function was deprecated in python 3.3 and removed in 3.4
origin_req_host = req.get_origin_req_host()
except AttributeError:
origin_req_host = req.origin_req_host
return HEADRequest(newurl,
headers=newheaders,
origin_req_host=req.get_origin_req_host(),
origin_req_host=origin_req_host,
unverifiable=True)
else:
raise compat_urllib_error.HTTPError(req.get_full_url(), code, msg, headers, fp)
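The `try`/`except AttributeError` fallback above can be tried outside the extractor. This is a minimal Python 3 sketch using the standard `urllib.request.Request` class; `origin_host` is a hypothetical helper name and the URL is a placeholder.

```python
import urllib.request


def origin_host(req):
    """Hypothetical helper: read the origin request host on old and new Pythons."""
    try:
        # Method form: deprecated in Python 3.3 and removed in 3.4.
        return req.get_origin_req_host()
    except AttributeError:
        # Attribute form, used when the method no longer exists.
        return req.origin_req_host


req = urllib.request.Request('http://example.com/video')
host = origin_host(req)
```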
@@ -324,9 +356,9 @@ class GenericIE(InfoExtractor):
# Look for embedded (iframe) Vimeo player
mobj = re.search(
r'<iframe[^>]+?src="((?:https?:)?//player\.vimeo\.com/video/.+?)"', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1', webpage)
if mobj:
player_url = unescapeHTML(mobj.group(1))
player_url = unescapeHTML(mobj.group('url'))
surl = smuggle_url(player_url, {'Referer': url})
return self.url_result(surl, 'Vimeo')
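The new Vimeo pattern captures the opening quote in group 1 and requires the same character to close the attribute via the `\1` backreference, so both single- and double-quoted `src` attributes match. A short sketch with made-up HTML snippets follows. `find_vimeo_player_url` is a hypothetical helper, and it omits the `unescapeHTML` step applied in the extractor.

```python
import re

VIMEO_IFRAME_RE = (
    r'<iframe[^>]+?src=(["\'])'
    r'(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1')


def find_vimeo_player_url(webpage):
    """Hypothetical helper: return the embedded Vimeo player URL, if any."""
    mobj = re.search(VIMEO_IFRAME_RE, webpage)
    return mobj.group('url') if mobj else None


double_quoted = '<iframe src="//player.vimeo.com/video/12345" width="500"></iframe>'
single_quoted = "<iframe width='500' src='https://player.vimeo.com/video/12345?title=0'></iframe>"
```

Using the named group `url` keeps the calling code independent of how many capture groups precede it.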
@@ -451,6 +483,11 @@ class GenericIE(InfoExtractor):
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
# Look for embedded RUTV player
rutv_url = RUTVIE._extract_url(webpage)
if rutv_url:
return self.url_result(rutv_url, 'RUTV')
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:
@@ -462,6 +499,13 @@ class GenericIE(InfoExtractor):
if mobj is None:
# Broaden the search a little bit: JWPlayer JS loader
mobj = re.search(r'[^A-Za-z0-9]?file["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage)
# Look for embedded TED player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://embed\.ted\.com/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'TED')
if mobj is None:
# Try to find twitter cards info
mobj = re.search(r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage)

@@ -46,6 +46,6 @@ class GoogleSearchIE(SearchInfoExtractor):
'url': mobj.group(1)
})
if (len(entries) >= n) or not re.search(r'class="pn" id="pnnext"', webpage):
if (len(entries) >= n) or not re.search(r'id="pnnext"', webpage):
res['entries'] = entries[:n]
return res

@@ -6,7 +6,10 @@ from random import random
from math import floor
from .common import InfoExtractor
from ..utils import compat_urllib_request
from ..utils import (
compat_urllib_request,
ExtractorError,
)
class IPrimaIE(InfoExtractor):
@@ -36,6 +39,7 @@ class IPrimaIE(InfoExtractor):
'params': {
'skip_download': True, # requires rtmpdump
},
'skip': 'Do not have permission to access this page',
}]
def _real_extract(self, url):
@@ -44,6 +48,10 @@ class IPrimaIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
if re.search(r'Nemáte oprávnění přistupovat na tuto stránku\.\s*</div>', webpage):
raise ExtractorError(
'%s said: You do not have permission to access this page' % self.IE_NAME, expected=True)
player_url = (
'http://embed.livebox.cz/iprimaplay/player-embed-v2.js?__tok%s__=%s' %
(floor(random()*1073741824), floor(random()*1073741824))

@@ -15,7 +15,9 @@ class NineGagIE(InfoExtractor):
"file": "1912.mp4",
"info_dict": {
"description": "This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)",
"title": "\"People Are Awesome 2013\" Is Absolutely Awesome"
"title": "\"People Are Awesome 2013\" Is Absolutely Awesome",
"view_count": int,
"thumbnail": "re:^https?://",
},
'add_ie': ['Youtube']
}
@@ -25,21 +27,27 @@ class NineGagIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
data_json = self._html_search_regex(r'''(?x)
<div\s*id="tv-video"\s*data-video-source="youtube"\s*
data-video-meta="([^"]+)"''', webpage, 'video metadata')
data = json.loads(data_json)
youtube_id = self._html_search_regex(
r'(?s)id="jsid-video-post-container".*?data-external-id="([^"]+)"',
webpage, 'video ID')
description = self._html_search_regex(
r'(?s)<div class="video-caption">.*?<p>(.*?)</p>', webpage,
'description', fatal=False)
view_count_str = self._html_search_regex(
r'<p><b>([0-9][0-9,]*)</b> views</p>', webpage, 'view count',
fatal=False)
view_count = (
None if view_count_str is None
else int(view_count_str.replace(',', '')))
return {
'_type': 'url_transparent',
'url': data['youtubeVideoId'],
'url': youtube_id,
'ie_key': 'Youtube',
'id': video_id,
'title': data['title'],
'description': data['description'],
'view_count': int(data['view_count']),
'like_count': int(data['statistic']['like']),
'dislike_count': int(data['statistic']['dislike']),
'thumbnail': data['thumbnail_url'],
'title': self._og_search_title(webpage),
'description': description,
'view_count': view_count,
'thumbnail': self._og_search_thumbnail(webpage),
}
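The view count is now scraped from the page as a comma-grouped string and converted to an integer, with `None` when the markup is absent. A standalone sketch of that conversion, using plain `re.search` as a stand-in for the extractor's `_html_search_regex` helper and a made-up HTML fragment; `parse_view_count` is a hypothetical name:

```python
import re


def parse_view_count(webpage):
    """Hypothetical helper: extract '1,234,567 views' as an int, or None."""
    mobj = re.search(r'<p><b>([0-9][0-9,]*)</b> views</p>', webpage)
    if mobj is None:
        return None
    # Drop the thousands separators before converting.
    return int(mobj.group(1).replace(',', ''))


view_count = parse_view_count('<div><p><b>1,234,567</b> views</p></div>')
```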

@@ -10,141 +10,13 @@ from ..utils import (
)
class VGTRKIE(InfoExtractor):
IE_DESC = 'ВГТРК'
_VALID_URL = r'http://(?:.+?\.)?(?:vesti\.ru|russia2?\.tv|tvkultura\.ru|rutv\.ru)/(?P<id>.+)'
class RUTVIE(InfoExtractor):
IE_DESC = 'RUTV.RU'
_VALID_URL = r'https?://player\.(?:rutv\.ru|vgtrk\.com)/(?:flash2v/container\.swf\?id=|iframe/(?P<type>swf|video|live)/id/)(?P<id>\d+)'
_TESTS = [
{
'url': 'http://www.vesti.ru/videos?vid=575582&cid=1',
'info_dict': {
'id': '765035',
'ext': 'mp4',
'title': 'Вести.net: биткоины в России не являются законными',
'description': 'md5:d4bb3859dc1177b28a94c5014c35a36b',
'duration': 302,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://www.vesti.ru/doc.html?id=1349233',
'info_dict': {
'id': '773865',
'ext': 'mp4',
'title': 'Участники митинга штурмуют Донецкую областную администрацию',
'description': 'md5:1a160e98b3195379b4c849f2f4958009',
'duration': 210,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://www.vesti.ru/only_video.html?vid=576180',
'info_dict': {
'id': '766048',
'ext': 'mp4',
'title': 'США заморозило, Британию затопило',
'description': 'md5:f0ed0695ec05aed27c56a70a58dc4cc1',
'duration': 87,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://hitech.vesti.ru/news/view/id/4000',
'info_dict': {
'id': '766888',
'ext': 'mp4',
'title': 'Вести.net: интернет-гиганты начали перетягивание программных "одеял"',
'description': 'md5:65ddd47f9830c4f42ed6475f8730c995',
'duration': 279,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://sochi2014.vesti.ru/video/index/video_id/766403',
'info_dict': {
'id': '766403',
'ext': 'mp4',
'title': 'XXII зимние Олимпийские игры. Российские хоккеисты стартовали на Олимпиаде с победы',
'description': 'md5:55805dfd35763a890ff50fa9e35e31b3',
'duration': 271,
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Blocked outside Russia',
},
{
'url': 'http://sochi2014.vesti.ru/live/play/live_id/301',
'info_dict': {
'id': '51499',
'ext': 'flv',
'title': 'Сочи-2014. Биатлон. Индивидуальная гонка. Мужчины ',
'description': 'md5:9e0ed5c9d2fa1efbfdfed90c9a6d179c',
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Translation has finished'
},
{
'url': 'http://russia.tv/video/show/brand_id/5169/episode_id/970443/video_id/975648',
'info_dict': {
'id': '771852',
'ext': 'mp4',
'title': 'Прямой эфир. Жертвы загадочной болезни: смерть от старости в 17 лет',
'description': 'md5:b81c8c55247a4bd996b43ce17395b2d8',
'duration': 3096,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://russia.tv/brand/show/brand_id/57638',
'info_dict': {
'id': '774016',
'ext': 'mp4',
'title': 'Чужой в семье Сталина',
'description': '',
'duration': 2539,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://2.russia.tv/video/show/brand_id/48863/episode_id/972920/video_id/978667/viewtype/picture',
'info_dict': {
'id': '775081',
'ext': 'mp4',
'title': 'XXII зимние Олимпийские игры. Россияне заняли весь пьедестал в лыжных гонках',
'description': 'md5:15d3741dd8d04b203fbc031c6a47fb0f',
'duration': 101,
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Blocked outside Russia',
},
{
'url': 'http://tvkultura.ru/video/show/brand_id/31724/episode_id/972347/video_id/978186',
'url': 'http://player.rutv.ru/flash2v/container.swf?id=774471&sid=kultura&fbv=true&isPlay=true&ssl=false&i=560&acc_video_id=episode_id/972347/video_id/978186/brand_id/31724',
'info_dict': {
'id': '774471',
'ext': 'mp4',
@@ -158,64 +30,97 @@ class VGTRKIE(InfoExtractor):
},
},
{
'url': 'http://rutv.ru/brand/show/id/6792/channel/75',
'url': 'https://player.vgtrk.com/flash2v/container.swf?id=774016&sid=russiatv&fbv=true&isPlay=true&ssl=false&i=560&acc_video_id=episode_id/972098/video_id/977760/brand_id/57638',
'info_dict': {
'id': '125521',
'id': '774016',
'ext': 'mp4',
'title': 'Грустная дама червей. Х',
'title': 'Чужой в семье Сталина',
'description': '',
'duration': 4882,
'duration': 2539,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://player.rutv.ru/iframe/swf/id/766888/sid/hitech/?acc_video_id=4000',
'info_dict': {
'id': '766888',
'ext': 'mp4',
'title': 'Вести.net: интернет-гиганты начали перетягивание программных "одеял"',
'description': 'md5:65ddd47f9830c4f42ed6475f8730c995',
'duration': 279,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://player.rutv.ru/iframe/video/id/771852/start_zoom/true/showZoomBtn/false/sid/russiatv/?acc_video_id=episode_id/970443/video_id/975648/brand_id/5169',
'info_dict': {
'id': '771852',
'ext': 'mp4',
'title': 'Прямой эфир. Жертвы загадочной болезни: смерть от старости в 17 лет',
'description': 'md5:b81c8c55247a4bd996b43ce17395b2d8',
'duration': 3096,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://player.rutv.ru/iframe/live/id/51499/showZoomBtn/false/isPlay/true/sid/sochi2014',
'info_dict': {
'id': '51499',
'ext': 'flv',
'title': 'Сочи-2014. Биатлон. Индивидуальная гонка. Мужчины ',
'description': 'md5:9e0ed5c9d2fa1efbfdfed90c9a6d179c',
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Translation has finished',
},
]
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://player\.rutv\.ru/iframe/(?:swf|video|live)/id/.+?)\1', webpage)
if mobj:
return mobj.group('url')
mobj = re.search(
r'<meta[^>]+?property=(["\'])og:video\1[^>]+?content=(["\'])(?P<url>http://player\.(?:rutv\.ru|vgtrk\.com)/flash2v/container\.swf\?id=.+?)\2',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_type = mobj.group('type')
page = self._download_webpage(url, video_id, 'Downloading page')
mobj = re.search(
r'<meta property="og:video" content="http://www\.vesti\.ru/i/flvplayer_videoHost\.swf\?vid=(?P<id>\d+)',
page)
if mobj:
video_id = mobj.group('id')
page = self._download_webpage('http://www.vesti.ru/only_video.html?vid=%s' % video_id, video_id,
'Downloading video page')
mobj = re.search(
r'<meta property="og:video" content="http://player\.rutv\.ru/flash2v/container\.swf\?id=(?P<id>\d+)', page)
if mobj:
if not video_type or video_type == 'swf':
video_type = 'video'
video_id = mobj.group('id')
else:
mobj = re.search(
r'<iframe.+?src="http://player\.rutv\.ru/iframe/(?P<type>[^/]+)/id/(?P<id>\d+)[^"]*".*?></iframe>',
page)
if not mobj:
raise ExtractorError('No media found', expected=True)
video_type = mobj.group('type')
video_id = mobj.group('id')
json_data = self._download_json(
'http://player.rutv.ru/iframe/%splay/id/%s' % ('live-' if video_type == 'live' else '', video_id),
video_id, 'Downloading JSON')
if json_data['errors']:
raise ExtractorError('vesti returned error: %s' % json_data['errors'], expected=True)
raise ExtractorError('%s said: %s' % (self.IE_NAME, json_data['errors']), expected=True)
playlist = json_data['data']['playlist']
medialist = playlist['medialist']
media = medialist[0]
if media['errors']:
raise ExtractorError('vesti returned error: %s' % media['errors'], expected=True)
raise ExtractorError('%s said: %s' % (self.IE_NAME, media['errors']), expected=True)
view_count = playlist.get('count_views')
priority_transport = playlist['priority_transport']

@@ -11,7 +11,9 @@ from ..utils import (
class TEDIE(SubtitlesInfoExtractor):
_VALID_URL = r'''(?x)http://www\.ted\.com/
_VALID_URL = r'''(?x)
(?P<proto>https?://)
(?P<type>www|embed)(?P<urlmain>\.ted\.com/
(
(?P<type_playlist>playlists(?:/\d+)?) # We have a playlist
|
@@ -19,6 +21,7 @@ class TEDIE(SubtitlesInfoExtractor):
)
(/lang/(.*?))? # The url may contain the language
/(?P<name>\w+) # Here goes the name and then ".html"
.*)$
'''
_TEST = {
'url': 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html',
@@ -48,6 +51,9 @@ class TEDIE(SubtitlesInfoExtractor):
def _real_extract(self, url):
m = re.match(self._VALID_URL, url, re.VERBOSE)
if m.group('type') == 'embed':
desktop_url = m.group('proto') + 'www' + m.group('urlmain')
return self.url_result(desktop_url, 'TED')
name = m.group('name')
if m.group('type_talk'):
return self._talk_info(url, name)
@@ -93,11 +99,14 @@ class TEDIE(SubtitlesInfoExtractor):
self._list_available_subtitles(video_id, talk_info)
return
thumbnail = talk_info['thumb']
if not thumbnail.startswith('http'):
thumbnail = 'http://' + thumbnail
return {
'id': video_id,
'title': talk_info['title'],
'uploader': talk_info['speaker'],
'thumbnail': talk_info['thumb'],
'thumbnail': thumbnail,
'description': self._og_search_description(webpage),
'subtitles': video_subtitles,
'formats': formats,
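
The embed-URL redirect added to `_real_extract` in this file can be tried outside the extractor. This is a minimal sketch, not the extractor's code: `EMBED_RE` is a simplified, hypothetical pattern that keeps only the `proto`, `type` and `urlmain` groups of the new `_VALID_URL`, and `to_desktop_url` is an illustrative helper name.

```python
import re

# Simplified stand-in for the new _VALID_URL (assumption: path details omitted).
EMBED_RE = r'(?P<proto>https?://)(?P<type>www|embed)(?P<urlmain>\.ted\.com/.*)$'


def to_desktop_url(url):
    """Map an embed.ted.com URL to its www.ted.com equivalent."""
    m = re.match(EMBED_RE, url)
    if m is None:
        return None
    if m.group('type') == 'embed':
        # Same string concatenation as in the hunk: proto + 'www' + urlmain
        return m.group('proto') + 'www' + m.group('urlmain')
    return url


embed_url = 'https://embed.ted.com/talks/dan_dennett_on_our_consciousness.html'
desktop_url = to_desktop_url(embed_url)
```

The extractor then hands the rewritten URL back through `url_result`, so the regular talk extraction handles it.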

@@ -0,0 +1,164 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    compat_urllib_parse,
    compat_urllib_request,
    ExtractorError,
)


class UdemyIE(InfoExtractor):
    IE_NAME = 'udemy'
    _VALID_URL = r'https?://www\.udemy\.com/(?:[^#]+#/lecture/|lecture/view/?\?lectureId=)(?P<id>\d+)'
    _LOGIN_URL = 'https://www.udemy.com/join/login-submit/'
    _NETRC_MACHINE = 'udemy'

    _TESTS = [{
        'url': 'https://www.udemy.com/java-tutorial/#/lecture/172757',
        'md5': '98eda5b657e752cf945d8445e261b5c5',
        'info_dict': {
            'id': '160614',
            'ext': 'mp4',
            'title': 'Introduction and Installation',
            'description': 'md5:c0d51f6f21ef4ec65f091055a5eef876',
            'duration': 579.29,
        },
        'skip': 'Requires udemy account credentials',
    }]

    def _handle_error(self, response):
        if not isinstance(response, dict):
            return
        error = response.get('error')
        if error:
            error_str = 'Udemy returned error #%s: %s' % (error.get('code'), error.get('message'))
            error_data = error.get('data')
            if error_data:
                error_str += ' - %s' % error_data.get('formErrors')
            raise ExtractorError(error_str, expected=True)

    def _download_json(self, url, video_id, note='Downloading JSON metadata'):
        response = super(UdemyIE, self)._download_json(url, video_id, note)
        self._handle_error(response)
        return response

    def _real_initialize(self):
        self._login()

    def _login(self):
        (username, password) = self._get_login_info()
        if username is None:
            raise ExtractorError(
                'Udemy account is required, use --username and --password options to provide account credentials.',
                expected=True)

        login_popup = self._download_webpage(
            'https://www.udemy.com/join/login-popup?displayType=ajax&showSkipButton=1', None,
            'Downloading login popup')

        if login_popup == '<div class="run-command close-popup redirect" data-url="https://www.udemy.com/"></div>':
            return

        csrf = self._html_search_regex(r'<input type="hidden" name="csrf" value="(.+?)"', login_popup, 'csrf token')

        login_form = {
            'email': username,
            'password': password,
            'csrf': csrf,
            'displayType': 'json',
            'isSubmitted': '1',
        }
        request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
        response = self._download_json(request, None, 'Logging in as %s' % username)

        if 'returnUrl' not in response:
            raise ExtractorError('Unable to log in')

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        lecture_id = mobj.group('id')

        lecture = self._download_json(
            'https://www.udemy.com/api-1.1/lectures/%s' % lecture_id, lecture_id, 'Downloading lecture JSON')

        if lecture['assetType'] != 'Video':
            raise ExtractorError('Lecture %s is not a video' % lecture_id, expected=True)

        asset = lecture['asset']

        stream_url = asset['streamUrl']
        mobj = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', stream_url)
        if mobj:
            return self.url_result(mobj.group(1), 'Youtube')

        video_id = asset['id']
        thumbnail = asset['thumbnailUrl']
        duration = asset['data']['duration']

        download_url = asset['downloadUrl']

        formats = [
            {
                'url': download_url['Video480p'][0],
                'format_id': '360p',
            },
            {
                'url': download_url['Video'][0],
                'format_id': '720p',
            },
        ]

        title = lecture['title']
        description = lecture['description']

        return {
            'id': video_id,
            'title': title,
            'description': description,
            'thumbnail': thumbnail,
            'duration': duration,
            'formats': formats
        }


class UdemyCourseIE(UdemyIE):
    IE_NAME = 'udemy:course'
    _VALID_URL = r'https?://www\.udemy\.com/(?P<coursepath>[\da-z-]+)'
    _SUCCESSFULLY_ENROLLED = '>You have enrolled in this course!<'
    _ALREADY_ENROLLED = '>You are already taking this course.<'
    _TESTS = []

    @classmethod
    def suitable(cls, url):
        return False if UdemyIE.suitable(url) else super(UdemyCourseIE, cls).suitable(url)

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        course_path = mobj.group('coursepath')

        response = self._download_json(
            'https://www.udemy.com/api-1.1/courses/%s' % course_path, course_path, 'Downloading course JSON')

        course_id = int(response['id'])
        course_title = response['title']

        webpage = self._download_webpage(
            'https://www.udemy.com/course/subscribe/?courseId=%s' % course_id, course_id, 'Enrolling in the course')

        if self._SUCCESSFULLY_ENROLLED in webpage:
            self.to_screen('%s: Successfully enrolled in' % course_id)
        elif self._ALREADY_ENROLLED in webpage:
            self.to_screen('%s: Already enrolled in' % course_id)

        response = self._download_json('https://www.udemy.com/api-1.1/courses/%s/curriculum' % course_id,
            course_id, 'Downloading course curriculum')

        entries = [
            self.url_result('https://www.udemy.com/%s/#/lecture/%s' % (course_path, asset['id']), 'Udemy')
            for asset in response if asset.get('assetType') == 'Video'
        ]

        return self.playlist_result(entries, course_id, course_title)
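
The error handling in `UdemyIE._handle_error` can be exercised without network access. The sketch below mirrors its message formatting as a plain function that returns the string instead of raising; `format_udemy_error` and the sample payloads are hypothetical, since real API responses are not shown in this diff.

```python
def format_udemy_error(response):
    """Return the message _handle_error would raise, or None when there is no error."""
    if not isinstance(response, dict):
        return None
    error = response.get('error')
    if not error:
        return None
    error_str = 'Udemy returned error #%s: %s' % (error.get('code'), error.get('message'))
    error_data = error.get('data')
    if error_data:
        # Form validation details are appended when the payload carries them
        error_str += ' - %s' % error_data.get('formErrors')
    return error_str


# Hypothetical payloads, for illustration only.
message = format_udemy_error({'error': {'code': 403, 'message': 'Forbidden'}})
no_error = format_udemy_error({'id': 160614})
```

Because `_download_json` is overridden to call `_handle_error`, every JSON request made by both Udemy extractors goes through this check.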

@@ -0,0 +1,121 @@
# encoding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import ExtractorError
from .rutv import RUTVIE


class VestiIE(InfoExtractor):
    IE_DESC = 'Вести.Ru'
    _VALID_URL = r'http://(?:.+?\.)?vesti\.ru/(?P<id>.+)'

    _TESTS = [
        {
            'url': 'http://www.vesti.ru/videos?vid=575582&cid=1',
            'info_dict': {
                'id': '765035',
                'ext': 'mp4',
                'title': 'Вести.net: биткоины в России не являются законными',
                'description': 'md5:d4bb3859dc1177b28a94c5014c35a36b',
                'duration': 302,
            },
            'params': {
                # m3u8 download
                'skip_download': True,
            },
        },
        {
            'url': 'http://www.vesti.ru/doc.html?id=1349233',
            'info_dict': {
                'id': '773865',
                'ext': 'mp4',
                'title': 'Участники митинга штурмуют Донецкую областную администрацию',
                'description': 'md5:1a160e98b3195379b4c849f2f4958009',
                'duration': 210,
            },
            'params': {
                # m3u8 download
                'skip_download': True,
            },
        },
        {
            'url': 'http://www.vesti.ru/only_video.html?vid=576180',
            'info_dict': {
                'id': '766048',
                'ext': 'mp4',
                'title': 'США заморозило, Британию затопило',
                'description': 'md5:f0ed0695ec05aed27c56a70a58dc4cc1',
                'duration': 87,
            },
            'params': {
                # m3u8 download
                'skip_download': True,
            },
        },
        {
            'url': 'http://hitech.vesti.ru/news/view/id/4000',
            'info_dict': {
                'id': '766888',
                'ext': 'mp4',
                'title': 'Вести.net: интернет-гиганты начали перетягивание программных "одеял"',
                'description': 'md5:65ddd47f9830c4f42ed6475f8730c995',
                'duration': 279,
            },
            'params': {
                # m3u8 download
                'skip_download': True,
            },
        },
        {
            'url': 'http://sochi2014.vesti.ru/video/index/video_id/766403',
            'info_dict': {
                'id': '766403',
                'ext': 'mp4',
                'title': 'XXII зимние Олимпийские игры. Российские хоккеисты стартовали на Олимпиаде с победы',
                'description': 'md5:55805dfd35763a890ff50fa9e35e31b3',
                'duration': 271,
            },
            'params': {
                # m3u8 download
                'skip_download': True,
            },
            'skip': 'Blocked outside Russia',
        },
        {
            'url': 'http://sochi2014.vesti.ru/live/play/live_id/301',
            'info_dict': {
                'id': '51499',
                'ext': 'flv',
                'title': 'Сочи-2014. Биатлон. Индивидуальная гонка. Мужчины ',
                'description': 'md5:9e0ed5c9d2fa1efbfdfed90c9a6d179c',
            },
            'params': {
                # rtmp download
                'skip_download': True,
            },
            'skip': 'Translation has finished'
        },
    ]

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        video_id = mobj.group('id')

        page = self._download_webpage(url, video_id, 'Downloading page')

        mobj = re.search(
            r'<meta[^>]+?property="og:video"[^>]+?content="http://www\.vesti\.ru/i/flvplayer_videoHost\.swf\?vid=(?P<id>\d+)',
            page)
        if mobj:
            video_id = mobj.group('id')
            page = self._download_webpage('http://www.vesti.ru/only_video.html?vid=%s' % video_id, video_id,
                'Downloading video page')

        rutv_url = RUTVIE._extract_url(page)
        if rutv_url:
            return self.url_result(rutv_url, 'RUTV')

        raise ExtractorError('No video found', expected=True)
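
`VestiIE` now delegates to `RUTVIE._extract_url`, which looks for an embedded player in the page. A minimal sketch of the first (iframe) lookup, using the pattern from the rutv.py hunk; `extract_rutv_iframe` and the HTML fragment are hypothetical and stand in for a downloaded page.

```python
import re

# Iframe pattern copied from RUTVIE._extract_url in this comparison.
IFRAME_RE = r'<iframe[^>]+?src=(["\'])(?P<url>https?://player\.rutv\.ru/iframe/(?:swf|video|live)/id/.+?)\1'


def extract_rutv_iframe(webpage):
    """Return the embedded player URL, or None if the page has no RUTV iframe."""
    mobj = re.search(IFRAME_RE, webpage)
    return mobj.group('url') if mobj else None


# Hypothetical page fragment, for illustration only.
page = ('<div><iframe width="560" '
        'src="http://player.rutv.ru/iframe/video/id/771852/sid/russiatv/">'
        '</iframe></div>')
player_url = extract_rutv_iframe(page)
```

The backreference `\1` makes the match stop at the same quote character that opened the `src` attribute, so both single- and double-quoted attributes are handled.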

@@ -2,7 +2,6 @@ from __future__ import unicode_literals
import re
import xml.etree.ElementTree
import datetime
from .common import InfoExtractor
from ..utils import (
@@ -22,6 +21,7 @@ class VevoIE(InfoExtractor):
https?://videoplayer\.vevo\.com/embed/embedded\?videoId=|
vevo:)
(?P<id>[^&?#]+)'''
_TESTS = [{
'url': 'http://www.vevo.com/watch/hurts/somebody-to-die-for/GB1101300280',
"md5": "06bea460acb744eab74a9d7dcb4bfd61",
@@ -34,6 +34,8 @@ class VevoIE(InfoExtractor):
"duration": 230.12,
"width": 1920,
"height": 1080,
# timestamp and upload_date are often incorrect; seem to change randomly
'timestamp': int,
}
}, {
'note': 'v3 SMIL format',
@@ -47,6 +49,7 @@ class VevoIE(InfoExtractor):
'title': 'I Wish I Could Break Your Heart',
'duration': 226.101,
'age_limit': 0,
'timestamp': int,
}
}, {
'note': 'Age-limited video',
@@ -57,7 +60,8 @@ class VevoIE(InfoExtractor):
'age_limit': 18,
'title': 'Tunnel Vision (Explicit)',
'uploader': 'Justin Timberlake',
'upload_date': '20130703',
'upload_date': 're:2013070[34]',
'timestamp': int,
},
'params': {
'skip_download': 'true',
@@ -169,13 +173,13 @@ class VevoIE(InfoExtractor):
timestamp_ms = int(self._search_regex(
r'/Date\((\d+)\)/', video_info['launchDate'], 'launch date'))
upload_date = datetime.datetime.utcfromtimestamp(timestamp_ms // 1000)
return {
'id': video_id,
'title': video_info['title'],
'formats': formats,
'thumbnail': video_info['imageUrl'],
'upload_date': upload_date.strftime('%Y%m%d'),
'timestamp': timestamp_ms // 1000,
'uploader': video_info['mainArtists'][0]['artistName'],
'duration': video_info['duration'],
'age_limit': age_limit,
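
The hunk above reports `timestamp` in seconds, derived from the millisecond value inside `launchDate`. A small sketch of that conversion, assuming only the standard library; `parse_launch_date` and the sample value are hypothetical, and how the core turns `timestamp` into `upload_date` is not shown in this diff.

```python
import re
import time


def parse_launch_date(launch_date):
    """Return (timestamp in seconds, UTC date as YYYYMMDD) for a '/Date(<ms>)/' string."""
    timestamp_ms = int(re.search(r'/Date\((\d+)\)/', launch_date).group(1))
    timestamp = timestamp_ms // 1000  # same integer division as in the hunk
    return timestamp, time.strftime('%Y%m%d', time.gmtime(timestamp))


# Hypothetical launchDate value (midnight UTC).
timestamp, upload_date = parse_launch_date('/Date(1372809600000)/')
```

A launch time close to midnight can fall on either of two calendar days depending on the time zone used for formatting, which may be one reason the test accepts `2013070[34]`.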

@@ -29,6 +29,7 @@ class VideoBamIE(InfoExtractor):
'info_dict': {
'id': 'pqLvq',
'ext': 'mp4',
'title': '_',
}
},
]
@@ -61,7 +62,7 @@ class VideoBamIE(InfoExtractor):
self._sort_formats(formats)
title = self._og_search_title(page, default='VideoBam', fatal=False)
title = self._og_search_title(page, default='_', fatal=False)
description = self._og_search_description(page, default=None)
thumbnail = self._og_search_thumbnail(page)
uploader = self._html_search_regex(r'Upload by ([^<]+)</a>', page, 'uploader', fatal=False, default=None)

@@ -102,6 +102,15 @@ class VimeoIE(SubtitlesInfoExtractor):
},
]
    @classmethod
    def suitable(cls, url):
        if VimeoChannelIE.suitable(url):
            # Otherwise channel urls like http://vimeo.com/channels/31259 would
            # match
            return False
        else:
            return super(VimeoIE, cls).suitable(url)
def _login(self):
(username, password) = self._get_login_info()
if username is None:
@@ -332,7 +341,7 @@ class VimeoIE(SubtitlesInfoExtractor):
class VimeoChannelIE(InfoExtractor):
IE_NAME = 'vimeo:channel'
_VALID_URL = r'(?:https?://)?vimeo\.com/channels/(?P<id>[^/]+)'
_VALID_URL = r'(?:https?://)?vimeo\.com/channels/(?P<id>[^/]+)/?(\?.*)?$'
_MORE_PAGES_INDICATOR = r'<a.+?rel="next"'
_TITLE_RE = r'<link rel="alternate"[^>]+?title="(.*?)"'
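
The effect of anchoring the channel pattern can be checked in isolation. A minimal sketch using only the standard `re` module; `OLD_CHANNEL_URL`, `NEW_CHANNEL_URL`, `matches` and the second sample URL are illustrative names and values, not part of the commit.

```python
import re

# Patterns copied from the hunk above (before and after the change).
OLD_CHANNEL_URL = r'(?:https?://)?vimeo\.com/channels/(?P<id>[^/]+)'
NEW_CHANNEL_URL = r'(?:https?://)?vimeo\.com/channels/(?P<id>[^/]+)/?(\?.*)?$'

channel = 'http://vimeo.com/channels/31259'
# Hypothetical URL of a single video viewed inside a channel.
video_in_channel = 'http://vimeo.com/channels/somechannel/75629013'


def matches(pattern, url):
    # re.match only anchors at the start, so an unanchored pattern
    # also accepts URLs with extra path segments after the channel id.
    return re.match(pattern, url) is not None


old_results = (matches(OLD_CHANNEL_URL, channel), matches(OLD_CHANNEL_URL, video_in_channel))
new_results = (matches(NEW_CHANNEL_URL, channel), matches(NEW_CHANNEL_URL, video_in_channel))
```

With the trailing `$`, only bare channel URLs are claimed by `VimeoChannelIE`, which is what lets `VimeoIE.suitable` safely defer to it.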

youtube_dl/extractor/wdr.py

@@ -0,0 +1,114 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    unified_strdate,
    compat_urlparse,
    determine_ext,
)


class WDRIE(InfoExtractor):
    _PLAYER_REGEX = '-(?:video|audio)player(?:_size-[LMS])?'
    _VALID_URL = r'(?P<url>https?://www\d?\.(?:wdr\d?|funkhauseuropa)\.de/)(?P<id>.+?)(?P<player>%s)?\.html' % _PLAYER_REGEX

    _TESTS = [
        {
            'url': 'http://www1.wdr.de/mediathek/video/sendungen/servicezeit/videoservicezeit560-videoplayer_size-L.html',
            'info_dict': {
                'id': 'mdb-362427',
                'ext': 'flv',
                'title': 'Servicezeit',
                'description': 'md5:c8f43e5e815eeb54d0b96df2fba906cb',
                'upload_date': '20140310',
            },
            'params': {
                'skip_download': True,
            },
        },
        {
            'url': 'http://www1.wdr.de/themen/av/videomargaspiegelisttot101-videoplayer.html',
            'info_dict': {
                'id': 'mdb-363194',
                'ext': 'flv',
                'title': 'Marga Spiegel ist tot',
                'description': 'md5:2309992a6716c347891c045be50992e4',
                'upload_date': '20140311',
            },
            'params': {
                'skip_download': True,
            },
        },
        {
            'url': 'http://www1.wdr.de/themen/kultur/audioerlebtegeschichtenmargaspiegel100-audioplayer.html',
            'md5': '83e9e8fefad36f357278759870805898',
            'info_dict': {
                'id': 'mdb-194332',
                'ext': 'mp3',
                'title': 'Erlebte Geschichten: Marga Spiegel (29.11.2009)',
                'description': 'md5:2309992a6716c347891c045be50992e4',
                'upload_date': '20091129',
            },
        },
        {
            'url': 'http://www.funkhauseuropa.de/av/audiogrenzenlosleckerbaklava101-audioplayer.html',
            'md5': 'cfff440d4ee64114083ac44676df5d15',
            'info_dict': {
                'id': 'mdb-363068',
                'ext': 'mp3',
                'title': 'Grenzenlos lecker - Baklava',
                'description': 'md5:7b29e97e10dfb6e265238b32fa35b23a',
                'upload_date': '20140311',
            },
        },
    ]

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        page_url = mobj.group('url')
        page_id = mobj.group('id')

        webpage = self._download_webpage(url, page_id)

        if mobj.group('player') is None:
            entries = [
                self.url_result(page_url + href, 'WDR')
                for href in re.findall(r'<a href="/?(.+?%s\.html)" rel="nofollow"' % self._PLAYER_REGEX, webpage)
            ]
            return self.playlist_result(entries, page_id)

        flashvars = compat_urlparse.parse_qs(
            self._html_search_regex(r'<param name="flashvars" value="([^"]+)"', webpage, 'flashvars'))

        page_id = flashvars['trackerClipId'][0]
        video_url = flashvars['dslSrc'][0]
        title = flashvars['trackerClipTitle'][0]
        thumbnail = flashvars['startPicture'][0] if 'startPicture' in flashvars else None

        if 'trackerClipAirTime' in flashvars:
            upload_date = flashvars['trackerClipAirTime'][0]
        else:
            upload_date = self._html_search_meta('DC.Date', webpage, 'upload date')

        if upload_date:
            upload_date = unified_strdate(upload_date)

        if video_url.endswith('.f4m'):
            video_url += '?hdcore=3.2.0&plugin=aasp-3.2.0.77.18'
            ext = 'flv'
        else:
            ext = determine_ext(video_url)

        description = self._html_search_meta('Description', webpage, 'description')

        return {
            'id': page_id,
            'url': video_url,
            'ext': ext,
            'title': title,
            'description': description,
            'thumbnail': thumbnail,
            'upload_date': upload_date,
        }
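
The flashvars handling in `WDRIE._real_extract` boils down to parsing a query string and picking a few keys. The sketch below reproduces that step with the Python 3 standard library (the extractor itself goes through `compat_urlparse.parse_qs`); `parse_flashvars`, the crude extension lookup and the sample string are assumptions made for illustration.

```python
from urllib.parse import parse_qs


def parse_flashvars(flashvars_str):
    """Pick id, title, URL and extension out of a flashvars query string."""
    flashvars = parse_qs(flashvars_str)  # every value is a list of strings
    video_url = flashvars['dslSrc'][0]
    if video_url.endswith('.f4m'):
        # Query string copied from the diff; the extractor labels f4m streams as flv
        video_url += '?hdcore=3.2.0&plugin=aasp-3.2.0.77.18'
        ext = 'flv'
    else:
        ext = video_url.rpartition('.')[2]  # crude stand-in for determine_ext
    return {
        'id': flashvars['trackerClipId'][0],
        'title': flashvars['trackerClipTitle'][0],
        'url': video_url,
        'ext': ext,
        'thumbnail': flashvars['startPicture'][0] if 'startPicture' in flashvars else None,
    }


# Hypothetical flashvars value, for illustration only.
sample = 'dslSrc=http%3A%2F%2Fexample.com%2Fclip.mp3&trackerClipId=mdb-1&trackerClipTitle=Some+Title'
info = parse_flashvars(sample)
```

Optional keys such as `startPicture` are looked up defensively, while the required ones raise `KeyError` if the page layout changes.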

@@ -194,14 +194,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'151': {'ext': 'mp4', 'height': 72, 'resolution': '72p', 'format_note': 'HLS', 'preference': -10},
# DASH mp4 video
'133': {'ext': 'mp4', 'height': 240, 'resolution': '240p', 'format_note': 'DASH video', 'preference': -40},
'134': {'ext': 'mp4', 'height': 360, 'resolution': '360p', 'format_note': 'DASH video', 'preference': -40},
'135': {'ext': 'mp4', 'height': 480, 'resolution': '480p', 'format_note': 'DASH video', 'preference': -40},
'136': {'ext': 'mp4', 'height': 720, 'resolution': '720p', 'format_note': 'DASH video', 'preference': -40},
'137': {'ext': 'mp4', 'height': 1080, 'resolution': '1080p', 'format_note': 'DASH video', 'preference': -40},
'138': {'ext': 'mp4', 'height': 2160, 'resolution': '2160p', 'format_note': 'DASH video', 'preference': -40},
'160': {'ext': 'mp4', 'height': 192, 'resolution': '192p', 'format_note': 'DASH video', 'preference': -40},
'264': {'ext': 'mp4', 'height': 1440, 'resolution': '1440p', 'format_note': 'DASH video', 'preference': -40},
'133': {'ext': 'mp4', 'height': 240, 'resolution': '240p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'134': {'ext': 'mp4', 'height': 360, 'resolution': '360p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'135': {'ext': 'mp4', 'height': 480, 'resolution': '480p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'136': {'ext': 'mp4', 'height': 720, 'resolution': '720p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'137': {'ext': 'mp4', 'height': 1080, 'resolution': '1080p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'138': {'ext': 'mp4', 'height': 2160, 'resolution': '2160p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'160': {'ext': 'mp4', 'height': 192, 'resolution': '192p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
'264': {'ext': 'mp4', 'height': 1440, 'resolution': '1440p', 'format_note': 'DASH video', 'acodec': 'none', 'preference': -40},
# Dash mp4 audio
'139': {'ext': 'm4a', 'format_note': 'DASH audio', 'vcodec': 'none', 'abr': 48, 'preference': -50},
@@ -209,12 +209,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'141': {'ext': 'm4a', 'format_note': 'DASH audio', 'vcodec': 'none', 'abr': 256, 'preference': -50},
# Dash webm
'167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'168': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'169': {'ext': 'webm', 'height': 720, 'width': 1280, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'170': {'ext': 'webm', 'height': 1080, 'width': 1920, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'218': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'219': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'VP8', 'acodec': 'none', 'preference': -40},
'167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
'168': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
'169': {'ext': 'webm', 'height': 720, 'width': 1280, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
'170': {'ext': 'webm', 'height': 1080, 'width': 1920, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
'218': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
'219': {'ext': 'webm', 'height': 480, 'width': 854, 'format_note': 'DASH video', 'acodec': 'none', 'container': 'webm', 'vcodec': 'VP8', 'preference': -40},
'242': {'ext': 'webm', 'height': 240, 'resolution': '240p', 'format_note': 'DASH webm', 'preference': -40},
'243': {'ext': 'webm', 'height': 360, 'resolution': '360p', 'format_note': 'DASH webm', 'preference': -40},
'244': {'ext': 'webm', 'height': 480, 'resolution': '480p', 'format_note': 'DASH webm', 'preference': -40},

@@ -6,6 +6,7 @@ import ctypes
import datetime
import email.utils
import errno
import getpass
import gzip
import itertools
import io
@@ -778,6 +779,7 @@ def unified_strdate(date_str):
'%Y/%m/%d %H:%M:%S',
'%Y-%m-%d %H:%M:%S',
'%d.%m.%Y %H:%M',
'%d.%m.%Y %H.%M',
'%Y-%m-%dT%H:%M:%SZ',
'%Y-%m-%dT%H:%M:%S.%fZ',
'%Y-%m-%dT%H:%M:%S.%f0Z',
@@ -1278,3 +1280,12 @@ def parse_xml(s):
parser = xml.etree.ElementTree.XMLParser(target=TreeBuilder())
kwargs = {'parser': parser} if sys.version_info >= (2, 7) else {}
return xml.etree.ElementTree.XML(s.encode('utf-8'), **kwargs)


if sys.version_info < (3, 0) and sys.platform == 'win32':
    def compat_getpass(prompt, *args, **kwargs):
        # Python 2 on Windows: pass getpass a byte-string prompt
        # (see "Fix getpass on Windows", #2547, in the commit list)
        if isinstance(prompt, compat_str):
            prompt = prompt.encode(preferredencoding())
        return getpass.getpass(prompt, *args, **kwargs)
else:
    compat_getpass = getpass.getpass
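
The `'%d.%m.%Y %H.%M'` entry added to the format list of `unified_strdate` earlier in this file accepts times written with a dot between hours and minutes. A minimal sketch of the try-each-format idea, assuming only the standard library; `to_yyyymmdd` and the sample strings are hypothetical, and this is not the real `unified_strdate`.

```python
import datetime

# Two of the formats from the list in the hunk: colon and dot as time separator.
FORMATS = ('%d.%m.%Y %H:%M', '%d.%m.%Y %H.%M')


def to_yyyymmdd(date_str, formats=FORMATS):
    """Try each format in turn and return YYYYMMDD, or None if none applies."""
    for fmt in formats:
        try:
            return datetime.datetime.strptime(date_str, fmt).strftime('%Y%m%d')
        except ValueError:
            continue  # this format does not match; try the next one
    return None


dotted = to_yyyymmdd('10.03.2014 20.15')  # hypothetical air time with a dot separator
colon = to_yyyymmdd('10.03.2014 20:15')
```

Without the second format, the dotted variant would fall through every attempt and yield no upload date.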

@@ -1,2 +1,2 @@
__version__ = '2014.03.11'
__version__ = '2014.03.20'