Compare commits

..

67 Commits

Author SHA1 Message Date
43d7895ea0 release 2013.10.29 2013-10-29 06:48:39 +01:00
f7ff55aa78 Merge remote-tracking branch 'origin/master' 2013-10-29 06:48:18 +01:00
795f28f871 [youtube] Fix login (Fixes #1681) 2013-10-29 06:45:54 +01:00
f6cc16f5d8 [tests] a HTTP 503 is a transient issue 2013-10-28 19:07:16 -04:00
321a01f971 [mtv] Remove the templates from the mediagen url 2013-10-28 23:37:01 +01:00
646e17a53d Fix YouTubeDL test 2013-10-28 23:18:13 +01:00
dd508b7c4f [tests] don't fail on network errors
This is suboptimal, but at least this way we will need to look at the logs
only to check for network errors that happen too often, instead of
parsing a ton of lines each time to see if there is some true test failing
2013-10-28 18:03:26 -04:00
2563bcc85c Add an extractor for MySpace (closes #1666) 2013-10-28 22:02:17 +01:00
702665c085 tests: build the filename from the info_dict if the 'file' key is missing
It will need to have the 'id' and 'ext' keys to work.
2013-10-28 22:01:37 +01:00
369a759acc setup.py: Make sure the setuptools_available variable is set
Otherwise it would crash if it can't import setuptools.
2013-10-28 16:54:48 +01:00
79b3f61228 Merge pull request #1675 from rzhxeo/fix
Check if description and thumbnail are None to prevent crash
2013-10-28 08:35:40 -07:00
216d71d001 Check if description and thumbnail are None to prevent crash 2013-10-28 16:28:35 +01:00
78a3a9f89e Make "requested format not available" expected (#1655) 2013-10-28 11:41:59 +01:00
a7685f3bf4 mixcloud does not do any format selection 2013-10-28 11:41:32 +01:00
f088ea5486 release 2013.10.28 2013-10-28 11:34:21 +01:00
1003d108d5 [vimeo] Support hash in URL (Fixes #1669) 2013-10-28 11:32:22 +01:00
8abeeb9449 Nicer --list-formats output 2013-10-28 11:31:12 +01:00
c1002e96e9 Let extractors omit ext in formats 2013-10-28 11:28:02 +01:00
77d0a82fef [addanime] Use new formats system 2013-10-28 11:24:47 +01:00
ebc14f251c Merge remote-tracking branch 'origin/master' 2013-10-28 10:44:13 +01:00
d41e6efc85 New debug option --write-pages 2013-10-28 10:44:02 +01:00
8ffa13e03e [Instagram] get the non-https link, as they are serving Akamai cert from a instagram.com domain 2013-10-28 02:34:29 -04:00
db477d3a37 Merge pull request #1620 from jaimeMF/console_script
Use the console_scripts entry point if setuptools is available
2013-10-27 23:08:59 -07:00
750e9833b8 Add the missing age_limit tags; added a devscript to do a superficial check for porn sites without the age_limit tag in the test 2013-10-28 01:50:17 -04:00
82f0ac657c Merge pull request #1657 by @rzhxeo
[YouPornIE] Extract all encrypted links and remove doubles at the end
2013-10-28 01:45:52 -04:00
eb6a2277a2 Merge pull request #1659 by @rzhxeo
Add support for http://www.tube8.com
2013-10-28 01:38:28 -04:00
f8778fb0fa Merge pull request #1663 by @rzhxeo
Add support for http://www.spankwire.com
2013-10-28 01:35:11 -04:00
e2f9de207c Merge pull request #1664 by @rzhxeo
Add support for http://www.keezmovies.com
2013-10-28 01:25:46 -04:00
a93cc0d943 Merge pull request #1661 by @rzhxeo
Add support for http://www.pornhub.com
2013-10-28 00:50:39 -04:00
7d8c2e07f2 [Exfm] replace the failing Soundcloud test vector (broken also in browser) 2013-10-28 00:33:43 -04:00
efb4c36b18 Merge pull request #1660 from pyed/master
[addanime] try to download HQ before normal
2013-10-27 21:14:19 -07:00
29526d0d2b Merge pull request #1656 from rzhxeo/xhamster
[XHamsterIE] Extract SD and HD video
2013-10-27 10:12:59 -07:00
198e370f23 [addanime] better regex. 2013-10-27 19:48:02 +03:00
c19f7764a5 [generic] Detect bandcamp pages that use custom domains (closes #1662)
They embed the original url in the 'og:url' property.
2013-10-27 14:40:25 +01:00
bc63d9d329 [rtlnow] Change the test for rtlnitronow 2013-10-27 14:26:19 +01:00
aa929c37d5 [generic] Fix test video's checksum 2013-10-27 14:21:37 +01:00
af4d506eb3 [faz] Use a regex for getting the description
The page cannot be parsed in python2.6 with the html parser.
2013-10-27 14:18:55 +01:00
5da0549581 [KeezMoviesIE] Correct return value for embedded videos 2013-10-27 12:48:09 +01:00
749a4fd2fd [facebook] Don't recommend to report the issue if the video is private. 2013-10-27 12:13:55 +01:00
6f71ef580c [facebook] Report a more meaningful message if the video cannot be accessed (closes #1658) 2013-10-27 12:09:46 +01:00
67874aeffa [facebook] Fix the login process (fixes #1244) 2013-10-27 12:07:58 +01:00
3e6a330d38 [addanime] fix md5sum 2013-10-27 13:51:26 +03:00
aee5e18c8f [addanime] catch 'RegexNotFoundError' 2013-10-27 13:36:43 +03:00
5b11143d05 Add support for http://www.keezmovies.com 2013-10-27 10:10:28 +01:00
7b2212e954 Add support for http://www.spankwire.com 2013-10-27 01:59:26 +02:00
71865091ab [Tube8IE] Fix regex for uploader extraction 2013-10-27 01:08:03 +02:00
125cfd78e8 Add support for http://www.pornhub.com 2013-10-27 01:04:22 +02:00
8cb57d9b91 [Tube8IE] Escape dot in regex 2013-10-27 00:21:27 +02:00
14e10b2b6e [addanime] try to download HQ before normal 2013-10-27 01:19:38 +03:00
6e76104d66 [YouPornIE] Make webpage download more robust 2013-10-26 23:33:32 +02:00
1d45a23b74 Add support for http://www.tube8.com 2013-10-26 23:27:30 +02:00
7df286540f [YouPornIE] Extract all encrypted links and remove doubles at the end 2013-10-26 21:57:10 +02:00
5d0c97541a [XHamsterIE] Extract SD and HD video 2013-10-26 20:38:54 +02:00
49a25557b0 [8tracks] Use track count instead of looking at at_last_track property
This fixes the error:

$ youtube-dl http://8tracks.com/vladmc/counting-stars
[8tracks] counting-stars: Downloading webpage
[8tracks] counting-stars: Downloading song information 1/4
[8tracks] counting-stars: Downloading song information 2/4
[8tracks] counting-stars: Downloading song information 3/4
[8tracks] counting-stars: Downloading song information 4/4
[8tracks] counting-stars: Downloading song information 5/4
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/phihag/projects/youtube-dl/youtube_dl/__main__.py", line 18, in <module>
    youtube_dl.main()
  File "/home/phihag/projects/youtube-dl/youtube_dl/__init__.py", line 761, in main
    _real_main(argv)
  File "/home/phihag/projects/youtube-dl/youtube_dl/__init__.py", line 714, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/phihag/projects/youtube-dl/youtube_dl/YoutubeDL.py", line 701, in download
    videos = self.extract_info(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/YoutubeDL.py", line 342, in extract_info
    ie_result = ie.extract(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/extractor/common.py", line 121, in extract
    return self._real_extract(url)
  File "/home/phihag/projects/youtube-dl/youtube_dl/extractor/eighttracks.py", line 111, in _real_extract
    'id': track_data['id'],
KeyError: 'id'
2013-10-25 23:46:19 +02:00
b5936c0059 Document the %(format_id)s field for the output template 2013-10-25 17:18:06 +02:00
600cc1a4f0 [youtube] Set the format_id field to the itag of the format (closes #1624) 2013-10-25 17:17:46 +02:00
ea32fbacc8 Fix the extensions of two tests with youtube videos
The best quality is now a mp4 video.
2013-10-25 16:55:37 +02:00
00fe14fc75 [youtube] Also use the 'adaptative_fmts' field from the /get_video_info page (fixes #1649)
The 'adaptative_fmts' field from the video page is not added to the 'url_encoded_fmt_stream_map'
2013-10-25 16:52:58 +02:00
fcc28edb2f [cinemassacre] Simplify
* Remove some rtmp parameters that are not needed.
* Remove the md5 checksums, the video is not downloaded.
* Remove the code used before the current format system.
2013-10-23 20:21:41 +02:00
fac6be2dd5 Merge pull request #1632 from rzhxeo/cinemassacre
[Cinemassacre] Download video that is shown in flash player
2013-10-23 20:15:39 +02:00
1cf64ee468 release 2013.10.23.2 2013-10-23 18:38:09 +02:00
cdec0190c4 [dailymotion] Extract all the available formats (closes #1028) 2013-10-23 17:33:38 +02:00
2450bcb28b [nowvideo] Fix key extraction
Extract it from the embed page
2013-10-23 17:00:33 +02:00
3126050c0f Hide the video password on verbose mode 2013-10-23 16:32:17 +02:00
93b22c7828 [vimeo] fix the extraction for videos protected with password
Added a test video.
2013-10-23 16:31:53 +02:00
b0505eb611 [CinemassacreIE] Fix information extraction 2013-10-19 16:46:17 +02:00
f44415360e Use the console_scripts entry point if setuptools is available 2013-10-18 13:49:25 +02:00
35 changed files with 740 additions and 253 deletions

View File

@ -79,16 +79,17 @@ which means you can modify it, redistribute it or use it however you like.
different, %(autonumber)s to get an automatically different, %(autonumber)s to get an automatically
incremented number, %(ext)s for the filename incremented number, %(ext)s for the filename
extension, %(format)s for the format description extension, %(format)s for the format description
(like "22 - 1280x720" or "HD")%(upload_date)s for (like "22 - 1280x720" or "HD"),%(format_id)s for
the upload date (YYYYMMDD), %(extractor)s for the the unique id of the format (like Youtube's
provider (youtube, metacafe, etc), %(id)s for the itags: "137"),%(upload_date)s for the upload date
video id , %(playlist)s for the playlist the (YYYYMMDD), %(extractor)s for the provider
video is in, %(playlist_index)s for the position (youtube, metacafe, etc), %(id)s for the video id
in the playlist and %% for a literal percent. Use , %(playlist)s for the playlist the video is in,
- to output to stdout. Can also be used to %(playlist_index)s for the position in the
download to a different directory, for example playlist and %% for a literal percent. Use - to
with -o '/my/downloads/%(uploader)s/%(title)s-%(i output to stdout. Can also be used to download to
d)s.%(ext)s' . a different directory, for example with -o '/my/d
ownloads/%(uploader)s/%(title)s-%(id)s.%(ext)s' .
--autonumber-size NUMBER Specifies the number of digits in %(autonumber)s --autonumber-size NUMBER Specifies the number of digits in %(autonumber)s
when it is present in output filename template or when it is present in output filename template or
--autonumber option is given --autonumber option is given
@ -126,6 +127,8 @@ which means you can modify it, redistribute it or use it however you like.
-v, --verbose print various debugging information -v, --verbose print various debugging information
--dump-intermediate-pages print downloaded pages to debug problems(very --dump-intermediate-pages print downloaded pages to debug problems(very
verbose) verbose)
--write-pages Write downloaded pages to files in the current
directory
## Video Format Options: ## Video Format Options:
-f, --format FORMAT video format code, specifiy the order of -f, --format FORMAT video format code, specifiy the order of

39
devscripts/check-porn.py Normal file
View File

@ -0,0 +1,39 @@
#!/usr/bin/env python
"""
This script employs a VERY basic heuristic ('porn' in webpage.lower()) to check
if we are not 'age_limit' tagging some porn site
"""
# Allow direct execution
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from youtube_dl.utils import compat_urllib_request
for test in get_testcases():
try:
webpage = compat_urllib_request.urlopen(test['url'], timeout=10).read()
except:
print('\nFail: {0}'.format(test['name']))
continue
webpage = webpage.decode('utf8', 'replace')
if 'porn' in webpage.lower() and ('info_dict' not in test
or 'age_limit' not in test['info_dict']
or test['info_dict']['age_limit'] != 18):
print('\nPotential missing age_limit check: {0}'.format(test['name']))
elif 'porn' not in webpage.lower() and ('info_dict' in test and
'age_limit' in test['info_dict'] and
test['info_dict']['age_limit'] == 18):
print('\nPotential false negative: {0}'.format(test['name']))
else:
sys.stdout.write('.')
sys.stdout.flush()
print()

View File

@ -8,8 +8,10 @@ import sys
try: try:
from setuptools import setup from setuptools import setup
setuptools_available = True
except ImportError: except ImportError:
from distutils.core import setup from distutils.core import setup
setuptools_available = False
try: try:
# This will create an exe that needs Microsoft Visual C++ 2008 # This will create an exe that needs Microsoft Visual C++ 2008
@ -43,13 +45,16 @@ if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
params = py2exe_params params = py2exe_params
else: else:
params = { params = {
'scripts': ['bin/youtube-dl'],
'data_files': [ # Installing system-wide would require sudo... 'data_files': [ # Installing system-wide would require sudo...
('etc/bash_completion.d', ['youtube-dl.bash-completion']), ('etc/bash_completion.d', ['youtube-dl.bash-completion']),
('share/doc/youtube_dl', ['README.txt']), ('share/doc/youtube_dl', ['README.txt']),
('share/man/man1/', ['youtube-dl.1']) ('share/man/man1/', ['youtube-dl.1'])
] ]
} }
if setuptools_available:
params['entry_points'] = {'console_scripts': ['youtube-dl = youtube_dl:main']}
else:
params['scripts'] = ['bin/youtube-dl']
# Get the version from youtube_dl/version.py without importing the package # Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(), exec(compile(open('youtube_dl/version.py').read(),

View File

@ -5,9 +5,11 @@ import json
import os.path import os.path
import re import re
import types import types
import sys
import youtube_dl.extractor import youtube_dl.extractor
from youtube_dl import YoutubeDL from youtube_dl import YoutubeDL
from youtube_dl.utils import preferredencoding
def global_setup(): def global_setup():
@ -33,6 +35,21 @@ def try_rm(filename):
raise raise
def report_warning(message):
'''
Print the message to stderr, it will be prefixed with 'WARNING:'
If stderr is a tty file the 'WARNING:' will be colored
'''
if sys.stderr.isatty() and os.name != 'nt':
_msg_header = u'\033[0;33mWARNING:\033[0m'
else:
_msg_header = u'WARNING:'
output = u'%s %s\n' % (_msg_header, message)
if 'b' in getattr(sys.stderr, 'mode', '') or sys.version_info[0] < 3:
output = output.encode(preferredencoding())
sys.stderr.write(output)
class FakeYDL(YoutubeDL): class FakeYDL(YoutubeDL):
def __init__(self, override=None): def __init__(self, override=None):
# Different instances of the downloader can't share the same dictionary # Different instances of the downloader can't share the same dictionary

View File

@ -62,10 +62,10 @@ class TestFormatSelection(unittest.TestCase):
def test_format_limit(self): def test_format_limit(self):
formats = [ formats = [
{u'format_id': u'meh'}, {u'format_id': u'meh', u'url': u'http://example.com/meh'},
{u'format_id': u'good'}, {u'format_id': u'good', u'url': u'http://example.com/good'},
{u'format_id': u'great'}, {u'format_id': u'great', u'url': u'http://example.com/great'},
{u'format_id': u'excellent'}, {u'format_id': u'excellent', u'url': u'http://example.com/exc'},
] ]
info_dict = { info_dict = {
u'formats': formats, u'extractor': u'test', 'id': 'testvid'} u'formats': formats, u'extractor': u'test', 'id': 'testvid'}

View File

@ -6,7 +6,14 @@ import sys
import unittest import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, get_testcases, global_setup, try_rm, md5 from test.helper import (
get_params,
get_testcases,
global_setup,
try_rm,
md5,
report_warning
)
global_setup() global_setup()
@ -19,6 +26,7 @@ import youtube_dl.YoutubeDL
from youtube_dl.utils import ( from youtube_dl.utils import (
compat_str, compat_str,
compat_urllib_error, compat_urllib_error,
compat_HTTPError,
DownloadError, DownloadError,
ExtractorError, ExtractorError,
UnavailableVideoError, UnavailableVideoError,
@ -60,8 +68,11 @@ def generator(test_case):
if not ie._WORKING: if not ie._WORKING:
print_skipping('IE marked as not _WORKING') print_skipping('IE marked as not _WORKING')
return return
if 'playlist' not in test_case and not test_case['file']: if 'playlist' not in test_case:
print_skipping('No output file specified') info_dict = test_case.get('info_dict', {})
if not test_case.get('file') and not (info_dict.get('id') and info_dict.get('ext')):
print_skipping('The output file cannot be know, the "file" '
'key is missing or the info_dict is incomplete')
return return
if 'skip' in test_case: if 'skip' in test_case:
print_skipping(test_case['skip']) print_skipping(test_case['skip'])
@ -77,35 +88,47 @@ def generator(test_case):
finished_hook_called.add(status['filename']) finished_hook_called.add(status['filename'])
ydl.fd.add_progress_hook(_hook) ydl.fd.add_progress_hook(_hook)
def get_tc_filename(tc):
return tc.get('file') or ydl.prepare_filename(tc.get('info_dict', {}))
test_cases = test_case.get('playlist', [test_case]) test_cases = test_case.get('playlist', [test_case])
def try_rm_tcs_files():
for tc in test_cases: for tc in test_cases:
try_rm(tc['file']) tc_filename = get_tc_filename(tc)
try_rm(tc['file'] + '.part') try_rm(tc_filename)
try_rm(tc['file'] + '.info.json') try_rm(tc_filename + '.part')
try_rm(tc_filename + '.info.json')
try_rm_tcs_files()
try: try:
for retry in range(1, RETRIES + 1): try_num = 1
while True:
try: try:
ydl.download([test_case['url']]) ydl.download([test_case['url']])
except (DownloadError, ExtractorError) as err: except (DownloadError, ExtractorError) as err:
if retry == RETRIES: raise
# Check if the exception is not a network related one # Check if the exception is not a network related one
if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError): if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503):
raise raise
print('Retrying: {0} failed tries\n\n##########\n\n'.format(retry)) if try_num == RETRIES:
report_warning(u'Failed due to network errors, skipping...')
return
print('Retrying: {0} failed tries\n\n##########\n\n'.format(try_num))
try_num += 1
else: else:
break break
for tc in test_cases: for tc in test_cases:
tc_filename = get_tc_filename(tc)
if not test_case.get('params', {}).get('skip_download', False): if not test_case.get('params', {}).get('skip_download', False):
self.assertTrue(os.path.exists(tc['file']), msg='Missing file ' + tc['file']) self.assertTrue(os.path.exists(tc_filename), msg='Missing file ' + tc_filename)
self.assertTrue(tc['file'] in finished_hook_called) self.assertTrue(tc_filename in finished_hook_called)
self.assertTrue(os.path.exists(tc['file'] + '.info.json')) self.assertTrue(os.path.exists(tc_filename + '.info.json'))
if 'md5' in tc: if 'md5' in tc:
md5_for_file = _file_md5(tc['file']) md5_for_file = _file_md5(tc_filename)
self.assertEqual(md5_for_file, tc['md5']) self.assertEqual(md5_for_file, tc['md5'])
with io.open(tc['file'] + '.info.json', encoding='utf-8') as infof: with io.open(tc_filename + '.info.json', encoding='utf-8') as infof:
info_dict = json.load(infof) info_dict = json.load(infof)
for (info_field, expected) in tc.get('info_dict', {}).items(): for (info_field, expected) in tc.get('info_dict', {}).items():
if isinstance(expected, compat_str) and expected.startswith('md5:'): if isinstance(expected, compat_str) and expected.startswith('md5:'):
@ -126,10 +149,7 @@ def generator(test_case):
for key in ('id', 'url', 'title', 'ext'): for key in ('id', 'url', 'title', 'ext'):
self.assertTrue(key in info_dict.keys() and info_dict[key]) self.assertTrue(key in info_dict.keys() and info_dict[key])
finally: finally:
for tc in test_cases: try_rm_tcs_files()
try_rm(tc['file'])
try_rm(tc['file'] + '.part')
try_rm(tc['file'] + '.info.json')
return test_template return test_template

View File

@ -272,7 +272,7 @@ class YoutubeDL(object):
autonumber_size = 5 autonumber_size = 5
autonumber_templ = u'%0' + str(autonumber_size) + u'd' autonumber_templ = u'%0' + str(autonumber_size) + u'd'
template_dict['autonumber'] = autonumber_templ % self._num_downloads template_dict['autonumber'] = autonumber_templ % self._num_downloads
if template_dict['playlist_index'] is not None: if template_dict.get('playlist_index') is not None:
template_dict['playlist_index'] = u'%05d' % template_dict['playlist_index'] template_dict['playlist_index'] = u'%05d' % template_dict['playlist_index']
sanitize = lambda k, v: sanitize_filename( sanitize = lambda k, v: sanitize_filename(
@ -462,7 +462,7 @@ class YoutubeDL(object):
info_dict['playlist_index'] = None info_dict['playlist_index'] = None
# This extractors handle format selection themselves # This extractors handle format selection themselves
if info_dict['extractor'] in [u'youtube', u'Youku', u'YouPorn', u'mixcloud']: if info_dict['extractor'] in [u'youtube', u'Youku']:
if download: if download:
self.process_info(info_dict) self.process_info(info_dict)
return info_dict return info_dict
@ -484,6 +484,9 @@ class YoutubeDL(object):
res=self.format_resolution(format), res=self.format_resolution(format),
note=u' ({})'.format(format['format_note']) if format.get('format_note') is not None else '', note=u' ({})'.format(format['format_note']) if format.get('format_note') is not None else '',
) )
# Automatically determine file extension if missing
if 'ext' not in format:
format['ext'] = determine_ext(format['url'])
if self.params.get('listformats', None): if self.params.get('listformats', None):
self.list_formats(info_dict) self.list_formats(info_dict)
@ -521,7 +524,8 @@ class YoutubeDL(object):
formats_to_download = [selected_format] formats_to_download = [selected_format]
break break
if not formats_to_download: if not formats_to_download:
raise ExtractorError(u'requested format not available') raise ExtractorError(u'requested format not available',
expected=True)
if download: if download:
if len(formats_to_download) > 1: if len(formats_to_download) > 1:
@ -571,9 +575,9 @@ class YoutubeDL(object):
if self.params.get('forceurl', False): if self.params.get('forceurl', False):
# For RTMP URLs, also include the playpath # For RTMP URLs, also include the playpath
compat_print(info_dict['url'] + info_dict.get('play_path', u'')) compat_print(info_dict['url'] + info_dict.get('play_path', u''))
if self.params.get('forcethumbnail', False) and 'thumbnail' in info_dict: if self.params.get('forcethumbnail', False) and info_dict.get('thumbnail') is not None:
compat_print(info_dict['thumbnail']) compat_print(info_dict['thumbnail'])
if self.params.get('forcedescription', False) and 'description' in info_dict: if self.params.get('forcedescription', False) and info_dict.get('description') is not None:
compat_print(info_dict['description']) compat_print(info_dict['description'])
if self.params.get('forcefilename', False) and filename is not None: if self.params.get('forcefilename', False) and filename is not None:
compat_print(filename) compat_print(filename)
@ -754,23 +758,23 @@ class YoutubeDL(object):
archive_file.write(vid_id + u'\n') archive_file.write(vid_id + u'\n')
@staticmethod @staticmethod
def format_resolution(format): def format_resolution(format, default='unknown'):
if format.get('height') is not None: if format.get('height') is not None:
if format.get('width') is not None: if format.get('width') is not None:
res = u'%sx%s' % (format['width'], format['height']) res = u'%sx%s' % (format['width'], format['height'])
else: else:
res = u'%sp' % format['height'] res = u'%sp' % format['height']
else: else:
res = '???' res = default
return res return res
def list_formats(self, info_dict): def list_formats(self, info_dict):
formats_s = [] formats_s = []
for format in info_dict.get('formats', [info_dict]): for format in info_dict.get('formats', [info_dict]):
formats_s.append(u'%-15s: %-5s %-15s[%s]' % ( formats_s.append(u'%-15s%-7s %-15s%s' % (
format['format_id'], format['format_id'],
format['ext'], format['ext'],
format.get('format_note') or '-', format.get('format_note', ''),
self.format_resolution(format), self.format_resolution(format),
) )
) )

View File

@ -133,7 +133,7 @@ def parseOpts(overrideArguments=None):
def _hide_login_info(opts): def _hide_login_info(opts):
opts = list(opts) opts = list(opts)
for private_opt in ['-p', '--password', '-u', '--username']: for private_opt in ['-p', '--password', '-u', '--username', '--video-password']:
try: try:
i = opts.index(private_opt) i = opts.index(private_opt)
opts[i+1] = '<PRIVATE>' opts[i+1] = '<PRIVATE>'
@ -316,6 +316,9 @@ def parseOpts(overrideArguments=None):
verbosity.add_option('--dump-intermediate-pages', verbosity.add_option('--dump-intermediate-pages',
action='store_true', dest='dump_intermediate_pages', default=False, action='store_true', dest='dump_intermediate_pages', default=False,
help='print downloaded pages to debug problems(very verbose)') help='print downloaded pages to debug problems(very verbose)')
verbosity.add_option('--write-pages',
action='store_true', dest='write_pages', default=False,
help='Write downloaded pages to files in the current directory')
verbosity.add_option('--youtube-print-sig-code', verbosity.add_option('--youtube-print-sig-code',
action='store_true', dest='youtube_print_sig_code', default=False, action='store_true', dest='youtube_print_sig_code', default=False,
help=optparse.SUPPRESS_HELP) help=optparse.SUPPRESS_HELP)
@ -336,7 +339,8 @@ def parseOpts(overrideArguments=None):
'%(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, ' '%(uploader)s for the uploader name, %(uploader_id)s for the uploader nickname if different, '
'%(autonumber)s to get an automatically incremented number, ' '%(autonumber)s to get an automatically incremented number, '
'%(ext)s for the filename extension, ' '%(ext)s for the filename extension, '
'%(format)s for the format description (like "22 - 1280x720" or "HD")' '%(format)s for the format description (like "22 - 1280x720" or "HD"),'
'%(format_id)s for the unique id of the format (like Youtube\'s itags: "137"),'
'%(upload_date)s for the upload date (YYYYMMDD), ' '%(upload_date)s for the upload date (YYYYMMDD), '
'%(extractor)s for the provider (youtube, metacafe, etc), ' '%(extractor)s for the provider (youtube, metacafe, etc), '
'%(id)s for the video id , %(playlist)s for the playlist the video is in, ' '%(id)s for the video id , %(playlist)s for the playlist the video is in, '
@ -651,6 +655,7 @@ def _real_main(argv=None):
'prefer_free_formats': opts.prefer_free_formats, 'prefer_free_formats': opts.prefer_free_formats,
'verbose': opts.verbose, 'verbose': opts.verbose,
'dump_intermediate_pages': opts.dump_intermediate_pages, 'dump_intermediate_pages': opts.dump_intermediate_pages,
'write_pages': opts.write_pages,
'test': opts.test, 'test': opts.test,
'keepvideo': opts.keepvideo, 'keepvideo': opts.keepvideo,
'min_filesize': opts.min_filesize, 'min_filesize': opts.min_filesize,

View File

@ -72,6 +72,7 @@ from .jeuxvideo import JeuxVideoIE
from .jukebox import JukeboxIE from .jukebox import JukeboxIE
from .justintv import JustinTVIE from .justintv import JustinTVIE
from .kankan import KankanIE from .kankan import KankanIE
from .keezmovies import KeezMoviesIE
from .kickstarter import KickStarterIE from .kickstarter import KickStarterIE
from .keek import KeekIE from .keek import KeekIE
from .liveleak import LiveLeakIE from .liveleak import LiveLeakIE
@ -82,6 +83,7 @@ from .mit import TechTVMITIE, MITIE
from .mixcloud import MixcloudIE from .mixcloud import MixcloudIE
from .mtv import MTVIE from .mtv import MTVIE
from .muzu import MuzuTVIE from .muzu import MuzuTVIE
from .myspace import MySpaceIE
from .myspass import MySpassIE from .myspass import MySpassIE
from .myvideo import MyVideoIE from .myvideo import MyVideoIE
from .naver import NaverIE from .naver import NaverIE
@ -94,6 +96,7 @@ from .ooyala import OoyalaIE
from .orf import ORFIE from .orf import ORFIE
from .pbs import PBSIE from .pbs import PBSIE
from .photobucket import PhotobucketIE from .photobucket import PhotobucketIE
from .pornhub import PornHubIE
from .pornotube import PornotubeIE from .pornotube import PornotubeIE
from .rbmaradio import RBMARadioIE from .rbmaradio import RBMARadioIE
from .redtube import RedTubeIE from .redtube import RedTubeIE
@ -109,6 +112,7 @@ from .slideshare import SlideshareIE
from .sohu import SohuIE from .sohu import SohuIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE
from .southparkstudios import SouthParkStudiosIE from .southparkstudios import SouthParkStudiosIE
from .spankwire import SpankwireIE
from .spiegel import SpiegelIE from .spiegel import SpiegelIE
from .stanfordoc import StanfordOpenClassroomIE from .stanfordoc import StanfordOpenClassroomIE
from .statigram import StatigramIE from .statigram import StatigramIE
@ -121,6 +125,7 @@ from .tf1 import TF1IE
from .thisav import ThisAVIE from .thisav import ThisAVIE
from .traileraddict import TrailerAddictIE from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE from .trilulilu import TriluliluIE
from .tube8 import Tube8IE
from .tudou import TudouIE from .tudou import TudouIE
from .tumblr import TumblrIE from .tumblr import TumblrIE
from .tutv import TutvIE from .tutv import TutvIE

View File

@ -17,8 +17,8 @@ class AddAnimeIE(InfoExtractor):
IE_NAME = u'AddAnime' IE_NAME = u'AddAnime'
_TEST = { _TEST = {
u'url': u'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9', u'url': u'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
u'file': u'24MR3YO5SAS9.flv', u'file': u'24MR3YO5SAS9.mp4',
u'md5': u'1036a0e0cd307b95bd8a8c3a5c8cfaf1', u'md5': u'72954ea10bc979ab5e2eb288b21425a0',
u'info_dict': { u'info_dict': {
u"description": u"One Piece 606", u"description": u"One Piece 606",
u"title": u"One Piece 606" u"title": u"One Piece 606"
@ -31,7 +31,8 @@ class AddAnimeIE(InfoExtractor):
video_id = mobj.group('video_id') video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
except ExtractorError as ee: except ExtractorError as ee:
if not isinstance(ee.cause, compat_HTTPError): if not isinstance(ee.cause, compat_HTTPError) or \
ee.cause.code != 503:
raise raise
redir_webpage = ee.cause.read().decode('utf-8') redir_webpage = ee.cause.read().decode('utf-8')
@ -60,16 +61,26 @@ class AddAnimeIE(InfoExtractor):
note=u'Confirming after redirect') note=u'Confirming after redirect')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r"var normal_video_file = '(.*?)';", formats = []
webpage, u'video file URL') for format_id in ('normal', 'hq'):
rex = r"var %s_video_file = '(.*?)';" % re.escape(format_id)
video_url = self._search_regex(rex, webpage, u'video file URLx',
fatal=False)
if not video_url:
continue
formats.append({
'format_id': format_id,
'url': video_url,
})
if not formats:
raise ExtractorError(u'Cannot find any video format!')
video_title = self._og_search_title(webpage) video_title = self._og_search_title(webpage)
video_description = self._og_search_description(webpage) video_description = self._og_search_description(webpage)
return { return {
'_type': 'video', '_type': 'video',
'id': video_id, 'id': video_id,
'url': video_url, 'formats': formats,
'ext': 'flv',
'title': video_title, 'title': video_title,
'description': video_description 'description': video_description
} }

View File

@ -55,30 +55,30 @@ class CinemassacreIE(InfoExtractor):
video_description = None video_description = None
playerdata = self._download_webpage(playerdata_url, video_id) playerdata = self._download_webpage(playerdata_url, video_id)
base_url = self._html_search_regex(r'\'streamer\': \'(?P<base_url>rtmp://.*?)/(?:vod|Cinemassacre)\'', url = self._html_search_regex(r'\'streamer\': \'(?P<url>[^\']+)\'', playerdata, u'url')
playerdata, u'base_url')
base_url += '/Cinemassacre/' sd_file = self._html_search_regex(r'\'file\': \'(?P<sd_file>[^\']+)\'', playerdata, u'sd_file')
# Important: The file names in playerdata are not used by the player and even wrong for some videos hd_file = self._html_search_regex(r'\'?file\'?: "(?P<hd_file>[^"]+)"', playerdata, u'hd_file')
sd_file = 'Cinemassacre-%s_high.mp4' % video_id video_thumbnail = self._html_search_regex(r'\'image\': \'(?P<thumbnail>[^\']+)\'', playerdata, u'thumbnail', fatal=False)
hd_file = 'Cinemassacre-%s.mp4' % video_id
video_thumbnail = 'http://image.screenwavemedia.com/Cinemassacre/Cinemassacre-%s_thumb_640x360.jpg' % video_id
formats = [ formats = [
{ {
'url': base_url + sd_file, 'url': url,
'play_path': 'mp4:' + sd_file,
'ext': 'flv', 'ext': 'flv',
'format': 'sd', 'format': 'sd',
'format_id': 'sd', 'format_id': 'sd',
}, },
{ {
'url': base_url + hd_file, 'url': url,
'play_path': 'mp4:' + hd_file,
'ext': 'flv', 'ext': 'flv',
'format': 'hd', 'format': 'hd',
'format_id': 'hd', 'format_id': 'hd',
}, },
] ]
info = { return {
'id': video_id, 'id': video_id,
'title': video_title, 'title': video_title,
'formats': formats, 'formats': formats,
@ -86,6 +86,3 @@ class CinemassacreIE(InfoExtractor):
'upload_date': video_date, 'upload_date': video_date,
'thumbnail': video_thumbnail, 'thumbnail': video_thumbnail,
} }
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@ -15,6 +15,7 @@ from ..utils import (
compiled_regex_type, compiled_regex_type,
ExtractorError, ExtractorError,
RegexNotFoundError, RegexNotFoundError,
sanitize_filename,
unescapeHTML, unescapeHTML,
) )
@ -182,6 +183,17 @@ class InfoExtractor(object):
self.to_screen(u'Dumping request to ' + url) self.to_screen(u'Dumping request to ' + url)
dump = base64.b64encode(webpage_bytes).decode('ascii') dump = base64.b64encode(webpage_bytes).decode('ascii')
self._downloader.to_screen(dump) self._downloader.to_screen(dump)
if self._downloader.params.get('write_pages', False):
try:
url = url_or_request.get_full_url()
except AttributeError:
url = url_or_request
raw_filename = ('%s_%s.dump' % (video_id, url))
filename = sanitize_filename(raw_filename, restricted=True)
self.to_screen(u'Saving request to ' + filename)
with open(filename, 'wb') as outf:
outf.write(webpage_bytes)
content = webpage_bytes.decode(encoding, 'replace') content = webpage_bytes.decode(encoding, 'replace')
return (content, urlh) return (content, urlh)
@ -318,10 +330,10 @@ class InfoExtractor(object):
def _og_search_title(self, html, **kargs): def _og_search_title(self, html, **kargs):
return self._og_search_property('title', html, **kargs) return self._og_search_property('title', html, **kargs)
def _og_search_video_url(self, html, name='video url', **kargs): def _og_search_video_url(self, html, name='video url', secure=True, **kargs):
return self._html_search_regex([self._og_regex('video:secure_url'), regexes = [self._og_regex('video')]
self._og_regex('video')], if secure: regexes.insert(0, self._og_regex('video:secure_url'))
html, name, **kargs) return self._html_search_regex(regexes, html, name, **kargs)
def _rta_search(self, html): def _rta_search(self, html):
# See http://www.rtalabel.org/index.php?content=howtofaq#single # See http://www.rtalabel.org/index.php?content=howtofaq#single

View File

@ -28,6 +28,15 @@ class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)' _VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)'
IE_NAME = u'dailymotion' IE_NAME = u'dailymotion'
_FORMATS = [
(u'stream_h264_ld_url', u'ld'),
(u'stream_h264_url', u'standard'),
(u'stream_h264_hq_url', u'hq'),
(u'stream_h264_hd_url', u'hd'),
(u'stream_h264_hd1080_url', u'hd180'),
]
_TESTS = [ _TESTS = [
{ {
u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech', u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech',
@ -60,7 +69,6 @@ class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
video_id = mobj.group(1).split('_')[0].split('?')[0] video_id = mobj.group(1).split('_')[0].split('?')[0]
video_extension = 'mp4'
url = 'http://www.dailymotion.com/video/%s' % video_id url = 'http://www.dailymotion.com/video/%s' % video_id
# Retrieve video webpage to extract further information # Retrieve video webpage to extract further information
@ -99,18 +107,24 @@ class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title'] msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title']
raise ExtractorError(msg, expected=True) raise ExtractorError(msg, expected=True)
# TODO: support choosing qualities formats = []
for (key, format_id) in self._FORMATS:
for key in ['stream_h264_hd1080_url','stream_h264_hd_url', video_url = info.get(key)
'stream_h264_hq_url','stream_h264_url', if video_url is not None:
'stream_h264_ld_url']: m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
if info.get(key):#key in info and info[key]: if m_size is not None:
max_quality = key width, height = m_size.group(1), m_size.group(2)
self.to_screen(u'Using %s' % key)
break
else: else:
width, height = None, None
formats.append({
'url': video_url,
'ext': 'mp4',
'format_id': format_id,
'width': width,
'height': height,
})
if not formats:
raise ExtractorError(u'Unable to extract video URL') raise ExtractorError(u'Unable to extract video URL')
video_url = info[max_quality]
# subtitles # subtitles
video_subtitles = self.extract_subtitles(video_id) video_subtitles = self.extract_subtitles(video_id)
@ -120,11 +134,10 @@ class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
return [{ return [{
'id': video_id, 'id': video_id,
'url': video_url, 'formats': formats,
'uploader': video_uploader, 'uploader': video_uploader,
'upload_date': video_upload_date, 'upload_date': video_upload_date,
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
'ext': video_extension,
'subtitles': video_subtitles, 'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url'] 'thumbnail': info['thumbnail_url']
}] }]

View File

@ -101,7 +101,7 @@ class EightTracksIE(InfoExtractor):
first_url = 'http://8tracks.com/sets/%s/play?player=sm&mix_id=%s&format=jsonh' % (session, mix_id) first_url = 'http://8tracks.com/sets/%s/play?player=sm&mix_id=%s&format=jsonh' % (session, mix_id)
next_url = first_url next_url = first_url
res = [] res = []
for i in itertools.count(): for i in range(track_count):
api_json = self._download_webpage(next_url, playlist_id, api_json = self._download_webpage(next_url, playlist_id,
note=u'Downloading song information %s/%s' % (str(i+1), track_count), note=u'Downloading song information %s/%s' % (str(i+1), track_count),
errnote=u'Failed to download song information') errnote=u'Failed to download song information')
@ -116,7 +116,5 @@ class EightTracksIE(InfoExtractor):
'ext': 'm4a', 'ext': 'm4a',
} }
res.append(info) res.append(info)
if api_data['set']['at_last_track']:
break
next_url = 'http://8tracks.com/sets/%s/next?player=sm&mix_id=%s&format=jsonh&track_id=%s' % (session, mix_id, track_data['id']) next_url = 'http://8tracks.com/sets/%s/next?player=sm&mix_id=%s&format=jsonh&track_id=%s' % (session, mix_id, track_data['id'])
return res return res

View File

@ -11,14 +11,14 @@ class ExfmIE(InfoExtractor):
_SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud.com/tracks/([^/]+)/stream' _SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud.com/tracks/([^/]+)/stream'
_TESTS = [ _TESTS = [
{ {
u'url': u'http://ex.fm/song/1bgtzg', u'url': u'http://ex.fm/song/eh359',
u'file': u'95223130.mp3', u'file': u'44216187.mp3',
u'md5': u'8a7967a3fef10e59a1d6f86240fd41cf', u'md5': u'e45513df5631e6d760970b14cc0c11e7',
u'info_dict': { u'info_dict': {
u"title": u"We Can't Stop - Miley Cyrus", u"title": u"Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive",
u"uploader": u"Miley Cyrus", u"uploader": u"deadjournalist",
u'upload_date': u'20130603', u'upload_date': u'20120424',
u'description': u'Download "We Can\'t Stop" \r\niTunes: http://smarturl.it/WeCantStop?IQid=SC\r\nAmazon: http://smarturl.it/WeCantStopAMZ?IQid=SC', u'description': u'Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive',
}, },
u'note': u'Soundcloud song', u'note': u'Soundcloud song',
}, },

View File

@ -19,7 +19,8 @@ class FacebookIE(InfoExtractor):
"""Information Extractor for Facebook""" """Information Extractor for Facebook"""
_VALID_URL = r'^(?:https?://)?(?:\w+\.)?facebook\.com/(?:video/video|photo)\.php\?(?:.*?)v=(?P<ID>\d+)(?:.*)' _VALID_URL = r'^(?:https?://)?(?:\w+\.)?facebook\.com/(?:video/video|photo)\.php\?(?:.*?)v=(?P<ID>\d+)(?:.*)'
_LOGIN_URL = 'https://login.facebook.com/login.php?m&next=http%3A%2F%2Fm.facebook.com%2Fhome.php&' _LOGIN_URL = 'https://www.facebook.com/login.php?next=http%3A%2F%2Ffacebook.com%2Fhome.php&login_attempt=1'
_CHECKPOINT_URL = 'https://www.facebook.com/checkpoint/?next=http%3A%2F%2Ffacebook.com%2Fhome.php&_fb_noscript=1'
_NETRC_MACHINE = 'facebook' _NETRC_MACHINE = 'facebook'
IE_NAME = u'facebook' IE_NAME = u'facebook'
_TEST = { _TEST = {
@ -36,50 +37,56 @@ class FacebookIE(InfoExtractor):
"""Report attempt to log in.""" """Report attempt to log in."""
self.to_screen(u'Logging in') self.to_screen(u'Logging in')
def _real_initialize(self): def _login(self):
if self._downloader is None: (useremail, password) = self._get_login_info()
return
useremail = None
password = None
downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data
if downloader_params.get('username', None) is not None:
useremail = downloader_params['username']
password = downloader_params['password']
elif downloader_params.get('usenetrc', False):
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
useremail = info[0]
password = info[2]
else:
raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
except (IOError, netrc.NetrcParseError) as err:
self._downloader.report_warning(u'parsing .netrc: %s' % compat_str(err))
return
if useremail is None: if useremail is None:
return return
# Log in login_page_req = compat_urllib_request.Request(self._LOGIN_URL)
login_page_req.add_header('Cookie', 'locale=en_US')
self.report_login()
login_page = self._download_webpage(login_page_req, None, note=False,
errnote=u'Unable to download login page')
lsd = self._search_regex(r'"lsd":"(\w*?)"', login_page, u'lsd')
lgnrnd = self._search_regex(r'name="lgnrnd" value="([^"]*?)"', login_page, u'lgnrnd')
login_form = { login_form = {
'email': useremail, 'email': useremail,
'pass': password, 'pass': password,
'login': 'Log+In' 'lsd': lsd,
'lgnrnd': lgnrnd,
'next': 'http://facebook.com/home.php',
'default_persistent': '0',
'legacy_return': '1',
'timezone': '-60',
'trynum': '1',
} }
request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form)) request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
try: try:
self.report_login()
login_results = compat_urllib_request.urlopen(request).read() login_results = compat_urllib_request.urlopen(request).read()
if re.search(r'<form(.*)name="login"(.*)</form>', login_results) is not None: if re.search(r'<form(.*)name="login"(.*)</form>', login_results) is not None:
self._downloader.report_warning(u'unable to log in: bad username/password, or exceded login rate limit (~3/min). Check credentials or wait.') self._downloader.report_warning(u'unable to log in: bad username/password, or exceded login rate limit (~3/min). Check credentials or wait.')
return return
check_form = {
'fb_dtsg': self._search_regex(r'"fb_dtsg":"(.*?)"', login_results, u'fb_dtsg'),
'nh': self._search_regex(r'name="nh" value="(\w*?)"', login_results, u'nh'),
'name_action_selected': 'dont_save',
'submit[Continue]': self._search_regex(r'<input value="(.*?)" name="submit\[Continue\]"', login_results, u'continue'),
}
check_req = compat_urllib_request.Request(self._CHECKPOINT_URL, compat_urllib_parse.urlencode(check_form))
check_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
check_response = compat_urllib_request.urlopen(check_req).read()
if re.search(r'id="checkpointSubmitButton"', check_response) is not None:
self._downloader.report_warning(u'Unable to confirm login, you have to login in your brower and authorize the login.')
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err: except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self._downloader.report_warning(u'unable to log in: %s' % compat_str(err)) self._downloader.report_warning(u'unable to log in: %s' % compat_str(err))
return return
def _real_initialize(self):
self._login()
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
if mobj is None: if mobj is None:
@ -93,6 +100,12 @@ class FacebookIE(InfoExtractor):
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});' AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
m = re.search(re.escape(BEFORE) + '(.*?)' + re.escape(AFTER), webpage) m = re.search(re.escape(BEFORE) + '(.*?)' + re.escape(AFTER), webpage)
if not m: if not m:
m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage)
if m_msg is not None:
raise ExtractorError(
u'The video is not available, Facebook said: "%s"' % m_msg.group(1),
expected=True)
else:
raise ExtractorError(u'Cannot parse data') raise ExtractorError(u'Cannot parse data')
data = dict(json.loads(m.group(1))) data = dict(json.loads(m.group(1)))
params_raw = compat_urllib_parse.unquote(data['params']) params_raw = compat_urllib_parse.unquote(data['params'])

View File

@ -5,8 +5,6 @@ import xml.etree.ElementTree
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
clean_html,
get_element_by_attribute,
) )
@ -47,12 +45,12 @@ class FazIE(InfoExtractor):
'format_id': code.lower(), 'format_id': code.lower(),
}) })
descr_html = get_element_by_attribute('class', 'Content Copy', webpage) descr = self._html_search_regex(r'<p class="Content Copy">(.*?)</p>', webpage, u'description')
info = { info = {
'id': video_id, 'id': video_id,
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
'formats': formats, 'formats': formats,
'description': clean_html(descr_html), 'description': descr,
'thumbnail': config.find('STILL/STILL_BIG').text, 'thumbnail': config.find('STILL/STILL_BIG').text,
} }
# TODO: Remove when #980 has been merged # TODO: Remove when #980 has been merged

View File

@ -25,7 +25,7 @@ class GenericIE(InfoExtractor):
{ {
u'url': u'http://www.hodiho.fr/2013/02/regis-plante-sa-jeep.html', u'url': u'http://www.hodiho.fr/2013/02/regis-plante-sa-jeep.html',
u'file': u'13601338388002.mp4', u'file': u'13601338388002.mp4',
u'md5': u'85b90ccc9d73b4acd9138d3af4c27f89', u'md5': u'6e15c93721d7ec9e9ca3fdbf07982cfd',
u'info_dict': { u'info_dict': {
u"uploader": u"www.hodiho.fr", u"uploader": u"www.hodiho.fr",
u"title": u"R\u00e9gis plante sa Jeep" u"title": u"R\u00e9gis plante sa Jeep"
@ -41,7 +41,17 @@ class GenericIE(InfoExtractor):
u"uploader_id": u"skillsmatter", u"uploader_id": u"skillsmatter",
u"uploader": u"Skills Matter", u"uploader": u"Skills Matter",
} }
} },
# bandcamp page with custom domain
{
u'url': u'http://bronyrock.com/track/the-pony-mash',
u'file': u'3235767654.mp3',
u'info_dict': {
u'title': u'The Pony Mash',
u'uploader': u'M_Pallante',
},
u'skip': u'There is a limit of 200 free downloads / month for the test song',
},
] ]
def report_download_webpage(self, video_id): def report_download_webpage(self, video_id):
@ -155,6 +165,12 @@ class GenericIE(InfoExtractor):
surl = unescapeHTML(mobj.group(1)) surl = unescapeHTML(mobj.group(1))
return self.url_result(surl, 'Youtube') return self.url_result(surl, 'Youtube')
# Look for Bandcamp pages with custom domain
mobj = re.search(r'<meta property="og:url"[^>]*?content="(.*?bandcamp\.com.*?)"', webpage)
if mobj is not None:
burl = unescapeHTML(mobj.group(1))
return self.url_result(burl, 'Bandcamp')
# Start with something easy: JW Player in SWFObject # Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage) mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None: if mobj is None:

View File

@ -26,7 +26,7 @@ class InstagramIE(InfoExtractor):
return [{ return [{
'id': video_id, 'id': video_id,
'url': self._og_search_video_url(webpage), 'url': self._og_search_video_url(webpage, secure=False),
'ext': 'mp4', 'ext': 'mp4',
'title': u'Video by %s' % uploader_id, 'title': u'Video by %s' % uploader_id,
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),

View File

@ -0,0 +1,61 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
)
from ..aes import (
aes_decrypt_text
)
class KeezMoviesIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>keezmovies\.com/video/.+?(?P<videoid>[0-9]+))'
_TEST = {
u'url': u'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
u'file': u'1214711.mp4',
u'md5': u'6e297b7e789329923fcf83abb67c9289',
u'info_dict': {
u"title": u"Petite Asian Lady Mai Playing In Bathtub",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
# embedded video
mobj = re.search(r'href="([^"]+)"></iframe>', webpage)
if mobj:
embedded_url = mobj.group(1)
return self.url_result(embedded_url)
video_title = self._html_search_regex(r'<h1 [^>]*>([^<]+)', webpage, u'title')
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'video_url=(.+?)&amp;', webpage, u'video_url'))
if webpage.find('encrypted=true')!=-1:
password = self._html_search_regex(r'video_title=(.+?)&amp;', webpage, u'password')
video_url = aes_decrypt_text(video_url, password, 32).decode('utf-8')
path = compat_urllib_parse_urlparse( video_url ).path
extension = os.path.splitext( path )[1][1:]
format = path.split('/')[4].split('_')[:2]
format = "-".join( format )
age_limit = self._rta_search(webpage)
return {
'id': video_id,
'title': video_title,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'age_limit': age_limit,
}

View File

@ -23,7 +23,7 @@ class MetacafeIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
u"add_ie": ["Youtube"], u"add_ie": ["Youtube"],
u"url": u"http://metacafe.com/watch/yt-_aUehQsCQtM/the_electric_company_short_i_pbs_kids_go/", u"url": u"http://metacafe.com/watch/yt-_aUehQsCQtM/the_electric_company_short_i_pbs_kids_go/",
u"file": u"_aUehQsCQtM.flv", u"file": u"_aUehQsCQtM.mp4",
u"info_dict": { u"info_dict": {
u"upload_date": u"20090102", u"upload_date": u"20090102",
u"title": u"The Electric Company | \"Short I\" | PBS KIDS GO!", u"title": u"The Electric Company | \"Short I\" | PBS KIDS GO!",

View File

@ -80,6 +80,8 @@ class MTVIE(InfoExtractor):
video_id = self._id_from_uri(uri) video_id = self._id_from_uri(uri)
self.report_extraction(video_id) self.report_extraction(video_id)
mediagen_url = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content'))).attrib['url'] mediagen_url = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content'))).attrib['url']
# Remove the templates, like &device={device}
mediagen_url = re.sub(r'&[^=]*?={.*?}(?=(&|$))', u'', mediagen_url)
if 'acceptMethods' not in mediagen_url: if 'acceptMethods' not in mediagen_url:
mediagen_url += '&acceptMethods=fms' mediagen_url += '&acceptMethods=fms'
mediagen_page = self._download_webpage(mediagen_url, video_id, mediagen_page = self._download_webpage(mediagen_url, video_id,

View File

@ -0,0 +1,48 @@
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_str,
)
class MySpaceIE(InfoExtractor):
_VALID_URL = r'https?://myspace\.com/([^/]+)/video/[^/]+/(?P<id>\d+)'
_TEST = {
u'url': u'https://myspace.com/coldplay/video/viva-la-vida/100008689',
u'info_dict': {
u'id': u'100008689',
u'ext': u'flv',
u'title': u'Viva La Vida',
u'description': u'The official Viva La Vida video, directed by Hype Williams',
u'uploader': u'Coldplay',
u'uploader_id': u'coldplay',
},
u'params': {
# rtmp download
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
context = json.loads(self._search_regex(r'context = ({.*?});', webpage,
u'context'))
video = context['video']
rtmp_url, play_path = video['streamUrl'].split(';', 1)
return {
'id': compat_str(video['mediaId']),
'title': video['title'],
'url': rtmp_url,
'play_path': play_path,
'ext': 'flv',
'description': video['description'],
'thumbnail': video['imageUrl'],
'uploader': video['artistName'],
'uploader_id': video['artistUsername'],
}

View File

@ -20,7 +20,10 @@ class NowVideoIE(InfoExtractor):
video_id = mobj.group('id') video_id = mobj.group('id')
webpage_url = 'http://www.nowvideo.ch/video/' + video_id webpage_url = 'http://www.nowvideo.ch/video/' + video_id
embed_url = 'http://embed.nowvideo.ch/embed.php?v=' + video_id
webpage = self._download_webpage(webpage_url, video_id) webpage = self._download_webpage(webpage_url, video_id)
embed_page = self._download_webpage(embed_url, video_id,
u'Downloading embed page')
self.report_extraction(video_id) self.report_extraction(video_id)
@ -28,7 +31,7 @@ class NowVideoIE(InfoExtractor):
webpage, u'video title') webpage, u'video title')
video_key = self._search_regex(r'var fkzd="(.*)";', video_key = self._search_regex(r'var fkzd="(.*)";',
webpage, u'video key') embed_page, u'video key')
api_call = "http://www.nowvideo.ch/api/player.api.php?file={0}&numOfErrors=0&cid=1&key={1}".format(video_id, video_key) api_call = "http://www.nowvideo.ch/api/player.api.php?file={0}&numOfErrors=0&cid=1&key={1}".format(video_id, video_key)
api_response = self._download_webpage(api_call, video_id, api_response = self._download_webpage(api_call, video_id,

View File

@ -0,0 +1,69 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
unescapeHTML,
)
from ..aes import (
aes_decrypt_text
)
class PornHubIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>pornhub\.com/view_video\.php\?viewkey=(?P<videoid>[0-9]+))'
_TEST = {
u'url': u'http://www.pornhub.com/view_video.php?viewkey=648719015',
u'file': u'648719015.mp4',
u'md5': u'882f488fa1f0026f023f33576004a2ed',
u'info_dict': {
u"uploader": u"BABES-COM",
u"title": u"Seductive Indian beauty strips down and fingers her pink pussy",
u"age_limit": 18
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, u'title')
video_uploader = self._html_search_regex(r'<b>From: </b>(?:\s|<[^>]*>)*(.+?)<', webpage, u'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, u'thumbnail', fatal=False)
if thumbnail:
thumbnail = compat_urllib_parse.unquote(thumbnail)
video_urls = list(map(compat_urllib_parse.unquote , re.findall(r'"quality_[0-9]{3}p":"([^"]+)', webpage)))
if webpage.find('"encrypted":true') != -1:
password = self._html_search_regex(r'"video_title":"([^"]+)', webpage, u'password').replace('+', ' ')
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
formats = []
for video_url in video_urls:
path = compat_urllib_parse_urlparse( video_url ).path
extension = os.path.splitext( path )[1][1:]
format = path.split('/')[5].split('_')[:2]
format = "-".join( format )
formats.append({
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
})
formats.sort(key=lambda format: list(map(lambda s: s.zfill(6), format['format'].split('-'))))
return {
'id': video_id,
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'formats': formats,
'age_limit': 18,
}

View File

@ -16,7 +16,8 @@ class PornotubeIE(InfoExtractor):
u'md5': u'374dd6dcedd24234453b295209aa69b6', u'md5': u'374dd6dcedd24234453b295209aa69b6',
u'info_dict': { u'info_dict': {
u"upload_date": u"20090708", u"upload_date": u"20090708",
u"title": u"Marilyn-Monroe-Bathing" u"title": u"Marilyn-Monroe-Bathing",
u"age_limit": 18
} }
} }

View File

@ -63,13 +63,12 @@ class RTLnowIE(InfoExtractor):
}, },
}, },
{ {
u'url': u'http://www.rtlnitronow.de/recht-ordnung/lebensmittelkontrolle-erlangenordnungsamt-berlin.php?film_id=127367&player=1&season=1', u'url': u'http://www.rtlnitronow.de/recht-ordnung/stadtpolizei-frankfurt-gerichtsvollzieher-leipzig.php?film_id=129679&player=1&season=1',
u'file': u'127367.flv', u'file': u'129679.flv',
u'info_dict': { u'info_dict': {
u'upload_date': u'20130926', u'upload_date': u'20131016',
u'title': u'Recht & Ordnung - Lebensmittelkontrolle Erlangen/Ordnungsamt...', u'title': u'Recht & Ordnung - Stadtpolizei Frankfurt/ Gerichtsvollzieher...',
u'description': u'Lebensmittelkontrolle Erlangen/Ordnungsamt Berlin', u'description': u'Stadtpolizei Frankfurt/ Gerichtsvollzieher Leipzig',
u'thumbnail': u'http://autoimg.static-fra.de/nitronow/344787/1500x1500/image2.jpg',
}, },
u'params': { u'params': {
u'skip_download': True, u'skip_download': True,

View File

@ -0,0 +1,74 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
unescapeHTML,
)
from ..aes import (
aes_decrypt_text
)
class SpankwireIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<videoid>[0-9]+)/?)'
_TEST = {
u'url': u'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
u'file': u'103545.mp4',
u'md5': u'1b3f55e345500552dbc252a3e9c1af43',
u'info_dict': {
u"uploader": u"oreusz",
u"title": u"Buckcherry`s X Rated Music Video Crazy Bitch",
u"description": u"Crazy Bitch X rated music video.",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1>([^<]+)', webpage, u'title')
video_uploader = self._html_search_regex(r'by:\s*<a [^>]*>(.+?)</a>', webpage, u'uploader', fatal=False)
thumbnail = self._html_search_regex(r'flashvars\.image_url = "([^"]+)', webpage, u'thumbnail', fatal=False)
description = self._html_search_regex(r'>\s*Description:</div>\s*<[^>]*>([^<]+)', webpage, u'description', fatal=False)
if len(description) == 0:
description = None
video_urls = list(map(compat_urllib_parse.unquote , re.findall(r'flashvars\.quality_[0-9]{3}p = "([^"]+)', webpage)))
if webpage.find('flashvars\.encrypted = "true"') != -1:
password = self._html_search_regex(r'flashvars\.video_title = "([^"]+)', webpage, u'password').replace('+', ' ')
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
formats = []
for video_url in video_urls:
path = compat_urllib_parse_urlparse( video_url ).path
extension = os.path.splitext( path )[1][1:]
format = path.split('/')[4].split('_')[:2]
format = "-".join( format )
formats.append({
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
})
formats.sort(key=lambda format: list(map(lambda s: s.zfill(6), format['format'].split('-'))))
age_limit = self._rta_search(webpage)
return {
'id': video_id,
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'description': description,
'formats': formats,
'age_limit': age_limit,
}

View File

@ -0,0 +1,65 @@
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
unescapeHTML,
)
from ..aes import (
aes_decrypt_text
)
class Tube8IE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>tube8\.com/[^/]+/[^/]+/(?P<videoid>[0-9]+)/?)'
_TEST = {
u'url': u'http://www.tube8.com/teen/kasia-music-video/229795/',
u'file': u'229795.mp4',
u'md5': u'e9e0b0c86734e5e3766e653509475db0',
u'info_dict': {
u"description": u"hot teen Kasia grinding",
u"uploader": u"unknown",
u"title": u"Kasia music video",
u"age_limit": 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'videotitle ="([^"]+)', webpage, u'title')
video_description = self._html_search_regex(r'>Description:</strong>(.+?)<', webpage, u'description', fatal=False)
video_uploader = self._html_search_regex(r'>Submitted by:</strong>(?:\s|<[^>]*>)*(.+?)<', webpage, u'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, u'thumbnail', fatal=False)
if thumbnail:
thumbnail = thumbnail.replace('\\/', '/')
video_url = self._html_search_regex(r'"video_url":"([^"]+)', webpage, u'video_url')
if webpage.find('"encrypted":true')!=-1:
password = self._html_search_regex(r'"video_title":"([^"]+)', webpage, u'password')
video_url = aes_decrypt_text(video_url, password, 32).decode('utf-8')
path = compat_urllib_parse_urlparse( video_url ).path
extension = os.path.splitext( path )[1][1:]
format = path.split('/')[4].split('_')[:2]
format = "-".join( format )
return {
'id': video_id,
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'description': video_description,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'age_limit': 18,
}

View File

@ -1,3 +1,4 @@
# encoding: utf-8
import json import json
import re import re
import itertools import itertools
@ -19,12 +20,12 @@ class VimeoIE(InfoExtractor):
"""Information extractor for vimeo.com.""" """Information extractor for vimeo.com."""
# _VALID_URL matches Vimeo URLs # _VALID_URL matches Vimeo URLs
_VALID_URL = r'(?P<proto>https?://)?(?:(?:www|player)\.)?vimeo(?P<pro>pro)?\.com/(?:(?:(?:groups|album)/[^/]+)|(?:.*?)/)?(?P<direct_link>play_redirect_hls\?clip_id=)?(?:videos?/)?(?P<id>[0-9]+)/?(?:[?].*)?$' _VALID_URL = r'(?P<proto>https?://)?(?:(?:www|player)\.)?vimeo(?P<pro>pro)?\.com/(?:(?:(?:groups|album)/[^/]+)|(?:.*?)/)?(?P<direct_link>play_redirect_hls\?clip_id=)?(?:videos?/)?(?P<id>[0-9]+)/?(?:[?].*)?(?:#.*)?$'
_NETRC_MACHINE = 'vimeo' _NETRC_MACHINE = 'vimeo'
IE_NAME = u'vimeo' IE_NAME = u'vimeo'
_TESTS = [ _TESTS = [
{ {
u'url': u'http://vimeo.com/56015672', u'url': u'http://vimeo.com/56015672#at=0',
u'file': u'56015672.mp4', u'file': u'56015672.mp4',
u'md5': u'ae7a1d8b183758a0506b0622f37dfa14', u'md5': u'ae7a1d8b183758a0506b0622f37dfa14',
u'info_dict': { u'info_dict': {
@ -55,7 +56,22 @@ class VimeoIE(InfoExtractor):
u'title': u'Kathy Sierra: Building the minimum Badass User, Business of Software', u'title': u'Kathy Sierra: Building the minimum Badass User, Business of Software',
u'uploader': u'The BLN & Business of Software', u'uploader': u'The BLN & Business of Software',
}, },
} },
{
u'url': u'http://vimeo.com/68375962',
u'file': u'68375962.mp4',
u'md5': u'aaf896bdb7ddd6476df50007a0ac0ae7',
u'note': u'Video protected with password',
u'info_dict': {
u'title': u'youtube-dl password protected test video',
u'upload_date': u'20130614',
u'uploader_id': u'user18948128',
u'uploader': u'Jaime Marquínez Ferrándiz',
},
u'params': {
u'videopassword': u'youtube-dl',
},
},
] ]
def _login(self): def _login(self):
@ -129,6 +145,7 @@ class VimeoIE(InfoExtractor):
self.report_extraction(video_id) self.report_extraction(video_id)
# Extract the config JSON # Extract the config JSON
try:
try: try:
config_url = self._html_search_regex( config_url = self._html_search_regex(
r' data-config-url="(.+?)"', webpage, u'config URL') r' data-config-url="(.+?)"', webpage, u'config URL')
@ -143,7 +160,7 @@ class VimeoIE(InfoExtractor):
if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage): if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage):
raise ExtractorError(u'The author has restricted the access to this video, try with the "--referer" option') raise ExtractorError(u'The author has restricted the access to this video, try with the "--referer" option')
if re.search('If so please provide the correct password.', webpage): if re.search('<form[^>]+?id="pw_form"', webpage) is not None:
self._verify_video_password(url, video_id, webpage) self._verify_video_password(url, video_id, webpage)
return self._real_extract(url) return self._real_extract(url)
else: else:

View File

@ -36,20 +36,24 @@ class XHamsterIE(InfoExtractor):
}] }]
def _real_extract(self,url): def _real_extract(self,url):
mobj = re.match(self._VALID_URL, url) def extract_video_url(webpage):
video_id = mobj.group('id')
seo = mobj.group('seo')
mrss_url = 'http://xhamster.com/movies/%s/%s.html?hd' % (video_id, seo)
webpage = self._download_webpage(mrss_url, video_id)
mobj = re.search(r'\'srv\': \'(?P<server>[^\']*)\',\s*\'file\': \'(?P<file>[^\']+)\',', webpage) mobj = re.search(r'\'srv\': \'(?P<server>[^\']*)\',\s*\'file\': \'(?P<file>[^\']+)\',', webpage)
if mobj is None: if mobj is None:
raise ExtractorError(u'Unable to extract media URL') raise ExtractorError(u'Unable to extract media URL')
if len(mobj.group('server')) == 0: if len(mobj.group('server')) == 0:
video_url = compat_urllib_parse.unquote(mobj.group('file')) return compat_urllib_parse.unquote(mobj.group('file'))
else: else:
video_url = mobj.group('server')+'/key='+mobj.group('file') return mobj.group('server')+'/key='+mobj.group('file')
def is_hd(webpage):
return webpage.find('<div class=\'icon iconHD\'>') != -1
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
seo = mobj.group('seo')
mrss_url = 'http://xhamster.com/movies/%s/%s.html' % (video_id, seo)
webpage = self._download_webpage(mrss_url, video_id)
video_title = self._html_search_regex(r'<title>(?P<title>.+?) - xHamster\.com</title>', video_title = self._html_search_regex(r'<title>(?P<title>.+?) - xHamster\.com</title>',
webpage, u'title') webpage, u'title')
@ -76,14 +80,32 @@ class XHamsterIE(InfoExtractor):
age_limit = self._rta_search(webpage) age_limit = self._rta_search(webpage)
return [{ video_url = extract_video_url(webpage)
'id': video_id, hd = is_hd(webpage)
formats = [{
'url': video_url, 'url': video_url,
'ext': determine_ext(video_url), 'ext': determine_ext(video_url),
'format': 'hd' if hd else 'sd',
'format_id': 'hd' if hd else 'sd',
}]
if not hd:
webpage = self._download_webpage(mrss_url+'?hd', video_id)
if is_hd(webpage):
video_url = extract_video_url(webpage)
formats.append({
'url': video_url,
'ext': determine_ext(video_url),
'format': 'hd',
'format_id': 'hd',
})
return {
'id': video_id,
'title': video_title, 'title': video_title,
'formats': formats,
'description': video_description, 'description': video_description,
'upload_date': video_upload_date, 'upload_date': video_upload_date,
'uploader_id': video_uploader_id, 'uploader_id': video_uploader_id,
'thumbnail': video_thumbnail, 'thumbnail': video_thumbnail,
'age_limit': age_limit, 'age_limit': age_limit,
}] }

View File

@ -13,7 +13,8 @@ class YouJizzIE(InfoExtractor):
u'file': u'2189178.flv', u'file': u'2189178.flv',
u'md5': u'07e15fa469ba384c7693fd246905547c', u'md5': u'07e15fa469ba384c7693fd246905547c',
u'info_dict': { u'info_dict': {
u"title": u"Zeichentrick 1" u"title": u"Zeichentrick 1",
u"age_limit": 18,
} }
} }
@ -25,6 +26,8 @@ class YouJizzIE(InfoExtractor):
# Get webpage content # Get webpage content
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
age_limit = self._rta_search(webpage)
# Get the video title # Get the video title
video_title = self._html_search_regex(r'<title>(?P<title>.*)</title>', video_title = self._html_search_regex(r'<title>(?P<title>.*)</title>',
webpage, u'title').strip() webpage, u'title').strip()
@ -60,6 +63,7 @@ class YouJizzIE(InfoExtractor):
'title': video_title, 'title': video_title,
'ext': 'flv', 'ext': 'flv',
'format': 'flv', 'format': 'flv',
'player_url': embed_page_url} 'player_url': embed_page_url,
'age_limit': age_limit}
return [info] return [info]

View File

@ -17,7 +17,7 @@ from ..aes import (
) )
class YouPornIE(InfoExtractor): class YouPornIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:\w+\.)?youporn\.com/watch/(?P<videoid>[0-9]+)/(?P<title>[^/]+)' _VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>youporn\.com/watch/(?P<videoid>[0-9]+)/(?P<title>[^/]+))'
_TEST = { _TEST = {
u'url': u'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/', u'url': u'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
u'file': u'505835.mp4', u'file': u'505835.mp4',
@ -31,23 +31,10 @@ class YouPornIE(InfoExtractor):
} }
} }
def _print_formats(self, formats):
"""Print all available formats"""
print(u'Available formats:')
print(u'ext\t\tformat')
print(u'---------------------------------')
for format in formats:
print(u'%s\t\t%s' % (format['ext'], format['format']))
def _specific(self, req_format, formats):
for x in formats:
if x["format"] == req_format:
return x
return None
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid') video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url) req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1') req.add_header('Cookie', 'age_verified=1')
@ -71,27 +58,22 @@ class YouPornIE(InfoExtractor):
except KeyError: except KeyError:
raise ExtractorError('Missing JSON parameter: ' + sys.exc_info()[1]) raise ExtractorError('Missing JSON parameter: ' + sys.exc_info()[1])
# Get all of the formats available # Get all of the links from the page
DOWNLOAD_LIST_RE = r'(?s)<ul class="downloadList">(?P<download_list>.*?)</ul>' DOWNLOAD_LIST_RE = r'(?s)<ul class="downloadList">(?P<download_list>.*?)</ul>'
download_list_html = self._search_regex(DOWNLOAD_LIST_RE, download_list_html = self._search_regex(DOWNLOAD_LIST_RE,
webpage, u'download list').strip() webpage, u'download list').strip()
LINK_RE = r'<a href="([^"]+)">'
# Get all of the links from the page
LINK_RE = r'(?s)<a href="(?P<url>[^"]+)">'
links = re.findall(LINK_RE, download_list_html) links = re.findall(LINK_RE, download_list_html)
# Get link of hd video if available # Get all encrypted links
mobj = re.search(r'var encryptedQuality720URL = \'(?P<encrypted_video_url>[a-zA-Z0-9+/]+={0,2})\';', webpage) encrypted_links = re.findall(r'var encryptedQuality[0-9]{3}URL = \'([a-zA-Z0-9+/]+={0,2})\';', webpage)
if mobj != None: for encrypted_link in encrypted_links:
encrypted_video_url = mobj.group(u'encrypted_video_url') link = aes_decrypt_text(encrypted_link, video_title, 32).decode('utf-8')
video_url = aes_decrypt_text(encrypted_video_url, video_title, 32).decode('utf-8') links.append(link)
links = [video_url] + links
if not links: if not links:
raise ExtractorError(u'ERROR: no known formats available for video') raise ExtractorError(u'ERROR: no known formats available for video')
self.to_screen(u'Links found: %d' % len(links))
formats = [] formats = []
for link in links: for link in links:
@ -103,39 +85,32 @@ class YouPornIE(InfoExtractor):
path = compat_urllib_parse_urlparse( video_url ).path path = compat_urllib_parse_urlparse( video_url ).path
extension = os.path.splitext( path )[1][1:] extension = os.path.splitext( path )[1][1:]
format = path.split('/')[4].split('_')[:2] format = path.split('/')[4].split('_')[:2]
# size = format[0] # size = format[0]
# bitrate = format[1] # bitrate = format[1]
format = "-".join( format ) format = "-".join( format )
# title = u'%s-%s-%s' % (video_title, size, bitrate) # title = u'%s-%s-%s' % (video_title, size, bitrate)
formats.append({ formats.append({
'id': video_id,
'url': video_url, 'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
})
# Sort and remove doubles
formats.sort(key=lambda format: list(map(lambda s: s.zfill(6), format['format'].split('-'))))
for i in range(len(formats)-1,0,-1):
if formats[i]['format_id'] == formats[i-1]['format_id']:
del formats[i]
return {
'id': video_id,
'uploader': video_uploader, 'uploader': video_uploader,
'upload_date': upload_date, 'upload_date': upload_date,
'title': video_title, 'title': video_title,
'ext': extension,
'format': format,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'description': video_description, 'description': video_description,
'age_limit': age_limit, 'age_limit': age_limit,
}) 'formats': formats,
}
if self._downloader.params.get('listformats', None):
self._print_formats(formats)
return
req_format = self._downloader.params.get('format', 'best')
self.to_screen(u'Format: %s' % req_format)
if req_format is None or req_format == 'best':
return [formats[0]]
elif req_format == 'worst':
return [formats[-1]]
elif req_format in ('-1', 'all'):
return formats
else:
format = self._specific( req_format, formats )
if format is None:
raise ExtractorError(u'Requested format not available')
return [format]

View File

@ -74,14 +74,8 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
self._downloader.report_warning(u'unable to fetch login page: %s' % compat_str(err)) self._downloader.report_warning(u'unable to fetch login page: %s' % compat_str(err))
return False return False
galx = None galx = self._search_regex(r'(?s)<input.+?name="GALX".+?value="(.+?)"',
dsh = None login_page, u'Login GALX parameter')
match = re.search(re.compile(r'<input.+?name="GALX".+?value="(.+?)"', re.DOTALL), login_page)
if match:
galx = match.group(1)
match = re.search(re.compile(r'<input.+?name="dsh".+?value="(.+?)"', re.DOTALL), login_page)
if match:
dsh = match.group(1)
# Log in # Log in
login_form_strs = { login_form_strs = {
@ -95,7 +89,6 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
u'checkConnection': u'', u'checkConnection': u'',
u'checkedDomains': u'youtube', u'checkedDomains': u'youtube',
u'dnConn': u'', u'dnConn': u'',
u'dsh': dsh,
u'pstMsg': u'0', u'pstMsg': u'0',
u'rmShown': u'1', u'rmShown': u'1',
u'secTok': u'', u'secTok': u'',
@ -348,7 +341,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
}, },
{ {
u"url": u"http://www.youtube.com/watch?v=1ltcDfZMA3U", u"url": u"http://www.youtube.com/watch?v=1ltcDfZMA3U",
u"file": u"1ltcDfZMA3U.flv", u"file": u"1ltcDfZMA3U.mp4",
u"note": u"Test VEVO video (#897)", u"note": u"Test VEVO video (#897)",
u"info_dict": { u"info_dict": {
u"upload_date": u"20070518", u"upload_date": u"20070518",
@ -1405,32 +1398,29 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
# this signatures are encrypted # this signatures are encrypted
if 'url_encoded_fmt_stream_map' not in args: if 'url_encoded_fmt_stream_map' not in args:
raise ValueError(u'No stream_map present') # caught below raise ValueError(u'No stream_map present') # caught below
m_s = re.search(r'[&,]s=', args['url_encoded_fmt_stream_map']) re_signature = re.compile(r'[&,]s=')
m_s = re_signature.search(args['url_encoded_fmt_stream_map'])
if m_s is not None: if m_s is not None:
self.to_screen(u'%s: Encrypted signatures detected.' % video_id) self.to_screen(u'%s: Encrypted signatures detected.' % video_id)
video_info['url_encoded_fmt_stream_map'] = [args['url_encoded_fmt_stream_map']] video_info['url_encoded_fmt_stream_map'] = [args['url_encoded_fmt_stream_map']]
m_s = re.search(r'[&,]s=', args.get('adaptive_fmts', u'')) m_s = re_signature.search(args.get('adaptive_fmts', u''))
if m_s is not None: if m_s is not None:
if 'url_encoded_fmt_stream_map' in video_info: if 'adaptive_fmts' in video_info:
video_info['url_encoded_fmt_stream_map'][0] += ',' + args['adaptive_fmts'] video_info['adaptive_fmts'][0] += ',' + args['adaptive_fmts']
else: else:
video_info['url_encoded_fmt_stream_map'] = [args['adaptive_fmts']] video_info['adaptive_fmts'] = [args['adaptive_fmts']]
elif 'adaptive_fmts' in video_info:
if 'url_encoded_fmt_stream_map' in video_info:
video_info['url_encoded_fmt_stream_map'][0] += ',' + video_info['adaptive_fmts'][0]
else:
video_info['url_encoded_fmt_stream_map'] = video_info['adaptive_fmts']
except ValueError: except ValueError:
pass pass
if 'conn' in video_info and video_info['conn'][0].startswith('rtmp'): if 'conn' in video_info and video_info['conn'][0].startswith('rtmp'):
self.report_rtmp_download() self.report_rtmp_download()
video_url_list = [(None, video_info['conn'][0])] video_url_list = [(None, video_info['conn'][0])]
elif 'url_encoded_fmt_stream_map' in video_info and len(video_info['url_encoded_fmt_stream_map']) >= 1: elif len(video_info.get('url_encoded_fmt_stream_map', [])) >= 1 or len(video_info.get('adaptive_fmts', [])) >= 1:
if 'rtmpe%3Dyes' in video_info['url_encoded_fmt_stream_map'][0]: encoded_url_map = video_info.get('url_encoded_fmt_stream_map', [''])[0] + ',' + video_info.get('adaptive_fmts',[''])[0]
if 'rtmpe%3Dyes' in encoded_url_map:
raise ExtractorError('rtmpe downloads are not supported, see https://github.com/rg3/youtube-dl/issues/343 for more information.', expected=True) raise ExtractorError('rtmpe downloads are not supported, see https://github.com/rg3/youtube-dl/issues/343 for more information.', expected=True)
url_map = {} url_map = {}
for url_data_str in video_info['url_encoded_fmt_stream_map'][0].split(','): for url_data_str in encoded_url_map.split(','):
url_data = compat_parse_qs(url_data_str) url_data = compat_parse_qs(url_data_str)
if 'itag' in url_data and 'url' in url_data: if 'itag' in url_data and 'url' in url_data:
url = url_data['url'][0] url = url_data['url'][0]
@ -1483,13 +1473,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
raise ExtractorError(u'no conn, hlsvp or url_encoded_fmt_stream_map information found in video info') raise ExtractorError(u'no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
results = [] results = []
for format_param, video_real_url in video_url_list: for itag, video_real_url in video_url_list:
# Extension # Extension
video_extension = self._video_extensions.get(format_param, 'flv') video_extension = self._video_extensions.get(itag, 'flv')
video_format = '{0} - {1}{2}'.format(format_param if format_param else video_extension, video_format = '{0} - {1}{2}'.format(itag if itag else video_extension,
self._video_dimensions.get(format_param, '???'), self._video_dimensions.get(itag, '???'),
' ('+self._special_itags[format_param]+')' if format_param in self._special_itags else '') ' ('+self._special_itags[itag]+')' if itag in self._special_itags else '')
results.append({ results.append({
'id': video_id, 'id': video_id,
@ -1500,6 +1490,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
'title': video_title, 'title': video_title,
'ext': video_extension, 'ext': video_extension,
'format': video_format, 'format': video_format,
'format_id': itag,
'thumbnail': video_thumbnail, 'thumbnail': video_thumbnail,
'description': video_description, 'description': video_description,
'player_url': player_url, 'player_url': player_url,

View File

@ -1,2 +1,2 @@
__version__ = '2013.10.23.1' __version__ = '2013.10.29'