Compare commits

...

65 Commits

Author SHA1 Message Date
612ee37365 release 2015.02.10.4 2015-02-10 11:28:34 +01:00
442c37b7a9 [YoutubeDL] Do not perform filter matching on partial results (Fixes #4921) 2015-02-10 11:28:28 +01:00
04bbe41330 release 2015.02.10.3 2015-02-10 05:42:47 +01:00
8f84f57183 [ccc] Add new extractor (Fixes #4890) 2015-02-10 05:42:41 +01:00
6a78740211 [test/test_youtube_signature] Use fake YDL 2015-02-10 05:28:59 +01:00
c0e1a415fd [firstpost] Modernize 2015-02-10 05:28:48 +01:00
bf8f082a90 [vimeo:album] Add support for album passwords (Fixes #4917) 2015-02-10 04:53:21 +01:00
2f543a2142 [options] Add alias --dump-header for --print-traffic 2015-02-10 04:52:33 +01:00
7e5db8c930 [options] Add --no-color 2015-02-10 04:22:10 +01:00
f7a211dcc8 [pornhd] Fix extraction (fixes #4915) 2015-02-10 03:41:31 +01:00
845734773d release 2015.02.10.2 2015-02-10 03:32:55 +01:00
347de4931c [YoutubeDL] Add generic video filtering (Fixes #4916)
This functionality is intended to eventually encompass the current format filtering.
2015-02-10 03:32:24 +01:00
8829650513 release 2015.02.10.1 2015-02-10 01:46:09 +01:00
c73fae1e2e [commonmistakes] Detect BOMs at the beginning of URLs
Reported at https://bugzilla.redhat.com/show_bug.cgi?id=1093517 .
2015-02-10 01:40:55 +01:00
834bf069d2 [bandcamp] Correct variable name 2015-02-10 01:37:14 +01:00
c06a9fa34f Use snake_case instead of camelCase 2015-02-10 01:36:38 +01:00
753fad4adc [commonmistakes] Correct logic error 2015-02-10 01:34:01 +01:00
34814eb66e release 2015.02.10 2015-02-10 01:19:52 +01:00
3a5bcd0326 [extractor/common] Wrap extractor errors (Fixes #1194)
For now, we just wrap some common errors. More may follow. We do not want to catch actual programming errors in the extractors, such as 1 // 0.
2015-02-10 01:17:23 +01:00
99c2398bc6 [bandcamp] Use our API to get more stable error messages (#1194) 2015-02-09 19:08:51 +01:00
28f1272870 [svtplay] Correct test case 2015-02-09 16:05:01 +01:00
f18e3a2fc0 release 2015.02.09.3 2015-02-09 15:59:19 +01:00
c4c5dc27cb Merge branch 'master' of github.com:rg3/youtube-dl 2015-02-09 15:59:14 +01:00
2caf182f37 [trilulilu] Add support for videos without category in the URL (Closes #4067)
Also, update the testcase, detect private/country restricted videos and modernize a bit.
2015-02-09 17:00:05 +02:00
43f244b6d5 [YoutubeDL] Do not show worst in --list-formats output
Nobody wants to know what the worst possible format is. And if they do, they can still provide -f worst.
2015-02-09 15:57:42 +01:00
1309b396d0 [svtplay] Add new extractor (Fixes #4914) 2015-02-09 15:56:59 +01:00
ba61796458 [youtube] Don't override format info from the dash manifest (fixes #4911) 2015-02-09 15:04:22 +01:00
3255fe7141 release 2015.02.09.2 2015-02-09 14:46:30 +01:00
e98b8e79ea [generic] Improve SBS detection (Fixes #4899) 2015-02-09 14:46:10 +01:00
196121c51b release 2015.02.09.1 2015-02-09 10:49:10 +01:00
5269028951 [rtlnow] Add test for @mmue's extension (#4908) 2015-02-09 10:47:19 +01:00
f7bc056b5a Merge remote-tracking branch 'mmue/fix-rtlnow' 2015-02-09 10:44:55 +01:00
a0f7198544 [generic] Add support for jwPlayer YouTube videos
This makes nationalarchives.gov.uk work (Fixes #4907, fixes #4876)
2015-02-09 10:43:01 +01:00
dd8930684e release 2015.02.09 2015-02-09 10:28:16 +01:00
bdb186f3b0 fix rtlnow for newer series like "Der Bachelor" season 5 2015-02-08 21:55:39 +01:00
64f9baa084 [options] Mention asr as possible filter 2015-02-09 01:35:16 +06:00
b29231c040 release 2015.02.08 2015-02-08 20:28:38 +01:00
6128bf07a9 [options] Update help on string comparisons 2015-02-09 01:27:27 +06:00
2ec19e9558 [YoutubeDL] Allow filtering by audio sampling rate 2015-02-09 01:09:45 +06:00
9ddb6925bf [YoutubeDL] Allow filtering by string properties (#4906) 2015-02-09 01:07:43 +06:00
12931e1c6e Credit @robin007bond for tweakers (#4881) and gamekings fixes (#4901) 2015-02-08 23:33:29 +06:00
41c23b0da5 [gamekings] Support videos from news pages 2015-02-08 23:12:59 +06:00
2578ab19e4 Merge branch 'robin007bond-gamekings' 2015-02-08 23:03:31 +06:00
d87ec897e9 [gamekings] Improve extraction 2015-02-08 23:03:12 +06:00
3bd4bffb1c Merge branch 'gamekings' of https://github.com/robin007bond/youtube-dl into robin007bond-gamekings 2015-02-08 22:46:43 +06:00
c36b09a502 [Gamekings] Use thumbnail in return statement 2015-02-08 16:46:13 +01:00
641eb10d34 Use _family_friendly_search for determining age_limit 2015-02-08 17:45:38 +02:00
955c5505e7 [Gamekings] Use xpath
XPath is used for extracting the video url and the thumbnail
2015-02-08 16:44:25 +01:00
69319969de [extractor/common] Add new helper method _family_friendly_search 2015-02-08 17:39:00 +02:00
a14292e848 [soulanime] Remove extractor (#4554)
Was supposed to be deleted by 67c2bcd
2015-02-08 16:57:07 +02:00
5d678df64a [Gamekings] Download playlist
Todo: URL and Thumbnail should be extracted with XPath
2015-02-08 15:34:37 +01:00
8ca8cbe2bd [Gamekings] Check string for vimeo, fix test
The test now doesn't fail anymore. It just checks the string for having
"vimeo" in it, instead of using the method for URL-checking, since it's
returns an error.

The tests don't fail, and the extractor works fine now.
2015-02-08 14:41:14 +01:00
ba322d8209 [Gamekings] Added test and replaced video_url
Quick and dirty fix for the Gamekings extractor. It gives an error about
the video_url, but it downloads it now instead of giving a 404 error on
newer Gamekings videos
2015-02-08 14:23:37 +01:00
2f38289b79 [Gamekings] Fix order of replacement string
Oops.
2015-02-08 13:49:32 +01:00
f23a3ca699 [Gamekings] Fixed typo in URL replacement 2015-02-08 13:47:27 +01:00
77d2b106cc [Gamekings] Fix 404 when large isn't available
When trying to download some GameKings videos, not all worked. This was
because not all videos had a "/large"-URL available. The extractor
checks now if the /large URL is available, if it isn't, it tries to get
the normal URL.
2015-02-08 13:42:41 +01:00
c0e46412e9 [aparat] Fix extraction (Closes #4897) 2015-02-08 17:30:29 +06:00
0161353d7d [test/test_YoutubeDL] Remove debug print call 2015-02-06 23:58:01 +01:00
2b4ecde2c8 [test/YoutubeDL] Add a simple test for postprocesors
Just checks that the 'keepvideo' option works as intended.
2015-02-06 23:54:25 +01:00
b3a286d69d [YoutubeDL] _calc_cookies: add get_header method to _PseudoRequest (#4861) 2015-02-06 22:23:06 +01:00
467d3c9a0c [ffmpeg] --extrac-audio: Use the same options for avconv and ffmpeg
They have been available in ffmpeg since version 0.9, and we require 1.0 or higher.
2015-02-06 22:05:11 +01:00
ad5747bad1 [rtp] Construct regular HTTP download URLs (#4882) 2015-02-06 23:00:54 +02:00
d6eb66ed3c [aftenposten] Add extractor (Closes #4863) 2015-02-07 01:46:54 +06:00
7f2a9f1b49 [tvigle] Add support for cloud URLs (Closes #4887) 2015-02-06 21:15:01 +06:00
1e1896f2de [extractor/common] Correct sort order.
We should look at height and width before ext_preference.
2015-02-06 15:16:45 +01:00
34 changed files with 851 additions and 241 deletions

View File

@ -109,3 +109,4 @@ David Luhmer
Shaya Goldberg Shaya Goldberg
Paul Hartmann Paul Hartmann
Frans de Jonge Frans de Jonge
Robin de Rooij

View File

@ -77,6 +77,7 @@ which means you can modify it, redistribute it or use it however you like.
on Windows) on Windows)
--flat-playlist Do not extract the videos of a playlist, --flat-playlist Do not extract the videos of a playlist,
only list them. only list them.
--no-color Do not emit color codes in output.
## Network Options: ## Network Options:
--proxy URL Use the specified HTTP/HTTPS proxy. Pass in --proxy URL Use the specified HTTP/HTTPS proxy. Pass in
@ -119,6 +120,23 @@ which means you can modify it, redistribute it or use it however you like.
COUNT views COUNT views
--max-views COUNT Do not download any videos with more than --max-views COUNT Do not download any videos with more than
COUNT views COUNT views
--match-filter FILTER (Experimental) Generic video filter.
Specify any key (see help for -o for a list
of available keys) to match if the key is
present, !key to check if the key is not
present,key > NUMBER (like "comment_count >
12", also works with >=, <, <=, !=, =) to
compare against a number, and & to require
multiple matches. Values which are not
known are excluded unless you put a
question mark (?) after the operator.For
example, to only match videos that have
been liked more than 100 times and disliked
less than 50 times (or the dislike
functionality is not available at the given
service), but who also have a description,
use --match-filter "like_count > 100 &
dislike_count <? 50 & description" .
--no-playlist If the URL refers to a video and a --no-playlist If the URL refers to a video and a
playlist, download only the video. playlist, download only the video.
--age-limit YEARS download only videos suitable for the given --age-limit YEARS download only videos suitable for the given
@ -292,18 +310,20 @@ which means you can modify it, redistribute it or use it however you like.
video results by putting a condition in video results by putting a condition in
brackets, as in -f "best[height=720]" (or brackets, as in -f "best[height=720]" (or
-f "[filesize>10M]"). This works for -f "[filesize>10M]"). This works for
filesize, height, width, tbr, abr, vbr, and filesize, height, width, tbr, abr, vbr,
fps and the comparisons <, <=, >, >=, =, != asr, and fps and the comparisons <, <=, >,
. Formats for which the value is not known >=, =, != and for ext, acodec, vcodec,
are excluded unless you put a question mark container, and protocol and the comparisons
(?) after the operator. You can combine =, != . Formats for which the value is not
format filters, so -f "[height <=? known are excluded unless you put a
720][tbr>500]" selects up to 720p videos question mark (?) after the operator. You
(or videos where the height is not known) can combine format filters, so -f "[height
with a bitrate of at least 500 KBit/s. By <=? 720][tbr>500]" selects up to 720p
default, youtube-dl will pick the best videos (or videos where the height is not
quality. Use commas to download multiple known) with a bitrate of at least 500
audio formats, such as -f KBit/s. By default, youtube-dl will pick
the best quality. Use commas to download
multiple audio formats, such as -f
136/137/mp4/bestvideo,140/m4a/bestaudio. 136/137/mp4/bestvideo,140/m4a/bestaudio.
You can merge the video and audio of two You can merge the video and audio of two
formats into a single file using -f <video- formats into a single file using -f <video-

View File

@ -14,6 +14,7 @@
- **AddAnime** - **AddAnime**
- **AdobeTV** - **AdobeTV**
- **AdultSwim** - **AdultSwim**
- **Aftenposten**
- **Aftonbladet** - **Aftonbladet**
- **AlJazeera** - **AlJazeera**
- **Allocine** - **Allocine**
@ -224,6 +225,7 @@
- **mailru**: Видео@Mail.Ru - **mailru**: Видео@Mail.Ru
- **Malemotion** - **Malemotion**
- **MDR** - **MDR**
- **media.ccc.de**
- **metacafe** - **metacafe**
- **Metacritic** - **Metacritic**
- **Mgoon** - **Mgoon**
@ -391,6 +393,7 @@
- **StreamCZ** - **StreamCZ**
- **StreetVoice** - **StreetVoice**
- **SunPorno** - **SunPorno**
- **SVTPlay**
- **SWRMediathek** - **SWRMediathek**
- **Syfy** - **Syfy**
- **SztvHu** - **SztvHu**

View File

@ -13,6 +13,7 @@ import copy
from test.helper import FakeYDL, assertRegexpMatches from test.helper import FakeYDL, assertRegexpMatches
from youtube_dl import YoutubeDL from youtube_dl import YoutubeDL
from youtube_dl.extractor import YoutubeIE from youtube_dl.extractor import YoutubeIE
from youtube_dl.postprocessor.common import PostProcessor
class YDL(FakeYDL): class YDL(FakeYDL):
@ -370,5 +371,35 @@ class TestFormatSelection(unittest.TestCase):
'vbr': 10, 'vbr': 10,
}), '^\s*10k$') }), '^\s*10k$')
def test_postprocessors(self):
filename = 'post-processor-testfile.mp4'
audiofile = filename + '.mp3'
class SimplePP(PostProcessor):
def run(self, info):
with open(audiofile, 'wt') as f:
f.write('EXAMPLE')
info['filepath']
return False, info
def run_pp(params):
with open(filename, 'wt') as f:
f.write('EXAMPLE')
ydl = YoutubeDL(params)
ydl.add_post_processor(SimplePP())
ydl.post_process(filename, {'filepath': filename})
run_pp({'keepvideo': True})
self.assertTrue(os.path.exists(filename), '%s doesn\'t exist' % filename)
self.assertTrue(os.path.exists(audiofile), '%s doesn\'t exist' % audiofile)
os.unlink(filename)
os.unlink(audiofile)
run_pp({'keepvideo': False})
self.assertFalse(os.path.exists(filename), '%s exists' % filename)
self.assertTrue(os.path.exists(audiofile), '%s doesn\'t exist' % audiofile)
os.unlink(audiofile)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@ -53,6 +53,7 @@ from youtube_dl.utils import (
version_tuple, version_tuple,
xpath_with_ns, xpath_with_ns,
render_table, render_table,
match_str,
) )
@ -459,6 +460,37 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
'123 4\n' '123 4\n'
'9999 51') '9999 51')
def test_match_str(self):
self.assertRaises(ValueError, match_str, 'xy>foobar', {})
self.assertFalse(match_str('xy', {'x': 1200}))
self.assertTrue(match_str('!xy', {'x': 1200}))
self.assertTrue(match_str('x', {'x': 1200}))
self.assertFalse(match_str('!x', {'x': 1200}))
self.assertTrue(match_str('x', {'x': 0}))
self.assertFalse(match_str('x>0', {'x': 0}))
self.assertFalse(match_str('x>0', {}))
self.assertTrue(match_str('x>?0', {}))
self.assertTrue(match_str('x>1K', {'x': 1200}))
self.assertFalse(match_str('x>2K', {'x': 1200}))
self.assertTrue(match_str('x>=1200 & x < 1300', {'x': 1200}))
self.assertFalse(match_str('x>=1100 & x < 1200', {'x': 1200}))
self.assertFalse(match_str('y=a212', {'y': 'foobar42'}))
self.assertTrue(match_str('y=foobar42', {'y': 'foobar42'}))
self.assertFalse(match_str('y!=foobar42', {'y': 'foobar42'}))
self.assertTrue(match_str('y!=foobar2', {'y': 'foobar42'}))
self.assertFalse(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 90, 'description': 'foo'}))
self.assertTrue(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 190, 'description': 'foo'}))
self.assertFalse(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 190, 'dislike_count': 60, 'description': 'foo'}))
self.assertFalse(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 190, 'dislike_count': 10}))
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@ -8,11 +8,11 @@ import sys
import unittest import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import io import io
import re import re
import string import string
from test.helper import FakeYDL
from youtube_dl.extractor import YoutubeIE from youtube_dl.extractor import YoutubeIE
from youtube_dl.compat import compat_str, compat_urlretrieve from youtube_dl.compat import compat_str, compat_urlretrieve
@ -88,7 +88,8 @@ def make_tfunc(url, stype, sig_input, expected_sig):
if not os.path.exists(fn): if not os.path.exists(fn):
compat_urlretrieve(url, fn) compat_urlretrieve(url, fn)
ie = YoutubeIE() ydl = FakeYDL()
ie = YoutubeIE(ydl)
if stype == 'js': if stype == 'js':
with io.open(fn, encoding='utf-8') as testf: with io.open(fn, encoding='utf-8') as testf:
jscode = testf.read() jscode = testf.read()

View File

@ -228,6 +228,12 @@ class YoutubeDL(object):
external_downloader: Executable of the external downloader to call. external_downloader: Executable of the external downloader to call.
listformats: Print an overview of available video formats and exit. listformats: Print an overview of available video formats and exit.
list_thumbnails: Print a table of all thumbnails and exit. list_thumbnails: Print a table of all thumbnails and exit.
match_filter: A function that gets called with the info_dict of
every video.
If it returns a message, the video is ignored.
If it returns None, the video is downloaded.
match_filter_func in utils.py is one example for this.
no_color: Do not emit color codes in output.
The following parameters are not used by YoutubeDL itself, they are used by The following parameters are not used by YoutubeDL itself, they are used by
@ -485,7 +491,7 @@ class YoutubeDL(object):
else: else:
if self.params.get('no_warnings'): if self.params.get('no_warnings'):
return return
if self._err_file.isatty() and os.name != 'nt': if not self.params.get('no_color') and self._err_file.isatty() and os.name != 'nt':
_msg_header = '\033[0;33mWARNING:\033[0m' _msg_header = '\033[0;33mWARNING:\033[0m'
else: else:
_msg_header = 'WARNING:' _msg_header = 'WARNING:'
@ -497,7 +503,7 @@ class YoutubeDL(object):
Do the same as trouble, but prefixes the message with 'ERROR:', colored Do the same as trouble, but prefixes the message with 'ERROR:', colored
in red if stderr is a tty file. in red if stderr is a tty file.
''' '''
if self._err_file.isatty() and os.name != 'nt': if not self.params.get('no_color') and self._err_file.isatty() and os.name != 'nt':
_msg_header = '\033[0;31mERROR:\033[0m' _msg_header = '\033[0;31mERROR:\033[0m'
else: else:
_msg_header = 'ERROR:' _msg_header = 'ERROR:'
@ -554,7 +560,7 @@ class YoutubeDL(object):
self.report_error('Error in output template: ' + str(err) + ' (encoding: ' + repr(preferredencoding()) + ')') self.report_error('Error in output template: ' + str(err) + ' (encoding: ' + repr(preferredencoding()) + ')')
return None return None
def _match_entry(self, info_dict): def _match_entry(self, info_dict, incomplete):
""" Returns None iff the file should be downloaded """ """ Returns None iff the file should be downloaded """
video_title = info_dict.get('title', info_dict.get('id', 'video')) video_title = info_dict.get('title', info_dict.get('id', 'video'))
@ -583,9 +589,17 @@ class YoutubeDL(object):
if max_views is not None and view_count > max_views: if max_views is not None and view_count > max_views:
return 'Skipping %s, because it has exceeded the maximum view count (%d/%d)' % (video_title, view_count, max_views) return 'Skipping %s, because it has exceeded the maximum view count (%d/%d)' % (video_title, view_count, max_views)
if age_restricted(info_dict.get('age_limit'), self.params.get('age_limit')): if age_restricted(info_dict.get('age_limit'), self.params.get('age_limit')):
return 'Skipping "%s" because it is age restricted' % title return 'Skipping "%s" because it is age restricted' % video_title
if self.in_download_archive(info_dict): if self.in_download_archive(info_dict):
return '%s has already been recorded in archive' % video_title return '%s has already been recorded in archive' % video_title
if not incomplete:
match_filter = self.params.get('match_filter')
if match_filter is not None:
ret = match_filter(info_dict)
if ret is not None:
return ret
return None return None
@staticmethod @staticmethod
@ -779,7 +793,7 @@ class YoutubeDL(object):
'extractor_key': ie_result['extractor_key'], 'extractor_key': ie_result['extractor_key'],
} }
reason = self._match_entry(entry) reason = self._match_entry(entry, incomplete=True)
if reason is not None: if reason is not None:
self.to_screen('[download] ' + reason) self.to_screen('[download] ' + reason)
continue continue
@ -826,27 +840,44 @@ class YoutubeDL(object):
'!=': operator.ne, '!=': operator.ne,
} }
operator_rex = re.compile(r'''(?x)\s*\[ operator_rex = re.compile(r'''(?x)\s*\[
(?P<key>width|height|tbr|abr|vbr|filesize|fps) (?P<key>width|height|tbr|abr|vbr|asr|filesize|fps)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s* \s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?P<value>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?) (?P<value>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)
\]$ \]$
''' % '|'.join(map(re.escape, OPERATORS.keys()))) ''' % '|'.join(map(re.escape, OPERATORS.keys())))
m = operator_rex.search(format_spec) m = operator_rex.search(format_spec)
if m:
try:
comparison_value = int(m.group('value'))
except ValueError:
comparison_value = parse_filesize(m.group('value'))
if comparison_value is None:
comparison_value = parse_filesize(m.group('value') + 'B')
if comparison_value is None:
raise ValueError(
'Invalid value %r in format specification %r' % (
m.group('value'), format_spec))
op = OPERATORS[m.group('op')]
if not m:
STR_OPERATORS = {
'=': operator.eq,
'!=': operator.ne,
}
str_operator_rex = re.compile(r'''(?x)\s*\[
\s*(?P<key>ext|acodec|vcodec|container|protocol)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9_-]+)
\s*\]$
''' % '|'.join(map(re.escape, STR_OPERATORS.keys())))
m = str_operator_rex.search(format_spec)
if m:
comparison_value = m.group('value')
op = STR_OPERATORS[m.group('op')]
if not m: if not m:
raise ValueError('Invalid format specification %r' % format_spec) raise ValueError('Invalid format specification %r' % format_spec)
try:
comparison_value = int(m.group('value'))
except ValueError:
comparison_value = parse_filesize(m.group('value'))
if comparison_value is None:
comparison_value = parse_filesize(m.group('value') + 'B')
if comparison_value is None:
raise ValueError(
'Invalid value %r in format specification %r' % (
m.group('value'), format_spec))
op = OPERATORS[m.group('op')]
def _filter(f): def _filter(f):
actual_value = f.get(m.group('key')) actual_value = f.get(m.group('key'))
if actual_value is None: if actual_value is None:
@ -938,6 +969,9 @@ class YoutubeDL(object):
def has_header(self, h): def has_header(self, h):
return h in self.headers return h in self.headers
def get_header(self, h, default=None):
return self.headers.get(h, default)
pr = _PseudoRequest(info_dict['url']) pr = _PseudoRequest(info_dict['url'])
self.cookiejar.add_cookie_header(pr) self.cookiejar.add_cookie_header(pr)
return pr.headers.get('Cookie') return pr.headers.get('Cookie')
@ -1133,7 +1167,7 @@ class YoutubeDL(object):
if 'format' not in info_dict: if 'format' not in info_dict:
info_dict['format'] = info_dict['ext'] info_dict['format'] = info_dict['ext']
reason = self._match_entry(info_dict) reason = self._match_entry(info_dict, incomplete=False)
if reason is not None: if reason is not None:
self.to_screen('[download] ' + reason) self.to_screen('[download] ' + reason)
return return
@ -1526,7 +1560,6 @@ class YoutubeDL(object):
line(f, idlen) for f in formats line(f, idlen) for f in formats
if f.get('preference') is None or f['preference'] >= -1000] if f.get('preference') is None or f['preference'] >= -1000]
if len(formats) > 1: if len(formats) > 1:
formats_s[0] += (' ' if self._format_note(formats[0]) else '') + '(worst)'
formats_s[-1] += (' ' if self._format_note(formats[-1]) else '') + '(best)' formats_s[-1] += (' ' if self._format_note(formats[-1]) else '') + '(best)'
header_line = line({ header_line = line({

View File

@ -23,9 +23,10 @@ from .compat import (
) )
from .utils import ( from .utils import (
DateRange, DateRange,
DEFAULT_OUTTMPL,
decodeOption, decodeOption,
DEFAULT_OUTTMPL,
DownloadError, DownloadError,
match_filter_func,
MaxDownloadsReached, MaxDownloadsReached,
preferredencoding, preferredencoding,
read_batch_urls, read_batch_urls,
@ -247,6 +248,9 @@ def _real_main(argv=None):
xattr # Confuse flake8 xattr # Confuse flake8
except ImportError: except ImportError:
parser.error('setting filesize xattr requested but python-xattr is not available') parser.error('setting filesize xattr requested but python-xattr is not available')
match_filter = (
None if opts.match_filter is None
else match_filter_func(opts.match_filter))
ydl_opts = { ydl_opts = {
'usenetrc': opts.usenetrc, 'usenetrc': opts.usenetrc,
@ -344,6 +348,8 @@ def _real_main(argv=None):
'list_thumbnails': opts.list_thumbnails, 'list_thumbnails': opts.list_thumbnails,
'playlist_items': opts.playlist_items, 'playlist_items': opts.playlist_items,
'xattr_set_filesize': opts.xattr_set_filesize, 'xattr_set_filesize': opts.xattr_set_filesize,
'match_filter': match_filter,
'no_color': opts.no_color,
} }
with YoutubeDL(ydl_opts) as ydl: with YoutubeDL(ydl_opts) as ydl:

View File

@ -6,6 +6,7 @@ from .academicearth import AcademicEarthCourseIE
from .addanime import AddAnimeIE from .addanime import AddAnimeIE
from .adobetv import AdobeTVIE from .adobetv import AdobeTVIE
from .adultswim import AdultSwimIE from .adultswim import AdultSwimIE
from .aftenposten import AftenpostenIE
from .aftonbladet import AftonbladetIE from .aftonbladet import AftonbladetIE
from .aljazeera import AlJazeeraIE from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE from .alphaporno import AlphaPornoIE
@ -53,6 +54,7 @@ from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE from .canalc2 import Canalc2IE
from .cbs import CBSIE from .cbs import CBSIE
from .cbsnews import CBSNewsIE from .cbsnews import CBSNewsIE
from .ccc import CCCIE
from .ceskatelevize import CeskaTelevizeIE from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE from .channel9 import Channel9IE
from .chilloutzone import ChilloutzoneIE from .chilloutzone import ChilloutzoneIE
@ -73,7 +75,7 @@ from .collegehumor import CollegeHumorIE
from .collegerama import CollegeRamaIE from .collegerama import CollegeRamaIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .comcarcoff import ComCarCoffIE from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .condenast import CondeNastIE from .condenast import CondeNastIE
from .cracked import CrackedIE from .cracked import CrackedIE
from .criterion import CriterionIE from .criterion import CriterionIE
@ -427,6 +429,7 @@ from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE from .streetvoice import StreetVoiceIE
from .sunporno import SunPornoIE from .sunporno import SunPornoIE
from .svtplay import SVTPlayIE
from .swrmediathek import SWRMediathekIE from .swrmediathek import SWRMediathekIE
from .syfy import SyfyIE from .syfy import SyfyIE
from .sztvhu import SztvHuIE from .sztvhu import SztvHuIE

View File

@ -0,0 +1,103 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
xpath_with_ns,
xpath_text,
find_xpath_attr,
)
class AftenpostenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?aftenposten\.no/webtv/([^/]+/)*(?P<id>[^/]+)-\d+\.html'
_TEST = {
'url': 'http://www.aftenposten.no/webtv/serier-og-programmer/sweatshopenglish/TRAILER-SWEATSHOP---I-cant-take-any-more-7800835.html?paging=&section=webtv_serierogprogrammer_sweatshop_sweatshopenglish',
'md5': 'fd828cd29774a729bf4d4425fe192972',
'info_dict': {
'id': '21039',
'ext': 'mov',
'title': 'TRAILER: "Sweatshop" - I can´t take any more',
'description': 'md5:21891f2b0dd7ec2f78d84a50e54f8238',
'timestamp': 1416927969,
'upload_date': '20141125',
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._html_search_regex(
r'data-xs-id="(\d+)"', webpage, 'video id')
data = self._download_xml(
'http://frontend.xstream.dk/ap/feed/video/?platform=web&id=%s' % video_id, video_id)
NS_MAP = {
'atom': 'http://www.w3.org/2005/Atom',
'xt': 'http://xstream.dk/',
'media': 'http://search.yahoo.com/mrss/',
}
entry = data.find(xpath_with_ns('./atom:entry', NS_MAP))
title = xpath_text(
entry, xpath_with_ns('./atom:title', NS_MAP), 'title')
description = xpath_text(
entry, xpath_with_ns('./atom:summary', NS_MAP), 'description')
timestamp = parse_iso8601(xpath_text(
entry, xpath_with_ns('./atom:published', NS_MAP), 'upload date'))
formats = []
media_group = entry.find(xpath_with_ns('./media:group', NS_MAP))
for media_content in media_group.findall(xpath_with_ns('./media:content', NS_MAP)):
media_url = media_content.get('url')
if not media_url:
continue
tbr = int_or_none(media_content.get('bitrate'))
mobj = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', media_url)
if mobj:
formats.append({
'url': mobj.group('url'),
'play_path': 'mp4:%s' % mobj.group('playpath'),
'app': mobj.group('app'),
'ext': 'flv',
'tbr': tbr,
'format_id': 'rtmp-%d' % tbr,
})
else:
formats.append({
'url': media_url,
'tbr': tbr,
})
self._sort_formats(formats)
link = find_xpath_attr(
entry, xpath_with_ns('./atom:link', NS_MAP), 'rel', 'original')
if link is not None:
formats.append({
'url': link.get('href'),
'format_id': link.get('rel'),
})
thumbnails = [{
'url': splash.get('url'),
'width': int_or_none(splash.get('width')),
'height': int_or_none(splash.get('height')),
} for splash in media_group.findall(xpath_with_ns('./xt:splash', NS_MAP))]
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'formats': formats,
'thumbnails': thumbnails,
}

View File

@ -20,6 +20,7 @@ class AparatIE(InfoExtractor):
'id': 'wP8On', 'id': 'wP8On',
'ext': 'mp4', 'ext': 'mp4',
'title': 'تیم گلکسی 11 - زومیت', 'title': 'تیم گلکسی 11 - زومیت',
'age_limit': 0,
}, },
# 'skip': 'Extremely unreliable', # 'skip': 'Extremely unreliable',
} }
@ -34,7 +35,8 @@ class AparatIE(InfoExtractor):
video_id + '/vt/frame') video_id + '/vt/frame')
webpage = self._download_webpage(embed_url, video_id) webpage = self._download_webpage(embed_url, video_id)
video_urls = re.findall(r'fileList\[[0-9]+\]\s*=\s*"([^"]+)"', webpage) video_urls = [video_url.replace('\\/', '/') for video_url in re.findall(
r'(?:fileList\[[0-9]+\]\s*=|"file"\s*:)\s*"([^"]+)"', webpage)]
for i, video_url in enumerate(video_urls): for i, video_url in enumerate(video_urls):
req = HEADRequest(video_url) req = HEADRequest(video_url)
res = self._request_webpage( res = self._request_webpage(
@ -46,7 +48,7 @@ class AparatIE(InfoExtractor):
title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, 'title') title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, 'title')
thumbnail = self._search_regex( thumbnail = self._search_regex(
r'\s+image:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False) r'image:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False)
return { return {
'id': video_id, 'id': video_id,
@ -54,4 +56,5 @@ class AparatIE(InfoExtractor):
'url': video_url, 'url': video_url,
'ext': 'mp4', 'ext': 'mp4',
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'age_limit': self._family_friendly_search(webpage),
} }

View File

@ -72,26 +72,29 @@ class BandcampIE(InfoExtractor):
download_link = m_download.group(1) download_link = m_download.group(1)
video_id = self._search_regex( video_id = self._search_regex(
r'var TralbumData = {.*?id: (?P<id>\d+),?$', r'(?ms)var TralbumData = {.*?id: (?P<id>\d+),?$',
webpage, 'video id', flags=re.MULTILINE | re.DOTALL) webpage, 'video id')
download_webpage = self._download_webpage(download_link, video_id, 'Downloading free downloads page') download_webpage = self._download_webpage(download_link, video_id, 'Downloading free downloads page')
# We get the dictionary of the track from some javascript code # We get the dictionary of the track from some javascript code
info = re.search(r'items: (.*?),$', download_webpage, re.MULTILINE).group(1) all_info = self._parse_json(self._search_regex(
info = json.loads(info)[0] r'(?sm)items: (.*?),$', download_webpage, 'items'), video_id)
info = all_info[0]
# We pick mp3-320 for now, until format selection can be easily implemented. # We pick mp3-320 for now, until format selection can be easily implemented.
mp3_info = info['downloads']['mp3-320'] mp3_info = info['downloads']['mp3-320']
# If we try to use this url it says the link has expired # If we try to use this url it says the link has expired
initial_url = mp3_info['url'] initial_url = mp3_info['url']
re_url = r'(?P<server>http://(.*?)\.bandcamp\.com)/download/track\?enc=mp3-320&fsig=(?P<fsig>.*?)&id=(?P<id>.*?)&ts=(?P<ts>.*)$' m_url = re.match(
m_url = re.match(re_url, initial_url) r'(?P<server>http://(.*?)\.bandcamp\.com)/download/track\?enc=mp3-320&fsig=(?P<fsig>.*?)&id=(?P<id>.*?)&ts=(?P<ts>.*)$',
initial_url)
# We build the url we will use to get the final track url # We build the url we will use to get the final track url
# This url is build in Bandcamp in the script download_bunde_*.js # This url is build in Bandcamp in the script download_bunde_*.js
request_url = '%s/statdownload/track?enc=mp3-320&fsig=%s&id=%s&ts=%s&.rand=665028774616&.vrs=1' % (m_url.group('server'), m_url.group('fsig'), video_id, m_url.group('ts')) request_url = '%s/statdownload/track?enc=mp3-320&fsig=%s&id=%s&ts=%s&.rand=665028774616&.vrs=1' % (m_url.group('server'), m_url.group('fsig'), video_id, m_url.group('ts'))
final_url_webpage = self._download_webpage(request_url, video_id, 'Requesting download url') final_url_webpage = self._download_webpage(request_url, video_id, 'Requesting download url')
# If we could correctly generate the .rand field the url would be # If we could correctly generate the .rand field the url would be
# in the "download_url" key # in the "download_url" key
final_url = re.search(r'"retry_url":"(.*?)"', final_url_webpage).group(1) final_url = self._search_regex(
r'"retry_url":"(.*?)"', final_url_webpage, 'final video URL')
return { return {
'id': video_id, 'id': video_id,

View File

@ -0,0 +1,99 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
qualities,
unified_strdate,
)
class CCCIE(InfoExtractor):
IE_NAME = 'media.ccc.de'
_VALID_URL = r'https?://(?:www\.)?media\.ccc\.de/[^?#]+/[^?#/]*?_(?P<id>[0-9]{8,})._[^?#/]*\.html'
_TEST = {
'url': 'http://media.ccc.de/browse/congress/2013/30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor.html#video',
'md5': '205a365d0d57c0b1e43a12c9ffe8f9be',
'info_dict': {
'id': '20131228183',
'ext': 'mp4',
'title': 'Introduction to Processor Design',
'description': 'md5:5ddbf8c734800267f2cee4eab187bc1b',
'thumbnail': 're:^https?://.*\.jpg$',
'view_count': int,
'upload_date': '20131229',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
if self._downloader.params.get('prefer_free_formats'):
preference = qualities(['mp3', 'opus', 'mp4-lq', 'webm-lq', 'h264-sd', 'mp4-sd', 'webm-sd', 'mp4', 'webm', 'mp4-hd', 'h264-hd', 'webm-hd'])
else:
preference = qualities(['opus', 'mp3', 'webm-lq', 'mp4-lq', 'webm-sd', 'h264-sd', 'mp4-sd', 'webm', 'mp4', 'webm-hd', 'mp4-hd', 'h264-hd'])
title = self._html_search_regex(
r'(?s)<h1>(.*?)</h1>', webpage, 'title')
description = self._html_search_regex(
r"(?s)<p class='description'>(.*?)</p>",
webpage, 'description', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r"(?s)<span class='[^']*fa-calendar-o'></span>(.*?)</li>",
webpage, 'upload date', fatal=False))
view_count = int_or_none(self._html_search_regex(
r"(?s)<span class='[^']*fa-eye'></span>(.*?)</li>",
webpage, 'view count', fatal=False))
matches = re.finditer(r'''(?xs)
<(?:span|div)\s+class='label\s+filetype'>(?P<format>.*?)</(?:span|div)>\s*
<a\s+href='(?P<http_url>[^']+)'>\s*
(?:
.*?
<a\s+href='(?P<torrent_url>[^']+\.torrent)'
)?''', webpage)
formats = []
for m in matches:
format = m.group('format')
format_id = self._search_regex(
r'.*/([a-z0-9_-]+)/[^/]*$',
m.group('http_url'), 'format id', default=None)
vcodec = 'h264' if 'h264' in format_id else (
'none' if format_id in ('mp3', 'opus') else None
)
formats.append({
'format_id': format_id,
'format': format,
'url': m.group('http_url'),
'vcodec': vcodec,
'preference': preference(format_id),
})
if m.group('torrent_url'):
formats.append({
'format_id': 'torrent-%s' % (format if format_id is None else format_id),
'format': '%s (torrent)' % format,
'proto': 'torrent',
'format_note': '(unsupported; will just download the .torrent file)',
'vcodec': vcodec,
'preference': -100 + preference(format_id),
'url': m.group('torrent_url'),
})
self._sort_formats(formats)
thumbnail = self._html_search_regex(
r"<video.*?poster='([^']+)'", webpage, 'thumbnail', fatal=False)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'view_count': view_count,
'upload_date': upload_date,
'formats': formats,
}

View File

@ -264,8 +264,15 @@ class InfoExtractor(object):
def extract(self, url): def extract(self, url):
"""Extracts URL information and returns it in list of dicts.""" """Extracts URL information and returns it in list of dicts."""
self.initialize() try:
return self._real_extract(url) self.initialize()
return self._real_extract(url)
except ExtractorError:
raise
except compat_http_client.IncompleteRead as e:
raise ExtractorError('A network error has occured.', cause=e, expected=True)
except (KeyError,) as e:
raise ExtractorError('An extractor error has occured.', cause=e)
def set_downloader(self, downloader): def set_downloader(self, downloader):
"""Sets the downloader for this IE.""" """Sets the downloader for this IE."""
@ -507,7 +514,7 @@ class InfoExtractor(object):
if mobj: if mobj:
break break
if os.name != 'nt' and sys.stderr.isatty(): if not self._downloader.params.get('no_color') and os.name != 'nt' and sys.stderr.isatty():
_name = '\033[0;34m%s\033[0m' % name _name = '\033[0;34m%s\033[0m' % name
else: else:
_name = name _name = name
@ -656,6 +663,21 @@ class InfoExtractor(object):
} }
return RATING_TABLE.get(rating.lower(), None) return RATING_TABLE.get(rating.lower(), None)
def _family_friendly_search(self, html):
# See http://schema.org/VideoObj
family_friendly = self._html_search_meta('isFamilyFriendly', html)
if not family_friendly:
return None
RATING_TABLE = {
'1': 0,
'true': 0,
'0': 18,
'false': 18,
}
return RATING_TABLE.get(family_friendly.lower(), None)
def _twitter_search_player(self, html): def _twitter_search_player(self, html):
return self._html_search_meta('twitter:player', html, return self._html_search_meta('twitter:player', html,
'twitter card player') 'twitter card player')
@ -707,9 +729,9 @@ class InfoExtractor(object):
f.get('quality') if f.get('quality') is not None else -1, f.get('quality') if f.get('quality') is not None else -1,
f.get('tbr') if f.get('tbr') is not None else -1, f.get('tbr') if f.get('tbr') is not None else -1,
f.get('vbr') if f.get('vbr') is not None else -1, f.get('vbr') if f.get('vbr') is not None else -1,
ext_preference,
f.get('height') if f.get('height') is not None else -1, f.get('height') if f.get('height') is not None else -1,
f.get('width') if f.get('width') is not None else -1, f.get('width') if f.get('width') is not None else -1,
ext_preference,
f.get('abr') if f.get('abr') is not None else -1, f.get('abr') if f.get('abr') is not None else -1,
audio_ext_preference, audio_ext_preference,
f.get('fps') if f.get('fps') is not None else -1, f.get('fps') if f.get('fps') is not None else -1,

View File

@ -24,6 +24,23 @@ class CommonMistakesIE(InfoExtractor):
'That doesn\'t make any sense. ' 'That doesn\'t make any sense. '
'Simply remove the parameter in your command or configuration.' 'Simply remove the parameter in your command or configuration.'
) % url ) % url
if self._downloader.params.get('verbose'): if not self._downloader.params.get('verbose'):
msg += ' Add -v to the command line to see what arguments and configuration youtube-dl got.' msg += ' Add -v to the command line to see what arguments and configuration youtube-dl got.'
raise ExtractorError(msg, expected=True) raise ExtractorError(msg, expected=True)
class UnicodeBOMIE(InfoExtractor):
IE_DESC = False
_VALID_URL = r'(?P<bom>\ufeff)(?P<id>.*)$'
_TESTS = [{
'url': '\ufeffhttp://www.youtube.com/watch?v=BaW_jenozKc',
'only_matching': True,
}]
def _real_extract(self, url):
real_url = self._match_id(url)
self.report_warning(
'Your URL starts with a Byte Order Mark (BOM). '
'Removing the BOM and looking for "%s" ...' % real_url)
return self.url_result(real_url)

View File

@ -1,7 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
@ -20,11 +18,10 @@ class FirstpostIE(InfoExtractor):
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id) page = self._download_webpage(url, video_id)
title = self._html_search_meta('twitter:title', page, 'title')
title = self._html_search_meta('twitter:title', page, 'title', fatal=True)
description = self._html_search_meta('twitter:description', page, 'title') description = self._html_search_meta('twitter:description', page, 'title')
data = self._download_xml( data = self._download_xml(
@ -42,6 +39,7 @@ class FirstpostIE(InfoExtractor):
'height': int(details.find('./height').text.strip()), 'height': int(details.find('./height').text.strip()),
} for details in item.findall('./source/file_details') if details.find('./file').text } for details in item.findall('./source/file_details') if details.find('./file').text
] ]
self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,

View File

@ -1,41 +1,67 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
xpath_text,
xpath_with_ns,
)
class GamekingsIE(InfoExtractor): class GamekingsIE(InfoExtractor):
_VALID_URL = r'http://www\.gamekings\.tv/videos/(?P<name>[0-9a-z\-]+)' _VALID_URL = r'http://www\.gamekings\.tv/(?:videos|nieuws)/(?P<id>[^/]+)'
_TEST = { _TESTS = [{
'url': 'http://www.gamekings.tv/videos/phoenix-wright-ace-attorney-dual-destinies-review/', 'url': 'http://www.gamekings.tv/videos/phoenix-wright-ace-attorney-dual-destinies-review/',
# MD5 is flaky, seems to change regularly # MD5 is flaky, seems to change regularly
# 'md5': '2f32b1f7b80fdc5cb616efb4f387f8a3', # 'md5': '2f32b1f7b80fdc5cb616efb4f387f8a3',
'info_dict': { 'info_dict': {
'id': '20130811', 'id': 'phoenix-wright-ace-attorney-dual-destinies-review',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Phoenix Wright: Ace Attorney \u2013 Dual Destinies Review', 'title': 'Phoenix Wright: Ace Attorney \u2013 Dual Destinies Review',
'description': 'md5:36fd701e57e8c15ac8682a2374c99731', 'description': 'md5:36fd701e57e8c15ac8682a2374c99731',
} 'thumbnail': 're:^https?://.*\.jpg$',
} },
}, {
# vimeo video
'url': 'http://www.gamekings.tv/videos/the-legend-of-zelda-majoras-mask/',
'md5': '12bf04dfd238e70058046937657ea68d',
'info_dict': {
'id': 'the-legend-of-zelda-majoras-mask',
'ext': 'mp4',
'title': 'The Legend of Zelda: Majoras Mask',
'description': 'md5:9917825fe0e9f4057601fe1e38860de3',
'thumbnail': 're:^https?://.*\.jpg$',
},
}, {
'url': 'http://www.gamekings.tv/nieuws/gamekings-extra-shelly-en-david-bereiden-zich-voor-op-de-livestream/',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url) webpage = self._download_webpage(url, video_id)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
video_url = self._og_search_video_url(webpage)
video = re.search(r'[0-9]+', video_url) playlist_id = self._search_regex(
video_id = video.group(0) r'gogoVideo\(\s*\d+\s*,\s*"([^"]+)', webpage, 'playlist id')
# Todo: add medium format playlist = self._download_xml(
video_url = video_url.replace(video_id, 'large/' + video_id) 'http://www.gamekings.tv/wp-content/themes/gk2010/rss_playlist.php?id=%s' % playlist_id,
video_id)
NS_MAP = {
'jwplayer': 'http://rss.jwpcdn.com/'
}
item = playlist.find('./channel/item')
thumbnail = xpath_text(item, xpath_with_ns('./jwplayer:image', NS_MAP), 'thumbnail')
video_url = item.find(xpath_with_ns('./jwplayer:source', NS_MAP)).get('file')
return { return {
'id': video_id, 'id': video_id,
'ext': 'mp4',
'url': video_url, 'url': video_url,
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
'thumbnail': thumbnail,
} }

View File

@ -524,6 +524,19 @@ class GenericIE(InfoExtractor):
'upload_date': '20150126', 'upload_date': '20150126',
}, },
'add_ie': ['Viddler'], 'add_ie': ['Viddler'],
},
# jwplayer YouTube
{
'url': 'http://media.nationalarchives.gov.uk/index.php/webinar-using-discovery-national-archives-online-catalogue/',
'info_dict': {
'id': 'Mrj4DVp2zeA',
'ext': 'mp4',
'upload_date': '20150204',
'uploader': 'The National Archives UK',
'description': 'md5:a236581cd2449dd2df4f93412f3f01c6',
'uploader_id': 'NationalArchives08',
'title': 'Webinar: Using Discovery, The National Archives online catalogue',
},
} }
] ]
@ -1034,7 +1047,12 @@ class GenericIE(InfoExtractor):
# Look for embedded sbs.com.au player # Look for embedded sbs.com.au player
mobj = re.search( mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:www\.)sbs\.com\.au/ondemand/video/single/.+?)\1', r'''(?x)
(?:
<meta\s+property="og:video"\s+content=|
<iframe[^>]+?src=
)
(["\'])(?P<url>https?://(?:www\.)?sbs\.com\.au/ondemand/video/.+?)\1''',
webpage) webpage)
if mobj is not None: if mobj is not None:
return self.url_result(mobj.group('url'), 'SBS') return self.url_result(mobj.group('url'), 'SBS')
@ -1065,6 +1083,8 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'), 'Livestream') return self.url_result(mobj.group('url'), 'Livestream')
def check_video(vurl): def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True
vpath = compat_urlparse.urlparse(vurl).path vpath = compat_urlparse.urlparse(vurl).path
vext = determine_ext(vpath) vext = determine_ext(vpath)
return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml') return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml')
@ -1082,7 +1102,8 @@ class GenericIE(InfoExtractor):
JWPlayerOptions| JWPlayerOptions|
jwplayer\s*\(\s*["'][^'"]+["']\s*\)\s*\.setup jwplayer\s*\(\s*["'][^'"]+["']\s*\)\s*\.setup
) )
.*?file\s*:\s*["\'](.*?)["\']''', webpage)) .*?
['"]?file['"]?\s*:\s*["\'](.*?)["\']''', webpage))
if not found: if not found:
# Broaden the search a little bit # Broaden the search a little bit
found = filter_video(re.findall(r'[^A-Za-z0-9]?(?:file|source)=(http[^\'"&]*)', webpage)) found = filter_video(re.findall(r'[^A-Za-z0-9]?(?:file|source)=(http[^\'"&]*)', webpage))

View File

@ -34,8 +34,6 @@ class GoshgayIE(InfoExtractor):
duration = parse_duration(self._html_search_regex( duration = parse_duration(self._html_search_regex(
r'<span class="duration">\s*-?\s*(.*?)</span>', r'<span class="duration">\s*-?\s*(.*?)</span>',
webpage, 'duration', fatal=False)) webpage, 'duration', fatal=False))
family_friendly = self._html_search_meta(
'isFamilyFriendly', webpage, default='false')
flashvars = compat_parse_qs(self._html_search_regex( flashvars = compat_parse_qs(self._html_search_regex(
r'<embed.+?id="flash-player-embed".+?flashvars="([^"]+)"', r'<embed.+?id="flash-player-embed".+?flashvars="([^"]+)"',
@ -49,5 +47,5 @@ class GoshgayIE(InfoExtractor):
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'duration': duration, 'duration': duration,
'age_limit': 0 if family_friendly == 'true' else 18, 'age_limit': self._family_friendly_search(webpage),
} }

View File

@ -80,9 +80,6 @@ class IzleseneIE(InfoExtractor):
r'comment_count\s*=\s*\'([^\']+)\';', r'comment_count\s*=\s*\'([^\']+)\';',
webpage, 'comment_count', fatal=False) webpage, 'comment_count', fatal=False)
family_friendly = self._html_search_meta(
'isFamilyFriendly', webpage, 'age limit', fatal=False)
content_url = self._html_search_meta( content_url = self._html_search_meta(
'contentURL', webpage, 'content URL', fatal=False) 'contentURL', webpage, 'content URL', fatal=False)
ext = determine_ext(content_url, 'mp4') ext = determine_ext(content_url, 'mp4')
@ -120,6 +117,6 @@ class IzleseneIE(InfoExtractor):
'duration': duration, 'duration': duration,
'view_count': int_or_none(view_count), 'view_count': int_or_none(view_count),
'comment_count': int_or_none(comment_count), 'comment_count': int_or_none(comment_count),
'age_limit': 18 if family_friendly == 'False' else 0, 'age_limit': self._family_friendly_search(webpage),
'formats': formats, 'formats': formats,
} }

View File

@ -46,16 +46,17 @@ class PornHdIE(InfoExtractor):
quality = qualities(['sd', 'hd']) quality = qualities(['sd', 'hd'])
sources = json.loads(js_to_json(self._search_regex( sources = json.loads(js_to_json(self._search_regex(
r"(?s)'sources'\s*:\s*(\{.+?\})\s*\}\);", webpage, 'sources'))) r"(?s)'sources'\s*:\s*(\{.+?\})\s*\}[;,)]",
webpage, 'sources')))
formats = [] formats = []
for container, s in sources.items(): for qname, video_url in sources.items():
for qname, video_url in s.items(): if not video_url:
formats.append({ continue
'url': video_url, formats.append({
'container': container, 'url': video_url,
'format_id': '%s-%s' % (container, qname), 'format_id': qname,
'quality': quality(qname), 'quality': quality(qname),
}) })
self._sort_formats(formats) self._sort_formats(formats)
return { return {

View File

@ -91,6 +91,15 @@ class RTLnowIE(InfoExtractor):
}, },
}, },
{ {
'url': 'http://rtl-now.rtl.de/der-bachelor/folge-4.php?film_id=188729&player=1&season=5',
'info_dict': {
'id': '188729',
'ext': 'flv',
'upload_date': '20150204',
'description': 'md5:5e1ce23095e61a79c166d134b683cecc',
'title': 'Der Bachelor - Folge 4',
}
}, {
'url': 'http://www.n-tvnow.de/deluxe-alles-was-spass-macht/thema-ua-luxushotel-fuer-vierbeiner.php?container_id=153819&player=1&season=0', 'url': 'http://www.n-tvnow.de/deluxe-alles-was-spass-macht/thema-ua-luxushotel-fuer-vierbeiner.php?container_id=153819&player=1&season=0',
'only_matching': True, 'only_matching': True,
}, },
@ -134,9 +143,18 @@ class RTLnowIE(InfoExtractor):
'player_url': video_page_url + 'includes/vodplayer.swf', 'player_url': video_page_url + 'includes/vodplayer.swf',
} }
else: else:
fmt = { mobj = re.search(r'.*/(?P<hoster>[^/]+)/videos/(?P<play_path>.+)\.f4m', filename.text)
'url': filename.text, if mobj:
} fmt = {
'url': 'rtmpe://fmspay-fra2.rtl.de/' + mobj.group('hoster'),
'play_path': 'mp4:' + mobj.group('play_path'),
'page_url': url,
'player_url': video_page_url + 'includes/vodplayer.swf',
}
else:
fmt = {
'url': filename.text,
}
fmt.update({ fmt.update({
'width': int_or_none(filename.get('width')), 'width': int_or_none(filename.get('width')),
'height': int_or_none(filename.get('height')), 'height': int_or_none(filename.get('height')),

View File

@ -1,16 +1,16 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import js_to_json
class RTPIE(InfoExtractor): class RTPIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rtp\.pt/play/p(?P<program_id>[0-9]+)/(?P<id>[^/?#]+)/?' _VALID_URL = r'https?://(?:www\.)?rtp\.pt/play/p(?P<program_id>[0-9]+)/(?P<id>[^/?#]+)/?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.rtp.pt/play/p405/e174042/paixoes-cruzadas', 'url': 'http://www.rtp.pt/play/p405/e174042/paixoes-cruzadas',
'md5': 'e736ce0c665e459ddb818546220b4ef8',
'info_dict': { 'info_dict': {
'id': 'e174042', 'id': 'e174042',
'ext': 'mp3', 'ext': 'mp3',
@ -18,9 +18,6 @@ class RTPIE(InfoExtractor):
'description': 'As paixões musicais de António Cartaxo e António Macedo', 'description': 'As paixões musicais de António Cartaxo e António Macedo',
'thumbnail': 're:^https?://.*\.jpg', 'thumbnail': 're:^https?://.*\.jpg',
}, },
'params': {
'skip_download': True, # RTMP download
},
}, { }, {
'url': 'http://www.rtp.pt/play/p831/a-quimica-das-coisas', 'url': 'http://www.rtp.pt/play/p831/a-quimica-das-coisas',
'only_matching': True, 'only_matching': True,
@ -37,21 +34,48 @@ class RTPIE(InfoExtractor):
player_config = self._search_regex( player_config = self._search_regex(
r'(?s)RTPPLAY\.player\.newPlayer\(\s*(\{.*?\})\s*\)', webpage, 'player config') r'(?s)RTPPLAY\.player\.newPlayer\(\s*(\{.*?\})\s*\)', webpage, 'player config')
config = json.loads(js_to_json(player_config)) config = self._parse_json(player_config, video_id)
path, ext = config.get('file').rsplit('.', 1) path, ext = config.get('file').rsplit('.', 1)
formats = [{ formats = [{
'format_id': 'rtmp',
'ext': ext,
'vcodec': config.get('type') == 'audio' and 'none' or None,
'preference': -2,
'url': 'rtmp://{streamer:s}/{application:s}'.format(**config),
'app': config.get('application'), 'app': config.get('application'),
'play_path': '{ext:s}:{path:s}'.format(ext=ext, path=path), 'play_path': '{ext:s}:{path:s}'.format(ext=ext, path=path),
'page_url': url, 'page_url': url,
'url': 'rtmp://{streamer:s}/{application:s}'.format(**config),
'rtmp_live': config.get('live', False), 'rtmp_live': config.get('live', False),
'ext': ext,
'vcodec': config.get('type') == 'audio' and 'none' or None,
'player_url': 'http://programas.rtp.pt/play/player.swf?v3', 'player_url': 'http://programas.rtp.pt/play/player.swf?v3',
'rtmp_real_time': True, 'rtmp_real_time': True,
}] }]
# Construct regular HTTP download URLs
replacements = {
'audio': {
'format_id': 'mp3',
'pattern': r'^nas2\.share/wavrss/',
'repl': 'http://rsspod.rtp.pt/podcasts/',
'vcodec': 'none',
},
'video': {
'format_id': 'mp4_h264',
'pattern': r'^nas2\.share/h264/',
'repl': 'http://rsspod.rtp.pt/videocasts/',
'vcodec': 'h264',
},
}
r = replacements[config['type']]
if re.match(r['pattern'], config['file']) is not None:
formats.append({
'format_id': r['format_id'],
'url': re.sub(r['pattern'], r['repl'], config['file']),
'vcodec': r['vcodec'],
})
self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,

View File

@ -1,80 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
HEADRequest,
urlhandle_detect_ext,
)
class SoulAnimeWatchingIE(InfoExtractor):
IE_NAME = "soulanime:watching"
IE_DESC = "SoulAnime video"
_TEST = {
'url': 'http://www.soul-anime.net/watching/seirei-tsukai-no-blade-dance-episode-9/',
'md5': '05fae04abf72298098b528e98abf4298',
'info_dict': {
'id': 'seirei-tsukai-no-blade-dance-episode-9',
'ext': 'mp4',
'title': 'seirei-tsukai-no-blade-dance-episode-9',
'description': 'seirei-tsukai-no-blade-dance-episode-9'
}
}
_VALID_URL = r'http://[w.]*soul-anime\.(?P<domain>[^/]+)/watch[^/]*/(?P<id>[^/]+)'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
domain = mobj.group('domain')
page = self._download_webpage(url, video_id)
video_url_encoded = self._html_search_regex(
r'<div id="download">[^<]*<a href="(?P<url>[^"]+)"', page, 'url')
video_url = "http://www.soul-anime." + domain + video_url_encoded
ext_req = HEADRequest(video_url)
ext_handle = self._request_webpage(
ext_req, video_id, note='Determining extension')
ext = urlhandle_detect_ext(ext_handle)
return {
'id': video_id,
'url': video_url,
'ext': ext,
'title': video_id,
'description': video_id
}
class SoulAnimeSeriesIE(InfoExtractor):
IE_NAME = "soulanime:series"
IE_DESC = "SoulAnime Series"
_VALID_URL = r'http://[w.]*soul-anime\.(?P<domain>[^/]+)/anime./(?P<id>[^/]+)'
_EPISODE_REGEX = r'<option value="(/watch[^/]*/[^"]+)">[^<]*</option>'
_TEST = {
'url': 'http://www.soul-anime.net/anime1/black-rock-shooter-tv/',
'info_dict': {
'id': 'black-rock-shooter-tv'
},
'playlist_count': 8
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
series_id = mobj.group('id')
domain = mobj.group('domain')
pattern = re.compile(self._EPISODE_REGEX)
page = self._download_webpage(url, series_id, "Downloading series page")
mobj = pattern.findall(page)
entries = [self.url_result("http://www.soul-anime." + domain + obj) for obj in mobj]
return self.playlist_result(entries, series_id)

View File

@ -0,0 +1,56 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
)
class SVTPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?svtplay\.se/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.svtplay.se/video/2609989/sm-veckan/sm-veckan-rally-final-sasong-1-sm-veckan-rally-final',
'md5': 'f4a184968bc9c802a9b41316657aaa80',
'info_dict': {
'id': '2609989',
'ext': 'mp4',
'title': 'SM veckan vinter, Örebro - Rally, final',
'duration': 4500,
'thumbnail': 're:^https?://.*[\.-]jpg$',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
info = self._download_json(
'http://www.svtplay.se/video/%s?output=json' % video_id, video_id)
title = info['context']['title']
thumbnail = info['context'].get('thumbnailImage')
video_info = info['video']
formats = []
for vr in video_info['videoReferences']:
vurl = vr['url']
if determine_ext(vurl) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
vurl, video_id,
ext='mp4', entry_protocol='m3u8_native',
m3u8_id=vr.get('playerType')))
else:
formats.append({
'format_id': vr.get('playerType'),
'url': vurl,
})
self._sort_formats(formats)
duration = video_info.get('materialLength')
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'duration': duration,
}

View File

@ -15,7 +15,8 @@ class TeamcocoIE(InfoExtractor):
'id': '80187', 'id': '80187',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Conan Becomes A Mary Kay Beauty Consultant', 'title': 'Conan Becomes A Mary Kay Beauty Consultant',
'description': 'Mary Kay is perhaps the most trusted name in female beauty, so of course Conan is a natural choice to sell their products.' 'description': 'Mary Kay is perhaps the most trusted name in female beauty, so of course Conan is a natural choice to sell their products.',
'age_limit': 0,
} }
}, { }, {
'url': 'http://teamcoco.com/video/louis-ck-interview-george-w-bush', 'url': 'http://teamcoco.com/video/louis-ck-interview-george-w-bush',
@ -24,7 +25,8 @@ class TeamcocoIE(InfoExtractor):
'id': '19705', 'id': '19705',
'ext': 'mp4', 'ext': 'mp4',
"description": "Louis C.K. got starstruck by George W. Bush, so what? Part one.", "description": "Louis C.K. got starstruck by George W. Bush, so what? Part one.",
"title": "Louis C.K. Interview Pt. 1 11/3/11" "title": "Louis C.K. Interview Pt. 1 11/3/11",
'age_limit': 0,
} }
} }
] ]
@ -83,4 +85,5 @@ class TeamcocoIE(InfoExtractor):
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
'age_limit': self._family_friendly_search(webpage),
} }

View File

@ -1,40 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ExtractorError
class TriluliluIE(InfoExtractor): class TriluliluIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?trilulilu\.ro/video-[^/]+/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?trilulilu\.ro/(?:video-[^/]+/)?(?P<id>[^/#\?]+)'
_TEST = { _TEST = {
'url': 'http://www.trilulilu.ro/video-animatie/big-buck-bunny-1', 'url': 'http://www.trilulilu.ro/video-animatie/big-buck-bunny-1',
'md5': 'c1450a00da251e2769b74b9005601cac',
'info_dict': { 'info_dict': {
'id': 'big-buck-bunny-1', 'id': 'ae2899e124140b',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Big Buck Bunny', 'title': 'Big Buck Bunny',
'description': ':) pentru copilul din noi', 'description': ':) pentru copilul din noi',
}, },
# Server ignores Range headers (--test)
'params': {
'skip_download': True
}
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, display_id)
if re.search(r'Fişierul nu este disponibil pentru vizionare în ţara dumneavoastră', webpage):
raise ExtractorError(
'This video is not available in your country.', expected=True)
elif re.search('Fişierul poate fi accesat doar de către prietenii lui', webpage):
raise ExtractorError('This video is private.', expected=True)
flashvars_str = self._search_regex(
r'block_flash_vars\s*=\s*(\{[^\}]+\})', webpage, 'flashvars', fatal=False, default=None)
if flashvars_str:
flashvars = self._parse_json(flashvars_str, display_id)
else:
raise ExtractorError(
'This page does not contain videos', expected=True)
if flashvars['isMP3'] == 'true':
raise ExtractorError(
'Audio downloads are currently not supported', expected=True)
video_id = flashvars['hash']
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
description = self._og_search_description(webpage) description = self._og_search_description(webpage, default=None)
log_str = self._search_regex(
r'block_flash_vars[ ]=[ ]({[^}]+})', webpage, 'log info')
log = json.loads(log_str)
format_url = ('http://fs%(server)s.trilulilu.ro/%(hash)s/' format_url = ('http://fs%(server)s.trilulilu.ro/%(hash)s/'
'video-formats2' % log) 'video-formats2' % flashvars)
format_doc = self._download_xml( format_doc = self._download_xml(
format_url, video_id, format_url, video_id,
note='Downloading formats', note='Downloading formats',
@ -44,10 +59,10 @@ class TriluliluIE(InfoExtractor):
'http://fs%(server)s.trilulilu.ro/stream.php?type=video' 'http://fs%(server)s.trilulilu.ro/stream.php?type=video'
'&source=site&hash=%(hash)s&username=%(userid)s&' '&source=site&hash=%(hash)s&username=%(userid)s&'
'key=ministhebest&format=%%s&sig=&exp=' % 'key=ministhebest&format=%%s&sig=&exp=' %
log) flashvars)
formats = [ formats = [
{ {
'format': fnode.text, 'format_id': fnode.text.partition('-')[2],
'url': video_url_template % fnode.text, 'url': video_url_template % fnode.text,
'ext': fnode.text.partition('-')[0] 'ext': fnode.text.partition('-')[0]
} }
@ -56,8 +71,8 @@ class TriluliluIE(InfoExtractor):
] ]
return { return {
'_type': 'video',
'id': video_id, 'id': video_id,
'display_id': display_id,
'formats': formats, 'formats': formats,
'title': title, 'title': title,
'description': description, 'description': description,

View File

@ -1,6 +1,8 @@
# encoding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
float_or_none, float_or_none,
@ -11,7 +13,7 @@ from ..utils import (
class TvigleIE(InfoExtractor): class TvigleIE(InfoExtractor):
IE_NAME = 'tvigle' IE_NAME = 'tvigle'
IE_DESC = 'Интернет-телевидение Tvigle.ru' IE_DESC = 'Интернет-телевидение Tvigle.ru'
_VALID_URL = r'http://(?:www\.)?tvigle\.ru/(?:[^/]+/)+(?P<id>[^/]+)/$' _VALID_URL = r'https?://(?:www\.)?(?:tvigle\.ru/(?:[^/]+/)+(?P<display_id>[^/]+)/$|cloud\.tvigle\.ru/video/(?P<id>\d+))'
_TESTS = [ _TESTS = [
{ {
@ -38,16 +40,22 @@ class TvigleIE(InfoExtractor):
'duration': 186.080, 'duration': 186.080,
'age_limit': 0, 'age_limit': 0,
}, },
}, }, {
'url': 'https://cloud.tvigle.ru/video/5267604/',
'only_matching': True,
}
] ]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id) if not video_id:
webpage = self._download_webpage(url, display_id)
video_id = self._html_search_regex( video_id = self._html_search_regex(
r'<li class="video-preview current_playing" id="(\d+)">', webpage, 'video id') r'<li class="video-preview current_playing" id="(\d+)">',
webpage, 'video id')
video_data = self._download_json( video_data = self._download_json(
'http://cloud.tvigle.ru/api/play/video/%s/' % video_id, display_id) 'http://cloud.tvigle.ru/api/play/video/%s/' % video_id, display_id)

View File

@ -188,9 +188,9 @@ class VimeoIE(VimeoBaseInfoExtractor, SubtitlesInfoExtractor):
password_request = compat_urllib_request.Request(pass_url + '/password', data) password_request = compat_urllib_request.Request(pass_url + '/password', data)
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded') password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
password_request.add_header('Cookie', 'xsrft=%s' % token) password_request.add_header('Cookie', 'xsrft=%s' % token)
self._download_webpage(password_request, video_id, return self._download_webpage(
'Verifying the password', password_request, video_id,
'Wrong password') 'Verifying the password', 'Wrong password')
def _verify_player_video_password(self, url, video_id): def _verify_player_video_password(self, url, video_id):
password = self._downloader.params.get('videopassword', None) password = self._downloader.params.get('videopassword', None)
@ -266,7 +266,7 @@ class VimeoIE(VimeoBaseInfoExtractor, SubtitlesInfoExtractor):
if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage): if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage):
raise ExtractorError('The author has restricted the access to this video, try with the "--referer" option') raise ExtractorError('The author has restricted the access to this video, try with the "--referer" option')
if re.search('<form[^>]+?id="pw_form"', webpage) is not None: if re.search(r'<form[^>]+?id="pw_form"', webpage) is not None:
self._verify_video_password(url, video_id, webpage) self._verify_video_password(url, video_id, webpage)
return self._real_extract(url) return self._real_extract(url)
else: else:
@ -412,12 +412,47 @@ class VimeoChannelIE(InfoExtractor):
def _extract_list_title(self, webpage): def _extract_list_title(self, webpage):
return self._html_search_regex(self._TITLE_RE, webpage, 'list title') return self._html_search_regex(self._TITLE_RE, webpage, 'list title')
def _login_list_password(self, page_url, list_id, webpage):
login_form = self._search_regex(
r'(?s)<form[^>]+?id="pw_form"(.*?)</form>',
webpage, 'login form', default=None)
if not login_form:
return webpage
password = self._downloader.params.get('videopassword', None)
if password is None:
raise ExtractorError('This album is protected by a password, use the --video-password option', expected=True)
fields = dict(re.findall(r'''(?x)<input\s+
type="hidden"\s+
name="([^"]+)"\s+
value="([^"]*)"
''', login_form))
token = self._search_regex(r'xsrft: \'(.*?)\'', webpage, 'login token')
fields['token'] = token
fields['password'] = password
post = compat_urllib_parse.urlencode(fields)
password_path = self._search_regex(
r'action="([^"]+)"', login_form, 'password URL')
password_url = compat_urlparse.urljoin(page_url, password_path)
password_request = compat_urllib_request.Request(password_url, post)
password_request.add_header('Content-type', 'application/x-www-form-urlencoded')
self._set_cookie('vimeo.com', 'xsrft', token)
return self._download_webpage(
password_request, list_id,
'Verifying the password', 'Wrong password')
def _extract_videos(self, list_id, base_url): def _extract_videos(self, list_id, base_url):
video_ids = [] video_ids = []
for pagenum in itertools.count(1): for pagenum in itertools.count(1):
page_url = self._page_url(base_url, pagenum)
webpage = self._download_webpage( webpage = self._download_webpage(
self._page_url(base_url, pagenum), list_id, page_url, list_id,
'Downloading page %s' % pagenum) 'Downloading page %s' % pagenum)
if pagenum == 1:
webpage = self._login_list_password(page_url, list_id, webpage)
video_ids.extend(re.findall(r'id="clip_(\d+?)"', webpage)) video_ids.extend(re.findall(r'id="clip_(\d+?)"', webpage))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None: if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
break break
@ -464,14 +499,24 @@ class VimeoAlbumIE(VimeoChannelIE):
'title': 'Staff Favorites: November 2013', 'title': 'Staff Favorites: November 2013',
}, },
'playlist_mincount': 13, 'playlist_mincount': 13,
}, {
'note': 'Password-protected album',
'url': 'https://vimeo.com/album/3253534',
'info_dict': {
'title': 'test',
'id': '3253534',
},
'playlist_count': 1,
'params': {
'videopassword': 'youtube-dl',
}
}] }]
def _page_url(self, base_url, pagenum): def _page_url(self, base_url, pagenum):
return '%s/page:%d/' % (base_url, pagenum) return '%s/page:%d/' % (base_url, pagenum)
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) album_id = self._match_id(url)
album_id = mobj.group('id')
return self._extract_videos(album_id, 'http://vimeo.com/album/%s' % album_id) return self._extract_videos(album_id, 'http://vimeo.com/album/%s' % album_id)

View File

@ -780,8 +780,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
fo for fo in formats fo for fo in formats
if fo['format_id'] == format_id) if fo['format_id'] == format_id)
except StopIteration: except StopIteration:
f.update(self._formats.get(format_id, {}).items()) full_info = self._formats.get(format_id, {}).copy()
formats.append(f) full_info.update(f)
formats.append(full_info)
else: else:
existing_format.update(f) existing_format.update(f)
return formats return formats

View File

@ -165,6 +165,11 @@ def parseOpts(overrideArguments=None):
action='store_const', dest='extract_flat', const='in_playlist', action='store_const', dest='extract_flat', const='in_playlist',
default=False, default=False,
help='Do not extract the videos of a playlist, only list them.') help='Do not extract the videos of a playlist, only list them.')
general.add_option(
'--no-color', '--no-colors',
action='store_true', dest='no_color',
default=False,
help='Do not emit color codes in output.')
network = optparse.OptionGroup(parser, 'Network Options') network = optparse.OptionGroup(parser, 'Network Options')
network.add_option( network.add_option(
@ -244,6 +249,25 @@ def parseOpts(overrideArguments=None):
'--max-views', '--max-views',
metavar='COUNT', dest='max_views', default=None, type=int, metavar='COUNT', dest='max_views', default=None, type=int,
help='Do not download any videos with more than COUNT views') help='Do not download any videos with more than COUNT views')
selection.add_option(
'--match-filter',
metavar='FILTER', dest='match_filter', default=None,
help=(
'(Experimental) Generic video filter. '
'Specify any key (see help for -o for a list of available keys) to'
' match if the key is present, '
'!key to check if the key is not present,'
'key > NUMBER (like "comment_count > 12", also works with '
'>=, <, <=, !=, =) to compare against a number, and '
'& to require multiple matches. '
'Values which are not known are excluded unless you'
' put a question mark (?) after the operator.'
'For example, to only match videos that have been liked more than '
'100 times and disliked less than 50 times (or the dislike '
'functionality is not available at the given service), but who '
'also have a description, use --match-filter '
'"like_count > 100 & dislike_count <? 50 & description" .'
))
selection.add_option( selection.add_option(
'--no-playlist', '--no-playlist',
action='store_true', dest='noplaylist', default=False, action='store_true', dest='noplaylist', default=False,
@ -297,8 +321,10 @@ def parseOpts(overrideArguments=None):
' You can filter the video results by putting a condition in' ' You can filter the video results by putting a condition in'
' brackets, as in -f "best[height=720]"' ' brackets, as in -f "best[height=720]"'
' (or -f "[filesize>10M]"). ' ' (or -f "[filesize>10M]"). '
' This works for filesize, height, width, tbr, abr, vbr, and fps' ' This works for filesize, height, width, tbr, abr, vbr, asr, and fps'
' and the comparisons <, <=, >, >=, =, != .' ' and the comparisons <, <=, >, >=, =, !='
' and for ext, acodec, vcodec, container, and protocol'
' and the comparisons =, != .'
' Formats for which the value is not known are excluded unless you' ' Formats for which the value is not known are excluded unless you'
' put a question mark (?) after the operator.' ' put a question mark (?) after the operator.'
' You can combine format filters, so ' ' You can combine format filters, so '
@ -531,7 +557,7 @@ def parseOpts(overrideArguments=None):
action='store_true', dest='youtube_print_sig_code', default=False, action='store_true', dest='youtube_print_sig_code', default=False,
help=optparse.SUPPRESS_HELP) help=optparse.SUPPRESS_HELP)
verbosity.add_option( verbosity.add_option(
'--print-traffic', '--print-traffic', '--dump-headers',
dest='debug_printtraffic', action='store_true', default=False, dest='debug_printtraffic', action='store_true', default=False,
help='Display sent and read HTTP traffic') help='Display sent and read HTTP traffic')
verbosity.add_option( verbosity.add_option(
@ -732,22 +758,22 @@ def parseOpts(overrideArguments=None):
if opts.verbose: if opts.verbose:
write_string('[debug] Override config: ' + repr(overrideArguments) + '\n') write_string('[debug] Override config: ' + repr(overrideArguments) + '\n')
else: else:
commandLineConf = sys.argv[1:] command_line_conf = sys.argv[1:]
if '--ignore-config' in commandLineConf: if '--ignore-config' in command_line_conf:
systemConf = [] system_conf = []
userConf = [] user_conf = []
else: else:
systemConf = _readOptions('/etc/youtube-dl.conf') system_conf = _readOptions('/etc/youtube-dl.conf')
if '--ignore-config' in systemConf: if '--ignore-config' in system_conf:
userConf = [] user_conf = []
else: else:
userConf = _readUserConf() user_conf = _readUserConf()
argv = systemConf + userConf + commandLineConf argv = system_conf + user_conf + command_line_conf
opts, args = parser.parse_args(argv) opts, args = parser.parse_args(argv)
if opts.verbose: if opts.verbose:
write_string('[debug] System config: ' + repr(_hide_login_info(systemConf)) + '\n') write_string('[debug] System config: ' + repr(_hide_login_info(system_conf)) + '\n')
write_string('[debug] User config: ' + repr(_hide_login_info(userConf)) + '\n') write_string('[debug] User config: ' + repr(_hide_login_info(user_conf)) + '\n')
write_string('[debug] Command-line args: ' + repr(_hide_login_info(commandLineConf)) + '\n') write_string('[debug] Command-line args: ' + repr(_hide_login_info(command_line_conf)) + '\n')
return parser, opts, args return parser, opts, args

View File

@ -166,14 +166,13 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
if filecodec is None: if filecodec is None:
raise PostProcessingError('WARNING: unable to obtain file audio codec with ffprobe') raise PostProcessingError('WARNING: unable to obtain file audio codec with ffprobe')
uses_avconv = self._uses_avconv()
more_opts = [] more_opts = []
if self._preferredcodec == 'best' or self._preferredcodec == filecodec or (self._preferredcodec == 'm4a' and filecodec == 'aac'): if self._preferredcodec == 'best' or self._preferredcodec == filecodec or (self._preferredcodec == 'm4a' and filecodec == 'aac'):
if filecodec == 'aac' and self._preferredcodec in ['m4a', 'best']: if filecodec == 'aac' and self._preferredcodec in ['m4a', 'best']:
# Lossless, but in another container # Lossless, but in another container
acodec = 'copy' acodec = 'copy'
extension = 'm4a' extension = 'm4a'
more_opts = ['-bsf:a' if uses_avconv else '-absf', 'aac_adtstoasc'] more_opts = ['-bsf:a', 'aac_adtstoasc']
elif filecodec in ['aac', 'mp3', 'vorbis', 'opus']: elif filecodec in ['aac', 'mp3', 'vorbis', 'opus']:
# Lossless if possible # Lossless if possible
acodec = 'copy' acodec = 'copy'
@ -189,9 +188,9 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
more_opts = [] more_opts = []
if self._preferredquality is not None: if self._preferredquality is not None:
if int(self._preferredquality) < 10: if int(self._preferredquality) < 10:
more_opts += ['-q:a' if uses_avconv else '-aq', self._preferredquality] more_opts += ['-q:a', self._preferredquality]
else: else:
more_opts += ['-b:a' if uses_avconv else '-ab', self._preferredquality + 'k'] more_opts += ['-b:a', self._preferredquality + 'k']
else: else:
# We convert the audio (lossy) # We convert the audio (lossy)
acodec = {'mp3': 'libmp3lame', 'aac': 'aac', 'm4a': 'aac', 'opus': 'opus', 'vorbis': 'libvorbis', 'wav': None}[self._preferredcodec] acodec = {'mp3': 'libmp3lame', 'aac': 'aac', 'm4a': 'aac', 'opus': 'opus', 'vorbis': 'libvorbis', 'wav': None}[self._preferredcodec]
@ -200,13 +199,13 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
if self._preferredquality is not None: if self._preferredquality is not None:
# The opus codec doesn't support the -aq option # The opus codec doesn't support the -aq option
if int(self._preferredquality) < 10 and extension != 'opus': if int(self._preferredquality) < 10 and extension != 'opus':
more_opts += ['-q:a' if uses_avconv else '-aq', self._preferredquality] more_opts += ['-q:a', self._preferredquality]
else: else:
more_opts += ['-b:a' if uses_avconv else '-ab', self._preferredquality + 'k'] more_opts += ['-b:a', self._preferredquality + 'k']
if self._preferredcodec == 'aac': if self._preferredcodec == 'aac':
more_opts += ['-f', 'adts'] more_opts += ['-f', 'adts']
if self._preferredcodec == 'm4a': if self._preferredcodec == 'm4a':
more_opts += ['-bsf:a' if uses_avconv else '-absf', 'aac_adtstoasc'] more_opts += ['-bsf:a', 'aac_adtstoasc']
if self._preferredcodec == 'vorbis': if self._preferredcodec == 'vorbis':
extension = 'ogg' extension = 'ogg'
if self._preferredcodec == 'wav': if self._preferredcodec == 'wav':

View File

@ -17,6 +17,7 @@ import io
import json import json
import locale import locale
import math import math
import operator
import os import os
import pipes import pipes
import platform import platform
@ -1678,3 +1679,79 @@ def render_table(header_row, data):
max_lens = [max(len(compat_str(v)) for v in col) for col in zip(*table)] max_lens = [max(len(compat_str(v)) for v in col) for col in zip(*table)]
format_str = ' '.join('%-' + compat_str(ml + 1) + 's' for ml in max_lens[:-1]) + '%s' format_str = ' '.join('%-' + compat_str(ml + 1) + 's' for ml in max_lens[:-1]) + '%s'
return '\n'.join(format_str % tuple(row) for row in table) return '\n'.join(format_str % tuple(row) for row in table)
def _match_one(filter_part, dct):
COMPARISON_OPERATORS = {
'<': operator.lt,
'<=': operator.le,
'>': operator.gt,
'>=': operator.ge,
'=': operator.eq,
'!=': operator.ne,
}
operator_rex = re.compile(r'''(?x)\s*
(?P<key>[a-z_]+)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?:
(?P<intval>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)|
(?P<strval>(?![0-9.])[a-z0-9A-Z]*)
)
\s*$
''' % '|'.join(map(re.escape, COMPARISON_OPERATORS.keys())))
m = operator_rex.search(filter_part)
if m:
op = COMPARISON_OPERATORS[m.group('op')]
if m.group('strval') is not None:
if m.group('op') not in ('=', '!='):
raise ValueError(
'Operator %s does not support string values!' % m.group('op'))
comparison_value = m.group('strval')
else:
try:
comparison_value = int(m.group('intval'))
except ValueError:
comparison_value = parse_filesize(m.group('intval'))
if comparison_value is None:
comparison_value = parse_filesize(m.group('intval') + 'B')
if comparison_value is None:
raise ValueError(
'Invalid integer value %r in filter part %r' % (
m.group('intval'), filter_part))
actual_value = dct.get(m.group('key'))
if actual_value is None:
return m.group('none_inclusive')
return op(actual_value, comparison_value)
UNARY_OPERATORS = {
'': lambda v: v is not None,
'!': lambda v: v is None,
}
operator_rex = re.compile(r'''(?x)\s*
(?P<op>%s)\s*(?P<key>[a-z_]+)
\s*$
''' % '|'.join(map(re.escape, UNARY_OPERATORS.keys())))
m = operator_rex.search(filter_part)
if m:
op = UNARY_OPERATORS[m.group('op')]
actual_value = dct.get(m.group('key'))
return op(actual_value)
raise ValueError('Invalid filter part %r' % filter_part)
def match_str(filter_str, dct):
""" Filter a dictionary with a simple string syntax. Returns True (=passes filter) or false """
return all(
_match_one(filter_part, dct) for filter_part in filter_str.split('&'))
def match_filter_func(filter_str):
def _match_func(info_dict):
if match_str(filter_str, info_dict):
return None
else:
video_title = info_dict.get('title', info_dict.get('id', 'video'))
return '%s does not pass filter %s, skipping ..' % (video_title, filter_str)
return _match_func

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2015.02.06' __version__ = '2015.02.10.4'