Compare commits


63 Commits

Author SHA1 Message Date
Philipp Hagemeister
a21420389e release 2015.02.19.3 2015-02-19 19:28:17 +01:00
Jaime Marquínez Ferrándiz
6140baf4e1 [nationalgeographic] Add extractor (closes #4960) 2015-02-19 18:17:31 +01:00
Sergey M․
8fc642eb5b [pornhub] Fix uploader regex 2015-02-19 22:15:49 +06:00
Sergey M․
e66e1a0046 [pornhub] Add support for playlists (Closes #4995) 2015-02-19 22:15:19 +06:00
Sergey M․
d5c69f1da4 [5min] Cover joystiq.com URLs (Closes #4962) 2015-02-19 21:47:11 +06:00
Jaime Marquínez Ferrándiz
5c8a3f862a [nbc] Use a test video that works outside the US 2015-02-19 15:00:39 +01:00
Jaime Marquínez Ferrándiz
a3b9157f49 [cbssports] Add extractor (closes #4996) 2015-02-19 13:06:53 +01:00
Philipp Hagemeister
b88ba05356 [imgur] Simplify 2015-02-19 05:53:09 +01:00
Philipp Hagemeister
b74d505577 Merge remote-tracking branch 'jbboehr/imgur-gifv-improvements' 2015-02-19 05:16:11 +01:00
John Boehr
9e2d7dca87 [imgur] improve error check for non-video URLs 2015-02-18 19:47:54 -08:00
John Boehr
d236b37ac9 [imgur] improve regex #4998 2015-02-18 19:28:19 -08:00
Philipp Hagemeister
e880c66bd8 [theonion] Modernize 2015-02-19 04:12:40 +01:00
Philipp Hagemeister
383456aa29 [Makefile] Also delete *.avi files in clean 2015-02-19 04:09:52 +01:00
John Boehr
1a13940c8d [imgur] support regular URL 2015-02-18 18:12:48 -08:00
Philipp Hagemeister
3d54788495 [webofstories] Fix extraction 2015-02-19 02:12:08 +01:00
Philipp Hagemeister
71d53ace2f [sockshare] Do not require thumbnail anymore
Thumbnail is not present on the website anymore.
2015-02-19 02:04:30 +01:00
Philipp Hagemeister
f37e3f99f0 [generic] Correct test case
Video has been reuploaded / edited
2015-02-19 02:00:52 +01:00
Philipp Hagemeister
bd03ffc16e [netzkino] Skip download in test case
Works fine from Germany, but fails from everywhere else
2015-02-19 01:58:54 +01:00
Philipp Hagemeister
1ac1af9b47 release 2015.02.19.2 2015-02-19 01:43:28 +01:00
Philipp Hagemeister
3bf5705316 [imgur] Add new extractor 2015-02-19 01:43:20 +01:00
Philipp Hagemeister
1c2528c8a3 [cbs] Modernize 2015-02-19 01:22:50 +01:00
Philipp Hagemeister
7bd15b1a03 release 2015.02.19.1 2015-02-19 01:04:24 +01:00
Philipp Hagemeister
6b961a85fd [patreon] Add support for embedlies (fixes #4969) 2015-02-19 01:04:19 +01:00
Philipp Hagemeister
7707004043 [patreon] Modernize 2015-02-19 00:38:05 +01:00
Philipp Hagemeister
a025d3c5a5 release 2015.02.19 2015-02-19 00:31:23 +01:00
Philipp Hagemeister
c460bdd56b [sandia] Add new extractor (#4974) 2015-02-19 00:31:01 +01:00
Philipp Hagemeister
b81a359eb6 [YoutubeDL] Use render_table for format listing 2015-02-19 00:28:58 +01:00
Philipp Hagemeister
d61aefb24c Merge remote-tracking branch 'origin/master' 2015-02-19 00:01:14 +01:00
Philipp Hagemeister
d305dd73a3 [utils] Fix js_to_json
Previously, the runtime could be atrocious for longer inputs.
2015-02-18 23:59:51 +01:00
Jaime Marquínez Ferrándiz
93a16ba238 [vimeo] Raise the ExtractorError with expected=True when no video password is given 2015-02-18 22:00:12 +01:00
Philipp Hagemeister
85d5866177 [yahoo] Remove md5sum from test case
The md5 sum has changed repeatedly, and we check whether it looks like a video anyways nowadays.
2015-02-18 20:03:04 +01:00
Philipp Hagemeister
9789d7535d [xtube] Fix test case 2015-02-18 19:58:41 +01:00
Philipp Hagemeister
d8443cd3f7 [wsj] Correct test case 2015-02-18 19:56:24 +01:00
Philipp Hagemeister
d47c26e168 [brightcove] Correct keys in playlists 2015-02-18 19:56:10 +01:00
Philipp Hagemeister
81975f4693 release 2015.02.18.1 2015-02-18 10:54:56 +01:00
Philipp Hagemeister
b8b928d5cb [README] Add an FAQ entry for the player change in anticipation of many more bug reports 2015-02-18 10:54:45 +01:00
Philipp Hagemeister
3eff81fbf7 [jsinterp] Disable comment support
We need a proper lexer to be able to understand YouTube's code, which contains /* inside of strings.
For now it's sufficient to just disable comment support altogether.

Fixes #4976, fixes #4979, fixes #4980, fixes #4981, fixes #4982.
Closes #4977.
2015-02-18 10:47:42 +01:00
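
The failure mode is easy to reproduce: a comment stripper with no notion of string literals treats a /* inside a string as a comment opener. A minimal Python illustration (not youtube-dl's actual code):

    import re

    # Naive comment stripping: the /* inside the string literal is
    # mistaken for the start of a comment and eats code up to the
    # next */, destroying the string.
    code = 'var x = "/*"; var y = 1 /* comment */ + 2;'
    print(re.sub(r'/\*.*?\*/', '', code, flags=re.DOTALL))
    # -> 'var x = " + 2;'
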
Philipp Hagemeister
785521bf4f [youtube] Remove useless if 2015-02-18 10:42:23 +01:00
Philipp Hagemeister
6d1a55a521 [youtube] Show entire player URL when -v is given 2015-02-18 10:39:14 +01:00
Philipp Hagemeister
9cad27008b release 2015.02.18 2015-02-18 00:49:34 +01:00
Philipp Hagemeister
11e611a7fa Extend various playlist tests 2015-02-18 00:49:10 +01:00
Philipp Hagemeister
72c1f8de06 [bandcamp:album] Fix extractor results and associated test 2015-02-18 00:48:52 +01:00
Philipp Hagemeister
6e99868e4c [buzzfeed] Fix playlist test case 2015-02-18 00:41:45 +01:00
Philipp Hagemeister
4d278fde64 [ign] Amend playlist test 2015-02-18 00:38:55 +01:00
Philipp Hagemeister
f21e915fb9 [test/helper] Render info_dict with a final comma 2015-02-18 00:38:42 +01:00
Philipp Hagemeister
6f53c63df6 [test/helper] Only output a newline for forgotten keys if keys are really missing 2015-02-18 00:37:54 +01:00
Philipp Hagemeister
1def5f359e [livestream] Correct playlist ID and add a test for it 2015-02-18 00:34:45 +01:00
Philipp Hagemeister
15ec669374 [vk] Amend playlist test 2015-02-18 00:33:41 +01:00
Philipp Hagemeister
a3fa5da496 [vimeo] Amend playlist tests 2015-02-18 00:33:31 +01:00
Philipp Hagemeister
30965ac66a [vimeo] Prevent infinite loops if video password verification fails
We're seeing this in the tests¹ right now, which do not terminate.

¹  https://travis-ci.org/jaimeMF/youtube-dl/jobs/51135858
2015-02-18 00:27:58 +01:00
Philipp Hagemeister
09ab40b7d1 Merge branch 'progress-as-hook2' 2015-02-17 23:41:48 +01:00
Philipp Hagemeister
fa15607773 PEP8 fixes 2015-02-17 21:46:20 +01:00
Philipp Hagemeister
a91a2c1a83 [downloader] Remove various unneeded assignments and imports 2015-02-17 21:44:41 +01:00
Philipp Hagemeister
16e7711e22 [downloader/http] Remove gruesome import 2015-02-17 21:42:31 +01:00
Philipp Hagemeister
5cda4eda72 [YoutubeDL] Use a progress hook for progress reporting
Instead of every downloader calling two helper functions, let our progress report be an ordinary progress hook like everyone else's.
Closes #4875.
2015-02-17 21:40:35 +01:00
Philipp Hagemeister
98f000409f [radio.de] Fix extraction 2015-02-17 21:40:09 +01:00
Sergey M․
4a8d4a53b1 [videolecturesnet] Fix rtmp stream glitches (Closes #4968) 2015-02-18 01:16:49 +06:00
Jaime Marquínez Ferrándiz
4cd95bcbc3 [twitch:stream] Prefer the 'source' format (fixes #4972) 2015-02-17 18:57:01 +01:00
Philipp Hagemeister
be24c8697f release 2015.02.17.2 2015-02-17 17:38:31 +01:00
Sergey M․
0d93378887 [videolecturesnet] Check http format URLs (Closes #4968) 2015-02-17 22:35:27 +06:00
Sergey M․
4069766c52 [extractor/common] Test URLs with GET 2015-02-17 22:35:27 +06:00
Philipp Hagemeister
7010577720 release 2015.02.17.1 2015-02-17 17:35:08 +01:00
Philipp Hagemeister
8ac27a68e6 [hls] Switch to available as a property 2015-02-17 17:35:03 +01:00
51 changed files with 707 additions and 213 deletions

View File: Makefile

@@ -1,7 +1,7 @@
 all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
 
 clean:
-	rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 CONTRIBUTING.md.tmp youtube-dl youtube-dl.exe
+	rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp youtube-dl youtube-dl.exe
 
 PREFIX ?= /usr/local
 BINDIR ?= $(PREFIX)/bin

View File: README.md

@@ -515,11 +515,15 @@ If you want to play the video on a machine that is not running youtube-dl, you c
 ### ERROR: no fmt_url_map or conn information found in video info
 
-youtube has switched to a new video info format in July 2011 which is not supported by old versions of youtube-dl. You can update youtube-dl with `sudo youtube-dl --update`.
+YouTube has switched to a new video info format in July 2011 which is not supported by old versions of youtube-dl. See [above](#how-do-i-update-youtube-dl) for how to update youtube-dl.
 
 ### ERROR: unable to download video ###
 
-youtube requires an additional signature since September 2012 which is not supported by old versions of youtube-dl. You can update youtube-dl with `sudo youtube-dl --update`.
+YouTube requires an additional signature since September 2012 which is not supported by old versions of youtube-dl. See [above](#how-do-i-update-youtube-dl) for how to update youtube-dl.
+
+### ExtractorError: Could not find JS function u'OF'
+
+In February 2015, the new YouTube player contained a character sequence in a string that was misinterpreted by old versions of youtube-dl. See [above](#how-do-i-update-youtube-dl) for how to update youtube-dl.
 
 ### SyntaxError: Non-ASCII character ###

View File: docs/supportedsites.md

@@ -68,6 +68,7 @@
 - **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
 - **CBS**
 - **CBSNews**: CBS News
+- **CBSSports**
 - **CeskaTelevize**
 - **channel9**: Channel 9
 - **Chilloutzone**
@@ -121,6 +122,7 @@
 - **EllenTV**
 - **EllenTV:clips**
 - **ElPais**: El País
+- **Embedly**
 - **EMPFlix**
 - **Engadget**
 - **Eporner**
@@ -190,6 +192,7 @@
 - **ign.com**
 - **imdb**: Internet Movie Database trailers
 - **imdb:list**: Internet Movie Database lists
+- **Imgur**
 - **Ina**
 - **InfoQ**
 - **Instagram**
@@ -262,6 +265,7 @@
 - **myvideo**
 - **MyVidster**
 - **n-tv.de**
+- **NationalGeographic**
 - **Naver**
 - **NBA**
 - **NBC**
@@ -319,6 +323,7 @@
 - **podomatic**
 - **PornHd**
 - **PornHub**
+- **PornHubPlaylist**
 - **Pornotube**
 - **PornoXO**
 - **PromptFile**
@@ -352,6 +357,7 @@
 - **rutube:movie**: Rutube movies
 - **rutube:person**: Rutube person videos
 - **RUTV**: RUTV.RU
+- **Sandia**: Sandia National Laboratories
 - **Sapo**: SAPO Vídeos
 - **savefrom.net**
 - **SBS**: sbs.com.au

View File: test/helper.py

@@ -113,6 +113,16 @@ def expect_info_dict(self, got_dict, expected_dict):
             self.assertTrue(
                 got.startswith(start_str),
                 'field %s (value: %r) should start with %r' % (info_field, got, start_str))
+        elif isinstance(expected, compat_str) and expected.startswith('contains:'):
+            got = got_dict.get(info_field)
+            contains_str = expected[len('contains:'):]
+            self.assertTrue(
+                isinstance(got, compat_str),
+                'Expected a %s object, but got %s for field %s' % (
+                    compat_str.__name__, type(got).__name__, info_field))
+            self.assertTrue(
+                contains_str in got,
+                'field %s (value: %r) should contain %r' % (info_field, got, contains_str))
         elif isinstance(expected, type):
             got = got_dict.get(info_field)
             self.assertTrue(isinstance(got, expected),
@@ -163,12 +173,14 @@ def expect_info_dict(self, got_dict, expected_dict):
         info_dict_str += ''.join(
             '    %s: %s,\n' % (_repr(k), _repr(v))
             for k, v in test_info_dict.items() if k not in missing_keys)
-        info_dict_str += '\n'
+
+        if info_dict_str:
+            info_dict_str += '\n'
+
         info_dict_str += ''.join(
             '    %s: %s,\n' % (_repr(k), _repr(test_info_dict[k]))
             for k in missing_keys)
         write_string(
-            '\n\'info_dict\': {\n' + info_dict_str + '}\n', out=sys.stderr)
+            '\n\'info_dict\': {\n' + info_dict_str + '},\n', out=sys.stderr)
         self.assertFalse(
             missing_keys,
             'Missing keys in test definition: %s' % (
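
With the new check, a test definition can assert a substring instead of full equality; a hypothetical use (the field value here is illustrative):

    'info_dict': {
        # passes as long as 'Xyce' occurs somewhere in the description
        'description': 'contains:Xyce',
    }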

View File: test/test_jsinterp.py

@@ -70,6 +70,8 @@ class TestJSInterpreter(unittest.TestCase):
         self.assertEqual(jsi.call_function('f'), -11)
 
     def test_comments(self):
+        'Skipping: Not yet fully implemented'
+        return
         jsi = JSInterpreter('''
         function x() {
             var x = /* 1 + */ 2;
@@ -80,6 +82,15 @@ class TestJSInterpreter(unittest.TestCase):
         ''')
         self.assertEqual(jsi.call_function('x'), 52)
 
+        jsi = JSInterpreter('''
+        function f() {
+            var x = "/*";
+            var y = 1 /* comment */ + 2;
+            return y;
+        }
+        ''')
+        self.assertEqual(jsi.call_function('f'), 3)
+
     def test_precedence(self):
         jsi = JSInterpreter('''
         function x() {

View File: test/test_utils.py

@@ -370,6 +370,10 @@ class TestUtil(unittest.TestCase):
             "playlist":[{"controls":{"all":null}}]
         }''')
 
+        inp = '"SAND Number: SAND 2013-7800P\\nPresenter: Tom Russo\\nHabanero Software Training - Xyce Software\\nXyce, Sandia\\u0027s"'
+        json_code = js_to_json(inp)
+        self.assertEqual(json.loads(json_code), json.loads(inp))
+
     def test_js_to_json_edgecases(self):
         on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
         self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
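
For reference, js_to_json normalizes JavaScript-ish literals into strict JSON, which the new test exercises for escape sequences. A quick usage sketch:

    from youtube_dl.utils import js_to_json

    # Unquoted keys and single quotes become valid JSON (exact spacing
    # of the output may differ).
    print(js_to_json("{abc_def:'1'}"))  # -> {"abc_def": "1"}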

View File: test/test_youtube_signature.py

@@ -64,6 +64,12 @@ _TESTS = [
         'js',
         '4646B5181C6C3020DF1D9C7FCFEA.AD80ABF70C39BD369CCCAE780AFBB98FA6B6CB42766249D9488C288',
         '82C8849D94266724DC6B6AF89BBFA087EACCD963.B93C07FBA084ACAEFCF7C9D1FD0203C6C1815B6B'
+    ),
+    (
+        'https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js',
+        'js',
+        '312AA52209E3623129A412D56A40F11CB0AF14AE.3EE09501CB14E3BCDC3B2AE808BF3F1D14E7FBF12',
+        '112AA5220913623229A412D56A40F11CB0AF14AE.3EE0950FCB14EEBCDC3B2AE808BF331D14E7FBF3',
     )
 ]

View File: youtube_dl/YoutubeDL.py

@@ -199,18 +199,25 @@ class YoutubeDL(object):
                        postprocessor.
     progress_hooks:    A list of functions that get called on download
                        progress, with a dictionary with the entries
-                           * status: One of "downloading" and "finished".
+                           * status: One of "downloading", "error", or "finished".
                                      Check this first and ignore unknown values.
 
-                       If status is one of "downloading" or "finished", the
+                       If status is one of "downloading", or "finished", the
                        following properties may also be present:
                            * filename: The final filename (always present)
+                           * tmpfilename: The filename we're currently writing to
                            * downloaded_bytes: Bytes on disk
                            * total_bytes: Size of the whole file, None if unknown
-                           * tmpfilename: The filename we're currently writing to
+                           * total_bytes_estimate: Guess of the eventual file size,
+                                                   None if unavailable.
+                           * elapsed: The number of seconds since download started.
                            * eta: The estimated time in seconds, None if unknown
                            * speed: The download speed in bytes/second, None if
                                     unknown
+                           * fragment_index: The counter of the currently
+                                             downloaded video fragment.
+                           * fragment_count: The number of fragments (= individual
+                                             files that will be merged)
 
                        Progress hooks are guaranteed to be called at least once
                        (with status "finished") if the download is successful.
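
A hook written against the dictionary documented above looks roughly like this sketch (field names as in the docstring; wiring via the standard progress_hooks option):

    def my_hook(d):
        # Check 'status' first and ignore unknown values, as documented.
        if d['status'] == 'downloading':
            total = d.get('total_bytes') or d.get('total_bytes_estimate')
            if total and d.get('downloaded_bytes') is not None:
                print('%.1f%% of %d bytes' % (
                    100.0 * d['downloaded_bytes'] / total, total))
        elif d['status'] == 'finished':
            print('Finished downloading %s' % d['filename'])

    # ydl = youtube_dl.YoutubeDL({'progress_hooks': [my_hook]})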
@@ -1527,29 +1534,18 @@ class YoutubeDL(object):
         return res
 
     def list_formats(self, info_dict):
-        def line(format, idlen=20):
-            return (('%-' + compat_str(idlen + 1) + 's%-10s%-12s%s') % (
-                format['format_id'],
-                format['ext'],
-                self.format_resolution(format),
-                self._format_note(format),
-            ))
-
         formats = info_dict.get('formats', [info_dict])
-        idlen = max(len('format code'),
-                    max(len(f['format_id']) for f in formats))
-        formats_s = [
-            line(f, idlen) for f in formats
+        table = [
+            [f['format_id'], f['ext'], self.format_resolution(f), self._format_note(f)]
+            for f in formats
             if f.get('preference') is None or f['preference'] >= -1000]
         if len(formats) > 1:
-            formats_s[-1] += (' ' if self._format_note(formats[-1]) else '') + '(best)'
+            table[-1][-1] += (' ' if table[-1][-1] else '') + '(best)'
 
-        header_line = line({
-            'format_id': 'format code', 'ext': 'extension',
-            'resolution': 'resolution', 'format_note': 'note'}, idlen=idlen)
+        header_line = ['format code', 'extension', 'resolution', 'note']
         self.to_screen(
-            '[info] Available formats for %s:\n%s\n%s' %
-            (info_dict['id'], header_line, '\n'.join(formats_s)))
+            '[info] Available formats for %s:\n%s' %
+            (info_dict['id'], render_table(header_line, table)))
 
     def list_thumbnails(self, info_dict):
         thumbnails = info_dict.get('thumbnails')
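
render_table here is the column-aligning helper from youtube_dl.utils; an illustration of what the new list_formats() feeds it (the rows are hypothetical):

    from youtube_dl.utils import render_table

    print(render_table(
        ['format code', 'extension', 'resolution', 'note'],
        [['140', 'm4a', 'audio only', 'DASH audio'],
         ['22', 'mp4', '1280x720', 'hd720 (best)']]))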

View File: youtube_dl/downloader/common.py

@@ -1,4 +1,4 @@
-from __future__ import unicode_literals
+from __future__ import division, unicode_literals
 
 import os
 import re
@@ -54,6 +54,7 @@ class FileDownloader(object):
         self.ydl = ydl
         self._progress_hooks = []
         self.params = params
+        self.add_progress_hook(self.report_progress)
 
     @staticmethod
     def format_seconds(seconds):
@@ -226,42 +227,64 @@ class FileDownloader(object):
         self.to_screen(clear_line + fullmsg, skip_eol=not is_last_line)
         self.to_console_title('youtube-dl ' + msg)
 
-    def report_progress(self, percent, data_len_str, speed, eta):
-        """Report download progress."""
-        if self.params.get('noprogress', False):
-            return
-        if eta is not None:
-            eta_str = self.format_eta(eta)
-        else:
-            eta_str = 'Unknown ETA'
-        if percent is not None:
-            percent_str = self.format_percent(percent)
-        else:
-            percent_str = 'Unknown %'
-        speed_str = self.format_speed(speed)
-
-        msg = ('%s of %s at %s ETA %s' %
-               (percent_str, data_len_str, speed_str, eta_str))
-        self._report_progress_status(msg)
-
-    def report_progress_live_stream(self, downloaded_data_len, speed, elapsed):
-        if self.params.get('noprogress', False):
-            return
-        downloaded_str = format_bytes(downloaded_data_len)
-        speed_str = self.format_speed(speed)
-        elapsed_str = FileDownloader.format_seconds(elapsed)
-        msg = '%s at %s (%s)' % (downloaded_str, speed_str, elapsed_str)
-        self._report_progress_status(msg)
-
-    def report_finish(self, data_len_str, tot_time):
-        """Report download finished."""
-        if self.params.get('noprogress', False):
-            self.to_screen('[download] Download completed')
-        else:
-            self._report_progress_status(
-                ('100%% of %s in %s' %
-                 (data_len_str, self.format_seconds(tot_time))),
-                is_last_line=True)
+    def report_progress(self, s):
+        if s['status'] == 'finished':
+            if self.params.get('noprogress', False):
+                self.to_screen('[download] Download completed')
+            else:
+                s['_total_bytes_str'] = format_bytes(s['total_bytes'])
+                if s.get('elapsed') is not None:
+                    s['_elapsed_str'] = self.format_seconds(s['elapsed'])
+                    msg_template = '100%% of %(_total_bytes_str)s in %(_elapsed_str)s'
+                else:
+                    msg_template = '100%% of %(_total_bytes_str)s'
+                self._report_progress_status(
+                    msg_template % s, is_last_line=True)
+
+        if self.params.get('noprogress'):
+            return
+
+        if s['status'] != 'downloading':
+            return
+
+        if s.get('eta') is not None:
+            s['_eta_str'] = self.format_eta(s['eta'])
+        else:
+            s['_eta_str'] = 'Unknown ETA'
+
+        if s.get('total_bytes') and s.get('downloaded_bytes') is not None:
+            s['_percent_str'] = self.format_percent(100 * s['downloaded_bytes'] / s['total_bytes'])
+        elif s.get('total_bytes_estimate') and s.get('downloaded_bytes') is not None:
+            s['_percent_str'] = self.format_percent(100 * s['downloaded_bytes'] / s['total_bytes_estimate'])
+        else:
+            if s.get('downloaded_bytes') == 0:
+                s['_percent_str'] = self.format_percent(0)
+            else:
+                s['_percent_str'] = 'Unknown %'
+
+        if s.get('speed') is not None:
+            s['_speed_str'] = self.format_speed(s['speed'])
+        else:
+            s['_speed_str'] = 'Unknown speed'
+
+        if s.get('total_bytes') is not None:
+            s['_total_bytes_str'] = format_bytes(s['total_bytes'])
+            msg_template = '%(_percent_str)s of %(_total_bytes_str)s at %(_speed_str)s ETA %(_eta_str)s'
+        elif s.get('total_bytes_estimate') is not None:
+            s['_total_bytes_estimate_str'] = format_bytes(s['total_bytes_estimate'])
+            msg_template = '%(_percent_str)s of ~%(_total_bytes_estimate_str)s at %(_speed_str)s ETA %(_eta_str)s'
+        else:
+            if s.get('downloaded_bytes') is not None:
+                s['_downloaded_bytes_str'] = format_bytes(s['downloaded_bytes'])
+                if s.get('elapsed'):
+                    s['_elapsed_str'] = self.format_seconds(s['elapsed'])
+                    msg_template = '%(_downloaded_bytes_str)s at %(_speed_str)s (%(_elapsed_str)s)'
+                else:
+                    msg_template = '%(_downloaded_bytes_str)s at %(_speed_str)s'
+            else:
+                msg_template = '%(_percent_str)s % at %(_speed_str)s ETA %(_eta_str)s'
+
+        self._report_progress_status(msg_template % s)
 
     def report_resuming_byte(self, resume_len):
         """Report attempt to resume at given byte."""

View File: youtube_dl/downloader/f4m.py

@@ -1,4 +1,4 @@
-from __future__ import unicode_literals
+from __future__ import division, unicode_literals
 
 import base64
 import io
@@ -15,7 +15,6 @@ from ..compat import (
 from ..utils import (
     struct_pack,
     struct_unpack,
-    format_bytes,
     encodeFilename,
     sanitize_open,
     xpath_text,
@@ -252,17 +251,6 @@ class F4mFD(FileDownloader):
         requested_bitrate = info_dict.get('tbr')
         self.to_screen('[download] Downloading f4m manifest')
         manifest = self.ydl.urlopen(man_url).read()
-        self.report_destination(filename)
-        http_dl = HttpQuietDownloader(
-            self.ydl,
-            {
-                'continuedl': True,
-                'quiet': True,
-                'noprogress': True,
-                'ratelimit': self.params.get('ratelimit', None),
-                'test': self.params.get('test', False),
-            }
-        )
 
         doc = etree.fromstring(manifest)
         formats = [(int(f.attrib.get('bitrate', -1)), f)
@@ -298,39 +286,65 @@ class F4mFD(FileDownloader):
         # For some akamai manifests we'll need to add a query to the fragment url
         akamai_pv = xpath_text(doc, _add_ns('pv-2.0'))
 
+        self.report_destination(filename)
+        http_dl = HttpQuietDownloader(
+            self.ydl,
+            {
+                'continuedl': True,
+                'quiet': True,
+                'noprogress': True,
+                'ratelimit': self.params.get('ratelimit', None),
+                'test': self.params.get('test', False),
+            }
+        )
         tmpfilename = self.temp_name(filename)
         (dest_stream, tmpfilename) = sanitize_open(tmpfilename, 'wb')
         write_flv_header(dest_stream)
         write_metadata_tag(dest_stream, metadata)
 
         # This dict stores the download progress, it's updated by the progress
         # hook
         state = {
+            'status': 'downloading',
             'downloaded_bytes': 0,
-            'frag_counter': 0,
+            'frag_index': 0,
+            'frag_count': total_frags,
+            'filename': filename,
+            'tmpfilename': tmpfilename,
         }
         start = time.time()
 
-        def frag_progress_hook(status):
-            frag_total_bytes = status.get('total_bytes', 0)
-            estimated_size = (state['downloaded_bytes'] +
-                              (total_frags - state['frag_counter']) * frag_total_bytes)
-            if status['status'] == 'finished':
+        def frag_progress_hook(s):
+            if s['status'] not in ('downloading', 'finished'):
+                return
+
+            frag_total_bytes = s.get('total_bytes', 0)
+            if s['status'] == 'finished':
                 state['downloaded_bytes'] += frag_total_bytes
-                state['frag_counter'] += 1
-                progress = self.calc_percent(state['frag_counter'], total_frags)
-                byte_counter = state['downloaded_bytes']
+                state['frag_index'] += 1
+
+            estimated_size = (
+                (state['downloaded_bytes'] + frag_total_bytes)
+                / (state['frag_index'] + 1) * total_frags)
+            time_now = time.time()
+            state['total_bytes_estimate'] = estimated_size
+            state['elapsed'] = time_now - start
+
+            if s['status'] == 'finished':
+                progress = self.calc_percent(state['frag_index'], total_frags)
             else:
-                frag_downloaded_bytes = status['downloaded_bytes']
-                byte_counter = state['downloaded_bytes'] + frag_downloaded_bytes
+                frag_downloaded_bytes = s['downloaded_bytes']
                 frag_progress = self.calc_percent(frag_downloaded_bytes,
                                                   frag_total_bytes)
-                progress = self.calc_percent(state['frag_counter'], total_frags)
+                progress = self.calc_percent(state['frag_index'], total_frags)
                 progress += frag_progress / float(total_frags)
-            eta = self.calc_eta(start, time.time(), estimated_size, byte_counter)
-            self.report_progress(progress, format_bytes(estimated_size),
-                                 status.get('speed'), eta)
+
+                state['eta'] = self.calc_eta(
+                    start, time_now, estimated_size, state['downloaded_bytes'] + frag_downloaded_bytes)
+                state['speed'] = s.get('speed')
+            self._hook_progress(state)
 
         http_dl.add_progress_hook(frag_progress_hook)
 
         frags_filenames = []
@@ -354,8 +368,8 @@ class F4mFD(FileDownloader):
             frags_filenames.append(frag_filename)
 
         dest_stream.close()
-        self.report_finish(format_bytes(state['downloaded_bytes']), time.time() - start)
 
+        elapsed = time.time() - start
         self.try_rename(tmpfilename, filename)
         for frag_file in frags_filenames:
             os.remove(frag_file)
@@ -366,6 +380,7 @@ class F4mFD(FileDownloader):
             'total_bytes': fsize,
             'filename': filename,
             'status': 'finished',
+            'elapsed': elapsed,
         })
 
         return True
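
The new total_bytes_estimate above is just the running average fragment size times the fragment count; worked through with hypothetical numbers:

    from __future__ import division  # as this very diff adds to f4m.py

    # 3 finished fragments (3 MiB on disk) plus a 1 MiB fragment in
    # flight, out of 10 fragments total:
    downloaded_bytes, frag_total_bytes = 3 * 1024 ** 2, 1024 ** 2
    frag_index, total_frags = 3, 10
    estimated_size = ((downloaded_bytes + frag_total_bytes)
                      / (frag_index + 1) * total_frags)
    print(estimated_size / 1024 ** 2)  # -> 10.0 (MiB)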

View File: youtube_dl/downloader/hls.py

@@ -23,7 +23,7 @@ class HlsFD(FileDownloader):
         tmpfilename = self.temp_name(filename)
 
         ffpp = FFmpegPostProcessor(downloader=self)
-        if not ffpp.available():
+        if not ffpp.available:
             self.report_error('m3u8 download detected but ffmpeg or avconv could not be found. Please install one.')
             return False
         ffpp.check_version()

View File: youtube_dl/downloader/http.py

@@ -1,10 +1,9 @@
 from __future__ import unicode_literals
 
-import os
-import time
-from socket import error as SocketError
 import errno
+import os
+import socket
+import time
 
 from .common import FileDownloader
 from ..compat import (
@@ -15,7 +14,6 @@ from ..utils import (
     ContentTooShortError,
     encodeFilename,
     sanitize_open,
-    format_bytes,
 )
@@ -102,7 +100,7 @@ class HttpFD(FileDownloader):
                     resume_len = 0
                     open_mode = 'wb'
                     break
-            except SocketError as e:
+            except socket.error as e:
                 if e.errno != errno.ECONNRESET:
                     # Connection reset is no problem, just retry
                     raise
@@ -137,7 +135,6 @@ class HttpFD(FileDownloader):
                     self.to_screen('\r[download] File is larger than max-filesize (%s bytes > %s bytes). Aborting.' % (data_len, max_data_len))
                     return False
 
-        data_len_str = format_bytes(data_len)
         byte_counter = 0 + resume_len
         block_size = self.params.get('buffersize', 1024)
         start = time.time()
@@ -196,20 +193,19 @@ class HttpFD(FileDownloader):
             # Progress message
             speed = self.calc_speed(start, now, byte_counter - resume_len)
             if data_len is None:
-                eta = percent = None
+                eta = None
             else:
-                percent = self.calc_percent(byte_counter, data_len)
                 eta = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
-            self.report_progress(percent, data_len_str, speed, eta)
 
             self._hook_progress({
+                'status': 'downloading',
                 'downloaded_bytes': byte_counter,
                 'total_bytes': data_len,
                 'tmpfilename': tmpfilename,
                 'filename': filename,
-                'status': 'downloading',
                 'eta': eta,
                 'speed': speed,
+                'elapsed': now - start,
             })
 
             if is_test and byte_counter == data_len:
@@ -221,7 +217,13 @@ class HttpFD(FileDownloader):
             return False
         if tmpfilename != '-':
             stream.close()
-        self.report_finish(data_len_str, (time.time() - start))
+
+        self._hook_progress({
+            'downloaded_bytes': byte_counter,
+            'total_bytes': data_len,
+            'tmpfilename': tmpfilename,
+            'status': 'error',
+        })
         if data_len is not None and byte_counter != data_len:
             raise ContentTooShortError(byte_counter, int(data_len))
         self.try_rename(tmpfilename, filename)
@@ -235,6 +237,7 @@ class HttpFD(FileDownloader):
             'total_bytes': byte_counter,
             'filename': filename,
             'status': 'finished',
+            'elapsed': time.time() - start,
         })
 
         return True
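
The eta value fed into the hook comes from calc_eta(start, now, total, current); a self-contained sketch of that arithmetic (youtube-dl's actual helper carries extra guards):

    from __future__ import division

    def calc_eta(start, now, total, current):
        # Remaining bytes divided by the observed average rate.
        if total is None or current == 0 or now <= start:
            return None
        rate = current / (now - start)
        return int((total - current) / rate)

    # 25 KiB of 100 KiB in 10 s -> 2.5 KiB/s -> ~30 s remaining.
    print(calc_eta(0.0, 10.0, 100 * 1024, 25 * 1024))  # -> 30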

View File: youtube_dl/downloader/rtmp.py

@@ -11,7 +11,6 @@ from ..compat import compat_str
 from ..utils import (
     check_executable,
     encodeFilename,
-    format_bytes,
     get_exe_version,
 )
@@ -51,23 +50,23 @@ class RtmpFD(FileDownloader):
                     if not resume_percent:
                         resume_percent = percent
                         resume_downloaded_data_len = downloaded_data_len
-                    eta = self.calc_eta(start, time.time(), 100 - resume_percent, percent - resume_percent)
-                    speed = self.calc_speed(start, time.time(), downloaded_data_len - resume_downloaded_data_len)
+                    time_now = time.time()
+                    eta = self.calc_eta(start, time_now, 100 - resume_percent, percent - resume_percent)
+                    speed = self.calc_speed(start, time_now, downloaded_data_len - resume_downloaded_data_len)
                     data_len = None
                     if percent > 0:
                         data_len = int(downloaded_data_len * 100 / percent)
-                    data_len_str = '~' + format_bytes(data_len)
-                    self.report_progress(percent, data_len_str, speed, eta)
-                    cursor_in_new_line = False
                     self._hook_progress({
+                        'status': 'downloading',
                         'downloaded_bytes': downloaded_data_len,
-                        'total_bytes': data_len,
+                        'total_bytes_estimate': data_len,
                         'tmpfilename': tmpfilename,
                         'filename': filename,
-                        'status': 'downloading',
                         'eta': eta,
+                        'elapsed': time_now - start,
                         'speed': speed,
                     })
+                    cursor_in_new_line = False
                 else:
                     # no percent for live streams
                     mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec', line)
@@ -75,15 +74,15 @@ class RtmpFD(FileDownloader):
                         downloaded_data_len = int(float(mobj.group(1)) * 1024)
                         time_now = time.time()
                         speed = self.calc_speed(start, time_now, downloaded_data_len)
-                        self.report_progress_live_stream(downloaded_data_len, speed, time_now - start)
-                        cursor_in_new_line = False
                         self._hook_progress({
                             'downloaded_bytes': downloaded_data_len,
                             'tmpfilename': tmpfilename,
                             'filename': filename,
                             'status': 'downloading',
+                            'elapsed': time_now - start,
                             'speed': speed,
                         })
+                        cursor_in_new_line = False
                 elif self.params.get('verbose', False):
                     if not cursor_in_new_line:
                         self.to_screen('')
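
rtmpdump only reports a percentage, so the total size is back-calculated from it; that is why the hook now labels it total_bytes_estimate rather than total_bytes. Worked through:

    # 12 MiB on disk at 40% implies an estimated 30 MiB total.
    downloaded_data_len, percent = 12 * 1024 ** 2, 40.0
    data_len = int(downloaded_data_len * 100 / percent)
    print(data_len // 1024 ** 2)  # -> 30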

View File: youtube_dl/extractor/__init__.py

@@ -58,6 +58,7 @@ from .canalplus import CanalplusIE
 from .canalc2 import Canalc2IE
 from .cbs import CBSIE
 from .cbsnews import CBSNewsIE
+from .cbssports import CBSSportsIE
 from .ccc import CCCIE
 from .ceskatelevize import CeskaTelevizeIE
 from .channel9 import Channel9IE
@@ -121,6 +122,7 @@ from .ellentv import (
     EllenTVClipsIE,
 )
 from .elpais import ElPaisIE
+from .embedly import EmbedlyIE
 from .empflix import EMPFlixIE
 from .engadget import EngadgetIE
 from .eporner import EpornerIE
@@ -204,6 +206,7 @@ from .imdb import (
     ImdbIE,
     ImdbListIE
 )
+from .imgur import ImgurIE
 from .ina import InaIE
 from .infoq import InfoQIE
 from .instagram import InstagramIE, InstagramUserIE
@@ -282,6 +285,7 @@ from .myspace import MySpaceIE, MySpaceAlbumIE
 from .myspass import MySpassIE
 from .myvideo import MyVideoIE
 from .myvidster import MyVidsterIE
+from .nationalgeographic import NationalGeographicIE
 from .naver import NaverIE
 from .nba import NBAIE
 from .nbc import (
@@ -350,7 +354,10 @@ from .playfm import PlayFMIE
 from .playvid import PlayvidIE
 from .podomatic import PodomaticIE
 from .pornhd import PornHdIE
-from .pornhub import PornHubIE
+from .pornhub import (
+    PornHubIE,
+    PornHubPlaylistIE,
+)
 from .pornotube import PornotubeIE
 from .pornoxo import PornoXOIE
 from .promptfile import PromptFileIE
@@ -386,6 +393,7 @@ from .rutube import (
     RutubePersonIE,
 )
 from .rutv import RUTVIE
+from .sandia import SandiaIE
 from .sapo import SapoIE
 from .savefrom import SaveFromIE
 from .sbs import SBSIE

View File: youtube_dl/extractor/adultswim.py

@@ -38,6 +38,7 @@ class AdultSwimIE(InfoExtractor):
             },
         ],
         'info_dict': {
+            'id': 'rQxZvXQ4ROaSOqq-or2Mow',
             'title': 'Rick and Morty - Pilot',
             'description': "Rick moves in with his daughter's family and establishes himself as a bad influence on his grandson, Morty. "
         }
@@ -55,6 +56,7 @@ class AdultSwimIE(InfoExtractor):
             }
         ],
         'info_dict': {
+            'id': '-t8CamQlQ2aYZ49ItZCFog',
            'title': 'American Dad - Putting Francine Out of Business',
            'description': 'Stan hatches a plan to get Francine out of the real estate business.Watch more American Dad on [adult swim].'
         },

View File: youtube_dl/extractor/appletrailers.py

@@ -14,6 +14,9 @@ class AppleTrailersIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?trailers\.apple\.com/trailers/(?P<company>[^/]+)/(?P<movie>[^/]+)'
     _TEST = {
         "url": "http://trailers.apple.com/trailers/wb/manofsteel/",
+        'info_dict': {
+            'id': 'manofsteel',
+        },
         "playlist": [
             {
                 "md5": "d97a8e575432dbcb81b7c3acb741f8a8",

View File: youtube_dl/extractor/bandcamp.py

@@ -109,7 +109,7 @@ class BandcampIE(InfoExtractor):
 class BandcampAlbumIE(InfoExtractor):
     IE_NAME = 'Bandcamp:album'
-    _VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<title>[^?#]+)|/?(?:$|[?#]))'
+    _VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<album_id>[^?#]+)|/?(?:$|[?#]))'
 
     _TESTS = [{
         'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
@@ -133,31 +133,37 @@ class BandcampAlbumIE(InfoExtractor):
         ],
         'info_dict': {
             'title': 'Jazz Format Mixtape vol.1',
+            'id': 'jazz-format-mixtape-vol-1',
+            'uploader_id': 'blazo',
         },
         'params': {
             'playlistend': 2
         },
-        'skip': 'Bandcamp imposes download limits. See test_playlists:test_bandcamp_album for the playlist test'
+        'skip': 'Bandcamp imposes download limits.'
     }, {
         'url': 'http://nightbringer.bandcamp.com/album/hierophany-of-the-open-grave',
         'info_dict': {
             'title': 'Hierophany of the Open Grave',
+            'uploader_id': 'nightbringer',
+            'id': 'hierophany-of-the-open-grave',
         },
         'playlist_mincount': 9,
     }, {
         'url': 'http://dotscale.bandcamp.com',
         'info_dict': {
             'title': 'Loom',
+            'id': 'dotscale',
+            'uploader_id': 'dotscale',
         },
         'playlist_mincount': 7,
     }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        playlist_id = mobj.group('subdomain')
-        title = mobj.group('title')
-        display_id = title or playlist_id
-        webpage = self._download_webpage(url, display_id)
+        uploader_id = mobj.group('subdomain')
+        album_id = mobj.group('album_id')
+        playlist_id = album_id or uploader_id
+        webpage = self._download_webpage(url, playlist_id)
         tracks_paths = re.findall(r'<a href="(.*?)" itemprop="url">', webpage)
         if not tracks_paths:
             raise ExtractorError('The page doesn\'t contain any tracks')
@@ -168,8 +174,8 @@ class BandcampAlbumIE(InfoExtractor):
             r'album_title\s*:\s*"(.*?)"', webpage, 'title', fatal=False)
         return {
             '_type': 'playlist',
+            'uploader_id': uploader_id,
             'id': playlist_id,
-            'display_id': display_id,
             'title': title,
             'entries': entries,
         }

View File: youtube_dl/extractor/brightcove.py

@@ -95,6 +95,7 @@ class BrightcoveIE(InfoExtractor):
         'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=3550052898001&playerKey=AQ%7E%7E%2CAAABmA9XpXk%7E%2C-Kp7jNgisre1fG5OdqpAFUTcs0lP_ZoL',
         'info_dict': {
             'title': 'Sealife',
+            'id': '3550319591001',
         },
         'playlist_mincount': 7,
     },
@@ -247,7 +248,7 @@ class BrightcoveIE(InfoExtractor):
         playlist_info = json_data['videoList']
         videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']]
 
-        return self.playlist_result(videos, playlist_id=playlist_info['id'],
+        return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
                                     playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
 
     def _extract_video_info(self, video_info):

View File: youtube_dl/extractor/buzzfeed.py

@@ -33,6 +33,7 @@ class BuzzFeedIE(InfoExtractor):
             'skip_download': True,  # Got enough YouTube download tests
         },
         'info_dict': {
+            'id': 'look-at-this-cute-dog-omg',
             'description': 're:Munchkin the Teddy Bear is back ?!',
             'title': 'You Need To Stop What You\'re Doing And Watching This Dog Walk On A Treadmill',
         },
@@ -42,8 +43,8 @@ class BuzzFeedIE(InfoExtractor):
                 'ext': 'mp4',
                 'upload_date': '20141124',
                 'uploader_id': 'CindysMunchkin',
-                'description': 're:© 2014 Munchkin the Shih Tzu',
-                'uploader': 'Munchkin the Shih Tzu',
+                'description': 're:© 2014 Munchkin the',
+                'uploader': 're:^Munchkin the',
                 'title': 're:Munchkin the Teddy Bear gets her exercise',
             },
         }]

View File: youtube_dl/extractor/cbs.py

@@ -1,7 +1,5 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 
@@ -39,8 +37,7 @@ class CBSIE(InfoExtractor):
     }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
         real_id = self._search_regex(
             r"video\.settings\.pid\s*=\s*'([^']+)';",

View File: youtube_dl/extractor/cbssports.py (new file)

@@ -0,0 +1,30 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class CBSSportsIE(InfoExtractor):
+    _VALID_URL = r'http://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
+
+    _TEST = {
+        'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',
+        'info_dict': {
+            'id': '_d5_GbO8p1sT',
+            'ext': 'flv',
+            'title': 'US Open flashbacks: 1990s',
+            'description': 'Bill Macatee relives the best moments in US Open history from the 1990s.',
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        section = mobj.group('section')
+        video_id = mobj.group('id')
+        all_videos = self._download_json(
+            'http://www.cbssports.com/data/video/player/getVideos/%s?as=json' % section,
+            video_id)
+        # The json file contains the info of all the videos in the section
+        video_info = next(v for v in all_videos if v['pcid'] == video_id)
+        return self.url_result('theplatform:%s' % video_info['pid'], 'ThePlatform')

View File: youtube_dl/extractor/common.py

@@ -27,7 +27,6 @@ from ..utils import (
     compiled_regex_type,
     ExtractorError,
     float_or_none,
-    HEADRequest,
     int_or_none,
     RegexNotFoundError,
     sanitize_filename,
@@ -753,9 +752,7 @@ class InfoExtractor(object):
     def _is_valid_url(self, url, video_id, item='video'):
         try:
-            self._request_webpage(
-                HEADRequest(url), video_id,
-                'Checking %s URL' % item)
+            self._request_webpage(url, video_id, 'Checking %s URL' % item)
             return True
         except ExtractorError as e:
             if isinstance(e.cause, compat_HTTPError):
@@ -841,6 +838,7 @@ class InfoExtractor(object):
             note='Downloading m3u8 information',
             errnote='Failed to download m3u8 information')
         last_info = None
+        last_media = None
         kv_rex = re.compile(
             r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')
         for line in m3u8_doc.splitlines():
@@ -851,6 +849,13 @@ class InfoExtractor(object):
                 if v.startswith('"'):
                     v = v[1:-1]
                 last_info[m.group('key')] = v
+            elif line.startswith('#EXT-X-MEDIA:'):
+                last_media = {}
+                for m in kv_rex.finditer(line):
+                    v = m.group('val')
+                    if v.startswith('"'):
+                        v = v[1:-1]
+                    last_media[m.group('key')] = v
             elif line.startswith('#') or not line.strip():
                 continue
             else:
@@ -879,6 +884,9 @@ class InfoExtractor(object):
                     width_str, height_str = resolution.split('x')
                     f['width'] = int(width_str)
                     f['height'] = int(height_str)
+                if last_media is not None:
+                    f['m3u8_media'] = last_media
+                    last_media = None
                 formats.append(f)
                 last_info = {}
         self._sort_formats(formats)
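
Applied to a typical alternative-rendition line, the attribute regex above yields a plain dict (the sample line is illustrative):

    import re

    kv_rex = re.compile(
        r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')

    line = '#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="English",URI="eng.m3u8"'
    last_media = {}
    for m in kv_rex.finditer(line):
        v = m.group('val')
        if v.startswith('"'):
            v = v[1:-1]  # strip the quotes, as in the hunk above
        last_media[m.group('key')] = v
    print('%s %s' % (last_media['NAME'], last_media['URI']))  # -> English eng.m3u8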

View File: youtube_dl/extractor/dailymotion.py

@@ -194,6 +194,7 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
         'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
         'info_dict': {
             'title': 'SPORT',
+            'id': 'xv4bw_nqtv_sport',
         },
         'playlist_mincount': 20,
     }]

View File: youtube_dl/extractor/embedly.py (new file)

@@ -0,0 +1,16 @@
+# encoding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_urllib_parse_unquote
+
+
+class EmbedlyIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www|cdn\.)?embedly\.com/widgets/media\.html\?(?:[^#]*?&)?url=(?P<id>[^#&]+)'
+    _TESTS = [{
+        'url': 'https://cdn.embedly.com/widgets/media.html?src=http%3A%2F%2Fwww.youtube.com%2Fembed%2Fvideoseries%3Flist%3DUUGLim4T2loE5rwCMdpCIPVg&url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DSU4fj_aEMVw%26list%3DUUGLim4T2loE5rwCMdpCIPVg&image=http%3A%2F%2Fi.ytimg.com%2Fvi%2FSU4fj_aEMVw%2Fhqdefault.jpg&key=8ee8a2e6a8cc47aab1a5ee67f9a178e0&type=text%2Fhtml&schema=youtube&autoplay=1',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        return self.url_result(compat_urllib_parse_unquote(self._match_id(url)))
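
The whole extractor is URL unwrapping: the target arrives percent-encoded in the url query parameter and is handed back to the generic resolver. The decoding step, shown with the standard library directly:

    try:
        from urllib.parse import unquote  # Python 3
    except ImportError:
        from urllib import unquote  # Python 2

    print(unquote('https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DSU4fj_aEMVw'))
    # -> https://www.youtube.com/watch?v=SU4fj_aEMVw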

View File: youtube_dl/extractor/fivemin.py

@@ -14,6 +14,7 @@ class FiveMinIE(InfoExtractor):
     IE_NAME = '5min'
     _VALID_URL = r'''(?x)
         (?:https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(?:.*?&)?playList=|
+            https?://(?:(?:massively|www)\.)?joystiq\.com/video/|
             5min:)
         (?P<id>\d+)
         '''

View File: youtube_dl/extractor/generic.py

@@ -473,6 +473,7 @@ class GenericIE(InfoExtractor):
         {
             'url': 'http://discourse.ubuntu.com/t/unity-8-desktop-mode-windows-on-mir/1986',
             'info_dict': {
+                'id': '1986',
                 'title': 'Unity 8 desktop-mode windows on Mir! - Ubuntu Discourse',
             },
             'playlist_mincount': 2,
@@ -531,7 +532,7 @@ class GenericIE(InfoExtractor):
             'info_dict': {
                 'id': 'Mrj4DVp2zeA',
                 'ext': 'mp4',
-                'upload_date': '20150204',
+                'upload_date': '20150212',
                 'uploader': 'The National Archives UK',
                 'description': 'md5:a236581cd2449dd2df4f93412f3f01c6',
                 'uploader_id': 'NationalArchives08',

View File: youtube_dl/extractor/ign.py

@@ -34,6 +34,9 @@ class IGNIE(InfoExtractor):
         },
         {
             'url': 'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
+            'info_dict': {
+                'id': '100-little-things-in-gta-5-that-will-blow-your-mind',
+            },
             'playlist': [
                 {
                     'info_dict': {

View File: youtube_dl/extractor/imgur.py (new file)

@@ -0,0 +1,97 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    js_to_json,
+    mimetype2ext,
+    ExtractorError,
+)
+
+
+class ImgurIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?P<id>[a-zA-Z0-9]+)(?:\.mp4|\.gifv)?'
+
+    _TESTS = [{
+        'url': 'https://i.imgur.com/A61SaA1.gifv',
+        'info_dict': {
+            'id': 'A61SaA1',
+            'ext': 'mp4',
+            'title': 'MRW gifv is up and running without any bugs',
+            'description': 'The Internet\'s visual storytelling community. Explore, share, and discuss the best visual stories the Internet has to offer.',
+        },
+    }, {
+        'url': 'https://imgur.com/A61SaA1',
+        'info_dict': {
+            'id': 'A61SaA1',
+            'ext': 'mp4',
+            'title': 'MRW gifv is up and running without any bugs',
+            'description': 'The Internet\'s visual storytelling community. Explore, share, and discuss the best visual stories the Internet has to offer.',
+        },
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        width = int_or_none(self._search_regex(
+            r'<param name="width" value="([0-9]+)"',
+            webpage, 'width', fatal=False))
+        height = int_or_none(self._search_regex(
+            r'<param name="height" value="([0-9]+)"',
+            webpage, 'height', fatal=False))
+
+        video_elements = self._search_regex(
+            r'(?s)<div class="video-elements">(.*?)</div>',
+            webpage, 'video elements', default=None)
+        if not video_elements:
+            raise ExtractorError(
+                'No sources found for video %s. Maybe an image?' % video_id,
+                expected=True)
+
+        formats = []
+        for m in re.finditer(r'<source\s+src="(?P<src>[^"]+)"\s+type="(?P<type>[^"]+)"', video_elements):
+            formats.append({
+                'format_id': m.group('type').partition('/')[2],
+                'url': self._proto_relative_url(m.group('src')),
+                'ext': mimetype2ext(m.group('type')),
+                'acodec': 'none',
+                'width': width,
+                'height': height,
+                'http_headers': {
+                    'User-Agent': 'youtube-dl (like wget)',
+                },
+            })
+
+        gif_json = self._search_regex(
+            r'(?s)var\s+videoItem\s*=\s*(\{.*?\})',
+            webpage, 'GIF code', fatal=False)
+        if gif_json:
+            gifd = self._parse_json(
+                gif_json, video_id, transform_source=js_to_json)
+            formats.append({
+                'format_id': 'gif',
+                'preference': -10,
+                'width': width,
+                'height': height,
+                'ext': 'gif',
+                'acodec': 'none',
+                'vcodec': 'gif',
+                'container': 'gif',
+                'url': self._proto_relative_url(gifd['gifUrl']),
+                'filesize': gifd.get('size'),
+                'http_headers': {
+                    'User-Agent': 'youtube-dl (like wget)',
+                },
+            })
+
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'formats': formats,
+            'description': self._og_search_description(webpage),
+            'title': self._og_search_title(webpage),
+        }
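
Imgur serves its sources scheme-relatively, hence the _proto_relative_url calls; a simplified sketch of what that helper does (the real one derives the scheme from the download options):

    def proto_relative_url(url, scheme='http:'):
        # '//i.imgur.com/A61SaA1.mp4' -> 'http://i.imgur.com/A61SaA1.mp4'
        if url and url.startswith('//'):
            return scheme + url
        return url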

View File: youtube_dl/extractor/livestream.py

@@ -37,6 +37,7 @@ class LivestreamIE(InfoExtractor):
         'url': 'http://new.livestream.com/tedx/cityenglish',
         'info_dict': {
             'title': 'TEDCity2.0 (English)',
+            'id': '2245590',
         },
         'playlist_mincount': 4,
     }, {
@@ -148,7 +149,8 @@ class LivestreamIE(InfoExtractor):
             if is_relevant(video_data, video_id)]
         if video_id is None:
             # This is an event page:
-            return self.playlist_result(videos, info['id'], info['full_name'])
+            return self.playlist_result(
+                videos, '%s' % info['id'], info['full_name'])
         else:
             if not videos:
                 raise ExtractorError('Cannot find video %s' % video_id)

View File: youtube_dl/extractor/nationalgeographic.py (new file)

@@ -0,0 +1,38 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    smuggle_url,
+    url_basename,
+)
+
+
+class NationalGeographicIE(InfoExtractor):
+    _VALID_URL = r'http://video\.nationalgeographic\.com/video/.*?'
+
+    _TEST = {
+        'url': 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo',
+        'info_dict': {
+            'id': '4DmDACA6Qtk_',
+            'ext': 'flv',
+            'title': 'Mating Crabs Busted by Sharks',
+            'description': 'md5:16f25aeffdeba55aaa8ec37e093ad8b3',
+        },
+        'add_ie': ['ThePlatform'],
+    }
+
+    def _real_extract(self, url):
+        name = url_basename(url)
+        webpage = self._download_webpage(url, name)
+
+        feed_url = self._search_regex(r'data-feed-url="([^"]+)"', webpage, 'feed url')
+        guid = self._search_regex(r'data-video-guid="([^"]+)"', webpage, 'guid')
+
+        feed = self._download_xml('%s?byGuid=%s' % (feed_url, guid), name)
+        content = feed.find('.//{http://search.yahoo.com/mrss/}content')
+        theplatform_id = url_basename(content.attrib.get('url'))
+
+        return self.url_result(smuggle_url(
+            'http://link.theplatform.com/s/ngs/%s?format=SMIL&formats=MPEG4&manifest=f4m' % theplatform_id,
+            # For some reason, the normal links don't work and we must force the use of f4m
+            {'force_smil_url': True}))

View File: youtube_dl/extractor/nbc.py

@@ -18,13 +18,13 @@ class NBCIE(InfoExtractor):
     _TESTS = [
         {
-            'url': 'http://www.nbc.com/chicago-fire/video/i-am-a-firefighter/2734188',
+            'url': 'http://www.nbc.com/the-tonight-show/segments/112966',
             # md5 checksum is not stable
             'info_dict': {
-                'id': 'bTmnLCvIbaaH',
+                'id': 'c9xnCo0YPOPH',
                 'ext': 'flv',
-                'title': 'I Am a Firefighter',
-                'description': 'An emergency puts Dawson\'sf irefighter skills to the ultimate test in this four-part digital series.',
+                'title': 'Jimmy Fallon Surprises Fans at Ben & Jerry\'s',
+                'description': 'Jimmy gives out free scoops of his new "Tonight Dough" ice cream flavor by surprising customers at the Ben & Jerry\'s scoop shop.',
             },
         },
         {

View File: youtube_dl/extractor/netzkino.py

@@ -29,6 +29,9 @@ class NetzkinoIE(InfoExtractor):
             'timestamp': 1344858571,
             'age_limit': 12,
         },
+        'params': {
+            'skip_download': 'Download only works from Germany',
+        }
     }
 
     def _real_extract(self, url):

View File: youtube_dl/extractor/patreon.py

@@ -1,9 +1,6 @@
 # encoding: utf-8
 from __future__ import unicode_literals
 
-import json
-import re
-
 from .common import InfoExtractor
 from ..utils import (
     js_to_json,
@@ -11,7 +8,7 @@ from ..utils import (
 
 class PatreonIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?patreon\.com/creation\?hid=(.+)'
+    _VALID_URL = r'https?://(?:www\.)?patreon\.com/creation\?hid=(?P<id>[^&#]+)'
     _TESTS = [
         {
             'url': 'http://www.patreon.com/creation?hid=743933',
@@ -35,6 +32,23 @@ class PatreonIE(InfoExtractor):
                 'thumbnail': 're:^https?://.*$',
             },
         },
+        {
+            'url': 'https://www.patreon.com/creation?hid=1682498',
+            'info_dict': {
+                'id': 'SU4fj_aEMVw',
+                'ext': 'mp4',
+                'title': 'I\'m on Patreon!',
+                'uploader': 'TraciJHines',
+                'thumbnail': 're:^https?://.*$',
+                'upload_date': '20150211',
+                'description': 'md5:c5a706b1f687817a3de09db1eb93acd4',
+                'uploader_id': 'TraciJHines',
+            },
+            'params': {
+                'noplaylist': True,
+                'skip_download': True,
+            }
+        }
     ]
 
     # Currently Patreon exposes download URL via hidden CSS, so login is not
@@ -65,26 +79,29 @@ class PatreonIE(InfoExtractor):
     '''
 
    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group(1)
+        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        title = self._og_search_title(webpage).strip()
 
        attach_fn = self._html_search_regex(
            r'<div class="attach"><a target="_blank" href="([^"]+)">',
            webpage, 'attachment URL', default=None)
+        embed = self._html_search_regex(
+            r'<div id="watchCreation">\s*<iframe class="embedly-embed" src="([^"]+)"',
+            webpage, 'embedded URL', default=None)
+
        if attach_fn is not None:
            video_url = 'http://www.patreon.com' + attach_fn
            thumbnail = self._og_search_thumbnail(webpage)
            uploader = self._html_search_regex(
                r'<strong>(.*?)</strong> is creating', webpage, 'uploader')
+        elif embed is not None:
+            return self.url_result(embed)
        else:
-            playlist_js = self._search_regex(
+            playlist = self._parse_json(self._search_regex(
                r'(?s)new\s+jPlayerPlaylist\(\s*\{\s*[^}]*},\s*(\[.*?,?\s*\])',
-                webpage, 'playlist JSON')
-            playlist_json = js_to_json(playlist_js)
-            playlist = json.loads(playlist_json)
+                webpage, 'playlist JSON'),
+                video_id, transform_source=js_to_json)
            data = playlist[0]
            video_url = self._proto_relative_url(data['mp3'])
            thumbnail = self._proto_relative_url(data.get('cover'))

youtube_dl/extractor/pornhub.py View File

@@ -56,7 +56,7 @@ class PornHubIE(InfoExtractor):
         video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
         video_uploader = self._html_search_regex(
-            r'(?s)From:&nbsp;.+?<(?:a href="/users/|a href="/channels/|<span class="username)[^>]+>(.+?)<',
+            r'(?s)From:&nbsp;.+?<(?:a href="/users/|a href="/channels/|span class="username)[^>]+>(.+?)<',
             webpage, 'uploader', fatal=False)
         thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, 'thumbnail', fatal=False)
         if thumbnail:
@@ -110,3 +110,33 @@ class PornHubIE(InfoExtractor):
             'formats': formats,
             'age_limit': 18,
         }
+
+
+class PornHubPlaylistIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?pornhub\.com/playlist/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.pornhub.com/playlist/6201671',
+        'info_dict': {
+            'id': '6201671',
+            'title': 'P0p4',
+        },
+        'playlist_mincount': 35,
+    }]
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        entries = [
+            self.url_result('http://www.pornhub.com/%s' % video_url, 'PornHub')
+            for video_url in set(re.findall('href="/?(view_video\.php\?viewkey=\d+[^"]*)"', webpage))
+        ]
+
+        playlist = self._parse_json(
+            self._search_regex(
+                r'playlistObject\s*=\s*({.+?});', webpage, 'playlist'),
+            playlist_id)
+
+        return self.playlist_result(
+            entries, playlist_id, playlist.get('title'), playlist.get('description'))
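
Editor's note: the set() around re.findall is what dedupes repeated links here; relative and leading-slash hrefs collapse to the same entry. A quick illustration with made-up markup:

import re

html = ('<a href="/view_video.php?viewkey=123">clip</a>'
        '<a href="view_video.php?viewkey=123">clip again</a>')
print(set(re.findall(r'href="/?(view_video\.php\?viewkey=\d+[^"]*)"', html)))
# {'view_video.php?viewkey=123'}  -- one entry despite two links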

youtube_dl/extractor/radiode.py View File

@@ -1,7 +1,5 @@
 from __future__ import unicode_literals

-import json
-
 from .common import InfoExtractor
@@ -10,13 +8,13 @@ class RadioDeIE(InfoExtractor):
     _VALID_URL = r'https?://(?P<id>.+?)\.(?:radio\.(?:de|at|fr|pt|es|pl|it)|rad\.io)'
     _TEST = {
         'url': 'http://ndr2.radio.de/',
-        'md5': '3b4cdd011bc59174596b6145cda474a4',
         'info_dict': {
             'id': 'ndr2',
             'ext': 'mp3',
             'title': 're:^NDR 2 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
             'description': 'md5:591c49c702db1a33751625ebfb67f273',
             'thumbnail': 're:^https?://.*\.png',
+            'is_live': True,
         },
         'params': {
             'skip_download': True,
@@ -25,16 +23,15 @@ class RadioDeIE(InfoExtractor):
     def _real_extract(self, url):
         radio_id = self._match_id(url)
         webpage = self._download_webpage(url, radio_id)
+        jscode = self._search_regex(
+            r"'components/station/stationService':\s*\{\s*'?station'?:\s*(\{.*?\s*\}),\n",
+            webpage, 'broadcast')

-        broadcast = json.loads(self._search_regex(
-            r'_getBroadcast\s*=\s*function\(\s*\)\s*{\s*return\s+({.+?})\s*;\s*}',
-            webpage, 'broadcast'))
+        broadcast = self._parse_json(jscode, radio_id)

         title = self._live_title(broadcast['name'])
         description = broadcast.get('description') or broadcast.get('shortDescription')
-        thumbnail = broadcast.get('picture4Url') or broadcast.get('picture4TransUrl')
+        thumbnail = broadcast.get('picture4Url') or broadcast.get('picture4TransUrl') or broadcast.get('logo100x100')

         formats = [{
             'url': stream['streamUrl'],

youtube_dl/extractor/sandia.py View File

@@ -0,0 +1,117 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import itertools
+import json
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_request,
+    compat_urlparse,
+)
+from ..utils import (
+    int_or_none,
+    js_to_json,
+    mimetype2ext,
+    unified_strdate,
+)
+
+
+class SandiaIE(InfoExtractor):
+    IE_DESC = 'Sandia National Laboratories'
+    _VALID_URL = r'https?://digitalops\.sandia\.gov/Mediasite/Play/(?P<id>[0-9a-f]+)'
+    _TEST = {
+        'url': 'http://digitalops.sandia.gov/Mediasite/Play/24aace4429fc450fb5b38cdbf424a66e1d',
+        'md5': '9422edc9b9a60151727e4b6d8bef393d',
+        'info_dict': {
+            'id': '24aace4429fc450fb5b38cdbf424a66e1d',
+            'ext': 'mp4',
+            'title': 'Xyce Software Training - Section 1',
+            'description': 're:(?s)SAND Number: SAND 2013-7800.{200,}',
+            'upload_date': '20120904',
+            'duration': 7794,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        req = compat_urllib_request.Request(url)
+        req.add_header('Cookie', 'MediasitePlayerCaps=ClientPlugins=4')
+        webpage = self._download_webpage(req, video_id)
+
+        js_path = self._search_regex(
+            r'<script type="text/javascript" src="(/Mediasite/FileServer/Presentation/[^"]+)"',
+            webpage, 'JS code URL')
+        js_url = compat_urlparse.urljoin(url, js_path)
+
+        js_code = self._download_webpage(
+            js_url, video_id, note='Downloading player')
+
+        def extract_str(key, **args):
+            return self._search_regex(
+                r'Mediasite\.PlaybackManifest\.%s\s*=\s*(.+);\s*?\n' % re.escape(key),
+                js_code, key, **args)
+
+        def extract_data(key, **args):
+            data_json = extract_str(key, **args)
+            if data_json is None:
+                return data_json
+            return self._parse_json(
+                data_json, video_id, transform_source=js_to_json)
+
+        formats = []
+        for i in itertools.count():
+            fd = extract_data('VideoUrls[%d]' % i, default=None)
+            if fd is None:
+                break
+            formats.append({
+                'format_id': '%s' % i,
+                'format_note': fd['MimeType'].partition('/')[2],
+                'ext': mimetype2ext(fd['MimeType']),
+                'url': fd['Location'],
+                'protocol': 'f4m' if fd['MimeType'] == 'video/x-mp4-fragmented' else None,
+            })
+        self._sort_formats(formats)
+
+        slide_baseurl = compat_urlparse.urljoin(
+            url, extract_data('SlideBaseUrl'))
+        slide_template = slide_baseurl + re.sub(
+            r'\{0:D?([0-9+])\}', r'%0\1d', extract_data('SlideImageFileNameTemplate'))
+        slides = []
+        last_slide_time = 0
+        for i in itertools.count(1):
+            sd = extract_str('Slides[%d]' % i, default=None)
+            if sd is None:
+                break
+            timestamp = int_or_none(self._search_regex(
+                r'^Mediasite\.PlaybackManifest\.CreateSlide\("[^"]*"\s*,\s*([0-9]+),',
+                sd, 'slide %s timestamp' % i, fatal=False))
+            slides.append({
+                'url': slide_template % i,
+                'duration': timestamp - last_slide_time,
+            })
+            last_slide_time = timestamp
+        formats.append({
+            'format_id': 'slides',
+            'protocol': 'slideshow',
+            'url': json.dumps(slides),
+            'preference': -10000,  # Downloader not yet written
+        })
+        self._sort_formats(formats)
+
+        title = extract_data('Title')
+        description = extract_data('Description', fatal=False)
+        duration = int_or_none(extract_data(
+            'Duration', fatal=False), scale=1000)
+        upload_date = unified_strdate(extract_data('AirDate', fatal=False))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'formats': formats,
+            'upload_date': upload_date,
+            'duration': duration,
+        }
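
Editor's note: SlideImageFileNameTemplate carries a .NET-style placeholder such as {0:D4}; the re.sub above rewrites it into a printf pattern so slide URLs can be built with %-formatting. A minimal sketch (the template value is made up):

import re

# '{0:D4}' -> '%04d', then ordinary %-formatting fills in the slide number
template = re.sub(r'\{0:D?([0-9+])\}', r'%0\1d', 'slide_{0:D4}_full.jpg')
print(template % 7)  # slide_0007_full.jpg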

youtube_dl/extractor/sockshare.py View File

@@ -25,7 +25,6 @@ class SockshareIE(InfoExtractor):
             'id': '437BE28B89D799D7',
             'title': 'big_buck_bunny_720p_surround.avi',
             'ext': 'avi',
-            'thumbnail': 're:^http://.*\.jpg$',
         }
     }
@@ -45,7 +44,7 @@ class SockshareIE(InfoExtractor):
             ''', webpage, 'hash')

         fields = {
-            "hash": confirm_hash,
+            "hash": confirm_hash.encode('utf-8'),
             "confirm": "Continue as Free User"
         }
@@ -68,7 +67,7 @@ class SockshareIE(InfoExtractor):
             webpage, 'title', default=None)
         thumbnail = self._html_search_regex(
             r'<img\s+src="([^"]*)".+?name="bg"',
-            webpage, 'thumbnail')
+            webpage, 'thumbnail', default=None)

         formats = [{
             'format_id': 'sd',
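
Editor's note: one plausible motivation for the .encode('utf-8') (the commit itself does not say): under Python 2, urllib's urlencode coerces values with str(), which raises UnicodeEncodeError for non-ASCII unicode. A Python 2 sketch with a hypothetical value:

import urllib  # Python 2

fields = {'hash': u'caf\xe9'.encode('utf-8')}  # bytes encode cleanly
print(urllib.urlencode(fields))  # hash=caf%C3%A9
# with the raw unicode value, str() coercion would raise UnicodeEncodeError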

youtube_dl/extractor/theonion.py View File

@@ -4,11 +4,10 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from ..utils import ExtractorError


 class TheOnionIE(InfoExtractor):
-    _VALID_URL = r'(?x)https?://(?:www\.)?theonion\.com/video/[^,]+,(?P<article_id>[0-9]+)/?'
+    _VALID_URL = r'https?://(?:www\.)?theonion\.com/video/[^,]+,(?P<id>[0-9]+)/?'
     _TEST = {
         'url': 'http://www.theonion.com/video/man-wearing-mm-jacket-gods-image,36918/',
         'md5': '19eaa9a39cf9b9804d982e654dc791ee',
@@ -22,10 +21,8 @@ class TheOnionIE(InfoExtractor):
     }

     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        article_id = mobj.group('article_id')
-
-        webpage = self._download_webpage(url, article_id)
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)

         video_id = self._search_regex(
             r'"videoId":\s(\d+),', webpage, 'video ID')
@@ -34,10 +31,6 @@ class TheOnionIE(InfoExtractor):
         thumbnail = self._og_search_thumbnail(webpage)
         sources = re.findall(r'<source src="([^"]+)" type="([^"]+)"', webpage)
-        if not sources:
-            raise ExtractorError(
-                'No sources found for video %s' % video_id, expected=True)

         formats = []
         for src, type_ in sources:
             if type_ == 'video/mp4':
@@ -54,15 +47,15 @@ class TheOnionIE(InfoExtractor):
             })
         elif type_ == 'application/x-mpegURL':
             formats.extend(
-                self._extract_m3u8_formats(src, video_id, preference=-1))
+                self._extract_m3u8_formats(src, display_id, preference=-1))
         else:
             self.report_warning(
                 'Encountered unexpected format: %s' % type_)
         self._sort_formats(formats)

         return {
             'id': video_id,
+            'display_id': display_id,
             'title': title,
             'formats': formats,
             'thumbnail': thumbnail,

youtube_dl/extractor/theplatform.py View File

@@ -71,7 +71,9 @@ class ThePlatformIE(SubtitlesInfoExtractor):
         if not provider_id:
             provider_id = 'dJ5BDC'

-        if mobj.group('config'):
+        if smuggled_data.get('force_smil_url', False):
+            smil_url = url
+        elif mobj.group('config'):
             config_url = url + '&form=json'
             config_url = config_url.replace('swf/', 'config/')
             config_url = config_url.replace('onsite/', 'onsite/config/')
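
Editor's note: force_smil_url travels between extractors via the URL fragment — smuggle_url serializes the dict onto the URL (as done in the NationalGeographic extractor above) and unsmuggle_url recovers it here. A round-trip sketch (the ngs ID 'XYZ' is a placeholder):

from youtube_dl.utils import smuggle_url, unsmuggle_url

smuggled = smuggle_url(
    'http://link.theplatform.com/s/ngs/XYZ?format=SMIL&manifest=f4m',
    {'force_smil_url': True})
url, data = unsmuggle_url(smuggled, {})
assert data['force_smil_url'] is True  # and url is the original link again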

youtube_dl/extractor/twitch.py View File

@@ -349,6 +349,13 @@ class TwitchStreamIE(TwitchBaseIE):
             % (self._USHER_BASE, channel_id, compat_urllib_parse.urlencode(query).encode('utf-8')),
             channel_id, 'mp4')

+        # prefer the 'source' stream, the others are limited to 30 fps
+        def _sort_source(f):
+            if f.get('m3u8_media') is not None and f['m3u8_media'].get('NAME') == 'Source':
+                return 1
+            return 0
+        formats = sorted(formats, key=_sort_source)
+
         view_count = stream.get('viewers')
         timestamp = parse_iso8601(stream.get('created_at'))
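
Editor's note: Python's sorted() is stable, so a 0/1 key simply moves the 'Source' rendition to the end (the position youtube-dl prefers by default) without reshuffling the rest. A toy run:

formats = [
    {'format_id': 'high', 'm3u8_media': {'NAME': 'High'}},
    {'format_id': 'source', 'm3u8_media': {'NAME': 'Source'}},
    {'format_id': 'medium', 'm3u8_media': {'NAME': 'Medium'}},
]

def _sort_source(f):
    if f.get('m3u8_media') is not None and f['m3u8_media'].get('NAME') == 'Source':
        return 1
    return 0

print([f['format_id'] for f in sorted(formats, key=_sort_source)])
# ['high', 'medium', 'source']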

youtube_dl/extractor/videolectures.py View File

@@ -49,15 +49,31 @@ class VideoLecturesNetIE(InfoExtractor):
         thumbnail = (
             None if thumbnail_el is None else thumbnail_el.attrib.get('src'))

-        formats = [{
-            'url': v.attrib['src'],
-            'width': int_or_none(v.attrib.get('width')),
-            'height': int_or_none(v.attrib.get('height')),
-            'filesize': int_or_none(v.attrib.get('size')),
-            'tbr': int_or_none(v.attrib.get('systemBitrate')) / 1000.0,
-            'ext': v.attrib.get('ext'),
-        } for v in switch.findall('./video')
-            if v.attrib.get('proto') == 'http']
+        formats = []
+        for v in switch.findall('./video'):
+            proto = v.attrib.get('proto')
+            if proto not in ['http', 'rtmp']:
+                continue
+            f = {
+                'width': int_or_none(v.attrib.get('width')),
+                'height': int_or_none(v.attrib.get('height')),
+                'filesize': int_or_none(v.attrib.get('size')),
+                'tbr': int_or_none(v.attrib.get('systemBitrate')) / 1000.0,
+                'ext': v.attrib.get('ext'),
+            }
+            src = v.attrib['src']
+            if proto == 'http':
+                if self._is_valid_url(src, video_id):
+                    f['url'] = src
+                    formats.append(f)
+            elif proto == 'rtmp':
+                f.update({
+                    'url': v.attrib['streamer'],
+                    'play_path': src,
+                    'rtmp_real_time': True,
+                })
+                formats.append(f)
+        self._sort_formats(formats)

         return {
             'id': video_id,

youtube_dl/extractor/vimeo.py View File

@@ -18,6 +18,7 @@ from ..utils import (
     InAdvancePagedList,
     int_or_none,
     RegexNotFoundError,
+    smuggle_url,
     std_headers,
     unsmuggle_url,
     urlencode_postdata,
@@ -174,7 +175,7 @@ class VimeoIE(VimeoBaseInfoExtractor, SubtitlesInfoExtractor):
     def _verify_video_password(self, url, video_id, webpage):
         password = self._downloader.params.get('videopassword', None)
         if password is None:
-            raise ExtractorError('This video is protected by a password, use the --video-password option')
+            raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
         token = self._search_regex(r'xsrft: \'(.*?)\'', webpage, 'login token')
         data = compat_urllib_parse.urlencode({
             'password': password,
@@ -267,8 +268,11 @@ class VimeoIE(VimeoBaseInfoExtractor, SubtitlesInfoExtractor):
                 raise ExtractorError('The author has restricted the access to this video, try with the "--referer" option')

             if re.search(r'<form[^>]+?id="pw_form"', webpage) is not None:
+                if data and '_video_password_verified' in data:
+                    raise ExtractorError('video password verification failed!')
                 self._verify_video_password(url, video_id, webpage)
-                return self._real_extract(url)
+                return self._real_extract(
+                    smuggle_url(url, {'_video_password_verified': 'verified'}))
             else:
                 raise ExtractorError('Unable to extract info section',
                                      cause=e)
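
Editor's note: the smuggled marker caps the retry at one pass — the first attempt verifies the password and recurses with the flag set; if the form is still present on the second pass, extraction fails instead of recursing forever. A toy version of that guard (names are illustrative, not the Vimeo extractor's API):

def fetch(url, smuggled=None):
    password_form_present = True  # pretend verification never sticks
    if password_form_present:
        if smuggled and '_video_password_verified' in smuggled:
            raise RuntimeError('video password verification failed!')
        # _verify_video_password(...) would run here
        return fetch(url, {'_video_password_verified': 'verified'})

try:
    fetch('http://vimeo.com/56015672')
except RuntimeError as e:
    print(e)  # raised on the second pass, not an infinite recursion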
@@ -401,6 +405,7 @@ class VimeoChannelIE(InfoExtractor):
     _TESTS = [{
         'url': 'http://vimeo.com/channels/tributes',
         'info_dict': {
+            'id': 'tributes',
             'title': 'Vimeo Tributes',
         },
         'playlist_mincount': 25,
@@ -479,6 +484,7 @@ class VimeoUserIE(VimeoChannelIE):
         'url': 'http://vimeo.com/nkistudio/videos',
         'info_dict': {
             'title': 'Nki',
+            'id': 'nkistudio',
         },
         'playlist_mincount': 66,
     }]
@@ -496,6 +502,7 @@ class VimeoAlbumIE(VimeoChannelIE):
     _TESTS = [{
         'url': 'http://vimeo.com/album/2632481',
         'info_dict': {
+            'id': '2632481',
             'title': 'Staff Favorites: November 2013',
         },
         'playlist_mincount': 13,
@@ -526,6 +533,7 @@ class VimeoGroupsIE(VimeoAlbumIE):
     _TESTS = [{
         'url': 'http://vimeo.com/groups/rolexawards',
         'info_dict': {
+            'id': 'rolexawards',
             'title': 'Rolex Awards for Enterprise',
         },
         'playlist_mincount': 73,
@@ -608,6 +616,7 @@ class VimeoLikesIE(InfoExtractor):
         'url': 'https://vimeo.com/user755559/likes/',
         'playlist_mincount': 293,
         "info_dict": {
+            'id': 'user755559_likes',
             "description": "See all the videos urza likes",
             "title": 'Videos urza likes',
         },

youtube_dl/extractor/vk.py View File

@@ -217,6 +217,9 @@ class VKUserVideosIE(InfoExtractor):
     _TEMPLATE_URL = 'https://vk.com/videos'
     _TEST = {
         'url': 'http://vk.com/videos205387401',
+        'info_dict': {
+            'id': '205387401',
+        },
         'playlist_mincount': 4,
     }

youtube_dl/extractor/webofstories.py View File

@@ -45,19 +45,17 @@ class WebOfStoriesIE(InfoExtractor):
         description = self._html_search_meta('description', webpage)
         thumbnail = self._og_search_thumbnail(webpage)

-        story_filename = self._search_regex(
-            r'\.storyFileName\("([^"]+)"\)', webpage, 'story filename')
-        speaker_id = self._search_regex(
-            r'\.speakerId\("([^"]+)"\)', webpage, 'speaker ID')
-        story_id = self._search_regex(
-            r'\.storyId\((\d+)\)', webpage, 'story ID')
-        speaker_type = self._search_regex(
-            r'\.speakerType\("([^"]+)"\)', webpage, 'speaker type')
-        great_life = self._search_regex(
-            r'isGreatLifeStory\s*=\s*(true|false)', webpage, 'great life story')
+        embed_params = [s.strip(" \r\n\t'") for s in self._search_regex(
+            r'(?s)\$\("#embedCode"\).html\(getEmbedCode\((.*?)\)',
+            webpage, 'embed params').split(',')]
+
+        (
+            _, speaker_id, story_id, story_duration,
+            speaker_type, great_life, _thumbnail, _has_subtitles,
+            story_filename, _story_order) = embed_params
+
         is_great_life_series = great_life == 'true'
-        duration = int_or_none(self._search_regex(
-            r'\.duration\((\d+)\)', webpage, 'duration', fatal=False))
+        duration = int_or_none(story_duration)

         # URL building, see: http://www.webofstories.com/scripts/player.js
         ms_prefix = ''
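
Editor's note: the new code reads every player parameter from the single getEmbedCode(...) call instead of several separate regexes, splitting the argument list on commas and stripping quotes and whitespace. Roughly, with invented values:

raw = "'', '1241', '53853', '643', 'None', 'true', 't.jpg', 'false', 'story.file', '1'"
embed_params = [s.strip(" \r\n\t'") for s in raw.split(',')]
(_, speaker_id, story_id, story_duration,
 speaker_type, great_life, _thumbnail, _has_subtitles,
 story_filename, _story_order) = embed_params
print(speaker_id, story_id, int(story_duration))  # 1241 53853 643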

youtube_dl/extractor/wsj.py View File

@@ -18,8 +18,8 @@ class WSJIE(InfoExtractor):
             'id': '1BD01A4C-BFE8-40A5-A42F-8A8AF9898B1A',
             'ext': 'mp4',
             'upload_date': '20150202',
-            'uploader_id': 'bbright',
-            'creator': 'bbright',
+            'uploader_id': 'jdesai',
+            'creator': 'jdesai',
             'categories': list,  # a long list
             'duration': 90,
             'title': 'Bills Coach Rex Ryan Updates His Old Jets Tattoo',

youtube_dl/extractor/xtube.py View File

@@ -22,7 +22,7 @@ class XTubeIE(InfoExtractor):
             'id': 'kVTUy_G222_',
             'ext': 'mp4',
             'title': 'strange erotica',
-            'description': 'http://www.xtube.com an ET kind of thing',
+            'description': 'contains:an ET kind of thing',
             'uploader': 'greenshowers',
             'duration': 450,
             'age_limit': 18,

youtube_dl/extractor/yahoo.py View File

@@ -24,7 +24,6 @@ class YahooIE(InfoExtractor):
     _TESTS = [
         {
             'url': 'http://screen.yahoo.com/julian-smith-travis-legg-watch-214727115.html',
-            'md5': '4962b075c08be8690a922ee026d05e69',
             'info_dict': {
                 'id': '2d25e626-2378-391f-ada0-ddaf1417e588',
                 'ext': 'mp4',

youtube_dl/extractor/youtube.py View File

@@ -541,26 +541,30 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
         if cache_spec is not None:
             return lambda s: ''.join(s[i] for i in cache_spec)

+        download_note = (
+            'Downloading player %s' % player_url
+            if self._downloader.params.get('verbose') else
+            'Downloading %s player %s' % (player_type, player_id)
+        )
         if player_type == 'js':
             code = self._download_webpage(
                 player_url, video_id,
-                note='Downloading %s player %s' % (player_type, player_id),
+                note=download_note,
                 errnote='Download of %s failed' % player_url)
             res = self._parse_sig_js(code)
         elif player_type == 'swf':
             urlh = self._request_webpage(
                 player_url, video_id,
-                note='Downloading %s player %s' % (player_type, player_id),
+                note=download_note,
                 errnote='Download of %s failed' % player_url)
             code = urlh.read()
             res = self._parse_sig_swf(code)
         else:
             assert False, 'Invalid player type %r' % player_type

-        if cache_spec is None:
-            test_string = ''.join(map(compat_chr, range(len(example_sig))))
-            cache_res = res(test_string)
-            cache_spec = [ord(c) for c in cache_res]
-            self._downloader.cache.store('youtube-sigfuncs', func_id, cache_spec)
+        test_string = ''.join(map(compat_chr, range(len(example_sig))))
+        cache_res = res(test_string)
+        cache_spec = [ord(c) for c in cache_res]
+        self._downloader.cache.store('youtube-sigfuncs', func_id, cache_spec)

         return res
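
Editor's note: the cached signature function is just an index list. Running the deciphered function over the test_string '\x00\x01\x02…' records, for each output position, which input position it came from (ord(c) is the source index), so replaying the spec later is a plain reorder. A minimal sketch:

cache_spec = [2, 0, 3, 1]  # hypothetical stored spec
signature = 'abcd'
print(''.join(signature[i] for i in cache_spec))  # 'cadb'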

youtube_dl/jsinterp.py View File

@@ -30,13 +30,10 @@ class JSInterpreter(object):
     def __init__(self, code, objects=None):
         if objects is None:
             objects = {}
-        self.code = self._remove_comments(code)
+        self.code = code
         self._functions = {}
         self._objects = objects

-    def _remove_comments(self, code):
-        return re.sub(r'(?s)/\*.*?\*/', '', code)
-
     def interpret_statement(self, stmt, local_vars, allow_recursion=100):
         if allow_recursion < 0:
             raise ExtractorError('Recursion limit reached')
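
Editor's note: blanket /* ... */ stripping is risky because the pattern cannot tell comments from string contents, which is presumably why the helper was dropped. For instance:

import re

code = 'var x = "/* not a comment */";'
print(re.sub(r'(?s)/\*.*?\*/', '', code))
# var x = "";  -- the string literal is mangled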

youtube_dl/utils.py View File

@@ -1560,8 +1560,8 @@ def js_to_json(code):
         return '"%s"' % v

     res = re.sub(r'''(?x)
-        "(?:[^"\\]*(?:\\\\|\\")?)*"|
-        '(?:[^'\\]*(?:\\\\|\\')?)*'|
+        "(?:[^"\\]*(?:\\\\|\\['"nu]))*[^"\\]*"|
+        '(?:[^'\\]*(?:\\\\|\\['"nu]))*[^'\\]*'|
         [a-zA-Z_][.a-zA-Z_0-9]*
         ''', fix_kv, code)
     res = re.sub(r',(\s*\])', lambda m: m.group(1), res)
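
Editor's note: js_to_json's job is unchanged; only the string-matching pattern was rewritten, since the old nested optional quantifiers could backtrack catastrophically on long inputs. Its intended behavior, for reference:

import json
from youtube_dl.utils import js_to_json

# bare keys get quoted, single quotes become double quotes,
# trailing commas in arrays are dropped
converted = js_to_json("{x: 1, y: 'two', z: [1, 2,]}")
assert json.loads(converted) == {'x': 1, 'y': 'two', 'z': [1, 2]}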
@@ -1616,6 +1616,15 @@ def args_to_str(args):
     return ' '.join(shlex_quote(a) for a in args)


+def mimetype2ext(mt):
+    _, _, res = mt.rpartition('/')
+
+    return {
+        'x-ms-wmv': 'wmv',
+        'x-mp4-fragmented': 'mp4',
+    }.get(res, res)
+
+
 def urlhandle_detect_ext(url_handle):
     try:
         url_handle.headers
@@ -1631,7 +1640,7 @@ def urlhandle_detect_ext(url_handle):
     if e:
         return e

-    return getheader('Content-Type').split("/")[1]
+    return mimetype2ext(getheader('Content-Type'))


 def age_restricted(content_limit, age_limit):
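
Editor's note: the new helper falls back to the raw subtype when no alias matches, so Content-Type sniffing keeps working for ordinary types:

from youtube_dl.utils import mimetype2ext

print(mimetype2ext('video/x-mp4-fragmented'))  # mp4
print(mimetype2ext('video/x-ms-wmv'))          # wmv
print(mimetype2ext('video/webm'))              # webm (pass-through)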

youtube_dl/version.py View File

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2015.02.17'
+__version__ = '2015.02.19.3'