Compare commits

64 Commits

Author SHA1 Message Date
46312e0b46 release 2015.02.17 2015-02-17 17:29:32 +01:00
f9216ed6ad Merge remote-tracking branch 'origin/master' 2015-02-17 17:28:51 +01:00
65bf37ef83 [ffmpeg] Remove trivial helper method 2015-02-17 17:27:29 +01:00
f740fae2a4 [ffmpeg] Make 'available' a property 2015-02-17 17:26:41 +01:00
fbc503d696 [downloader/hls] Fix detection of ffmpeg/avconv (reported in #4966) 2015-02-17 16:40:42 +01:00
662435f728 [YoutubeDL] Use a Request object for getting the cookies (fixes #4970) 2015-02-17 16:29:24 +01:00
    So that we don't have to implement all the methods used by the cookiejar.
163d966707 [downloader/external] curl: Add the '--location' flag 2015-02-17 16:21:02 +01:00
    curl doesn't follow redirects by default
85729c51af [downloader] Add --hls-prefer-native to use the native HLS downloader (#4966) 2015-02-17 12:09:12 +01:00
1db5fbcfe3 release 2015.02.16.1 2015-02-16 15:47:13 +01:00
59b8ab5834 [rtlnl|generic] Add support for rtl.nl embeds (Fixes #4959) 2015-02-16 15:45:45 +01:00
a568180441 release 2015.02.16 2015-02-16 04:51:20 +01:00
85e80f71cd [yam] Allow faults in optional fields (#4943) 2015-02-16 04:50:57 +01:00
bfa6bdcd8b Merge remote-tracking branch 'yan12125/IE_Yam' 2015-02-16 04:44:28 +01:00
03cd72b007 [extractor/common] Move up filesize 2015-02-16 04:39:22 +01:00
    filesize and tbr should correlate, so it doesn't make sense to treat them differently.
5bfd430f81 Merge remote-tracking branch 'origin/master' 2015-02-16 04:09:10 +01:00
73fac4e911 [ffmpeg] Add --ffmpeg-location 2015-02-16 04:05:53 +01:00
8fb474fb17 [test/subtitles] Fix some tests 2015-02-15 15:01:07 +01:00
    The checksum for the CeskaTelevize subtitles has changed again, so we just test that it has a reasonable length.
f813928e4b [bbccouk] Fix fallback to legacy playlist 2015-02-15 16:32:38 +06:00
b9c7a97318 [history] Add extractor (Closes #4934) 2015-02-15 04:57:52 +06:00
9fb2f1cd6d [theplatform] Add URL sign capability 2015-02-15 04:56:12 +06:00
6ca7732d5e [extractor/common] Fix link to external documentation 2015-02-14 22:20:24 +01:00
b0ab0fac49 Remove unused imports 2015-02-14 22:19:58 +01:00
a294bce82f [streamcz] Fix extraction (Closes #4940) 2015-02-14 17:48:04 +02:00
76d1466b08 [drtuber] Add one more title regex 2015-02-14 18:50:13 +06:00
1888d3f7b3 Merge pull request #4951 from peugeot/beeg 2015-02-14 18:46:49 +06:00
    [beeg] fix test
c2787701cc Merge pull request #4950 from peugeot/drtuber 2015-02-14 18:46:43 +06:00
    [drtuber] fix extraction
52e1d0ccc4 [beeg] fix test 2015-02-14 13:42:42 +01:00
10e3c4c221 [drtuber] fix extraction 2015-02-14 13:40:35 +01:00
68f2d273bf [sunporno] Keep old video regex just in case 2015-02-14 18:33:52 +06:00
7c86c21662 Merge pull request #4949 from peugeot/sunporno 2015-02-14 18:32:18 +06:00
    [sunporno] fix extraction
ae1580d790 [sunporno] fix extraction 2015-02-14 13:29:44 +01:00
3215c50f25 Credit @ryandesign for nbcnews nightly news (#4948) 2015-02-14 17:44:24 +06:00
36f73e8044 Merge branch 'ryandesign-nbc-nightly-news' 2015-02-14 17:42:32 +06:00
a4f3d779db [nbcnews] Simplify 2015-02-14 17:42:12 +06:00
d9aa2b784d Support NBC Nightly News broadcasts 2015-02-14 04:10:23 -06:00
cffcbc02de [postprocessor/ffmpeg] Don't let ffmpeg read from stdin (fixes #4945) 2015-02-13 22:25:34 +01:00
    If you run 'while read aurl ; do youtube-dl --extract-audio "${aurl}"; done < path_to_batch_file' (where batch_file contains one URL per line), each call to youtube-dl consumed some characters from stdin, so 'read' would assign an invalid URL to 'aurl', something like 'tube.com/watch?v=<id>'.
9347fddbfc [1tv] Cover arbitrary URLs 2015-02-14 02:04:28 +06:00
037e9437e4 [camdemy] Fix _VALID_URL 2015-02-13 20:10:42 +06:00
36e7a4ca2e [test/subtitles] Update checksums 2015-02-13 14:43:50 +01:00
ae6423d704 [bambuser] Fix 'uploader_id' extraction (fixes #4944) 2015-02-13 11:36:33 +01:00
7105440cec [Yam] Add new extractor 2015-02-13 15:14:23 +08:00
c80b9cd280 Merge branch 'robin007bond-nporadio' 2015-02-13 01:37:27 +06:00
171ca612af [npo:radio] Move extractor to the common npo module and add extractor for fragments 2015-02-13 01:36:54 +06:00
c3d64fc1b3 [nporadio] Edit to conform to flake8 standards 2015-02-12 19:28:58 +01:00
7c24ce225d [NPORadio] Added extractor for live radio 2015-02-12 19:19:55 +01:00
08b38d5401 [camdemy] Simplify and make more robust (#4938) 2015-02-12 08:55:06 +01:00
    Do not throw errors if view count or upload date extraction fails.
    Dispose of re.MULTILINE, which had absolutely no effect without any ^ or $ in sight.
    Follow PEP8 naming conventions.
024c53694d Merge remote-tracking branch 'yan12125/IE_camdemy' 2015-02-12 08:44:39 +01:00
7e6011101f [camdemy] Python2 compatibility 2015-02-12 14:23:25 +08:00
c40feaba77 [camdemy] Add support for folders 2015-02-12 14:13:19 +08:00
5277f09dfc release 2015.02.11 2015-02-11 19:02:39 +01:00
2d30521ab9 [youtube] Extract average rating (closes #2362) 2015-02-11 18:39:31 +01:00
050fa43561 flake8: Ignore some errors added in pep8 1.6 2015-02-11 18:15:15 +01:00
    * E402: we execute code between imports, like modifying 'sys.path' in the tests
    * E731: we assign to lambdas in a lot of places; we may want to consider defining functions in a single line instead (what pep8 recommends)
f36f92f4da [aes] style: Put __all__ variable at the end of the file 2015-02-11 18:15:15 +01:00
124f3bc67d [dotsub] Fix extraction and modernize 2015-02-11 22:33:03 +06:00
d304209a85 [test/parameters.json] Set 'fixup' to 'never' 2015-02-11 17:25:04 +01:00
    The fixed audio files for Youtube have a size lower than the minimum required.
8367d3f3cb [camdemy] Detection of external sources 2015-02-12 00:11:33 +08:00
c56d7d899d [dctptv] Skip rtmp download 2015-02-11 22:10:33 +06:00
ea5db8469e [canalplus] Add support for itele.fr URLs (Closes #4931) 2015-02-11 16:21:52 +02:00
3811c567e7 [teamcoco] Fix video id extraction 2015-02-11 15:47:19 +02:00
8708d76425 [camdemy] Add new extractor 2015-02-11 16:39:15 +08:00
    Single-file download is done; folder extraction is planned.
054fe3cc40 [ntvru] Adapt to new direct delivery and modernize (Closes #4918) 2015-02-10 21:35:34 +06:00
af0d11f244 release 2015.02.10.5 2015-02-10 15:56:04 +01:00
9650885be9 [escapist] Filter video differently (Fixes #4919) 2015-02-10 15:55:51 +01:00
596ac6e31f [escapist] Modernize 2015-02-10 15:45:36 +01:00
40 changed files with 815 additions and 253 deletions
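Commit cffcbc02de above describes how an ffmpeg child process that inherits the caller's stdin can swallow lines from a surrounding `while read` loop. The ffmpeg postprocessor diff is not part of this page. The following is therefore only a minimal Python 3 sketch of the general remedy, which gives the child an empty stdin through `subprocess.DEVNULL`. It is not youtube-dl's actual change, and the one-line Python child is a stand-in for ffmpeg.

```python
import subprocess
import sys


def run_without_stdin(cmd):
    """Run cmd with stdin redirected from the null device.

    The child sees EOF on stdin immediately, so it cannot consume
    lines that a surrounding shell loop (`while read ...`) still needs.
    """
    proc = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # empty stdin instead of inheriting ours
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stdout.decode()


# Stand-in for ffmpeg: a child that tries to read all of its stdin.
child = [sys.executable, '-c', 'import sys; print(len(sys.stdin.read()))']
code, out = run_without_stdin(child)
print(code, out.strip())  # 0 0
```

Because the child reads nothing, the remaining lines of the batch file stay available to the caller's loop.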

@@ -110,3 +110,4 @@ Shaya Goldberg
 Paul Hartmann
 Frans de Jonge
 Robin de Rooij
+Ryan Schmidt

@@ -161,6 +161,8 @@ which means you can modify it, redistribute it or use it however you like.
     --playlist-reverse               Download playlist videos in reverse order
     --xattr-set-filesize             (experimental) set file xattribute
                                      ytdl.filesize with expected filesize
+    --hls-prefer-native              (experimental) Use the native HLS
+                                     downloader instead of ffmpeg.
     --external-downloader COMMAND    (experimental) Use the specified external
                                      downloader. Currently supports
                                      aria2c,curl,wget
@@ -397,6 +399,9 @@ which means you can modify it, redistribute it or use it however you like.
                                      postprocessors (default)
     --prefer-ffmpeg                  Prefer ffmpeg over avconv for running the
                                      postprocessors
+    --ffmpeg-location PATH           Location of the ffmpeg/avconv binary;
+                                     either the path to the binary or its
+                                     containing directory.
     --exec CMD                       Execute a command on the file after
                                      downloading, similar to find's -exec
                                      syntax. Example: --exec 'adb push {}

@@ -1,4 +1,5 @@
 # Supported sites
+- **1tv**: Первый канал
 - **1up.com**
 - **220.ro**
 - **24video**
@@ -60,6 +61,8 @@
 - **Brightcove**
 - **BuzzFeed**
 - **BYUtv**
+- **Camdemy**
+- **CamdemyFolder**
 - **Canal13cl**
 - **canalc2.tv**
 - **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
@@ -134,7 +137,6 @@
 - **fernsehkritik.tv:postecke**
 - **Firedrive**
 - **Firstpost**
-- **firsttv**: Видеоархив - Первый канал
 - **Flickr**
 - **Folketinget**: Folketinget (ft.dk; Danish parliament)
 - **Foxgay**
@@ -174,6 +176,7 @@
 - **Helsinki**: helsinki.fi
 - **HentaiStigma**
 - **HistoricFilms**
+- **History**
 - **hitbox**
 - **hitbox:live**
 - **HornBunny**
@@ -287,6 +290,8 @@
 - **nowvideo**: NowVideo
 - **npo.nl**
 - **npo.nl:live**
+- **npo.nl:radio**
+- **npo.nl:radio:fragment**
 - **NRK**
 - **NRKTV**
 - **ntv.ru**
@@ -333,9 +338,9 @@
 - **Roxwel**
 - **RTBF**
 - **Rte**
+- **rtl.nl**: rtl.nl and rtlxl.nl
 - **RTL2**
 - **RTLnow**
-- **rtlxl.nl**
 - **RTP**
 - **RTS**: RTS.ch
 - **rtve.es:alacarta**: RTVE a la carta
@@ -527,6 +532,7 @@
 - **XVideos**
 - **XXXYMovies**
 - **Yahoo**: Yahoo screen and movies
+- **Yam**
 - **YesJapan**
 - **Ynet**
 - **YouJizz**

@@ -3,4 +3,4 @@ universal = True
 [flake8]
 exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,setup.py,build,.git
-ignore = E501
+ignore = E402,E501,E731

@@ -39,5 +39,6 @@
     "writesubtitles": false,
     "allsubtitles": false,
     "listssubtitles": false,
-    "socket_timeout": 20
+    "socket_timeout": 20,
+    "fixup": "never"
 }

@@ -138,7 +138,7 @@ class TestDailymotionSubtitles(BaseTestSubtitles):
         self.DL.params['writesubtitles'] = True
         self.DL.params['allsubtitles'] = True
         subtitles = self.getSubtitles()
-        self.assertEqual(len(subtitles.keys()), 5)
+        self.assertTrue(len(subtitles.keys()) >= 6)
     def test_list_subtitles(self):
         self.DL.expect_warning('Automatic Captions not supported by this server')
@@ -247,7 +247,7 @@ class TestVimeoSubtitles(BaseTestSubtitles):
     def test_subtitles(self):
         self.DL.params['writesubtitles'] = True
         subtitles = self.getSubtitles()
-        self.assertEqual(md5(subtitles['en']), '26399116d23ae3cf2c087cea94bc43b4')
+        self.assertEqual(md5(subtitles['en']), '8062383cf4dec168fc40a088aa6d5888')
     def test_subtitles_lang(self):
         self.DL.params['writesubtitles'] = True
@@ -334,7 +334,7 @@ class TestCeskaTelevizeSubtitles(BaseTestSubtitles):
         self.DL.params['allsubtitles'] = True
         subtitles = self.getSubtitles()
         self.assertEqual(set(subtitles.keys()), set(['cs']))
-        self.assertEqual(md5(subtitles['cs']), '9bf52d9549533c32c427e264bf0847d4')
+        self.assertTrue(len(subtitles['cs']) > 20000)
     def test_nosubtitles(self):
         self.DL.expect_warning('video doesn\'t have subtitles')

@@ -225,7 +225,6 @@ class YoutubeDL(object):
     call_home:         Boolean, true iff we are allowed to contact the
                        youtube-dl servers for debugging.
     sleep_interval:    Number of seconds to sleep before each download.
-    external_downloader:  Executable of the external downloader to call.
     listformats:       Print an overview of available video formats and exit.
     list_thumbnails:   Print a table of all thumbnails and exit.
     match_filter:      A function that gets called with the info_dict of
@@ -235,6 +234,10 @@ class YoutubeDL(object):
                        match_filter_func in utils.py is one example for this.
     no_color:          Do not emit color codes in output.
+    The following options determine which downloader is picked:
+    external_downloader: Executable of the external downloader to call.
+                       None or unset for standard (built-in) downloader.
+    hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv.
     The following parameters are not used by YoutubeDL itself, they are used by
     the FileDownloader:
@@ -951,30 +954,9 @@ class YoutubeDL(object):
         return res
     def _calc_cookies(self, info_dict):
-        class _PseudoRequest(object):
-            def __init__(self, url):
-                self.url = url
-                self.headers = {}
-                self.unverifiable = False
-            def add_unredirected_header(self, k, v):
-                self.headers[k] = v
-            def get_full_url(self):
-                return self.url
-            def is_unverifiable(self):
-                return self.unverifiable
-            def has_header(self, h):
-                return h in self.headers
-            def get_header(self, h, default=None):
-                return self.headers.get(h, default)
-        pr = _PseudoRequest(info_dict['url'])
+        pr = compat_urllib_request.Request(info_dict['url'])
         self.cookiejar.add_cookie_header(pr)
-        return pr.headers.get('Cookie')
+        return pr.get_header('Cookie')
     def process_video_result(self, info_dict, download=True):
         assert info_dict.get('_type', 'video') == 'video'
@@ -1298,7 +1280,7 @@ class YoutubeDL(object):
                 downloaded = []
                 success = True
                 merger = FFmpegMergerPP(self, not self.params.get('keepvideo'))
-                if not merger._executable:
+                if not merger.available:
                     postprocessors = []
                     self.report_warning('You have requested multiple '
                                         'formats but ffmpeg or avconv are not installed.'
@@ -1647,7 +1629,7 @@ class YoutubeDL(object):
         self._write_string('[debug] Python version %s - %s\n' % (
             platform.python_version(), platform_name()))
-        exe_versions = FFmpegPostProcessor.get_versions()
+        exe_versions = FFmpegPostProcessor.get_versions(self)
         exe_versions['rtmpdump'] = rtmpdump_version()
         exe_str = ', '.join(
             '%s %s' % (exe, v)

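The `_calc_cookies` hunk above replaces a hand-written `_PseudoRequest` with a real request object. The following is a rough Python 3 illustration of why that works. It uses the stdlib `urllib.request.Request` and `http.cookiejar` directly rather than youtube-dl's `compat_` aliases, and the cookie name, value and URLs are made up. `CookieJar.add_cookie_header` only needs a request it can inspect and annotate, and `Request.get_header` then returns whatever the jar added.

```python
from http.cookiejar import Cookie, CookieJar
from urllib.request import Request


def make_cookie(name, value, domain):
    # Minimal version-0 (Netscape-style) cookie; optional fields left empty.
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def calc_cookies(cookiejar, url):
    """Return the Cookie header the jar would send for url, or None."""
    req = Request(url)
    # A real Request already implements every method the jar calls
    # (get_full_url, has_header, add_unredirected_header, ...).
    cookiejar.add_cookie_header(req)
    return req.get_header('Cookie')


jar = CookieJar()
jar.set_cookie(make_cookie('sid', 'abc', 'example.com'))
print(calc_cookies(jar, 'http://example.com/watch?v=1'))  # sid=abc
print(calc_cookies(jar, 'http://other.example.org/'))     # None
```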
@@ -350,6 +350,8 @@ def _real_main(argv=None):
         'xattr_set_filesize': opts.xattr_set_filesize,
         'match_filter': match_filter,
         'no_color': opts.no_color,
+        'ffmpeg_location': opts.ffmpeg_location,
+        'hls_prefer_native': opts.hls_prefer_native,
     }
     with YoutubeDL(ydl_opts) as ydl:

@@ -1,7 +1,5 @@
 from __future__ import unicode_literals
-__all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_cbc_decrypt', 'aes_decrypt_text']
 import base64
 from math import ceil
@@ -329,3 +327,5 @@ def inc(data):
             data[i] = data[i] + 1
             break
     return data
+__all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_cbc_decrypt', 'aes_decrypt_text']

@@ -34,6 +34,9 @@ def get_suitable_downloader(info_dict, params={}):
         if ed.supports(info_dict):
             return ed
+    if protocol == 'm3u8' and params.get('hls_prefer_native'):
+        return NativeHlsFD
     return PROTOCOL_MAP.get(protocol, HttpFD)

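The `get_suitable_downloader` hunk above inserts the `hls_prefer_native` check between the external-downloader branch and the protocol table. The sketch below models that order of precedence with strings instead of downloader classes. The table contents and the `EXTERNAL_SUPPORTS` mapping are illustrative assumptions, not youtube-dl's real data.

```python
# Illustrative protocol table; the real PROTOCOL_MAP maps to downloader classes.
PROTOCOL_MAP = {
    'rtmp': 'RtmpFD',
    'm3u8': 'HlsFD',  # ffmpeg/avconv-based HLS downloader
    'm3u8_native': 'NativeHlsFD',
    'f4m': 'F4mFD',
}

# Hypothetical stand-in for ExternalFD.supports(): protocols each tool handles.
EXTERNAL_SUPPORTS = {
    'curl': {'http', 'https'},
    'wget': {'http', 'https'},
    'aria2c': {'http', 'https'},
}


def pick_downloader(protocol, params=None):
    params = params or {}
    # 1. A requested external downloader wins if it can handle the protocol.
    ed = params.get('external_downloader')
    if ed is not None and protocol in EXTERNAL_SUPPORTS.get(ed, set()):
        return ed
    # 2. --hls-prefer-native swaps ffmpeg for the native HLS downloader.
    if protocol == 'm3u8' and params.get('hls_prefer_native'):
        return 'NativeHlsFD'
    # 3. Otherwise use the protocol table, defaulting to plain HTTP.
    return PROTOCOL_MAP.get(protocol, 'HttpFD')
```

For example, `pick_downloader('m3u8')` gives `'HlsFD'`, while `pick_downloader('m3u8', {'hls_prefer_native': True})` gives `'NativeHlsFD'`. Because the external-downloader check runs first, an HTTP download with `external_downloader='curl'` still goes to curl.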
@@ -75,7 +75,7 @@ class ExternalFD(FileDownloader):
 class CurlFD(ExternalFD):
     def _make_cmd(self, tmpfilename, info_dict):
-        cmd = [self.exe, '-o', tmpfilename]
+        cmd = [self.exe, '--location', '-o', tmpfilename]
         for key, val in info_dict['http_headers'].items():
             cmd += ['--header', '%s: %s' % (key, val)]
         cmd += self._source_address('--interface')

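The curl hunk above adds `--location` because curl does not follow HTTP 3xx redirects unless asked to. Without the flag, the redirect response is saved in place of the media. Below is a hypothetical stand-alone version of the command construction. The function name, argument order and the trailing `--` plus URL are assumptions for illustration, not the exact body of `CurlFD._make_cmd`.

```python
def make_curl_cmd(tmpfilename, url, http_headers, source_address=None, exe='curl'):
    # --location makes curl follow redirects; -o writes to the temp file.
    cmd = [exe, '--location', '-o', tmpfilename]
    for key, val in http_headers.items():
        cmd += ['--header', '%s: %s' % (key, val)]
    # Bind to a specific local address/interface if one was requested.
    if source_address is not None:
        cmd += ['--interface', source_address]
    # '--' ends option parsing so a URL starting with '-' is not misread.
    cmd += ['--', url]
    return cmd
```

The resulting list can be passed directly to `subprocess.call`, so no shell quoting is involved.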
@@ -23,15 +23,14 @@ class HlsFD(FileDownloader):
         tmpfilename = self.temp_name(filename)
         ffpp = FFmpegPostProcessor(downloader=self)
-        program = ffpp._executable
-        if program is None:
+        if not ffpp.available():
             self.report_error('m3u8 download detected but ffmpeg or avconv could not be found. Please install one.')
             return False
         ffpp.check_version()
         args = [
             encodeArgument(opt)
-            for opt in (program, '-y', '-i', url, '-f', 'mp4', '-c', 'copy', '-bsf:a', 'aac_adtstoasc')]
+            for opt in (ffpp.executable, '-y', '-i', url, '-f', 'mp4', '-c', 'copy', '-bsf:a', 'aac_adtstoasc')]
         args.append(encodeFilename(tmpfilename, True))
         retval = subprocess.call(args)
@@ -48,7 +47,7 @@ class HlsFD(FileDownloader):
             return True
         else:
             self.to_stderr('\n')
-            self.report_error('%s exited with code %d' % (program, retval))
+            self.report_error('%s exited with code %d' % (ffpp.basename, retval))
             return False

@@ -49,6 +49,10 @@ from .brightcove import BrightcoveIE
 from .buzzfeed import BuzzFeedIE
 from .byutv import BYUtvIE
 from .c56 import C56IE
+from .camdemy import (
+    CamdemyIE,
+    CamdemyFolderIE
+)
 from .canal13cl import Canal13clIE
 from .canalplus import CanalplusIE
 from .canalc2 import Canalc2IE
@@ -185,6 +189,7 @@ from .hellporno import HellPornoIE
 from .helsinki import HelsinkiIE
 from .hentaistigma import HentaiStigmaIE
 from .historicfilms import HistoricFilmsIE
+from .history import HistoryIE
 from .hitbox import HitboxIE, HitboxLiveIE
 from .hornbunny import HornBunnyIE
 from .hostingbulk import HostingBulkIE
@@ -314,6 +319,8 @@ from .nowvideo import NowVideoIE
 from .npo import (
     NPOIE,
     NPOLiveIE,
+    NPORadioIE,
+    NPORadioFragmentIE,
     TegenlichtVproIE,
 )
 from .nrk import (
@@ -364,7 +371,7 @@ from .rottentomatoes import RottenTomatoesIE
 from .roxwel import RoxwelIE
 from .rtbf import RTBFIE
 from .rte import RteIE
-from .rtlnl import RtlXlIE
+from .rtlnl import RtlNlIE
 from .rtlnow import RTLnowIE
 from .rtl2 import RTL2IE
 from .rtp import RTPIE
@@ -572,6 +579,7 @@ from .yahoo import (
     YahooIE,
     YahooSearchIE,
 )
+from .yam import YamIE
 from .yesjapan import YesJapanIE
 from .ynet import YnetIE
 from .youjizz import YouJizzIE

@@ -50,7 +50,7 @@ class BambuserIE(InfoExtractor):
             'duration': int(info['length']),
             'view_count': int(info['views_total']),
             'uploader': info['username'],
-            'uploader_id': info['uid'],
+            'uploader_id': info['owner']['uid'],
         }

@@ -273,7 +273,7 @@ class BBCCoUkIE(SubtitlesInfoExtractor):
                     formats, subtitles = self._download_media_selector(programme_id)
                 return programme_id, title, description, duration, formats, subtitles
         except ExtractorError as ee:
-            if not isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
+            if not (isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404):
                 raise
         # fallback to legacy playlist

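The BBC hunk above fixes an operator-precedence bug. `not` binds tighter than `and`, so the old condition read as `(not isinstance(...)) and (code == 404)`. The sketch below uses a hypothetical `FakeHTTPError` in place of `compat_HTTPError` to compare the two conditions.

```python
class FakeHTTPError(Exception):
    """Hypothetical stand-in for compat_HTTPError carrying a status code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def old_should_reraise(cause):
    # Parses as (not isinstance(...)) and (cause.code == 404).
    return not isinstance(cause, FakeHTTPError) and cause.code == 404


def new_should_reraise(cause):
    # Re-raise everything except HTTP 404, which triggers the legacy fallback.
    return not (isinstance(cause, FakeHTTPError) and cause.code == 404)


for code in (404, 500):
    err = FakeHTTPError(code)
    print(code, old_should_reraise(err), new_should_reraise(err))
# 404 False False
# 500 False True
```

Under the old expression, an HTTP 500 silently fell through to the legacy-playlist fallback instead of being re-raised. A non-HTTP cause such as `None` raised `AttributeError` on `.code`.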
@@ -9,7 +9,7 @@ class BeegIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?beeg\.com/(?P<id>\d+)'
     _TEST = {
         'url': 'http://beeg.com/5416503',
-        'md5': '634526ae978711f6b748fe0dd6c11f57',
+        'md5': '1bff67111adb785c51d1b42959ec10e5',
         'info_dict': {
             'id': '5416503',
             'ext': 'mp4',

@@ -0,0 +1,153 @@
+# coding: utf-8
+from __future__ import unicode_literals
+import datetime
+import re
+from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse,
+    compat_urlparse,
+)
+from ..utils import (
+    parse_iso8601,
+    str_to_int,
+)
+class CamdemyIE(InfoExtractor):
+    _VALID_URL = r'http://(?:www\.)?camdemy\.com/media/(?P<id>\d+)'
+    _TESTS = [{
+        # single file
+        'url': 'http://www.camdemy.com/media/5181/',
+        'md5': '5a5562b6a98b37873119102e052e311b',
+        'info_dict': {
+            'id': '5181',
+            'ext': 'mp4',
+            'title': 'Ch1-1 Introduction, Signals (02-23-2012)',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': '',
+            'creator': 'ss11spring',
+            'upload_date': '20130114',
+            'timestamp': 1358154556,
+            'view_count': int,
+        }
+    }, {
+        # With non-empty description
+        'url': 'http://www.camdemy.com/media/13885',
+        'md5': '4576a3bb2581f86c61044822adbd1249',
+        'info_dict': {
+            'id': '13885',
+            'ext': 'mp4',
+            'title': 'EverCam + Camdemy QuickStart',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': 'md5:050b62f71ed62928f8a35f1a41e186c9',
+            'creator': 'evercam',
+            'upload_date': '20140620',
+            'timestamp': 1403271569,
+        }
+    }, {
+        # External source
+        'url': 'http://www.camdemy.com/media/14842',
+        'md5': '50e1c3c3aa233d3d7b7daa2fa10b1cf7',
+        'info_dict': {
+            'id': '2vsYQzNIsJo',
+            'ext': 'mp4',
+            'upload_date': '20130211',
+            'uploader': 'Hun Kim',
+            'description': 'Excel 2013 Tutorial for Beginners - How to add Password Protection',
+            'uploader_id': 'hunkimtutorials',
+            'title': 'Excel 2013 Tutorial - How to add Password Protection',
+        }
+    }]
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        page = self._download_webpage(url, video_id)
+        src_from = self._html_search_regex(
+            r"<div class='srcFrom'>Source: <a title='([^']+)'", page,
+            'external source', default=None)
+        if src_from:
+            return self.url_result(src_from)
+        oembed_obj = self._download_json(
+            'http://www.camdemy.com/oembed/?format=json&url=' + url, video_id)
+        thumb_url = oembed_obj['thumbnail_url']
+        video_folder = compat_urlparse.urljoin(thumb_url, 'video/')
+        file_list_doc = self._download_xml(
+            compat_urlparse.urljoin(video_folder, 'fileList.xml'),
+            video_id, 'Filelist XML')
+        file_name = file_list_doc.find('./video/item/fileName').text
+        video_url = compat_urlparse.urljoin(video_folder, file_name)
+        timestamp = parse_iso8601(self._html_search_regex(
+            r"<div class='title'>Posted\s*:</div>\s*<div class='value'>([^<>]+)<",
+            page, 'creation time', fatal=False),
+            delimiter=' ', timezone=datetime.timedelta(hours=8))
+        view_count = str_to_int(self._html_search_regex(
+            r"<div class='title'>Views\s*:</div>\s*<div class='value'>([^<>]+)<",
+            page, 'view count', fatal=False))
+        return {
+            'id': video_id,
+            'url': video_url,
+            'title': oembed_obj['title'],
+            'thumbnail': thumb_url,
+            'description': self._html_search_meta('description', page),
+            'creator': oembed_obj['author_name'],
+            'duration': oembed_obj['duration'],
+            'timestamp': timestamp,
+            'view_count': view_count,
+        }
+class CamdemyFolderIE(InfoExtractor):
+    _VALID_URL = r'http://www.camdemy.com/folder/(?P<id>\d+)'
+    _TESTS = [{
+        # links with trailing slash
+        'url': 'http://www.camdemy.com/folder/450',
+        'info_dict': {
+            'id': '450',
+            'title': '信號與系統 2012 & 2011 (Signals and Systems)',
+        },
+        'playlist_mincount': 145
+    }, {
+        # links without trailing slash
+        # and multi-page
+        'url': 'http://www.camdemy.com/folder/853',
+        'info_dict': {
+            'id': '853',
+            'title': '科學計算 - 使用 Matlab'
+        },
+        'playlist_mincount': 20
+    }, {
+        # with displayMode parameter. For testing the codes to add parameters
+        'url': 'http://www.camdemy.com/folder/853/?displayMode=defaultOrderByOrg',
+        'info_dict': {
+            'id': '853',
+            'title': '科學計算 - 使用 Matlab'
+        },
+        'playlist_mincount': 20
+    }]
+    def _real_extract(self, url):
+        folder_id = self._match_id(url)
+        # Add displayMode=list so that all links are displayed in a single page
+        parsed_url = list(compat_urlparse.urlparse(url))
+        query = dict(compat_urlparse.parse_qsl(parsed_url[4]))
+        query.update({'displayMode': 'list'})
+        parsed_url[4] = compat_urllib_parse.urlencode(query)
+        final_url = compat_urlparse.urlunparse(parsed_url)
+        page = self._download_webpage(final_url, folder_id)
+        matches = re.findall(r"href='(/media/\d+/?)'", page)
+        entries = [self.url_result('http://www.camdemy.com' + media_path)
+                   for media_path in matches]
+        folder_title = self._html_search_meta('keywords', page)
+        return self.playlist_result(entries, folder_id, folder_title)

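`CamdemyIE` parses the "Posted" value with `parse_iso8601(..., delimiter=' ', timezone=datetime.timedelta(hours=8))`, which treats the page's time as UTC+8. Below is a stdlib-only sketch of that conversion. The input format `YYYY-MM-DD HH:MM:SS` is an assumption about the page. The sample string was picked so that it reproduces the first test's `'timestamp': 1358154556` (upload date 20130114).

```python
import calendar
import datetime


def parse_local_timestamp(date_str, utc_offset_hours=8):
    """Convert 'YYYY-MM-DD HH:MM:SS' given in a fixed UTC offset to a Unix timestamp."""
    dt = datetime.datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
    # Shift the naive local time back to UTC, then interpret it as UTC.
    dt_utc = dt - datetime.timedelta(hours=utc_offset_hours)
    return calendar.timegm(dt_utc.timetuple())


print(parse_local_timestamp('2013-01-14 17:09:16'))  # 1358154556
```

17:09:16 in UTC+8 is 09:09:16 UTC on 2013-01-14, which is 1358121600 + 32956 seconds.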
@@ -15,12 +15,13 @@ from ..utils import (
 class CanalplusIE(InfoExtractor):
     IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
-    _VALID_URL = r'https?://(?:www\.(?P<site>canalplus\.fr|piwiplus\.fr|d8\.tv)/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
+    _VALID_URL = r'https?://(?:www\.(?P<site>canalplus\.fr|piwiplus\.fr|d8\.tv|itele\.fr)/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
     _VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s'
     _SITE_ID_MAP = {
         'canalplus.fr': 'cplus',
         'piwiplus.fr': 'teletoon',
         'd8.tv': 'd8',
+        'itele.fr': 'itele',
     }
     _TESTS = [{
@@ -53,6 +54,16 @@ class CanalplusIE(InfoExtractor):
             'upload_date': '20131108',
         },
         'skip': 'videos get deleted after a while',
+    }, {
+        'url': 'http://www.itele.fr/france/video/aubervilliers-un-lycee-en-colere-111559',
+        'md5': '65aa83ad62fe107ce29e564bb8712580',
+        'info_dict': {
+            'id': '1213714',
+            'ext': 'flv',
+            'title': 'Aubervilliers : un lycée en colère - Le 11/02/2015 à 06h45',
+            'description': 'md5:8216206ec53426ea6321321f3b3c16db',
+            'upload_date': '20150211',
+        },
     }]
     def _real_extract(self, url):

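The Canalplus hunk above extends both `_VALID_URL` and `_SITE_ID_MAP`. The following quick check confirms that the new iTélé test URL is accepted and maps to the new site id. It assumes that the `site` group is what indexes `_SITE_ID_MAP` in `_real_extract`; that code is not shown in this diff.

```python
import re

# _VALID_URL and _SITE_ID_MAP as they read after this change.
VALID_URL = r'https?://(?:www\.(?P<site>canalplus\.fr|piwiplus\.fr|d8\.tv|itele\.fr)/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
SITE_ID_MAP = {
    'canalplus.fr': 'cplus',
    'piwiplus.fr': 'teletoon',
    'd8.tv': 'd8',
    'itele.fr': 'itele',
}

url = 'http://www.itele.fr/france/video/aubervilliers-un-lycee-en-colere-111559'
mobj = re.match(VALID_URL, url)
site_id = SITE_ID_MAP[mobj.group('site')]
# The lazy '.*?' consumes only the first path segment ('france').
print(mobj.group('site'), site_id, mobj.group('path'))
```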
@@ -157,6 +157,7 @@ class InfoExtractor(object):
     view_count:     How many users have watched the video on the platform.
     like_count:     Number of positive ratings of the video
     dislike_count:  Number of negative ratings of the video
+    average_rating: Average rating give by users, the scale used depends on the webpage
     comment_count:  Number of comments on the video
     comments:       A list of comments, each with one or more of the following
                     properties (all but one of text or html optional):
@@ -271,7 +272,7 @@ class InfoExtractor(object):
             raise
         except compat_http_client.IncompleteRead as e:
             raise ExtractorError('A network error has occured.', cause=e, expected=True)
-        except (KeyError,) as e:
+        except (KeyError, StopIteration) as e:
             raise ExtractorError('An extractor error has occured.', cause=e)
     def set_downloader(self, downloader):
@@ -664,7 +665,7 @@ class InfoExtractor(object):
         return RATING_TABLE.get(rating.lower(), None)
     def _family_friendly_search(self, html):
-        # See http://schema.org/VideoObj
+        # See http://schema.org/VideoObject
         family_friendly = self._html_search_meta('isFamilyFriendly', html)
         if not family_friendly:
@@ -728,6 +729,7 @@ class InfoExtractor(object):
                 f.get('language_preference') if f.get('language_preference') is not None else -1,
                 f.get('quality') if f.get('quality') is not None else -1,
                 f.get('tbr') if f.get('tbr') is not None else -1,
+                f.get('filesize') if f.get('filesize') is not None else -1,
                 f.get('vbr') if f.get('vbr') is not None else -1,
                 f.get('height') if f.get('height') is not None else -1,
                 f.get('width') if f.get('width') is not None else -1,
@@ -735,7 +737,6 @@ class InfoExtractor(object):
                 f.get('abr') if f.get('abr') is not None else -1,
                 audio_ext_preference,
                 f.get('fps') if f.get('fps') is not None else -1,
-                f.get('filesize') if f.get('filesize') is not None else -1,
                 f.get('filesize_approx') if f.get('filesize_approx') is not None else -1,
                 f.get('source_preference') if f.get('source_preference') is not None else -1,
                 f.get('format_id'),

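The `_sort_formats` hunks above move `filesize` from near the end of the sort key to directly after `tbr`, following commit 03cd72b007's reasoning that the two should correlate. The sketch keeps only three fields of the real key, so it is a simplification. It shows the effect on two hypothetical formats that lack bitrate information: the larger file now outranks the higher resolution.

```python
def _or_minus_one(value):
    # Missing fields sort below any real value, mirroring "x if x is not None else -1".
    return value if value is not None else -1


def key_before(f):
    # Simplified old order: tbr, then resolution, with filesize near the end.
    return (_or_minus_one(f.get('tbr')), _or_minus_one(f.get('height')),
            _or_minus_one(f.get('filesize')))


def key_after(f):
    # Simplified new order: filesize sits directly after tbr.
    return (_or_minus_one(f.get('tbr')), _or_minus_one(f.get('filesize')),
            _or_minus_one(f.get('height')))


# Two hypothetical formats with no bitrate information.
formats = [
    {'format_id': 'a', 'height': 720, 'filesize': 10000000},
    {'format_id': 'b', 'height': 480, 'filesize': 25000000},
]

print(max(formats, key=key_before)['format_id'])  # a
print(max(formats, key=key_after)['format_id'])   # b
```

`_sort_formats` sorts ascending and treats the last entry as best, so `max` with the same key picks the same "best" format.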
@@ -14,6 +14,10 @@ class DctpTvIE(InfoExtractor):
             'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
             'ext': 'flv',
             'title': 'Videoinstallation für eine Kaufhausfassade'
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
         }
     }

@ -1,13 +1,14 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import time
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
)
class DotsubIE(InfoExtractor): class DotsubIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?dotsub\.com/view/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?dotsub\.com/view/(?P<id>[^/]+)'
_TEST = { _TEST = {
'url': 'http://dotsub.com/view/aed3b8b2-1889-4df5-ae63-ad85f5572f27', 'url': 'http://dotsub.com/view/aed3b8b2-1889-4df5-ae63-ad85f5572f27',
'md5': '0914d4d69605090f623b7ac329fea66e', 'md5': '0914d4d69605090f623b7ac329fea66e',
@@ -15,28 +16,37 @@ class DotsubIE(InfoExtractor):
'id': 'aed3b8b2-1889-4df5-ae63-ad85f5572f27', 'id': 'aed3b8b2-1889-4df5-ae63-ad85f5572f27',
'ext': 'flv', 'ext': 'flv',
'title': 'Pyramids of Waste (2010), AKA The Lightbulb Conspiracy - Planned obsolescence documentary', 'title': 'Pyramids of Waste (2010), AKA The Lightbulb Conspiracy - Planned obsolescence documentary',
'description': 'md5:699a0f7f50aeec6042cb3b1db2d0d074',
'thumbnail': 're:^https?://dotsub.com/media/aed3b8b2-1889-4df5-ae63-ad85f5572f27/p',
'duration': 3169,
'uploader': '4v4l0n42', 'uploader': '4v4l0n42',
'description': 'Pyramids of Waste (2010) also known as "The lightbulb conspiracy" is a documentary about how our economic system based on consumerism and planned obsolescence is breaking our planet down.\r\n\r\nSolutions to this can be found at:\r\nhttp://robotswillstealyourjob.com\r\nhttp://www.federicopistono.org\r\n\r\nhttp://opensourceecology.org\r\nhttp://thezeitgeistmovement.com', 'timestamp': 1292248482.625,
'thumbnail': 'http://dotsub.com/media/aed3b8b2-1889-4df5-ae63-ad85f5572f27/p',
'upload_date': '20101213', 'upload_date': '20101213',
'view_count': int,
} }
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
info_url = "https://dotsub.com/api/media/%s/metadata" % video_id info = self._download_json(
info = self._download_json(info_url, video_id) 'https://dotsub.com/api/media/%s/metadata' % video_id, video_id)
date = time.gmtime(info['dateCreated'] / 1000) # The timestamp is in milliseconds video_url = info.get('mediaURI')
if not video_url:
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'"file"\s*:\s*\'([^\']+)', webpage, 'video url')
return { return {
'id': video_id, 'id': video_id,
'url': info['mediaURI'], 'url': video_url,
'ext': 'flv', 'ext': 'flv',
'title': info['title'], 'title': info['title'],
'thumbnail': info['screenshotURI'], 'description': info.get('description'),
'description': info['description'], 'thumbnail': info.get('screenshotURI'),
'uploader': info['user'], 'duration': int_or_none(info.get('duration'), 1000),
'view_count': info['numberOfViews'], 'uploader': info.get('user'),
'upload_date': '%04i%02i%02i' % (date.tm_year, date.tm_mon, date.tm_mday), 'timestamp': float_or_none(info.get('dateCreated'), 1000),
'view_count': int_or_none(info.get('numberOfViews')),
} }

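The new Dotsub code stops indexing `info[...]` directly. Instead, it passes optional fields through `int_or_none` and `float_or_none` with a scale of 1000. The removed comment states that `dateCreated` is in milliseconds, and the same divisor is applied to `duration`. Below is a standalone sketch of that scaling. `scaled_int` and `scaled_float` are hypothetical stand-ins written to mirror the None-safe behaviour of the youtube-dl helpers; they are not those helpers. The sample values were chosen to match the `_TEST` timestamp in this hunk and are not real API output.

```python
from datetime import datetime, timezone


def scaled_int(value, scale=1, default=None):
    """Hypothetical stand-in for int_or_none(value, scale): None-safe integer division."""
    if value is None:
        return default
    return int(value) // scale


def scaled_float(value, scale=1, default=None):
    """Hypothetical stand-in for float_or_none(value, scale): None-safe float division."""
    if value is None:
        return default
    return float(value) / scale


def upload_date(timestamp):
    """Format a Unix timestamp as YYYYMMDD in UTC, as the old code did via time.gmtime()."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime('%Y%m%d')


info = {'dateCreated': 1292248482625, 'duration': 3169000}  # sample values
timestamp = scaled_float(info.get('dateCreated'), 1000)
print(timestamp, upload_date(timestamp))       # 1292248482.625 20101213
print(scaled_int(info.get('duration'), 1000))  # 3169
print(scaled_int(info.get('numberOfViews')))   # None: a missing field no longer raises KeyError
```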
@@ -15,7 +15,7 @@ class DrTuberIE(InfoExtractor):
'id': '1740434', 'id': '1740434',
'display_id': 'hot-perky-blonde-naked-golf', 'display_id': 'hot-perky-blonde-naked-golf',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Hot Perky Blonde Naked Golf', 'title': 'hot perky blonde naked golf',
'like_count': int, 'like_count': int,
'dislike_count': int, 'dislike_count': int,
'comment_count': int, 'comment_count': int,
@@ -36,7 +36,8 @@ class DrTuberIE(InfoExtractor):
r'<source src="([^"]+)"', webpage, 'video URL') r'<source src="([^"]+)"', webpage, 'video URL')
title = self._html_search_regex( title = self._html_search_regex(
r'<title>([^<]+)\s*-\s*Free', webpage, 'title') [r'class="hd_title" style="[^"]+">([^<]+)</h1>', r'<title>([^<]+) - \d+'],
webpage, 'title')
thumbnail = self._html_search_regex( thumbnail = self._html_search_regex(
r'poster="([^"]+)"', r'poster="([^"]+)"',

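DrTuber's title lookup now passes a list of patterns instead of a single one. The helper tries them in order and uses the first that matches. Teamcoco's new `_VIDEO_ID_REGEXES` tuple relies on the same behaviour. Below is a minimal sketch of that fallback using plain `re`. `search_first` is a hypothetical name. Unlike `_html_search_regex`, it does not clean up HTML, and it returns a default instead of raising. The page fragments are made up.

```python
import re


def search_first(patterns, text, default=None):
    """Return group 1 of the first pattern that matches, trying patterns in order."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return default


TITLE_PATTERNS = [
    r'class="hd_title" style="[^"]+">([^<]+)</h1>',
    r'<title>([^<]+) - \d+',
]

new_layout = '<h1 class="hd_title" style="margin:0">Some Video Title</h1>'
old_layout = '<title>Some Video Title - 12345</title>'

print(search_first(TITLE_PATTERNS, new_layout))  # Some Video Title
print(search_first(TITLE_PATTERNS, old_layout))  # Some Video Title
```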
@@ -1,18 +1,17 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse,
) )
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
js_to_json,
) )
class EscapistIE(InfoExtractor): class EscapistIE(InfoExtractor):
_VALID_URL = r'^https?://?(www\.)?escapistmagazine\.com/videos/view/(?P<showname>[^/]+)/(?P<id>[0-9]+)-' _VALID_URL = r'https?://?(www\.)?escapistmagazine\.com/videos/view/[^/?#]+/(?P<id>[0-9]+)-[^/?#]*(?:$|[?#])'
_TEST = { _TEST = {
'url': 'http://www.escapistmagazine.com/videos/view/the-escapist-presents/6618-Breaking-Down-Baldurs-Gate', 'url': 'http://www.escapistmagazine.com/videos/view/the-escapist-presents/6618-Breaking-Down-Baldurs-Gate',
'md5': 'ab3a706c681efca53f0a35f1415cf0d1', 'md5': 'ab3a706c681efca53f0a35f1415cf0d1',
@@ -20,31 +19,30 @@ class EscapistIE(InfoExtractor):
'id': '6618', 'id': '6618',
'ext': 'mp4', 'ext': 'mp4',
'description': "Baldur's Gate: Original, Modded or Enhanced Edition? I'll break down what you can expect from the new Baldur's Gate: Enhanced Edition.", 'description': "Baldur's Gate: Original, Modded or Enhanced Edition? I'll break down what you can expect from the new Baldur's Gate: Enhanced Edition.",
'uploader': 'the-escapist-presents', 'uploader_id': 'the-escapist-presents',
'uploader': 'The Escapist Presents',
'title': "Breaking Down Baldur's Gate", 'title': "Breaking Down Baldur's Gate",
} }
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
showName = mobj.group('showname')
video_id = mobj.group('id')
self.report_extraction(video_id)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
videoDesc = self._html_search_regex( uploader_id = self._html_search_regex(
r'<meta name="description" content="([^"]*)"', r"<h1 class='headline'><a href='/videos/view/(.*?)'",
webpage, 'description', fatal=False) webpage, 'uploader ID', fatal=False)
uploader = self._html_search_regex(
r"<h1 class='headline'>(.*?)</a>",
webpage, 'uploader', fatal=False)
description = self._html_search_meta('description', webpage)
playerUrl = self._og_search_video_url(webpage, name='player URL') raw_title = self._html_search_meta('title', webpage, fatal=True)
title = raw_title.partition(' : ')[2]
title = self._html_search_regex( player_url = self._og_search_video_url(webpage, name='player URL')
r'<meta name="title" content="([^"]*)"', config_url = compat_urllib_parse.unquote(self._search_regex(
webpage, 'title').split(' : ')[-1] r'config=(.*)$', player_url, 'config URL'))
configUrl = self._search_regex('config=(.*)$', playerUrl, 'config URL')
configUrl = compat_urllib_parse.unquote(configUrl)
formats = [] formats = []
@@ -53,18 +51,21 @@ class EscapistIE(InfoExtractor):
cfgurl, video_id, cfgurl, video_id,
'Downloading ' + name + ' configuration', 'Downloading ' + name + ' configuration',
'Unable to download ' + name + ' configuration', 'Unable to download ' + name + ' configuration',
transform_source=lambda s: s.replace("'", '"')) transform_source=js_to_json)
playlist = config['playlist'] playlist = config['playlist']
video_url = next(
p['url'] for p in playlist
if p.get('eventCategory') == 'Video')
formats.append({ formats.append({
'url': playlist[1]['url'], 'url': video_url,
'format_id': name, 'format_id': name,
'quality': quality, 'quality': quality,
}) })
_add_format('normal', configUrl, quality=0) _add_format('normal', config_url, quality=0)
hq_url = (configUrl + hq_url = (config_url +
('&hq=1' if '?' in configUrl else configUrl + '?hq=1')) ('&hq=1' if '?' in config_url else config_url + '?hq=1'))
try: try:
_add_format('hq', hq_url, quality=1) _add_format('hq', hq_url, quality=1)
except ExtractorError: except ExtractorError:
@@ -75,9 +76,10 @@ class EscapistIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'uploader': showName, 'uploader': uploader,
'uploader_id': uploader_id,
'title': title, 'title': title,
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
'description': videoDesc, 'description': description,
'player_url': playerUrl, 'player_url': player_url,
} }

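Two details of the Escapist change are worth checking in isolation.

First, the HQ URL expression is unchanged by this diff. When the config URL has no query string, it evaluates to `config_url + (config_url + '?hq=1')`, which repeats the base URL. Only the `?` branch gives the intended result.

Second, the video entry is now located by its `eventCategory` rather than by position. In the diff, `next()` has no default, so a playlist without a `'Video'` entry would raise `StopIteration`. The sketch returns None in that case instead.

The helper names and playlist data below are hypothetical. The sketch assumes that a config URL without `?` should simply gain `?hq=1`.

```python
def original_hq_url(config_url):
    """The expression from the diff, reproduced to show its no-'?' behaviour."""
    return (config_url +
            ('&hq=1' if '?' in config_url else config_url + '?hq=1'))


def with_hq_flag(config_url):
    """Append hq=1, choosing '&' or '?' depending on whether a query string exists."""
    return config_url + ('&' if '?' in config_url else '?') + 'hq=1'


def video_url_from_playlist(playlist):
    """Pick the first playlist entry marked as the video, or None if there is none."""
    return next(
        (p['url'] for p in playlist if p.get('eventCategory') == 'Video'),
        None)


playlist = [  # made-up entries
    {'url': 'http://example.com/preroll.mp4', 'eventCategory': 'Ad'},
    {'url': 'http://example.com/video.mp4', 'eventCategory': 'Video'},
]
print(original_hq_url('http://example.com/config.js'))
print(with_hq_flag('http://example.com/config.js'))
print(with_hq_flag('http://example.com/config.js?v=2'))
print(video_url_from_playlist(playlist))
```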
@@ -1,52 +1,71 @@
# encoding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..utils import int_or_none
class FirstTVIE(InfoExtractor): class FirstTVIE(InfoExtractor):
IE_NAME = 'firsttv' IE_NAME = '1tv'
IE_DESC = 'Видеоархив - Первый канал' IE_DESC = 'Первый канал'
_VALID_URL = r'http://(?:www\.)?1tv\.ru/videoarchive/(?P<id>\d+)' _VALID_URL = r'http://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'
_TEST = { _TESTS = [{
'url': 'http://www.1tv.ru/videoarchive/73390', 'url': 'http://www.1tv.ru/videoarchive/73390',
'md5': '3de6390cf0cca4a5eae1d1d83895e5ad', 'md5': '777f525feeec4806130f4f764bc18a4f',
'info_dict': { 'info_dict': {
'id': '73390', 'id': '73390',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Олимпийские канатные дороги', 'title': 'Олимпийские канатные дороги',
'description': 'md5:cc730d2bf4215463e37fff6a1e277b13', 'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': 'http://img1.1tv.ru/imgsize640x360/PR20140210114657.JPG', 'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'duration': 149, 'duration': 149,
'like_count': int,
'dislike_count': int,
}, },
'skip': 'Only works from Russia', 'skip': 'Only works from Russia',
} }, {
'url': 'http://www.1tv.ru/prj/inprivate/vypusk/35930',
'md5': 'a1b6b60d530ebcf8daacf4565762bbaf',
'info_dict': {
'id': '35930',
'ext': 'mp4',
'title': 'Наедине со всеми. Людмила Сенчина',
'description': 'md5:89553aed1d641416001fe8d450f06cb9',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'duration': 2694,
},
'skip': 'Only works from Russia',
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id, 'Downloading page') webpage = self._download_webpage(url, video_id, 'Downloading page')
video_url = self._html_search_regex( video_url = self._html_search_regex(
r'''(?s)jwplayer\('flashvideoportal_1'\)\.setup\({.*?'file': '([^']+)'.*?}\);''', webpage, 'video URL') r'''(?s)(?:jwplayer\('flashvideoportal_1'\)\.setup\({|var\s+playlistObj\s*=).*?'file'\s*:\s*'([^']+)'.*?}\);''',
webpage, 'video URL')
title = self._html_search_regex( title = self._html_search_regex(
r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>', webpage, 'title') [r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
r"'title'\s*:\s*'([^']+)'"], webpage, 'title')
description = self._html_search_regex( description = self._html_search_regex(
r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>', webpage, 'description', fatal=False) r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
webpage, 'description', default=None) or self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
duration = self._og_search_property('video:duration', webpage, 'video duration', fatal=False) duration = self._og_search_property(
'video:duration', webpage,
'video duration', fatal=False)
like_count = self._html_search_regex(r'title="Понравилось".*?/></label> \[(\d+)\]', like_count = self._html_search_regex(
webpage, 'like count', fatal=False) r'title="Понравилось".*?/></label> \[(\d+)\]',
dislike_count = self._html_search_regex(r'title="Не понравилось".*?/></label> \[(\d+)\]', webpage, 'like count', default=None)
webpage, 'dislike count', fatal=False) dislike_count = self._html_search_regex(
r'title="Не понравилось".*?/></label> \[(\d+)\]',
webpage, 'dislike count', default=None)
return { return {
'id': video_id, 'id': video_id,

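The 1tv.ru pattern was loosened from `/videoarchive/<digits>` to any path, and it now takes whatever follows the last path segment as the ID. The check below runs the new `_VALID_URL` against both test URLs. One consequence of the greedy `(?:[^/]+/)+` is that a URL with a trailing slash keeps that slash inside the ID, as the last assertion shows.

```python
import re

VALID_URL = r'http://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'

for url in ('http://www.1tv.ru/videoarchive/73390',
            'http://www.1tv.ru/prj/inprivate/vypusk/35930'):
    print(re.match(VALID_URL, url).group('id'))
# 73390
# 35930
```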
@@ -537,6 +537,15 @@ class GenericIE(InfoExtractor):
'uploader_id': 'NationalArchives08', 'uploader_id': 'NationalArchives08',
'title': 'Webinar: Using Discovery, The National Archives online catalogue', 'title': 'Webinar: Using Discovery, The National Archives online catalogue',
}, },
},
# rtl.nl embed
{
'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen',
'playlist_mincount': 5,
'info_dict': {
'id': 'aanslagen-kopenhagen',
'title': 'Aanslagen Kopenhagen | RTL Nieuws',
}
} }
] ]
@@ -782,6 +791,13 @@ class GenericIE(InfoExtractor):
'entries': entries, 'entries': entries,
} }
# Look for embedded rtl.nl player
matches = re.findall(
r'<iframe\s+(?:[a-zA-Z-]+="[^"]+"\s+)*?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+video_embed[^"]+)"',
webpage)
if matches:
return _playlist_from_matches(matches, ie='RtlNl')
# Look for embedded (iframe) Vimeo player # Look for embedded (iframe) Vimeo player
mobj = re.search( mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1', webpage) r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1', webpage)
@@ -789,7 +805,6 @@ class GenericIE(InfoExtractor):
player_url = unescapeHTML(mobj.group('url')) player_url = unescapeHTML(mobj.group('url'))
surl = smuggle_url(player_url, {'Referer': url}) surl = smuggle_url(player_url, {'Referer': url})
return self.url_result(surl) return self.url_result(surl)
# Look for embedded (swf embed) Vimeo player # Look for embedded (swf embed) Vimeo player
mobj = re.search( mobj = re.search(
r'<embed[^>]+?src="((?:https?:)?//(?:www\.)?vimeo\.com/moogaloop\.swf.+?)"', webpage) r'<embed[^>]+?src="((?:https?:)?//(?:www\.)?vimeo\.com/moogaloop\.swf.+?)"', webpage)

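The new GenericIE check scans the page for `<iframe>` tags whose `src` points at an rtl.nl `video_embed` player. The lazy `(?:[a-zA-Z-]+="[^"]+"\s+)*?` group tolerates any number of other `name="value"` attributes before `src`. The code below runs the pattern from the diff over a made-up page snippet to show what `re.findall` collects. The matches are then passed to `_playlist_from_matches` with `ie='RtlNl'`.

```python
import re

RTL_EMBED_RE = (
    r'<iframe\s+(?:[a-zA-Z-]+="[^"]+"\s+)*?'
    r'src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+video_embed[^"]+)"')

# Made-up page snippet for illustration.
webpage = '''
<p>Article text</p>
<iframe width="640" height="360" src="//www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false" frameborder="0"></iframe>
<iframe src="https://player.vimeo.com/video/12345"></iframe>
'''

# One capturing group, so findall returns the captured src values.
matches = re.findall(RTL_EMBED_RE, webpage)
print(matches)
```

The Vimeo iframe is ignored because its `src` does not point at rtl.nl.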
@@ -0,0 +1,31 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
class HistoryIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?history\.com/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
_TESTS = [{
'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
'md5': '6fe632d033c92aa10b8d4a9be047a7c5',
'info_dict': {
'id': 'bLx5Dv5Aka1G',
'ext': 'mp4',
'title': "Bet You Didn't Know: Valentine's Day",
'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
},
'add_ie': ['ThePlatform'],
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
webpage, 'video url')
return self.url_result(smuggle_url(video_url, {'sig': {'key': 'crazyjava', 'secret': 's3cr3t'}}))

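HistoryIE does not download anything itself. It finds the theplatform.com release URL and attaches the signing key and secret with `smuggle_url`, and `ThePlatformIE` now imports `unsmuggle_url` to recover them. The idea is to carry extra data in the URL fragment.

Below is a sketch of that round trip. `smuggle`, `unsmuggle` and the `__smuggle` fragment key are hypothetical names, not youtube-dl's exact encoding. The key and secret values are the ones from the diff. The release URL is made up.

```python
import json
from urllib.parse import parse_qs, urlencode

SMUGGLE_KEY = '__smuggle'  # hypothetical fragment key


def smuggle(url, data):
    """Append JSON-encoded data to the URL as a fragment parameter."""
    return url + '#' + urlencode({SMUGGLE_KEY: json.dumps(data)})


def unsmuggle(smug_url, default=None):
    """Split a smuggled URL back into (url, data); plain URLs return default."""
    if '#' + SMUGGLE_KEY + '=' not in smug_url:
        return smug_url, default
    url, _, fragment = smug_url.rpartition('#')
    data = json.loads(parse_qs(fragment)[SMUGGLE_KEY][0])
    return url, data


release_url = 'http://link.theplatform.com/s/example/abc123'  # made-up release URL
smuggled = smuggle(release_url, {'sig': {'key': 'crazyjava', 'secret': 's3cr3t'}})
print(smuggled)
print(unsmuggle(smuggled))
```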
@@ -1,7 +1,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
@@ -52,9 +51,9 @@ class NBCIE(InfoExtractor):
class NBCNewsIE(InfoExtractor): class NBCNewsIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://www\.nbcnews\.com/ _VALID_URL = r'''(?x)https?://(?:www\.)?nbcnews\.com/
((video/.+?/(?P<id>\d+))| (?:video/.+?/(?P<id>\d+)|
(feature/[^/]+/(?P<title>.+))) (?:feature|nightly-news)/[^/]+/(?P<title>.+))
''' '''
_TESTS = [ _TESTS = [
@@ -89,6 +88,16 @@ class NBCNewsIE(InfoExtractor):
'description': 'md5:757988edbaae9d7be1d585eb5d55cc04', 'description': 'md5:757988edbaae9d7be1d585eb5d55cc04',
}, },
}, },
{
'url': 'http://www.nbcnews.com/nightly-news/video/nightly-news-with-brian-williams-full-broadcast-february-4-394064451844',
'md5': 'b5dda8cddd8650baa0dcb616dd2cf60d',
'info_dict': {
'id': 'sekXqyTVnmN3',
'ext': 'mp4',
'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
},
},
] ]
def _real_extract(self, url): def _real_extract(self, url):
@@ -107,13 +116,13 @@ class NBCNewsIE(InfoExtractor):
'thumbnail': find_xpath_attr(info, 'media', 'type', 'thumbnail').text, 'thumbnail': find_xpath_attr(info, 'media', 'type', 'thumbnail').text,
} }
else: else:
# "feature" pages use theplatform.com # "feature" and "nightly-news" pages use theplatform.com
title = mobj.group('title') title = mobj.group('title')
webpage = self._download_webpage(url, title) webpage = self._download_webpage(url, title)
bootstrap_json = self._search_regex( bootstrap_json = self._search_regex(
r'var bootstrapJson = ({.+})\s*$', webpage, 'bootstrap json', r'var\s+(?:bootstrapJson|playlistData)\s*=\s*({.+});?\s*$',
flags=re.MULTILINE) webpage, 'bootstrap json', flags=re.MULTILINE)
bootstrap = json.loads(bootstrap_json) bootstrap = self._parse_json(bootstrap_json, video_id)
info = bootstrap['results'][0]['video'] info = bootstrap['results'][0]['video']
mpxid = info['mpxId'] mpxid = info['mpxId']

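The NBC News change widens the embedded-JSON lookup. It now accepts either `var bootstrapJson = ...` or `var playlistData = ...;`, and the result is parsed with `_parse_json` instead of `json.loads`.

Because the search runs with `re.MULTILINE`, `$` anchors at the end of the line. The greedy `{.+}` then backtracks just far enough to leave an optional trailing `;` outside the captured group.

The check below runs on two made-up page fragments, using plain `json.loads` in place of `_parse_json`.

```python
import json
import re

BOOTSTRAP_RE = r'var\s+(?:bootstrapJson|playlistData)\s*=\s*({.+});?\s*$'

pages = [  # made-up fragments
    'foo();\nvar bootstrapJson = {"results": [{"video": {"mpxId": "111"}}]}\nbar();',
    '<script>\nvar playlistData = {"results": [{"video": {"mpxId": "222"}}]};\n</script>',
]

mpx_ids = []
for page in pages:
    raw = re.search(BOOTSTRAP_RE, page, flags=re.MULTILINE).group(1)
    info = json.loads(raw)['results'][0]['video']
    mpx_ids.append(info['mpxId'])
print(mpx_ids)  # ['111', '222']
```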
@@ -1,6 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .subtitles import SubtitlesInfoExtractor from .subtitles import SubtitlesInfoExtractor
from .common import InfoExtractor
from ..utils import ( from ..utils import (
fix_xml_ampersands, fix_xml_ampersands,
parse_duration, parse_duration,
@@ -22,7 +23,7 @@ class NPOBaseIE(SubtitlesInfoExtractor):
class NPOIE(NPOBaseIE): class NPOIE(NPOBaseIE):
IE_NAME = 'npo.nl' IE_NAME = 'npo.nl'
_VALID_URL = r'https?://www\.npo\.nl/[^/]+/[^/]+/(?P<id>[^/?]+)' _VALID_URL = r'https?://(?:www\.)?npo\.nl/(?!live|radio)[^/]+/[^/]+/(?P<id>[^/?]+)'
_TESTS = [ _TESTS = [
{ {
@@ -185,7 +186,7 @@ class NPOIE(NPOBaseIE):
class NPOLiveIE(NPOBaseIE): class NPOLiveIE(NPOBaseIE):
IE_NAME = 'npo.nl:live' IE_NAME = 'npo.nl:live'
_VALID_URL = r'https?://www\.npo\.nl/live/(?P<id>.+)' _VALID_URL = r'https?://(?:www\.)?npo\.nl/live/(?P<id>.+)'
_TEST = { _TEST = {
'url': 'http://www.npo.nl/live/npo-1', 'url': 'http://www.npo.nl/live/npo-1',
@@ -260,6 +261,84 @@ class NPOLiveIE(NPOBaseIE):
} }
class NPORadioIE(InfoExtractor):
IE_NAME = 'npo.nl:radio'
_VALID_URL = r'https?://(?:www\.)?npo\.nl/radio/(?P<id>[^/]+)/?$'
_TEST = {
'url': 'http://www.npo.nl/radio/radio-1',
'info_dict': {
'id': 'radio-1',
'ext': 'mp3',
'title': 're:^NPO Radio 1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'is_live': True,
},
'params': {
'skip_download': True,
}
}
@staticmethod
def _html_get_attribute_regex(attribute):
return r'{0}\s*=\s*\'([^\']+)\''.format(attribute)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
self._html_get_attribute_regex('data-channel'), webpage, 'title')
stream = self._parse_json(
self._html_search_regex(self._html_get_attribute_regex('data-streams'), webpage, 'data-streams'),
video_id)
codec = stream.get('codec')
return {
'id': video_id,
'url': stream['url'],
'title': self._live_title(title),
'acodec': codec,
'ext': codec,
'is_live': True,
}
class NPORadioFragmentIE(InfoExtractor):
IE_NAME = 'npo.nl:radio:fragment'
_VALID_URL = r'https?://(?:www\.)?npo\.nl/radio/[^/]+/fragment/(?P<id>\d+)'
_TEST = {
'url': 'http://www.npo.nl/radio/radio-5/fragment/174356',
'md5': 'dd8cc470dad764d0fdc70a9a1e2d18c2',
'info_dict': {
'id': '174356',
'ext': 'mp3',
'title': 'Jubileumconcert Willeke Alberti',
},
}
def _real_extract(self, url):
audio_id = self._match_id(url)
webpage = self._download_webpage(url, audio_id)
title = self._html_search_regex(
r'href="/radio/[^/]+/fragment/%s" title="([^"]+)"' % audio_id,
webpage, 'title')
audio_url = self._search_regex(
r"data-streams='([^']+)'", webpage, 'audio url')
return {
'id': audio_id,
'url': audio_url,
'title': title,
}
class TegenlichtVproIE(NPOIE): class TegenlichtVproIE(NPOIE):
IE_NAME = 'tegenlicht.vpro.nl' IE_NAME = 'tegenlicht.vpro.nl'
_VALID_URL = r'https?://tegenlicht\.vpro\.nl/afleveringen/.*?' _VALID_URL = r'https?://tegenlicht\.vpro\.nl/afleveringen/.*?'

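The new `NPORadioIE` reads two single-quoted HTML attributes:

- `data-channel`, which supplies the title.
- `data-streams`, which holds a JSON object carrying the stream `url` and `codec`.

It then reuses the codec as both `acodec` and `ext`. Below is a sketch of that parsing on a made-up snippet. The markup and the stream URL are assumptions, and the real page may differ. Unlike `_html_search_regex`, this sketch does not unescape HTML entities.

```python
import json
import re


def attribute_regex(attribute):
    """Match attribute='value' and capture the single-quoted value."""
    return r"{0}\s*=\s*'([^']+)'".format(attribute)


webpage = ('<div class="radio-player" data-channel=\'NPO Radio 1\' '
           'data-streams=\'{"url": "http://example.com/radio1.mp3", "codec": "mp3"}\'></div>')

title = re.search(attribute_regex('data-channel'), webpage).group(1)
stream = json.loads(re.search(attribute_regex('data-streams'), webpage).group(1))
info = {
    'url': stream['url'],
    'title': title,
    'acodec': stream.get('codec'),
    'ext': stream.get('codec'),
}
print(info)
```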
@@ -3,7 +3,9 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
unescapeHTML clean_html,
xpath_text,
int_or_none,
) )
@@ -14,73 +16,63 @@ class NTVRuIE(InfoExtractor):
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.ntv.ru/novosti/863142/', 'url': 'http://www.ntv.ru/novosti/863142/',
'md5': 'ba7ea172a91cb83eb734cad18c10e723',
'info_dict': { 'info_dict': {
'id': '746000', 'id': '746000',
'ext': 'flv', 'ext': 'mp4',
'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины', 'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины', 'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
'thumbnail': 're:^http://.*\.jpg',
'duration': 136, 'duration': 136,
}, },
'params': {
# rtmp download
'skip_download': True,
},
}, },
{ {
'url': 'http://www.ntv.ru/video/novosti/750370/', 'url': 'http://www.ntv.ru/video/novosti/750370/',
'md5': 'adecff79691b4d71e25220a191477124',
'info_dict': { 'info_dict': {
'id': '750370', 'id': '750370',
'ext': 'flv', 'ext': 'mp4',
'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход', 'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход', 'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
'thumbnail': 're:^http://.*\.jpg',
'duration': 172, 'duration': 172,
}, },
'params': {
# rtmp download
'skip_download': True,
},
}, },
{ {
'url': 'http://www.ntv.ru/peredacha/segodnya/m23700/o232416', 'url': 'http://www.ntv.ru/peredacha/segodnya/m23700/o232416',
'md5': '82dbd49b38e3af1d00df16acbeab260c',
'info_dict': { 'info_dict': {
'id': '747480', 'id': '747480',
'ext': 'flv', 'ext': 'mp4',
'title': '«Сегодня». 21 марта 2014 года. 16:00 ', 'title': '«Сегодня». 21 марта 2014 года. 16:00',
'description': '«Сегодня». 21 марта 2014 года. 16:00 ', 'description': '«Сегодня». 21 марта 2014 года. 16:00',
'thumbnail': 're:^http://.*\.jpg',
'duration': 1496, 'duration': 1496,
}, },
'params': {
# rtmp download
'skip_download': True,
},
}, },
{ {
'url': 'http://www.ntv.ru/kino/Koma_film', 'url': 'http://www.ntv.ru/kino/Koma_film',
'md5': 'f825770930937aa7e5aca0dc0d29319a',
'info_dict': { 'info_dict': {
'id': '758100', 'id': '1007609',
'ext': 'flv', 'ext': 'mp4',
'title': 'Остросюжетный фильм «Кома»', 'title': 'Остросюжетный фильм «Кома»',
'description': 'Остросюжетный фильм «Кома»', 'description': 'Остросюжетный фильм «Кома»',
'thumbnail': 're:^http://.*\.jpg',
'duration': 5592, 'duration': 5592,
}, },
'params': {
# rtmp download
'skip_download': True,
},
}, },
{ {
'url': 'http://www.ntv.ru/serial/Delo_vrachey/m31760/o233916/', 'url': 'http://www.ntv.ru/serial/Delo_vrachey/m31760/o233916/',
'md5': '9320cd0e23f3ea59c330dc744e06ff3b',
'info_dict': { 'info_dict': {
'id': '751482', 'id': '751482',
'ext': 'flv', 'ext': 'mp4',
'title': '«Дело врачей»: «Деревце жизни»', 'title': '«Дело врачей»: «Деревце жизни»',
'description': '«Дело врачей»: «Деревце жизни»', 'description': '«Дело врачей»: «Деревце жизни»',
'thumbnail': 're:^http://.*\.jpg',
'duration': 2590, 'duration': 2590,
}, },
'params': {
# rtmp download
'skip_download': True,
},
}, },
] ]
@@ -92,45 +84,36 @@ class NTVRuIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
page = self._download_webpage(url, video_id)
video_id = self._html_search_regex(self._VIDEO_ID_REGEXES, page, 'video id') webpage = self._download_webpage(url, video_id)
player = self._download_xml('http://www.ntv.ru/vi%s/' % video_id, video_id, 'Downloading video XML') video_id = self._html_search_regex(self._VIDEO_ID_REGEXES, webpage, 'video id')
title = unescapeHTML(player.find('./data/title').text)
description = unescapeHTML(player.find('./data/description').text) player = self._download_xml(
'http://www.ntv.ru/vi%s/' % video_id,
video_id, 'Downloading video XML')
title = clean_html(xpath_text(player, './data/title', 'title', fatal=True))
description = clean_html(xpath_text(player, './data/description', 'description'))
video = player.find('./data/video') video = player.find('./data/video')
video_id = video.find('./id').text video_id = xpath_text(video, './id', 'video id')
thumbnail = video.find('./splash').text thumbnail = xpath_text(video, './splash', 'thumbnail')
duration = int(video.find('./totaltime').text) duration = int_or_none(xpath_text(video, './totaltime', 'duration'))
view_count = int(video.find('./views').text) view_count = int_or_none(xpath_text(video, './views', 'view count'))
puid22 = video.find('./puid22').text
apps = { token = self._download_webpage(
'4': 'video1', 'http://stat.ntv.ru/services/access/token',
'7': 'video2', video_id, 'Downloading access token')
}
app = apps.get(puid22, apps['4'])
formats = [] formats = []
for format_id in ['', 'hi', 'webm']: for format_id in ['', 'hi', 'webm']:
file = video.find('./%sfile' % format_id) file_ = video.find('./%sfile' % format_id)
if file is None: if file_ is None:
continue continue
size = video.find('./%ssize' % format_id) size = video.find('./%ssize' % format_id)
formats.append({ formats.append({
'url': 'rtmp://media.ntv.ru/%s' % app, 'url': 'http://media2.ntv.ru/vod/%s&tok=%s' % (file_.text, token),
'app': app, 'filesize': int_or_none(size.text if size is not None else None),
'play_path': file.text,
'rtmp_conn': 'B:1',
'player_url': 'http://www.ntv.ru/swf/vps1.swf?update=20131128',
'page_url': 'http://www.ntv.ru',
'flash_version': 'LNX 11,2,202,341',
'rtmp_live': True,
'ext': 'flv',
'filesize': int(size.text),
}) })
self._sort_formats(formats) self._sort_formats(formats)

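The NTV change drops the RTMP parameters in favour of plain HTTP URLs. Each `<file>`, `<hifile>` or `<webmfile>` element present in the video XML becomes one format. The access token is appended as `&tok=...`, exactly as the diff does. `int_or_none` keeps a missing size element from raising. The explicit `is None` test matters because an ElementTree element with no children is falsy.

Below is a sketch over a made-up XML fragment. The token that the real code downloads from `stat.ntv.ru` is replaced here by a placeholder. `build_formats` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def build_formats(video, token):
    """Turn <file>/<hifile>/<webmfile> children of a <video> element into format dicts."""
    formats = []
    for format_id in ('', 'hi', 'webm'):
        file_ = video.find('./%sfile' % format_id)
        # Compare with None explicitly: an Element without children is falsy.
        if file_ is None:
            continue
        size = video.find('./%ssize' % format_id)
        formats.append({
            'url': 'http://media2.ntv.ru/vod/%s&tok=%s' % (file_.text, token),
            'filesize': int(size.text) if size is not None and size.text else None,
        })
    return formats


# Made-up XML; the real response has more fields.
video = ET.fromstring(
    '<video><file>news/2014/a.mp4</file><size>1000</size>'
    '<hifile>news/2014/a_hi.mp4</hifile></video>')
for f in build_formats(video, 'TOKEN'):
    print(f)
```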
@@ -1,16 +1,25 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import parse_duration from ..utils import (
int_or_none,
parse_duration,
)
class RtlXlIE(InfoExtractor): class RtlNlIE(InfoExtractor):
IE_NAME = 'rtlxl.nl' IE_NAME = 'rtl.nl'
_VALID_URL = r'https?://(www\.)?rtlxl\.nl/#!/[^/]+/(?P<uuid>[^/?]+)' IE_DESC = 'rtl.nl and rtlxl.nl'
_VALID_URL = r'''(?x)
https?://(www\.)?
(?:
rtlxl\.nl/\#!/[^/]+/|
rtl\.nl/system/videoplayer/[^?#]+?/video_embed\.html\#uuid=
)
(?P<id>[0-9a-f-]+)'''
_TEST = { _TESTS = [{
'url': 'http://www.rtlxl.nl/#!/rtl-nieuws-132237/6e4203a6-0a5e-3596-8424-c599a59e0677', 'url': 'http://www.rtlxl.nl/#!/rtl-nieuws-132237/6e4203a6-0a5e-3596-8424-c599a59e0677',
'md5': 'cc16baa36a6c169391f0764fa6b16654', 'md5': 'cc16baa36a6c169391f0764fa6b16654',
'info_dict': { 'info_dict': {
@@ -22,21 +31,30 @@ class RtlXlIE(InfoExtractor):
'upload_date': '20140814', 'upload_date': '20140814',
'duration': 576.880, 'duration': 576.880,
}, },
} }, {
'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false',
'md5': 'dea7474214af1271d91ef332fb8be7ea',
'info_dict': {
'id': '84ae5571-ac25-4225-ae0c-ef8d9efb2aed',
'ext': 'mp4',
'timestamp': 1424039400,
'title': 'RTL Nieuws - Nieuwe beelden Kopenhagen: chaos direct na aanslag',
'thumbnail': 're:^https?://screenshots\.rtl\.nl/system/thumb/sz=[0-9]+x[0-9]+/uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed$',
'upload_date': '20150215',
'description': 'Er zijn nieuwe beelden vrijgegeven die vlak na de aanslag in Kopenhagen zijn gemaakt. Op de video is goed te zien hoe omstanders zich bekommeren om één van de slachtoffers, terwijl de eerste agenten ter plaatse komen.',
}
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) uuid = self._match_id(url)
uuid = mobj.group('uuid')
info = self._download_json( info = self._download_json(
'http://www.rtl.nl/system/s4m/vfd/version=2/uuid=%s/fmt=flash/' % uuid, 'http://www.rtl.nl/system/s4m/vfd/version=2/uuid=%s/fmt=flash/' % uuid,
uuid) uuid)
material = info['material'][0] material = info['material'][0]
episode_info = info['episodes'][0]
progname = info['abstracts'][0]['name'] progname = info['abstracts'][0]['name']
subtitle = material['title'] or info['episodes'][0]['name'] subtitle = material['title'] or info['episodes'][0]['name']
description = material.get('synopsis') or info['episodes'][0]['synopsis']
# Use unencrypted m3u8 streams (See https://github.com/rg3/youtube-dl/issues/4118) # Use unencrypted m3u8 streams (See https://github.com/rg3/youtube-dl/issues/4118)
videopath = material['videopath'].replace('.f4m', '.m3u8') videopath = material['videopath'].replace('.f4m', '.m3u8')
@@ -58,14 +76,29 @@ class RtlXlIE(InfoExtractor):
'quality': 0, 'quality': 0,
} }
]) ])
self._sort_formats(formats) self._sort_formats(formats)
thumbnails = []
meta = info.get('meta', {})
for p in ('poster_base_url', '"thumb_base_url"'):
if not meta.get(p):
continue
thumbnails.append({
'url': self._proto_relative_url(meta[p] + uuid),
'width': int_or_none(self._search_regex(
r'/sz=([0-9]+)', meta[p], 'thumbnail width', fatal=False)),
'height': int_or_none(self._search_regex(
r'/sz=[0-9]+x([0-9]+)',
meta[p], 'thumbnail height', fatal=False))
})
return { return {
'id': uuid, 'id': uuid,
'title': '%s - %s' % (progname, subtitle), 'title': '%s - %s' % (progname, subtitle),
'formats': formats, 'formats': formats,
'timestamp': material['original_date'], 'timestamp': material['original_date'],
'description': episode_info['synopsis'], 'description': description,
'duration': parse_duration(material.get('duration')), 'duration': parse_duration(material.get('duration')),
'thumbnails': thumbnails,
} }

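The RtlNl thumbnail loop builds each URL by appending the UUID to a base URL from the `meta` object. It reads the dimensions from the `sz=WIDTHxHEIGHT` path segment.

Note that the second key in the diff is spelled `'"thumb_base_url"'`, with double quotes inside the string. `meta.get()` would therefore look for a key that literally contains quote characters. This sketch assumes the intended key is `thumb_base_url`.

The base URLs below are made up to fit the test's thumbnail pattern. Protocol-relative URLs are given an `http:` scheme, which is also an assumption. `thumbnails_from_meta` is a hypothetical name.

```python
import re


def thumbnails_from_meta(meta, uuid):
    """Build thumbnail dicts from *_base_url entries, parsing sz=WxH for dimensions."""
    thumbnails = []
    for key in ('poster_base_url', 'thumb_base_url'):
        base = meta.get(key)
        if not base:
            continue
        url = base + uuid
        if url.startswith('//'):
            url = 'http:' + url  # assumed scheme for protocol-relative URLs
        width = re.search(r'/sz=([0-9]+)', base)
        height = re.search(r'/sz=[0-9]+x([0-9]+)', base)
        thumbnails.append({
            'url': url,
            'width': int(width.group(1)) if width else None,
            'height': int(height.group(1)) if height else None,
        })
    return thumbnails


meta = {  # made-up values
    'poster_base_url': '//screenshots.rtl.nl/system/thumb/sz=720x405/uuid=',
    'thumb_base_url': '//screenshots.rtl.nl/system/thumb/sz=160x90/uuid=',
}
uuid = '84ae5571-ac25-4225-ae0c-ef8d9efb2aed'
for t in thumbnails_from_meta(meta, uuid):
    print(t)
print(meta.get('"thumb_base_url"'))  # None: the quoted key never matches
```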
@@ -1,14 +1,30 @@
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
from __future__ import unicode_literals from __future__ import unicode_literals
import hashlib
import time
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
)
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
) )
def _get_api_key(api_path):
if api_path.endswith('?'):
api_path = api_path[:-1]
api_key = 'fb5f58a820353bd7095de526253c14fd'
a = '{0:}{1:}{2:}'.format(api_key, api_path, int(round(time.time() / 24 / 3600)))
return hashlib.md5(a.encode('ascii')).hexdigest()
class StreamCZIE(InfoExtractor): class StreamCZIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?stream\.cz/.+/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?stream\.cz/.+/(?P<id>[0-9]+)'
_API_URL = 'http://www.stream.cz/API'
_TESTS = [{ _TESTS = [{
'url': 'http://www.stream.cz/peklonataliri/765767-ecka-pro-deti', 'url': 'http://www.stream.cz/peklonataliri/765767-ecka-pro-deti',
@@ -36,8 +52,11 @@ class StreamCZIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
data = self._download_json( api_path = '/episode/%s' % video_id
'http://www.stream.cz/API/episode/%s' % video_id, video_id)
req = compat_urllib_request.Request(self._API_URL + api_path)
req.add_header('Api-Password', _get_api_key(api_path))
data = self._download_json(req, video_id)
formats = [] formats = []
for quality, video in enumerate(data['video_qualities']): for quality, video in enumerate(data['video_qualities']):

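stream.cz now requires an `Api-Password` header, which `_get_api_key` computes in three steps:

1. Strip a trailing `?` from the API path.
2. Concatenate a fixed key, the path and the current day number (`round(time.time() / 24 / 3600)`).
3. Take the MD5 hex digest of that string.

The sketch below reproduces that computation. The day is injectable so that the result is reproducible. `get_api_key` and `build_request` are illustrative names, and the key constant is copied from the diff. The request is built but never sent.

```python
import hashlib
import time
import urllib.request

API_URL = 'http://www.stream.cz/API'
API_KEY = 'fb5f58a820353bd7095de526253c14fd'  # constant from the diff


def get_api_key(api_path, day=None):
    """MD5 of key + path + day number, as in _get_api_key; `day` defaults to today."""
    if api_path.endswith('?'):
        api_path = api_path[:-1]
    if day is None:
        day = int(round(time.time() / 24 / 3600))
    return hashlib.md5(
        '{0}{1}{2}'.format(API_KEY, api_path, day).encode('ascii')).hexdigest()


def build_request(video_id, day=None):
    """Prepare (but do not send) the episode request with the Api-Password header."""
    api_path = '/episode/%s' % video_id
    req = urllib.request.Request(API_URL + api_path)
    req.add_header('Api-Password', get_api_key(api_path, day))
    return req


req = build_request('765767', day=16484)  # arbitrary day number
print(req.full_url)
```

`urllib.request.Request.add_header` normalizes the name with `str.capitalize()`, so the header is stored as `Api-password`. HTTP header names are case-insensitive, so this does not matter on the wire.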
@@ -52,7 +52,7 @@ class SunPornoIE(InfoExtractor):
formats = [] formats = []
quality = qualities(['mp4', 'flv']) quality = qualities(['mp4', 'flv'])
for video_url in re.findall(r'<source src="([^"]+)"', webpage): for video_url in re.findall(r'<(?:source|video) src="([^"]+)"', webpage):
video_ext = determine_ext(video_url) video_ext = determine_ext(video_url)
formats.append({ formats.append({
'url': video_url, 'url': video_url,

@@ -30,6 +30,11 @@ class TeamcocoIE(InfoExtractor):
} }
} }
] ]
_VIDEO_ID_REGEXES = (
r'"eVar42"\s*:\s*(\d+)',
r'Ginger\.TeamCoco\.openInApp\("video",\s*"([^"]+)"',
r'"id_not"\s*:\s*(\d+)'
)
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
@@ -40,8 +45,7 @@ class TeamcocoIE(InfoExtractor):
video_id = mobj.group("video_id") video_id = mobj.group("video_id")
if not video_id: if not video_id:
video_id = self._html_search_regex( video_id = self._html_search_regex(
r'<div\s+class="player".*?data-id="(\d+?)"', self._VIDEO_ID_REGEXES, webpage, 'video id')
webpage, 'video id')
data_url = 'http://teamcoco.com/cvp/2.0/%s.xml' % video_id data_url = 'http://teamcoco.com/cvp/2.0/%s.xml' % video_id
data = self._download_xml( data = self._download_xml(
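`_html_search_regex` now receives the `_VIDEO_ID_REGEXES` tuple instead of a single pattern. As I understand it, youtube-dl's `_search_regex` tries each pattern in order and uses the first one that matches. The sketch below reproduces that fallback behaviour on its own. `search_first` is a hypothetical helper, not youtube-dl's API, and the sample page text is invented.

```python
import re

VIDEO_ID_REGEXES = (
    r'"eVar42"\s*:\s*(\d+)',
    r'Ginger\.TeamCoco\.openInApp\("video",\s*"([^"]+)"',
    r'"id_not"\s*:\s*(\d+)',
)


def search_first(patterns, text):
    """Return group 1 of the first pattern that matches, or None."""
    for pattern in patterns:
        mobj = re.search(pattern, text)
        if mobj:
            return mobj.group(1)
    return None


page = 'Ginger.TeamCoco.openInApp("video", "85379")'
print(search_first(VIDEO_ID_REGEXES, page))  # 85379
```

Because the tuple order sets the priority, `"eVar42"` wins whenever it is present, even if `"id_not"` also appears in the page.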

@@ -2,6 +2,11 @@ from __future__ import unicode_literals
 import re
 import json
+import time
+import hmac
+import binascii
+import hashlib
 from .subtitles import SubtitlesInfoExtractor
 from ..compat import (
@@ -11,6 +16,7 @@ from ..utils import (
     determine_ext,
     ExtractorError,
     xpath_with_ns,
+    unsmuggle_url,
 )
 _x = lambda p: xpath_with_ns(p, {'smil': 'http://www.w3.org/2005/SMIL21/Language'})
@@ -18,7 +24,7 @@ _x = lambda p: xpath_with_ns(p, {'smil': 'http://www.w3.org/2005/SMIL21/Language
 class ThePlatformIE(SubtitlesInfoExtractor):
     _VALID_URL = r'''(?x)
-        (?:https?://(?:link|player)\.theplatform\.com/[sp]/[^/]+/
+        (?:https?://(?:link|player)\.theplatform\.com/[sp]/(?P<provider_id>[^/]+)/
            (?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/)?
         |theplatform:)(?P<id>[^/\?&]+)'''
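The new `provider_id` group captures the account segment of a `link.theplatform.com/s/...` URL instead of discarding it. For the `theplatform:` shorthand the group does not participate, so it is `None`, and `_real_extract` later falls back to `'dJ5BDC'`. The sketch below exercises the pattern from this hunk. The provider and video IDs in the sample URLs are made up.

```python
import re

VALID_URL = r'''(?x)
    (?:https?://(?:link|player)\.theplatform\.com/[sp]/(?P<provider_id>[^/]+)/
       (?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/)?
     |theplatform:)(?P<id>[^/\?&]+)'''


def provider_and_id(url):
    mobj = re.match(VALID_URL, url)
    # Same default the diff applies when the URL carries no provider.
    provider_id = mobj.group('provider_id') or 'dJ5BDC'
    return provider_id, mobj.group('id')


print(provider_and_id('http://link.theplatform.com/s/AbCdEf/xyz123?mbr=true'))
# ('AbCdEf', 'xyz123')
print(provider_and_id('theplatform:xyz123'))
# ('dJ5BDC', 'xyz123')
```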
@@ -38,9 +44,33 @@ class ThePlatformIE(SubtitlesInfoExtractor):
         },
     }
+    @staticmethod
+    def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False):
+        flags = '10' if include_qs else '00'
+        expiration_date = '%x' % (int(time.time()) + life)
+        def str_to_hex(str):
+            return binascii.b2a_hex(str.encode('ascii')).decode('ascii')
+        def hex_to_str(hex):
+            return binascii.a2b_hex(hex)
+        relative_path = url.split('http://link.theplatform.com/s/')[1].split('?')[0]
+        clear_text = hex_to_str(flags + expiration_date + str_to_hex(relative_path))
+        checksum = hmac.new(sig_key.encode('ascii'), clear_text, hashlib.sha1).hexdigest()
+        sig = flags + expiration_date + checksum + str_to_hex(sig_secret)
+        return '%s&sig=%s' % (url, sig)
     def _real_extract(self, url):
+        url, smuggled_data = unsmuggle_url(url, {})
         mobj = re.match(self._VALID_URL, url)
+        provider_id = mobj.group('provider_id')
         video_id = mobj.group('id')
+        if not provider_id:
+            provider_id = 'dJ5BDC'
         if mobj.group('config'):
             config_url = url + '&form=json'
             config_url = config_url.replace('swf/', 'config/')
@@ -48,8 +78,12 @@ class ThePlatformIE(SubtitlesInfoExtractor):
             config = self._download_json(config_url, video_id, 'Downloading config')
             smil_url = config['releaseUrl'] + '&format=SMIL&formats=MPEG4&manifest=f4m'
         else:
-            smil_url = ('http://link.theplatform.com/s/dJ5BDC/{0}/meta.smil?'
-                        'format=smil&mbr=true'.format(video_id))
+            smil_url = ('http://link.theplatform.com/s/{0}/{1}/meta.smil?'
+                        'format=smil&mbr=true'.format(provider_id, video_id))
+        sig = smuggled_data.get('sig')
+        if sig:
+            smil_url = self._sign_url(smil_url, sig['key'], sig['secret'])
         meta = self._download_xml(smil_url, video_id)
         try:
@@ -62,7 +96,7 @@ class ThePlatformIE(SubtitlesInfoExtractor):
         else:
             raise ExtractorError(error_msg, expected=True)
-        info_url = 'http://link.theplatform.com/s/dJ5BDC/{0}?format=preview'.format(video_id)
+        info_url = 'http://link.theplatform.com/s/{0}/{1}?format=preview'.format(provider_id, video_id)
         info_json = self._download_webpage(info_url, video_id)
         info = json.loads(info_json)
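The new `_sign_url` helper appends a `sig` parameter built from four parts:

* two flag characters,
* the expiry time in hex,
* an HMAC-SHA1 computed over those two parts plus the hex-encoded relative path,
* the hex-encoded secret.

The Python 3 sketch below performs the same computation. The `now` parameter is an addition for reproducibility and is not part of the diff. The key, secret and URL are made-up values.

```python
import binascii
import hashlib
import hmac
import time


def sign_url(url, sig_key, sig_secret, life=600, include_qs=False, now=None):
    flags = '10' if include_qs else '00'
    if now is None:
        now = int(time.time())
    expiration_date = '%x' % (now + life)

    def str_to_hex(s):
        return binascii.b2a_hex(s.encode('ascii')).decode('ascii')

    # Path after the /s/ prefix, without the query string.
    relative_path = url.split('http://link.theplatform.com/s/')[1].split('?')[0]
    # The MAC is taken over the raw bytes of flags + expiry + path.
    clear_text = binascii.a2b_hex(flags + expiration_date + str_to_hex(relative_path))
    checksum = hmac.new(sig_key.encode('ascii'), clear_text, hashlib.sha1).hexdigest()
    sig = flags + expiration_date + checksum + str_to_hex(sig_secret)
    return '%s&sig=%s' % (url, sig)


signed = sign_url(
    'http://link.theplatform.com/s/AbCdEf/xyz123/meta.smil?format=smil&mbr=true',
    'example-key', 'example-secret', now=1424190572)
print(signed.split('&sig=')[1][:10])  # 0054e36ec4 ('00' flags + expiry in hex)
```

`binascii.a2b_hex` requires an even-length string. With present-day timestamps the hex expiry has eight digits, so the concatenation stays even.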

@@ -0,0 +1,81 @@
+# coding: utf-8
+from __future__ import unicode_literals
+import re
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+    float_or_none,
+    month_by_abbreviation,
+)
+class YamIE(InfoExtractor):
+    _VALID_URL = r'http://mymedia.yam.com/m/(?P<id>\d+)'
+    _TESTS = [{
+        # An audio hosted on Yam
+        'url': 'http://mymedia.yam.com/m/2283921',
+        'md5': 'c011b8e262a52d5473d9c2e3c9963b9c',
+        'info_dict': {
+            'id': '2283921',
+            'ext': 'mp3',
+            'title': '發現 - 趙薇 京華煙雲主題曲',
+            'uploader_id': 'princekt',
+            'upload_date': '20080807',
+            'duration': 313.0,
+        }
+    }, {
+        # An external video hosted on YouTube
+        'url': 'http://mymedia.yam.com/m/3598173',
+        'md5': '0238ceec479c654e8c2f1223755bf3e9',
+        'info_dict': {
+            'id': 'pJ2Deys283c',
+            'ext': 'mp4',
+            'upload_date': '20150202',
+            'uploader': '新莊社大瑜伽社',
+            'description': 'md5:f5cc72f0baf259a70fb731654b0d2eff',
+            'uploader_id': '2323agoy',
+            'title': '外婆的澎湖灣KTV-潘安邦',
+        }
+    }]
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        page = self._download_webpage(url, video_id)
+        # Is it hosted externally on YouTube?
+        youtube_url = self._html_search_regex(
+            r'<embed src="(http://www.youtube.com/[^"]+)"',
+            page, 'YouTube url', default=None)
+        if youtube_url:
+            return self.url_result(youtube_url, 'Youtube')
+        api_page = self._download_webpage(
+            'http://mymedia.yam.com/api/a/?pID=' + video_id, video_id,
+            note='Downloading API page')
+        api_result_obj = compat_urlparse.parse_qs(api_page)
+        uploader_id = self._html_search_regex(
+            r'<!-- 發表作者 -->[\n ]+<a href="/([a-z]+)"',
+            page, 'uploader id', fatal=False)
+        mobj = re.search(r'<!-- 發表於 -->(?P<mon>[A-Z][a-z]{2}) ' +
+                         r'(?P<day>\d{1,2}), (?P<year>\d{4})', page)
+        if mobj:
+            upload_date = '%s%02d%02d' % (
+                mobj.group('year'),
+                month_by_abbreviation(mobj.group('mon')),
+                int(mobj.group('day')))
+        else:
+            upload_date = None
+        duration = float_or_none(api_result_obj['totaltime'][0], scale=1000)
+        return {
+            'id': video_id,
+            'url': api_result_obj['mp3file'][0],
+            'title': self._html_search_meta('description', page),
+            'duration': duration,
+            'uploader_id': uploader_id,
+            'upload_date': upload_date,
+        }
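The Yam extractor treats the API response as a URL-encoded query string. `parse_qs` maps each key to a list of values, which is why the code indexes `[0]`. `float_or_none(..., scale=1000)` then turns `totaltime` into seconds; the field appears to be in milliseconds, judging by the 313.0-second test duration. The sketch below runs Python 3's `urllib.parse.parse_qs` on a hypothetical payload with made-up field values. Its `float_or_none` is a simplified local re-implementation, not an import from youtube-dl.

```python
from urllib.parse import parse_qs


def float_or_none(v, scale=1, default=None):
    # Simplified local stand-in: pass None through, otherwise divide by scale.
    return default if v is None else float(v) / scale


# Hypothetical API response body; real responses may carry more fields.
api_page = 'mp3file=http%3A%2F%2Fexample.com%2Fsong.mp3&totaltime=313000'
api_result_obj = parse_qs(api_page)

print(api_result_obj['mp3file'][0])                               # http://example.com/song.mp3
print(float_or_none(api_result_obj['totaltime'][0], scale=1000))  # 313.0
```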

@@ -25,6 +25,7 @@ from ..compat import (
 from ..utils import (
     clean_html,
     ExtractorError,
+    float_or_none,
     get_element_by_attribute,
     get_element_by_id,
     int_or_none,
@@ -1124,6 +1125,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
             'view_count': view_count,
             'like_count': like_count,
             'dislike_count': dislike_count,
+            'average_rating': float_or_none(video_info.get('avg_rating', [None])[0]),
             'formats': formats,
         }
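`video_info` is a `parse_qs`-style dict of lists. The expression `video_info.get('avg_rating', [None])[0]` therefore yields either the first string value or `None`, and the float conversion does not raise when the field is missing. The sketch below shows that lookup pattern. `first_float` is a hypothetical helper, and the rating value is invented.

```python
from urllib.parse import parse_qs


def first_float(info, key):
    # Hypothetical helper mirroring float_or_none(info.get(key, [None])[0]):
    # parse_qs-style dicts map each key to a list of strings.
    value = info.get(key, [None])[0]
    return None if value is None else float(value)


with_rating = parse_qs('avg_rating=4.8771186&view_count=10')
without_rating = parse_qs('view_count=10')

print(first_float(with_rating, 'avg_rating'))     # 4.8771186
print(first_float(without_rating, 'avg_rating'))  # None
```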

@@ -424,6 +424,10 @@ def parseOpts(overrideArguments=None):
         '--xattr-set-filesize',
         dest='xattr_set_filesize', action='store_true',
         help='(experimental) set file xattribute ytdl.filesize with expected filesize')
+    downloader.add_option(
+        '--hls-prefer-native',
+        dest='hls_prefer_native', action='store_true',
+        help='(experimental) Use the native HLS downloader instead of ffmpeg.')
     downloader.add_option(
         '--external-downloader',
         dest='external_downloader', metavar='COMMAND',
@@ -735,6 +739,10 @@ def parseOpts(overrideArguments=None):
         '--prefer-ffmpeg',
         action='store_true', dest='prefer_ffmpeg',
         help='Prefer ffmpeg over avconv for running the postprocessors')
+    postproc.add_option(
+        '--ffmpeg-location', '--avconv-location', metavar='PATH',
+        dest='ffmpeg_location',
+        help='Location of the ffmpeg/avconv binary; either the path to the binary or its containing directory.')
     postproc.add_option(
         '--exec',
         metavar='CMD', dest='exec_cmd',
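`--ffmpeg-location` is registered with `--avconv-location` as a second option string. optparse stores either spelling in the same `dest`. The sketch below is a minimal standalone setup with a fresh `OptionParser`, not youtube-dl's actual parser configuration. The group title and the paths are assumptions for illustration.

```python
import optparse

parser = optparse.OptionParser()
postproc = optparse.OptionGroup(parser, 'Post-processing Options')
postproc.add_option(
    '--ffmpeg-location', '--avconv-location', metavar='PATH',
    dest='ffmpeg_location',
    help='Location of the ffmpeg/avconv binary; either the path to the binary or its containing directory.')
parser.add_option_group(postproc)

# Either option string fills opts.ffmpeg_location.
opts, _ = parser.parse_args(['--avconv-location', '/opt/libav/bin'])
print(opts.ffmpeg_location)  # /opt/libav/bin
```

When neither spelling is given, the value stays `None`. That matches the `location is not None` check in the post-processor.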

@@ -30,54 +30,95 @@ class FFmpegPostProcessorError(PostProcessingError):
 class FFmpegPostProcessor(PostProcessor):
     def __init__(self, downloader=None, deletetempfiles=False):
         PostProcessor.__init__(self, downloader)
-        self._versions = self.get_versions()
         self._deletetempfiles = deletetempfiles
+        self._determine_executables()
     def check_version(self):
-        if not self._executable:
+        if not self.available:
             raise FFmpegPostProcessorError('ffmpeg or avconv not found. Please install one.')
-        required_version = '10-0' if self._uses_avconv() else '1.0'
+        required_version = '10-0' if self.basename == 'avconv' else '1.0'
         if is_outdated_version(
-                self._versions[self._executable], required_version):
+                self._versions[self.basename], required_version):
             warning = 'Your copy of %s is outdated, update %s to version %s or newer if you encounter any errors.' % (
-                self._executable, self._executable, required_version)
+                self.basename, self.basename, required_version)
             if self._downloader:
                 self._downloader.report_warning(warning)
     @staticmethod
-    def get_versions():
+    def get_versions(downloader=None):
+        return FFmpegPostProcessor(downloader)._versions
+    def _determine_executables(self):
         programs = ['avprobe', 'avconv', 'ffmpeg', 'ffprobe']
-        return dict((p, get_exe_version(p, args=['-version'])) for p in programs)
-    @property
-    def available(self):
-        return self._executable is not None
-    @property
-    def _executable(self):
-        if self._downloader.params.get('prefer_ffmpeg', False):
+        prefer_ffmpeg = self._downloader.params.get('prefer_ffmpeg', False)
+        self.basename = None
+        self.probe_basename = None
+        self._paths = None
+        self._versions = None
+        if self._downloader:
+            location = self._downloader.params.get('ffmpeg_location')
+            if location is not None:
+                if not os.path.exists(location):
+                    self._downloader.report_warning(
+                        'ffmpeg-location %s does not exist! '
+                        'Continuing without avconv/ffmpeg.' % (location))
+                    self._versions = {}
+                    return
+                elif not os.path.isdir(location):
+                    basename = os.path.splitext(os.path.basename(location))[0]
+                    if basename not in programs:
+                        self._downloader.report_warning(
+                            'Cannot identify executable %s, its basename should be one of %s. '
+                            'Continuing without avconv/ffmpeg.' %
+                            (location, ', '.join(programs)))
+                        self._versions = {}
+                        return None
+                    location = os.path.dirname(os.path.abspath(location))
+                    if basename in ('ffmpeg', 'ffprobe'):
+                        prefer_ffmpeg = True
+                self._paths = dict(
+                    (p, os.path.join(location, p)) for p in programs)
+                self._versions = dict(
+                    (p, get_exe_version(self._paths[p], args=['-version']))
+                    for p in programs)
+        if self._versions is None:
+            self._versions = dict(
+                (p, get_exe_version(p, args=['-version'])) for p in programs)
+            self._paths = dict((p, p) for p in programs)
+        if prefer_ffmpeg:
             prefs = ('ffmpeg', 'avconv')
         else:
             prefs = ('avconv', 'ffmpeg')
         for p in prefs:
             if self._versions[p]:
-                return p
-        return None
-    @property
-    def _probe_executable(self):
-        if self._downloader.params.get('prefer_ffmpeg', False):
+                self.basename = p
+                break
+        if prefer_ffmpeg:
             prefs = ('ffprobe', 'avprobe')
         else:
             prefs = ('avprobe', 'ffprobe')
         for p in prefs:
             if self._versions[p]:
-                return p
-        return None
-    def _uses_avconv(self):
-        return self._executable == 'avconv'
+                self.probe_basename = p
+                break
+    @property
+    def available(self):
+        return self.basename is not None
+    @property
+    def executable(self):
+        return self._paths[self.basename]
+    @property
+    def probe_executable(self):
+        return self._paths[self.probe_basename]
     def run_ffmpeg_multiple_files(self, input_paths, out_path, opts):
         self.check_version()
@@ -88,14 +129,14 @@ class FFmpegPostProcessor(PostProcessor):
         files_cmd = []
         for path in input_paths:
             files_cmd.extend([encodeArgument('-i'), encodeFilename(path, True)])
-        cmd = ([encodeFilename(self._executable, True), encodeArgument('-y')] +
+        cmd = ([encodeFilename(self.executable, True), encodeArgument('-y')] +
                files_cmd +
                [encodeArgument(o) for o in opts] +
                [encodeFilename(self._ffmpeg_filename_argument(out_path), True)])
         if self._downloader.params.get('verbose', False):
             self._downloader.to_screen('[debug] ffmpeg command line: %s' % shell_quote(cmd))
-        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+        p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
         stdout, stderr = p.communicate()
         if p.returncode != 0:
             stderr = stderr.decode('utf-8', 'replace')
@@ -127,14 +168,16 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
     def get_audio_codec(self, path):
-        if not self._probe_executable:
+        if not self.probe_executable:
             raise PostProcessingError('ffprobe or avprobe not found. Please install one.')
         try:
             cmd = [
-                encodeFilename(self._probe_executable, True),
+                encodeFilename(self.probe_executable, True),
                 encodeArgument('-show_streams'),
                 encodeFilename(self._ffmpeg_filename_argument(path), True)]
-            handle = subprocess.Popen(cmd, stderr=compat_subprocess_get_DEVNULL(), stdout=subprocess.PIPE)
+            if self._downloader.params.get('verbose', False):
+                self._downloader.to_screen('[debug] %s command line: %s' % (self.basename, shell_quote(cmd)))
+            handle = subprocess.Popen(cmd, stderr=compat_subprocess_get_DEVNULL(), stdout=subprocess.PIPE, stdin=subprocess.PIPE)
             output = handle.communicate()[0]
             if handle.wait() != 0:
                 return None
@@ -223,14 +266,14 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
             if self._nopostoverwrites and os.path.exists(encodeFilename(new_path)):
                 self._downloader.to_screen('[youtube] Post-process file %s exists, skipping' % new_path)
             else:
-                self._downloader.to_screen('[' + self._executable + '] Destination: ' + new_path)
+                self._downloader.to_screen('[' + self.basename + '] Destination: ' + new_path)
                 self.run_ffmpeg(path, new_path, acodec, more_opts)
         except:
             etype, e, tb = sys.exc_info()
             if isinstance(e, AudioConversionError):
                 msg = 'audio conversion failed: ' + e.msg
             else:
-                msg = 'error running ' + self._executable
+                msg = 'error running ' + self.basename
             raise PostProcessingError(msg)
         # Try to update the date time for extracted audio file.
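The rewritten `_determine_executables` accepts `--ffmpeg-location` as either a directory or the path to one of the four known binaries. For a binary path, it switches to the containing directory. If that binary is `ffmpeg` or `ffprobe`, it also turns on the ffmpeg preference. The sketch below covers only that resolution step and does not probe versions. The function name `resolve_location` and its `(paths, prefer_ffmpeg)` return shape are hypothetical.

```python
import os
import tempfile

PROGRAMS = ['avprobe', 'avconv', 'ffmpeg', 'ffprobe']


def resolve_location(location, prefer_ffmpeg=False):
    """Return (paths, prefer_ffmpeg); paths is None if location is unusable."""
    if not os.path.exists(location):
        return None, prefer_ffmpeg
    if not os.path.isdir(location):
        # A file was given: its name (minus extension) must be a known program.
        basename = os.path.splitext(os.path.basename(location))[0]
        if basename not in PROGRAMS:
            return None, prefer_ffmpeg
        location = os.path.dirname(os.path.abspath(location))
        if basename in ('ffmpeg', 'ffprobe'):
            prefer_ffmpeg = True
    # All four programs are assumed to sit side by side in that directory.
    paths = dict((p, os.path.join(location, p)) for p in PROGRAMS)
    return paths, prefer_ffmpeg


with tempfile.TemporaryDirectory() as d:
    binary = os.path.join(d, 'ffmpeg')
    open(binary, 'w').close()  # empty placeholder file, never executed
    paths, prefer = resolve_location(binary)
    print(prefer)         # True
    print(sorted(paths))  # ['avconv', 'avprobe', 'ffmpeg', 'ffprobe']
    print(resolve_location(os.path.join(d, 'missing')))  # (None, False)
```

Because `splitext` drops the extension, a path such as `C:\ffmpeg\bin\ffmpeg.exe` is also recognised.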

@@ -62,6 +62,11 @@ std_headers = {
 }
+ENGLISH_MONTH_NAMES = [
+    'January', 'February', 'March', 'April', 'May', 'June',
+    'July', 'August', 'September', 'October', 'November', 'December']
 def preferredencoding():
     """Get preferred encoding.
@@ -666,26 +671,27 @@ class YoutubeDLHTTPSHandler(compat_urllib_request.HTTPSHandler):
             req, **kwargs)
-def parse_iso8601(date_str, delimiter='T'):
+def parse_iso8601(date_str, delimiter='T', timezone=None):
     """ Return a UNIX timestamp from the given date """
     if date_str is None:
         return None
-    m = re.search(
-        r'(\.[0-9]+)?(?:Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
-        date_str)
-    if not m:
-        timezone = datetime.timedelta()
-    else:
-        date_str = date_str[:-len(m.group(0))]
-        if not m.group('sign'):
+    if timezone is None:
+        m = re.search(
+            r'(\.[0-9]+)?(?:Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
+            date_str)
+        if not m:
             timezone = datetime.timedelta()
         else:
-            sign = 1 if m.group('sign') == '+' else -1
-            timezone = datetime.timedelta(
-                hours=sign * int(m.group('hours')),
-                minutes=sign * int(m.group('minutes')))
+            date_str = date_str[:-len(m.group(0))]
+            if not m.group('sign'):
+                timezone = datetime.timedelta()
+            else:
+                sign = 1 if m.group('sign') == '+' else -1
+                timezone = datetime.timedelta(
+                    hours=sign * int(m.group('hours')),
+                    minutes=sign * int(m.group('minutes')))
     date_format = '%Y-%m-%d{0}%H:%M:%S'.format(delimiter)
     dt = datetime.datetime.strptime(date_str, date_format) - timezone
     return calendar.timegm(dt.timetuple())
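With the new `timezone` argument, a caller that already knows the offset can pass it as a `datetime.timedelta`, and the suffix-parsing branch is skipped entirely. Below is a condensed standalone copy of the updated function, with the same logic and the branches folded slightly. The usage lines show that an embedded `+01:00` suffix and an explicitly supplied one-hour offset produce the same timestamp.

```python
import calendar
import datetime
import re


def parse_iso8601(date_str, delimiter='T', timezone=None):
    """Return a UNIX timestamp from the given date."""
    if date_str is None:
        return None
    if timezone is None:
        # Only parse a Z / +HH:MM suffix when the caller supplied no offset.
        m = re.search(
            r'(\.[0-9]+)?(?:Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
            date_str)
        timezone = datetime.timedelta()
        if m:
            date_str = date_str[:-len(m.group(0))]
            if m.group('sign'):
                sign = 1 if m.group('sign') == '+' else -1
                timezone = datetime.timedelta(
                    hours=sign * int(m.group('hours')),
                    minutes=sign * int(m.group('minutes')))
    date_format = '%Y-%m-%d{0}%H:%M:%S'.format(delimiter)
    dt = datetime.datetime.strptime(date_str, date_format) - timezone
    return calendar.timegm(dt.timetuple())


print(parse_iso8601('2015-02-17T17:29:32+01:00'))  # 1424190572
print(parse_iso8601('2015-02-17 17:29:32', delimiter=' ',
                    timezone=datetime.timedelta(hours=1)))  # 1424190572
```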
@@ -1184,11 +1190,18 @@ def get_term_width():
 def month_by_name(name):
     """ Return the number of a month by (locale-independently) English name """
-    ENGLISH_NAMES = [
-        'January', 'February', 'March', 'April', 'May', 'June',
-        'July', 'August', 'September', 'October', 'November', 'December']
     try:
-        return ENGLISH_NAMES.index(name) + 1
+        return ENGLISH_MONTH_NAMES.index(name) + 1
+    except ValueError:
+        return None
+def month_by_abbreviation(abbrev):
+    """ Return the number of a month by (locale-independently) English
+        abbreviations """
+    try:
+        return [s[:3] for s in ENGLISH_MONTH_NAMES].index(abbrev) + 1
     except ValueError:
         return None
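`ENGLISH_MONTH_NAMES` is now a module-level list shared by `month_by_name` and the new `month_by_abbreviation`. The latter compares its argument against the first three letters of each name. The standalone sketch below mirrors both lookups from this hunk. It also shows the `'%s%02d%02d'` date assembly that the Yam extractor builds on top of them; the sample date parts are invented to match its `20080807` test value.

```python
ENGLISH_MONTH_NAMES = [
    'January', 'February', 'March', 'April', 'May', 'June',
    'July', 'August', 'September', 'October', 'November', 'December']


def month_by_name(name):
    try:
        return ENGLISH_MONTH_NAMES.index(name) + 1
    except ValueError:
        return None


def month_by_abbreviation(abbrev):
    # Only exact three-letter prefixes ('Jan', 'Feb', ...) are recognised.
    try:
        return [s[:3] for s in ENGLISH_MONTH_NAMES].index(abbrev) + 1
    except ValueError:
        return None


print(month_by_name('August'))        # 8
print(month_by_abbreviation('Aug'))   # 8
print(month_by_abbreviation('Sept'))  # None
print('%s%02d%02d' % ('2008', month_by_abbreviation('Aug'), int('7')))  # 20080807
```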

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals
-__version__ = '2015.02.10.4'
+__version__ = '2015.02.17'