Compare commits


2 Commits

SHA1 Message Date
97bc05116e Merge branch 'master' into totalwebcasting 2018-01-07 15:03:28 +01:00
7608a91ee7 [totalwebcasting] Add new extractor 2017-01-11 18:51:25 -05:00
86 changed files with 857 additions and 1898 deletions

.github/ISSUE_TEMPLATE.md

@@ -6,13 +6,12 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.02.04*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.12.31*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.02.04**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.12.31**
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
 - [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
-- [ ] Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser
 ### What is the purpose of your *issue*?
 - [ ] Bug report (encountered problems with youtube-dl)
@@ -36,7 +35,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2018.02.04
+[debug] youtube-dl version 2017.12.31
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

.github/ISSUE_TEMPLATE_tmpl.md

@@ -12,7 +12,6 @@
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
 - [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
-- [ ] Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser
 ### What is the purpose of your *issue*?
 - [ ] Bug report (encountered problems with youtube-dl)

AUTHORS

@@ -231,5 +231,3 @@ John Dong
 Tatsuyuki Ishi
 Daniel Weber
 Kay Bouché
-Yang Hongbo
-Lei Wang

ChangeLog

@@ -1,122 +1,9 @@
-version 2018.02.04
+version <unreleased>
-Core
-* [downloader/http] Randomize HTTP chunk size
-+ [downloader/http] Add ability to pass downloader options via info dict
-* [downloader/http] Fix 302 infinite loops by not reusing requests
-+ Document http_chunk_size
 Extractors
-+ [brightcove] Pass embed page URL as referrer (#15486)
-+ [youtube] Enforce using chunked HTTP downloading for DASH formats
-version 2018.02.03
-Core
-+ Introduce --http-chunk-size for chunk-based HTTP downloading
-+ Add support for IronPython
-* [downloader/ism] Fix Python 3.2 support
-Extractors
-* [redbulltv] Fix extraction (#15481)
-* [redtube] Fix metadata extraction (#15472)
-* [pladform] Respect platform id and extract HLS formats (#15468)
-- [rtlnl] Remove progressive formats (#15459)
-* [6play] Do no modify asset URLs with a token (#15248)
-* [nationalgeographic] Relax URL regular expression
-* [dplay] Relax URL regular expression (#15458)
-* [cbsinteractive] Fix data extraction (#15451)
-+ [amcnetworks] Add support for sundancetv.com (#9260)
-version 2018.01.27
-Core
-* [extractor/common] Improve _json_ld for articles
-* Switch codebase to use compat_b64decode
-+ [compat] Add compat_b64decode
-Extractors
-+ [seznamzpravy] Add support for seznam.cz and seznamzpravy.cz (#14102, #14616)
-* [dplay] Bypass geo restriction
-+ [dplay] Add support for disco-api videos (#15396)
-* [youtube] Extract precise error messages (#15284)
-* [teachertube] Capture and output error message
-* [teachertube] Fix and relax thumbnail extraction (#15403)
-+ [prosiebensat1] Add another clip id regular expression (#15378)
-* [tbs] Update tokenizer url (#15395)
-* [mixcloud] Use compat_b64decode (#15394)
-- [thesixtyone] Remove extractor (#15341)
-version 2018.01.21
-Core
-* [extractor/common] Improve jwplayer DASH formats extraction (#9242, #15187)
-* [utils] Improve scientific notation handling in js_to_json (#14789)
-Extractors
-+ [southparkdk] Add support for southparkstudios.nu
-+ [southpark] Add support for collections (#14803)
-* [franceinter] Fix upload date extraction (#14996)
-+ [rtvs] Add support for rtvs.sk (#9242, #15187)
-* [restudy] Fix extraction and extend URL regular expression (#15347)
-* [youtube:live] Improve live detection (#15365)
-+ [springboardplatform] Add support for springboardplatform.com
-* [prosiebensat1] Add another clip id regular expression (#15290)
-- [ringtv] Remove extractor (#15345)
-version 2018.01.18
-Extractors
-* [soundcloud] Update client id (#15306)
-- [kamcord] Remove extractor (#15322)
-+ [spiegel] Add support for nexx videos (#15285)
-* [twitch] Fix authentication and error capture (#14090, #15264)
-* [vk] Detect more errors due to copyright complaints (#15259)
-version 2018.01.14
-Extractors
-* [youtube] Fix live streams extraction (#15202)
-* [wdr] Bypass geo restriction
-* [wdr] Rework extractors (#14598)
-+ [wdr] Add support for wdrmaus.de/elefantenseite (#14598)
-+ [gamestar] Add support for gamepro.de (#3384)
-* [viafree] Skip rtmp formats (#15232)
-+ [pandoratv] Add support for mobile URLs (#12441)
-+ [pandoratv] Add support for new URL format (#15131)
-+ [ximalaya] Add support for ximalaya.com (#14687)
-+ [digg] Add support for digg.com (#15214)
-* [limelight] Tolerate empty pc formats (#15150, #15151, #15207)
-* [ndr:embed:base] Make separate formats extraction non fatal (#15203)
-+ [weibo] Add extractor (#15079)
-+ [ok] Add support for live streams
-* [canalplus] Fix extraction (#15072)
-* [bilibili] Fix extraction (#15188)
-version 2018.01.07
-Core
-* [utils] Fix youtube-dl under PyPy3 on Windows
-* [YoutubeDL] Output python implementation in debug header
-Extractors
-+ [jwplatform] Add support for multiple embeds (#15192)
-* [mitele] Fix extraction (#15186)
-+ [motherless] Add support for groups (#15124)
-* [lynda] Relax URL regular expression (#15185)
-* [soundcloud] Fallback to avatar picture for thumbnail (#12878)
 * [youku] Fix list extraction (#15135)
 * [openload] Fix extraction (#15166)
-* [lynda] Skip invalid subtitles (#15159)
-* [twitch] Pass video id to url_result when extracting playlist (#15139)
 * [rtve.es:alacarta] Fix extraction of some new URLs
-* [acast] Fix extraction (#15147)
 version 2017.12.31

README.md

@@ -46,7 +46,7 @@ Or with [MacPorts](https://www.macports.org/):
 Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
 # DESCRIPTION
-**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on macOS. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
+**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on Mac OS X. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
     youtube-dl [OPTIONS] URL [URL...]
@@ -198,11 +198,6 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
                                      size. By default, the buffer size is
                                      automatically resized from an initial value
                                      of SIZE.
-    --http-chunk-size SIZE           Size of a chunk for chunk-based HTTP
-                                     downloading (e.g. 10485760 or 10M) (default
-                                     is disabled). May be useful for bypassing
-                                     bandwidth throttling imposed by a webserver
-                                     (experimental)
     --playlist-reverse               Download playlist videos in reverse order
     --playlist-random                Download playlist videos in random order
     --xattr-set-filesize             Set file xattribute ytdl.filesize with
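The hunk above removes the `--http-chunk-size` documentation only because the branch predates the option (it was introduced in 2018.02.03, per the ChangeLog). A minimal sketch, assuming the master-side 2018.02.04 API, of how the option maps onto the Python embedding interface; the `http_chunk_size` params key comes from the YoutubeDL.py hunk further down, and the URL is just the test video used elsewhere in this template:

```python
from youtube_dl import YoutubeDL

# 'http_chunk_size' mirrors --http-chunk-size: each HTTP request asks for at
# most this many bytes via a Range header (10485760 bytes = 10M).
ydl = YoutubeDL({'http_chunk_size': 10485760})
ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```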
@@ -868,7 +863,7 @@ Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
 In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
-Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
+Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, Mac OS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
 Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
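As a companion to the cookies paragraph above, a hedged sketch of the same workflow through the embedding API; `cookiefile` is the params counterpart of `--cookies`, and the paths are placeholders:

```python
from youtube_dl import YoutubeDL

# The file must be in Mozilla/Netscape format, as the README text above notes.
ydl = YoutubeDL({'cookiefile': '/path/to/cookies/file.txt'})
ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```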

docs/supportedsites.md

@@ -128,7 +128,7 @@
 - **CamdemyFolder**
 - **CamWithHer**
 - **canalc2.tv**
-- **Canalplus**: mycanal.fr and piwiplus.fr
+- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
 - **Canvas**
 - **CanvasEen**: canvas.be and een.be
 - **CarambaTV**
@@ -210,7 +210,6 @@
 - **defense.gouv.fr**
 - **democracynow**
 - **DHM**: Filmarchiv - Deutsches Historisches Museum
-- **Digg**
 - **DigitallySpeaking**
 - **Digiteka**
 - **Discovery**
@@ -383,6 +382,7 @@
 - **JWPlatform**
 - **Kakao**
 - **Kaltura**
+- **Kamcord**
 - **KanalPlay**: Kanal 5/9/11 Play
 - **Kankan**
 - **Karaoketv**
@@ -478,7 +478,6 @@
 - **Moniker**: allmyvideos.net and vidspot.net
 - **Morningstar**: morningstar.com
 - **Motherless**
-- **MotherlessGroup**
 - **Motorsport**: motorsport.com
 - **MovieClips**
 - **MovieFap**
@@ -682,6 +681,7 @@
 - **revision**
 - **revision3:embed**
 - **RICE**
+- **RingTV**
 - **RMCDecouverte**
 - **RockstarGames**
 - **RoosterTeeth**
@@ -702,7 +702,6 @@
 - **rtve.es:live**: RTVE.es live streams
 - **rtve.es:television**
 - **RTVNH**
-- **RTVS**
 - **Rudo**
 - **RUHD**
 - **RulePorn**
@@ -732,8 +731,6 @@
 - **ServingSys**
 - **Servus**
 - **Sexu**
-- **SeznamZpravy**
-- **SeznamZpravyArticle**
 - **Shahid**
 - **ShahidShow**
 - **Shared**: shared.sx
@@ -775,7 +772,7 @@
 - **Sport5**
 - **SportBoxEmbed**
 - **SportDeutschland**
-- **SpringboardPlatform**
+- **Sportschau**
 - **Sprout**
 - **sr:mediathek**: Saarländischer Rundfunk
 - **SRGSSR**
@@ -824,6 +821,7 @@
 - **ThePlatform**
 - **ThePlatformFeed**
 - **TheScene**
+- **TheSixtyOne**
 - **TheStar**
 - **TheSun**
 - **TheWeatherChannel**
@@ -1003,14 +1001,10 @@
 - **WatchIndianPorn**: Watch Indian Porn
 - **WDR**
 - **wdr:mobile**
-- **WDRElefant**
-- **WDRPage**
 - **Webcaster**
 - **WebcasterFeed**
 - **WebOfStories**
 - **WebOfStoriesPlaylist**
-- **Weibo**
-- **WeiboMobile**
 - **WeiqiTV**: WQTV
 - **wholecloud**: WholeCloud
 - **Wimp**
@@ -1030,8 +1024,6 @@
 - **xiami:artist**: 虾米音乐 - 歌手
 - **xiami:collection**: 虾米音乐 - 精选集
 - **xiami:song**: 虾米音乐
-- **ximalaya**: 喜马拉雅FM
-- **ximalaya:album**: 喜马拉雅FM 专辑
 - **XMinus**
 - **XNXX**
 - **Xstream**

setup.cfg

@@ -3,4 +3,4 @@ universal = True
 [flake8]
 exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git
-ignore = E402,E501,E731,E741
+ignore = E402,E501,E731

test/test_download.py

@@ -92,8 +92,8 @@ class TestDownload(unittest.TestCase):
 def generator(test_case, tname):
     def test_template(self):
-        ie = youtube_dl.extractor.get_info_extractor(test_case['name'])()
-        other_ies = [get_info_extractor(ie_key)() for ie_key in test_case.get('add_ie', [])]
+        ie = youtube_dl.extractor.get_info_extractor(test_case['name'])
+        other_ies = [get_info_extractor(ie_key) for ie_key in test_case.get('add_ie', [])]
         is_playlist = any(k.startswith('playlist') for k in test_case)
         test_cases = test_case.get(
             'playlist', [] if is_playlist else [test_case])

test/test_downloader_http.py

@@ -1,125 +0,0 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals

# Allow direct execution
import os
import re
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from test.helper import try_rm
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server
from youtube_dl.downloader.http import HttpFD
from youtube_dl.utils import encodeFilename
import ssl
import threading

TEST_DIR = os.path.dirname(os.path.abspath(__file__))


def http_server_port(httpd):
    if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
        # In Jython SSLSocket is not a subclass of socket.socket
        sock = httpd.socket.sock
    else:
        sock = httpd.socket
    return sock.getsockname()[1]


TEST_SIZE = 10 * 1024


class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
    def log_message(self, format, *args):
        pass

    def send_content_range(self, total=None):
        range_header = self.headers.get('Range')
        start = end = None
        if range_header:
            mobj = re.search(r'^bytes=(\d+)-(\d+)', range_header)
            if mobj:
                start = int(mobj.group(1))
                end = int(mobj.group(2))
        valid_range = start is not None and end is not None
        if valid_range:
            content_range = 'bytes %d-%d' % (start, end)
            if total:
                content_range += '/%d' % total
            self.send_header('Content-Range', content_range)
        return (end - start + 1) if valid_range else total

    def serve(self, range=True, content_length=True):
        self.send_response(200)
        self.send_header('Content-Type', 'video/mp4')
        size = TEST_SIZE
        if range:
            size = self.send_content_range(TEST_SIZE)
        if content_length:
            self.send_header('Content-Length', size)
        self.end_headers()
        self.wfile.write(b'#' * size)

    def do_GET(self):
        if self.path == '/regular':
            self.serve()
        elif self.path == '/no-content-length':
            self.serve(content_length=False)
        elif self.path == '/no-range':
            self.serve(range=False)
        elif self.path == '/no-range-no-content-length':
            self.serve(range=False, content_length=False)
        else:
            assert False


class FakeLogger(object):
    def debug(self, msg):
        pass

    def warning(self, msg):
        pass

    def error(self, msg):
        pass


class TestHttpFD(unittest.TestCase):
    def setUp(self):
        self.httpd = compat_http_server.HTTPServer(
            ('127.0.0.1', 0), HTTPTestRequestHandler)
        self.port = http_server_port(self.httpd)
        self.server_thread = threading.Thread(target=self.httpd.serve_forever)
        self.server_thread.daemon = True
        self.server_thread.start()

    def download(self, params, ep):
        params['logger'] = FakeLogger()
        ydl = YoutubeDL(params)
        downloader = HttpFD(ydl, params)
        filename = 'testfile.mp4'
        try_rm(encodeFilename(filename))
        self.assertTrue(downloader.real_download(filename, {
            'url': 'http://127.0.0.1:%d/%s' % (self.port, ep),
        }))
        self.assertEqual(os.path.getsize(encodeFilename(filename)), TEST_SIZE)
        try_rm(encodeFilename(filename))

    def download_all(self, params):
        for ep in ('regular', 'no-content-length', 'no-range', 'no-range-no-content-length'):
            self.download(params, ep)

    def test_regular(self):
        self.download_all({})

    def test_chunked(self):
        self.download_all({
            'http_chunk_size': 1000,
        })


if __name__ == '__main__':
    unittest.main()

test/test_http.py

@@ -47,7 +47,7 @@ class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
             self.end_headers()
             return
-        new_url = 'http://127.0.0.1:%d/中文.html' % http_server_port(self.server)
+        new_url = 'http://localhost:%d/中文.html' % http_server_port(self.server)
         self.send_response(302)
         self.send_header(b'Location', new_url.encode('utf-8'))
         self.end_headers()
@@ -74,7 +74,7 @@ class FakeLogger(object):
 class TestHTTP(unittest.TestCase):
     def setUp(self):
         self.httpd = compat_http_server.HTTPServer(
-            ('127.0.0.1', 0), HTTPTestRequestHandler)
+            ('localhost', 0), HTTPTestRequestHandler)
         self.port = http_server_port(self.httpd)
         self.server_thread = threading.Thread(target=self.httpd.serve_forever)
         self.server_thread.daemon = True
@@ -86,15 +86,15 @@ class TestHTTP(unittest.TestCase):
             return
         ydl = YoutubeDL({'logger': FakeLogger()})
-        r = ydl.extract_info('http://127.0.0.1:%d/302' % self.port)
-        self.assertEqual(r['entries'][0]['url'], 'http://127.0.0.1:%d/vid.mp4' % self.port)
+        r = ydl.extract_info('http://localhost:%d/302' % self.port)
+        self.assertEqual(r['entries'][0]['url'], 'http://localhost:%d/vid.mp4' % self.port)
 class TestHTTPS(unittest.TestCase):
     def setUp(self):
         certfn = os.path.join(TEST_DIR, 'testcert.pem')
         self.httpd = compat_http_server.HTTPServer(
-            ('127.0.0.1', 0), HTTPTestRequestHandler)
+            ('localhost', 0), HTTPTestRequestHandler)
         self.httpd.socket = ssl.wrap_socket(
             self.httpd.socket, certfile=certfn, server_side=True)
         self.port = http_server_port(self.httpd)
@@ -107,11 +107,11 @@ class TestHTTPS(unittest.TestCase):
         ydl = YoutubeDL({'logger': FakeLogger()})
         self.assertRaises(
             Exception,
-            ydl.extract_info, 'https://127.0.0.1:%d/video.html' % self.port)
+            ydl.extract_info, 'https://localhost:%d/video.html' % self.port)
         ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True})
-        r = ydl.extract_info('https://127.0.0.1:%d/video.html' % self.port)
-        self.assertEqual(r['entries'][0]['url'], 'https://127.0.0.1:%d/vid.mp4' % self.port)
+        r = ydl.extract_info('https://localhost:%d/video.html' % self.port)
+        self.assertEqual(r['entries'][0]['url'], 'https://localhost:%d/vid.mp4' % self.port)
 def _build_proxy_handler(name):
@@ -132,23 +132,23 @@ def _build_proxy_handler(name):
 class TestProxy(unittest.TestCase):
     def setUp(self):
         self.proxy = compat_http_server.HTTPServer(
-            ('127.0.0.1', 0), _build_proxy_handler('normal'))
+            ('localhost', 0), _build_proxy_handler('normal'))
         self.port = http_server_port(self.proxy)
         self.proxy_thread = threading.Thread(target=self.proxy.serve_forever)
         self.proxy_thread.daemon = True
         self.proxy_thread.start()
         self.geo_proxy = compat_http_server.HTTPServer(
-            ('127.0.0.1', 0), _build_proxy_handler('geo'))
+            ('localhost', 0), _build_proxy_handler('geo'))
         self.geo_port = http_server_port(self.geo_proxy)
         self.geo_proxy_thread = threading.Thread(target=self.geo_proxy.serve_forever)
         self.geo_proxy_thread.daemon = True
         self.geo_proxy_thread.start()
     def test_proxy(self):
-        geo_proxy = '127.0.0.1:{0}'.format(self.geo_port)
+        geo_proxy = 'localhost:{0}'.format(self.geo_port)
         ydl = YoutubeDL({
-            'proxy': '127.0.0.1:{0}'.format(self.port),
+            'proxy': 'localhost:{0}'.format(self.port),
             'geo_verification_proxy': geo_proxy,
         })
         url = 'http://foo.com/bar'
@@ -162,7 +162,7 @@ class TestProxy(unittest.TestCase):
     def test_proxy_with_idn(self):
         ydl = YoutubeDL({
-            'proxy': '127.0.0.1:{0}'.format(self.port),
+            'proxy': 'localhost:{0}'.format(self.port),
         })
         url = 'http://中文.tw/'
         response = ydl.urlopen(url).read().decode('utf-8')

test/test_utils.py

@@ -814,9 +814,6 @@ class TestUtil(unittest.TestCase):
         inp = '''{"duration": "00:01:07"}'''
         self.assertEqual(js_to_json(inp), '''{"duration": "00:01:07"}''')
-        inp = '''{segments: [{"offset":-3.885780586188048e-16,"duration":39.75000000000001}]}'''
-        self.assertEqual(js_to_json(inp), '''{"segments": [{"offset":-3.885780586188048e-16,"duration":39.75000000000001}]}''')
     def test_js_to_json_edgecases(self):
         on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
         self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
@@ -888,13 +885,6 @@ class TestUtil(unittest.TestCase):
         on = js_to_json('{/*comment\n*/42/*comment\n*/:/*comment\n*/42/*comment\n*/}')
         self.assertEqual(json.loads(on), {'42': 42})
-        on = js_to_json('{42:4.2e1}')
-        self.assertEqual(json.loads(on), {'42': 42.0})
-    def test_js_to_json_malformed(self):
-        self.assertEqual(js_to_json('42a1'), '42"a1"')
-        self.assertEqual(js_to_json('42a-1'), '42"a"-1')
     def test_extract_attributes(self):
         self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
         self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})

youtube_dl/YoutubeDL.py

@@ -298,8 +298,7 @@ class YoutubeDL(object):
     the downloader (see youtube_dl/downloader/common.py):
     nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
     noresizebuffer, retries, continuedl, noprogress, consoletitle,
-    xattr_set_filesize, external_downloader_args, hls_use_mpegts,
-    http_chunk_size.
+    xattr_set_filesize, external_downloader_args, hls_use_mpegts.
     The following options are used by the post processors:
     prefer_ffmpeg: If True, use ffmpeg instead of avconv if both are available,

youtube_dl/__init__.py

@@ -191,11 +191,6 @@ def _real_main(argv=None):
         if numeric_buffersize is None:
             parser.error('invalid buffer size specified')
         opts.buffersize = numeric_buffersize
-    if opts.http_chunk_size is not None:
-        numeric_chunksize = FileDownloader.parse_bytes(opts.http_chunk_size)
-        if not numeric_chunksize:
-            parser.error('invalid http chunk size specified')
-        opts.http_chunk_size = numeric_chunksize
     if opts.playliststart <= 0:
         raise ValueError('Playlist start must be positive')
     if opts.playlistend not in (-1, None) and opts.playlistend < opts.playliststart:
@@ -351,7 +346,6 @@ def _real_main(argv=None):
         'keep_fragments': opts.keep_fragments,
         'buffersize': opts.buffersize,
         'noresizebuffer': opts.noresizebuffer,
-        'http_chunk_size': opts.http_chunk_size,
         'continuedl': opts.continue_dl,
         'noprogress': opts.noprogress,
         'progress_with_newline': opts.progress_with_newline,
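The removed block validates the new option with `FileDownloader.parse_bytes`, which exists on both sides of this diff. A small illustration (not part of the diff) of what that validation accepts, with expected values inferred from the SIZE syntax the README documents:

```python
from youtube_dl.downloader.common import FileDownloader

print(FileDownloader.parse_bytes('10M'))   # 10485760
print(FileDownloader.parse_bytes('1024'))  # 1024
print(FileDownloader.parse_bytes('10Q'))   # None -> would trigger parser.error above
```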

youtube_dl/aes.py

@@ -1,8 +1,8 @@
 from __future__ import unicode_literals
+import base64
 from math import ceil
-from .compat import compat_b64decode
 from .utils import bytes_to_intlist, intlist_to_bytes
 BLOCK_SIZE_BYTES = 16
@@ -180,7 +180,7 @@ def aes_decrypt_text(data, password, key_size_bytes):
     """
     NONCE_LENGTH_BYTES = 8
-    data = bytes_to_intlist(compat_b64decode(data))
+    data = bytes_to_intlist(base64.b64decode(data.encode('utf-8')))
     password = bytes_to_intlist(password.encode('utf-8'))
     key = password[:key_size_bytes] + [0] * (key_size_bytes - len(password))

youtube_dl/compat.py

@@ -1,7 +1,6 @@
 # coding: utf-8
 from __future__ import unicode_literals
-import base64
 import binascii
 import collections
 import ctypes
@@ -2897,24 +2896,9 @@ except TypeError:
         if isinstance(spec, compat_str):
             spec = spec.encode('ascii')
         return struct.unpack(spec, *args)
-    class compat_Struct(struct.Struct):
-        def __init__(self, fmt):
-            if isinstance(fmt, compat_str):
-                fmt = fmt.encode('ascii')
-            super(compat_Struct, self).__init__(fmt)
 else:
     compat_struct_pack = struct.pack
     compat_struct_unpack = struct.unpack
-    if platform.python_implementation() == 'IronPython' and sys.version_info < (2, 7, 8):
-        class compat_Struct(struct.Struct):
-            def unpack(self, string):
-                if not isinstance(string, buffer):  # noqa: F821
-                    string = buffer(string)  # noqa: F821
-                return super(compat_Struct, self).unpack(string)
-    else:
-        compat_Struct = struct.Struct
 try:
     from future_builtins import zip as compat_zip
@@ -2924,16 +2908,6 @@ except ImportError:  # not 2.6+ or is 3.x
 except ImportError:
     compat_zip = zip
-if sys.version_info < (3, 3):
-    def compat_b64decode(s, *args, **kwargs):
-        if isinstance(s, compat_str):
-            s = s.encode('ascii')
-        return base64.b64decode(s, *args, **kwargs)
-else:
-    compat_b64decode = base64.b64decode
 if platform.python_implementation() == 'PyPy' and sys.pypy_version_info < (5, 4, 0):
     # PyPy2 prior to version 5.4.0 expects byte strings as Windows function
     # names, see the original PyPy issue [1] and the youtube-dl one [2].
@@ -2956,8 +2930,6 @@ __all__ = [
     'compat_HTMLParseError',
     'compat_HTMLParser',
     'compat_HTTPError',
-    'compat_Struct',
-    'compat_b64decode',
     'compat_basestring',
     'compat_chr',
     'compat_cookiejar',
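A minimal sketch (not from the diff) of why the removed `compat_b64decode` shim matters: before Python 3.3, `base64.b64decode()` rejects text strings, so the master-side helper encodes them to ASCII bytes first. The helper name below is hypothetical; it just mirrors the deleted definition above:

```python
import base64
import sys

def b64decode_compat(s):
    # mirrors the compat_b64decode behaviour shown in the removed hunk
    if sys.version_info < (3, 3) and isinstance(s, type(u'')):
        s = s.encode('ascii')
    return base64.b64decode(s)

print(b64decode_compat(u'eW91dHViZS1kbA=='))  # b'youtube-dl'
```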

youtube_dl/downloader/common.py

@@ -49,9 +49,6 @@ class FileDownloader(object):
     external_downloader_args: A list of additional command-line arguments for the
                         external downloader.
     hls_use_mpegts:     Use the mpegts container for HLS videos.
-    http_chunk_size:    Size of a chunk for chunk-based HTTP downloading. May be
-                        useful for bypassing bandwidth throttling imposed by
-                        a webserver (experimental)
     Subclasses of this one must re-define the real_download method.
     """

youtube_dl/downloader/f4m.py

@@ -1,12 +1,12 @@
 from __future__ import division, unicode_literals
+import base64
 import io
 import itertools
 import time
 from .fragment import FragmentFD
 from ..compat import (
-    compat_b64decode,
     compat_etree_fromstring,
     compat_urlparse,
     compat_urllib_error,
@@ -312,7 +312,7 @@ class F4mFD(FragmentFD):
             boot_info = self._get_bootstrap_from_url(bootstrap_url)
         else:
             bootstrap_url = None
-            bootstrap = compat_b64decode(node.text)
+            bootstrap = base64.b64decode(node.text.encode('ascii'))
             boot_info = read_bootstrap_info(bootstrap)
         return boot_info, bootstrap_url
@@ -349,7 +349,7 @@ class F4mFD(FragmentFD):
         live = boot_info['live']
         metadata_node = media.find(_add_ns('metadata'))
         if metadata_node is not None:
-            metadata = compat_b64decode(metadata_node.text)
+            metadata = base64.b64decode(metadata_node.text.encode('ascii'))
         else:
             metadata = None

youtube_dl/downloader/http.py

@@ -4,18 +4,13 @@ import errno
 import os
 import socket
 import time
-import random
 import re
 from .common import FileDownloader
-from ..compat import (
-    compat_str,
-    compat_urllib_error,
-)
+from ..compat import compat_urllib_error
 from ..utils import (
     ContentTooShortError,
     encodeFilename,
-    int_or_none,
     sanitize_open,
     sanitized_Request,
     write_xattr,
@@ -43,26 +38,21 @@ class HttpFD(FileDownloader):
         add_headers = info_dict.get('http_headers')
         if add_headers:
             headers.update(add_headers)
+        basic_request = sanitized_Request(url, None, headers)
+        request = sanitized_Request(url, None, headers)
         is_test = self.params.get('test', False)
-        chunk_size = self._TEST_FILE_SIZE if is_test else (
-            info_dict.get('downloader_options', {}).get('http_chunk_size') or
-            self.params.get('http_chunk_size') or 0)
+        if is_test:
+            request.add_header('Range', 'bytes=0-%s' % str(self._TEST_FILE_SIZE - 1))
         ctx.open_mode = 'wb'
         ctx.resume_len = 0
-        ctx.data_len = None
-        ctx.block_size = self.params.get('buffersize', 1024)
-        ctx.start_time = time.time()
-        ctx.chunk_size = None
         if self.params.get('continuedl', True):
             # Establish possible resume length
             if os.path.isfile(encodeFilename(ctx.tmpfilename)):
-                ctx.resume_len = os.path.getsize(
-                    encodeFilename(ctx.tmpfilename))
-            ctx.is_resume = ctx.resume_len > 0
+                ctx.resume_len = os.path.getsize(encodeFilename(ctx.tmpfilename))
         count = 0
         retries = self.params.get('retries', 0)
@@ -74,36 +64,11 @@ class HttpFD(FileDownloader):
             def __init__(self, source_error):
                 self.source_error = source_error
-        class NextFragment(Exception):
-            pass
-        def set_range(req, start, end):
-            range_header = 'bytes=%d-' % start
-            if end:
-                range_header += compat_str(end)
-            req.add_header('Range', range_header)
         def establish_connection():
-            ctx.chunk_size = (random.randint(int(chunk_size * 0.95), chunk_size)
-                              if not is_test and chunk_size else chunk_size)
-            if ctx.resume_len > 0:
-                range_start = ctx.resume_len
-                if ctx.is_resume:
-                    self.report_resuming_byte(ctx.resume_len)
+            if ctx.resume_len != 0:
+                self.report_resuming_byte(ctx.resume_len)
+                request.add_header('Range', 'bytes=%d-' % ctx.resume_len)
                 ctx.open_mode = 'ab'
-            elif ctx.chunk_size > 0:
-                range_start = 0
-            else:
-                range_start = None
-            ctx.is_resume = False
-            range_end = range_start + ctx.chunk_size - 1 if ctx.chunk_size else None
-            if range_end and ctx.data_len is not None and range_end >= ctx.data_len:
-                range_end = ctx.data_len - 1
-            has_range = range_start is not None
-            ctx.has_range = has_range
-            request = sanitized_Request(url, None, headers)
-            if has_range:
-                set_range(request, range_start, range_end)
             # Establish connection
             try:
                 ctx.data = self.ydl.urlopen(request)
@@ -112,40 +77,29 @@ class HttpFD(FileDownloader):
                 # that don't support resuming and serve a whole file with no Content-Range
                 # set in response despite of requested Range (see
                 # https://github.com/rg3/youtube-dl/issues/6057#issuecomment-126129799)
-                if has_range:
+                if ctx.resume_len > 0:
                     content_range = ctx.data.headers.get('Content-Range')
                     if content_range:
-                        content_range_m = re.search(r'bytes (\d+)-(\d+)?(?:/(\d+))?', content_range)
+                        content_range_m = re.search(r'bytes (\d+)-', content_range)
                         # Content-Range is present and matches requested Range, resume is possible
-                        if content_range_m:
-                            if range_start == int(content_range_m.group(1)):
-                                content_range_end = int_or_none(content_range_m.group(2))
-                                content_len = int_or_none(content_range_m.group(3))
-                                accept_content_len = (
-                                    # Non-chunked download
-                                    not ctx.chunk_size or
-                                    # Chunked download and requested piece or
-                                    # its part is promised to be served
-                                    content_range_end == range_end or
-                                    content_len < range_end)
-                                if accept_content_len:
-                                    ctx.data_len = content_len
-                                    return
+                        if content_range_m and ctx.resume_len == int(content_range_m.group(1)):
+                            return
                     # Content-Range is either not present or invalid. Assuming remote webserver is
                     # trying to send the whole file, resume is not possible, so wiping the local file
                     # and performing entire redownload
                     self.report_unable_to_resume()
                     ctx.resume_len = 0
                     ctx.open_mode = 'wb'
-                ctx.data_len = int_or_none(ctx.data.info().get('Content-length', None))
                 return
             except (compat_urllib_error.HTTPError, ) as err:
-                if err.code == 416:
+                if (err.code < 500 or err.code >= 600) and err.code != 416:
+                    # Unexpected HTTP error
+                    raise
+                elif err.code == 416:
                     # Unable to resume (requested range not satisfiable)
                     try:
                         # Open the connection again without the range header
-                        ctx.data = self.ydl.urlopen(
-                            sanitized_Request(url, None, headers))
+                        ctx.data = self.ydl.urlopen(basic_request)
                         content_length = ctx.data.info()['Content-Length']
                     except (compat_urllib_error.HTTPError, ) as err:
                         if err.code < 500 or err.code >= 600:
@@ -176,9 +130,6 @@ class HttpFD(FileDownloader):
                         ctx.resume_len = 0
                         ctx.open_mode = 'wb'
                         return
-                elif err.code < 500 or err.code >= 600:
-                    # Unexpected HTTP error
-                    raise
                 raise RetryDownload(err)
             except socket.error as err:
                 if err.errno != errno.ECONNRESET:
@@ -209,7 +160,7 @@ class HttpFD(FileDownloader):
                 return False
         byte_counter = 0 + ctx.resume_len
-        block_size = ctx.block_size
+        block_size = self.params.get('buffersize', 1024)
         start = time.time()
         # measure time over whole while-loop, so slow_down() and best_block_size() work together properly
@@ -282,30 +233,25 @@ class HttpFD(FileDownloader):
                 # Progress message
                 speed = self.calc_speed(start, now, byte_counter - ctx.resume_len)
-                if ctx.data_len is None:
+                if data_len is None:
                     eta = None
                 else:
-                    eta = self.calc_eta(start, time.time(), ctx.data_len - ctx.resume_len, byte_counter - ctx.resume_len)
+                    eta = self.calc_eta(start, time.time(), data_len - ctx.resume_len, byte_counter - ctx.resume_len)
                 self._hook_progress({
                     'status': 'downloading',
                     'downloaded_bytes': byte_counter,
-                    'total_bytes': ctx.data_len,
+                    'total_bytes': data_len,
                     'tmpfilename': ctx.tmpfilename,
                     'filename': ctx.filename,
                     'eta': eta,
                     'speed': speed,
-                    'elapsed': now - ctx.start_time,
+                    'elapsed': now - start,
                 })
                 if is_test and byte_counter == data_len:
                     break
-            if not is_test and ctx.chunk_size and ctx.data_len is not None and byte_counter < ctx.data_len:
-                ctx.resume_len = byte_counter
-                # ctx.block_size = block_size
-                raise NextFragment()
             if ctx.stream is None:
                 self.to_stderr('\n')
                 self.report_error('Did not get any data blocks')
@@ -330,7 +276,7 @@ class HttpFD(FileDownloader):
                 'total_bytes': byte_counter,
                 'filename': ctx.filename,
                 'status': 'finished',
-                'elapsed': time.time() - ctx.start_time,
+                'elapsed': time.time() - start,
             })
             return True
@@ -344,8 +290,6 @@ class HttpFD(FileDownloader):
                 if count <= retries:
                     self.report_retry(e.source_error, count, retries)
                     continue
-            except NextFragment:
-                continue
             except SucceedDownload:
                 return True
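A rough sketch of the chunked scheme the master side of this file implements (assumptions, not the diff's own code): `establish_connection()` issues a randomized-size Range request, and a `NextFragment` exception restarts the loop from the last downloaded byte until the total advertised in Content-Range is reached. The helper below only illustrates how such a Range header would be computed:

```python
import random

def next_range(resume_len, chunk_size, data_len):
    # randomized chunk size, echoing the master-side establish_connection()
    size = random.randint(int(chunk_size * 0.95), chunk_size)
    start = resume_len
    end = start + size - 1
    if data_len is not None and end >= data_len:
        end = data_len - 1  # never ask past the known total length
    return 'bytes=%d-%d' % (start, end)

print(next_range(0, 10 * 1024 * 1024, None))  # e.g. 'bytes=0-10271532'
```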

youtube_dl/downloader/ism.py

@@ -1,27 +1,25 @@
 from __future__ import unicode_literals
 import time
+import struct
 import binascii
 import io
 from .fragment import FragmentFD
-from ..compat import (
-    compat_Struct,
-    compat_urllib_error,
-)
+from ..compat import compat_urllib_error
-u8 = compat_Struct('>B')
-u88 = compat_Struct('>Bx')
-u16 = compat_Struct('>H')
-u1616 = compat_Struct('>Hxx')
-u32 = compat_Struct('>I')
-u64 = compat_Struct('>Q')
-s88 = compat_Struct('>bx')
-s16 = compat_Struct('>h')
-s1616 = compat_Struct('>hxx')
-s32 = compat_Struct('>i')
+u8 = struct.Struct(b'>B')
+u88 = struct.Struct(b'>Bx')
+u16 = struct.Struct(b'>H')
+u1616 = struct.Struct(b'>Hxx')
+u32 = struct.Struct(b'>I')
+u64 = struct.Struct(b'>Q')
+s88 = struct.Struct(b'>bx')
+s16 = struct.Struct(b'>h')
+s1616 = struct.Struct(b'>hxx')
+s32 = struct.Struct(b'>i')
 unity_matrix = (s32.pack(0x10000) + s32.pack(0) * 3) * 2 + s32.pack(0x40000000)
@@ -141,7 +139,7 @@ def write_piff_header(stream, params):
         sample_entry_payload += u16.pack(0x18)  # depth
         sample_entry_payload += s16.pack(-1)  # pre defined
-        codec_private_data = binascii.unhexlify(params['codec_private_data'].encode('utf-8'))
+        codec_private_data = binascii.unhexlify(params['codec_private_data'])
         if fourcc in ('H264', 'AVC1'):
             sps, pps = codec_private_data.split(u32.pack(1))[1:]
             avcc_payload = u8.pack(1)  # configuration version
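A quick illustration (assuming plain CPython, outside either side of the diff) of the big-endian Struct helpers above: `'>I'` packs an unsigned 32-bit integer, and `'>Hxx'` packs a 16-bit value followed by two pad bytes:

```python
import struct

u32 = struct.Struct(b'>I')
u1616 = struct.Struct(b'>Hxx')
print(u32.pack(1))       # b'\x00\x00\x00\x01'
print(u1616.pack(0x10))  # b'\x00\x10\x00\x00'
```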

youtube_dl/extractor/adn.py

@@ -1,15 +1,13 @@
 # coding: utf-8
 from __future__ import unicode_literals
+import base64
 import json
 import os
 from .common import InfoExtractor
 from ..aes import aes_cbc_decrypt
-from ..compat import (
-    compat_b64decode,
-    compat_ord,
-)
+from ..compat import compat_ord
 from ..utils import (
     bytes_to_intlist,
     ExtractorError,
@@ -50,9 +48,9 @@ class ADNIE(InfoExtractor):
         # http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
         dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
-            bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
+            bytes_to_intlist(base64.b64decode(enc_subtitles[24:])),
             bytes_to_intlist(b'\x1b\xe0\x29\x61\x38\x94\x24\x00\x12\xbd\xc5\x80\xac\xce\xbe\xb0'),
-            bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
+            bytes_to_intlist(base64.b64decode(enc_subtitles[:24]))
         ))
         subtitles_json = self._parse_json(
             dec_subtitles[:-compat_ord(dec_subtitles[-1])].decode(),

youtube_dl/extractor/amcnetworks.py

@@ -11,7 +11,7 @@ from ..utils import (
 class AMCNetworksIE(ThePlatformIE):
-    _VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?:movies|shows(?:/[^/]+)+)/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies|shows(?:/[^/]+)+)/(?P<id>[^/?#]+)'
     _TESTS = [{
         'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
         'md5': '',
@@ -51,9 +51,6 @@ class AMCNetworksIE(ThePlatformIE):
     }, {
         'url': 'http://www.wetv.com/shows/la-hair/videos/season-05/episode-09-episode-9-2/episode-9-sneak-peek-3',
         'only_matching': True,
-    }, {
-        'url': 'https://www.sundancetv.com/shows/riviera/full-episodes/season-1/episode-01-episode-1',
-        'only_matching': True,
     }]
     def _real_extract(self, url):
def _real_extract(self, url): def _real_extract(self, url):

youtube_dl/extractor/bigflix.py

@@ -1,13 +1,11 @@
 # coding: utf-8
 from __future__ import unicode_literals
+import base64
 import re
 from .common import InfoExtractor
-from ..compat import (
-    compat_b64decode,
-    compat_urllib_parse_unquote,
-)
+from ..compat import compat_urllib_parse_unquote
 class BigflixIE(InfoExtractor):
@@ -41,8 +39,8 @@ class BigflixIE(InfoExtractor):
             webpage, 'title')
         def decode_url(quoted_b64_url):
-            return compat_b64decode(compat_urllib_parse_unquote(
-                quoted_b64_url)).decode('utf-8')
+            return base64.b64decode(compat_urllib_parse_unquote(
+                quoted_b64_url).encode('ascii')).decode('utf-8')
         formats = []
         for height, encoded_url in re.findall(

youtube_dl/extractor/bilibili.py

@@ -102,7 +102,6 @@ class BiliBiliIE(InfoExtractor):
                 video_id, anime_id, compat_urlparse.urljoin(url, '//bangumi.bilibili.com/anime/%s' % anime_id)))
         headers = {
             'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
-            'Referer': url
         }
         headers.update(self.geo_verification_headers())
@@ -117,15 +116,10 @@ class BiliBiliIE(InfoExtractor):
         payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
         sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
-        headers = {
-            'Referer': url
-        }
-        headers.update(self.geo_verification_headers())
         video_info = self._download_json(
             'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
             video_id, note='Downloading video info page',
-            headers=headers)
+            headers=self.geo_verification_headers())
         if 'durl' not in video_info:
             self._report_error(video_info)

youtube_dl/extractor/brightcove.py

@@ -690,17 +690,10 @@ class BrightcoveNewIE(AdobePassIE):
             webpage, 'policy key', group='pk')
         api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s' % (account_id, video_id)
-        headers = {
-            'Accept': 'application/json;pk=%s' % policy_key,
-        }
-        referrer = smuggled_data.get('referrer')
-        if referrer:
-            headers.update({
-                'Referer': referrer,
-                'Origin': re.search(r'https?://[^/]+', referrer).group(0),
-            })
         try:
-            json_data = self._download_json(api_url, video_id, headers=headers)
+            json_data = self._download_json(api_url, video_id, headers={
+                'Accept': 'application/json;pk=%s' % policy_key
+            })
         except ExtractorError as e:
             if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
                 json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
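A hedged illustration (not part of the diff) of how an embed page URL can reach the master-side code above as `smuggled_data['referrer']`: `smuggle_url`/`unsmuggle_url` live in `youtube_dl.utils`, while the account/video IDs below are placeholders:

```python
from youtube_dl.utils import smuggle_url, unsmuggle_url

# An embedding extractor would smuggle its own page URL into the player URL.
player_url = smuggle_url(
    'http://players.brightcove.net/123456789/default_default/index.html?videoId=987',
    {'referrer': 'http://example.com/page-embedding-the-player'})

url, smuggled_data = unsmuggle_url(player_url, {})
print(smuggled_data['referrer'])  # the Referer/Origin source used above
```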

youtube_dl/extractor/canalplus.py

@ -4,36 +4,59 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse
from ..utils import ( from ..utils import (
dict_get,
# ExtractorError, # ExtractorError,
# HEADRequest, # HEADRequest,
int_or_none, int_or_none,
qualities, qualities,
remove_end,
unified_strdate, unified_strdate,
) )
class CanalplusIE(InfoExtractor): class CanalplusIE(InfoExtractor):
IE_DESC = 'mycanal.fr and piwiplus.fr' IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
_VALID_URL = r'https?://(?:www\.)?(?P<site>mycanal|piwiplus)\.fr/(?:[^/]+/)*(?P<display_id>[^?/]+)(?:\.html\?.*\bvid=|/p/)(?P<id>\d+)' _VALID_URL = r'''(?x)
https?://
(?:
(?:
(?:(?:www|m)\.)?canalplus\.fr|
(?:www\.)?piwiplus\.fr|
(?:www\.)?d8\.tv|
(?:www\.)?c8\.fr|
(?:www\.)?d17\.tv|
(?:(?:football|www)\.)?cstar\.fr|
(?:www\.)?itele\.fr
)/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
player\.canalplus\.fr/#/(?P<id>\d+)
)
'''
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json' _VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
_SITE_ID_MAP = { _SITE_ID_MAP = {
'mycanal': 'cplus', 'canalplus': 'cplus',
'piwiplus': 'teletoon', 'piwiplus': 'teletoon',
'd8': 'd8',
'c8': 'd8',
'd17': 'd17',
'cstar': 'd17',
'itele': 'itele',
} }
     # Only works for direct mp4 URLs
     _GEO_COUNTRIES = ['FR']
     _TESTS = [{
-        'url': 'https://www.mycanal.fr/d17-emissions/lolywood/p/1397061',
+        'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
         'info_dict': {
-            'id': '1397061',
-            'display_id': 'lolywood',
+            'id': '1405510',
+            'display_id': 'pid1830-c-zapping',
             'ext': 'mp4',
-            'title': 'Euro 2016 : Je préfère te prévenir - Lolywood - Episode 34',
-            'description': 'md5:7d97039d455cb29cdba0d652a0efaa5e',
-            'upload_date': '20160602',
+            'title': 'Zapping - 02/07/2016',
+            'description': 'Le meilleur de toutes les chaînes, tous les jours',
+            'upload_date': '20160702',
         },
     }, {
         # geo restricted, bypassed
@@ -47,12 +70,64 @@ class CanalplusIE(InfoExtractor):
             'upload_date': '20140724',
         },
         'expected_warnings': ['HTTP Error 403: Forbidden'],
+    }, {
+        # geo restricted, bypassed
+        'url': 'http://www.c8.fr/c8-divertissement/ms-touche-pas-a-mon-poste/pid6318-videos-integrales.html?vid=1443684',
+        'md5': 'bb6f9f343296ab7ebd88c97b660ecf8d',
+        'info_dict': {
+            'id': '1443684',
+            'display_id': 'pid6318-videos-integrales',
+            'ext': 'mp4',
+            'title': 'Guess my iep ! - TPMP - 07/04/2017',
+            'description': 'md5:6f005933f6e06760a9236d9b3b5f17fa',
+            'upload_date': '20170407',
+        },
+        'expected_warnings': ['HTTP Error 403: Forbidden'],
+    }, {
+        'url': 'http://www.itele.fr/chroniques/invite-michael-darmon/rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
+        'info_dict': {
+            'id': '1420176',
+            'display_id': 'rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
+            'ext': 'mp4',
+            'title': 'L\'invité de Michaël Darmon du 14/10/2016 - ',
+            'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.',
+            'upload_date': '20161014',
+        },
+    }, {
+        'url': 'http://football.cstar.fr/cstar-minisite-foot/pid7566-feminines-videos.html?vid=1416769',
+        'info_dict': {
+            'id': '1416769',
+            'display_id': 'pid7566-feminines-videos',
+            'ext': 'mp4',
+            'title': 'France - Albanie : les temps forts de la soirée - 20/09/2016',
+            'description': 'md5:c3f30f2aaac294c1c969b3294de6904e',
+            'upload_date': '20160921',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://m.canalplus.fr/?vid=1398231',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.d17.tv/emissions/pid8303-lolywood.html?vid=1397061',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
-        site, display_id, video_id = re.match(self._VALID_URL, url).groups()
-
-        site_id = self._SITE_ID_MAP[site]
+        mobj = re.match(self._VALID_URL, url)
+        site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
+
+        # Beware, some subclasses do not define an id group
+        display_id = remove_end(dict_get(mobj.groupdict(), ('display_id', 'id', 'vid')), '.html')
+
+        webpage = self._download_webpage(url, display_id)
+        video_id = self._search_regex(
+            [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
+             r'id=["\']canal_video_player(?P<id>\d+)',
+             r'data-video=["\'](?P<id>\d+)'],
+            webpage, 'video id', default=mobj.group('vid'), group='id')
 
         info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
         video_data = self._download_json(info_url, video_id, 'Downloading video JSON')
@@ -86,7 +161,7 @@ class CanalplusIE(InfoExtractor):
                 format_url + '?hdcore=2.11.3', video_id, f4m_id=format_id, fatal=False))
         else:
             formats.append({
-                # the secret extracted from ya function in http://player.canalplus.fr/common/js/canalPlayer.js
+                # the secret extracted ya function in http://player.canalplus.fr/common/js/canalPlayer.js
                 'url': format_url + '?secret=pqzerjlsmdkjfoiuerhsdlfknaes',
                 'format_id': format_id,
                 'preference': preference(format_id),
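
For reference, the list-of-patterns form of `_search_regex` used in the right-hand `_real_extract` tries each regex in turn and falls back to a default. A minimal standalone sketch (hypothetical `search_regex` helper, not the actual youtube-dl method):

    import re

    def search_regex(patterns, text, default=None, group=0):
        # Try each pattern in order and return the first match (or the
        # default), mirroring the list form used above.
        for pattern in patterns:
            mobj = re.search(pattern, text)
            if mobj:
                return mobj.group(group)
        return default

    page = '<div id="canal_video_player1397061"></div>'
    video_id = search_regex(
        [r'videoId=["\'](?P<id>\d+)', r'id=["\']canal_video_player(?P<id>\d+)'],
        page, group='id')  # -> '1397061'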

View File

@@ -75,10 +75,10 @@ class CBSInteractiveIE(CBSIE):
         webpage = self._download_webpage(url, display_id)
         data_json = self._html_search_regex(
-            r"data(?:-(?:cnet|zdnet))?-video(?:-(?:uvp(?:js)?|player))?-options='([^']+)'",
+            r"data-(?:cnet|zdnet)-video(?:-uvp(?:js)?)?-options='([^']+)'",
             webpage, 'data json')
         data = self._parse_json(data_json, display_id)
-        vdata = data.get('video') or (data.get('videos') or data.get('playlist'))[0]
+        vdata = data.get('video') or data['videos'][0]
 
         video_id = vdata['mpxRefId']

View File

@@ -1,11 +1,11 @@
 from __future__ import unicode_literals
 
 import re
+import base64
 import json
 
 from .common import InfoExtractor
 from .youtube import YoutubeIE
-from ..compat import compat_b64decode
 from ..utils import (
     clean_html,
     ExtractorError
@@ -58,7 +58,7 @@ class ChilloutzoneIE(InfoExtractor):
         base64_video_info = self._html_search_regex(
             r'var cozVidData = "(.+?)";', webpage, 'video data')
-        decoded_video_info = compat_b64decode(base64_video_info).decode('utf-8')
+        decoded_video_info = base64.b64decode(base64_video_info.encode('utf-8')).decode('utf-8')
         video_info_dict = json.loads(decoded_video_info)
 
         # get video information from dict
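
Most of the hunks in this compare swap `base64.b64decode(...encode(...))` for a `compat_b64decode` helper. A sketch of what such a shim looks like (assuming Python 3 for brevity; the real helper in youtube_dl/compat.py also has to cover Python 2):

    import base64

    def compat_b64decode(s, *args, **kwargs):
        # Accept text or bytes input, so callers no longer need the
        # explicit .encode(...) dance seen on the right-hand side.
        if isinstance(s, str):
            s = s.encode('ascii')
        return base64.b64decode(s, *args, **kwargs)

    assert compat_b64decode('aGVsbG8=') == b'hello'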

View File

@@ -1,10 +1,10 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import base64
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_b64decode
 from ..utils import parse_duration
@@ -44,7 +44,8 @@ class ChirbitIE(InfoExtractor):
         # Reverse engineered from https://chirb.it/js/chirbit.player.js (look
         # for soundURL)
-        audio_url = compat_b64decode(data_fd[::-1]).decode('utf-8')
+        audio_url = base64.b64decode(
+            data_fd[::-1].encode('ascii')).decode('utf-8')
 
         title = self._search_regex(
             r'class=["\']chirbit-title["\'][^>]*>([^<]+)', webpage, 'title')

View File

@@ -174,8 +174,6 @@ class InfoExtractor(object):
                                  width : height ratio as float.
                     * no_resume  The server does not support resuming the
                                  (HTTP or RTMP) download. Boolean.
-                    * downloader_options  A dictionary of downloader options as
-                                  described in FileDownloader
 
     url:            Final video URL.
     ext:            Video filename extension.
@@ -1029,7 +1027,7 @@ class InfoExtractor(object):
                 part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
                 if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
                     info['series'] = unescapeHTML(part_of_series.get('name'))
-            elif item_type in ('Article', 'NewsArticle'):
+            elif item_type == 'Article':
                 info.update({
                     'timestamp': parse_iso8601(e.get('datePublished')),
                     'title': unescapeHTML(e.get('headline')),
@@ -2406,7 +2404,7 @@ class InfoExtractor(object):
                 formats.extend(self._extract_m3u8_formats(
                     source_url, video_id, 'mp4', entry_protocol='m3u8_native',
                     m3u8_id=m3u8_id, fatal=False))
-            elif source_type == 'dash' or ext == 'mpd':
+            elif ext == 'mpd':
                 formats.extend(self._extract_mpd_formats(
                     source_url, video_id, mpd_id=mpd_id, fatal=False))
             elif ext == 'smil':
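
The jwplayer hunk keys format extraction off both `source_type` and the URL extension. A rough sketch of the `determine_ext` utility it relies on (illustrative, not the exact youtube-dl implementation):

    import re

    def determine_ext(url, default_ext='unknown_video'):
        # Take what follows the last dot of the URL path, if it looks
        # like a file extension; otherwise fall back to the default.
        if url is None:
            return default_ext
        path = url.partition('?')[0]
        if '.' not in path:
            return default_ext
        guess = path.rpartition('.')[2]
        return guess if re.match(r'^[A-Za-z0-9]+$', guess) else default_ext

    assert determine_ext('https://cdn.example.com/master.m3u8?token=x') == 'm3u8'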

View File

@@ -3,13 +3,13 @@ from __future__ import unicode_literals
 
 import re
 import json
+import base64
 import zlib
 
 from hashlib import sha1
 from math import pow, sqrt, floor
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_b64decode,
     compat_etree_fromstring,
     compat_urllib_parse_urlencode,
     compat_urllib_request,
@@ -272,8 +272,8 @@ class CrunchyrollIE(CrunchyrollBaseIE):
     }
 
     def _decrypt_subtitles(self, data, iv, id):
-        data = bytes_to_intlist(compat_b64decode(data))
-        iv = bytes_to_intlist(compat_b64decode(iv))
+        data = bytes_to_intlist(base64.b64decode(data.encode('utf-8')))
+        iv = bytes_to_intlist(base64.b64decode(iv.encode('utf-8')))
         id = int(id)
 
         def obfuscate_key_aux(count, modulo, start):
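
`_decrypt_subtitles` shuttles between bytes and lists of ints because the AES helpers in this codebase operate on int lists. A small sketch of the conversion helpers (Python 3 form, mirroring `youtube_dl.utils`):

    import base64

    def bytes_to_intlist(bs):
        # b'AB' -> [65, 66]
        return list(bs)

    def intlist_to_bytes(xs):
        # [65, 66] -> b'AB'
        return bytes(xs)

    data = bytes_to_intlist(base64.b64decode('aGVsbG8='))
    assert intlist_to_bytes(data) == b'hello'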

View File

@@ -10,7 +10,6 @@ from ..aes import (
     aes_cbc_decrypt,
     aes_cbc_encrypt,
 )
-from ..compat import compat_b64decode
 from ..utils import (
     bytes_to_intlist,
     bytes_to_long,
@@ -94,7 +93,7 @@ class DaisukiMottoIE(InfoExtractor):
         rtn = self._parse_json(
             intlist_to_bytes(aes_cbc_decrypt(bytes_to_intlist(
-                compat_b64decode(encrypted_rtn)),
+                base64.b64decode(encrypted_rtn)),
                 aes_key, iv)).decode('utf-8').rstrip('\0'),
             video_id)

View File

@@ -1,56 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import js_to_json
-
-
-class DiggIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?digg\.com/video/(?P<id>[^/?#&]+)'
-    _TESTS = [{
-        # JWPlatform via provider
-        'url': 'http://digg.com/video/sci-fi-short-jonah-daniel-kaluuya-get-out',
-        'info_dict': {
-            'id': 'LcqvmS0b',
-            'ext': 'mp4',
-            'title': "'Get Out' Star Daniel Kaluuya Goes On 'Moby Dick'-Like Journey In Sci-Fi Short 'Jonah'",
-            'description': 'md5:541bb847648b6ee3d6514bc84b82efda',
-            'upload_date': '20180109',
-            'timestamp': 1515530551,
-        },
-        'params': {
-            'skip_download': True,
-        },
-    }, {
-        # Youtube via provider
-        'url': 'http://digg.com/video/dog-boat-seal-play',
-        'only_matching': True,
-    }, {
-        # vimeo as regular embed
-        'url': 'http://digg.com/video/dream-girl-short-film',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-        info = self._parse_json(
-            self._search_regex(
-                r'(?s)video_info\s*=\s*({.+?});\n', webpage, 'video info',
-                default='{}'), display_id, transform_source=js_to_json,
-            fatal=False)
-        video_id = info.get('video_id')
-        if video_id:
-            provider = info.get('provider_name')
-            if provider == 'youtube':
-                return self.url_result(
-                    video_id, ie='Youtube', video_id=video_id)
-            elif provider == 'jwplayer':
-                return self.url_result(
-                    'jwplatform:%s' % video_id, ie='JWPlatform',
-                    video_id=video_id)
-        return self.url_result(url, 'Generic')
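
DiggIE parses a JavaScript object literal, which is why it passes `transform_source=js_to_json`. A short usage sketch (the input string is illustrative; assumes youtube_dl is importable):

    import json
    from youtube_dl.utils import js_to_json

    # Unquoted keys, single quotes and trailing commas are not strict
    # JSON, so the literal is normalised before json.loads.
    raw = "{video_id: 'LcqvmS0b', provider_name: 'jwplayer',}"
    info = json.loads(js_to_json(raw))
    assert info['provider_name'] == 'jwplayer'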

View File

@@ -12,28 +12,25 @@ from ..compat import (
     compat_urlparse,
 )
 from ..utils import (
-    determine_ext,
     ExtractorError,
-    float_or_none,
     int_or_none,
     remove_end,
     try_get,
     unified_strdate,
-    unified_timestamp,
     update_url_query,
     USER_AGENTS,
 )
 
 
 class DPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?P<domain>www\.(?P<host>dplay\.(?P<country>dk|se|no)))/(?:video(?:er|s)/)?(?P<id>[^/]+/[^/?#]+)'
+    _VALID_URL = r'https?://(?P<domain>www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
 
     _TESTS = [{
         # non geo restricted, via secure api, unsigned download hls URL
         'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
         'info_dict': {
             'id': '3172',
-            'display_id': 'nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet',
+            'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
             'ext': 'mp4',
             'title': 'Svensken lär sig njuta av livet',
             'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
@@ -51,7 +48,7 @@ class DPlayIE(InfoExtractor):
         'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
         'info_dict': {
             'id': '70816',
-            'display_id': 'mig-og-min-mor/season-6-episode-12',
+            'display_id': 'season-6-episode-12',
             'ext': 'mp4',
             'title': 'Episode 12',
             'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
@@ -68,33 +65,6 @@ class DPlayIE(InfoExtractor):
         # geo restricted, via direct unsigned hls URL
         'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
         'only_matching': True,
-    }, {
-        # disco-api
-        'url': 'https://www.dplay.no/videoer/i-kongens-klr/sesong-1-episode-7',
-        'info_dict': {
-            'id': '40206',
-            'display_id': 'i-kongens-klr/sesong-1-episode-7',
-            'ext': 'mp4',
-            'title': 'Episode 7',
-            'description': 'md5:e3e1411b2b9aebeea36a6ec5d50c60cf',
-            'duration': 2611.16,
-            'timestamp': 1516726800,
-            'upload_date': '20180123',
-            'series': 'I kongens klær',
-            'season_number': 1,
-            'episode_number': 7,
-        },
-        'params': {
-            'format': 'bestvideo',
-            'skip_download': True,
-        },
-    }, {
-        'url': 'https://www.dplay.dk/videoer/singleliv/season-5-episode-3',
-        'only_matching': True,
-    }, {
-        'url': 'https://www.dplay.se/videos/sofias-anglar/sofias-anglar-1001',
-        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -102,81 +72,10 @@ class DPlayIE(InfoExtractor):
         display_id = mobj.group('id')
         domain = mobj.group('domain')
 
-        self._initialize_geo_bypass([mobj.group('country').upper()])
-
         webpage = self._download_webpage(url, display_id)
 
         video_id = self._search_regex(
-            r'data-video-id=["\'](\d+)', webpage, 'video id', default=None)
-
-        if not video_id:
-            host = mobj.group('host')
-            disco_base = 'https://disco-api.%s' % host
-            self._download_json(
-                '%s/token' % disco_base, display_id, 'Downloading token',
-                query={
-                    'realm': host.replace('.', ''),
-                })
-            video = self._download_json(
-                '%s/content/videos/%s' % (disco_base, display_id), display_id,
-                headers={
-                    'Referer': url,
-                    'x-disco-client': 'WEB:UNKNOWN:dplay-client:0.0.1',
-                }, query={
-                    'include': 'show'
-                })
-            video_id = video['data']['id']
-            info = video['data']['attributes']
-            title = info['name']
-            formats = []
-            for format_id, format_dict in self._download_json(
-                    '%s/playback/videoPlaybackInfo/%s' % (disco_base, video_id),
-                    display_id)['data']['attributes']['streaming'].items():
-                if not isinstance(format_dict, dict):
-                    continue
-                format_url = format_dict.get('url')
-                if not format_url:
-                    continue
-                ext = determine_ext(format_url)
-                if format_id == 'dash' or ext == 'mpd':
-                    formats.extend(self._extract_mpd_formats(
-                        format_url, display_id, mpd_id='dash', fatal=False))
-                elif format_id == 'hls' or ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        format_url, display_id, 'mp4',
-                        entry_protocol='m3u8_native', m3u8_id='hls',
-                        fatal=False))
-                else:
-                    formats.append({
-                        'url': format_url,
-                        'format_id': format_id,
-                    })
-            self._sort_formats(formats)
-            series = None
-            try:
-                included = video.get('included')
-                if isinstance(included, list):
-                    show = next(e for e in included if e.get('type') == 'show')
-                    series = try_get(
-                        show, lambda x: x['attributes']['name'], compat_str)
-            except StopIteration:
-                pass
-            return {
-                'id': video_id,
-                'display_id': display_id,
-                'title': title,
-                'description': info.get('description'),
-                'duration': float_or_none(
-                    info.get('videoDuration'), scale=1000),
-                'timestamp': unified_timestamp(info.get('publishStart')),
-                'series': series,
-                'season_number': int_or_none(info.get('seasonNumber')),
-                'episode_number': int_or_none(info.get('episodeNumber')),
-                'age_limit': int_or_none(info.get('minimum_age')),
-                'formats': formats,
-            }
+            r'data-video-id=["\'](\d+)', webpage, 'video id')
 
         info = self._download_json(
             'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id),
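
The removed disco-api branch is a two-step flow: an anonymous token request that seeds a session cookie, then a JSON:API content request. A hedged sketch with the standard library (endpoints and header values taken from the hunk above; the display id is illustrative and the live service may well have changed):

    import json
    import urllib.request

    disco_base = 'https://disco-api.dplay.no'
    # The opener keeps the session cookie set by the /token call.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())
    opener.open(disco_base + '/token?realm=dplayno')
    req = urllib.request.Request(
        disco_base + '/content/videos/i-kongens-klr/sesong-1-episode-7?include=show',
        headers={'x-disco-client': 'WEB:UNKNOWN:dplay-client:0.0.1'})
    video = json.loads(opener.open(req).read().decode())
    video_id = video['data']['id']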

View File

@@ -1,10 +1,10 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import base64
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_b64decode
 from ..utils import (
     qualities,
     sanitized_Request,
@@ -42,7 +42,7 @@ class DumpertIE(InfoExtractor):
             r'data-files="([^"]+)"', webpage, 'data files')
 
         files = self._parse_json(
-            compat_b64decode(files_base64).decode('utf-8'),
+            base64.b64decode(files_base64.encode('utf-8')).decode('utf-8'),
             video_id)
 
         quality = qualities(['flv', 'mobile', 'tablet', '720p'])

View File

@@ -1,13 +1,13 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import base64
 import json
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_b64decode,
-    compat_str,
     compat_urlparse,
+    compat_str,
 )
 from ..utils import (
     extract_attributes,
@@ -36,9 +36,9 @@ class EinthusanIE(InfoExtractor):
     # reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
     def _decrypt(self, encrypted_data, video_id):
-        return self._parse_json(compat_b64decode((
+        return self._parse_json(base64.b64decode((
             encrypted_data[:10] + encrypted_data[-1] + encrypted_data[12:-1]
-        )).decode('utf-8'), video_id)
+        ).encode('ascii')).decode('utf-8'), video_id)
 
     def _real_extract(self, url):
         video_id = self._match_id(url)

View File

@@ -259,7 +259,6 @@ from .deezer import DeezerPlaylistIE
 from .democracynow import DemocracynowIE
 from .dfb import DFBIE
 from .dhm import DHMIE
-from .digg import DiggIE
 from .dotsub import DotsubIE
 from .douyutv import (
     DouyuShowIE,
@@ -490,6 +489,7 @@ from .jwplatform import JWPlatformIE
 from .jpopsukitv import JpopsukiIE
 from .kakao import KakaoIE
 from .kaltura import KalturaIE
+from .kamcord import KamcordIE
 from .kanalplay import KanalPlayIE
 from .kankan import KankanIE
 from .karaoketv import KaraoketvIE
@@ -881,6 +881,7 @@ from .revision3 import (
     Revision3IE,
 )
 from .rice import RICEIE
+from .ringtv import RingTVIE
 from .rmcdecouverte import RMCDecouverteIE
 from .ro220 import Ro220IE
 from .rockstargames import RockstarGamesIE
@@ -900,7 +901,6 @@ from .rtp import RTPIE
 from .rts import RTSIE
 from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE
 from .rtvnh import RTVNHIE
-from .rtvs import RTVSIE
 from .rudo import RudoIE
 from .ruhd import RUHDIE
 from .ruleporn import RulePornIE
@@ -933,10 +933,6 @@ from .servingsys import ServingSysIE
 from .servus import ServusIE
 from .sevenplus import SevenPlusIE
 from .sexu import SexuIE
-from .seznamzpravy import (
-    SeznamZpravyIE,
-    SeznamZpravyArticleIE,
-)
 from .shahid import (
     ShahidIE,
     ShahidShowIE,
@@ -994,7 +990,7 @@ from .stitcher import StitcherIE
 from .sport5 import Sport5IE
 from .sportbox import SportBoxEmbedIE
 from .sportdeutschland import SportDeutschlandIE
-from .springboardplatform import SpringboardPlatformIE
+from .sportschau import SportschauIE
 from .sprout import SproutIE
 from .srgssr import (
     SRGSSRIE,
@@ -1050,6 +1046,7 @@ from .theplatform import (
     ThePlatformFeedIE,
 )
 from .thescene import TheSceneIE
+from .thesixtyone import TheSixtyOneIE
 from .thestar import TheStarIE
 from .thesun import TheSunIE
 from .theweatherchannel import TheWeatherChannelIE
@@ -1071,6 +1068,7 @@ from .tnaflix import (
 from .toggle import ToggleIE
 from .tonline import TOnlineIE
 from .toongoggles import ToonGogglesIE
+from .totalwebcasting import TotalWebCastingIE
 from .toutv import TouTvIE
 from .toypics import ToypicsUserIE, ToypicsIE
 from .traileraddict import TrailerAddictIE
@@ -1291,8 +1289,6 @@ from .watchbox import WatchBoxIE
 from .watchindianporn import WatchIndianPornIE
 from .wdr import (
     WDRIE,
-    WDRPageIE,
-    WDRElefantIE,
     WDRMobileIE,
 )
 from .webcaster import (
@@ -1303,10 +1299,6 @@ from .webofstories import (
     WebOfStoriesIE,
     WebOfStoriesPlaylistIE,
 )
-from .weibo import (
-    WeiboIE,
-    WeiboMobileIE
-)
 from .weiqitv import WeiqiTVIE
 from .wimp import WimpIE
 from .wistia import WistiaIE
@@ -1332,10 +1324,6 @@ from .xiami import (
     XiamiArtistIE,
     XiamiCollectionIE
 )
-from .ximalaya import (
-    XimalayaIE,
-    XimalayaAlbumIE
-)
 from .xminus import XMinusIE
 from .xnxx import XNXXIE
 from .xstream import XstreamIE

View File

@@ -33,7 +33,7 @@ class FranceInterIE(InfoExtractor):
         description = self._og_search_description(webpage)
 
         upload_date_str = self._search_regex(
-            r'class=["\']\s*cover-emission-period\s*["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
+            r'class=["\']cover-emission-period["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
             webpage, 'upload date', fatal=False)
         if upload_date_str:
             upload_date_list = upload_date_str.split()

View File

@@ -1,8 +1,6 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 from ..utils import (
     int_or_none,
@@ -11,52 +9,44 @@ from ..utils import (
 
 class GameStarIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?game(?P<site>pro|star)\.de/videos/.*,(?P<id>[0-9]+)\.html'
-    _TESTS = [{
+    _VALID_URL = r'https?://(?:www\.)?gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
+    _TEST = {
         'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
-        'md5': 'ee782f1f8050448c95c5cacd63bc851c',
+        'md5': '96974ecbb7fd8d0d20fca5a00810cea7',
         'info_dict': {
             'id': '76110',
             'ext': 'mp4',
             'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil',
             'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den...',
             'thumbnail': r're:^https?://.*\.jpg$',
-            'timestamp': 1406542380,
+            'timestamp': 1406542020,
             'upload_date': '20140728',
-            'duration': 17,
+            'duration': 17
         }
-    }, {
-        'url': 'http://www.gamepro.de/videos/top-10-indie-spiele-fuer-nintendo-switch-video-tolle-nindies-games-zum-download,95316.html',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.gamestar.de/videos/top-10-indie-spiele-fuer-nintendo-switch-video-tolle-nindies-games-zum-download,95316.html',
-        'only_matching': True,
-    }]
+    }
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        site = mobj.group('site')
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
 
         webpage = self._download_webpage(url, video_id)
 
+        url = 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id
+
         # TODO: there are multiple ld+json objects in the webpage,
         # while _search_json_ld finds only the first one
         json_ld = self._parse_json(self._search_regex(
             r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>[^<]+VideoObject[^<]+)</script>',
             webpage, 'JSON-LD', group='json_ld'), video_id)
         info_dict = self._json_ld(json_ld, video_id)
-        info_dict['title'] = remove_end(
-            info_dict['title'], ' - Game%s' % site.title())
+        info_dict['title'] = remove_end(info_dict['title'], ' - GameStar')
 
-        view_count = int_or_none(json_ld.get('interactionCount'))
+        view_count = json_ld.get('interactionCount')
         comment_count = int_or_none(self._html_search_regex(
-            r'<span>Kommentare</span>\s*<span[^>]+class=["\']count[^>]+>\s*\(\s*([0-9]+)',
-            webpage, 'comment count', fatal=False))
+            r'([0-9]+) Kommentare</span>', webpage, 'comment_count',
+            fatal=False))
 
         info_dict.update({
             'id': video_id,
-            'url': 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id,
+            'url': url,
             'ext': 'mp4',
             'view_count': view_count,
             'comment_count': comment_count
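
Both versions pull metadata from an embedded JSON-LD block rather than the DOM. A standalone sketch of that lookup (hypothetical helper, same regex idea as above):

    import json
    import re

    def first_video_object(webpage):
        # Grab the first <script type="application/ld+json"> block that
        # mentions VideoObject and parse it.
        mobj = re.search(
            r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>'
            r'(?P<json_ld>[^<]+VideoObject[^<]+)</script>', webpage)
        return json.loads(mobj.group('json_ld')) if mobj else None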

View File

@@ -101,7 +101,6 @@ from .vzaar import VzaarIE
 from .channel9 import Channel9IE
 from .vshare import VShareIE
 from .mediasite import MediasiteIE
-from .springboardplatform import SpringboardPlatformIE
 
 
 class GenericIE(InfoExtractor):
@@ -1939,21 +1938,6 @@ class GenericIE(InfoExtractor):
                 'timestamp': 1474354800,
                 'upload_date': '20160920',
             }
-        },
-        {
-            'url': 'http://www.kidzworld.com/article/30935-trolls-the-beat-goes-on-interview-skylar-astin-and-amanda-leighton',
-            'info_dict': {
-                'id': '1731611',
-                'ext': 'mp4',
-                'title': 'Official Trailer | TROLLS: THE BEAT GOES ON!',
-                'description': 'md5:eb5f23826a027ba95277d105f248b825',
-                'timestamp': 1516100691,
-                'upload_date': '20180116',
-            },
-            'params': {
-                'skip_download': True,
-            },
-            'add_ie': [SpringboardPlatformIE.ie_key()],
         }
         # {
         #     # TODO: find another test
@@ -2280,10 +2264,7 @@ class GenericIE(InfoExtractor):
         # Look for Brightcove New Studio embeds
         bc_urls = BrightcoveNewIE._extract_urls(self, webpage)
         if bc_urls:
-            return self.playlist_from_matches(
-                bc_urls, video_id, video_title,
-                getter=lambda x: smuggle_url(x, {'referrer': url}),
-                ie='BrightcoveNew')
+            return self.playlist_from_matches(bc_urls, video_id, video_title, ie='BrightcoveNew')
 
         # Look for Nexx embeds
         nexx_urls = NexxIE._extract_urls(webpage)
@@ -2727,9 +2708,9 @@ class GenericIE(InfoExtractor):
             return self.url_result(viewlift_url)
 
         # Look for JWPlatform embeds
-        jwplatform_urls = JWPlatformIE._extract_urls(webpage)
-        if jwplatform_urls:
-            return self.playlist_from_matches(jwplatform_urls, video_id, video_title, ie=JWPlatformIE.ie_key())
+        jwplatform_url = JWPlatformIE._extract_url(webpage)
+        if jwplatform_url:
+            return self.url_result(jwplatform_url, 'JWPlatform')
 
         # Look for Digiteka embeds
         digiteka_url = DigitekaIE._extract_url(webpage)
@@ -2925,12 +2906,6 @@ class GenericIE(InfoExtractor):
                        for mediasite_url in mediasite_urls]
             return self.playlist_result(entries, video_id, video_title)
 
-        springboardplatform_urls = SpringboardPlatformIE._extract_urls(webpage)
-        if springboardplatform_urls:
-            return self.playlist_from_matches(
-                springboardplatform_urls, video_id, video_title,
-                ie=SpringboardPlatformIE.ie_key())
-
         def merge_dicts(dict1, dict2):
             merged = {}
             for k, v in dict1.items():

View File

@@ -1,7 +1,8 @@
 from __future__ import unicode_literals
 
+import base64
+
 from .common import InfoExtractor
-from ..compat import compat_b64decode
 from ..utils import (
     ExtractorError,
     HEADRequest,
@@ -47,7 +48,7 @@ class HotNewHipHopIE(InfoExtractor):
         if 'mediaKey' not in mkd:
             raise ExtractorError('Did not get a media key')
 
-        redirect_url = compat_b64decode(video_url_base64).decode('utf-8')
+        redirect_url = base64.b64decode(video_url_base64).decode('utf-8')
         redirect_req = HEADRequest(redirect_url)
         req = self._request_webpage(
             redirect_req, video_id,

View File

@@ -2,8 +2,9 @@
 from __future__ import unicode_literals
 
+import base64
+
 from ..compat import (
-    compat_b64decode,
     compat_urllib_parse_unquote,
     compat_urlparse,
 )
@@ -60,7 +61,7 @@ class InfoQIE(BokeCCBaseIE):
         encoded_id = self._search_regex(
             r"jsclassref\s*=\s*'([^']*)'", webpage, 'encoded id', default=None)
 
-        real_id = compat_urllib_parse_unquote(compat_b64decode(encoded_id).decode('utf-8'))
+        real_id = compat_urllib_parse_unquote(base64.b64decode(encoded_id.encode('ascii')).decode('utf-8'))
         playpath = 'mp4:' + real_id
 
         return [{

View File

@@ -23,14 +23,11 @@ class JWPlatformIE(InfoExtractor):
     @staticmethod
     def _extract_url(webpage):
-        urls = JWPlatformIE._extract_urls(webpage)
-        return urls[0] if urls else None
-
-    @staticmethod
-    def _extract_urls(webpage):
-        return re.findall(
-            r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})',
-            webpage)
+        mobj = re.search(
+            r'<(?:script|iframe)[^>]+?src=["\'](?P<url>(?:https?:)?//content.jwplatform.com/players/[a-zA-Z0-9]{8})',
+            webpage)
+        if mobj:
+            return mobj.group('url')
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
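
The left-hand refactor switches from `re.search` to `re.findall` so that pages embedding several players yield every URL instead of only the first. The same idea as a standalone sketch:

    import re

    def extract_jwplatform_urls(webpage):
        # Every match becomes a playlist entry downstream.
        return re.findall(
            r'<(?:script|iframe)[^>]+?src=["\']'
            r'((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})',
            webpage)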

View File

@@ -0,0 +1,71 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    qualities,
+)
+
+
+class KamcordIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?kamcord\.com/v/(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'https://www.kamcord.com/v/hNYRduDgWb4',
+        'md5': 'c3180e8a9cfac2e86e1b88cb8751b54c',
+        'info_dict': {
+            'id': 'hNYRduDgWb4',
+            'ext': 'mp4',
+            'title': 'Drinking Madness',
+            'uploader': 'jacksfilms',
+            'uploader_id': '3044562',
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        video = self._parse_json(
+            self._search_regex(
+                r'window\.__props\s*=\s*({.+?});?(?:\n|\s*</script)',
+                webpage, 'video'),
+            video_id)['video']
+
+        title = video['title']
+
+        formats = self._extract_m3u8_formats(
+            video['play']['hls'], video_id, 'mp4', entry_protocol='m3u8_native')
+        self._sort_formats(formats)
+
+        uploader = video.get('user', {}).get('username')
+        uploader_id = video.get('user', {}).get('id')
+
+        view_count = int_or_none(video.get('viewCount'))
+        like_count = int_or_none(video.get('heartCount'))
+        comment_count = int_or_none(video.get('messageCount'))
+
+        preference_key = qualities(('small', 'medium', 'large'))
+
+        thumbnails = [{
+            'url': thumbnail_url,
+            'id': thumbnail_id,
+            'preference': preference_key(thumbnail_id),
+        } for thumbnail_id, thumbnail_url in (video.get('thumbnail') or {}).items()
+            if isinstance(thumbnail_id, compat_str) and isinstance(thumbnail_url, compat_str)]
+
+        return {
+            'id': video_id,
+            'title': title,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'view_count': view_count,
+            'like_count': like_count,
+            'comment_count': comment_count,
+            'thumbnails': thumbnails,
+            'formats': formats,
+        }

View File

@@ -1,6 +1,7 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import base64
 import datetime
 import hashlib
 import re
@@ -8,7 +9,6 @@ import time
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_b64decode,
     compat_ord,
     compat_str,
     compat_urllib_parse_urlencode,
@@ -329,7 +329,7 @@ class LetvCloudIE(InfoExtractor):
             raise ExtractorError('Letv cloud returned an unknwon error')
 
         def b64decode(s):
-            return compat_b64decode(s).decode('utf-8')
+            return base64.b64decode(s.encode('utf-8')).decode('utf-8')
 
         formats = []
         for media in play_json['data']['video_info']['media'].values():

View File

@@ -10,7 +10,6 @@ from ..utils import (
     float_or_none,
     int_or_none,
     smuggle_url,
-    try_get,
     unsmuggle_url,
     ExtractorError,
 )
@@ -221,12 +220,6 @@ class LimelightBaseIE(InfoExtractor):
             'subtitles': subtitles,
         }
 
-    def _extract_info_helper(self, pc, mobile, i, metadata):
-        return self._extract_info(
-            try_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or [],
-            try_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or [],
-            metadata)
-
 
 class LimelightMediaIE(LimelightBaseIE):
     IE_NAME = 'limelight'
@@ -289,7 +282,10 @@ class LimelightMediaIE(LimelightBaseIE):
             'getMobilePlaylistByMediaId', 'properties',
             smuggled_data.get('source_url'))
 
-        return self._extract_info_helper(pc, mobile, 0, metadata)
+        return self._extract_info(
+            pc['playlistItems'][0].get('streams', []),
+            mobile['mediaList'][0].get('mobileUrls', []) if mobile else [],
+            metadata)
 
 
 class LimelightChannelIE(LimelightBaseIE):
@@ -330,7 +326,10 @@ class LimelightChannelIE(LimelightBaseIE):
             'media', smuggled_data.get('source_url'))
 
         entries = [
-            self._extract_info_helper(pc, mobile, i, medias['media_list'][i])
+            self._extract_info(
+                pc['playlistItems'][i].get('streams', []),
+                mobile['mediaList'][i].get('mobileUrls', []) if mobile else [],
+                medias['media_list'][i])
             for i in range(len(medias['media_list']))]
 
         return self.playlist_result(entries, channel_id, pc['title'])
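
The removed `_extract_info_helper` leans on `try_get` to tolerate missing keys instead of the direct indexing used on the right. A sketch of that utility and its typical call shape (mirroring `youtube_dl.utils.try_get`):

    def try_get(src, getter, expected_type=None):
        # Run the lookup, swallow the usual lookup errors, and
        # optionally enforce a type on the result.
        try:
            v = getter(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            return None
        if expected_type is None or isinstance(v, expected_type):
            return v
        return None

    pc = {'playlistItems': [{'streams': ['rtmp://...']}]}
    streams = try_get(pc, lambda x: x['playlistItems'][0]['streams'], list) or []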

View File

@@ -1,12 +1,13 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import base64
+
 from .common import InfoExtractor
-from ..compat import (
-    compat_b64decode,
-    compat_urllib_parse_unquote,
-)
-from ..utils import int_or_none
+from ..compat import compat_urllib_parse_unquote
+from ..utils import (
+    int_or_none,
+)
@@ -50,4 +51,4 @@ class MangomoloLiveIE(MangomoloBaseIE):
     _IS_LIVE = True
 
     def _get_real_id(self, page_id):
-        return compat_b64decode(compat_urllib_parse_unquote(page_id)).decode()
+        return base64.b64decode(compat_urllib_parse_unquote(page_id).encode()).decode()

View File

@@ -1,12 +1,12 @@
 from __future__ import unicode_literals
 
+import base64
 import functools
 import itertools
 import re
 
 from .common import InfoExtractor
 from ..compat import (
-    compat_b64decode,
     compat_chr,
     compat_ord,
     compat_str,
@@ -79,7 +79,7 @@ class MixcloudIE(InfoExtractor):
         if encrypted_play_info is not None:
             # Decode
-            encrypted_play_info = compat_b64decode(encrypted_play_info)
+            encrypted_play_info = base64.b64decode(encrypted_play_info)
         else:
             # New path
             full_info_json = self._parse_json(self._html_search_regex(
@@ -109,7 +109,7 @@ class MixcloudIE(InfoExtractor):
             kpa_target = encrypted_play_info
         else:
             kps = ['https://', 'http://']
-            kpa_target = compat_b64decode(info_json['streamInfo']['url'])
+            kpa_target = base64.b64decode(info_json['streamInfo']['url'])
         for kp in kps:
             partial_key = self._decrypt_xor_cipher(kpa_target, kp)
             for quote in ["'", '"']:
@@ -165,7 +165,7 @@ class MixcloudIE(InfoExtractor):
             format_url = stream_info.get(url_key)
             if not format_url:
                 continue
-            decrypted = self._decrypt_xor_cipher(key, compat_b64decode(format_url))
+            decrypted = self._decrypt_xor_cipher(key, base64.b64decode(format_url))
             if not decrypted:
                 continue
             if url_key == 'hlsUrl':
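
`_decrypt_xor_cipher` undoes a repeating-key XOR applied to the base64-decoded URLs. The core idea as a Python 3 sketch:

    import itertools

    def decrypt_xor_cipher(key, ciphertext):
        # XOR each ciphertext byte with the key, cycling the key as
        # needed, and return the recovered text.
        return ''.join(
            chr(c ^ k) for c, k in zip(ciphertext, itertools.cycle(key)))

    ciphertext = bytes(c ^ k for c, k in zip(b'hello', itertools.cycle(b'key')))
    assert decrypt_xor_cipher(b'key', ciphertext) == 'hello'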

View File

@@ -68,7 +68,7 @@ class NationalGeographicVideoIE(InfoExtractor):
 
 class NationalGeographicIE(ThePlatformIE, AdobePassIE):
     IE_NAME = 'natgeo'
-    _VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:(?:wild/)?[^/]+/)?(?:videos|episodes)/(?P<id>[^/?]+)'
+    _VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/(?:videos|episodes)/(?P<id>[^/?]+)'
 
     _TESTS = [
         {
@@ -102,10 +102,6 @@ class NationalGeographicIE(ThePlatformIE, AdobePassIE):
         {
             'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episodes/the-power-of-miracles/',
             'only_matching': True,
-        },
-        {
-            'url': 'http://channel.nationalgeographic.com/videos/treasures-rediscovered/',
-            'only_matching': True,
         }
     ]

View File

@@ -190,12 +190,10 @@ class NDREmbedBaseIE(InfoExtractor):
             ext = determine_ext(src, None)
             if ext == 'f4m':
                 formats.extend(self._extract_f4m_formats(
-                    src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id,
-                    f4m_id='hds', fatal=False))
+                    src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, f4m_id='hds'))
             elif ext == 'm3u8':
                 formats.extend(self._extract_m3u8_formats(
-                    src, video_id, 'mp4', m3u8_id='hls',
-                    entry_protocol='m3u8_native', fatal=False))
+                    src, video_id, 'mp4', m3u8_id='hls', entry_protocol='m3u8_native'))
             else:
                 quality = f.get('quality')
                 ff = {

View File

@@ -19,11 +19,11 @@ from ..utils import (
 
 class OdnoklassnikiIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer|live)/(?P<id>[\d-]+)'
+    _VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer)/(?P<id>[\d-]+)'
     _TESTS = [{
         # metadata in JSON
         'url': 'http://ok.ru/video/20079905452',
-        'md5': '0b62089b479e06681abaaca9d204f152',
+        'md5': '6ba728d85d60aa2e6dd37c9e70fdc6bc',
         'info_dict': {
             'id': '20079905452',
             'ext': 'mp4',
@@ -35,6 +35,7 @@ class OdnoklassnikiIE(InfoExtractor):
             'like_count': int,
             'age_limit': 0,
         },
+        'skip': 'Video has been blocked',
     }, {
         # metadataUrl
         'url': 'http://ok.ru/video/63567059965189-0?fromTime=5',
@@ -98,9 +99,6 @@ class OdnoklassnikiIE(InfoExtractor):
     }, {
         'url': 'http://mobile.ok.ru/video/20079905452',
         'only_matching': True,
-    }, {
-        'url': 'https://www.ok.ru/live/484531969818',
-        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -186,10 +184,6 @@ class OdnoklassnikiIE(InfoExtractor):
             })
             return info
 
-        assert title
-        if provider == 'LIVE_TV_APP':
-            info['title'] = self._live_title(title)
-
         quality = qualities(('4', '0', '1', '2', '3', '5'))
 
         formats = [{
@@ -216,20 +210,6 @@ class OdnoklassnikiIE(InfoExtractor):
             if fmt_type:
                 fmt['quality'] = quality(fmt_type)
 
-        # Live formats
-        m3u8_url = metadata.get('hlsMasterPlaylistUrl')
-        if m3u8_url:
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', entry_protocol='m3u8',
-                m3u8_id='hls', fatal=False))
-        rtmp_url = metadata.get('rtmpUrl')
-        if rtmp_url:
-            formats.append({
-                'url': rtmp_url,
-                'format_id': 'rtmp',
-                'ext': 'flv',
-            })
-
         self._sort_formats(formats)
 
         info['formats'] = formats
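
The `quality = qualities(('4', '0', '1', '2', '3', '5'))` line builds a ranking function: later entries sort higher, unknown ids lowest. A sketch mirroring `youtube_dl.utils.qualities`:

    def qualities(quality_ids):
        # Return a function mapping a quality id to its rank in the
        # ordered tuple; unknown ids rank below everything.
        def q(qid):
            try:
                return quality_ids.index(qid)
            except ValueError:
                return -1
        return q

    quality = qualities(('4', '0', '1', '2', '3', '5'))
    assert quality('5') > quality('3') > quality('unknown')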

View File

@@ -1,13 +1,9 @@
 from __future__ import unicode_literals
 
 import re
+import base64
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_b64decode,
-    compat_str,
-    compat_urllib_parse_urlencode,
-)
+from ..compat import compat_str
 from ..utils import (
     determine_ext,
     ExtractorError,
@@ -16,6 +12,7 @@ from ..utils import (
     try_get,
     unsmuggle_url,
 )
+from ..compat import compat_urllib_parse_urlencode
 
 
 class OoyalaBaseIE(InfoExtractor):
@@ -47,7 +44,7 @@ class OoyalaBaseIE(InfoExtractor):
             url_data = try_get(stream, lambda x: x['url']['data'], compat_str)
             if not url_data:
                 continue
-            s_url = compat_b64decode(url_data).decode('utf-8')
+            s_url = base64.b64decode(url_data.encode('ascii')).decode('utf-8')
             if not s_url or s_url in urls:
                 continue
             urls.append(s_url)

View File

@@ -1,8 +1,6 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 from ..compat import (
     compat_str,
@@ -20,14 +18,7 @@ from ..utils import (
 class PandoraTVIE(InfoExtractor):
     IE_NAME = 'pandora.tv'
     IE_DESC = '판도라TV'
-    _VALID_URL = r'''(?x)
-                        https?://
-                            (?:
-                                (?:www\.)?pandora\.tv/view/(?P<user_id>[^/]+)/(?P<id>\d+)|  # new format
-                                (?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?|        # old format
-                                m\.pandora\.tv/?\?                                          # mobile
-                            )
-                    '''
+    _VALID_URL = r'https?://(?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?'
     _TESTS = [{
         'url': 'http://jp.channel.pandora.tv/channel/video.ptv?c1=&prgid=53294230&ch_userid=mikakim&ref=main&lot=cate_01_2',
         'info_dict': {
@@ -62,25 +53,14 @@ class PandoraTVIE(InfoExtractor):
             # Test metadata only
             'skip_download': True,
         },
-    }, {
-        'url': 'http://www.pandora.tv/view/mikakim/53294230#36797454_new',
-        'only_matching': True,
-    }, {
-        'url': 'http://m.pandora.tv/?c=view&ch_userid=mikakim&prgid=54600346',
-        'only_matching': True,
     }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        user_id = mobj.group('user_id')
-        video_id = mobj.group('id')
-
-        if not user_id or not video_id:
-            qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
-            video_id = qs.get('prgid', [None])[0]
-            user_id = qs.get('ch_userid', [None])[0]
-            if any(not f for f in (video_id, user_id,)):
-                raise ExtractorError('Invalid URL', expected=True)
+        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+        video_id = qs.get('prgid', [None])[0]
+        user_id = qs.get('ch_userid', [None])[0]
+        if any(not f for f in (video_id, user_id,)):
+            raise ExtractorError('Invalid URL', expected=True)
 
         data = self._download_json(
             'http://m.pandora.tv/?c=view&m=viewJsonApi&ch_userid=%s&prgid=%s'
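
The right-hand `_real_extract` digs both ids out of the query string. The same lookup with the standard library (URL taken from the test case above):

    try:
        from urllib.parse import parse_qs, urlparse  # Python 3
    except ImportError:
        from urlparse import parse_qs, urlparse  # Python 2

    qs = parse_qs(urlparse(
        'http://jp.channel.pandora.tv/channel/video.ptv?prgid=53294230&ch_userid=mikakim').query)
    video_id = qs.get('prgid', [None])[0]     # '53294230'
    user_id = qs.get('ch_userid', [None])[0]  # 'mikakim'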

View File

@@ -4,9 +4,7 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_urlparse
 from ..utils import (
-    determine_ext,
     ExtractorError,
     int_or_none,
     xpath_text,
@@ -28,15 +26,17 @@ class PladformIE(InfoExtractor):
                         (?P<id>\d+)
                     '''
     _TESTS = [{
-        'url': 'https://out.pladform.ru/player?pl=64471&videoid=3777899&vk_puid15=0&vk_puid34=0',
-        'md5': '53362fac3a27352da20fa2803cc5cd6f',
+        # http://muz-tv.ru/kinozal/view/7400/
+        'url': 'http://out.pladform.ru/player?pl=24822&videoid=100183293',
+        'md5': '61f37b575dd27f1bb2e1854777fe31f4',
         'info_dict': {
-            'id': '3777899',
+            'id': '100183293',
             'ext': 'mp4',
-            'title': 'СТУДИЯ СОЮЗ • Шоу Студия Союз, 24 выпуск (01.02.2018) Нурлан Сабуров и Слава Комиссаренко',
-            'description': 'md5:05140e8bf1b7e2d46e7ba140be57fd95',
+            'title': 'Тайны перевала Дятлова • 1 серия 2 часть',
+            'description': 'Документальный сериал-расследование одной из самых жутких тайн ХХ века',
             'thumbnail': r're:^https?://.*\.jpg$',
-            'duration': 3190,
+            'duration': 694,
+            'age_limit': 0,
         },
     }, {
         'url': 'http://static.pladform.ru/player.swf?pl=21469&videoid=100183293&vkcid=0',
@@ -56,48 +56,22 @@ class PladformIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
-        pl = qs.get('pl', ['1'])[0]
-
         video = self._download_xml(
-            'http://out.pladform.ru/getVideo', video_id, query={
-                'pl': pl,
-                'videoid': video_id,
-            })
-
-        def fail(text):
-            raise ExtractorError(
-                '%s returned error: %s' % (self.IE_NAME, text),
-                expected=True)
+            'http://out.pladform.ru/getVideo?pl=1&videoid=%s' % video_id,
+            video_id)
 
         if video.tag == 'error':
-            fail(video.text)
+            raise ExtractorError(
+                '%s returned error: %s' % (self.IE_NAME, video.text),
+                expected=True)
 
         quality = qualities(('ld', 'sd', 'hd'))
 
-        formats = []
-        for src in video.findall('./src'):
-            if src is None:
-                continue
-            format_url = src.text
-            if not format_url:
-                continue
-            if src.get('type') == 'hls' or determine_ext(format_url) == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(
-                    format_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    m3u8_id='hls', fatal=False))
-            else:
-                formats.append({
-                    'url': src.text,
-                    'format_id': src.get('quality'),
-                    'quality': quality(src.get('quality')),
-                })
-
-        if not formats:
-            error = xpath_text(video, './cap', 'error', default=None)
-            if error:
-                fail(error)
+        formats = [{
+            'url': src.text,
+            'format_id': src.get('quality'),
+            'quality': quality(src.get('quality')),
+        } for src in video.findall('./src')]
 
         self._sort_formats(formats)
 
         webpage = self._download_webpage(

View File

@@ -344,8 +344,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
         r'clip[iI]d=(\d+)',
         r'clip[iI]d\s*=\s*["\'](\d+)',
         r"'itemImageUrl'\s*:\s*'/dynamic/thumbnails/full/\d+/(\d+)",
-        r'proMamsId&quot;\s*:\s*&quot;(\d+)',
-        r'proMamsId"\s*:\s*"(\d+)',
     ]
     _TITLE_REGEXES = [
         r'<h2 class="subtitle" itemprop="name">\s*(.+?)</h2>',

View File

@@ -5,93 +5,135 @@ from .common import InfoExtractor
 from ..compat import compat_HTTPError
 from ..utils import (
     float_or_none,
+    int_or_none,
+    try_get,
+    # unified_timestamp,
     ExtractorError,
 )
 
 
 class RedBullTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?redbull\.tv/video/(?P<id>AP-\w+)'
+    _VALID_URL = r'https?://(?:www\.)?redbull\.tv/(?:video|film|live)/(?:AP-\w+/segment/)?(?P<id>AP-\w+)'
     _TESTS = [{
         # film
-        'url': 'https://www.redbull.tv/video/AP-1Q6XCDTAN1W11',
+        'url': 'https://www.redbull.tv/video/AP-1Q756YYX51W11/abc-of-wrc',
         'md5': 'fb0445b98aa4394e504b413d98031d1f',
         'info_dict': {
-            'id': 'AP-1Q6XCDTAN1W11',
+            'id': 'AP-1Q756YYX51W11',
             'ext': 'mp4',
-            'title': 'ABC of... WRC - ABC of... S1E6',
+            'title': 'ABC of...WRC',
             'description': 'md5:5c7ed8f4015c8492ecf64b6ab31e7d31',
             'duration': 1582.04,
+            # 'timestamp': 1488405786,
+            # 'upload_date': '20170301',
         },
     }, {
         # episode
-        'url': 'https://www.redbull.tv/video/AP-1PMHKJFCW1W11',
+        'url': 'https://www.redbull.tv/video/AP-1PMT5JCWH1W11/grime?playlist=shows:shows-playall:web',
         'info_dict': {
-            'id': 'AP-1PMHKJFCW1W11',
+            'id': 'AP-1PMT5JCWH1W11',
             'ext': 'mp4',
-            'title': 'Grime - Hashtags S2E4',
-            'description': 'md5:b5f522b89b72e1e23216e5018810bb25',
+            'title': 'Grime - Hashtags S2 E4',
+            'description': 'md5:334b741c8c1ce65be057eab6773c1cf5',
             'duration': 904.6,
+            # 'timestamp': 1487290093,
+            # 'upload_date': '20170217',
+            'series': 'Hashtags',
+            'season_number': 2,
+            'episode_number': 4,
         },
         'params': {
             'skip_download': True,
         },
+    }, {
+        # segment
+        'url': 'https://www.redbull.tv/live/AP-1R5DX49XS1W11/segment/AP-1QSAQJ6V52111/semi-finals',
+        'info_dict': {
+            'id': 'AP-1QSAQJ6V52111',
+            'ext': 'mp4',
+            'title': 'Semi Finals - Vans Park Series Pro Tour',
+            'description': 'md5:306a2783cdafa9e65e39aa62f514fd97',
+            'duration': 11791.991,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://www.redbull.tv/film/AP-1MSKKF5T92111/in-motion',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
         session = self._download_json(
-            'https://api.redbull.tv/v3/session', video_id,
+            'https://api-v2.redbull.tv/session', video_id,
             note='Downloading access token', query={
+                'build': '4.370.0',
                 'category': 'personal_computer',
+                'os_version': '1.0',
                 'os_family': 'http',
             })
         if session.get('code') == 'error':
             raise ExtractorError('%s said: %s' % (
                 self.IE_NAME, session['message']))
-        token = session['token']
+        auth = '%s %s' % (session.get('token_type', 'Bearer'), session['access_token'])
 
         try:
-            video = self._download_json(
-                'https://api.redbull.tv/v3/products/' + video_id,
+            info = self._download_json(
+                'https://api-v2.redbull.tv/content/%s' % video_id,
                 video_id, note='Downloading video information',
-                headers={'Authorization': token}
+                headers={'Authorization': auth}
             )
         except ExtractorError as e:
             if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
                 error_message = self._parse_json(
-                    e.cause.read().decode(), video_id)['error']
+                    e.cause.read().decode(), video_id)['message']
                 raise ExtractorError('%s said: %s' % (
                     self.IE_NAME, error_message), expected=True)
             raise
 
-        title = video['title'].strip()
+        video = info['video_product']
+
+        title = info['title'].strip()
 
         formats = self._extract_m3u8_formats(
-            'https://dms.redbull.tv/v3/%s/%s/playlist.m3u8' % (video_id, token),
-            video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls')
+            video['url'], video_id, 'mp4', entry_protocol='m3u8_native',
+            m3u8_id='hls')
         self._sort_formats(formats)
 
         subtitles = {}
-        for resource in video.get('resources', []):
-            if resource.startswith('closed_caption_'):
-                splitted_resource = resource.split('_')
-                if splitted_resource[2]:
-                    subtitles.setdefault('en', []).append({
-                        'url': 'https://resources.redbull.tv/%s/%s' % (video_id, resource),
-                        'ext': splitted_resource[2],
-                    })
+        for _, captions in (try_get(
+                video, lambda x: x['attachments']['captions'],
+                dict) or {}).items():
+            if not captions or not isinstance(captions, list):
+                continue
+            for caption in captions:
+                caption_url = caption.get('url')
+                if not caption_url:
+                    continue
+                ext = caption.get('format')
+                if ext == 'xml':
+                    ext = 'ttml'
+                subtitles.setdefault(caption.get('lang') or 'en', []).append({
+                    'url': caption_url,
+                    'ext': ext,
+                })
 
-        subheading = video.get('subheading')
+        subheading = info.get('subheading')
         if subheading:
             title += ' - %s' % subheading
 
         return {
             'id': video_id,
             'title': title,
-            'description': video.get('long_description') or video.get(
+            'description': info.get('long_description') or info.get(
                 'short_description'),
             'duration': float_or_none(video.get('duration'), scale=1000),
+            # 'timestamp': unified_timestamp(info.get('published')),
+            'series': info.get('show_title'),
+            'season_number': int_or_none(info.get('season_number')),
+            'episode_number': int_or_none(info.get('episode_number')),
             'formats': formats,
             'subtitles': subtitles,
         }
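
The right-hand (v2) flow exchanges a client descriptor for a bearer token and then sends it as the Authorization header. A hedged standard-library sketch (endpoints and field names from the hunk above; query parameters illustrative, and the v2 API has long since been retired):

    import json
    import urllib.request

    session = json.loads(urllib.request.urlopen(
        'https://api-v2.redbull.tv/session?category=personal_computer&os_family=http'
    ).read().decode())
    auth = '%s %s' % (session.get('token_type', 'Bearer'), session['access_token'])
    req = urllib.request.Request(
        'https://api-v2.redbull.tv/content/AP-1Q756YYX51W11',
        headers={'Authorization': auth})
    info = json.loads(urllib.request.urlopen(req).read().decode())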

View File

@@ -46,10 +46,9 @@ class RedTubeIE(InfoExtractor):
             raise ExtractorError('Video %s has been removed' % video_id, expected=True)
 
         title = self._html_search_regex(
-            (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
-             r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
-            webpage, 'title', group='title',
-            default=None) or self._og_search_title(webpage)
+            (r'<h1 class="videoTitle[^"]*">(?P<title>.+?)</h1>',
+             r'videoTitle\s*:\s*(["\'])(?P<title>)\1'),
+            webpage, 'title', group='title')
 
         formats = []
         sources = self._parse_json(
@@ -88,13 +87,12 @@ class RedTubeIE(InfoExtractor):
         thumbnail = self._og_search_thumbnail(webpage)
         upload_date = unified_strdate(self._search_regex(
-            r'<span[^>]+>ADDED ([^<]+)<',
+            r'<span[^>]+class="added-time"[^>]*>ADDED ([^<]+)<',
             webpage, 'upload date', fatal=False))
         duration = int_or_none(self._search_regex(
             r'videoDuration\s*:\s*(\d+)', webpage, 'duration', default=None))
         view_count = str_to_int(self._search_regex(
-            (r'<div[^>]*>Views</div>\s*<div[^>]*>\s*([\d,.]+)',
-             r'<span[^>]*>VIEWS</span>\s*</td>\s*<td>\s*([\d,.]+)'),
+            r'<span[^>]*>VIEWS</span></td>\s*<td>([\d,.]+)',
             webpage, 'view count', fatal=False))
 
         # No self-labeling, but they describe themselves as

View File

@@ -5,8 +5,8 @@ from .common import InfoExtractor
 
 class RestudyIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:(?:www|portal)\.)?restudy\.dk/video/[^/]+/id/(?P<id>[0-9]+)'
-    _TESTS = [{
+    _VALID_URL = r'https?://(?:www\.)?restudy\.dk/video/play/id/(?P<id>[0-9]+)'
+    _TEST = {
         'url': 'https://www.restudy.dk/video/play/id/1637',
         'info_dict': {
             'id': '1637',
@@ -18,10 +18,7 @@ class RestudyIE(InfoExtractor):
             # rtmp download
             'skip_download': True,
         }
-    }, {
-        'url': 'https://portal.restudy.dk/video/leiden-frosteffekt/id/1637',
-        'only_matching': True,
-    }]
+    }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -32,7 +29,7 @@ class RestudyIE(InfoExtractor):
         description = self._og_search_description(webpage).strip()
 
         formats = self._extract_smil_formats(
-            'https://cdn.portal.restudy.dk/dynamic/themes/front/awsmedia/SmilDirectory/video_%s.xml' % video_id,
+            'https://www.restudy.dk/awsmedia/SmilDirectory/video_%s.xml' % video_id,
            video_id)
         self._sort_formats(formats)


@@ -0,0 +1,44 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class RingTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ringtv\.craveonline\.com/(?P<type>news|videos/video)/(?P<id>[^/?#]+)'
_TEST = {
'url': 'http://ringtv.craveonline.com/news/310833-luis-collazo-says-victor-ortiz-better-not-quit-on-jan-30',
'md5': 'd25945f5df41cdca2d2587165ac28720',
'info_dict': {
'id': '857645',
'ext': 'mp4',
'title': 'Video: Luis Collazo says Victor Ortiz "better not quit on Jan. 30" - Ring TV',
'description': 'Luis Collazo is excited about his Jan. 30 showdown with fellow former welterweight titleholder Victor Ortiz at Barclays Center in his hometown of Brooklyn. The SuperBowl week fight headlines a Golden Boy Live! card on Fox Sports 1.',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id').split('-')[0]
webpage = self._download_webpage(url, video_id)
if mobj.group('type') == 'news':
video_id = self._search_regex(
r'''(?x)<iframe[^>]+src="http://cms\.springboardplatform\.com/
embed_iframe/[0-9]+/video/([0-9]+)/''',
webpage, 'real video ID')
title = self._og_search_title(webpage)
description = self._html_search_regex(
r'addthis:description="([^"]+)"',
webpage, 'description', fatal=False)
final_url = 'http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/conversion/%s.mp4' % video_id
thumbnail_url = 'http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/snapshots/%s.jpg' % video_id
return {
'id': video_id,
'url': final_url,
'title': title,
'thumbnail': thumbnail_url,
'description': description,
}


@@ -1,12 +1,12 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import base64
 import re

 from .common import InfoExtractor
 from ..aes import aes_cbc_decrypt
 from ..compat import (
-    compat_b64decode,
     compat_ord,
     compat_str,
 )
@@ -142,11 +142,11 @@ class RTL2YouIE(RTL2YouBaseIE):
         stream_data = self._download_json(
             self._BACKWERK_BASE_URL + 'stream/video/' + video_id, video_id)

-        data, iv = compat_b64decode(stream_data['streamUrl']).decode().split(':')
+        data, iv = base64.b64decode(stream_data['streamUrl']).decode().split(':')
         stream_url = intlist_to_bytes(aes_cbc_decrypt(
-            bytes_to_intlist(compat_b64decode(data)),
+            bytes_to_intlist(base64.b64decode(data)),
             bytes_to_intlist(self._AES_KEY),
-            bytes_to_intlist(compat_b64decode(iv))
+            bytes_to_intlist(base64.b64decode(iv))
         ))
         if b'rtl2_you_video_not_found' in stream_url:
             raise ExtractorError('video not found', expected=True)


@@ -93,11 +93,58 @@ class RtlNlIE(InfoExtractor):
         meta = info.get('meta', {})

+        # m3u8 streams are encrypted and may not be handled properly by older ffmpeg/avconv.
+        # To workaround this previously adaptive -> flash trick was used to obtain
+        # unencrypted m3u8 streams (see https://github.com/rg3/youtube-dl/issues/4118)
+        # and bypass georestrictions as well.
+        # Currently, unencrypted m3u8 playlists are (intentionally?) invalid and therefore
+        # unusable albeit can be fixed by simple string replacement (see
+        # https://github.com/rg3/youtube-dl/pull/6337)
+        # Since recent ffmpeg and avconv handle encrypted streams just fine encrypted
+        # streams are used now.
         videopath = material['videopath']
         m3u8_url = meta.get('videohost', 'http://manifest.us.rtl.nl') + videopath

         formats = self._extract_m3u8_formats(
             m3u8_url, uuid, 'mp4', m3u8_id='hls', fatal=False)

+        video_urlpart = videopath.split('/adaptive/')[1][:-5]
+        PG_URL_TEMPLATE = 'http://pg.us.rtl.nl/rtlxl/network/%s/progressive/%s.mp4'
+
+        PG_FORMATS = (
+            ('a2t', 512, 288),
+            ('a3t', 704, 400),
+            ('nettv', 1280, 720),
+        )
+
+        def pg_format(format_id, width, height):
+            return {
+                'url': PG_URL_TEMPLATE % (format_id, video_urlpart),
+                'format_id': 'pg-%s' % format_id,
+                'protocol': 'http',
+                'width': width,
+                'height': height,
+            }
+
+        if not formats:
+            formats = [pg_format(*pg_tuple) for pg_tuple in PG_FORMATS]
+        else:
+            pg_formats = []
+            for format_id, width, height in PG_FORMATS:
+                try:
+                    # Find hls format with the same width and height corresponding
+                    # to progressive format and copy metadata from it.
+                    f = next(f for f in formats if f.get('height') == height)
+                    # hls formats may have invalid width
+                    f['width'] = width
+                    f_copy = f.copy()
+                    f_copy.update(pg_format(format_id, width, height))
+                    pg_formats.append(f_copy)
+                except StopIteration:
+                    # Missing hls format does mean that no progressive format with
+                    # such width and height exists either.
+                    pass
+            formats.extend(pg_formats)
+
         self._sort_formats(formats)

         thumbnails = []
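To make the branch's progressive fallback concrete, here is the URL rewriting it performs, run on a made-up videopath of the shape the code expects:

videopath = '/rtlxl/adaptive/components/videorecorder/27/277313/277314/abc123.m3u8'
video_urlpart = videopath.split('/adaptive/')[1][:-5]   # drop the '.m3u8' suffix
PG_URL_TEMPLATE = 'http://pg.us.rtl.nl/rtlxl/network/%s/progressive/%s.mp4'
print(PG_URL_TEMPLATE % ('a3t', video_urlpart))
# http://pg.us.rtl.nl/rtlxl/network/a3t/progressive/components/videorecorder/27/277313/277314/abc123.mp4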


@@ -7,7 +7,6 @@ import time

 from .common import InfoExtractor
 from ..compat import (
-    compat_b64decode,
     compat_struct_unpack,
 )
 from ..utils import (
@@ -22,7 +21,7 @@ from ..utils import (

 def _decrypt_url(png):
-    encrypted_data = compat_b64decode(png)
+    encrypted_data = base64.b64decode(png.encode('utf-8'))
     text_index = encrypted_data.find(b'tEXt')
     text_chunk = encrypted_data[text_index - 4:]
     length = compat_struct_unpack('!I', text_chunk[:4])[0]
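Both variants of _decrypt_url lean on the PNG chunk layout: a 4-byte big-endian length, a 4-byte chunk type, the data, then a CRC. A self-contained sketch of the tEXt lookup using the stdlib struct module in place of compat_struct_unpack:

import base64
import struct


def read_text_chunk(png_b64):
    data = base64.b64decode(png_b64)
    text_index = data.find(b'tEXt')
    text_chunk = data[text_index - 4:]           # back up to the length field
    length = struct.unpack('!I', text_chunk[:4])[0]
    return text_chunk[8:8 + length]              # skip the length + type fields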


@@ -1,47 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class RTVSIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rtvs\.sk/(?:radio|televizia)/archiv/\d+/(?P<id>\d+)'
_TESTS = [{
# radio archive
'url': 'http://www.rtvs.sk/radio/archiv/11224/414872',
'md5': '134d5d6debdeddf8a5d761cbc9edacb8',
'info_dict': {
'id': '414872',
'ext': 'mp3',
'title': 'Ostrov pokladov 1 časť.mp3'
},
'params': {
'skip_download': True,
}
}, {
# tv archive
'url': 'http://www.rtvs.sk/televizia/archiv/8249/63118',
'md5': '85e2c55cf988403b70cac24f5c086dc6',
'info_dict': {
'id': '63118',
'ext': 'mp4',
'title': 'Amaro Džives - Náš deň',
'description': 'Galavečer pri príležitosti Medzinárodného dňa Rómov.'
},
'params': {
'skip_download': True,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
playlist_url = self._search_regex(
r'playlist["\']?\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'playlist url', group='url')
data = self._download_json(
playlist_url, video_id, 'Downloading playlist')[0]
return self._parse_jwplayer_data(data, video_id=video_id)


@@ -1,169 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
)
from ..utils import (
urljoin,
int_or_none,
parse_codecs,
try_get,
)
def _raw_id(src_url):
return compat_urllib_parse_urlparse(src_url).path.split('/')[-1]
class SeznamZpravyIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?seznamzpravy\.cz/iframe/player\?.*\bsrc='
_TESTS = [{
'url': 'https://www.seznamzpravy.cz/iframe/player?duration=241&serviceSlug=zpravy&src=https%3A%2F%2Fv39-a.sdn.szn.cz%2Fv_39%2Fvmd%2F5999c902ea707c67d8e267a9%3Ffl%3Dmdk%2C432f65a0%7C&itemType=video&autoPlay=false&title=Sv%C4%9Bt%20bez%20obalu%3A%20%C4%8Ce%C5%A1t%C3%AD%20voj%C3%A1ci%20na%20mis%C3%ADch%20(kr%C3%A1tk%C3%A1%20verze)&series=Sv%C4%9Bt%20bez%20obalu&serviceName=Seznam%20Zpr%C3%A1vy&poster=%2F%2Fd39-a.sdn.szn.cz%2Fd_39%2Fc_img_F_I%2FR5puJ.jpeg%3Ffl%3Dcro%2C0%2C0%2C1920%2C1080%7Cres%2C1200%2C%2C1%7Cjpg%2C80%2C%2C1&width=1920&height=1080&cutFrom=0&cutTo=0&splVersion=VOD&contentId=170889&contextId=35990&showAdvert=true&collocation=&autoplayPossible=true&embed=&isVideoTooShortForPreroll=false&isVideoTooLongForPostroll=true&videoCommentOpKey=&videoCommentId=&version=4.0.76&dotService=zpravy&gemiusPrismIdentifier=bVc1ZIb_Qax4W2v5xOPGpMeCP31kFfrTzj0SqPTLh_b.Z7&zoneIdPreroll=seznam.pack.videospot&skipOffsetPreroll=5&sectionPrefixPreroll=%2Fzpravy',
'info_dict': {
'id': '170889',
'ext': 'mp4',
'title': 'Svět bez obalu: Čeští vojáci na misích (krátká verze)',
'thumbnail': r're:^https?://.*\.jpe?g',
'duration': 241,
'series': 'Svět bez obalu',
},
'params': {
'skip_download': True,
},
}, {
# with Location key
'url': 'https://www.seznamzpravy.cz/iframe/player?duration=null&serviceSlug=zpravy&src=https%3A%2F%2Flive-a.sdn.szn.cz%2Fv_39%2F59e468fe454f8472a96af9fa%3Ffl%3Dmdk%2C5c1e2840%7C&itemType=livevod&autoPlay=false&title=P%C5%99edseda%20KDU-%C4%8CSL%20Pavel%20B%C4%9Blobr%C3%A1dek%20ve%20volebn%C3%AD%20V%C3%BDzv%C4%9B%20Seznamu&series=V%C3%BDzva&serviceName=Seznam%20Zpr%C3%A1vy&poster=%2F%2Fd39-a.sdn.szn.cz%2Fd_39%2Fc_img_G_J%2FjTBCs.jpeg%3Ffl%3Dcro%2C0%2C0%2C1280%2C720%7Cres%2C1200%2C%2C1%7Cjpg%2C80%2C%2C1&width=16&height=9&cutFrom=0&cutTo=0&splVersion=VOD&contentId=185688&contextId=38489&showAdvert=true&collocation=&hideFullScreen=false&hideSubtitles=false&embed=&isVideoTooShortForPreroll=false&isVideoTooShortForPreroll2=false&isVideoTooLongForPostroll=false&fakePostrollZoneID=seznam.clanky.zpravy.preroll&fakePrerollZoneID=seznam.clanky.zpravy.preroll&videoCommentId=&trim=default_16x9&noPrerollVideoLength=30&noPreroll2VideoLength=undefined&noMidrollVideoLength=0&noPostrollVideoLength=999999&autoplayPossible=true&version=5.0.41&dotService=zpravy&gemiusPrismIdentifier=zD3g7byfW5ekpXmxTVLaq5Srjw5i4hsYo0HY1aBwIe..27&zoneIdPreroll=seznam.pack.videospot&skipOffsetPreroll=5&sectionPrefixPreroll=%2Fzpravy%2Fvyzva&zoneIdPostroll=seznam.pack.videospot&skipOffsetPostroll=5&sectionPrefixPostroll=%2Fzpravy%2Fvyzva&regression=false',
'info_dict': {
'id': '185688',
'ext': 'mp4',
'title': 'Předseda KDU-ČSL Pavel Bělobrádek ve volební Výzvě Seznamu',
'thumbnail': r're:^https?://.*\.jpe?g',
'series': 'Výzva',
},
'params': {
'skip_download': True,
},
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url') for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?seznamzpravy\.cz/iframe/player\?.*?)\1',
webpage)]
def _extract_sdn_formats(self, sdn_url, video_id):
sdn_data = self._download_json(sdn_url, video_id)
if sdn_data.get('Location'):
sdn_url = sdn_data['Location']
sdn_data = self._download_json(sdn_url, video_id)
formats = []
mp4_formats = try_get(sdn_data, lambda x: x['data']['mp4'], dict) or {}
for format_id, format_data in mp4_formats.items():
relative_url = format_data.get('url')
if not relative_url:
continue
try:
width, height = format_data.get('resolution')
except (TypeError, ValueError):
width, height = None, None
f = {
'url': urljoin(sdn_url, relative_url),
'format_id': 'http-%s' % format_id,
'tbr': int_or_none(format_data.get('bandwidth'), scale=1000),
'width': int_or_none(width),
'height': int_or_none(height),
}
f.update(parse_codecs(format_data.get('codec')))
formats.append(f)
pls = sdn_data.get('pls', {})
def get_url(format_id):
return try_get(pls, lambda x: x[format_id]['url'], compat_str)
dash_rel_url = get_url('dash')
if dash_rel_url:
formats.extend(self._extract_mpd_formats(
urljoin(sdn_url, dash_rel_url), video_id, mpd_id='dash',
fatal=False))
hls_rel_url = get_url('hls')
if hls_rel_url:
formats.extend(self._extract_m3u8_formats(
urljoin(sdn_url, hls_rel_url), video_id, ext='mp4',
m3u8_id='hls', fatal=False))
self._sort_formats(formats)
return formats
def _real_extract(self, url):
params = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
src = params['src'][0]
title = params['title'][0]
video_id = params.get('contentId', [_raw_id(src)])[0]
formats = self._extract_sdn_formats(src + 'spl2,2,VOD', video_id)
duration = int_or_none(params.get('duration', [None])[0])
series = params.get('series', [None])[0]
thumbnail = params.get('poster', [None])[0]
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'series': series,
'formats': formats,
}
class SeznamZpravyArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:seznam\.cz/zpravy|seznamzpravy\.cz)/clanek/(?:[^/?#&]+)-(?P<id>\d+)'
_API_URL = 'https://apizpravy.seznam.cz/'
_TESTS = [{
# two videos on one page, with SDN URL
'url': 'https://www.seznamzpravy.cz/clanek/jejich-svet-na-nas-utoci-je-lepsi-branit-se-na-jejich-pisecku-rika-reziser-a-major-v-zaloze-marhoul-35990',
'info_dict': {
'id': '35990',
'title': 'md5:6011c877a36905f28f271fcd8dcdb0f2',
'description': 'md5:933f7b06fa337a814ba199d3596d27ba',
},
'playlist_count': 2,
}, {
# video with live stream URL
'url': 'https://www.seznam.cz/zpravy/clanek/znovu-do-vlady-s-ano-pavel-belobradek-ve-volebnim-specialu-seznamu-38489',
'info_dict': {
'id': '38489',
'title': 'md5:8fa1afdc36fd378cf0eba2b74c5aca60',
'description': 'md5:428e7926a1a81986ec7eb23078004fb4',
},
'playlist_count': 1,
}]
def _real_extract(self, url):
article_id = self._match_id(url)
webpage = self._download_webpage(url, article_id)
info = self._search_json_ld(webpage, article_id, default={})
title = info.get('title') or self._og_search_title(webpage, fatal=False)
description = info.get('description') or self._og_search_description(webpage)
return self.playlist_result([
self.url_result(url, ie=SeznamZpravyIE.ie_key())
for url in SeznamZpravyIE._extract_urls(webpage)],
article_id, title, description)
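The deleted player extractor above parses everything out of the iframe query string; note the `params.get('duration', [None])[0]` idiom, needed because parse_qs maps each key to a list. A small standalone illustration (Python 3 stdlib; the URL is shortened from the first test):

from urllib.parse import parse_qs, urlparse

url = ('https://www.seznamzpravy.cz/iframe/player?duration=241'
       '&src=https%3A%2F%2Fv39-a.sdn.szn.cz%2Fv_39%2Fvmd%2F5999c902ea707c67d8e267a9'
       '&itemType=video&contentId=170889')
params = parse_qs(urlparse(url).query)
print(params['src'][0])                    # percent-decoded SDN URL
print(params.get('duration', [None])[0])   # '241'; [None] guards absent keys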


@@ -1,7 +1,8 @@
 from __future__ import unicode_literals

+import base64
+
 from .common import InfoExtractor
-from ..compat import compat_b64decode
 from ..utils import (
     ExtractorError,
     int_or_none,
@@ -21,8 +22,8 @@ class SharedBaseIE(InfoExtractor):

         video_url = self._extract_video_url(webpage, video_id, url)

-        title = compat_b64decode(self._html_search_meta(
-            'full:title', webpage, 'title')).decode('utf-8')
+        title = base64.b64decode(self._html_search_meta(
+            'full:title', webpage, 'title').encode('utf-8')).decode('utf-8')
         filesize = int_or_none(self._html_search_meta(
             'full:size', webpage, 'file size', fatal=False))

@@ -91,4 +92,5 @@ class VivoIE(SharedBaseIE):
                 r'InitializeStream\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
                 webpage, 'stream', group='url'),
             video_id,
-            transform_source=lambda x: compat_b64decode(x).decode('utf-8'))[0]
+            transform_source=lambda x: base64.b64decode(
+                x.encode('ascii')).decode('utf-8'))[0]
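The compat_b64decode side of this hunk (the master column) exists so call sites can pass text or bytes without the manual .encode(...) dance seen on the branch side. Roughly, on Python 3 (a simplified sketch, not the actual compat shim, which also covers Python 2):

import base64


def compat_b64decode(s, *args, **kwargs):
    if isinstance(s, str):          # accept text as well as bytes
        s = s.encode('ascii')
    return base64.b64decode(s, *args, **kwargs)


assert compat_b64decode('dGl0bGU=') == b'title'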


@@ -4,11 +4,7 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from ..compat import (
-    compat_parse_qs,
-    compat_str,
-    compat_urllib_parse_urlparse,
-)
+from ..compat import compat_str
 from ..utils import (
     determine_ext,
     int_or_none,
@@ -61,7 +57,7 @@ class SixPlayIE(InfoExtractor):
             container = asset.get('video_container')
             ext = determine_ext(asset_url)
             if container == 'm3u8' or ext == 'm3u8':
-                if protocol == 'usp' and not compat_parse_qs(compat_urllib_parse_urlparse(asset_url).query).get('token', [None])[0]:
+                if protocol == 'usp':
                     asset_url = re.sub(r'/([^/]+)\.ism/[^/]*\.m3u8', r'/\1.ism/\1.m3u8', asset_url)
                 formats.extend(self._extract_m3u8_formats(
                     asset_url, video_id, 'mp4', 'm3u8_native',
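For reference, the usp rewrite above points the m3u8 path back at the ISM manifest name. It can be reproduced with plain re (the asset URL is made up):

import re

asset_url = 'https://example-cdn.net/video/clip_1.ism/clip_1_variant.m3u8'
fixed = re.sub(r'/([^/]+)\.ism/[^/]*\.m3u8', r'/\1.ism/\1.m3u8', asset_url)
print(fixed)  # https://example-cdn.net/video/clip_1.ism/clip_1.m3u8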


@@ -157,7 +157,8 @@ class SoundcloudIE(InfoExtractor):
         },
     ]

-    _CLIENT_ID = 'DQskPX1pntALRzMp4HSxya3Mc0AO66Ro'
+    _CLIENT_ID = 'c6CU49JDMapyrQo06UxU9xouB9ZVzqCn'
+    _IPHONE_CLIENT_ID = '376f225bf427445fc4bfb6b99b72e0bf'

     @staticmethod
     def _extract_urls(webpage):


@@ -6,7 +6,7 @@ from .mtv import MTVServicesInfoExtractor
 class SouthParkIE(MTVServicesInfoExtractor):
     IE_NAME = 'southpark.cc.com'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'

     _FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'

@@ -20,9 +20,6 @@ class SouthParkIE(MTVServicesInfoExtractor):
             'timestamp': 1112760000,
             'upload_date': '20050406',
         },
-    }, {
-        'url': 'http://southpark.cc.com/collections/7758/fan-favorites/1',
-        'only_matching': True,
     }]

@@ -44,7 +41,7 @@ class SouthParkEsIE(SouthParkIE):
 class SouthParkDeIE(SouthParkIE):
     IE_NAME = 'southpark.de'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.de/(?:clips|alle-episoden|collections)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.de/(?:clips|alle-episoden)/(?P<id>.+?)(\?|#|$))'
     _FEED_URL = 'http://www.southpark.de/feeds/video-player/mrss/'

     _TESTS = [{
@@ -73,15 +70,12 @@ class SouthParkDeIE(SouthParkIE):
             'description': 'Kyle will mit seinem kleinen Bruder Ike Videospiele spielen. Als der nicht mehr mit ihm spielen will, hat Kyle Angst, dass er die Kids von heute nicht mehr versteht.',
         },
         'playlist_count': 3,
-    }, {
-        'url': 'http://www.southpark.de/collections/2476/superhero-showdown/1',
-        'only_matching': True,
     }]

 class SouthParkNlIE(SouthParkIE):
     IE_NAME = 'southpark.nl'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'
     _FEED_URL = 'http://www.southpark.nl/feeds/video-player/mrss/'

     _TESTS = [{
@@ -96,7 +90,7 @@ class SouthParkNlIE(SouthParkIE):
 class SouthParkDkIE(SouthParkIE):
     IE_NAME = 'southparkstudios.dk'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southparkstudios\.(?:dk|nu)/(?:clips|full-episodes|collections)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southparkstudios\.dk/(?:clips|full-episodes)/(?P<id>.+?)(\?|#|$))'
     _FEED_URL = 'http://www.southparkstudios.dk/feeds/video-player/mrss/'

     _TESTS = [{
@@ -106,10 +100,4 @@ class SouthParkDkIE(SouthParkIE):
             'description': 'Butters is convinced he\'s living in a virtual reality.',
         },
         'playlist_mincount': 3,
-    }, {
-        'url': 'http://www.southparkstudios.dk/collections/2476/superhero-showdown/1',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.southparkstudios.nu/collections/2476/superhero-showdown/1',
-        'only_matching': True,
     }]


@@ -4,10 +4,7 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from .nexx import (
-    NexxIE,
-    NexxEmbedIE,
-)
+from .nexx import NexxEmbedIE
 from .spiegeltv import SpiegeltvIE
 from ..compat import compat_urlparse
 from ..utils import (
@@ -54,10 +51,6 @@ class SpiegelIE(InfoExtractor):
     }, {
         'url': 'http://www.spiegel.de/video/astronaut-alexander-gerst-von-der-iss-station-beantwortet-fragen-video-1519126-iframe.html',
         'only_matching': True,
-    }, {
-        # nexx video
-        'url': 'http://www.spiegel.de/video/spiegel-tv-magazin-ueber-guellekrise-in-schleswig-holstein-video-99012776.html',
-        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -68,14 +61,6 @@ class SpiegelIE(InfoExtractor):
         if SpiegeltvIE.suitable(handle.geturl()):
             return self.url_result(handle.geturl(), 'Spiegeltv')

-        nexx_id = self._search_regex(
-            r'nexxOmniaId\s*:\s*(\d+)', webpage, 'nexx id', default=None)
-        if nexx_id:
-            domain_id = NexxIE._extract_domain_id(webpage) or '748'
-            return self.url_result(
-                'nexx:%s:%s' % (domain_id, nexx_id), ie=NexxIE.ie_key(),
-                video_id=nexx_id)
-
         video_data = extract_attributes(self._search_regex(r'(<div[^>]+id="spVideoElements"[^>]+>)', webpage, 'video element', default=''))
         title = video_data.get('data-video-title') or get_element_by_attribute('class', 'module-title', webpage)

@@ -0,0 +1,38 @@
# coding: utf-8
from __future__ import unicode_literals
from .wdr import WDRBaseIE
from ..utils import get_element_by_attribute
class SportschauIE(WDRBaseIE):
IE_NAME = 'Sportschau'
_VALID_URL = r'https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video-?(?P<id>[^/#?]+)\.html'
_TEST = {
'url': 'http://www.sportschau.de/uefaeuro2016/videos/video-dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100.html',
'info_dict': {
'id': 'mdb-1140188',
'display_id': 'dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100',
'ext': 'mp4',
'title': 'DFB-Team geht gut gelaunt ins Spiel gegen Polen',
'description': 'Vor dem zweiten Gruppenspiel gegen Polen herrscht gute Stimmung im deutschen Team. Insbesondere Bastian Schweinsteiger strotzt vor Optimismus nach seinem Tor gegen die Ukraine.',
'upload_date': '20160615',
},
'skip': 'Geo-restricted to Germany',
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = get_element_by_attribute('class', 'headline', webpage)
description = self._html_search_meta('description', webpage, 'description')
info = self._extract_wdr_video(webpage, video_id)
info.update({
'title': title,
'description': description,
})
return info


@@ -1,125 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
xpath_attr,
xpath_text,
xpath_element,
unescapeHTML,
unified_timestamp,
)
class SpringboardPlatformIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
cms\.springboardplatform\.com/
(?:
(?:previews|embed_iframe)/(?P<index>\d+)/video/(?P<id>\d+)|
xml_feeds_advanced/index/(?P<index_2>\d+)/rss3/(?P<id_2>\d+)
)
'''
_TESTS = [{
'url': 'http://cms.springboardplatform.com/previews/159/video/981017/0/0/1',
'md5': '5c3cb7b5c55740d482561099e920f192',
'info_dict': {
'id': '981017',
'ext': 'mp4',
'title': 'Redman "BUD like YOU" "Usher Good Kisser" REMIX',
'description': 'Redman "BUD like YOU" "Usher Good Kisser" REMIX',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1409132328,
'upload_date': '20140827',
'duration': 193,
},
}, {
'url': 'http://cms.springboardplatform.com/embed_iframe/159/video/981017/rab007/rapbasement.com/1/1',
'only_matching': True,
}, {
'url': 'http://cms.springboardplatform.com/embed_iframe/20/video/1731611/ki055/kidzworld.com/10',
'only_matching': True,
}, {
'url': 'http://cms.springboardplatform.com/xml_feeds_advanced/index/159/rss3/981017/0/0/1/',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//cms\.springboardplatform\.com/embed_iframe/\d+/video/\d+.*?)\1',
webpage)]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') or mobj.group('id_2')
index = mobj.group('index') or mobj.group('index_2')
video = self._download_xml(
'http://cms.springboardplatform.com/xml_feeds_advanced/index/%s/rss3/%s'
% (index, video_id), video_id)
item = xpath_element(video, './/item', 'item', fatal=True)
content = xpath_element(
item, './{http://search.yahoo.com/mrss/}content', 'content',
fatal=True)
title = unescapeHTML(xpath_text(item, './title', 'title', fatal=True))
video_url = content.attrib['url']
if 'error_video.mp4' in video_url:
raise ExtractorError(
'Video %s no longer exists' % video_id, expected=True)
duration = int_or_none(content.get('duration'))
tbr = int_or_none(content.get('bitrate'))
filesize = int_or_none(content.get('fileSize'))
width = int_or_none(content.get('width'))
height = int_or_none(content.get('height'))
description = unescapeHTML(xpath_text(
item, './description', 'description'))
thumbnail = xpath_attr(
item, './{http://search.yahoo.com/mrss/}thumbnail', 'url',
'thumbnail')
timestamp = unified_timestamp(xpath_text(
item, './{http://cms.springboardplatform.com/namespaces.html}created',
'timestamp'))
formats = [{
'url': video_url,
'format_id': 'http',
'tbr': tbr,
'filesize': filesize,
'width': width,
'height': height,
}]
m3u8_format = formats[0].copy()
m3u8_format.update({
'url': re.sub(r'(https?://)cdn\.', r'\1hls.', video_url) + '.m3u8',
'ext': 'mp4',
'format_id': 'hls',
'protocol': 'm3u8_native',
})
formats.append(m3u8_format)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
}
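The deleted extractor derived its HLS variant from the progressive URL by swapping the 'cdn.' host prefix for 'hls.' and appending '.m3u8'. On a made-up storage path:

import re

video_url = 'http://cdn.springboardplatform.com/storage/default/video/981017.mp4'
m3u8_url = re.sub(r'(https?://)cdn\.', r'\1hls.', video_url) + '.m3u8'
print(m3u8_url)  # http://hls.springboardplatform.com/storage/default/video/981017.mp4.m3u8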


@@ -58,7 +58,7 @@ class TBSIE(TurnerBaseIE):
                 continue
             if stream_data.get('playlistProtection') == 'spe':
                 m3u8_url = self._add_akamai_spe_token(
-                    'http://token.vgtf.net/token/token_spe',
+                    'http://www.%s.com/service/token_spe' % site,
                     m3u8_url, media_id, {
                         'url': url,
                         'site_name': site[:3].upper(),


@@ -5,9 +5,8 @@ import re

 from .common import InfoExtractor
 from ..utils import (
-    determine_ext,
-    ExtractorError,
     qualities,
+    determine_ext,
 )

@@ -18,7 +17,6 @@ class TeacherTubeIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?teachertube\.com/(viewVideo\.php\?video_id=|music\.php\?music_id=|video/(?:[\da-z-]+-)?|audio/)(?P<id>\d+)'

     _TESTS = [{
-        # flowplayer
         'url': 'http://www.teachertube.com/viewVideo.php?video_id=339997',
         'md5': 'f9434ef992fd65936d72999951ee254c',
         'info_dict': {
@@ -26,10 +24,19 @@ class TeacherTubeIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Measures of dispersion from a frequency table',
             'description': 'Measures of dispersion from a frequency table',
-            'thumbnail': r're:https?://.*\.(?:jpg|png)',
+            'thumbnail': r're:http://.*\.jpg',
+        },
+    }, {
+        'url': 'http://www.teachertube.com/viewVideo.php?video_id=340064',
+        'md5': '0d625ec6bc9bf50f70170942ad580676',
+        'info_dict': {
+            'id': '340064',
+            'ext': 'mp4',
+            'title': 'How to Make Paper Dolls _ Paper Art Projects',
+            'description': 'Learn how to make paper dolls in this simple',
+            'thumbnail': r're:http://.*\.jpg',
         },
     }, {
-        # jwplayer
         'url': 'http://www.teachertube.com/music.php?music_id=8805',
         'md5': '01e8352006c65757caf7b961f6050e21',
         'info_dict': {
@@ -39,21 +46,20 @@ class TeacherTubeIE(InfoExtractor):
             'description': 'RADIJSKA EMISIJA ZRAKOPLOVNE TEHNI?KE ?KOLE P',
         },
     }, {
-        # unavailable video
         'url': 'http://www.teachertube.com/video/intro-video-schleicher-297790',
-        'only_matching': True,
+        'md5': '9c79fbb2dd7154823996fc28d4a26998',
+        'info_dict': {
+            'id': '297790',
+            'ext': 'mp4',
+            'title': 'Intro Video - Schleicher',
+            'description': 'Intro Video - Why to flip, how flipping will',
+        },
     }]

     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)

-        error = self._search_regex(
-            r'<div\b[^>]+\bclass=["\']msgBox error[^>]+>([^<]+)', webpage,
-            'error', default=None)
-        if error:
-            raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
-
         title = self._html_search_meta('title', webpage, 'title', fatal=True)
         TITLE_SUFFIX = ' - TeacherTube'
         if title.endswith(TITLE_SUFFIX):
@@ -78,16 +84,12 @@ class TeacherTubeIE(InfoExtractor):

         self._sort_formats(formats)

-        thumbnail = self._og_search_thumbnail(
-            webpage, default=None) or self._html_search_meta(
-            'thumbnail', webpage)
-
         return {
             'id': video_id,
             'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
+            'thumbnail': self._html_search_regex(r'\'image\'\s*:\s*["\']([^"\']+)["\']', webpage, 'thumbnail'),
             'formats': formats,
+            'description': description,
         }


@@ -1,20 +1,18 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import base64
 import binascii
 import re
 import json

 from .common import InfoExtractor
-from ..compat import (
-    compat_b64decode,
-    compat_ord,
-)
 from ..utils import (
     ExtractorError,
     qualities,
     determine_ext,
 )
+from ..compat import compat_ord

 class TeamcocoIE(InfoExtractor):
@@ -99,7 +97,7 @@ class TeamcocoIE(InfoExtractor):
             for i in range(len(cur_fragments)):
                 cur_sequence = (''.join(cur_fragments[i:] + cur_fragments[:i])).encode('ascii')
                 try:
-                    raw_data = compat_b64decode(cur_sequence)
+                    raw_data = base64.b64decode(cur_sequence)
                     if compat_ord(raw_data[0]) == compat_ord('{'):
                         return json.loads(raw_data.decode('utf-8'))
                 except (TypeError, binascii.Error, UnicodeDecodeError, ValueError):
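The loop this hunk touches brute-forces the rotation of base64 fragments until the joined string decodes to a JSON object. A compact standalone version of the same idea (binascii.Error is a ValueError subclass on Python 3, so one except clause suffices here):

import base64
import json


def decode_embed_data(fragments):
    for i in range(len(fragments)):
        # try each rotation of the fragment list
        candidate = ''.join(fragments[i:] + fragments[:i]).encode('ascii')
        try:
            raw = base64.b64decode(candidate)
            if raw[:1] == b'{':
                return json.loads(raw.decode('utf-8'))
        except (ValueError, UnicodeDecodeError):
            pass
    return None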


@@ -0,0 +1,106 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_strdate
class TheSixtyOneIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?thesixtyone\.com/
(?:.*?/)*
(?:
s|
song/comments/list|
song
)/(?:[^/]+/)?(?P<id>[A-Za-z0-9]+)/?$'''
_SONG_URL_TEMPLATE = 'http://thesixtyone.com/s/{0:}'
_SONG_FILE_URL_TEMPLATE = 'http://{audio_server:}/thesixtyone_production/audio/{0:}_stream'
_THUMBNAIL_URL_TEMPLATE = '{photo_base_url:}_desktop'
_TESTS = [
{
'url': 'http://www.thesixtyone.com/s/SrE3zD7s1jt/',
'md5': '821cc43b0530d3222e3e2b70bb4622ea',
'info_dict': {
'id': 'SrE3zD7s1jt',
'ext': 'mp3',
'title': 'CASIO - Unicorn War Mixtape',
'thumbnail': 're:^https?://.*_desktop$',
'upload_date': '20071217',
'duration': 3208,
}
},
{
'url': 'http://www.thesixtyone.com/song/comments/list/SrE3zD7s1jt',
'only_matching': True,
},
{
'url': 'http://www.thesixtyone.com/s/ULoiyjuJWli#/s/SrE3zD7s1jt/',
'only_matching': True,
},
{
'url': 'http://www.thesixtyone.com/#/s/SrE3zD7s1jt/',
'only_matching': True,
},
{
'url': 'http://www.thesixtyone.com/song/SrE3zD7s1jt/',
'only_matching': True,
},
{
'url': 'http://www.thesixtyone.com/maryatmidnight/song/StrawberriesandCream/yvWtLp0c4GQ/',
'only_matching': True,
},
]
_DECODE_MAP = {
'x': 'a',
'm': 'b',
'w': 'c',
'q': 'd',
'n': 'e',
'p': 'f',
'a': '0',
'h': '1',
'e': '2',
'u': '3',
's': '4',
'i': '5',
'o': '6',
'y': '7',
'r': '8',
'c': '9'
}
def _real_extract(self, url):
song_id = self._match_id(url)
webpage = self._download_webpage(
self._SONG_URL_TEMPLATE.format(song_id), song_id)
song_data = self._parse_json(self._search_regex(
r'"%s":\s(\{.*?\})' % song_id, webpage, 'song_data'), song_id)
if self._search_regex(r'(t61\.s3_audio_load\s*=\s*1\.0;)', webpage, 's3_audio_load marker', default=None):
song_data['audio_server'] = 's3.amazonaws.com'
else:
song_data['audio_server'] = song_data['audio_server'] + '.thesixtyone.com'
keys = [self._DECODE_MAP.get(s, s) for s in song_data['key']]
url = self._SONG_FILE_URL_TEMPLATE.format(
"".join(reversed(keys)), **song_data)
formats = [{
'format_id': 'sd',
'url': url,
'ext': 'mp3',
}]
return {
'id': song_id,
'title': '{artist:} - {name:}'.format(**song_data),
'formats': formats,
'comment_count': song_data.get('comments_count'),
'duration': song_data.get('play_time'),
'like_count': song_data.get('score'),
'thumbnail': self._THUMBNAIL_URL_TEMPLATE.format(**song_data),
'upload_date': unified_strdate(song_data.get('publish_date')),
}
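The key unscrambling above substitutes characters via _DECODE_MAP where possible, keeps the rest, and reverses the result. Run on a hypothetical value of song_data['key']:

decode_map = {'x': 'a', 'm': 'b', 'w': 'c', 'q': 'd', 'n': 'e', 'p': 'f',
              'a': '0', 'h': '1', 'e': '2', 'u': '3', 's': '4', 'i': '5',
              'o': '6', 'y': '7', 'r': '8', 'c': '9'}
scrambled_key = 'xhne'  # made-up key
decoded = ''.join(decode_map.get(ch, ch) for ch in scrambled_key)
print(decoded[::-1])    # 'a1e2' reversed -> '2e1a'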


@@ -0,0 +1,50 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class TotalWebCastingIE(InfoExtractor):
IE_NAME = 'totalwebcasting.com'
_VALID_URL = r'https?://www\.totalwebcasting\.com/view/\?func=VOFF.*'
_TEST = {
'url': 'https://www.totalwebcasting.com/view/?func=VOFF&id=columbia&date=2017-01-04&seq=1',
'info_dict': {
'id': '270e1c415d443924485f547403180906731570466a42740764673853041316737548',
'title': 'Real World Cryptography Conference 2017',
'description': 'md5:47a31e91ed537a2bb0d3a091659dc80c',
},
'playlist_count': 6,
}
def _real_extract(self, url):
params = url.split('?', 1)[1]
webpage = self._download_webpage(url, params)
aprm = self._search_regex(r"startVideo\('(\w+)'", webpage, 'aprm')
VLEV = self._download_json("https://www.totalwebcasting.com/view/?func=VLEV&aprm=%s&style=G" % aprm, aprm)
parts = []
for s in VLEV["aiTimes"].values():
n = int(s[:-5])
if n == 99:
continue
if n not in parts:
parts.append(n)
parts.sort()
title = VLEV["title"]
entries = []
for p in parts:
VLEV = self._download_json("https://www.totalwebcasting.com/view/?func=VLEV&aprm=%s&style=G&refP=1&nf=%d&time=1&cs=1&ns=1" % (aprm, p), aprm)
for s in VLEV["playerObj"]["clip"]["sources"]:
if s["type"] != "video/mp4":
continue
entries.append({
"id": "%s_part%d" % (aprm, p),
"url": "https:" + s["src"],
"title": title,
})
return {
'_type': 'multi_video',
'id': aprm,
'entries': entries,
'title': title,
'description': VLEV.get("desc"),
}
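The part enumeration in this new extractor assumes aiTimes values encode the part number in everything but the last five characters, with 99 acting as a sentinel to skip. On a hypothetical payload:

ai_times = {'0': '100000s', '1': '100515s', '2': '201030s', '3': '990000s'}
parts = sorted({int(s[:-5]) for s in ai_times.values() if int(s[:-5]) != 99})
print(parts)  # [10, 20] -- deduplicated, sorted, sentinel dropped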


@@ -1,10 +1,9 @@
 from __future__ import unicode_literals

+import base64
+
 from .common import InfoExtractor
-from ..compat import (
-    compat_b64decode,
-    compat_parse_qs,
-)
+from ..compat import compat_parse_qs

 class TutvIE(InfoExtractor):
@@ -27,7 +26,7 @@ class TutvIE(InfoExtractor):
         data_content = self._download_webpage(
             'http://tu.tv/flvurl.php?codVideo=%s' % internal_id, video_id, 'Downloading video info')
-        video_url = compat_b64decode(compat_parse_qs(data_content)['kpt'][0]).decode('utf-8')
+        video_url = base64.b64decode(compat_parse_qs(data_content)['kpt'][0].encode('utf-8')).decode('utf-8')

         return {
             'id': internal_id,


@@ -273,8 +273,6 @@ class TVPlayIE(InfoExtractor):
                 'ext': ext,
             }
             if video_url.startswith('rtmp'):
-                if smuggled_data.get('skip_rtmp'):
-                    continue
                 m = re.search(
                     r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
                 if not m:
@@ -436,10 +434,6 @@ class ViafreeIE(InfoExtractor):
         return self.url_result(
             smuggle_url(
                 'mtg:%s' % video_id,
-                {
-                    'geo_countries': [
-                        compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]],
-                    # rtmp host mtgfs.fplive.net for viafree is unresolvable
-                    'skip_rtmp': True,
-                }),
+                {'geo_countries': [
+                    compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]]}),
             ie=TVPlayIE.ie_key(), video_id=video_id)


@@ -85,15 +85,10 @@ class TwitchBaseIE(InfoExtractor):
             if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
                 response = self._parse_json(
                     e.cause.read().decode('utf-8'), None)
-                fail(response.get('message') or response['errors'][0])
+                fail(response['message'])
             raise

-        if 'Authenticated successfully' in response.get('message', ''):
-            return None, None
-
-        redirect_url = urljoin(
-            post_url,
-            response.get('redirect') or response['redirect_path'])
+        redirect_url = urljoin(post_url, response['redirect'])
         return self._download_webpage_handle(
             redirect_url, None, 'Downloading login redirect page',
             headers=headers)
@@ -111,10 +106,6 @@ class TwitchBaseIE(InfoExtractor):
                 'password': password,
             })

-        # Successful login
-        if not redirect_page:
-            return
-
         if re.search(r'(?i)<form[^>]+id="two-factor-submit"', redirect_page) is not None:
             # TODO: Add mechanism to request an SMS or phone call
             tfa_token = self._get_tfa_info('two-factor authentication token')


@@ -318,14 +318,9 @@ class VKIE(VKBaseIE):
                 'You are trying to log in from an unusual location. You should confirm ownership at vk.com to log in with this IP.',
                 expected=True)

-        ERROR_COPYRIGHT = 'Video %s has been removed from public access due to rightholder complaint.'
-
         ERRORS = {
             r'>Видеозапись .*? была изъята из публичного доступа в связи с обращением правообладателя.<':
-            ERROR_COPYRIGHT,
-
-            r'>The video .*? was removed from public access by request of the copyright holder.<':
-            ERROR_COPYRIGHT,
+            'Video %s has been removed from public access due to rightholder complaint.',

             r'<!>Please log in or <':
             'Video %s is only available for registered users, '


@@ -4,50 +4,49 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from ..compat import (
-    compat_str,
-    compat_urlparse,
-)
 from ..utils import (
     determine_ext,
     ExtractorError,
     js_to_json,
     strip_jsonp,
-    try_get,
     unified_strdate,
     update_url_query,
     urlhandle_detect_ext,
 )

-class WDRIE(InfoExtractor):
-    _VALID_URL = r'https?://deviceids-medp\.wdr\.de/ondemand/\d+/(?P<id>\d+)\.js'
-    _GEO_COUNTRIES = ['DE']
-    _TEST = {
-        'url': 'http://deviceids-medp.wdr.de/ondemand/155/1557833.js',
-        'info_dict': {
-            'id': 'mdb-1557833',
-            'ext': 'mp4',
-            'title': 'Biathlon-Staffel verpasst Podest bei Olympia-Generalprobe',
-            'upload_date': '20180112',
-        },
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
+class WDRBaseIE(InfoExtractor):
+    def _extract_wdr_video(self, webpage, display_id):
+        # for wdr.de the data-extension is in a tag with the class "mediaLink"
+        # for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
+        # for wdrmaus, in a tag with the class "videoButton" (previously a link
+        # to the page in a multiline "videoLink"-tag)
+        json_metadata = self._html_search_regex(
+            r'''(?sx)class=
+                    (?:
+                        (["\'])(?:mediaLink|wdrrPlayerPlayBtn|videoButton)\b.*?\1[^>]+|
+                        (["\'])videoLink\b.*?\2[\s]*>\n[^\n]*
+                    )data-extension=(["\'])(?P<data>(?:(?!\3).)+)\3
+            ''',
+            webpage, 'media link', default=None, group='data')
+
+        if not json_metadata:
+            return
+
+        media_link_obj = self._parse_json(json_metadata, display_id,
+                                          transform_source=js_to_json)
+        jsonp_url = media_link_obj['mediaObj']['url']
+
         metadata = self._download_json(
-            url, video_id, transform_source=strip_jsonp)
+            jsonp_url, display_id, transform_source=strip_jsonp)

-        is_live = metadata.get('mediaType') == 'live'
-
-        tracker_data = metadata['trackerData']
-        media_resource = metadata['mediaResource']
+        metadata_tracker_data = metadata['trackerData']
+        metadata_media_resource = metadata['mediaResource']

         formats = []

         # check if the metadata contains a direct URL to a file
-        for kind, media_resource in media_resource.items():
+        for kind, media_resource in metadata_media_resource.items():
             if kind not in ('dflt', 'alt'):
                 continue

@@ -58,13 +57,13 @@ class WDRIE(InfoExtractor):
                 ext = determine_ext(medium_url)
                 if ext == 'm3u8':
                     formats.extend(self._extract_m3u8_formats(
-                        medium_url, video_id, 'mp4', 'm3u8_native',
+                        medium_url, display_id, 'mp4', 'm3u8_native',
                         m3u8_id='hls'))
                 elif ext == 'f4m':
                     manifest_url = update_url_query(
                         medium_url, {'hdcore': '3.2.0', 'plugin': 'aasp-3.2.0.77.18'})
                     formats.extend(self._extract_f4m_formats(
-                        manifest_url, video_id, f4m_id='hds', fatal=False))
+                        manifest_url, display_id, f4m_id='hds', fatal=False))
                 elif ext == 'smil':
                     formats.extend(self._extract_smil_formats(
                         medium_url, 'stream', fatal=False))
@@ -74,7 +73,7 @@ class WDRIE(InfoExtractor):
                     }
                     if ext == 'unknown_video':
                         urlh = self._request_webpage(
-                            medium_url, video_id, note='Determining extension')
+                            medium_url, display_id, note='Determining extension')
                         ext = urlhandle_detect_ext(urlh)
                         a_format['ext'] = ext
                     formats.append(a_format)
@@ -82,30 +81,30 @@ class WDRIE(InfoExtractor):
         self._sort_formats(formats)

         subtitles = {}
-        caption_url = media_resource.get('captionURL')
+        caption_url = metadata_media_resource.get('captionURL')
         if caption_url:
             subtitles['de'] = [{
                 'url': caption_url,
                 'ext': 'ttml',
             }]

-        title = tracker_data['trackerClipTitle']
+        title = metadata_tracker_data['trackerClipTitle']

         return {
-            'id': tracker_data.get('trackerClipId', video_id),
-            'title': self._live_title(title) if is_live else title,
-            'alt_title': tracker_data.get('trackerClipSubcategory'),
+            'id': metadata_tracker_data.get('trackerClipId', display_id),
+            'display_id': display_id,
+            'title': title,
+            'alt_title': metadata_tracker_data.get('trackerClipSubcategory'),
             'formats': formats,
             'subtitles': subtitles,
-            'upload_date': unified_strdate(tracker_data.get('trackerClipAirTime')),
-            'is_live': is_live,
+            'upload_date': unified_strdate(metadata_tracker_data.get('trackerClipAirTime')),
         }

-class WDRPageIE(InfoExtractor):
+class WDRIE(WDRBaseIE):
     _CURRENT_MAUS_URL = r'https?://(?:www\.)wdrmaus.de/(?:[^/]+/){1,2}[^/?#]+\.php5'
-    _PAGE_REGEX = r'/(?:mediathek/)?(?:[^/]+/)*(?P<display_id>[^/]+)\.html'
-    _VALID_URL = r'https?://(?:www\d?\.)?(?:wdr\d?|sportschau)\.de' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL
+    _PAGE_REGEX = r'/(?:mediathek/)?[^/]+/(?P<type>[^/]+)/(?P<display_id>.+)\.html'
+    _VALID_URL = r'(?P<page_url>https?://(?:www\d\.)?wdr\d?\.de)' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL

     _TESTS = [
         {
@@ -125,7 +124,6 @@ class WDRPageIE(InfoExtractor):
                 'ext': 'ttml',
             }]},
             },
-            'skip': 'HTTP Error 404: Not Found',
         },
         {
@@ -141,17 +139,19 @@ class WDRPageIE(InfoExtractor):
                 'is_live': False,
                 'subtitles': {}
             },
-            'skip': 'HTTP Error 404: Not Found',
         },
         {
             'url': 'http://www1.wdr.de/mediathek/video/live/index.html',
             'info_dict': {
-                'id': 'mdb-1406149',
+                'id': 'mdb-103364',
                 'ext': 'mp4',
-                'title': r're:^WDR Fernsehen im Livestream \(nur in Deutschland erreichbar\) [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+                'display_id': 'index',
+                'title': r're:^WDR Fernsehen im Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
                 'alt_title': 'WDR Fernsehen Live',
-                'upload_date': '20150101',
+                'upload_date': None,
+                'description': 'md5:ae2ff888510623bf8d4b115f95a9b7c9',
                 'is_live': True,
+                'subtitles': {}
             },
             'params': {
                 'skip_download': True,  # m3u8 download
@@ -159,18 +159,19 @@ class WDRPageIE(InfoExtractor):
         },
         {
             'url': 'http://www1.wdr.de/mediathek/video/sendungen/aktuelle-stunde/aktuelle-stunde-120.html',
-            'playlist_mincount': 7,
+            'playlist_mincount': 8,
             'info_dict': {
-                'id': 'aktuelle-stunde-120',
+                'id': 'aktuelle-stunde/aktuelle-stunde-120',
             },
         },
         {
             'url': 'http://www.wdrmaus.de/aktuelle-sendung/index.php5',
             'info_dict': {
-                'id': 'mdb-1552552',
+                'id': 'mdb-1323501',
                 'ext': 'mp4',
                 'upload_date': 're:^[0-9]{8}$',
                 'title': 're:^Die Sendung mit der Maus vom [0-9.]{10}$',
+                'description': 'Die Seite mit der Maus -',
             },
             'skip': 'The id changes from week to week because of the new episode'
         },
@@ -182,6 +183,7 @@ class WDRPageIE(InfoExtractor):
                 'ext': 'mp4',
                 'upload_date': '20130919',
                 'title': 'Sachgeschichte - Achterbahn ',
+                'description': 'Die Seite mit der Maus -',
             },
         },
         {
@@ -189,114 +191,52 @@ class WDRPageIE(InfoExtractor):
             # Live stream, MD5 unstable
             'info_dict': {
                 'id': 'mdb-869971',
-                'ext': 'mp4',
-                'title': r're:^COSMO Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+                'ext': 'flv',
+                'title': 'COSMO Livestream',
+                'description': 'md5:2309992a6716c347891c045be50992e4',
                 'upload_date': '20160101',
             },
-            'params': {
-                'skip_download': True,  # m3u8 download
-            }
-        },
-        {
-            'url': 'http://www.sportschau.de/handballem2018/handball-nationalmannschaft-em-stolperstein-vorrunde-100.html',
-            'info_dict': {
-                'id': 'mdb-1556012',
-                'ext': 'mp4',
-                'title': 'DHB-Vizepräsident Bob Hanning - "Die Weltspitze ist extrem breit"',
-                'upload_date': '20180111',
-            },
-            'params': {
-                'skip_download': True,
-            },
-        },
-        {
-            'url': 'http://www.sportschau.de/handballem2018/audio-vorschau---die-handball-em-startet-mit-grossem-favoritenfeld-100.html',
-            'only_matching': True,
         }
     ]

     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
+        url_type = mobj.group('type')
+        page_url = mobj.group('page_url')
         display_id = mobj.group('display_id')
         webpage = self._download_webpage(url, display_id)

-        entries = []
-
-        # Article with several videos
-        # for wdr.de the data-extension is in a tag with the class "mediaLink"
-        # for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
-        # for wdrmaus, in a tag with the class "videoButton" (previously a link
-        # to the page in a multiline "videoLink"-tag)
-        for mobj in re.finditer(
-                r'''(?sx)class=
-                        (?:
-                            (["\'])(?:mediaLink|wdrrPlayerPlayBtn|videoButton)\b.*?\1[^>]+|
-                            (["\'])videoLink\b.*?\2[\s]*>\n[^\n]*
-                        )data-extension=(["\'])(?P<data>(?:(?!\3).)+)\3
-                ''', webpage):
-            media_link_obj = self._parse_json(
-                mobj.group('data'), display_id, transform_source=js_to_json,
-                fatal=False)
-            if not media_link_obj:
-                continue
-            jsonp_url = try_get(
-                media_link_obj, lambda x: x['mediaObj']['url'], compat_str)
-            if jsonp_url:
-                entries.append(self.url_result(jsonp_url, ie=WDRIE.ie_key()))
-
-        # Playlist (e.g. https://www1.wdr.de/mediathek/video/sendungen/aktuelle-stunde/aktuelle-stunde-120.html)
-        if not entries:
+        info_dict = self._extract_wdr_video(webpage, display_id)
+
+        if not info_dict:
             entries = [
-                self.url_result(
-                    compat_urlparse.urljoin(url, mobj.group('href')),
-                    ie=WDRPageIE.ie_key())
-                for mobj in re.finditer(
-                    r'<a[^>]+\bhref=(["\'])(?P<href>(?:(?!\1).)+)\1[^>]+\bdata-extension=',
-                    webpage) if re.match(self._PAGE_REGEX, mobj.group('href'))
+                self.url_result(page_url + href[0], 'WDR')
+                for href in re.findall(
+                    r'<a href="(%s)"[^>]+data-extension=' % self._PAGE_REGEX,
+                    webpage)
             ]

-        return self.playlist_result(entries, playlist_id=display_id)
-
-
-class WDRElefantIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)wdrmaus\.de/elefantenseite/#(?P<id>.+)'
-    _TEST = {
-        'url': 'http://www.wdrmaus.de/elefantenseite/#folge_ostern_2015',
-        'info_dict': {
-            'title': 'Folge Oster-Spezial 2015',
-            'id': 'mdb-1088195',
-            'ext': 'mp4',
-            'age_limit': None,
-            'upload_date': '20150406'
-        },
-        'params': {
-            'skip_download': True,
-        },
-    }
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        # Table of Contents seems to always be at this address, so fetch it directly.
-        # The website fetches configurationJS.php5, which links to tableOfContentsJS.php5.
-        table_of_contents = self._download_json(
-            'https://www.wdrmaus.de/elefantenseite/data/tableOfContentsJS.php5',
-            display_id)
-        if display_id not in table_of_contents:
-            raise ExtractorError(
-                'No entry in site\'s table of contents for this URL. '
-                'Is the fragment part of the URL (after the #) correct?',
-                expected=True)
-        xml_metadata_path = table_of_contents[display_id]['xmlPath']
-        xml_metadata = self._download_xml(
-            'https://www.wdrmaus.de/elefantenseite/' + xml_metadata_path,
-            display_id)
-        zmdb_url_element = xml_metadata.find('./movie/zmdb_url')
-        if zmdb_url_element is None:
-            raise ExtractorError(
-                '%s is not a video' % display_id, expected=True)
-        return self.url_result(zmdb_url_element.text, ie=WDRIE.ie_key())
+            if entries:  # Playlist page
+                return self.playlist_result(entries, playlist_id=display_id)
+
+            raise ExtractorError('No downloadable streams found', expected=True)
+
+        is_live = url_type == 'live'
+
+        if is_live:
+            info_dict.update({
+                'title': self._live_title(info_dict['title']),
+                'upload_date': None,
+            })
+        elif 'upload_date' not in info_dict:
+            info_dict['upload_date'] = unified_strdate(self._html_search_meta('DC.Date', webpage, 'upload date'))
+
+        info_dict.update({
+            'description': self._html_search_meta('Description', webpage),
+            'is_live': is_live,
+        })
+
+        return info_dict

 class WDRMobileIE(InfoExtractor):
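Both WDR variants ultimately start from a data-extension attribute. A trimmed standalone run of the branch's regex against a made-up mediaLink tag (the attribute holds pure JSON here, so the js_to_json step is skipped):

import json
import re

html = ('<a class="mediaLink" data-extension=\''
        '{"mediaObj": {"url": "http://deviceids-medp.wdr.de/ondemand/155/1557833.js"}}\'>')
m = re.search(
    r'class=(["\'])mediaLink\b.*?\1[^>]+data-extension=(["\'])(?P<data>(?:(?!\2).)+)\2',
    html)
media_link_obj = json.loads(m.group('data'))
print(media_link_obj['mediaObj']['url'])  # the JSONP metadata URL to fetch next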


@@ -1,140 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
import json
import random
import re
from ..compat import (
compat_parse_qs,
compat_str,
)
from ..utils import (
js_to_json,
strip_jsonp,
urlencode_postdata,
)
class WeiboIE(InfoExtractor):
_VALID_URL = r'https?://weibo\.com/[0-9]+/(?P<id>[a-zA-Z0-9]+)'
_TEST = {
'url': 'https://weibo.com/6275294458/Fp6RGfbff?type=comment',
'info_dict': {
'id': 'Fp6RGfbff',
'ext': 'mp4',
'title': 'You should have servants to massage you,... 来自Hosico_猫 - 微博',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
# to get Referer url for genvisitor
webpage, urlh = self._download_webpage_handle(url, video_id)
visitor_url = urlh.geturl()
if 'passport.weibo.com' in visitor_url:
# first visit
visitor_data = self._download_json(
'https://passport.weibo.com/visitor/genvisitor', video_id,
note='Generating first-visit data',
transform_source=strip_jsonp,
headers={'Referer': visitor_url},
data=urlencode_postdata({
'cb': 'gen_callback',
'fp': json.dumps({
'os': '2',
'browser': 'Gecko57,0,0,0',
'fonts': 'undefined',
'screenInfo': '1440*900*24',
'plugins': '',
}),
}))
tid = visitor_data['data']['tid']
cnfd = '%03d' % visitor_data['data']['confidence']
self._download_webpage(
'https://passport.weibo.com/visitor/visitor', video_id,
note='Running first-visit callback',
query={
'a': 'incarnate',
't': tid,
'w': 2,
'c': cnfd,
'cb': 'cross_domain',
'from': 'weibo',
'_rand': random.random(),
})
webpage = self._download_webpage(
url, video_id, note='Revisiting webpage')
title = self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'title')
video_formats = compat_parse_qs(self._search_regex(
r'video-sources=\\\"(.+?)\"', webpage, 'video_sources'))
formats = []
supported_resolutions = (480, 720)
for res in supported_resolutions:
vid_urls = video_formats.get(compat_str(res))
if not vid_urls or not isinstance(vid_urls, list):
continue
vid_url = vid_urls[0]
formats.append({
'url': vid_url,
'height': res,
})
self._sort_formats(formats)
uploader = self._og_search_property(
'nick-name', webpage, 'uploader', default=None)
return {
'id': video_id,
'title': title,
'uploader': uploader,
'formats': formats
}
class WeiboMobileIE(InfoExtractor):
_VALID_URL = r'https?://m\.weibo\.cn/status/(?P<id>[0-9]+)(\?.+)?'
_TEST = {
'url': 'https://m.weibo.cn/status/4189191225395228?wm=3333_2001&sourcetype=weixin&featurecode=newtitle&from=singlemessage&isappinstalled=0',
'info_dict': {
'id': '4189191225395228',
'ext': 'mp4',
'title': '午睡当然是要甜甜蜜蜜的啦',
'uploader': '柴犬柴犬'
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
# to get Referer url for genvisitor
webpage = self._download_webpage(url, video_id, note='visit the page')
weibo_info = self._parse_json(self._search_regex(
r'var\s+\$render_data\s*=\s*\[({.*})\]\[0\]\s*\|\|\s*{};',
webpage, 'js_code', flags=re.DOTALL),
video_id, transform_source=js_to_json)
status_data = weibo_info.get('status', {})
page_info = status_data.get('page_info')
title = status_data['status_title']
uploader = status_data.get('user', {}).get('screen_name')
return {
'id': video_id,
'title': title,
'uploader': uploader,
'url': page_info['media_info']['stream_url']
}
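For orientation, the first-visit handshake this deleted extractor performs (genvisitor, then the incarnate callback to obtain visitor cookies) looks roughly like the following with requests instead of the extractor's _download_json plumbing; the JSONP unwrapping and field values are a sketch, not a verified protocol description:

import json
import random

import requests  # assumption: requests stands in for youtube-dl's downloader

session = requests.Session()
resp = session.post(
    'https://passport.weibo.com/visitor/genvisitor',
    data={
        'cb': 'gen_callback',
        'fp': json.dumps({'os': '2', 'browser': 'Gecko57,0,0,0'}),
    })
# equivalent of transform_source=strip_jsonp: unwrap gen_callback({...});
body = resp.text
visitor_data = json.loads(body[body.index('(') + 1:body.rindex(')')])
session.get(
    'https://passport.weibo.com/visitor/visitor',
    params={
        'a': 'incarnate',
        't': visitor_data['data']['tid'],
        'w': 2,
        'c': '%03d' % visitor_data['data']['confidence'],
        'cb': 'cross_domain',
        'from': 'weibo',
        '_rand': random.random(),
    })
# the session now carries the visitor cookies needed to load video pages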


@@ -1,233 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import re
from .common import InfoExtractor
class XimalayaBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['CN']
class XimalayaIE(XimalayaBaseIE):
IE_NAME = 'ximalaya'
IE_DESC = '喜马拉雅FM'
_VALID_URL = r'https?://(?:www\.|m\.)?ximalaya\.com/(?P<uid>[0-9]+)/sound/(?P<id>[0-9]+)'
_USER_URL_FORMAT = '%s://www.ximalaya.com/zhubo/%i/'
_TESTS = [
{
'url': 'http://www.ximalaya.com/61425525/sound/47740352/',
'info_dict': {
'id': '47740352',
'ext': 'm4a',
'uploader': '小彬彬爱听书',
'uploader_id': 61425525,
'uploader_url': 'http://www.ximalaya.com/zhubo/61425525/',
'title': '261.唐诗三百首.卷八.送孟浩然之广陵.李白',
'description': "contains:《送孟浩然之广陵》\n作者:李白\n故人西辞黄鹤楼,烟花三月下扬州。\n孤帆远影碧空尽,惟见长江天际流。",
'thumbnails': [
{
'name': 'cover_url',
'url': r're:^https?://.*\.jpg$',
},
{
'name': 'cover_url_142',
'url': r're:^https?://.*\.jpg$',
'width': 180,
'height': 180
}
],
'categories': ['renwen', '人文'],
'duration': 93,
'view_count': int,
'like_count': int,
}
},
{
'url': 'http://m.ximalaya.com/61425525/sound/47740352/',
'info_dict': {
'id': '47740352',
'ext': 'm4a',
'uploader': '小彬彬爱听书',
'uploader_id': 61425525,
'uploader_url': 'http://www.ximalaya.com/zhubo/61425525/',
'title': '261.唐诗三百首.卷八.送孟浩然之广陵.李白',
'description': "contains:《送孟浩然之广陵》\n作者:李白\n故人西辞黄鹤楼,烟花三月下扬州。\n孤帆远影碧空尽,惟见长江天际流。",
'thumbnails': [
{
'name': 'cover_url',
'url': r're:^https?://.*\.jpg$',
},
{
'name': 'cover_url_142',
'url': r're:^https?://.*\.jpg$',
'width': 180,
'height': 180
}
],
'categories': ['renwen', '人文'],
'duration': 93,
'view_count': int,
'like_count': int,
}
},
{
'url': 'https://www.ximalaya.com/11045267/sound/15705996/',
'info_dict': {
'id': '15705996',
'ext': 'm4a',
'uploader': '李延隆老师',
'uploader_id': 11045267,
'uploader_url': 'https://www.ximalaya.com/zhubo/11045267/',
'title': 'Lesson 1 Excuse me!',
'description': "contains:Listen to the tape then answer\xa0this question. Whose handbag is it?\n"
"听录音,然后回答问题,这是谁的手袋?",
'thumbnails': [
{
'name': 'cover_url',
'url': r're:^https?://.*\.jpg$',
},
{
'name': 'cover_url_142',
'url': r're:^https?://.*\.jpg$',
'width': 180,
'height': 180
}
],
'categories': ['train', '外语'],
'duration': 40,
'view_count': int,
'like_count': int,
}
},
]

    def _real_extract(self, url):
is_m = 'm.ximalaya' in url
scheme = 'https' if url.startswith('https') else 'http'
audio_id = self._match_id(url)
webpage = self._download_webpage(url, audio_id,
note='Download sound page for %s' % audio_id,
errnote='Unable to get sound page')
audio_info_file = '%s://m.ximalaya.com/tracks/%s.json' % (scheme, audio_id)
audio_info = self._download_json(audio_info_file, audio_id,
'Downloading info json %s' % audio_info_file,
'Unable to download info file')
formats = []
for bps, k in (('24k', 'play_path_32'), ('64k', 'play_path_64')):
if audio_info.get(k):
formats.append({
'format_id': bps,
'url': audio_info[k],
})
thumbnails = []
for k in audio_info.keys():
            # cover pic keys look like 'cover_url', 'cover_url_142'
if k.startswith('cover_url'):
thumbnail = {'name': k, 'url': audio_info[k]}
if k == 'cover_url_142':
thumbnail['width'] = 180
thumbnail['height'] = 180
thumbnails.append(thumbnail)
audio_uploader_id = audio_info.get('uid')
if is_m:
audio_description = self._html_search_regex(r'(?s)<section\s+class=["\']content[^>]+>(.+?)</section>',
webpage, 'audio_description', fatal=False)
else:
audio_description = self._html_search_regex(r'(?s)<div\s+class=["\']rich_intro[^>]*>(.+?</article>)',
webpage, 'audio_description', fatal=False)
if not audio_description:
audio_description_file = '%s://www.ximalaya.com/sounds/%s/rich_intro' % (scheme, audio_id)
audio_description = self._download_webpage(audio_description_file, audio_id,
note='Downloading description file %s' % audio_description_file,
                                                       errnote='Unable to download description file',
fatal=False)
audio_description = audio_description.strip() if audio_description else None
return {
'id': audio_id,
'uploader': audio_info.get('nickname'),
'uploader_id': audio_uploader_id,
'uploader_url': self._USER_URL_FORMAT % (scheme, audio_uploader_id) if audio_uploader_id else None,
'title': audio_info['title'],
'thumbnails': thumbnails,
'description': audio_description,
'categories': list(filter(None, (audio_info.get('category_name'), audio_info.get('category_title')))),
'duration': audio_info.get('duration'),
'view_count': audio_info.get('play_count'),
'like_count': audio_info.get('favorites_count'),
'formats': formats,
}
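
Nearly everything `XimalayaIE` returns comes from one JSON endpoint, `m.ximalaya.com/tracks/<id>.json`; the HTML page is only consulted for the description. A quick Python 3 sketch of querying that endpoint directly, reusing the URL and key mapping above (the track id comes from the tests; responses may be geo-restricted to CN, as `_GEO_COUNTRIES` suggests):

```python
import json
from urllib.request import urlopen

audio_id = '47740352'  # sample track id from the tests above
raw = urlopen('http://m.ximalaya.com/tracks/%s.json' % audio_id).read()
info = json.loads(raw.decode('utf-8'))

# The same bitrate-to-key mapping the extractor iterates over.
for bps, key in (('24k', 'play_path_32'), ('64k', 'play_path_64')):
    if info.get(key):
        print(bps, info[key])
print(info.get('title'), info.get('duration'))
```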


class XimalayaAlbumIE(XimalayaBaseIE):
IE_NAME = 'ximalaya:album'
IE_DESC = '喜马拉雅FM 专辑'
_VALID_URL = r'https?://(?:www\.|m\.)?ximalaya\.com/(?P<uid>[0-9]+)/album/(?P<id>[0-9]+)'
_TEMPLATE_URL = '%s://www.ximalaya.com/%s/album/%s/'
_BASE_URL_TEMPL = '%s://www.ximalaya.com%s'
_LIST_VIDEO_RE = r'<a[^>]+?href="(?P<url>/%s/sound/(?P<id>\d+)/?)"[^>]+?title="(?P<title>[^>]+)">'
_TESTS = [{
'url': 'http://www.ximalaya.com/61425525/album/5534601/',
'info_dict': {
'title': '唐诗三百首(含赏析)',
'id': '5534601',
},
'playlist_count': 312,
}, {
'url': 'http://m.ximalaya.com/61425525/album/5534601',
'info_dict': {
'title': '唐诗三百首(含赏析)',
'id': '5534601',
},
'playlist_count': 312,
},
]

    def _real_extract(self, url):
self.scheme = scheme = 'https' if url.startswith('https') else 'http'
mobj = re.match(self._VALID_URL, url)
uid, playlist_id = mobj.group('uid'), mobj.group('id')
webpage = self._download_webpage(self._TEMPLATE_URL % (scheme, uid, playlist_id), playlist_id,
note='Download album page for %s' % playlist_id,
errnote='Unable to get album info')
title = self._html_search_regex(r'detailContent_title[^>]*><h1(?:[^>]+)?>([^<]+)</h1>',
webpage, 'title', fatal=False)
return self.playlist_result(self._entries(webpage, playlist_id, uid), playlist_id, title)

    def _entries(self, page, playlist_id, uid):
html = page
for page_num in itertools.count(1):
for entry in self._process_page(html, uid):
yield entry
next_url = self._search_regex(r'<a\s+href=(["\'])(?P<more>[\S]+)\1[^>]+rel=(["\'])next\3',
html, 'list_next_url', default=None, group='more')
if not next_url:
break
next_full_url = self._BASE_URL_TEMPL % (self.scheme, next_url)
html = self._download_webpage(next_full_url, playlist_id)

    def _process_page(self, html, uid):
find_from = html.index('album_soundlist')
for mobj in re.finditer(self._LIST_VIDEO_RE % uid, html[find_from:]):
yield self.url_result(self._BASE_URL_TEMPL % (self.scheme, mobj.group('url')),
XimalayaIE.ie_key(),
mobj.group('id'),
mobj.group('title'))
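
`XimalayaAlbumIE._entries` pages through an album by scraping each page for sound links and following the `rel="next"` anchor until none is left. The pattern generalizes well; a compact sketch with illustrative stand-in arguments:

```python
import itertools
import re


def paginated_entries(fetch_page, first_html, item_re, next_re):
    # Generic rel="next" crawl: yield every item match on the current
    # page, then follow the next-page link until there is none.
    # fetch_page, item_re and next_re are illustrative stand-ins.
    html = first_html
    for _ in itertools.count(1):
        for mobj in re.finditer(item_re, html):
            yield mobj.group('id'), mobj.group('title')
        next_mobj = re.search(next_re, html)
        if not next_mobj:
            break
        html = fetch_page(next_mobj.group('more'))
```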


@@ -1596,12 +1596,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 if 'token' not in video_info:
                     video_info = get_video_info
                     break
-
-        def extract_unavailable_message():
-            return self._html_search_regex(
-                r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>',
-                video_webpage, 'unavailable message', default=None)
-
         if 'token' not in video_info:
             if 'reason' in video_info:
                 if 'The uploader has not made this video available in your country.' in video_info['reason']:
@@ -1610,13 +1604,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                     countries = regions_allowed.split(',') if regions_allowed else None
                     self.raise_geo_restricted(
                         msg=video_info['reason'][0], countries=countries)
-                reason = video_info['reason'][0]
-                if 'Invalid parameters' in reason:
-                    unavailable_message = extract_unavailable_message()
-                    if unavailable_message:
-                        reason = unavailable_message
                 raise ExtractorError(
-                    'YouTube said: %s' % reason,
+                    'YouTube said: %s' % video_info['reason'][0],
                     expected=True, video_id=video_id)
             else:
                 raise ExtractorError(
@@ -1821,7 +1810,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'url': video_info['conn'][0],
                 'player_url': player_url,
             }]
-        elif not is_live and (len(video_info.get('url_encoded_fmt_stream_map', [''])[0]) >= 1 or len(video_info.get('adaptive_fmts', [''])[0]) >= 1):
+        elif len(video_info.get('url_encoded_fmt_stream_map', [''])[0]) >= 1 or len(video_info.get('adaptive_fmts', [''])[0]) >= 1:
             encoded_url_map = video_info.get('url_encoded_fmt_stream_map', [''])[0] + ',' + video_info.get('adaptive_fmts', [''])[0]
             if 'rtmpe%3Dyes' in encoded_url_map:
                 raise ExtractorError('rtmpe downloads are not supported, see https://github.com/rg3/youtube-dl/issues/343 for more information.', expected=True)
@@ -1944,11 +1933,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                             break
                     if codecs:
                         dct.update(parse_codecs(codecs))
-                if dct.get('acodec') == 'none' or dct.get('vcodec') == 'none':
-                    dct['downloader_options'] = {
-                        # Youtube throttles chunks >~10M
-                        'http_chunk_size': 10485760,
-                    }
                 formats.append(dct)
         elif video_info.get('hlsvp'):
             manifest_url = video_info['hlsvp'][0]
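
The `downloader_options` block removed in this hunk is the master-side countermeasure to the throttling noted in its comment: audio-only and video-only formats are downloaded in 10 MiB slices instead of one long request. Outside youtube-dl the same idea amounts to a loop of `Range` requests; a rough sketch, assuming a server that honours `Range` (the helper name is illustrative):

```python
import requests


def download_in_chunks(url, out_path, chunk_size=10485760):
    # Fetch a file in fixed-size Range slices, approximating the effect
    # of the removed http_chunk_size downloader option.
    start = 0
    with open(out_path, 'wb') as f:
        while True:
            resp = requests.get(url, headers={
                'Range': 'bytes=%d-%d' % (start, start + chunk_size - 1)})
            resp.raise_for_status()
            f.write(resp.content)
            # 206 means a partial slice came back; anything else, or a
            # short slice, means the whole file has been written.
            if resp.status_code != 206 or len(resp.content) < chunk_size:
                break
            start += chunk_size
```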
@@ -1969,7 +1953,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                     a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = 'True'
                 formats.append(a_format)
         else:
-            unavailable_message = extract_unavailable_message()
+            unavailable_message = self._html_search_regex(
+                r'(?s)<h1[^>]+id="unavailable-message"[^>]*>(.+?)</h1>',
+                video_webpage, 'unavailable message', default=None)
             if unavailable_message:
                 raise ExtractorError(unavailable_message, expected=True)
             raise ExtractorError('no conn, hlsvp or url_encoded_fmt_stream_map information found in video info')
@@ -2544,11 +2530,10 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
         webpage = self._download_webpage(url, channel_id, fatal=False)
         if webpage:
             page_type = self._og_search_property(
-                'type', webpage, 'page type', default='')
+                'type', webpage, 'page type', default=None)
             video_id = self._html_search_meta(
                 'videoId', webpage, 'video id', default=None)
-            if page_type.startswith('video') and video_id and re.match(
-                    r'^[0-9A-Za-z_-]{11}$', video_id):
+            if page_type == 'video' and video_id and re.match(r'^[0-9A-Za-z_-]{11}$', video_id):
                 return self.url_result(video_id, YoutubeIE.ie_key())
         return self.url_result(base_url)


@@ -478,11 +478,6 @@ def parseOpts(overrideArguments=None):
         '--no-resize-buffer',
         action='store_true', dest='noresizebuffer', default=False,
         help='Do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.')
-    downloader.add_option(
-        '--http-chunk-size',
-        dest='http_chunk_size', metavar='SIZE', default=None,
-        help='Size of a chunk for chunk-based HTTP downloading (e.g. 10485760 or 10M) (default is disabled). '
-             'May be useful for bypassing bandwidth throttling imposed by a webserver (experimental)')
     downloader.add_option(
         '--test',
         action='store_true', dest='test', default=False,
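
Per its help text, the removed `--http-chunk-size` option accepts either a raw byte count (`10485760`) or a suffixed value (`10M`). A sketch of that kind of size parsing; the helper below is illustrative, not the project's actual parser:

```python
import re


def parse_size(value):
    # '10485760' -> 10485760, '10M' -> 10485760; illustrative only.
    mobj = re.match(r'(?P<num>\d+(?:\.\d+)?)(?P<unit>[kKmMgG]?)$', value)
    if not mobj:
        raise ValueError('invalid size %r' % value)
    multiplier = {'': 1, 'k': 1024, 'm': 1024 ** 2, 'g': 1024 ** 3}
    return int(float(mobj.group('num')) * multiplier[mobj.group('unit').lower()])


assert parse_size('10M') == 10485760
assert parse_size('10485760') == 10485760
```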


@@ -866,8 +866,8 @@ def _create_http_connection(ydl_handler, http_class, is_https, *args, **kwargs):
     # expected HTTP responses to meet HTTP/1.0 or later (see also
     # https://github.com/rg3/youtube-dl/issues/6727)
     if sys.version_info < (3, 0):
-        kwargs['strict'] = True
-    hc = http_class(*args, **compat_kwargs(kwargs))
+        kwargs[b'strict'] = True
+    hc = http_class(*args, **kwargs)
     source_address = ydl_handler._params.get('source_address')
     if source_address is not None:
         sa = (source_address, 0)
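
Both sides of this hunk work around the same Python 2 quirk: with `unicode_literals` in effect, `'strict'` is a `unicode` key, and CPython 2's `**` expansion only accepts native `str` keyword names. The branch hard-codes a `b'strict'` key; master instead re-keys the whole dict through `compat_kwargs`. A toy reproduction, with a simplified stand-in for `compat_kwargs`:

```python
from __future__ import unicode_literals


def compat_kwargs(kwargs):
    # Simplified stand-in for youtube_dl.compat.compat_kwargs: re-key
    # the dict with native str keys so ** expansion works on Python 2.
    return dict((str(k), v) for k, v in kwargs.items())


def http_class(host=None, strict=None):  # illustrative stand-in
    return host, strict


kwargs = {'strict': True}  # a unicode key on Python 2, due to the import
# http_class(**kwargs) raises TypeError on Python 2; this works on both:
print(http_class(host='example.com', **compat_kwargs(kwargs)))
```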
@@ -2267,7 +2267,7 @@ def js_to_json(code):
         "(?:[^"\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^"\\]*"|
         '(?:[^'\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^'\\]*'|
         {comment}|,(?={skip}[\]}}])|
-        (?:(?<![0-9])[eE]|[a-df-zA-DF-Z_])[.a-zA-Z_0-9]*|
+        [a-zA-Z_][.a-zA-Z_0-9]*|
         \b(?:0[xX][0-9a-fA-F]+|0+[0-7]+)(?:{skip}:)?|
         [0-9]+(?={skip}:)
     '''.format(comment=COMMENT_RE, skip=SKIP_RE), fix_kv, code)
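
The master-side pattern being dropped here, `(?:(?<![0-9])[eE]|[a-df-zA-DF-Z_])[.a-zA-Z_0-9]*`, narrows the bare-identifier rule so the exponent of a scientific-notation number is not swallowed: with the simpler branch-side pattern, the `e3` in `1.5e3` would be quoted as a string and corrupt the number. A quick check, assuming a youtube-dl build with the master-side pattern is importable:

```python
from youtube_dl.utils import js_to_json

# Bare identifiers are still quoted, but float exponents survive intact.
print(js_to_json('{duration: 1.5e3, id: abc}'))
# -> {"duration": 1.5e3, "id": "abc"}
```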


@@ -1,3 +1,3 @@
 from __future__ import unicode_literals
 
-__version__ = '2018.02.04'
+__version__ = '2017.12.31'