Compare commits


54 Commits

Author SHA1 Message Date
250eea6821 release 2017.03.02 2017-03-02 22:33:22 +07:00
28d15b73f8 [ChangeLog] Actualize 2017-03-02 22:29:56 +07:00
11bb6ad1a5 [facebook] Fix extraction (closes #12323)
Almost all videos now use the pagelet type 'permalink_video_pagelet'
2017-03-02 20:51:24 +08:00
c9612c0487 [youtube] Mark errors about rental videos as expected
Closes #12324
2017-03-02 16:59:53 +08:00
af5049f128 [adobepass] Add Charter Spectrum (#11465)
Thanks @tv21 for the fix!
2017-03-02 02:15:51 +08:00
158af5242e [utils] Carry long doc string 2017-03-01 23:04:02 +07:00
40df485f55 [YoutubeDL] Don't sanitize identifiers (closes #12317) 2017-03-01 23:03:36 +07:00
4b8a984c67 [npo] Add support for audio 2017-03-01 22:21:13 +07:00
83e8fce628 [npo] Improve extraction and update tests 2017-03-01 22:14:46 +07:00
aa9cc2ecbf [npo] Adapt to app.php API (closes #12311) 2017-03-01 05:03:35 +07:00
1dc24093f8 release 2017.02.28 2017-02-28 23:59:22 +07:00
11bae9cdde [ChangeLog] Actualize 2017-02-28 23:49:24 +07:00
43b38424a9 [azmedien:showplaylist] Improve (closes #12160) 2017-02-28 23:37:54 +07:00
948519b35d [azmedien:showplaylist] Add support for all episodes playlists 2017-02-28 23:36:05 +07:00
87dadd456a [youtube:playlist] Recognize another playlist pattern (closes #11928, closes #12286) 2017-02-28 23:06:47 +07:00
7c4aa6fd6f [daisuki] Add subtitles (#4738) 2017-02-28 22:29:01 +08:00
9bd05b5a18 [daisuki] Add new extractor (closes #4738) 2017-02-28 22:19:26 +08:00
0a5445ddbe [utils] Add bytes_to_long() and long_to_bytes()
Used in daisuki.net (#4738)

Both are adapted from public domain PyCrypto:
https://github.com/dlitz/pycrypto/blob/master/lib/Crypto/Util/number.py
2017-02-28 22:10:31 +08:00
f48409c7ac [utils] Add pkcs1pad
Used in daisuki.net (#4738)
2017-02-28 22:10:31 +08:00
c9619f0a17 [aes] Add aes_cbc_encrypt
Used in daisuki.net (#4738)
2017-02-28 22:10:31 +08:00
f4c68ba372 [douyu] Fix extraction and update _TESTS
They've switched from flv to hls

Closes #12301
2017-02-28 21:41:03 +08:00
ef48a1175d release 2017.02.27 2017-02-27 23:26:07 +07:00
c6184bcf7b [ChangeLog] Actualize 2017-02-27 23:24:03 +07:00
18abb74376 [npo] Relax _VALID_URL for zapp.nl 2017-02-27 23:13:51 +07:00
dbc01fdb6f [hetklokhuis] Fix IE_NAME 2017-02-27 23:10:29 +07:00
f264c62334 [npo] Add support for zapp.nl 2017-02-27 23:10:00 +07:00
0dc5a86a32 [npo] Add support for hetklokhuis.nl (closes #12293) 2017-02-27 22:43:19 +07:00
0e879f432a [youtube:channel] Remove duplicate test 2017-02-27 22:22:43 +07:00
892b47ab6c [scivee] Remove extractor (#9315)
The Wikipedia page was changed from active to down:
https://en.wikipedia.org/w/index.php?title=SciVee&diff=prev&oldid=723161154

Some other interesting bits:

$ nslookup www.scivee.tv
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
www.scivee.tv   canonical name = scivee.rcsb.org.
Name:   scivee.rcsb.org
Address: 132.249.231.211

$ nslookup rcsb.org
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:   rcsb.org
Address: 132.249.231.77

Both IPs are from UCSD. I guess it was maintained by a lab that doesn't
maintain it anymore.
2017-02-27 21:34:33 +08:00
fdeea72611 [cda] Decode URL (fixes #12255) 2017-02-26 22:05:52 +08:00
7fd4655256 [crunchyroll] Extract uploader name that's not a link
Provide the Crunchyroll extractor with the ability to extract uploader
names that aren't links. Add a test for this new functionality.
This fixes #12267.
2017-02-26 19:08:10 +08:00
fd5c4aab59 [youtube] Raise GeoRestrictedError 2017-02-26 16:52:40 +07:00
8878789f11 [dailymotion] Raise GeoRestrictedError 2017-02-26 16:52:40 +07:00
a5cf17989b [MDR] Relax _VALID_URL and playerURL matching and update _TESTS
Ref: #12169
2017-02-26 17:24:54 +08:00
b3aec47665 [tvigle] Raise GeoRestrictedError 2017-02-25 23:27:45 +07:00
9d0c08a02c [vevo] Fix videos with the new streams/streamsV3 format (closes #11719) 2017-02-26 00:15:49 +08:00
e498758b9c [freshlive] Fix issues and improve (closes #12175) 2017-02-25 22:56:42 +07:00
5fc8d89361 [freshlive] Add extractor 2017-02-25 22:55:17 +07:00
d374d943f3 [downloader/common] Limit displaying 2 digits after decimal point in sleep interval message 2017-02-25 20:59:04 +07:00
103f8c8d36 [xhamster] Capture and output videoClosed error (#12263) 2017-02-25 20:38:21 +07:00
922ab7840b [etonline] Add extractor (closes #12236) 2017-02-25 20:16:40 +07:00
831217291a [compat] Use try except for compat_numeric_types 2017-02-25 19:44:50 +07:00
db182c63fb [njpwworld] Add new extractor (closes #11561) 2017-02-25 18:44:39 +08:00
eeb0a95684 [extractor/common] Add 'preference' to _parse_html5_media_entries
Some websites, like NJPWWorld, put different qualities on different
player pages.
2017-02-25 18:40:05 +08:00
231bcd0b6b [amcnetworks] Relax _VALID_URL (#12127) 2017-02-25 02:51:53 +07:00
204efc8509 release 2017.02.24.1 2017-02-24 21:59:39 +07:00
5d3a51e1b9 [ChangeLog] Actualize 2017-02-24 21:57:39 +07:00
ad3033037c [noco] Modernize 2017-02-24 21:51:56 +07:00
f3bc281239 [noco] Switch login URL to https (closes #12246) 2017-02-24 21:48:34 +07:00
441d7a32e5 [thescene] Extract more metadata 2017-02-24 21:22:29 +07:00
51ed496307 [thescene] Fix extraction (closes #12235) 2017-02-24 22:08:45 +08:00
68f17a9c2d [tubitv] Use geo bypass mechanism 2017-02-24 12:27:56 +01:00
39e7277ed1 [openload] Fix extraction (closes #10408) 2017-02-24 11:21:58 +01:00
42dcdbe11c [ivi] Raise GeoRestrictedError 2017-02-24 10:54:39 +07:00
37 changed files with 1050 additions and 387 deletions

View File

@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.24*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.24**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.03.02*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.03.02**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.02.24
[debug] youtube-dl version 2017.03.02
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -1,3 +1,69 @@
version 2017.03.02

Core
+ [adobepass] Add support for Charter Spectrum (#11465)
* [YoutubeDL] Don't sanitize identifiers in output template (#12317)

Extractors
* [facebook] Fix extraction (#12323, #12330)
* [youtube] Mark errors about rental videos as expected (#12324)
+ [npo] Add support for audio
* [npo] Adapt to app.php API (#12311, #12320)


version 2017.02.28

Core
+ [utils] Add bytes_to_long and long_to_bytes
+ [utils] Add pkcs1pad
+ [aes] Add aes_cbc_encrypt

Extractors
+ [azmedien:showplaylist] Add support for show playlists (#12160)
+ [youtube:playlist] Recognize another playlist pattern (#11928, #12286)
+ [daisuki] Add support for daisuki.net (#2486, #3186, #4738, #6175, #7776,
  #10060)
* [douyu] Fix extraction (#12301)


version 2017.02.27

Core
* [downloader/common] Limit displaying 2 digits after decimal point in sleep
  interval message (#12183)
+ [extractor/common] Add preference to _parse_html5_media_entries

Extractors
+ [npo] Add support for zapp.nl
+ [npo] Add support for hetklokhuis.nl (#12293)
- [scivee] Remove extractor (#9315)
+ [cda] Decode download URL (#12255)
+ [crunchyroll] Improve uploader extraction (#12267)
+ [youtube] Raise GeoRestrictedError
+ [dailymotion] Raise GeoRestrictedError
+ [mdr] Recognize more URL patterns (#12169)
+ [tvigle] Raise GeoRestrictedError
* [vevo] Fix extraction for videos with the new streams/streamsV3 format
  (#11719)
+ [freshlive] Add support for freshlive.tv (#12175)
+ [xhamster] Capture and output videoClosed error (#12263)
+ [etonline] Add support for etonline.com (#12236)
+ [njpwworld] Add support for njpwworld.com (#11561)
* [amcnetworks] Relax URL regular expression (#12127)


version 2017.02.24.1

Extractors
* [noco] Modernize
* [noco] Switch login URL to https (#12246)
+ [thescene] Extract more metadata
* [thescene] Fix extraction (#12235)
+ [tubitv] Use geo bypass mechanism
* [openload] Fix extraction (#10408)
+ [ivi] Raise GeoRestrictedError


version 2017.02.24

Core

View File

@ -78,6 +78,7 @@
- **awaan:video**
- **AZMedien**: AZ Medien videos
- **AZMedienPlaylist**: AZ Medien playlists
- **AZMedienShowPlaylist**: AZ Medien show playlists
- **Azubu**
- **AzubuLive**
- **BaiduVideo**: 百度视频
@ -191,6 +192,8 @@
- **dailymotion:playlist**
- **dailymotion:user**
- **DailymotionCloud**
- **Daisuki**
- **DaisukiPlaylist**
- **daum.net**
- **daum.net:clip**
- **daum.net:playlist**
@ -239,6 +242,7 @@
- **ESPN**
- **ESPNArticle**
- **EsriVideo**
- **ETOnline**
- **Europa**
- **EveryonesMixtape**
- **ExpoTV**
@ -274,6 +278,7 @@
- **francetvinfo.fr**
- **Freesound**
- **freespeech.org**
- **FreshLive**
- **Funimation**
- **FunnyOrDie**
- **Fusion**
@ -310,6 +315,7 @@
- **HellPorno**
- **Helsinki**: helsinki.fi
- **HentaiStigma**
- **hetklokhuis**
- **hgtv.com:show**
- **HistoricFilms**
- **history:topic**: History.com Topic
@ -511,6 +517,7 @@
- **Nintendo**
- **njoy**: N-JOY
- **njoy:embed**
- **NJPWWorld**: 新日本プロレスワールド
- **NobelPrize**
- **Noco**
- **Normalboots**
@ -666,7 +673,6 @@
- **savefrom.net**
- **SBS**: sbs.com.au
- **schooltv**
- **SciVee**
- **screen.yahoo:search**: Yahoo screen search
- **Screencast**
- **ScreencastOMatic**

View File

@ -8,7 +8,7 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.aes import aes_decrypt, aes_encrypt, aes_cbc_decrypt, aes_decrypt_text
from youtube_dl.aes import aes_decrypt, aes_encrypt, aes_cbc_decrypt, aes_cbc_encrypt, aes_decrypt_text
from youtube_dl.utils import bytes_to_intlist, intlist_to_bytes
import base64
@ -34,6 +34,13 @@ class TestAES(unittest.TestCase):
decrypted = intlist_to_bytes(aes_cbc_decrypt(data, self.key, self.iv))
self.assertEqual(decrypted.rstrip(b'\x08'), self.secret_msg)
def test_cbc_encrypt(self):
data = bytes_to_intlist(self.secret_msg)
encrypted = intlist_to_bytes(aes_cbc_encrypt(data, self.key, self.iv))
self.assertEqual(
encrypted,
b"\x97\x92+\xe5\x0b\xc3\x18\x91ky9m&\xb3\xb5@\xe6'\xc2\x96.\xc8u\x88\xab9-[\x9e|\xf1\xcd")
def test_decrypt_text(self):
password = intlist_to_bytes(self.key).decode('utf-8')
encrypted = base64.b64encode(

View File

@ -52,6 +52,7 @@ from youtube_dl.utils import (
parse_filesize,
parse_count,
parse_iso8601,
pkcs1pad,
read_batch_urls,
sanitize_filename,
sanitize_path,
@ -1104,6 +1105,14 @@ The first line
ohdave_rsa_encrypt(b'aa111222', e, N),
'726664bd9a23fd0c70f9f1b84aab5e3905ce1e45a584e9cbcf9bcc7510338fc1986d6c599ff990d923aa43c51c0d9013cd572e13bc58f4ae48f2ed8c0b0ba881')
def test_pkcs1pad(self):
data = [1, 2, 3]
padded_data = pkcs1pad(data, 32)
self.assertEqual(padded_data[:2], [0, 2])
self.assertEqual(padded_data[28:], [0, 1, 2, 3])
self.assertRaises(ValueError, pkcs1pad, data, 8)
def test_encode_base_n(self):
self.assertEqual(encode_base_n(0, 30), '0')
self.assertEqual(encode_base_n(80, 30), '2k')
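
For reference, the PKCS#1 v1.5 layout the new test checks is [0, 2, <pseudo-random filler>, 0, <data>]: three framing bytes plus at least eight bytes of filler, which is why a 3-byte payload cannot be padded to 8 bytes. A minimal padder consistent with that test (an illustrative sketch, not necessarily the exact utils implementation):

import random

def pkcs1pad(data, length):
    # PKCS#1 v1.5 type-2 padding: [0, 2, filler..., 0, data]
    if len(data) > length - 11:
        raise ValueError('Input data too long for PKCS#1 padding')
    filler = [random.randint(1, 254) for _ in range(length - len(data) - 3)]
    return [0, 2] + filler + [0] + data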

View File

@ -616,7 +616,7 @@ class YoutubeDL(object):
sanitize = lambda k, v: sanitize_filename(
compat_str(v),
restricted=self.params.get('restrictfilenames'),
is_id=(k == 'id'))
is_id=(k == 'id' or k.endswith('_id')))
template_dict = dict((k, v if isinstance(v, compat_numeric_types) else sanitize(k, v))
for k, v in template_dict.items()
if v is not None and not isinstance(v, (list, tuple, dict)))
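
The practical effect is that any output-template field named like an identifier, not just id itself, now takes the gentler is_id sanitization path. The new condition is easy to check in isolation (is_id_field is an illustrative name, not part of the diff):

def is_id_field(k):
    # mirrors the condition added above
    return k == 'id' or k.endswith('_id')

ids = [k for k in ('id', 'episode_id', 'playlist_id', 'title') if is_id_field(k)]
assert ids == ['id', 'episode_id', 'playlist_id']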

View File

@ -60,6 +60,34 @@ def aes_cbc_decrypt(data, key, iv):
return decrypted_data
def aes_cbc_encrypt(data, key, iv):
"""
Encrypt with aes in CBC mode. Using PKCS#7 padding
@param {int[]} data cleartext
@param {int[]} key 16/24/32-Byte cipher key
@param {int[]} iv 16-Byte IV
@returns {int[]} encrypted data
"""
expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
encrypted_data = []
previous_cipher_block = iv
for i in range(block_count):
block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]
remaining_length = BLOCK_SIZE_BYTES - len(block)
block += [remaining_length] * remaining_length
mixed_block = xor(block, previous_cipher_block)
encrypted_block = aes_encrypt(mixed_block, expanded_key)
encrypted_data += encrypted_block
previous_cipher_block = encrypted_block
return encrypted_data
def key_expansion(data):
"""
Generate key schedule
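
A quick round-trip sanity check for the new primitive, mirroring the unit test above (a sketch; the intlist helpers come from youtube_dl.utils, and aes_cbc_decrypt leaves the PKCS#7 padding for the caller to strip):

from youtube_dl.aes import aes_cbc_decrypt, aes_cbc_encrypt
from youtube_dl.utils import bytes_to_intlist, intlist_to_bytes

key = iv = list(range(16))   # 16-byte key and IV as int lists
msg = b'Secret message!'     # 15 bytes, so one 0x01 padding byte is added
enc = aes_cbc_encrypt(bytes_to_intlist(msg), key, iv)
dec = intlist_to_bytes(aes_cbc_decrypt(enc, key, iv))
assert dec.rstrip(b'\x01') == msg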

View File

@ -2760,8 +2760,10 @@ else:
compat_kwargs = lambda kwargs: kwargs
compat_numeric_types = ((int, float, long, complex) if sys.version_info[0] < 3
else (int, float, complex))
try:
compat_numeric_types = (int, float, long, complex)
except NameError: # Python 3
compat_numeric_types = (int, float, complex)
if sys.version_info < (2, 7):
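
The try/except form reads more directly than the version check it replaces: on Python 3 the name long no longer exists, so evaluating the first tuple raises NameError and the fallback is taken. The same idiom in isolation (integer_types is an illustrative name):

try:
    integer_types = (int, long)   # Python 2: the long type exists
except NameError:                 # Python 3: referencing long raises NameError
    integer_types = (int,)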

View File

@ -347,7 +347,10 @@ class FileDownloader(object):
if min_sleep_interval:
max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
self.to_screen(
'[download] Sleeping %s seconds...' % (
int(sleep_interval) if sleep_interval.is_integer()
else '%.2f' % sleep_interval))
time.sleep(sleep_interval)
return self.real_download(filename, info_dict)
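
For example, the new expression renders whole-second intervals without a decimal part and everything else with exactly two digits:

for sleep_interval in (3.0, 3.14159):
    print('[download] Sleeping %s seconds...' % (
        int(sleep_interval) if sleep_interval.is_integer()
        else '%.2f' % sleep_interval))
# [download] Sleeping 3 seconds...
# [download] Sleeping 3.14 seconds...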

View File

@ -36,6 +36,11 @@ MSO_INFO = {
'username_field': 'Ecom_User_ID',
'password_field': 'Ecom_Password',
},
'Charter_Direct': {
'name': 'Charter Spectrum',
'username_field': 'IDToken1',
'password_field': 'IDToken2',
},
'thr030': {
'name': '3 Rivers Communications'
},

View File

@ -10,7 +10,7 @@ from ..utils import (
class AMCNetworksIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies/|shows/[^/]+/(?:full-episodes/)?[^/]+/episode-\d+(?:-(?:[^/]+/)?|/))(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies|shows(?:/[^/]+)+)/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
'md5': '',
@ -44,6 +44,12 @@ class AMCNetworksIE(ThePlatformIE):
}, {
'url': 'http://www.bbcamerica.com/shows/doctor-who/full-episodes/the-power-of-the-daleks/episode-01-episode-1-color-version',
'only_matching': True,
}, {
'url': 'http://www.wetv.com/shows/mama-june-from-not-to-hot/full-episode/season-01/thin-tervention',
'only_matching': True,
}, {
'url': 'http://www.wetv.com/shows/la-hair/videos/season-05/episode-09-episode-9-2/episode-9-sneak-peek-3',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import re
@ -5,6 +6,7 @@ import re
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
get_element_by_class,
get_element_by_id,
strip_or_none,
urljoin,
@ -170,3 +172,42 @@ class AZMedienPlaylistIE(AZMedienBaseIE):
'video-title', webpage)), group='title')
return self.playlist_result(entries, show_id, title)
class AZMedienShowPlaylistIE(AZMedienBaseIE):
IE_DESC = 'AZ Medien show playlists'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
telezueri\.ch|
telebaern\.tv|
telem1\.ch
)/
(?:
all-episodes|
alle-episoden
)/
(?P<id>[^/?#&]+)
'''
_TEST = {
'url': 'http://www.telezueri.ch/all-episodes/astrotalk',
'info_dict': {
'id': 'astrotalk',
'title': 'TeleZüri: AstroTalk - alle episoden',
'description': 'md5:4c0f7e7d741d906004266e295ceb4a26',
},
'playlist_mincount': 13,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
episodes = get_element_by_class('search-mobile-box', webpage)
entries = [self.url_result(
urljoin(url, m.group('url'))) for m in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1', episodes)]
title = self._og_search_title(webpage, fatal=False)
description = self._og_search_description(webpage)
return self.playlist_result(entries, playlist_id, title, description)

View File

@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import codecs
import re
from .common import InfoExtractor
@ -96,6 +97,10 @@ class CDAIE(InfoExtractor):
if not video or 'file' not in video:
self.report_warning('Unable to extract %s version information' % version)
return
if video['file'].startswith('uggc'):
video['file'] = codecs.decode(video['file'], 'rot_13')
if video['file'].endswith('adc.mp4'):
video['file'] = video['file'].replace('adc.mp4', '.mp4')
f = {
'url': video['file'],
}
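
The 'uggc' sentinel works because it is exactly 'http' under ROT13, so an obfuscated URL announces itself. For example:

import codecs

assert codecs.decode('uggc', 'rot_13') == 'http'
assert codecs.decode('uggcf', 'rot_13') == 'https'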

View File

@ -2010,7 +2010,7 @@ class InfoExtractor(object):
})
return formats
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8', mpd_id=None):
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8', mpd_id=None, preference=None):
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)
@ -2032,7 +2032,8 @@ class InfoExtractor(object):
is_plain_url = False
formats = self._extract_m3u8_formats(
full_url, video_id, ext='mp4',
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id)
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id,
preference=preference)
elif ext == 'mpd':
is_plain_url = False
formats = self._extract_mpd_formats(

View File

@ -207,6 +207,21 @@ class CrunchyrollIE(CrunchyrollBaseIE):
# Just test metadata extraction
'skip_download': True,
},
}, {
# make sure we can extract an uploader name that's not a link
'url': 'http://www.crunchyroll.com/hakuoki-reimeiroku/episode-1-dawn-of-the-divine-warriors-606899',
'info_dict': {
'id': '606899',
'ext': 'mp4',
'title': 'Hakuoki Reimeiroku Episode 1 Dawn of the Divine Warriors',
'description': 'Ryunosuke was left to die, but Serizawa-san asked him a simple question "Do you want to live?"',
'uploader': 'Geneon Entertainment',
'upload_date': '20120717',
},
'params': {
# just test metadata extraction
'skip_download': True,
},
}]
_FORMAT_IDS = {
@ -388,8 +403,9 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
if video_upload_date:
video_upload_date = unified_strdate(video_upload_date)
video_uploader = self._html_search_regex(
r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', webpage,
'video_uploader', fatal=False)
# try looking for both an uploader that's a link and one that's not
[r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
webpage, 'video_uploader', fatal=False)
available_fmts = []
for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
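
When _html_search_regex is given a list, it tries each pattern in order and uses the first that matches, so the original publisher-link pattern still wins whenever it is present. Roughly equivalent standalone logic (a sketch with sample markup):

import re

patterns = [
    r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>',
    r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>',
]
html = '<div>Publisher: <span>Geneon Entertainment</span></div>'
match = next((m for m in (re.search(p, html) for p in patterns) if m), None)
assert match and match.group(1) == 'Geneon Entertainment'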

View File

@ -282,9 +282,14 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
}
def _check_error(self, info):
error = info.get('error')
if info.get('error') is not None:
title = error['title']
# See https://developer.dailymotion.com/api#access-error
if error.get('code') == 'DM007':
self.raise_geo_restricted(msg=title)
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, info['error']['title']), expected=True)
'%s said: %s' % (self.IE_NAME, title), expected=True)
def _get_subtitles(self, video_id, webpage):
try:

View File

@ -0,0 +1,159 @@
from __future__ import unicode_literals
import base64
import json
import random
import re
from .common import InfoExtractor
from ..aes import (
aes_cbc_decrypt,
aes_cbc_encrypt,
)
from ..utils import (
bytes_to_intlist,
bytes_to_long,
clean_html,
ExtractorError,
intlist_to_bytes,
get_element_by_id,
js_to_json,
int_or_none,
long_to_bytes,
pkcs1pad,
remove_end,
)
class DaisukiIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?daisuki\.net/[^/]+/[^/]+/[^/]+/watch\.[^.]+\.(?P<id>\d+)\.html'
_TEST = {
'url': 'http://www.daisuki.net/tw/en/anime/watch.TheIdolMasterCG.11213.html',
'info_dict': {
'id': '11213',
'ext': 'mp4',
'title': '#01 Who is in the pumpkin carriage? - THE IDOLM@STER CINDERELLA GIRLS',
'subtitles': {
'mul': [{
'ext': 'ttml',
}],
},
'creator': 'BANDAI NAMCO Entertainment',
},
'params': {
'skip_download': True, # AES-encrypted HLS stream
},
}
# The public key in PEM format can be found in clientlibs_anime_watch.min.js
_RSA_KEY = (0xc5524c25e8e14b366b3754940beeb6f96cb7e2feef0b932c7659a0c5c3bf173d602464c2df73d693b513ae06ff1be8f367529ab30bf969c5640522181f2a0c51ea546ae120d3d8d908595e4eff765b389cde080a1ef7f1bbfb07411cc568db73b7f521cedf270cbfbe0ddbc29b1ac9d0f2d8f4359098caffee6d07915020077d, 65537)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
flashvars = self._parse_json(self._search_regex(
r'(?s)var\s+flashvars\s*=\s*({.+?});', webpage, 'flashvars'),
video_id, transform_source=js_to_json)
iv = [0] * 16
data = {}
for key in ('device_cd', 'mv_id', 'ss1_prm', 'ss2_prm', 'ss3_prm', 'ss_id'):
data[key] = flashvars.get(key, '')
encrypted_rtn = None
# Some AES keys are rejected. Try it with different AES keys
for idx in range(5):
aes_key = [random.randint(0, 254) for _ in range(32)]
padded_aeskey = intlist_to_bytes(pkcs1pad(aes_key, 128))
n, e = self._RSA_KEY
encrypted_aeskey = long_to_bytes(pow(bytes_to_long(padded_aeskey), e, n))
init_data = self._download_json('http://www.daisuki.net/bin/bgn/init', video_id, query={
's': flashvars.get('s', ''),
'c': flashvars.get('ss3_prm', ''),
'e': url,
'd': base64.b64encode(intlist_to_bytes(aes_cbc_encrypt(
bytes_to_intlist(json.dumps(data)),
aes_key, iv))).decode('ascii'),
'a': base64.b64encode(encrypted_aeskey).decode('ascii'),
}, note='Downloading JSON metadata' + (' (try #%d)' % (idx + 1) if idx > 0 else ''))
if 'rtn' in init_data:
encrypted_rtn = init_data['rtn']
break
self._sleep(5, video_id)
if encrypted_rtn is None:
raise ExtractorError('Failed to fetch init data')
rtn = self._parse_json(
intlist_to_bytes(aes_cbc_decrypt(bytes_to_intlist(
base64.b64decode(encrypted_rtn)),
aes_key, iv)).decode('utf-8').rstrip('\0'),
video_id)
formats = self._extract_m3u8_formats(
rtn['play_url'], video_id, ext='mp4', entry_protocol='m3u8_native')
title = remove_end(self._og_search_title(webpage), ' - DAISUKI')
creator = self._html_search_regex(
r'Creator\s*:\s*([^<]+)', webpage, 'creator', fatal=False)
subtitles = {}
caption_url = rtn.get('caption_url')
if caption_url:
# mul: multiple languages
subtitles['mul'] = [{
'url': caption_url,
'ext': 'ttml',
}]
return {
'id': video_id,
'title': title,
'formats': formats,
'subtitles': subtitles,
'creator': creator,
}
class DaisukiPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)daisuki\.net/[^/]+/[^/]+/[^/]+/detail\.(?P<id>[a-zA-Z0-9]+)\.html'
_TEST = {
'url': 'http://www.daisuki.net/tw/en/anime/detail.TheIdolMasterCG.html',
'info_dict': {
'id': 'TheIdolMasterCG',
'title': 'THE IDOLM@STER CINDERELLA GIRLS',
'description': 'md5:0f2c028a9339f7a2c7fbf839edc5c5d8',
},
'playlist_count': 26,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
episode_pattern = r'''(?sx)
<img[^>]+delay="[^"]+/(\d+)/movie\.jpg".+?
<p[^>]+class=".*?\bepisodeNumber\b.*?">(?:<a[^>]+>)?([^<]+)'''
entries = [{
'_type': 'url_transparent',
'url': url.replace('detail', 'watch').replace('.html', '.' + movie_id + '.html'),
'episode_id': episode_id,
'episode_number': int_or_none(episode_id),
} for movie_id, episode_id in re.findall(episode_pattern, webpage)]
playlist_title = remove_end(
self._og_search_title(webpage, fatal=False), ' - Anime - DAISUKI')
playlist_description = clean_html(get_element_by_id('synopsisTxt', webpage))
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
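
The new utils helpers exist for the pow() call above: RSA operates on integers, so the padded AES key is converted to a big-endian integer, exponentiated modulo n, and converted back to bytes. A minimal round trip (assuming PyCrypto's big-endian convention, per the commit message):

from youtube_dl.utils import bytes_to_long, long_to_bytes

assert bytes_to_long(b'\x01\x00') == 256   # big-endian interpretation
assert long_to_bytes(256) == b'\x01\x00'
# RSA encryption of a message m is then simply pow(m, e, n)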

View File

@ -1,15 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import time
import uuid
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse_urlencode,
)
from ..utils import (
ExtractorError,
unescapeHTML,
@ -24,8 +16,8 @@ class DouyuTVIE(InfoExtractor):
'info_dict': {
'id': '17732',
'display_id': 'iseven',
'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来! [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'ext': 'mp4',
'title': 're:^清晨醒脑T-ARA根本停不下来! [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': r're:.*m7show@163\.com.*',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': '7师傅',
@ -39,7 +31,7 @@ class DouyuTVIE(InfoExtractor):
'info_dict': {
'id': '85982',
'display_id': '85982',
'ext': 'flv',
'ext': 'mp4',
'title': 're:^小漠从零单排记——CSOL2躲猫猫 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:746a2f7a253966a06755a912f0acc0d2',
'thumbnail': r're:^https?://.*\.jpg$',
@ -55,8 +47,8 @@ class DouyuTVIE(InfoExtractor):
'info_dict': {
'id': '17732',
'display_id': '17732',
'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来! [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'ext': 'mp4',
'title': 're:^清晨醒脑T-ARA根本停不下来! [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': r're:.*m7show@163\.com.*',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': '7师傅',
@ -96,45 +88,18 @@ class DouyuTVIE(InfoExtractor):
if room.get('show_status') == '2':
raise ExtractorError('Live stream is offline', expected=True)
tt = compat_str(int(time.time() / 60))
did = uuid.uuid4().hex.upper()
sign_content = ''.join((room_id, did, self._API_KEY, tt))
sign = hashlib.md5((sign_content).encode('utf-8')).hexdigest()
flv_data = compat_urllib_parse_urlencode({
'cdn': 'ws',
'rate': '0',
'tt': tt,
'did': did,
'sign': sign,
})
video_info = self._download_json(
'http://www.douyu.com/lapi/live/getPlay/%s' % room_id, video_id,
data=flv_data, note='Downloading video info',
headers={'Content-Type': 'application/x-www-form-urlencoded'})
error_code = video_info.get('error', 0)
if error_code is not 0:
raise ExtractorError(
'%s reported error %i' % (self.IE_NAME, error_code),
expected=True)
base_url = video_info['data']['rtmp_url']
live_path = video_info['data']['rtmp_live']
video_url = '%s/%s' % (base_url, live_path)
formats = self._extract_m3u8_formats(
room['hls_url'], video_id, ext='mp4')
title = self._live_title(unescapeHTML(room['room_name']))
description = room.get('notice')
description = room.get('show_details')
thumbnail = room.get('room_src')
uploader = room.get('nickname')
return {
'id': room_id,
'display_id': video_id,
'url': video_url,
'formats': formats,
'title': title,
'description': description,
'thumbnail': thumbnail,

View File

@ -0,0 +1,39 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class ETOnlineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?etonline\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.etonline.com/tv/211130_dove_cameron_liv_and_maddie_emotional_episode_series_finale/',
'info_dict': {
'id': '211130_dove_cameron_liv_and_maddie_emotional_episode_series_finale',
'title': 'md5:a21ec7d3872ed98335cbd2a046f34ee6',
'description': 'md5:8b94484063f463cca709617c79618ccd',
},
'playlist_count': 2,
}, {
'url': 'http://www.etonline.com/media/video/here_are_the_stars_who_love_bringing_their_moms_as_dates_to_the_oscars-211359/',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1242911076001/default_default/index.html?videoId=ref:%s'
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % video_id, 'BrightcoveNew', video_id)
for video_id in re.findall(
r'site\.brightcove\s*\([^,]+,\s*["\'](title_\d+)', webpage)]
return self.playlist_result(
entries, playlist_id,
self._og_search_title(webpage, fatal=False),
self._og_search_description(webpage))

View File

@ -83,6 +83,7 @@ from .awaan import (
from .azmedien import (
AZMedienIE,
AZMedienPlaylistIE,
AZMedienShowPlaylistIE,
)
from .azubu import AzubuIE, AzubuLiveIE
from .baidu import BaiduVideoIE
@ -227,6 +228,10 @@ from .dailymotion import (
DailymotionUserIE,
DailymotionCloudIE,
)
from .daisuki import (
DaisukiIE,
DaisukiPlaylistIE,
)
from .daum import (
DaumIE,
DaumClipIE,
@ -288,6 +293,7 @@ from .espn import (
ESPNArticleIE,
)
from .esri import EsriVideoIE
from .etonline import ETOnlineIE
from .europa import EuropaIE
from .everyonesmixtape import EveryonesMixtapeIE
from .expotv import ExpoTVIE
@ -338,6 +344,7 @@ from .francetv import (
)
from .freesound import FreesoundIE
from .freespeech import FreespeechIE
from .freshlive import FreshLiveIE
from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE
@ -637,6 +644,7 @@ from .ninecninemedia import (
from .ninegag import NineGagIE
from .ninenow import NineNowIE
from .nintendo import NintendoIE
from .njpwworld import NJPWWorldIE
from .nobelprize import NobelPrizeIE
from .noco import NocoIE
from .normalboots import NormalbootsIE
@ -666,6 +674,7 @@ from .npo import (
NPORadioIE,
NPORadioFragmentIE,
SchoolTVIE,
HetKlokhuisIE,
VPROIE,
WNLIE,
)
@ -835,7 +844,6 @@ from .safari import (
from .sapo import SapoIE
from .savefrom import SaveFromIE
from .sbs import SBSIE
from .scivee import SciVeeIE
from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
from .scrippsnetworks import ScrippsNetworksWatchIE

View File

@ -303,7 +303,7 @@ class FacebookIE(InfoExtractor):
if not video_data:
server_js_data = self._parse_json(
self._search_regex(
r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:stream_pagelet|pagelet_group_mall)',
r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:stream_pagelet|pagelet_group_mall|permalink_video_pagelet)',
webpage, 'js data', default='{}'),
video_id, transform_source=js_to_json, fatal=False)
if server_js_data:

View File

@ -0,0 +1,84 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
try_get,
unified_timestamp,
)
class FreshLiveIE(InfoExtractor):
_VALID_URL = r'https?://freshlive\.tv/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'https://freshlive.tv/satotv/74712',
'md5': '9f0cf5516979c4454ce982df3d97f352',
'info_dict': {
'id': '74712',
'ext': 'mp4',
'title': 'テスト',
'description': 'テスト',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1511,
'timestamp': 1483619655,
'upload_date': '20170105',
'uploader': 'サトTV',
'uploader_id': 'satotv',
'view_count': int,
'comment_count': int,
'is_live': False,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
options = self._parse_json(
self._search_regex(
r'window\.__CONTEXT__\s*=\s*({.+?});\s*</script>',
webpage, 'initial context'),
video_id)
info = options['context']['dispatcher']['stores']['ProgramStore']['programs'][video_id]
title = info['title']
if info.get('status') == 'upcoming':
raise ExtractorError('Stream %s is upcoming' % video_id, expected=True)
stream_url = info.get('liveStreamUrl') or info['archiveStreamUrl']
is_live = info.get('liveStreamUrl') is not None
formats = self._extract_m3u8_formats(
stream_url, video_id, ext='mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id='hls')
if is_live:
title = self._live_title(title)
return {
'id': video_id,
'formats': formats,
'title': title,
'description': info.get('description'),
'thumbnail': info.get('thumbnailUrl'),
'duration': int_or_none(info.get('airTime')),
'timestamp': unified_timestamp(info.get('createdAt')),
'uploader': try_get(
info, lambda x: x['channel']['title'], compat_str),
'uploader_id': try_get(
info, lambda x: x['channel']['code'], compat_str),
'uploader_url': try_get(
info, lambda x: x['channel']['permalink'], compat_str),
'view_count': int_or_none(info.get('viewCount')),
'comment_count': int_or_none(info.get('commentCount')),
'tags': info.get('tags', []),
'is_live': is_live,
}

View File

@ -16,6 +16,8 @@ class IviIE(InfoExtractor):
IE_DESC = 'ivi.ru'
IE_NAME = 'ivi'
_VALID_URL = r'https?://(?:www\.)?ivi\.ru/(?:watch/(?:[^/]+/)?|video/player\?.*?videoId=)(?P<id>\d+)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['RU']
_TESTS = [
# Single movie
@ -91,7 +93,11 @@ class IviIE(InfoExtractor):
if 'error' in video_json:
error = video_json['error']
if error['origin'] == 'NoRedisValidData':
origin = error['origin']
if origin == 'NotAllowedForLocation':
self.raise_geo_restricted(
msg=error['message'], countries=self._GEO_COUNTRIES)
elif origin == 'NoRedisValidData':
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
raise ExtractorError(
'Unable to download video %s: %s' % (video_id, error['message']),

View File

@ -14,7 +14,7 @@ from ..utils import (
class MDRIE(InfoExtractor):
IE_DESC = 'MDR.DE and KiKA'
_VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z]+-?(?P<id>\d+)(?:_.+?)?\.html'
_VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z-]+-?(?P<id>\d+)(?:_.+?)?\.html'
_TESTS = [{
# MDR regularly deletes its videos
@ -31,6 +31,7 @@ class MDRIE(InfoExtractor):
'duration': 250,
'uploader': 'MITTELDEUTSCHER RUNDFUNK',
},
'skip': '404 not found',
}, {
'url': 'http://www.kika.de/baumhaus/videos/video19636.html',
'md5': '4930515e36b06c111213e80d1e4aad0e',
@ -41,6 +42,7 @@ class MDRIE(InfoExtractor):
'duration': 134,
'uploader': 'KIKA',
},
'skip': '404 not found',
}, {
'url': 'http://www.kika.de/sendungen/einzelsendungen/weihnachtsprogramm/videos/video8182.html',
'md5': '5fe9c4dd7d71e3b238f04b8fdd588357',
@ -49,11 +51,21 @@ class MDRIE(InfoExtractor):
'ext': 'mp4',
'title': 'Beutolomäus und der geheime Weihnachtswunsch',
'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd',
'timestamp': 1450950000,
'upload_date': '20151224',
'timestamp': 1482541200,
'upload_date': '20161224',
'duration': 4628,
'uploader': 'KIKA',
},
}, {
# audio with alternative playerURL pattern
'url': 'http://www.mdr.de/kultur/videos-und-audios/audio-radio/operation-mindfuck-robert-wilson100.html',
'info_dict': {
'id': '100',
'ext': 'mp4',
'title': 'Feature: Operation Mindfuck - Robert Anton Wilson',
'duration': 3239,
'uploader': 'MITTELDEUTSCHER RUNDFUNK',
},
}, {
'url': 'http://www.kika.de/baumhaus/sendungen/video19636_zc-fea7f8a0_zs-4bf89c60.html',
'only_matching': True,
@ -71,7 +83,7 @@ class MDRIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
data_url = self._search_regex(
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+?-avCustom\.xml)\1',
webpage, 'data url', group='url').replace(r'\/', '/')
doc = self._download_xml(

View File

@ -0,0 +1,83 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
get_element_by_class,
urlencode_postdata,
)
class NJPWWorldIE(InfoExtractor):
_VALID_URL = r'https?://njpwworld\.com/p/(?P<id>[a-z0-9_]+)'
IE_DESC = '新日本プロレスワールド'
_NETRC_MACHINE = 'njpwworld'
_TEST = {
'url': 'http://njpwworld.com/p/s_series_00155_1_9/',
'info_dict': {
'id': 's_series_00155_1_9',
'ext': 'mp4',
'title': '第9試合 ランディ・サベージ vs リック・スタイナー',
'tags': list,
},
'params': {
'skip_download': True, # AES-encrypted m3u8
},
'skip': 'Requires login',
}
def _real_initialize(self):
self._login()
def _login(self):
username, password = self._get_login_info()
# No authentication to be performed
if not username:
return True
webpage, urlh = self._download_webpage_handle(
'https://njpwworld.com/auth/login', None,
note='Logging in', errnote='Unable to login',
data=urlencode_postdata({'login_id': username, 'pw': password}))
# /auth/login will return 302 for successful logins
if urlh.geturl() == 'https://njpwworld.com/auth/login':
self.report_warning('unable to login')
return False
return True
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
formats = []
for player_url, kind in re.findall(r'<a[^>]+href="(/player[^"]+)".+?<img[^>]+src="[^"]+qf_btn_([^".]+)', webpage):
player_url = compat_urlparse.urljoin(url, player_url)
player_page = self._download_webpage(
player_url, video_id, note='Downloading player page')
entries = self._parse_html5_media_entries(
player_url, player_page, video_id, m3u8_id='hls-%s' % kind,
m3u8_entry_protocol='m3u8_native',
preference=2 if 'hq' in kind else 1)
formats.extend(entries[0]['formats'])
self._sort_formats(formats)
post_content = get_element_by_class('post-content', webpage)
tags = re.findall(
r'<li[^>]+class="tag-[^"]+"><a[^>]*>([^<]+)</a></li>', post_content
) if post_content else None
return {
'id': video_id,
'title': self._og_search_title(webpage),
'formats': formats,
'tags': tags,
}

View File

@ -23,7 +23,7 @@ from ..utils import (
class NocoIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?noco\.tv/emission/|player\.noco\.tv/\?idvideo=)(?P<id>\d+)'
_LOGIN_URL = 'http://noco.tv/do.php'
_LOGIN_URL = 'https://noco.tv/do.php'
_API_URL_TEMPLATE = 'https://api.noco.tv/1.1/%s?ts=%s&tk=%s'
_SUB_LANG_TEMPLATE = '&sub_lang=%s'
_NETRC_MACHINE = 'noco'
@ -69,16 +69,17 @@ class NocoIE(InfoExtractor):
if username is None:
return
login_form = {
'a': 'login',
'cookie': '1',
'username': username,
'password': password,
}
request = sanitized_Request(self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
login = self._download_json(request, None, 'Logging in as %s' % username)
login = self._download_json(
self._LOGIN_URL, None, 'Logging in as %s' % username,
data=urlencode_postdata({
'a': 'login',
'cookie': '1',
'username': username,
'password': password,
}),
headers={
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
})
if 'erreur' in login:
raise ExtractorError('Unable to login: %s' % clean_html(login['erreur']), expected=True)

View File

@ -3,41 +3,27 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..compat import (
compat_HTTPError,
compat_str,
)
from ..utils import (
determine_ext,
ExtractorError,
fix_xml_ampersands,
orderedSet,
parse_duration,
qualities,
strip_jsonp,
unified_strdate,
ExtractorError,
)
class NPOBaseIE(InfoExtractor):
def _get_token(self, video_id):
token_page = self._download_webpage(
'http://ida.omroep.nl/npoplayer/i.js',
video_id, note='Downloading token')
token = self._search_regex(
r'npoplayer\.token = "(.+?)"', token_page, 'token')
# Decryption algorithm extracted from http://npoplayer.omroep.nl/csjs/npoplayer-min.js
token_l = list(token)
first = second = None
for i in range(5, len(token_l) - 4):
if token_l[i].isdigit():
if first is None:
first = i
elif second is None:
second = i
if first is None or second is None:
first = 12
second = 13
token_l[first], token_l[second] = token_l[second], token_l[first]
return ''.join(token_l)
return self._download_json(
'http://ida.omroep.nl/app.php/auth', video_id,
note='Downloading token')['token']
class NPOIE(NPOBaseIE):
@ -51,97 +37,120 @@ class NPOIE(NPOBaseIE):
(?:
npo\.nl/(?!live|radio)(?:[^/]+/){2}|
ntr\.nl/(?:[^/]+/){2,}|
omroepwnl\.nl/video/fragment/[^/]+__
omroepwnl\.nl/video/fragment/[^/]+__|
zapp\.nl/[^/]+/[^/]+/
)
)
(?P<id>[^/?#]+)
'''
_TESTS = [
{
'url': 'http://www.npo.nl/nieuwsuur/22-06-2014/VPWON_1220719',
'md5': '4b3f9c429157ec4775f2c9cb7b911016',
'info_dict': {
'id': 'VPWON_1220719',
'ext': 'm4v',
'title': 'Nieuwsuur',
'description': 'Dagelijks tussen tien en elf: nieuws, sport en achtergronden.',
'upload_date': '20140622',
},
_TESTS = [{
'url': 'http://www.npo.nl/nieuwsuur/22-06-2014/VPWON_1220719',
'md5': '4b3f9c429157ec4775f2c9cb7b911016',
'info_dict': {
'id': 'VPWON_1220719',
'ext': 'm4v',
'title': 'Nieuwsuur',
'description': 'Dagelijks tussen tien en elf: nieuws, sport en achtergronden.',
'upload_date': '20140622',
},
{
'url': 'http://www.npo.nl/de-mega-mike-mega-thomas-show/27-02-2009/VARA_101191800',
'md5': 'da50a5787dbfc1603c4ad80f31c5120b',
'info_dict': {
'id': 'VARA_101191800',
'ext': 'm4v',
'title': 'De Mega Mike & Mega Thomas show: The best of.',
'description': 'md5:3b74c97fc9d6901d5a665aac0e5400f4',
'upload_date': '20090227',
'duration': 2400,
},
}, {
'url': 'http://www.npo.nl/de-mega-mike-mega-thomas-show/27-02-2009/VARA_101191800',
'md5': 'da50a5787dbfc1603c4ad80f31c5120b',
'info_dict': {
'id': 'VARA_101191800',
'ext': 'm4v',
'title': 'De Mega Mike & Mega Thomas show: The best of.',
'description': 'md5:3b74c97fc9d6901d5a665aac0e5400f4',
'upload_date': '20090227',
'duration': 2400,
},
{
'url': 'http://www.npo.nl/tegenlicht/25-02-2013/VPWON_1169289',
'md5': 'f8065e4e5a7824068ed3c7e783178f2c',
'info_dict': {
'id': 'VPWON_1169289',
'ext': 'm4v',
'title': 'Tegenlicht: De toekomst komt uit Afrika',
'description': 'md5:52cf4eefbc96fffcbdc06d024147abea',
'upload_date': '20130225',
'duration': 3000,
},
}, {
'url': 'http://www.npo.nl/tegenlicht/25-02-2013/VPWON_1169289',
'md5': 'f8065e4e5a7824068ed3c7e783178f2c',
'info_dict': {
'id': 'VPWON_1169289',
'ext': 'm4v',
'title': 'Tegenlicht: Zwart geld. De toekomst komt uit Afrika',
'description': 'md5:52cf4eefbc96fffcbdc06d024147abea',
'upload_date': '20130225',
'duration': 3000,
},
{
'url': 'http://www.npo.nl/de-nieuwe-mens-deel-1/21-07-2010/WO_VPRO_043706',
'info_dict': {
'id': 'WO_VPRO_043706',
'ext': 'wmv',
'title': 'De nieuwe mens - Deel 1',
'description': 'md5:518ae51ba1293ffb80d8d8ce90b74e4b',
'duration': 4680,
},
'params': {
# mplayer mms download
'skip_download': True,
}
}, {
'url': 'http://www.npo.nl/de-nieuwe-mens-deel-1/21-07-2010/WO_VPRO_043706',
'info_dict': {
'id': 'WO_VPRO_043706',
'ext': 'm4v',
'title': 'De nieuwe mens - Deel 1',
'description': 'md5:518ae51ba1293ffb80d8d8ce90b74e4b',
'duration': 4680,
},
# non asf in streams
{
'url': 'http://www.npo.nl/hoe-gaat-europa-verder-na-parijs/10-01-2015/WO_NOS_762771',
'md5': 'b3da13de374cbe2d5332a7e910bef97f',
'info_dict': {
'id': 'WO_NOS_762771',
'ext': 'mp4',
'title': 'Hoe gaat Europa verder na Parijs?',
},
},
{
'url': 'http://www.ntr.nl/Aap-Poot-Pies/27/detail/Aap-poot-pies/VPWON_1233944#content',
'md5': '01c6a2841675995da1f0cf776f03a9c3',
'info_dict': {
'id': 'VPWON_1233944',
'ext': 'm4v',
'title': 'Aap, poot, pies',
'description': 'md5:c9c8005d1869ae65b858e82c01a91fde',
'upload_date': '20150508',
'duration': 599,
},
},
{
'url': 'http://www.omroepwnl.nl/video/fragment/vandaag-de-dag-verkiezingen__POMS_WNL_853698',
'md5': 'd30cd8417b8b9bca1fdff27428860d08',
'info_dict': {
'id': 'POW_00996502',
'ext': 'm4v',
'title': '''"Dit is wel een 'landslide'..."''',
'description': 'md5:f8d66d537dfb641380226e31ca57b8e8',
'upload_date': '20150508',
'duration': 462,
},
'params': {
'skip_download': True,
}
]
}, {
# non asf in streams
'url': 'http://www.npo.nl/hoe-gaat-europa-verder-na-parijs/10-01-2015/WO_NOS_762771',
'info_dict': {
'id': 'WO_NOS_762771',
'ext': 'mp4',
'title': 'Hoe gaat Europa verder na Parijs?',
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://www.ntr.nl/Aap-Poot-Pies/27/detail/Aap-poot-pies/VPWON_1233944#content',
'info_dict': {
'id': 'VPWON_1233944',
'ext': 'm4v',
'title': 'Aap, poot, pies',
'description': 'md5:c9c8005d1869ae65b858e82c01a91fde',
'upload_date': '20150508',
'duration': 599,
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://www.omroepwnl.nl/video/fragment/vandaag-de-dag-verkiezingen__POMS_WNL_853698',
'info_dict': {
'id': 'POW_00996502',
'ext': 'm4v',
'title': '''"Dit is wel een 'landslide'..."''',
'description': 'md5:f8d66d537dfb641380226e31ca57b8e8',
'upload_date': '20150508',
'duration': 462,
},
'params': {
'skip_download': True,
}
}, {
# audio
'url': 'http://www.npo.nl/jouw-stad-rotterdam/29-01-2017/RBX_FUNX_6683215/RBX_FUNX_7601437',
'info_dict': {
'id': 'RBX_FUNX_6683215',
'ext': 'mp3',
'title': 'Jouw Stad Rotterdam',
'description': 'md5:db251505244f097717ec59fabc372d9f',
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://www.zapp.nl/de-bzt-show/gemist/KN_1687547',
'only_matching': True,
}, {
'url': 'http://www.zapp.nl/de-bzt-show/filmpjes/POMS_KN_7315118',
'only_matching': True,
}, {
'url': 'http://www.zapp.nl/beste-vrienden-quiz/extra-video-s/WO_NTR_1067990',
'only_matching': True,
}, {
# live stream
'url': 'npo:LI_NL1_4188102',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -170,70 +179,115 @@ class NPOIE(NPOBaseIE):
token = self._get_token(video_id)
formats = []
urls = set()
pubopties = metadata.get('pubopties')
if pubopties:
quality = qualities(['adaptive', 'wmv_sb', 'h264_sb', 'wmv_bb', 'h264_bb', 'wvc1_std', 'h264_std'])
for format_id in pubopties:
format_info = self._download_json(
'http://ida.omroep.nl/odi/?prid=%s&puboptions=%s&adaptive=yes&token=%s'
% (video_id, format_id, token),
video_id, 'Downloading %s JSON' % format_id)
if format_info.get('error_code', 0) or format_info.get('errorcode', 0):
quality = qualities(['adaptive', 'wmv_sb', 'h264_sb', 'wmv_bb', 'h264_bb', 'wvc1_std', 'h264_std'])
items = self._download_json(
'http://ida.omroep.nl/app.php/%s' % video_id, video_id,
'Downloading formats JSON', query={
'adaptive': 'yes',
'token': token,
})['items'][0]
for num, item in enumerate(items):
item_url = item.get('url')
if not item_url or item_url in urls:
continue
urls.add(item_url)
format_id = self._search_regex(
r'video/ida/([^/]+)', item_url, 'format id',
default=None)
def add_format_url(format_url):
formats.append({
'url': format_url,
'format_id': format_id,
'quality': quality(format_id),
})
# Example: http://www.npo.nl/de-nieuwe-mens-deel-1/21-07-2010/WO_VPRO_043706
if item.get('contentType') in ('url', 'audio'):
add_format_url(item_url)
continue
try:
stream_info = self._download_json(
item_url + '&type=json', video_id,
'Downloading %s stream JSON'
% item.get('label') or item.get('format') or format_id or num)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
error = (self._parse_json(
ee.cause.read().decode(), video_id,
fatal=False) or {}).get('errorstring')
if error:
raise ExtractorError(error, expected=True)
raise
# Stream URL instead of JSON, example: npo:LI_NL1_4188102
if isinstance(stream_info, compat_str):
if not stream_info.startswith('http'):
continue
streams = format_info.get('streams')
if streams:
try:
video_info = self._download_json(
streams[0] + '&type=json',
video_id, 'Downloading %s stream JSON' % format_id)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
error = (self._parse_json(ee.cause.read().decode(), video_id, fatal=False) or {}).get('errorstring')
if error:
raise ExtractorError(error, expected=True)
raise
else:
video_info = format_info
video_url = video_info.get('url')
if not video_url:
video_url = stream_info
# JSON
else:
video_url = stream_info.get('url')
if not video_url or video_url in urls:
continue
urls.add(item_url)
if determine_ext(video_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
else:
add_format_url(video_url)
is_live = metadata.get('medium') == 'live'
if not is_live:
for num, stream in enumerate(metadata.get('streams', [])):
stream_url = stream.get('url')
if not stream_url or stream_url in urls:
continue
if format_id == 'adaptive':
formats.extend(self._extract_m3u8_formats(video_url, video_id, 'mp4'))
else:
urls.add(stream_url)
# smooth streaming is not supported
stream_type = stream.get('type', '').lower()
if stream_type in ['ss', 'ms']:
continue
if stream_type == 'hds':
f4m_formats = self._extract_f4m_formats(
stream_url, video_id, fatal=False)
# f4m downloader downloads only piece of live stream
for f4m_format in f4m_formats:
f4m_format['preference'] = -1
formats.extend(f4m_formats)
elif stream_type == 'hls':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, ext='mp4', fatal=False))
# Example: http://www.npo.nl/de-nieuwe-mens-deel-1/21-07-2010/WO_VPRO_043706
elif '.asf' in stream_url:
asx = self._download_xml(
stream_url, video_id,
'Downloading stream %d ASX playlist' % num,
transform_source=fix_xml_ampersands, fatal=False)
if not asx:
continue
ref = asx.find('./ENTRY/Ref')
if ref is None:
continue
video_url = ref.get('href')
if not video_url or video_url in urls:
continue
urls.add(video_url)
formats.append({
'url': video_url,
'format_id': format_id,
'quality': quality(format_id),
'ext': stream.get('formaat', 'asf'),
'quality': stream.get('kwaliteit'),
'preference': -10,
})
streams = metadata.get('streams')
if streams:
for i, stream in enumerate(streams):
stream_url = stream.get('url')
if not stream_url:
continue
if '.asf' not in stream_url:
else:
formats.append({
'url': stream_url,
'quality': stream.get('kwaliteit'),
})
continue
asx = self._download_xml(
stream_url, video_id,
'Downloading stream %d ASX playlist' % i,
transform_source=fix_xml_ampersands)
ref = asx.find('./ENTRY/Ref')
if ref is None:
continue
video_url = ref.get('href')
if not video_url:
continue
formats.append({
'url': video_url,
'ext': stream.get('formaat', 'asf'),
'quality': stream.get('kwaliteit'),
})
self._sort_formats(formats)
@ -246,28 +300,28 @@ class NPOIE(NPOBaseIE):
return {
'id': video_id,
'title': title,
'title': self._live_title(title) if is_live else title,
'description': metadata.get('info'),
'thumbnail': metadata.get('images', [{'url': None}])[-1]['url'],
'upload_date': unified_strdate(metadata.get('gidsdatum')),
'duration': parse_duration(metadata.get('tijdsduur')),
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
}
class NPOLiveIE(NPOBaseIE):
IE_NAME = 'npo.nl:live'
_VALID_URL = r'https?://(?:www\.)?npo\.nl/live/(?P<id>.+)'
_VALID_URL = r'https?://(?:www\.)?npo\.nl/live/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.npo.nl/live/npo-1',
'info_dict': {
'id': 'LI_NEDERLAND1_136692',
'id': 'LI_NL1_4188102',
'display_id': 'npo-1',
'ext': 'mp4',
'title': 're:^Nederland 1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'Livestream',
'title': 're:^NPO 1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'is_live': True,
},
'params': {
@ -283,58 +337,12 @@ class NPOLiveIE(NPOBaseIE):
live_id = self._search_regex(
r'data-prid="([^"]+)"', webpage, 'live id')
metadata = self._download_json(
'http://e.omroep.nl/metadata/%s' % live_id,
display_id, transform_source=strip_jsonp)
token = self._get_token(display_id)
formats = []
streams = metadata.get('streams')
if streams:
for stream in streams:
stream_type = stream.get('type').lower()
# smooth streaming is not supported
if stream_type in ['ss', 'ms']:
continue
stream_info = self._download_json(
'http://ida.omroep.nl/aapi/?stream=%s&token=%s&type=jsonp'
% (stream.get('url'), token),
display_id, 'Downloading %s JSON' % stream_type)
if stream_info.get('error_code', 0) or stream_info.get('errorcode', 0):
continue
stream_url = self._download_json(
stream_info['stream'], display_id,
'Downloading %s URL' % stream_type,
'Unable to download %s URL' % stream_type,
transform_source=strip_jsonp, fatal=False)
if not stream_url:
continue
if stream_type == 'hds':
f4m_formats = self._extract_f4m_formats(stream_url, display_id)
# f4m downloader downloads only piece of live stream
for f4m_format in f4m_formats:
f4m_format['preference'] = -1
formats.extend(f4m_formats)
elif stream_type == 'hls':
formats.extend(self._extract_m3u8_formats(stream_url, display_id, 'mp4'))
else:
formats.append({
'url': stream_url,
'preference': -10,
})
self._sort_formats(formats)
return {
'_type': 'url_transparent',
'url': 'npo:%s' % live_id,
'ie_key': NPOIE.ie_key(),
'id': live_id,
'display_id': display_id,
'title': self._live_title(metadata['titel']),
'description': metadata['info'],
'thumbnail': metadata.get('images', [{'url': None}])[-1]['url'],
'formats': formats,
'is_live': True,
}
@ -416,7 +424,21 @@ class NPORadioFragmentIE(InfoExtractor):
}
class SchoolTVIE(InfoExtractor):
class NPODataMidEmbedIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video_id', group='id')
return {
'_type': 'url_transparent',
'ie_key': 'NPO',
'url': 'npo:%s' % video_id,
'display_id': display_id
}
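
The extractors built on this base (SchoolTVIE and the new HetKlokhuisIE below) return url_transparent results: extraction is delegated to the NPO extractor via the npo: URL, while non-None fields set here, such as display_id, are merged over the delegate's output. A rough model of that merge (a sketch of youtube-dl's behaviour, not code from this diff):

delegate_info = {'id': 'VPWON_1260528', 'title': 'Het Klokhuis: Zwaartekrachtsgolven'}
overrides = {'display_id': 'Zwaartekrachtsgolven'}  # non-None fields from _real_extract
merged = dict(delegate_info)
merged.update(overrides)
assert merged['display_id'] == 'Zwaartekrachtsgolven'
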
class SchoolTVIE(NPODataMidEmbedIE):
IE_NAME = 'schooltv'
_VALID_URL = r'https?://(?:www\.)?schooltv\.nl/video/(?P<id>[^/?#&]+)'
@ -435,17 +457,25 @@ class SchoolTVIE(InfoExtractor):
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video_id', group='id')
return {
'_type': 'url_transparent',
'ie_key': 'NPO',
'url': 'npo:%s' % video_id,
'display_id': display_id
class HetKlokhuisIE(NPODataMidEmbedIE):
IE_NAME = 'hetklokhuis'
_VALID_URL = r'https?://(?:www\.)?hetklokhuis.nl/[^/]+/\d+/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://hetklokhuis.nl/tv-uitzending/3471/Zwaartekrachtsgolven',
'info_dict': {
'id': 'VPWON_1260528',
'display_id': 'Zwaartekrachtsgolven',
'ext': 'm4v',
'title': 'Het Klokhuis: Zwaartekrachtsgolven',
'description': 'md5:c94f31fb930d76c2efa4a4a71651dd48',
'upload_date': '20170223',
},
'params': {
'skip_download': True
}
}
class NPOPlaylistBaseIE(NPOIE):

View File

@ -72,16 +72,21 @@ class OpenloadIE(InfoExtractor):
raise ExtractorError('File not found', expected=True)
ol_id = self._search_regex(
'<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>',
'<span[^>]+id="[^"]+"[^>]*>([0-9A-Za-z]+)</span>',
webpage, 'openload ID')
first_two_chars = int(float(ol_id[0:][:2]))
first_char = int(ol_id[0])
urlcode = []
num = 2
num = 1
while num < len(ol_id):
key = int(float(ol_id[num + 3:][:2]))
urlcode.append((key, compat_chr(int(float(ol_id[num:][:3])) - first_two_chars)))
i = ord(ol_id[num])
key = 0
if i <= 90:
key = i - 65
elif i >= 97:
key = 25 + i - 97
urlcode.append((key, compat_chr(int(ol_id[num + 2:num + 5]) // int(ol_id[num + 1]) - first_char)))
num += 5
video_url = 'https://openload.co/stream/' + ''.join(

View File

@ -1,57 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class SciVeeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?scivee\.tv/node/(?P<id>\d+)'
_TEST = {
'url': 'http://www.scivee.tv/node/62352',
'md5': 'b16699b74c9e6a120f6772a44960304f',
'info_dict': {
'id': '62352',
'ext': 'mp4',
'title': 'Adam Arkin at the 2014 DOE JGI Genomics of Energy & Environment Meeting',
'description': 'md5:81f1710638e11a481358fab1b11059d7',
},
'skip': 'Not accessible from Travis CI server',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
# annotations XML is malformed
annotations = self._download_webpage(
'http://www.scivee.tv/assets/annotations/%s' % video_id, video_id, 'Downloading annotations')
title = self._html_search_regex(r'<title>([^<]+)</title>', annotations, 'title')
description = self._html_search_regex(r'<abstract>([^<]+)</abstract>', annotations, 'abstract', fatal=False)
filesize = int_or_none(self._html_search_regex(
r'<filesize>([^<]+)</filesize>', annotations, 'filesize', fatal=False))
formats = [
{
'url': 'http://www.scivee.tv/assets/audio/%s' % video_id,
'ext': 'mp3',
'format_id': 'audio',
},
{
'url': 'http://www.scivee.tv/assets/video/%s' % video_id,
'ext': 'mp4',
'format_id': 'video',
'filesize': filesize,
},
]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': 'http://www.scivee.tv/assets/videothumb/%s' % video_id,
'formats': formats,
}

View File

@@ -3,7 +3,10 @@ from __future__ import unicode_literals

 from .common import InfoExtractor
 from ..compat import compat_urlparse
-from ..utils import qualities
+from ..utils import (
+    int_or_none,
+    qualities,
+)


 class TheSceneIE(InfoExtractor):
@@ -16,6 +19,11 @@ class TheSceneIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Narciso Rodriguez: Spring 2013 Ready-to-Wear',
             'display_id': 'narciso-rodriguez-spring-2013-ready-to-wear',
+            'duration': 127,
+            'series': 'Style.com Fashion Shows',
+            'season': 'Ready To Wear Spring 2013',
+            'tags': list,
+            'categories': list,
         },
     }
@@ -32,21 +40,29 @@ class TheSceneIE(InfoExtractor):
         player = self._download_webpage(player_url, display_id)

         info = self._parse_json(
             self._search_regex(
-                r'(?m)var\s+video\s+=\s+({.+?});$', player, 'info json'),
+                r'(?m)video\s*:\s*({.+?}),$', player, 'info json'),
             display_id)

+        video_id = info['id']
+        title = info['title']
+
         qualities_order = qualities(('low', 'high'))
         formats = [{
             'format_id': '{0}-{1}'.format(f['type'].split('/')[0], f['quality']),
             'url': f['src'],
             'quality': qualities_order(f['quality']),
-        } for f in info['sources'][0]]
+        } for f in info['sources']]
         self._sort_formats(formats)

         return {
-            'id': info['id'],
+            'id': video_id,
             'display_id': display_id,
-            'title': info['title'],
+            'title': title,
             'formats': formats,
+            'thumbnail': info.get('poster_frame'),
+            'duration': int_or_none(info.get('duration')),
+            'series': info.get('series_title'),
+            'season': info.get('season_title'),
+            'tags': info.get('tags'),
+            'categories': info.get('categories'),
         }

youtube_dl/extractor/tubitv.py View File

@@ -16,6 +16,7 @@ class TubiTvIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?tubitv\.com/video/(?P<id>[0-9]+)'
     _LOGIN_URL = 'http://tubitv.com/login'
     _NETRC_MACHINE = 'tubitv'
+    _GEO_COUNTRIES = ['US']
     _TEST = {
         'url': 'http://tubitv.com/video/283829/the_comedian_at_the_friday',
         'md5': '43ac06be9326f41912dc64ccf7a80320',

youtube_dl/extractor/tvigle.py View File

@@ -17,6 +17,9 @@ class TvigleIE(InfoExtractor):
     IE_DESC = 'Интернет-телевидение Tvigle.ru'
     _VALID_URL = r'https?://(?:www\.)?(?:tvigle\.ru/(?:[^/]+/)+(?P<display_id>[^/]+)/$|cloud\.tvigle\.ru/video/(?P<id>\d+))'

+    _GEO_BYPASS = False
+    _GEO_COUNTRIES = ['RU']
+
     _TESTS = [
         {
             'url': 'http://www.tvigle.ru/video/sokrat/',
@@ -72,8 +75,13 @@ class TvigleIE(InfoExtractor):
         error_message = item.get('errorMessage')
         if not videos and error_message:
-            raise ExtractorError(
-                '%s returned error: %s' % (self.IE_NAME, error_message), expected=True)
+            if item.get('isGeoBlocked') is True:
+                self.raise_geo_restricted(
+                    msg=error_message, countries=self._GEO_COUNTRIES)
+            else:
+                raise ExtractorError(
+                    '%s returned error: %s' % (self.IE_NAME, error_message),
+                    expected=True)

         title = item['title']
         description = item.get('description')

youtube_dl/extractor/vevo.py View File

@@ -17,12 +17,12 @@ from ..utils import (
 class VevoBaseIE(InfoExtractor):
-    def _extract_json(self, webpage, video_id, item):
+    def _extract_json(self, webpage, video_id):
         return self._parse_json(
             self._search_regex(
                 r'window\.__INITIAL_STORE__\s*=\s*({.+?});\s*</script>',
                 webpage, 'initial store'),
-            video_id)['default'][item]
+            video_id)


 class VevoIE(VevoBaseIE):
@@ -139,6 +139,11 @@ class VevoIE(VevoBaseIE):
         # no genres available
         'url': 'http://www.vevo.com/watch/INS171400764',
         'only_matching': True,
+    }, {
+        # Another case available only via the webpage; using streams/streamsV3 formats
+        # Geo-restricted to Netherlands/Germany
+        'url': 'http://www.vevo.com/watch/boostee/pop-corn-clip-officiel/FR1A91600909',
+        'only_matching': True,
     }]
     _VERSIONS = {
         0: 'youtube',  # only in AuthenticateVideo videoVersions
@@ -193,7 +198,14 @@ class VevoIE(VevoBaseIE):
         # https://github.com/rg3/youtube-dl/issues/9366)
         if not video_versions:
             webpage = self._download_webpage(url, video_id)
-            video_versions = self._extract_json(webpage, video_id, 'streams')[video_id][0]
+            json_data = self._extract_json(webpage, video_id)
+            if 'streams' in json_data.get('default', {}):
+                video_versions = json_data['default']['streams'][video_id][0]
+            else:
+                video_versions = [
+                    value
+                    for key, value in json_data['apollo']['data'].items()
+                    if key.startswith('%s.streams' % video_id)]

         uploader = None
         artist = None
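The apollo fallback above walks a flat key/value store. A hypothetical shape of the parsed __INITIAL_STORE__, with keys modeled on the '%s.streams' prefix used in the diff and the video id taken from the new test URL (all values invented for illustration):

    json_data = {
        'apollo': {
            'data': {
                'FR1A91600909': {'title': 'Pop-corn'},
                'FR1A91600909.streams.0': {'version': 4, 'url': 'https://cdn.example/0.m3u8'},
                'FR1A91600909.streams.1': {'version': 2, 'url': 'https://cdn.example/1.mp4'},
            },
        },
    }
    video_versions = [
        value
        for key, value in json_data['apollo']['data'].items()
        if key.startswith('FR1A91600909.streams')]
    assert len(video_versions) == 2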
@@ -207,7 +219,7 @@ class VevoIE(VevoBaseIE):
         formats = []
         for video_version in video_versions:
-            version = self._VERSIONS.get(video_version['version'])
+            version = self._VERSIONS.get(video_version.get('version'), 'generic')
             version_url = video_version.get('url')
             if not version_url:
                 continue
@@ -339,7 +351,7 @@ class VevoPlaylistIE(VevoBaseIE):
         if video_id:
             return self.url_result('vevo:%s' % video_id, VevoIE.ie_key())

-        playlists = self._extract_json(webpage, playlist_id, '%ss' % playlist_kind)
+        playlists = self._extract_json(webpage, playlist_id)['default']['%ss' % playlist_kind]

         playlist = (list(playlists.values())[0]
                     if playlist_kind == 'playlist' else playlists[playlist_id])

youtube_dl/extractor/xhamster.py View File

@@ -5,6 +5,7 @@ import re
 from .common import InfoExtractor
 from ..utils import (
     dict_get,
+    ExtractorError,
     int_or_none,
     parse_duration,
     unified_strdate,
@@ -57,6 +58,10 @@ class XHamsterIE(InfoExtractor):
     }, {
         'url': 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html',
         'only_matching': True,
+    }, {
+        # This video is visible for marcoalfa123456's friends only
+        'url': 'https://it.xhamster.com/movies/7263980/la_mia_vicina.html',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -78,6 +83,12 @@ class XHamsterIE(InfoExtractor):
         mrss_url = '%s://xhamster.com/movies/%s/%s.html' % (proto, video_id, seo)
         webpage = self._download_webpage(mrss_url, video_id)

+        error = self._html_search_regex(
+            r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',
+            webpage, 'error', default=None)
+        if error:
+            raise ExtractorError(error, expected=True)
+
         title = self._html_search_regex(
             [r'<h1[^>]*>([^<]+)</h1>',
              r'<meta[^>]+itemprop=".*?caption.*?"[^>]+content="(.+?)"',
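The new videoClosed check, exercised against a made-up page snippet (a sketch, not from the test suite):

    import re

    page = '<div class="item" id="videoClosed">This video is visible to friends only</div>'
    error = re.search(
        r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>', page)
    assert error.group(1) == 'This video is visible to friends only'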

youtube_dl/extractor/youtube.py View File

@@ -47,7 +47,6 @@ from ..utils import (
     unsmuggle_url,
     uppercase_escape,
     urlencode_postdata,
-    ISO3166Utils,
 )
@@ -371,6 +370,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
     }
     _SUBTITLE_FORMATS = ('ttml', 'vtt')

+    _GEO_BYPASS = False
+
     IE_NAME = 'youtube'
     _TESTS = [
         {
@@ -917,7 +918,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             # itag 212
             'url': '1t24XAntNCY',
             'only_matching': True,
-        }
+        },
+        {
+            # geo restricted to JP
+            'url': 'sJL6WA-aGkQ',
+            'only_matching': True,
+        },
     ]

     def __init__(self, *args, **kwargs):
@@ -1376,11 +1382,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         if 'token' not in video_info:
             if 'reason' in video_info:
                 if 'The uploader has not made this video available in your country.' in video_info['reason']:
-                    regions_allowed = self._html_search_meta('regionsAllowed', video_webpage, default=None)
-                    if regions_allowed:
-                        raise ExtractorError('YouTube said: This video is available in %s only' % (
-                            ', '.join(map(ISO3166Utils.short2full, regions_allowed.split(',')))),
-                            expected=True)
+                    regions_allowed = self._html_search_meta(
+                        'regionsAllowed', video_webpage, default=None)
+                    countries = regions_allowed.split(',') if regions_allowed else None
+                    self.raise_geo_restricted(
+                        msg=video_info['reason'][0], countries=countries)
                 raise ExtractorError(
                     'YouTube said: %s' % video_info['reason'][0],
                     expected=True, video_id=video_id)
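In effect the new branch turns the page's regionsAllowed meta value into a country list for raise_geo_restricted(). A minimal sketch with a hypothetical meta value:

    regions_allowed = 'JP,KR'  # hypothetical content of the regionsAllowed meta tag
    countries = regions_allowed.split(',') if regions_allowed else None
    assert countries == ['JP', 'KR']  # stays None when the meta tag is absent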
@@ -1448,7 +1454,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         # Check for "rental" videos
         if 'ypc_video_rental_bar_text' in video_info and 'author' not in video_info:
-            raise ExtractorError('"rental" videos not supported')
+            raise ExtractorError('"rental" videos not supported. See https://github.com/rg3/youtube-dl/issues/359 for more information.', expected=True)

         # Start extracting information
         self.report_information_extraction(video_id)
@@ -1845,7 +1851,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
                         (?:
                             youtube\.com/
                             (?:
-                               (?:course|view_play_list|my_playlists|artist|playlist|watch|embed/videoseries)
+                               (?:course|view_play_list|my_playlists|artist|playlist|watch|embed/(?:videoseries|[0-9A-Za-z_-]{11}))
                                \? (?:.*?[&;])*? (?:p|a|list)=
                            |  p/
                            )|
@@ -1918,6 +1924,13 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
            'title': 'JODA15',
            'id': 'PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
        }
    }, {
+        'url': 'http://www.youtube.com/embed/_xDOZElKyNU?list=PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
+        'playlist_mincount': 485,
+        'info_dict': {
+            'title': '2017 華語最新單曲 (2/24更新)',
+            'id': 'PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
+        }
+    }, {
        'note': 'Embedded SWF player',
        'url': 'https://www.youtube.com/p/YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ?hl=en_US&fs=1&rel=0',
@@ -2066,7 +2079,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
         # Check if it's a video-specific URL
         query_dict = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
         video_id = query_dict.get('v', [None])[0] or self._search_regex(
-            r'(?:^|//)youtu\.be/([0-9A-Za-z_-]{11})', url,
+            r'(?:(?:^|//)youtu\.be/|youtube\.com/embed/(?!videoseries))([0-9A-Za-z_-]{11})', url,
             'video id', default=None)
         if video_id:
             if self._downloader.params.get('noplaylist'):
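The widened regex now also pulls a video id out of /embed/ URLs, which is what makes the new _xDOZElKyNU test above resolve. A quick standalone check (sketch):

    import re

    url = 'http://www.youtube.com/embed/_xDOZElKyNU?list=PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl'
    m = re.search(
        r'(?:(?:^|//)youtu\.be/|youtube\.com/embed/(?!videoseries))([0-9A-Za-z_-]{11})',
        url)
    assert m.group(1) == '_xDOZElKyNU'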
@@ -2226,7 +2239,7 @@ class YoutubeUserIE(YoutubeChannelIE):
         'url': 'https://www.youtube.com/gametrailers',
         'only_matching': True,
     }, {
-        # This channel is not available.
+        # This channel is not available, geo restricted to JP
         'url': 'https://www.youtube.com/user/kananishinoSMEJ/videos',
         'only_matching': True,
     }]

youtube_dl/utils.py View File

@@ -473,7 +473,8 @@ def timeconvert(timestr):
 def sanitize_filename(s, restricted=False, is_id=False):
     """Sanitizes a string so it could be used as part of a filename.
     If restricted is set, use a stricter subset of allowed characters.
-    Set is_id if this is not an arbitrary string, but an ID that should be kept if possible
+    Set is_id if this is not an arbitrary string, but an ID that should be kept
+    if possible.
     """
     def replace_insane(char):
         if restricted and char in ACCENT_CHARS:
@@ -3319,6 +3320,57 @@ class PerRequestProxyHandler(compat_urllib_request.ProxyHandler):
             self, req, proxy, type)


+# Both long_to_bytes and bytes_to_long are adapted from PyCrypto, which is
+# released into Public Domain
+# https://github.com/dlitz/pycrypto/blob/master/lib/Crypto/Util/number.py#L387
+
+def long_to_bytes(n, blocksize=0):
+    """long_to_bytes(n:long, blocksize:int) : string
+    Convert a long integer to a byte string.
+
+    If optional blocksize is given and greater than zero, pad the front of the
+    byte string with binary zeros so that the length is a multiple of
+    blocksize.
+    """
+    # after much testing, this algorithm was deemed to be the fastest
+    s = b''
+    n = int(n)
+    while n > 0:
+        s = compat_struct_pack('>I', n & 0xffffffff) + s
+        n = n >> 32
+    # strip off leading zeros
+    for i in range(len(s)):
+        if s[i] != b'\000'[0]:
+            break
+    else:
+        # only happens when n == 0
+        s = b'\000'
+        i = 0
+    s = s[i:]
+    # add back some pad bytes. this could be done more efficiently w.r.t. the
+    # de-padding being done above, but sigh...
+    if blocksize > 0 and len(s) % blocksize:
+        s = (blocksize - len(s) % blocksize) * b'\000' + s
+    return s
+
+
+def bytes_to_long(s):
+    """bytes_to_long(string) : long
+    Convert a byte string to a long integer.
+
+    This is (essentially) the inverse of long_to_bytes().
+    """
+    acc = 0
+    length = len(s)
+    if length % 4:
+        extra = (4 - length % 4)
+        s = b'\000' * extra + s
+        length = length + extra
+    for i in range(0, length, 4):
+        acc = (acc << 32) + compat_struct_unpack('>I', s[i:i + 4])[0]
+    return acc
+
+
 def ohdave_rsa_encrypt(data, exponent, modulus):
     '''
     Implement OHDave's RSA algorithm. See http://www.ohdave.com/rsa/
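A round-trip sketch of the two new helpers above (values picked for illustration, not part of the diff):

    n = 0xdeadbeef
    assert long_to_bytes(n) == b'\xde\xad\xbe\xef'
    assert long_to_bytes(n, blocksize=8) == b'\x00\x00\x00\x00\xde\xad\xbe\xef'
    assert bytes_to_long(b'\xde\xad\xbe\xef') == n
    assert long_to_bytes(0) == b'\x00'  # the n == 0 special case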
@@ -3336,6 +3388,21 @@ def ohdave_rsa_encrypt(data, exponent, modulus):
     return '%x' % encrypted


+def pkcs1pad(data, length):
+    """
+    Padding input data with PKCS#1 scheme
+
+    @param {int[]} data        input data
+    @param {int}   length      target length
+    @returns {int[]}           padded data
+    """
+    if len(data) > length - 11:
+        raise ValueError('Input data too long for PKCS#1 padding')
+
+    pseudo_random = [random.randint(0, 254) for _ in range(length - len(data) - 3)]
+    return [0, 2] + pseudo_random + [0] + data
+
+
 def encode_base_n(num, n, table=None):
     FULL_TABLE = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
     if not table:
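Usage sketch for the new helper: padding a 4-byte message to a 16-byte block. Note that it operates on lists of ints rather than bytes, and that random.randint(0, 254) can yield zero padding bytes, which a strict PKCS#1 v1.5 padder would exclude:

    padded = pkcs1pad([0x01, 0x02, 0x03, 0x04], 16)
    assert len(padded) == 16
    assert padded[:2] == [0, 2]                        # PKCS#1 v1.5 block type 2
    assert padded[-5:] == [0, 0x01, 0x02, 0x03, 0x04]  # zero separator + data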

youtube_dl/version.py View File

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2017.02.24'
+__version__ = '2017.03.02'