Compare commits

..

28 Commits

Author SHA1 Message Date
77a9a9c295 release 2016.06.12 2016-06-12 12:06:48 +07:00
84dcd1c4e4 [streamcloud] Detect removed videos (Closes #3768) 2016-06-12 11:08:39 +07:00
971e3b7520 [nrk:skole] Fix extraction 2016-06-12 07:20:37 +07:00
4e79011729 [nrktv] Fix tests 2016-06-12 06:57:04 +07:00
a936ac321c [README.md] Document using output template in batch files (Closes #9717) 2016-06-12 06:39:31 +07:00
98960c911c [instagram] Extract metadata from JSON 2016-06-12 06:06:04 +07:00
329ca3bef6 [utils] Add try_get
To reduce boilerplate when accessing JSON
2016-06-12 06:05:34 +07:00
2c3322e36e [youporn] Fix metadata extraction 2016-06-12 04:49:37 +07:00
80ae228b34 [matchtv] Modernize 2016-06-12 01:57:23 +07:00
6d28c408cf [viki] Do not use a fallback language for title in the first try
In test_Viki_3, 'titles' gives a Hebrew title.
2016-06-11 23:00:44 +08:00
c83b35d4aa [viki] Update _TESTS 2016-06-11 22:39:13 +08:00
94e5d6aedb [viki] Skip a geo-restricted test 2016-06-11 21:49:01 +08:00
531a74968c [vimeo] Fix extraction for VimeoReview videos 2016-06-11 21:35:08 +08:00
c5edd147d1 [generic] Remove an invalid test
Now handled by telewebion.py
2016-06-11 18:39:58 +08:00
856150d056 [telewebion] Add new extractor (closes #5135) 2016-06-11 18:39:58 +08:00
03ebea89b0 Merge pull request #9755 from vxbinaca/patch-2
[utils] Change Firefox 44 to 47
2016-06-11 17:38:45 +08:00
15d106787e [utils] Change Firefox 44 to 47
See commit title.
2016-06-11 05:36:31 -04:00
7aab3696dd [kuwo] Update _TESTS 2016-06-11 15:37:04 +08:00
47787efa2b [leeco] Recognize Le Sports URLs (fixes #9750) 2016-06-11 13:14:41 +08:00
4a420119a6 release 2016.06.11.3 2016-06-11 08:34:30 +07:00
33751818d3 release 2016.06.11.2 2016-06-11 08:28:51 +07:00
698f127c1a [setup.py] Add python 3.5 classifier 2016-06-11 06:14:22 +07:00
fe458b6596 [limelight] Extract ttml subtitles (Closes #9739) 2016-06-11 05:57:27 +07:00
21ac1a8ac3 [limelight] Fix typo 2016-06-11 05:52:50 +07:00
79027c0ea0 [limelight] Improve _VALID_URLs 2016-06-11 05:40:02 +07:00
4cad2929cd [limelight] Fix _VALID_URLs 2016-06-11 05:30:44 +07:00
62666af99f [indavideo] Fix formats' height (Closes #9744) 2016-06-11 05:13:05 +07:00
9ddc289f88 [README.md] Document missing playlist fields in output template 2016-06-11 04:59:47 +07:00
20 changed files with 352 additions and 155 deletions

View File

@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.11.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.11.1**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.12*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.12**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.06.11.1
[debug] youtube-dl version 2016.06.12
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -511,6 +511,9 @@ The basic usage is not to set any template arguments when downloading a single f
- `autonumber`: Five-digit number that will be increased with each download, starting at zero
- `playlist`: Name or id of the playlist that contains the video
- `playlist_index`: Index of the video in the playlist padded with leading zeros according to the total length of the playlist
- `playlist_id`: Playlist identifier
- `playlist_title`: Playlist title
Available for the video that belongs to some logical chapter or section:
- `chapter`: Name or title of the chapter the video belongs to
@ -550,6 +553,10 @@ The current default template is `%(title)s-%(id)s.%(ext)s`.
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
#### Output template and Windows batch files
If you are using output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`.
#### Output template examples
Note on Windows you may need to use double quotes instead of single.

View File

@ -44,8 +44,8 @@
- **appletrailers:section**
- **archive.org**: archive.org videos
- **ARD**
- **ARD:mediathek**
- **ARD:mediathek**: Saarländischer Rundfunk
- **ARD:mediathek**
- **arte.tv**
- **arte.tv:+7**
- **arte.tv:cinema**
@ -647,6 +647,7 @@
- **Telegraaf**
- **TeleMB**
- **TeleTask**
- **Telewebion**
- **TF1**
- **TheIntercept**
- **ThePlatform**

View File

@ -122,6 +122,7 @@ setup(
"Programming Language :: Python :: 3.2",
"Programming Language :: Python :: 3.3",
"Programming Language :: Python :: 3.4",
"Programming Language :: Python :: 3.5",
],
cmdclass={'build_lazy_extractors': build_lazy_extractors},

View File

@ -777,6 +777,7 @@ from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .teletask import TeleTaskIE
from .telewebion import TelewebionIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .theintercept import TheInterceptIE

View File

@ -1073,20 +1073,6 @@ class GenericIE(InfoExtractor):
'skip_download': True,
}
},
# Contains a SMIL manifest
{
'url': 'http://www.telewebion.com/fa/1263668/%D9%82%D8%B1%D8%B9%D9%87%E2%80%8C%DA%A9%D8%B4%DB%8C-%D9%84%DB%8C%DA%AF-%D9%82%D9%87%D8%B1%D9%85%D8%A7%D9%86%D8%A7%D9%86-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7/%2B-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84.html',
'info_dict': {
'id': 'file',
'ext': 'flv',
'title': '+ Football: Lottery Champions League Europe',
'uploader': 'www.telewebion.com',
},
'params': {
# rtmpe downloads
'skip_download': True,
}
},
# Brightcove URL in single quotes
{
'url': 'http://www.sportsnet.ca/baseball/mlb/sn-presents-russell-martin-world-citizen/',

View File

@ -60,7 +60,8 @@ class IndavideoEmbedIE(InfoExtractor):
formats = [{
'url': video_url,
'height': self._search_regex(r'\.(\d{3,4})\.mp4$', video_url, 'height', default=None),
'height': int_or_none(self._search_regex(
r'\.(\d{3,4})\.mp4(?:\?|$)', video_url, 'height', default=None)),
} for video_url in video_urls]
self._sort_formats(formats)

View File

@ -8,6 +8,7 @@ from ..utils import (
int_or_none,
limit_length,
lowercase_escape,
try_get,
)
@ -19,10 +20,16 @@ class InstagramIE(InfoExtractor):
'info_dict': {
'id': 'aye83DjauH',
'ext': 'mp4',
'uploader_id': 'naomipq',
'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
}
'thumbnail': 're:^https?://.*\.jpg',
'timestamp': 1371748545,
'upload_date': '20130620',
'uploader_id': 'naomipq',
'uploader': 'Naomi Leonor Phan-Quang',
'like_count': int,
'comment_count': int,
},
}, {
# missing description
'url': 'https://www.instagram.com/p/BA-pQFBG8HZ/?taken-by=britneyspears',
@ -31,6 +38,13 @@ class InstagramIE(InfoExtractor):
'ext': 'mp4',
'uploader_id': 'britneyspears',
'title': 'Video by britneyspears',
'thumbnail': 're:^https?://.*\.jpg',
'timestamp': 1453760977,
'upload_date': '20160125',
'uploader_id': 'britneyspears',
'uploader': 'Britney Spears',
'like_count': int,
'comment_count': int,
},
'params': {
'skip_download': True,
@ -67,21 +81,57 @@ class InstagramIE(InfoExtractor):
url = mobj.group('url')
webpage = self._download_webpage(url, video_id)
uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
webpage, 'uploader id', fatal=False)
desc = self._search_regex(
r'"caption":"(.+?)"', webpage, 'description', default=None)
if desc is not None:
desc = lowercase_escape(desc)
(video_url, description, thumbnail, timestamp, uploader,
uploader_id, like_count, comment_count) = [None] * 8
shared_data = self._parse_json(
self._search_regex(
r'window\._sharedData\s*=\s*({.+?});',
webpage, 'shared data', default='{}'),
video_id, fatal=False)
if shared_data:
media = try_get(
shared_data, lambda x: x['entry_data']['PostPage'][0]['media'], dict)
if media:
video_url = media.get('video_url')
description = media.get('caption')
thumbnail = media.get('display_src')
timestamp = int_or_none(media.get('date'))
uploader = media.get('owner', {}).get('full_name')
uploader_id = media.get('owner', {}).get('username')
like_count = int_or_none(media.get('likes', {}).get('count'))
comment_count = int_or_none(media.get('comments', {}).get('count'))
if not video_url:
video_url = self._og_search_video_url(webpage, secure=False)
if not uploader_id:
uploader_id = self._search_regex(
r'"owner"\s*:\s*{\s*"username"\s*:\s*"(.+?)"',
webpage, 'uploader id', fatal=False)
if not description:
description = self._search_regex(
r'"caption"\s*:\s*"(.+?)"', webpage, 'description', default=None)
if description is not None:
description = lowercase_escape(description)
if not thumbnail:
thumbnail = self._og_search_thumbnail(webpage)
return {
'id': video_id,
'url': self._og_search_video_url(webpage, secure=False),
'url': video_url,
'ext': 'mp4',
'title': 'Video by %s' % uploader_id,
'thumbnail': self._og_search_thumbnail(webpage),
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'uploader_id': uploader_id,
'description': desc,
'uploader': uploader,
'like_count': like_count,
'comment_count': comment_count,
}

View File

@ -148,8 +148,8 @@ class KuwoAlbumIE(InfoExtractor):
'url': 'http://www.kuwo.cn/album/502294/',
'info_dict': {
'id': '502294',
'title': 'M',
'description': 'md5:6a7235a84cc6400ec3b38a7bdaf1d60c',
'title': 'Made\xa0Series\xa0《M》',
'description': 'md5:d463f0d8a0ff3c3ea3d6ed7452a9483f',
},
'playlist_count': 2,
}
@ -209,7 +209,7 @@ class KuwoSingerIE(InfoExtractor):
'url': 'http://www.kuwo.cn/mingxing/bruno+mars/',
'info_dict': {
'id': 'bruno+mars',
'title': 'Bruno Mars',
'title': 'Bruno\xa0Mars',
},
'playlist_mincount': 329,
}, {
@ -306,7 +306,7 @@ class KuwoMvIE(KuwoBaseIE):
'id': '6480076',
'ext': 'mp4',
'title': 'My HouseMV',
'creator': '2PM',
'creator': 'PM02:00',
},
# In this video, music URLs (anti.s) are blocked outside China and
# USA, while the MV URL (mvurl) is available globally, so force the MV

View File

@ -28,7 +28,7 @@ from ..utils import (
class LeIE(InfoExtractor):
IE_DESC = '乐视网'
_VALID_URL = r'https?://www\.le\.com/ptv/vplay/(?P<id>\d+)\.html'
_VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|sports\.le\.com/video)/(?P<id>\d+)\.html'
_URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
@ -69,6 +69,9 @@ class LeIE(InfoExtractor):
'hls_prefer_native': True,
},
'skip': 'Only available in China',
}, {
'url': 'http://sports.le.com/video/25737697.html',
'only_matching': True,
}]
@staticmethod
@ -196,7 +199,7 @@ class LeIE(InfoExtractor):
class LePlaylistIE(InfoExtractor):
_VALID_URL = r'https?://[a-z]+\.le\.com/[a-z]+/(?P<id>[a-z0-9_]+)'
_VALID_URL = r'https?://[a-z]+\.le\.com/(?!video)[a-z]+/(?P<id>[a-z0-9_]+)'
_TESTS = [{
'url': 'http://www.le.com/tv/46177.html',

View File

@ -98,13 +98,19 @@ class LimelightBaseIE(InfoExtractor):
} for thumbnail in properties.get('thumbnails', []) if thumbnail.get('url')]
subtitles = {}
for caption in properties.get('captions', {}):
for caption in properties.get('captions', []):
lang = caption.get('language_code')
subtitles_url = caption.get('url')
if lang and subtitles_url:
subtitles[lang] = [{
subtitles.setdefault(lang, []).append({
'url': subtitles_url,
}]
})
closed_captions_url = properties.get('closed_captions_url')
if closed_captions_url:
subtitles.setdefault('en', []).append({
'url': closed_captions_url,
'ext': 'ttml',
})
return {
'id': video_id,
@ -123,7 +129,18 @@ class LimelightBaseIE(InfoExtractor):
class LimelightMediaIE(LimelightBaseIE):
IE_NAME = 'limelight'
_VALID_URL = r'(?:limelight:media:|https?://link\.videoplatform\.limelight\.com/media/\??\bmediaId=)(?P<id>[a-z0-9]{32})'
_VALID_URL = r'''(?x)
(?:
limelight:media:|
https?://
(?:
link\.videoplatform\.limelight\.com/media/|
assets\.delvenetworks\.com/player/loader\.swf
)
\?.*?\bmediaId=
)
(?P<id>[a-z0-9]{32})
'''
_TESTS = [{
'url': 'http://link.videoplatform.limelight.com/media/?mediaId=3ffd040b522b4485b6d84effc750cd86',
'info_dict': {
@ -158,6 +175,9 @@ class LimelightMediaIE(LimelightBaseIE):
# rtmp download
'skip_download': True,
},
}, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'media'
_API_PATH = 'media'
@ -176,15 +196,29 @@ class LimelightMediaIE(LimelightBaseIE):
class LimelightChannelIE(LimelightBaseIE):
IE_NAME = 'limelight:channel'
_VALID_URL = r'(?:limelight:channel:|https?://link\.videoplatform\.limelight\.com/media/\??\bchannelId=)(?P<id>[a-z0-9]{32})'
_TEST = {
_VALID_URL = r'''(?x)
(?:
limelight:channel:|
https?://
(?:
link\.videoplatform\.limelight\.com/media/|
assets\.delvenetworks\.com/player/loader\.swf
)
\?.*?\bchannelId=
)
(?P<id>[a-z0-9]{32})
'''
_TESTS = [{
'url': 'http://link.videoplatform.limelight.com/media/?channelId=ab6a524c379342f9b23642917020c082',
'info_dict': {
'id': 'ab6a524c379342f9b23642917020c082',
'title': 'Javascript Sample Code',
},
'playlist_mincount': 3,
}
}, {
'url': 'http://assets.delvenetworks.com/player/loader.swf?channelId=ab6a524c379342f9b23642917020c082',
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'channel'
_API_PATH = 'channels'
@ -207,15 +241,29 @@ class LimelightChannelIE(LimelightBaseIE):
class LimelightChannelListIE(LimelightBaseIE):
IE_NAME = 'limelight:channel_list'
_VALID_URL = r'(?:limelight:channel_list:|https?://link\.videoplatform\.limelight\.com/media/\?.*?\bchannelListId=)(?P<id>[a-z0-9]{32})'
_TEST = {
_VALID_URL = r'''(?x)
(?:
limelight:channel_list:|
https?://
(?:
link\.videoplatform\.limelight\.com/media/|
assets\.delvenetworks\.com/player/loader\.swf
)
\?.*?\bchannelListId=
)
(?P<id>[a-z0-9]{32})
'''
_TESTS = [{
'url': 'http://link.videoplatform.limelight.com/media/?channelListId=301b117890c4465c8179ede21fd92e2b',
'info_dict': {
'id': '301b117890c4465c8179ede21fd92e2b',
'title': 'Website - Hero Player',
},
'playlist_mincount': 2,
}
}, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?channelListId=301b117890c4465c8179ede21fd92e2b',
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'channel_list'
def _real_extract(self, url):

View File

@ -4,16 +4,12 @@ from __future__ import unicode_literals
import random
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlencode
from ..utils import (
sanitized_Request,
xpath_text,
)
from ..utils import xpath_text
class MatchTVIE(InfoExtractor):
_VALID_URL = r'https?://matchtv\.ru/?#live-player'
_TEST = {
_VALID_URL = r'https?://matchtv\.ru(?:/on-air|/?#live-player)'
_TESTS = [{
'url': 'http://matchtv.ru/#live-player',
'info_dict': {
'id': 'matchtv-live',
@ -24,12 +20,16 @@ class MatchTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
}
}, {
'url': 'http://matchtv.ru/on-air/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = 'matchtv-live'
request = sanitized_Request(
'http://player.matchtv.ntvplus.tv/player/smil?%s' % compat_urllib_parse_urlencode({
video_url = self._download_json(
'http://player.matchtv.ntvplus.tv/player/smil', video_id,
query={
'ts': '',
'quality': 'SD',
'contentId': '561d2c0df7159b37178b4567',
@ -40,11 +40,10 @@ class MatchTVIE(InfoExtractor):
'contentType': 'channel',
'timeShift': '0',
'platform': 'portal',
}),
},
headers={
'Referer': 'http://player.matchtv.ntvplus.tv/embed-player/NTVEmbedPlayer.swf',
})
video_url = self._download_json(request, video_id)['data']['videoUrl']
})['data']['videoUrl']
f4m_url = xpath_text(self._download_xml(video_url, video_id), './to')
formats = self._extract_f4m_formats(f4m_url, video_id)
self._sort_formats(formats)

View File

@ -163,7 +163,7 @@ class NRKTVIE(NRKBaseIE):
'ext': 'mp4',
'title': '20 spørsmål 23.05.2014',
'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
'duration': 1741.52,
'duration': 1741,
},
}, {
'url': 'https://tv.nrk.no/program/mdfp15000514',
@ -173,7 +173,7 @@ class NRKTVIE(NRKBaseIE):
'ext': 'mp4',
'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting 24.05.2014',
'description': 'md5:89290c5ccde1b3a24bb8050ab67fe1db',
'duration': 4605.08,
'duration': 4605,
},
}, {
# single playlist video
@ -260,30 +260,34 @@ class NRKPlaylistIE(InfoExtractor):
class NRKSkoleIE(InfoExtractor):
IE_DESC = 'NRK Skole'
_VALID_URL = r'https?://(?:www\.)?nrk\.no/skole/klippdetalj?.*\btopic=(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?nrk\.no/skole/?\?.*\bmediaId=(?P<id>\d+)'
_TESTS = [{
'url': 'http://nrk.no/skole/klippdetalj?topic=nrk:klipp/616532',
'md5': '04cd85877cc1913bce73c5d28a47e00f',
'url': 'https://www.nrk.no/skole/?page=search&q=&mediaId=14099',
'md5': '6bc936b01f9dd8ed45bc58b252b2d9b6',
'info_dict': {
'id': '6021',
'ext': 'flv',
'ext': 'mp4',
'title': 'Genetikk og eneggede tvillinger',
'description': 'md5:3aca25dcf38ec30f0363428d2b265f8d',
'duration': 399,
},
}, {
'url': 'http://www.nrk.no/skole/klippdetalj?topic=nrk%3Aklipp%2F616532#embed',
'only_matching': True,
}, {
'url': 'http://www.nrk.no/skole/klippdetalj?topic=urn:x-mediadb:21379',
'url': 'https://www.nrk.no/skole/?page=objectives&subject=naturfag&objective=K15114&mediaId=19355',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(self._match_id(url))
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(
'https://mimir.nrk.no/plugin/1.0/static?mediaId=%s' % video_id,
video_id)
nrk_id = self._parse_json(
self._search_regex(
r'<script[^>]+type=["\']application/json["\'][^>]*>({.+?})</script>',
webpage, 'application json'),
video_id)['activeMedia']['psId']
nrk_id = self._search_regex(r'data-nrk-id=["\'](\d+)', webpage, 'nrk id')
return self.url_result('nrk:%s' % nrk_id)

View File

@ -5,6 +5,7 @@ import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
sanitized_Request,
urlencode_postdata,
)
@ -14,7 +15,7 @@ class StreamcloudIE(InfoExtractor):
IE_NAME = 'streamcloud.eu'
_VALID_URL = r'https?://streamcloud\.eu/(?P<id>[a-zA-Z0-9_-]+)(?:/(?P<fname>[^#?]*)\.html)?'
_TEST = {
_TESTS = [{
'url': 'http://streamcloud.eu/skp9j99s4bpz/youtube-dl_test_video_____________-BaW_jenozKc.mp4.html',
'md5': '6bea4c7fa5daaacc2a946b7146286686',
'info_dict': {
@ -23,7 +24,10 @@ class StreamcloudIE(InfoExtractor):
'title': 'youtube-dl test video \'/\\ ä ↭',
},
'skip': 'Only available from the EU'
}
}, {
'url': 'http://streamcloud.eu/ua8cmfh1nbe6/NSHIP-148--KUC-NG--H264-.mp4.html',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -31,6 +35,10 @@ class StreamcloudIE(InfoExtractor):
orig_webpage = self._download_webpage(url, video_id)
if '>File Not Found<' in orig_webpage:
raise ExtractorError(
'Video %s does not exist' % video_id, expected=True)
fields = re.findall(r'''(?x)<input\s+
type="(?:hidden|submit)"\s+
name="([^"]+)"\s+

View File

@ -0,0 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class TelewebionIE(InfoExtractor):
_VALID_URL = r'https?://www\.telewebion\.com/#!/episode/(?P<id>\d+)'
_TEST = {
'url': 'http://www.telewebion.com/#!/episode/1263668/',
'info_dict': {
'id': '1263668',
'ext': 'mp4',
'title': 'قرعه\u200cکشی لیگ قهرمانان اروپا',
'thumbnail': 're:^https?://.*\.jpg',
'view_count': int,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
secure_token = self._download_webpage(
'http://m.s2.telewebion.com/op/op?action=getSecurityToken', video_id)
episode_details = self._download_json(
'http://m.s2.telewebion.com/op/op', video_id,
query={'action': 'getEpisodeDetails', 'episode_id': video_id})
m3u8_url = 'http://m.s1.telewebion.com/smil/%s.m3u8?filepath=%s&m3u8=1&secure_token=%s' % (
video_id, episode_details['file_path'], secure_token)
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', m3u8_id='hls')
picture_paths = [
episode_details.get('picture_path'),
episode_details.get('large_picture_path'),
]
thumbnails = [{
'url': picture_path,
'preference': idx,
} for idx, picture_path in enumerate(picture_paths) if picture_path is not None]
return {
'id': video_id,
'title': episode_details['title'],
'formats': formats,
'thumbnails': thumbnails,
'view_count': episode_details.get('view_count'),
}

View File

@ -101,10 +101,13 @@ class VikiBaseIE(InfoExtractor):
self.report_warning('Unable to get session token, login has probably failed')
@staticmethod
def dict_selection(dict_obj, preferred_key):
def dict_selection(dict_obj, preferred_key, allow_fallback=True):
if preferred_key in dict_obj:
return dict_obj.get(preferred_key)
if not allow_fallback:
return
filtered_dict = list(filter(None, [dict_obj.get(k) for k in dict_obj.keys()]))
return filtered_dict[0] if filtered_dict else None
@ -127,7 +130,7 @@ class VikiIE(VikiBaseIE):
}, {
# clip
'url': 'http://www.viki.com/videos/1067139v-the-avengers-age-of-ultron-press-conference',
'md5': '86c0b5dbd4d83a6611a79987cc7a1989',
'md5': 'feea2b1d7b3957f70886e6dfd8b8be84',
'info_dict': {
'id': '1067139v',
'ext': 'mp4',
@ -156,17 +159,18 @@ class VikiIE(VikiBaseIE):
'params': {
# m3u8 download
'skip_download': True,
}
},
'skip': 'Blocked in the US',
}, {
# episode
'url': 'http://www.viki.com/videos/44699v-boys-over-flowers-episode-1',
'md5': '190f3ef426005ba3a080a63325955bc3',
'md5': '1f54697dabc8f13f31bf06bb2e4de6db',
'info_dict': {
'id': '44699v',
'ext': 'mp4',
'title': 'Boys Over Flowers - Episode 1',
'description': 'md5:52617e4f729c7d03bfd4bcbbb6e946f2',
'duration': 4155,
'description': 'md5:b89cf50038b480b88b5b3c93589a9076',
'duration': 4204,
'timestamp': 1270496524,
'upload_date': '20100405',
'uploader': 'group8',
@ -196,7 +200,7 @@ class VikiIE(VikiBaseIE):
}, {
# non-English description
'url': 'http://www.viki.com/videos/158036v-love-in-magic',
'md5': '1713ae35df5a521b31f6dc40730e7c9c',
'md5': '013dc282714e22acf9447cad14ff1208',
'info_dict': {
'id': '158036v',
'ext': 'mp4',
@ -217,7 +221,7 @@ class VikiIE(VikiBaseIE):
self._check_errors(video)
title = self.dict_selection(video.get('titles', {}), 'en')
title = self.dict_selection(video.get('titles', {}), 'en', allow_fallback=False)
if not title:
title = 'Episode %d' % video.get('number') if video.get('type') == 'episode' else video.get('id') or video_id
container_titles = video.get('container', {}).get('titles', {})
@ -302,7 +306,7 @@ class VikiChannelIE(VikiBaseIE):
'title': 'Boys Over Flowers',
'description': 'md5:ecd3cff47967fe193cff37c0bec52790',
},
'playlist_count': 70,
'playlist_mincount': 71,
}, {
'url': 'http://www.viki.com/tv/1354c-poor-nastya-complete',
'info_dict': {

View File

@ -66,6 +66,69 @@ class VimeoBaseInfoExtractor(InfoExtractor):
def _set_vimeo_cookie(self, name, value):
self._set_cookie('vimeo.com', name, value)
def _vimeo_sort_formats(self, formats):
# Bitrates are completely broken. Single m3u8 may contain entries in kbps and bps
# at the same time without actual units specified. This lead to wrong sorting.
self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'format_id'))
def _parse_config(self, config, video_id):
# Extract title
video_title = config['video']['title']
# Extract uploader, uploader_url and uploader_id
video_uploader = config['video'].get('owner', {}).get('name')
video_uploader_url = config['video'].get('owner', {}).get('url')
video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
# Extract video thumbnail
video_thumbnail = config['video'].get('thumbnail')
if video_thumbnail is None:
video_thumbs = config['video'].get('thumbs')
if video_thumbs and isinstance(video_thumbs, dict):
_, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]
# Extract video duration
video_duration = int_or_none(config['video'].get('duration'))
formats = []
config_files = config['video'].get('files') or config['request'].get('files', {})
for f in config_files.get('progressive', []):
video_url = f.get('url')
if not video_url:
continue
formats.append({
'url': video_url,
'format_id': 'http-%s' % f.get('quality'),
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'fps': int_or_none(f.get('fps')),
'tbr': int_or_none(f.get('bitrate')),
})
m3u8_url = config_files.get('hls', {}).get('url')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
subtitles = {}
text_tracks = config['request'].get('text_tracks')
if text_tracks:
for tt in text_tracks:
subtitles[tt['lang']] = [{
'ext': 'vtt',
'url': 'https://vimeo.com' + tt['url'],
}]
return {
'title': video_title,
'uploader': video_uploader,
'uploader_id': video_uploader_id,
'uploader_url': video_uploader_url,
'thumbnail': video_thumbnail,
'duration': video_duration,
'formats': formats,
'subtitles': subtitles,
}
class VimeoIE(VimeoBaseInfoExtractor):
"""Information extractor for vimeo.com."""
@ -153,7 +216,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_id': 'user18948128',
'uploader': 'Jaime Marquínez Ferrándiz',
'duration': 10,
'description': 'This is "youtube-dl password protected test video" by Jaime Marquínez Ferrándiz on Vimeo, the home for high quality videos and the people\u2026',
'description': 'This is "youtube-dl password protected test video" by on Vimeo, the home for high quality videos and the people who love them.',
},
'params': {
'videopassword': 'youtube-dl',
@ -389,21 +452,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
'https://player.vimeo.com/player/%s' % feature_id,
{'force_feature_id': True}), 'Vimeo')
# Extract title
video_title = config['video']['title']
# Extract uploader, uploader_url and uploader_id
video_uploader = config['video'].get('owner', {}).get('name')
video_uploader_url = config['video'].get('owner', {}).get('url')
video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
# Extract video thumbnail
video_thumbnail = config['video'].get('thumbnail')
if video_thumbnail is None:
video_thumbs = config['video'].get('thumbs')
if video_thumbs and isinstance(video_thumbs, dict):
_, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]
# Extract video description
video_description = self._html_search_regex(
@ -423,9 +471,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
if not video_description and not mobj.group('player'):
self._downloader.report_warning('Cannot find video description')
# Extract video duration
video_duration = int_or_none(config['video'].get('duration'))
# Extract upload date
video_upload_date = None
mobj = re.search(r'<time[^>]+datetime="([^"]+)"', webpage)
@ -463,53 +508,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
'format_id': source_name,
'preference': 1,
})
config_files = config['video'].get('files') or config['request'].get('files', {})
for f in config_files.get('progressive', []):
video_url = f.get('url')
if not video_url:
continue
formats.append({
'url': video_url,
'format_id': 'http-%s' % f.get('quality'),
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'fps': int_or_none(f.get('fps')),
'tbr': int_or_none(f.get('bitrate')),
})
m3u8_url = config_files.get('hls', {}).get('url')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
# Bitrates are completely broken. Single m3u8 may contain entries in kbps and bps
# at the same time without actual units specified. This lead to wrong sorting.
self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'format_id'))
subtitles = {}
text_tracks = config['request'].get('text_tracks')
if text_tracks:
for tt in text_tracks:
subtitles[tt['lang']] = [{
'ext': 'vtt',
'url': 'https://vimeo.com' + tt['url'],
}]
return {
info_dict = self._parse_config(config, video_id)
formats.extend(info_dict['formats'])
self._vimeo_sort_formats(formats)
info_dict.update({
'id': video_id,
'uploader': video_uploader,
'uploader_url': video_uploader_url,
'uploader_id': video_uploader_id,
'upload_date': video_upload_date,
'title': video_title,
'thumbnail': video_thumbnail,
'description': video_description,
'duration': video_duration,
'formats': formats,
'upload_date': video_upload_date,
'description': video_description,
'webpage_url': url,
'view_count': view_count,
'like_count': like_count,
'comment_count': comment_count,
'subtitles': subtitles,
}
})
return info_dict
class VimeoOndemandIE(VimeoBaseInfoExtractor):
@ -692,7 +706,7 @@ class VimeoGroupsIE(VimeoAlbumIE):
return self._extract_videos(name, 'https://vimeo.com/groups/%s' % name)
class VimeoReviewIE(InfoExtractor):
class VimeoReviewIE(VimeoBaseInfoExtractor):
IE_NAME = 'vimeo:review'
IE_DESC = 'Review pages on vimeo'
_VALID_URL = r'https://vimeo\.com/[^/]+/review/(?P<id>[^/]+)'
@ -704,6 +718,7 @@ class VimeoReviewIE(InfoExtractor):
'ext': 'mp4',
'title': "DICK HARDWICK 'Comedian'",
'uploader': 'Richard Hardwick',
'uploader_id': 'user21297594',
}
}, {
'note': 'video player needs Referer',
@ -716,14 +731,18 @@ class VimeoReviewIE(InfoExtractor):
'uploader': 'DevWeek Events',
'duration': 2773,
'thumbnail': 're:^https?://.*\.jpg$',
'uploader_id': 'user22258446',
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
player_url = 'https://player.vimeo.com/player/' + video_id
return self.url_result(player_url, 'Vimeo', video_id)
video_id = self._match_id(url)
config = self._download_json(
'https://player.vimeo.com/video/%s/config' % video_id, video_id)
info_dict = self._parse_config(config, video_id)
self._vimeo_sort_formats(info_dict['formats'])
info_dict['id'] = video_id
return info_dict
class VimeoWatchLaterIE(VimeoChannelIE):

View File

@ -17,7 +17,7 @@ class YouPornIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?youporn\.com/watch/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'md5': '71ec5fcfddacf80f495efa8b6a8d9a89',
'md5': '3744d24c50438cf5b6f6d59feb5055c2',
'info_dict': {
'id': '505835',
'display_id': 'sex-ed-is-it-safe-to-masturbate-daily',
@ -121,21 +121,21 @@ class YouPornIE(InfoExtractor):
webpage, 'thumbnail', fatal=False, group='thumbnail')
uploader = self._html_search_regex(
r'(?s)<div[^>]+class=["\']videoInfoBy(?:\s+[^"\']+)?["\'][^>]*>\s*By:\s*</div>(.+?)</(?:a|div)>',
r'(?s)<div[^>]+class=["\']submitByLink["\'][^>]*>(.+?)</div>',
webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r'(?s)<div[^>]+class=["\']videoInfoTime["\'][^>]*>(.+?)</div>',
r'(?s)<div[^>]+class=["\']videoInfo(?:Date|Time)["\'][^>]*>(.+?)</div>',
webpage, 'upload date', fatal=False))
age_limit = self._rta_search(webpage)
average_rating = int_or_none(self._search_regex(
r'<div[^>]+class=["\']videoInfoRating["\'][^>]*>\s*<div[^>]+class=["\']videoRatingPercentage["\'][^>]*>(\d+)%</div>',
r'<div[^>]+class=["\']videoRatingPercentage["\'][^>]*>(\d+)%</div>',
webpage, 'average rating', fatal=False))
view_count = str_to_int(self._search_regex(
r'(?s)<div[^>]+class=["\']videoInfoViews["\'][^>]*>.*?([\d,.]+)\s*</div>',
webpage, 'view count', fatal=False))
r'(?s)<div[^>]+class=(["\']).*?\bvideoInfoViews\b.*?\1[^>]*>.*?(?P<count>[\d,.]+)<',
webpage, 'view count', fatal=False, group='count'))
comment_count = str_to_int(self._search_regex(
r'>All [Cc]omments? \(([\d,.]+)\)',
webpage, 'comment count', fatal=False))

View File

@ -76,7 +76,7 @@ def register_socks_protocols():
compiled_regex_type = type(re.compile(''))
std_headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/44.0 (Chrome)',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate',
@ -1901,6 +1901,16 @@ def dict_get(d, key_or_keys, default=None, skip_false_values=True):
return d.get(key_or_keys, default)
def try_get(src, getter, expected_type=None):
try:
v = getter(src)
except (AttributeError, KeyError, TypeError, IndexError):
pass
else:
if expected_type is None or isinstance(v, expected_type):
return v
def encode_compat_str(string, encoding=preferredencoding(), errors='strict'):
return string if isinstance(string, compat_str) else compat_str(string, encoding, errors)

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.06.11.1'
__version__ = '2016.06.12'