Compare commits

...

127 Commits

Author SHA1 Message Date
92da3cd848 release 2016.02.22 2016-02-22 11:57:31 +01:00
6212bcb191 [tf1] fix info extraction(fixes #8599) 2016-02-22 09:57:40 +01:00
d69abbd3f0 [googledrive] Make thumbnail optional (Closes #8629) 2016-02-22 03:13:18 +06:00
1d00a8823e [arte] PEP 8 2016-02-22 01:32:23 +06:00
5d6e1011df [pbs] Extract all formats (Closes #8538) 2016-02-22 01:23:27 +06:00
f5bdb44443 [extractor/common] Add _remove_duplicate_formats 2016-02-22 01:19:39 +06:00
7efc1c2b49 [twitter] Fix metadata extraction and test_Twitter_1 2016-02-21 17:29:28 +08:00
132e3b74bd [twitter] Fix a typo 2016-02-21 17:21:37 +08:00
bdbf4ba40e [twitter:amplify] Extract more metadata 2016-02-21 17:16:35 +08:00
acb6e97e6a [twitter] Fix several failed tests 2016-02-21 16:57:56 +08:00
445d72b8b5 [twitter:amplify] Add TwitterAmplifyIE for handling Twitter smart URLs
Closes #8075
2016-02-21 16:41:24 +08:00
92c5e11b40 [arte:future] Fix test 2016-02-21 14:23:58 +06:00
0dd046c16c [arte:magazine] Fix test 2016-02-21 13:57:30 +06:00
305168ca3e [arte:+7] Detect more embeds (Closes #8613) 2016-02-21 13:55:25 +06:00
b72f6163dc [arte:+7] Improve _VALID_URL 2016-02-21 13:37:31 +06:00
33d4fdabfa [extractor/generic] Add support for ok embeds (#8619) 2016-02-21 09:51:54 +06:00
cafcf657a4 add more subtitles mime types to mimetype2ext and fix the platform subtitle extraction 2016-02-20 22:02:03 +01:00
7360db05b4 [postprocessor/embedthumbnail] Allow mkv to embed thumbnails
Fixes #6046
2016-02-21 03:32:03 +08:00
765ac263db [utils] mimetype2ext: return 'm4a' for 'audio/mp4' (fixes #8620)
The youtube extractor was using 'mp4' for them, therefore filters like 'bestaudio[ext=m4a]' stopped working (94278f7202 broke it).
2016-02-20 19:55:10 +01:00
a4e4d7dfcd [test_iqiyi_sdk_interpreter] Add test for iQiyi login 2016-02-20 23:10:39 +08:00
73f9c2867d [iqiyi] Support playlists (closes #8019) 2016-02-20 22:44:04 +08:00
9c86d50916 [faz] Future-proof XML element check 2016-02-20 14:11:44 +01:00
1d14c75f55 [Makefile] iQiyi login test requires network 2016-02-20 20:49:30 +08:00
99709cc3f1 [iqiyi] Implement _login()
Currently only email login supported
2016-02-20 19:54:58 +08:00
5bc880b988 [utils] Add OHDave's RSA encryption function 2016-02-20 19:54:58 +08:00
958759f44b [appletrailers] Extend _VALID_URL (#8524) 2016-02-20 15:54:00 +08:00
86bf29050e [test_YoutubeDL] Make test pass until more intelligent sort formats (Closes #8462) 2016-02-20 03:36:03 +06:00
04cbc4980d [mtv] imporove duration extraction 2016-02-19 20:56:45 +01:00
8765151c8a [mtv] Extract duration from each playlist item
RSS used instead of manifest files because it's exact to the millisecond
with the video I tested while in manifest it's only exact to the second.
2016-02-19 19:38:28 +00:00
8ec64ac683 [README.md] Clarify verbose log 2016-02-19 22:18:21 +06:00
ed8648a322 [pornhub] Fix thumbnail and duration extraction (Closes #8604) 2016-02-19 21:42:46 +06:00
88641243ab [pornhub:playlistbase] Improve extract entries 2016-02-18 22:30:19 +06:00
40e146aa1e [pornhub:user:videos] Add extractor (Closes #8548) 2016-02-18 22:29:17 +06:00
f3f9cd9234 [francetv] Improve video id regex (Closes #8563) 2016-02-18 22:09:21 +06:00
ebf1b291d0 [youtube:watchlater] Respect --no-playlist 2016-02-18 22:03:46 +06:00
bc7a9cd8fb [youtube:watchlater] Improve _VALID_URL (Closes #8594) 2016-02-18 21:50:21 +06:00
d48502b82a [arte] Improve _VALID_URLs 2016-02-18 21:29:52 +06:00
479ec54a8d [arte:magazine] Improve (Closes #8473) 2016-02-18 21:29:07 +06:00
49625662a9 [arte:magazine] Add extractor 2016-02-18 21:28:18 +06:00
8b809a079a [cbsnews] use find_xpath_attr 2016-02-18 16:10:09 +01:00
778433cb90 [cbsnews] extract subtitle url from theplatform SMIL manifest(fixes #8568) 2016-02-18 15:43:28 +01:00
411cb8f476 [dailymotion] Fix view count extraction
Fix view count parsing when the decimal marker is a whitespace, e.g. '101 101'
2016-02-18 20:31:43 +06:00
63bf4f0dc0 [vrt] Detect geo restriction 2016-02-17 23:28:41 +06:00
80e59a0d5d [vrt] Make formats extraction non fatal (Closes #8587) 2016-02-17 23:18:23 +06:00
8bbd3d1476 [arte] Fix upload date extraction (Closes #8581) 2016-02-17 22:51:08 +06:00
e725e4bced [arte] PEP 8 2016-02-17 22:37:55 +06:00
08d65046f0 [arte] Make sorting aware of en/es formats 2016-02-17 22:37:05 +06:00
44b9745000 [arte] Extend more _VALID_URLs for en and es support 2016-02-17 21:53:53 +06:00
9654fc875b [arte:+7] Fix extraction for react-based layout 2016-02-17 21:49:15 +06:00
0f425e65ec [arte:+7] Add support for en and es URLs 2016-02-17 21:47:18 +06:00
e277f2a63b [orf:tvthek] Check formats (Closes #8580) 2016-02-16 22:23:38 +06:00
f4db09178a [xtube:user] Remove duplicated video ids 2016-02-16 22:06:26 +06:00
86be3cdc2a [xtube] Fix extraction (Closes #8565) 2016-02-16 22:05:23 +06:00
cb64ccc715 [facebook] Improve error handling (#8572) 2016-02-16 09:07:38 +08:00
f66a3c7bc2 [screenjunkies] Fix spelling 2016-02-16 01:30:00 +06:00
fe80df3080 Credit @TingPing for screenjunkies (#8505) 2016-02-16 01:24:57 +06:00
1932476c13 [iqiyi] Omit MD5 sums for the VIP-only video 2016-02-16 02:45:21 +08:00
d2c1f79f20 [youtube:searchurl] Extend _VALID_URL 2016-02-16 00:29:51 +06:00
8eacae8cf9 Credit @RobinHoutevelts for canvas subtiltes (#8537) 2016-02-15 22:33:32 +06:00
c8a80fd818 [screenjunkies] Improve, extract more metadata and workaround subscription (Closes #8505) 2016-02-15 22:29:28 +06:00
b9e8d7140a [screenjunkies] Add new extractor
This doesn't handle the plus only videos yet

Closes #8492
2016-02-15 22:28:36 +06:00
6eff2605d6 [canvas] Add subtitles test (#8537) 2016-02-15 20:59:16 +06:00
fd7a3ea4a4 [canvas] Improve subtitles (Closes #8537) 2016-02-15 20:54:01 +06:00
8d3eeb36d7 [Canvas] Add subtitles 2016-02-15 20:50:03 +06:00
8e0548e180 [iqiyi] Partial support for VIP-only videos
See #8569 and #8019. Currently only 6-min preview are supported
2016-02-15 19:58:24 +08:00
a517bb4b1e [noz] Add new extractor 2016-02-15 00:07:16 +01:00
9dcefb23a1 [laola1tv] Improve (Closes #8478) 2016-02-14 23:40:26 +06:00
d9da74bc06 Credit @blackwinter for laola1tv (#8478) 2016-02-14 23:39:49 +06:00
5e19323ed9 [laola1tv] Fixes for changed site layout.
* Fixed valid URLs (w/ tests).
* Fixed iframe URL extraction.
* Fixed token URL extraction.
* Fixed variable extraction.
* Fixed uploader spelling.
* Added upload_date to result dictionary.
2016-02-14 23:01:49 +06:00
611c1dd96e [refactor] Single quotes consistency 2016-02-14 15:37:17 +06:00
d800609c62 [refactor] Do not specify redundant None as second argument in dict.get() 2016-02-14 14:25:04 +06:00
c78c9cd10d [downloader/dash] PEP 8 2016-02-14 14:13:09 +06:00
e76394f36c [globo] Switch to new-style classes 2016-02-14 14:02:12 +06:00
080e09557d [aes] Switch to new-style classes 2016-02-14 14:01:43 +06:00
fca2e6d5a6 [dailymotion:cloud] Use idiomatic name for classmethod's first argument 2016-02-14 13:44:23 +06:00
b45f2b1d6e [myvideo] Mark broken 2016-02-14 11:24:57 +06:00
fc2e70ee90 Merge pull request #8479 from remitamine/dash_downloader
[downloader/dash] Implement dashsegments fd in terms of fragment fd
2016-02-13 21:12:33 +01:00
b4561e857f [animeondemand] Add .netrc 2016-02-13 22:41:58 +06:00
7023251239 [comedycentral] Support /shows URLs (fixes #8405) 2016-02-13 12:26:27 +01:00
e2bd68c901 [animeondemand][wip] Add extractor (#8518) 2016-02-13 13:30:31 +06:00
35ced3985a release 2016.02.13 2016-02-13 08:25:05 +01:00
3e18700d45 [nbc] Correct test 2016-02-13 07:45:32 +06:00
f9f49d87c2 [youtube] Add test for #8536 2016-02-13 05:18:58 +06:00
6863631c26 [youtube] Improve multifeed videos extraction (Closes #8536) 2016-02-13 05:01:20 +06:00
9d939cec48 [extractor/generic] Add direct mpd url test 2016-02-13 00:36:47 +06:00
4c77d3f52a [YoutubeDL] Allow bestvideo+bestaudio for any extractor 2016-02-13 00:23:14 +06:00
7be747b921 [extractor/generic] Pass mpd base url to _parse_mpd_formats 2016-02-13 00:15:59 +06:00
bb20526b64 [extractor/common] Improve base url construction 2016-02-13 00:13:56 +06:00
bcbb1b08b2 Revert "[aenetworks] extract http formats"
This reverts commit 3d98f97c64.
2016-02-12 17:56:06 +01:00
3d98f97c64 [aenetworks] extract http formats 2016-02-12 17:39:32 +01:00
c349456ef6 [extractor/common] strip http urls in smil manifest 2016-02-12 17:38:48 +01:00
5a4905924d [extractor/generic] Improve dailymotion embed detection (Closes #8521, closes #8325) 2016-02-12 22:03:10 +06:00
b826035dd5 [vimeo] Fix authentication (Closes #8520) 2016-02-12 03:16:26 +06:00
a7cab4d039 [theplatform] remove unused import and change smil url for ThePlatformFeedIE 2016-02-11 18:50:14 +01:00
fc3810f6d1 Merge branch 'master' of github.com:rg3/youtube-dl 2016-02-11 18:13:56 +01:00
3dc71d82ce [theplatform] fix pid extraction in the platform feed 2016-02-11 18:13:03 +01:00
9c7b38981c [utils] Bump Firefox version in User-Agent
Old version number causes Youtube not to serve some formats in ytplayer.config
2016-02-11 23:12:30 +06:00
8b85ac3fd9 [cbc] Add new extractor(closes #3803)(closes #4731)(closes #5309) 2016-02-11 18:10:32 +01:00
81e1c4e2fc [extractor/common] remove duplicate rtmp formats in smil manifest 2016-02-11 17:58:48 +01:00
388ae76b52 [YoutubeDL] Fix format resolution when height is missing 2016-02-11 22:46:13 +06:00
b67d63149d [youtube] Fix typos 2016-02-11 22:33:08 +06:00
28280e8ded [plays] PEP 8 2016-02-11 22:02:57 +06:00
6b3fbd3425 [pbs] Fix multi part videos extraction 2016-02-11 22:02:37 +06:00
a7ab46375b [pbs] Update some tests 2016-02-11 21:43:01 +06:00
b14d5e26f6 [pbs] Improve description extraction 2016-02-11 21:28:09 +06:00
9a61dfba0c [pbs] Revert prefer portalplayer 2016-02-11 21:22:57 +06:00
154c209e2d [extractor/common] improve dash format ids 2016-02-11 10:33:26 +01:00
d1ea5e171f [plays] Add new extractor(#8458) 2016-02-11 10:30:31 +01:00
a1188d0ed0 [crackle] add prefix to format ids 2016-02-10 22:39:33 +01:00
47d205a646 [crackle] improve format sorting 2016-02-10 22:23:56 +01:00
80f772c28a [crackle] Add new extractor 2016-02-10 22:16:21 +01:00
f817d9bec1 release 2016.02.10 2016-02-10 16:17:38 +01:00
e2effb08a4 [YoutubeDL] Sanitize format_id (Closes #8494) 2016-02-10 21:16:58 +06:00
7fcea295c5 [pbs] Switch to portal player by default (Closes #8491) 2016-02-10 20:46:38 +06:00
cc799437ea [youku] Report private videos (Closes #8498) 2016-02-10 20:05:17 +06:00
89d23f37f2 [hotstar] Relax _VALID_URL (Closes #8487) 2016-02-10 04:43:00 +06:00
b92071ef00 release 2016.02.09.1 2016-02-09 20:12:36 +01:00
47246ae26c [viddler] Update tests 2016-02-10 01:12:47 +06:00
9c15869c28 [viddler] Add support for secret videos (Closes #8481) 2016-02-10 01:09:07 +06:00
51e9094f4a [extractor/common] extract youtube dash formats filesize(fixes #8480) 2016-02-09 20:05:39 +01:00
5e3a6fec33 [fox] update test 2016-02-09 17:30:42 +01:00
c43fe0268c [downloader/dash] Implement dashsegments fd in terms of fragment fd 2016-02-09 17:25:44 +01:00
d413095f7e [extractor/common] remove duplicated formats and subtiles in smil manifests 2016-02-09 17:15:41 +01:00
1bedf4de06 [fox] extract http formats 2016-02-09 17:12:34 +01:00
3967a761f4 [mailru] Fix tests 2016-02-09 21:31:51 +06:00
b081350bd9 [mailru] Improve and modernize 2016-02-09 21:30:48 +06:00
16f1430ba6 [mailru] Prefer metaUrl API (Closes #8474) 2016-02-09 21:14:02 +06:00
112 changed files with 2246 additions and 713 deletions

View File

@ -157,3 +157,6 @@ Founder Fang
Andrew Alexeyew
Saso Bezlaj
Erwin de Haan
Jens Wille
Robin Houtevelts
Patrick Griffis

View File

@ -1,6 +1,6 @@
**Please include the full output of youtube-dl when run with `-v`**, i.e. add `-v` flag to your command line, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
```
$ youtube-dl -v http://www.youtube.com/watch?v=BaW_jenozKcj
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']

View File

@ -44,7 +44,7 @@ test:
ot: offlinetest
offlinetest: codetest
nosetests --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py
nosetests --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
tar: youtube-dl.tar.gz

View File

@ -935,9 +935,9 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
**Please include the full output of youtube-dl when run with `-v`**, i.e. add `-v` flag to your command line, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
```
$ youtube-dl -v http://www.youtube.com/watch?v=BaW_jenozKcj
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']

View File

@ -30,6 +30,7 @@
- **AlJazeera**
- **Allocine**
- **AlphaPorno**
- **AnimeOnDemand**
- **anitube.se**
- **AnySex**
- **Aparat**
@ -49,6 +50,7 @@
- **arte.tv:ddc**
- **arte.tv:embed**
- **arte.tv:future**
- **arte.tv:magazine**
- **AtresPlayer**
- **ATTTechChannel**
- **AudiMedia**
@ -89,6 +91,8 @@
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**
- **CBC**
- **CBCPlayer**
- **CBS**
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
@ -120,6 +124,7 @@
- **ComedyCentralShows**: The Daily Show / The Colbert Report
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **Cracked**
- **Crackle**
- **Criterion**
- **CrooksAndLiars**
- **Crunchyroll**
@ -357,7 +362,7 @@
- **MySpace:album**
- **MySpass**
- **Myvi**
- **myvideo**
- **myvideo** (Currently broken)
- **MyVidster**
- **n-tv.de**
- **NationalGeographic**
@ -407,6 +412,7 @@
- **NowTV** (Currently broken)
- **NowTVList**
- **nowvideo**: NowVideo
- **Noz**
- **npo**: npo.nl and ntr.nl
- **npo.nl:live**
- **npo.nl:radio**
@ -445,6 +451,7 @@
- **PlanetaPlay**
- **play.fm**
- **played.to**
- **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid**
- **Playwire**
@ -456,6 +463,7 @@
- **PornHd**
- **PornHub**
- **PornHubPlaylist**
- **PornHubUserVideos**
- **Pornotube**
- **PornoVoisines**
- **PornoXO**
@ -518,6 +526,7 @@
- **screen.yahoo:search**: Yahoo screen search
- **Screencast**
- **ScreencastOMatic**
- **ScreenJunkies**
- **ScreenwaveMedia**
- **SenateISVP**
- **ServingSys**
@ -651,6 +660,7 @@
- **twitch:video**
- **twitch:vod**
- **twitter**
- **twitter:amplify**
- **twitter:card**
- **Ubu**
- **udemy**

View File

@ -234,7 +234,7 @@ class TestFormatSelection(unittest.TestCase):
def test_youtube_format_selection(self):
order = [
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '36', '17', '13',
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '17', '36', '13',
# Apple HTTP Live Streaming
'96', '95', '94', '93', '92', '132', '151',
# 3D

View File

@ -0,0 +1,47 @@
#!/usr/bin/env python
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from youtube_dl.extractor import IqiyiIE
class IqiyiIEWithCredentials(IqiyiIE):
def _get_login_info(self):
return 'foo', 'bar'
class WarningLogger(object):
def __init__(self):
self.messages = []
def warning(self, msg):
self.messages.append(msg)
def debug(self, msg):
pass
def error(self, msg):
pass
class TestIqiyiSDKInterpreter(unittest.TestCase):
def test_iqiyi_sdk_interpreter(self):
'''
Test the functionality of IqiyiSDKInterpreter by trying to log in
If `sign` is incorrect, /validate call throws an HTTP 556 error
'''
logger = WarningLogger()
ie = IqiyiIEWithCredentials(FakeYDL({'logger': logger}))
ie._login()
self.assertTrue('unable to log in:' in logger.messages[0])
if __name__ == '__main__':
unittest.main()

View File

@ -35,6 +35,7 @@ from youtube_dl.utils import (
is_html,
js_to_json,
limit_length,
ohdave_rsa_encrypt,
OnDemandPagedList,
orderedSet,
parse_duration,
@ -792,6 +793,13 @@ The first line
{'nocheckcertificate': False}, '--check-certificate', 'nocheckcertificate', 'false', 'true', '='),
['--check-certificate=true'])
def test_ohdave_rsa_encrypt(self):
N = 0xab86b6371b5318aaa1d3c9e612a9f1264f372323c8c0f19875b5fc3b3fd3afcc1e5bec527aa94bfa85bffc157e4245aebda05389a5357b75115ac94f074aefcd
e = 65537
self.assertEqual(
ohdave_rsa_encrypt(b'aa111222', e, N),
'726664bd9a23fd0c70f9f1b84aab5e3905ce1e45a584e9cbcf9bcc7510338fc1986d6c599ff990d923aa43c51c0d9013cd572e13bc58f4ae48f2ed8c0b0ba881')
if __name__ == '__main__':
unittest.main()

View File

@ -605,12 +605,12 @@ class YoutubeDL(object):
if rejecttitle:
if re.search(rejecttitle, title, re.IGNORECASE):
return '"' + title + '" title matched reject pattern "' + rejecttitle + '"'
date = info_dict.get('upload_date', None)
date = info_dict.get('upload_date')
if date is not None:
dateRange = self.params.get('daterange', DateRange())
if date not in dateRange:
return '%s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
view_count = info_dict.get('view_count', None)
view_count = info_dict.get('view_count')
if view_count is not None:
min_views = self.params.get('min_views')
if min_views is not None and view_count < min_views:
@ -747,18 +747,18 @@ class YoutubeDL(object):
new_result, download=download, extra_info=extra_info)
elif result_type == 'playlist' or result_type == 'multi_video':
# We process each entry in the playlist
playlist = ie_result.get('title', None) or ie_result.get('id', None)
playlist = ie_result.get('title') or ie_result.get('id')
self.to_screen('[download] Downloading playlist: %s' % playlist)
playlist_results = []
playliststart = self.params.get('playliststart', 1) - 1
playlistend = self.params.get('playlistend', None)
playlistend = self.params.get('playlistend')
# For backwards compatibility, interpret -1 as whole list
if playlistend == -1:
playlistend = None
playlistitems_str = self.params.get('playlist_items', None)
playlistitems_str = self.params.get('playlist_items')
playlistitems = None
if playlistitems_str is not None:
def iter_playlistitems(format):
@ -782,7 +782,7 @@ class YoutubeDL(object):
entries = ie_entries[playliststart:playlistend]
n_entries = len(entries)
self.to_screen(
"[%s] playlist %s: Collected %d video ids (downloading %d of them)" %
'[%s] playlist %s: Collected %d video ids (downloading %d of them)' %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
elif isinstance(ie_entries, PagedList):
if playlistitems:
@ -796,7 +796,7 @@ class YoutubeDL(object):
playliststart, playlistend)
n_entries = len(entries)
self.to_screen(
"[%s] playlist %s: Downloading %d videos" %
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, n_entries))
else: # iterable
if playlistitems:
@ -807,7 +807,7 @@ class YoutubeDL(object):
ie_entries, playliststart, playlistend))
n_entries = len(entries)
self.to_screen(
"[%s] playlist %s: Downloading %d videos" %
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, n_entries))
if self.params.get('playlistreverse', False):
@ -1288,6 +1288,9 @@ class YoutubeDL(object):
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
else:
# Sanitize format_id from characters used in format selector expression
format['format_id'] = re.sub('[\s,/+\[\]()]', '_', format['format_id'])
format_id = format['format_id']
if format_id not in formats_dict:
formats_dict[format_id] = []
@ -1338,7 +1341,6 @@ class YoutubeDL(object):
if req_format is None:
req_format_list = []
if (self.params.get('outtmpl', DEFAULT_OUTTMPL) != '-' and
info_dict['extractor'] in ['youtube', 'ted'] and
not info_dict.get('is_live')):
merger = FFmpegMergerPP(self)
if merger.available and merger.can_merge():
@ -1795,7 +1797,7 @@ class YoutubeDL(object):
else:
res = '%sp' % format['height']
elif format.get('width') is not None:
res = '?x%d' % format['width']
res = '%dx?' % format['width']
else:
res = default
return res

View File

@ -7,7 +7,7 @@ from __future__ import unicode_literals
import sys
if __package__ is None and not hasattr(sys, "frozen"):
if __package__ is None and not hasattr(sys, 'frozen'):
# direct call of __main__.py
import os.path
path = os.path.realpath(os.path.abspath(__file__))

View File

@ -161,7 +161,7 @@ def aes_decrypt_text(data, password, key_size_bytes):
nonce = data[:NONCE_LENGTH_BYTES]
cipher = data[NONCE_LENGTH_BYTES:]
class Counter:
class Counter(object):
__value = nonce + [0] * (BLOCK_SIZE_BYTES - NONCE_LENGTH_BYTES)
def next_value(self):

View File

@ -181,20 +181,20 @@ except ImportError: # Python < 3.4
# parameter := attribute "=" value
url = req.get_full_url()
scheme, data = url.split(":", 1)
mediatype, data = data.split(",", 1)
scheme, data = url.split(':', 1)
mediatype, data = data.split(',', 1)
# even base64 encoded data URLs might be quoted so unquote in any case:
data = compat_urllib_parse_unquote_to_bytes(data)
if mediatype.endswith(";base64"):
if mediatype.endswith(';base64'):
data = binascii.a2b_base64(data)
mediatype = mediatype[:-7]
if not mediatype:
mediatype = "text/plain;charset=US-ASCII"
mediatype = 'text/plain;charset=US-ASCII'
headers = email.message_from_string(
"Content-type: %s\nContent-length: %d\n" % (mediatype, len(data)))
'Content-type: %s\nContent-length: %d\n' % (mediatype, len(data)))
return compat_urllib_response.addinfourl(io.BytesIO(data), headers, url)
@ -268,7 +268,7 @@ except ImportError: # Python 2
nv = name_value.split('=', 1)
if len(nv) != 2:
if strict_parsing:
raise ValueError("bad query field: %r" % (name_value,))
raise ValueError('bad query field: %r' % (name_value,))
# Handle case of a control-name with no equal sign
if keep_blank_values:
nv.append('')
@ -466,7 +466,7 @@ if sys.version_info < (2, 7):
if err is not None:
raise err
else:
raise socket.error("getaddrinfo returns an empty list")
raise socket.error('getaddrinfo returns an empty list')
else:
compat_socket_create_connection = socket.create_connection

View File

@ -157,7 +157,7 @@ class FileDownloader(object):
def slow_down(self, start_time, now, byte_counter):
"""Sleep if the download speed is over the rate limit."""
rate_limit = self.params.get('ratelimit', None)
rate_limit = self.params.get('ratelimit')
if rate_limit is None or byte_counter == 0:
return
if now is None:

View File

@ -1,67 +1,59 @@
from __future__ import unicode_literals
import os
import re
from .common import FileDownloader
from ..utils import sanitized_Request
from .fragment import FragmentFD
from ..utils import (
sanitize_open,
encodeFilename,
)
class DashSegmentsFD(FileDownloader):
class DashSegmentsFD(FragmentFD):
"""
Download segments in a DASH manifest
"""
FD_NAME = 'dashsegments'
def real_download(self, filename, info_dict):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
base_url = info_dict['url']
segment_urls = info_dict['segment_urls']
segment_urls = [info_dict['segment_urls'][0]] if self.params.get('test', False) else info_dict['segment_urls']
initialization_url = info_dict.get('initialization_url')
is_test = self.params.get('test', False)
remaining_bytes = self._TEST_FILE_SIZE if is_test else None
byte_counter = 0
ctx = {
'filename': filename,
'total_frags': len(segment_urls) + (1 if initialization_url else 0),
}
def append_url_to_file(outf, target_url, target_name, remaining_bytes=None):
self.to_screen('[DashSegments] %s: Downloading %s' % (info_dict['id'], target_name))
req = sanitized_Request(target_url)
if remaining_bytes is not None:
req.add_header('Range', 'bytes=0-%d' % (remaining_bytes - 1))
data = self.ydl.urlopen(req).read()
if remaining_bytes is not None:
data = data[:remaining_bytes]
outf.write(data)
return len(data)
self._prepare_and_start_frag_download(ctx)
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
with open(tmpfilename, 'wb') as outf:
if info_dict.get('initialization_url'):
append_url_to_file(
outf, combine_url(base_url, info_dict['initialization_url']),
'initialization segment')
for i, segment_url in enumerate(segment_urls):
segment_len = append_url_to_file(
outf, combine_url(base_url, segment_url),
'segment %d / %d' % (i + 1, len(segment_urls)),
remaining_bytes)
byte_counter += segment_len
if remaining_bytes is not None:
remaining_bytes -= segment_len
if remaining_bytes <= 0:
break
segments_filenames = []
self.try_rename(tmpfilename, filename)
def append_url_to_file(target_url, target_filename):
success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
ctx['dest_stream'].write(down.read())
down.close()
segments_filenames.append(target_sanitized)
self._hook_progress({
'downloaded_bytes': byte_counter,
'total_bytes': byte_counter,
'filename': filename,
'status': 'finished',
})
if initialization_url:
append_url_to_file(initialization_url, ctx['tmpfilename'] + '-Init')
for i, segment_url in enumerate(segment_urls):
segment_filename = '%s-Seg%d' % (ctx['tmpfilename'], i)
append_url_to_file(segment_url, segment_filename)
self._finish_frag_download(ctx)
for segment_file in segments_filenames:
os.remove(encodeFilename(segment_file))
return True

View File

@ -38,7 +38,7 @@ class FragmentFD(FileDownloader):
'continuedl': True,
'quiet': True,
'noprogress': True,
'ratelimit': self.params.get('ratelimit', None),
'ratelimit': self.params.get('ratelimit'),
'retries': self.params.get('retries', 0),
'test': self.params.get('test', False),
}

View File

@ -140,8 +140,8 @@ class HttpFD(FileDownloader):
if data_len is not None:
data_len = int(data_len) + resume_len
min_data_len = self.params.get("min_filesize", None)
max_data_len = self.params.get("max_filesize", None)
min_data_len = self.params.get('min_filesize')
max_data_len = self.params.get('max_filesize')
if min_data_len is not None and data_len < min_data_len:
self.to_screen('\r[download] File is smaller than min-filesize (%s bytes < %s bytes). Aborting.' % (data_len, min_data_len))
return False

View File

@ -94,15 +94,15 @@ class RtmpFD(FileDownloader):
return proc.returncode
url = info_dict['url']
player_url = info_dict.get('player_url', None)
page_url = info_dict.get('page_url', None)
app = info_dict.get('app', None)
play_path = info_dict.get('play_path', None)
tc_url = info_dict.get('tc_url', None)
flash_version = info_dict.get('flash_version', None)
player_url = info_dict.get('player_url')
page_url = info_dict.get('page_url')
app = info_dict.get('app')
play_path = info_dict.get('play_path')
tc_url = info_dict.get('tc_url')
flash_version = info_dict.get('flash_version')
live = info_dict.get('rtmp_live', False)
conn = info_dict.get('rtmp_conn', None)
protocol = info_dict.get('rtmp_protocol', None)
conn = info_dict.get('rtmp_conn')
protocol = info_dict.get('rtmp_protocol')
real_time = info_dict.get('rtmp_real_time', False)
no_resume = info_dict.get('no_resume', False)
continue_dl = self.params.get('continuedl', True)

View File

@ -20,6 +20,7 @@ from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
from .animeondemand import AnimeOnDemandIE
from .anitube import AnitubeIE
from .anysex import AnySexIE
from .aol import AolIE
@ -44,6 +45,7 @@ from .arte import (
ArteTVFutureIE,
ArteTVCinemaIE,
ArteTVDDCIE,
ArteTVMagazineIE,
ArteTVEmbedIE,
)
from .atresplayer import AtresPlayerIE
@ -89,6 +91,10 @@ from .camdemy import (
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .canvas import CanvasIE
from .cbc import (
CBCIE,
CBCPlayerIE,
)
from .cbs import CBSIE
from .cbsnews import (
CBSNewsIE,
@ -126,6 +132,7 @@ from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .condenast import CondeNastIE
from .cracked import CrackedIE
from .crackle import CrackleIE
from .criterion import CriterionIE
from .crooksandliars import CrooksAndLiarsIE
from .crunchyroll import (
@ -484,6 +491,7 @@ from .nowtv import (
NowTVIE,
NowTVListIE,
)
from .noz import NozIE
from .npo import (
NPOIE,
NPOLiveIE,
@ -533,6 +541,7 @@ from .planetaplay import PlanetaPlayIE
from .pladform import PladformIE
from .played import PlayedIE
from .playfm import PlayFMIE
from .plays import PlaysTVIE
from .playtvak import PlaytvakIE
from .playvid import PlayvidIE
from .playwire import PlaywireIE
@ -546,6 +555,7 @@ from .pornhd import PornHdIE
from .pornhub import (
PornHubIE,
PornHubPlaylistIE,
PornHubUserVideosIE,
)
from .pornotube import PornotubeIE
from .pornovoisines import PornoVoisinesIE
@ -613,6 +623,7 @@ from .sbs import SBSIE
from .scivee import SciVeeIE
from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
from .screenjunkies import ScreenJunkiesIE
from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
from .senateisvp import SenateISVPIE
from .servingsys import ServingSysIE
@ -787,7 +798,11 @@ from .twitch import (
TwitchBookmarksIE,
TwitchStreamIE,
)
from .twitter import TwitterCardIE, TwitterIE
from .twitter import (
TwitterCardIE,
TwitterIE,
TwitterAmplifyIE,
)
from .ubu import UbuIE
from .udemy import (
UdemyIE,

View File

@ -28,7 +28,7 @@ class AENetworksIE(InfoExtractor):
'info_dict': {
'id': 'eg47EERs_JsZ',
'ext': 'mp4',
'title': "Winter Is Coming",
'title': 'Winter Is Coming',
'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
},
'params': {

View File

@ -0,0 +1,160 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
encode_dict,
ExtractorError,
sanitized_Request,
urlencode_postdata,
)
class AnimeOnDemandIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?anime-on-demand\.de/anime/(?P<id>\d+)'
_LOGIN_URL = 'https://www.anime-on-demand.de/users/sign_in'
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
_NETRC_MACHINE = 'animeondemand'
_TEST = {
'url': 'https://www.anime-on-demand.de/anime/161',
'info_dict': {
'id': '161',
'title': 'Grimgar, Ashes and Illusions (OmU)',
'description': 'md5:6681ce3c07c7189d255ac6ab23812d31',
},
'playlist_mincount': 4,
}
def _login(self):
(username, password) = self._get_login_info()
if username is None:
return
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
login_form = self._form_hidden_inputs('new_user', login_page)
login_form.update({
'user[login]': username,
'user[password]': password,
})
post_url = self._search_regex(
r'<form[^>]+action=(["\'])(?P<url>.+?)\1', login_page,
'post url', default=self._LOGIN_URL, group='url')
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
request = sanitized_Request(
post_url, urlencode_postdata(encode_dict(login_form)))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage(
request, None, 'Logging in as %s' % username)
if all(p not in response for p in ('>Logout<', 'href="/users/sign_out"')):
error = self._search_regex(
r'<p class="alert alert-danger">(.+?)</p>',
response, 'error', default=None)
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in')
def _real_initialize(self):
self._login()
def _real_extract(self, url):
anime_id = self._match_id(url)
webpage = self._download_webpage(url, anime_id)
if 'data-playlist=' not in webpage:
self._download_webpage(
self._APPLY_HTML5_URL, anime_id,
'Activating HTML5 beta', 'Unable to apply HTML5 beta')
webpage = self._download_webpage(url, anime_id)
csrf_token = self._html_search_meta(
'csrf-token', webpage, 'csrf token', fatal=True)
anime_title = self._html_search_regex(
r'(?s)<h1[^>]+itemprop="name"[^>]*>(.+?)</h1>',
webpage, 'anime name')
anime_description = self._html_search_regex(
r'(?s)<div[^>]+itemprop="description"[^>]*>(.+?)</div>',
webpage, 'anime description', default=None)
entries = []
for episode_html in re.findall(r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage):
m = re.search(
r'class="episodebox-title"[^>]+title="Episode (?P<number>\d+) - (?P<title>.+?)"', episode_html)
if not m:
continue
episode_number = int(m.group('number'))
episode_title = m.group('title')
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
formats = []
playlist_url = self._search_regex(
r'data-playlist=(["\'])(?P<url>.+?)\1',
episode_html, 'data playlist', default=None, group='url')
if playlist_url:
request = sanitized_Request(
compat_urlparse.urljoin(url, playlist_url),
headers={
'X-Requested-With': 'XMLHttpRequest',
'X-CSRF-Token': csrf_token,
'Referer': url,
'Accept': 'application/json, text/javascript, */*; q=0.01',
})
playlist = self._download_json(
request, video_id, 'Downloading playlist JSON', fatal=False)
if playlist:
playlist = playlist['playlist'][0]
title = playlist['title']
description = playlist.get('description')
for source in playlist.get('sources', []):
file_ = source.get('file')
if file_ and determine_ext(file_) == 'm3u8':
formats = self._extract_m3u8_formats(
file_, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
if formats:
f = common_info.copy()
f.update({
'title': title,
'description': description,
'formats': formats,
})
entries.append(f)
m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
episode_html)
if m:
f = common_info.copy()
f.update({
'id': '%s-teaser' % f['id'],
'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')),
})
entries.append(f)
return self.playlist_result(entries, anime_id, anime_title, anime_description)

View File

@ -12,7 +12,7 @@ from ..utils import (
class AppleTrailersIE(InfoExtractor):
IE_NAME = 'appletrailers'
_VALID_URL = r'https?://(?:www\.)?trailers\.apple\.com/(?:trailers|ca)/(?P<company>[^/]+)/(?P<movie>[^/]+)'
_VALID_URL = r'https?://(?:www\.|movie)?trailers\.apple\.com/(?:trailers|ca)/(?P<company>[^/]+)/(?P<movie>[^/]+)'
_TESTS = [{
'url': 'http://trailers.apple.com/trailers/wb/manofsteel/',
'info_dict': {
@ -73,6 +73,9 @@ class AppleTrailersIE(InfoExtractor):
}, {
'url': 'http://trailers.apple.com/ca/metropole/autrui/',
'only_matching': True,
}, {
'url': 'http://movietrailers.apple.com/trailers/focus_features/kuboandthetwostrings/',
'only_matching': True,
}]
_JSON_RE = r'iTunes.playURL\((.*?)\);'

View File

@ -23,7 +23,7 @@ from ..utils import (
class ArteTvIE(InfoExtractor):
_VALID_URL = r'http://videos\.arte\.tv/(?P<lang>fr|de)/.*-(?P<id>.*?)\.html'
_VALID_URL = r'http://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
IE_NAME = 'arte.tv'
def _real_extract(self, url):
@ -63,7 +63,7 @@ class ArteTvIE(InfoExtractor):
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&+])'
@classmethod
def _extract_url_info(cls, url):
@ -102,13 +102,32 @@ class ArteTVPlus7IE(InfoExtractor):
iframe_url = find_iframe_url(webpage, None)
if not iframe_url:
embed_url = self._html_search_regex(
r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url')
player = self._download_json(
embed_url, video_id, 'Downloading player page')
iframe_url = find_iframe_url(player['html'])
json_url = compat_parse_qs(
compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
return self._extract_from_json_url(json_url, video_id, lang)
r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
if embed_url:
player = self._download_json(
embed_url, video_id, 'Downloading player page')
iframe_url = find_iframe_url(player['html'])
# en and es URLs produce react-based pages with different layout (e.g.
# http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
if not iframe_url:
program = self._search_regex(
r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
webpage, 'program', default=None)
if program:
embed_html = self._parse_json(program, video_id)
if embed_html:
iframe_url = find_iframe_url(embed_html['embed_html'])
if iframe_url:
json_url = compat_parse_qs(
compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
if json_url:
return self._extract_from_json_url(json_url, video_id, lang)
# Differend kind of embed URL (e.g.
# http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
embed_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'embed url', group='url')
return self.url_result(embed_url)
def _extract_from_json_url(self, json_url, video_id, lang):
info = self._download_json(json_url, video_id)
@ -116,7 +135,7 @@ class ArteTVPlus7IE(InfoExtractor):
upload_date_str = player_info.get('shootingDate')
if not upload_date_str:
upload_date_str = player_info.get('VDA', '').split(' ')[0]
upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0]
title = player_info['VTI'].strip()
subtitle = player_info.get('VSU', '').strip()
@ -132,27 +151,30 @@ class ArteTVPlus7IE(InfoExtractor):
}
qfunc = qualities(['HQ', 'MQ', 'EQ', 'SQ'])
LANGS = {
'fr': 'F',
'de': 'A',
'en': 'E[ANG]',
'es': 'E[ESP]',
}
formats = []
for format_id, format_dict in player_info['VSR'].items():
f = dict(format_dict)
versionCode = f.get('versionCode')
langcode = {
'fr': 'F',
'de': 'A',
}.get(lang, lang)
lang_rexs = [r'VO?%s' % langcode, r'VO?.-ST%s' % langcode]
lang_pref = (
None if versionCode is None else (
10 if any(re.match(r, versionCode) for r in lang_rexs)
else -10))
langcode = LANGS.get(lang, lang)
lang_rexs = [r'VO?%s-' % re.escape(langcode), r'VO?.-ST%s$' % re.escape(langcode)]
lang_pref = None
if versionCode:
matched_lang_rexs = [r for r in lang_rexs if re.match(r, versionCode)]
lang_pref = -10 if not matched_lang_rexs else 10 * len(matched_lang_rexs)
source_pref = 0
if versionCode is not None:
# The original version with subtitles has lower relevance
if re.match(r'VO-ST(F|A)', versionCode):
if re.match(r'VO-ST(F|A|E)', versionCode):
source_pref -= 10
# The version with sourds/mal subtitles has also lower relevance
elif re.match(r'VO?(F|A)-STM\1', versionCode):
elif re.match(r'VO?(F|A|E)-STM\1', versionCode):
source_pref -= 9
format = {
'format_id': format_id,
@ -185,7 +207,7 @@ class ArteTVPlus7IE(InfoExtractor):
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de)/(?:magazine?/)?(?P<id>[^?#]+)'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:magazine?/)?(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
@ -209,7 +231,7 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(?P<id>.+)'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://future.arte.tv/fr/info-sciences/les-ecrevisses-aussi-sont-anxieuses',
@ -217,6 +239,7 @@ class ArteTVFutureIE(ArteTVPlus7IE):
'id': '050940-028-A',
'ext': 'mp4',
'title': 'Les écrevisses aussi peuvent être anxieuses',
'upload_date': '20140902',
},
}, {
'url': 'http://future.arte.tv/fr/la-science-est-elle-responsable',
@ -226,7 +249,7 @@ class ArteTVFutureIE(ArteTVPlus7IE):
class ArteTVDDCIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:ddc'
_VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>.+)'
_VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
@ -244,7 +267,7 @@ class ArteTVDDCIE(ArteTVPlus7IE):
class ArteTVConcertIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:concert'
_VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>de|fr)/(?P<id>.+)'
_VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
@ -261,7 +284,7 @@ class ArteTVConcertIE(ArteTVPlus7IE):
class ArteTVCinemaIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:cinema'
_VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>de|fr)/(?P<id>.+)'
_VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
_TEST = {
'url': 'http://cinema.arte.tv/de/node/38291',
@ -276,6 +299,37 @@ class ArteTVCinemaIE(ArteTVPlus7IE):
}
class ArteTVMagazineIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:magazine'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/magazine/[^/]+/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
# Embedded via <iframe src="http://www.arte.tv/arte_vp/index.php?json_url=..."
'url': 'http://www.arte.tv/magazine/trepalium/fr/entretien-avec-le-realisateur-vincent-lannoo-trepalium',
'md5': '2a9369bcccf847d1c741e51416299f25',
'info_dict': {
'id': '065965-000-A',
'ext': 'mp4',
'title': 'Trepalium - Extrait Ep.01',
'upload_date': '20160121',
},
}, {
# Embedded via <iframe src="http://www.arte.tv/guide/fr/embed/054813-004-A/medium"
'url': 'http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium',
'md5': 'fedc64fc7a946110fe311634e79782ca',
'info_dict': {
'id': '054813-004_PLUS7-F',
'ext': 'mp4',
'title': 'Trepalium (4/6)',
'description': 'md5:10057003c34d54e95350be4f9b05cb40',
'upload_date': '20160218',
},
}, {
'url': 'http://www.arte.tv/magazine/metropolis/de/frank-woeste-german-paris-metropolis',
'only_matching': True,
}]
class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)

View File

@ -86,7 +86,7 @@ class BBCCoUkIE(InfoExtractor):
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Voice UK: Series 3: Blind Auditions 5',
'description': "Emma Willis and Marvin Humes present the fifth set of blind auditions in the singing competition, as the coaches continue to build their teams based on voice alone.",
'description': 'Emma Willis and Marvin Humes present the fifth set of blind auditions in the singing competition, as the coaches continue to build their teams based on voice alone.',
'duration': 5100,
},
'params': {

View File

@ -6,7 +6,7 @@ from ..utils import float_or_none
class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
_TESTS = [{
'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
'md5': 'ea838375a547ac787d4064d8c7860a6c',
'info_dict': {
@ -18,7 +18,27 @@ class CanvasIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 49.02,
}
}
}, {
# with subtitles
'url': 'http://www.canvas.be/video/panorama/2016/pieter-0167',
'info_dict': {
'id': 'mz-ast-5240ff21-2d30-4101-bba6-92b5ec67c625',
'display_id': 'pieter-0167',
'ext': 'mp4',
'title': 'Pieter 0167',
'description': 'md5:943cd30f48a5d29ba02c3a104dc4ec4e',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 2553.08,
'subtitles': {
'nl': [{
'ext': 'vtt',
}],
},
},
'params': {
'skip_download': True,
}
}]
def _real_extract(self, url):
display_id = self._match_id(url)
@ -54,6 +74,14 @@ class CanvasIE(InfoExtractor):
})
self._sort_formats(formats)
subtitles = {}
subtitle_urls = data.get('subtitleUrls')
if isinstance(subtitle_urls, list):
for subtitle in subtitle_urls:
subtitle_url = subtitle.get('url')
if subtitle_url and subtitle.get('type') == 'CLOSED':
subtitles.setdefault('nl', []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': display_id,
@ -62,4 +90,5 @@ class CanvasIE(InfoExtractor):
'formats': formats,
'duration': float_or_none(data.get('duration'), 1000),
'thumbnail': data.get('posterImageUrl'),
'subtitles': subtitles,
}

113
youtube_dl/extractor/cbc.py Normal file
View File

@ -0,0 +1,113 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import js_to_json
class CBCIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
# with mediaId
'url': 'http://www.cbc.ca/22minutes/videos/clips-season-23/don-cherry-play-offs',
'info_dict': {
'id': '2682904050',
'ext': 'flv',
'title': 'Don Cherry All-Stars',
'description': 'Don Cherry has a bee in his bonnet about AHL player John Scott because that guys got heart.',
'timestamp': 1454475540,
'upload_date': '20160203',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# with clipId
'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live',
'info_dict': {
'id': '2487345465',
'ext': 'flv',
'title': 'Robin Williams freestyles on 90 Minutes Live',
'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
'upload_date': '19700101',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# multiple iframes
'url': 'http://www.cbc.ca/natureofthings/blog/birds-eye-view-from-vancouvers-burrard-street-bridge-how-we-got-the-shot',
'playlist': [{
'info_dict': {
'id': '2680832926',
'ext': 'flv',
'title': 'An Eagle\'s-Eye View Off Burrard Bridge',
'description': 'Hercules the eagle flies from Vancouver\'s Burrard Bridge down to a nearby park with a mini-camera strapped to his back.',
'upload_date': '19700101',
},
}, {
'info_dict': {
'id': '2658915080',
'ext': 'flv',
'title': 'Fly like an eagle!',
'description': 'Eagle equipped with a mini camera flies from the world\'s tallest tower',
'upload_date': '19700101',
},
}],
'params': {
# rtmp download
'skip_download': True,
},
}]
@classmethod
def suitable(cls, url):
return False if CBCPlayerIE.suitable(url) else super(CBCIE, cls).suitable(url)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
player_init = self._search_regex(
r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage, 'player init',
default=None)
if player_init:
player_info = self._parse_json(player_init, display_id, js_to_json)
media_id = player_info.get('mediaId')
if not media_id:
clip_id = player_info['clipId']
media_id = self._download_json(
'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
clip_id)['entries'][0]['id'].split('/')[-1]
return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
else:
entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
return self.playlist_result(entries)
class CBCPlayerIE(InfoExtractor):
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)'
_TEST = {
'url': 'http://www.cbc.ca/player/play/2683190193',
'info_dict': {
'id': '2683190193',
'ext': 'flv',
'title': 'Gerry Runs a Sweat Shop',
'description': 'md5:b457e1c01e8ff408d9d801c1c2cd29b0',
'timestamp': 1455067800,
'upload_date': '20160210',
},
'params': {
# rtmp download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'http://feed.theplatform.com/f/ExhSPC/vms_5akSXx4Ng_Zn?byGuid=%s' % video_id,
'ThePlatformFeed', video_id)

View File

@ -3,7 +3,10 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from ..utils import parse_duration
from ..utils import (
parse_duration,
find_xpath_attr,
)
class CBSNewsIE(ThePlatformIE):
@ -46,6 +49,15 @@ class CBSNewsIE(ThePlatformIE):
},
]
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
'en': [{
'ext': 'ttml',
'url': closed_caption_e.attrib['value'],
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
def _real_extract(self, url):
video_id = self._match_id(url)
@ -61,12 +73,6 @@ class CBSNewsIE(ThePlatformIE):
thumbnail = item.get('mediaImage') or item.get('thumbnail')
subtitles = {}
if 'mpxRefId' in video_info:
subtitles['en'] = [{
'ext': 'ttml',
'url': 'http://www.cbsnews.com/videos/captions/%s.adb_xml' % video_info['mpxRefId'],
}]
formats = []
for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
pid = item.get('media' + format_id)

View File

@ -45,7 +45,7 @@ class CCCIE(InfoExtractor):
title = self._html_search_regex(
r'(?s)<h1>(.*?)</h1>', webpage, 'title')
description = self._html_search_regex(
r"(?s)<h3>About</h3>(.+?)<h3>",
r'(?s)<h3>About</h3>(.+?)<h3>',
webpage, 'description', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r"(?s)<span[^>]+class='[^']*fa-calendar-o'[^>]*>(.+?)</span>",

View File

@ -177,16 +177,16 @@ class CeskaTelevizeIE(InfoExtractor):
for divider in [1000, 60, 60, 100]:
components.append(msec % divider)
msec //= divider
return "{3:02}:{2:02}:{1:02},{0:03}".format(*components)
return '{3:02}:{2:02}:{1:02},{0:03}'.format(*components)
def _fix_subtitle(subtitle):
for line in subtitle.splitlines():
m = re.match(r"^\s*([0-9]+);\s*([0-9]+)\s+([0-9]+)\s*$", line)
m = re.match(r'^\s*([0-9]+);\s*([0-9]+)\s+([0-9]+)\s*$', line)
if m:
yield m.group(1)
start, stop = (_msectotimecode(int(t)) for t in m.groups()[1:])
yield "{0} --> {1}".format(start, stop)
yield '{0} --> {1}'.format(start, stop)
else:
yield line
return "\r\n".join(_fix_subtitle(subtitles))
return '\r\n'.join(_fix_subtitle(subtitles))

View File

@ -26,14 +26,14 @@ class CNNIE(InfoExtractor):
'upload_date': '20130609',
},
}, {
"url": "http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29",
"md5": "b5cc60c60a3477d185af8f19a2a26f4e",
"info_dict": {
'url': 'http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29',
'md5': 'b5cc60c60a3477d185af8f19a2a26f4e',
'info_dict': {
'id': 'us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology',
'ext': 'mp4',
"title": "Student's epic speech stuns new freshmen",
"description": "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\"",
"upload_date": "20130821",
'title': "Student's epic speech stuns new freshmen",
'description': "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\"",
'upload_date': '20130821',
}
}, {
'url': 'http://www.cnn.com/video/data/2.0/video/living/2014/12/22/growing-america-nashville-salemtown-board-episode-1.hln.html',

View File

@ -46,9 +46,9 @@ class CollegeRamaIE(InfoExtractor):
video_id = self._match_id(url)
player_options_request = {
"getPlayerOptionsRequest": {
"ResourceId": video_id,
"QueryString": "",
'getPlayerOptionsRequest': {
'ResourceId': video_id,
'QueryString': '',
}
}

View File

@ -2,6 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
parse_duration,
@ -14,14 +15,13 @@ class ComCarCoffIE(InfoExtractor):
_TESTS = [{
'url': 'http://comediansincarsgettingcoffee.com/miranda-sings-happy-thanksgiving-miranda/',
'info_dict': {
'id': 'miranda-sings-happy-thanksgiving-miranda',
'id': '2494164',
'ext': 'mp4',
'upload_date': '20141127',
'timestamp': 1417107600,
'duration': 1232,
'title': 'Happy Thanksgiving Miranda',
'description': 'Jerry Seinfeld and his special guest Miranda Sings cruise around town in search of coffee, complaining and apologizing along the way.',
'thumbnail': 'http://ccc.crackle.com/images/s5e4_thumb.jpg',
},
'params': {
'skip_download': 'requires ffmpeg',
@ -39,15 +39,14 @@ class ComCarCoffIE(InfoExtractor):
r'window\.app\s*=\s*({.+?});\n', webpage, 'full data json'),
display_id)['videoData']
video_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(video_id) or full_data['singleshots'][video_id]
display_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]
video_id = compat_str(video_data['mediaId'])
thumbnails = [{
'url': video_data['images']['thumb'],
}, {
'url': video_data['images']['poster'],
}]
formats = self._extract_m3u8_formats(
video_data['mediaUrl'], video_id, ext='mp4')
timestamp = int_or_none(video_data.get('pubDateTime')) or parse_iso8601(
video_data.get('pubDate'))
@ -55,6 +54,8 @@ class ComCarCoffIE(InfoExtractor):
video_data.get('duration'))
return {
'_type': 'url_transparent',
'url': 'crackle:%s' % video_id,
'id': video_id,
'display_id': display_id,
'title': video_data['title'],
@ -62,6 +63,7 @@ class ComCarCoffIE(InfoExtractor):
'timestamp': timestamp,
'duration': duration,
'thumbnails': thumbnails,
'formats': formats,
'season_number': int_or_none(video_data.get('season')),
'episode_number': int_or_none(video_data.get('episode')),
'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),
}

View File

@ -16,11 +16,11 @@ from ..utils import (
class ComedyCentralIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
(video-clips|episodes|cc-studios|video-collections|full-episodes)
(video-clips|episodes|cc-studios|video-collections|full-episodes|shows)
/(?P<title>.*)'''
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'
_TEST = {
_TESTS = [{
'url': 'http://www.cc.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
'md5': 'c4f48e9eda1b16dd10add0744344b6d8',
'info_dict': {
@ -29,7 +29,10 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
'title': 'CC:Stand-Up|Greg Fitzsimmons: Life on Stage|Uncensored - Too Good of a Mother',
'description': 'After a certain point, breastfeeding becomes c**kblocking.',
},
}
}, {
'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview',
'only_matching': True,
}]
class ComedyCentralShowsIE(MTVServicesInfoExtractor):
@ -192,7 +195,7 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
if len(altMovieParams) == 0:
raise ExtractorError('unable to find Flash URL in webpage ' + url)
else:
mMovieParams = [("http://media.mtvnservices.com/" + altMovieParams[0], altMovieParams[0])]
mMovieParams = [('http://media.mtvnservices.com/' + altMovieParams[0], altMovieParams[0])]
uri = mMovieParams[0][1]
# Correct cc.com in uri

View File

@ -46,6 +46,7 @@ from ..utils import (
xpath_with_ns,
determine_protocol,
parse_duration,
mimetype2ext,
)
@ -636,7 +637,7 @@ class InfoExtractor(object):
downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data
if downloader_params.get('username', None) is not None:
if downloader_params.get('username') is not None:
username = downloader_params['username']
password = downloader_params['password']
elif downloader_params.get('usenetrc', False):
@ -663,7 +664,7 @@ class InfoExtractor(object):
return None
downloader_params = self._downloader.params
if downloader_params.get('twofactor', None) is not None:
if downloader_params.get('twofactor') is not None:
return downloader_params['twofactor']
return compat_getpass('Type %s and press [Return]: ' % note)
@ -744,7 +745,7 @@ class InfoExtractor(object):
'mature': 17,
'restricted': 19,
}
return RATING_TABLE.get(rating.lower(), None)
return RATING_TABLE.get(rating.lower())
def _family_friendly_search(self, html):
# See http://schema.org/VideoObject
@ -759,7 +760,7 @@ class InfoExtractor(object):
'0': 18,
'false': 18,
}
return RATING_TABLE.get(family_friendly.lower(), None)
return RATING_TABLE.get(family_friendly.lower())
def _twitter_search_player(self, html):
return self._html_search_meta('twitter:player', html,
@ -899,6 +900,16 @@ class InfoExtractor(object):
item='%s video format' % f.get('format_id') if f.get('format_id') else 'video'),
formats)
@staticmethod
def _remove_duplicate_formats(formats):
format_urls = set()
unique_formats = []
for f in formats:
if f['url'] not in format_urls:
format_urls.add(f['url'])
unique_formats.append(f)
formats[:] = unique_formats
def _is_valid_url(self, url, video_id, item='video'):
url = self._proto_relative_url(url, scheme='http:')
# For now assume non HTTP(S) URLs always valid
@ -1186,11 +1197,13 @@ class InfoExtractor(object):
http_count = 0
m3u8_count = 0
srcs = []
videos = smil.findall(self._xpath_ns('.//video', namespace))
for video in videos:
src = video.get('src')
if not src:
if not src or src in srcs:
continue
srcs.append(src)
bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
filesize = int_or_none(video.get('size') or video.get('fileSize'))
@ -1222,6 +1235,7 @@ class InfoExtractor(object):
continue
src_url = src if src.startswith('http') else compat_urlparse.urljoin(base, src)
src_url = src_url.strip()
if proto == 'm3u8' or src_ext == 'm3u8':
m3u8_formats = self._extract_m3u8_formats(
@ -1267,21 +1281,14 @@ class InfoExtractor(object):
return formats
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
urls = []
subtitles = {}
for num, textstream in enumerate(smil.findall(self._xpath_ns('.//textstream', namespace))):
src = textstream.get('src')
if not src:
if not src or src in urls:
continue
ext = textstream.get('ext') or determine_ext(src)
if not ext:
type_ = textstream.get('type')
SUBTITLES_TYPES = {
'text/vtt': 'vtt',
'text/srt': 'srt',
'application/smptett+xml': 'tt',
}
if type_ in SUBTITLES_TYPES:
ext = SUBTITLES_TYPES[type_]
urls.append(src)
ext = textstream.get('ext') or determine_ext(src) or mimetype2ext(textstream.get('type'))
lang = textstream.get('systemLanguage') or textstream.get('systemLanguageName') or textstream.get('lang') or subtitles_lang
subtitles.setdefault(lang, []).append({
'url': src,
@ -1430,12 +1437,16 @@ class InfoExtractor(object):
base_url = base_url_e.text + base_url
if re.match(r'^https?://', base_url):
break
if not re.match(r'^https?://', base_url):
if mpd_base_url and not re.match(r'^https?://', base_url):
if not mpd_base_url.endswith('/') and not base_url.startswith('/'):
mpd_base_url += '/'
base_url = mpd_base_url + base_url
representation_id = representation_attrib.get('id')
lang = representation_attrib.get('lang')
url_el = representation.find(_add_ns('BaseURL'))
filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
f = {
'format_id': mpd_id or representation_id,
'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
'url': base_url,
'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')),
@ -1446,6 +1457,7 @@ class InfoExtractor(object):
'acodec': 'none' if content_type == 'video' else representation_attrib.get('codecs'),
'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None,
'format_note': 'DASH %s' % content_type,
'filesize': filesize,
}
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
@ -1487,7 +1499,7 @@ class InfoExtractor(object):
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()
now_str = now.strftime("%Y-%m-%d %H:%M")
now_str = now.strftime('%Y-%m-%d %H:%M')
return name + ' ' + now_str
def _int(self, v, name, fatal=False, **kwargs):
@ -1560,7 +1572,7 @@ class InfoExtractor(object):
return {}
def _get_subtitles(self, *args, **kwargs):
raise NotImplementedError("This method must be implemented by subclasses")
raise NotImplementedError('This method must be implemented by subclasses')
@staticmethod
def _merge_subtitle_items(subtitle_list1, subtitle_list2):
@ -1586,7 +1598,7 @@ class InfoExtractor(object):
return {}
def _get_automatic_captions(self, *args, **kwargs):
raise NotImplementedError("This method must be implemented by subclasses")
raise NotImplementedError('This method must be implemented by subclasses')
class SearchInfoExtractor(InfoExtractor):
@ -1626,7 +1638,7 @@ class SearchInfoExtractor(InfoExtractor):
def _get_n_results(self, query, n):
"""Get a specified number of results for a query"""
raise NotImplementedError("This method must be implemented by subclasses")
raise NotImplementedError('This method must be implemented by subclasses')
@property
def SEARCH_KEY(self):

View File

@ -0,0 +1,95 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class CrackleIE(InfoExtractor):
_VALID_URL = r'(?:crackle:|https?://(?:www\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
_TEST = {
'url': 'http://www.crackle.com/the-art-of-more/2496419',
'info_dict': {
'id': '2496419',
'ext': 'mp4',
'title': 'Heavy Lies the Head',
'description': 'md5:bb56aa0708fe7b9a4861535f15c3abca',
},
'params': {
# m3u8 download
'skip_download': True,
}
}
# extracted from http://legacyweb-us.crackle.com/flash/QueryReferrer.ashx
_SUBTITLE_SERVER = 'http://web-us-az.crackle.com'
_UPLYNK_OWNER_ID = 'e8773f7770a44dbd886eee4fca16a66b'
_THUMBNAIL_TEMPLATE = 'http://images-us-am.crackle.com/%stnl_1920x1080.jpg?ts=20140107233116?c=635333335057637614'
# extracted from http://legacyweb-us.crackle.com/flash/ReferrerRedirect.ashx
_MEDIA_FILE_SLOTS = {
'c544.flv': {
'width': 544,
'height': 306,
},
'360p.mp4': {
'width': 640,
'height': 360,
},
'480p.mp4': {
'width': 852,
'height': 478,
},
'480p_1mbps.mp4': {
'width': 852,
'height': 478,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
item = self._download_xml(
'http://legacyweb-us.crackle.com/app/revamp/vidwallcache.aspx?flags=-1&fm=%s' % video_id,
video_id).find('i')
title = item.attrib['t']
thumbnail = None
subtitles = {}
formats = self._extract_m3u8_formats(
'http://content.uplynk.com/ext/%s/%s.m3u8' % (self._UPLYNK_OWNER_ID, video_id),
video_id, 'mp4', m3u8_id='hls', fatal=None)
path = item.attrib.get('p')
if path:
thumbnail = self._THUMBNAIL_TEMPLATE % path
http_base_url = 'http://ahttp.crackle.com/' + path
for mfs_path, mfs_info in self._MEDIA_FILE_SLOTS.items():
formats.append({
'url': http_base_url + mfs_path,
'format_id': 'http-' + mfs_path.split('.')[0],
'width': mfs_info['width'],
'height': mfs_info['height'],
})
for cc in item.findall('cc'):
locale = cc.attrib.get('l')
v = cc.attrib.get('v')
if locale and v:
if locale not in subtitles:
subtitles[locale] = []
subtitles[locale] = [{
'url': '%s/%s%s_%s.xml' % (self._SUBTITLE_SERVER, path, locale, v),
'ext': 'ttml',
}]
self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
return {
'id': video_id,
'title': title,
'description': item.attrib.get('d'),
'duration': int(item.attrib.get('r'), 16) if item.attrib.get('r') else None,
'series': item.attrib.get('sn'),
'season_number': int_or_none(item.attrib.get('se')),
'episode_number': int_or_none(item.attrib.get('ep')),
'thumbnail': thumbnail,
'subtitles': subtitles,
'formats': formats,
}

View File

@ -180,40 +180,40 @@ class CrunchyrollIE(CrunchyrollBaseIE):
return assvalue
output = '[Script Info]\n'
output += 'Title: %s\n' % sub_root.attrib["title"]
output += 'Title: %s\n' % sub_root.attrib['title']
output += 'ScriptType: v4.00+\n'
output += 'WrapStyle: %s\n' % sub_root.attrib["wrap_style"]
output += 'PlayResX: %s\n' % sub_root.attrib["play_res_x"]
output += 'PlayResY: %s\n' % sub_root.attrib["play_res_y"]
output += 'WrapStyle: %s\n' % sub_root.attrib['wrap_style']
output += 'PlayResX: %s\n' % sub_root.attrib['play_res_x']
output += 'PlayResY: %s\n' % sub_root.attrib['play_res_y']
output += """ScaledBorderAndShadow: yes
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
"""
for style in sub_root.findall('./styles/style'):
output += 'Style: ' + style.attrib["name"]
output += ',' + style.attrib["font_name"]
output += ',' + style.attrib["font_size"]
output += ',' + style.attrib["primary_colour"]
output += ',' + style.attrib["secondary_colour"]
output += ',' + style.attrib["outline_colour"]
output += ',' + style.attrib["back_colour"]
output += ',' + ass_bool(style.attrib["bold"])
output += ',' + ass_bool(style.attrib["italic"])
output += ',' + ass_bool(style.attrib["underline"])
output += ',' + ass_bool(style.attrib["strikeout"])
output += ',' + style.attrib["scale_x"]
output += ',' + style.attrib["scale_y"]
output += ',' + style.attrib["spacing"]
output += ',' + style.attrib["angle"]
output += ',' + style.attrib["border_style"]
output += ',' + style.attrib["outline"]
output += ',' + style.attrib["shadow"]
output += ',' + style.attrib["alignment"]
output += ',' + style.attrib["margin_l"]
output += ',' + style.attrib["margin_r"]
output += ',' + style.attrib["margin_v"]
output += ',' + style.attrib["encoding"]
output += 'Style: ' + style.attrib['name']
output += ',' + style.attrib['font_name']
output += ',' + style.attrib['font_size']
output += ',' + style.attrib['primary_colour']
output += ',' + style.attrib['secondary_colour']
output += ',' + style.attrib['outline_colour']
output += ',' + style.attrib['back_colour']
output += ',' + ass_bool(style.attrib['bold'])
output += ',' + ass_bool(style.attrib['italic'])
output += ',' + ass_bool(style.attrib['underline'])
output += ',' + ass_bool(style.attrib['strikeout'])
output += ',' + style.attrib['scale_x']
output += ',' + style.attrib['scale_y']
output += ',' + style.attrib['spacing']
output += ',' + style.attrib['angle']
output += ',' + style.attrib['border_style']
output += ',' + style.attrib['outline']
output += ',' + style.attrib['shadow']
output += ',' + style.attrib['alignment']
output += ',' + style.attrib['margin_l']
output += ',' + style.attrib['margin_r']
output += ',' + style.attrib['margin_v']
output += ',' + style.attrib['encoding']
output += '\n'
output += """
@ -222,15 +222,15 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""
for event in sub_root.findall('./events/event'):
output += 'Dialogue: 0'
output += ',' + event.attrib["start"]
output += ',' + event.attrib["end"]
output += ',' + event.attrib["style"]
output += ',' + event.attrib["name"]
output += ',' + event.attrib["margin_l"]
output += ',' + event.attrib["margin_r"]
output += ',' + event.attrib["margin_v"]
output += ',' + event.attrib["effect"]
output += ',' + event.attrib["text"]
output += ',' + event.attrib['start']
output += ',' + event.attrib['end']
output += ',' + event.attrib['style']
output += ',' + event.attrib['name']
output += ',' + event.attrib['margin_l']
output += ',' + event.attrib['margin_r']
output += ',' + event.attrib['margin_v']
output += ',' + event.attrib['effect']
output += ',' + event.attrib['text']
output += '\n'
return output
@ -376,7 +376,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
IE_NAME = "crunchyroll:playlist"
IE_NAME = 'crunchyroll:playlist'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?!(?:news|anime-news|library|forum|launchcalendar|lineup|store|comics|freetrial|login))(?P<id>[\w\-]+))/?(?:\?|$)'
_TESTS = [{

View File

@ -122,10 +122,13 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
description = self._og_search_description(webpage) or self._html_search_meta(
'description', webpage, 'description')
view_count = str_to_int(self._search_regex(
[r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:(\d+)"',
r'video_views_count[^>]+>\s+([\d\.,]+)'],
webpage, 'view count', fatal=False))
view_count_str = self._search_regex(
(r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
webpage, 'view count', fatal=False)
if view_count_str:
view_count_str = re.sub(r'\s', '', view_count_str)
view_count = str_to_int(view_count_str)
comment_count = int_or_none(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
webpage, 'comment count', fatal=False))
@ -396,13 +399,13 @@ class DailymotionCloudIE(DailymotionBaseInfoExtractor):
}]
@classmethod
def _extract_dmcloud_url(self, webpage):
mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % self._VALID_EMBED_URL, webpage)
def _extract_dmcloud_url(cls, webpage):
mobj = re.search(r'<iframe[^>]+src=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL, webpage)
if mobj:
return mobj.group(1)
mobj = re.search(
r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % self._VALID_EMBED_URL,
r'<input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=[\'"](%s)[\'"]' % cls._VALID_EMBED_URL,
webpage)
if mobj:
return mobj.group(1)

View File

@ -87,7 +87,7 @@ class DRBonanzaIE(InfoExtractor):
formats = []
for file in info['Files']:
if info['Type'] == "Video":
if info['Type'] == 'Video':
if file['Type'] in video_types:
format = parse_filename_info(file['Location'])
format.update({
@ -101,10 +101,10 @@ class DRBonanzaIE(InfoExtractor):
if '/bonanza/' in rtmp_url:
format['play_path'] = rtmp_url.split('/bonanza/')[1]
formats.append(format)
elif file['Type'] == "Thumb":
elif file['Type'] == 'Thumb':
thumbnail = file['Location']
elif info['Type'] == "Audio":
if file['Type'] == "Audio":
elif info['Type'] == 'Audio':
if file['Type'] == 'Audio':
format = parse_filename_info(file['Location'])
format.update({
'url': file['Location'],
@ -112,7 +112,7 @@ class DRBonanzaIE(InfoExtractor):
'vcodec': 'none',
})
formats.append(format)
elif file['Type'] == "Thumb":
elif file['Type'] == 'Thumb':
thumbnail = file['Location']
description = '%s\n%s\n%s\n' % (

View File

@ -17,85 +17,85 @@ class EightTracksIE(InfoExtractor):
IE_NAME = '8tracks'
_VALID_URL = r'https?://8tracks\.com/(?P<user>[^/]+)/(?P<id>[^/#]+)(?:#.*)?$'
_TEST = {
"name": "EightTracks",
"url": "http://8tracks.com/ytdl/youtube-dl-test-tracks-a",
"info_dict": {
'name': 'EightTracks',
'url': 'http://8tracks.com/ytdl/youtube-dl-test-tracks-a',
'info_dict': {
'id': '1336550',
'display_id': 'youtube-dl-test-tracks-a',
"description": "test chars: \"'/\\ä↭",
"title": "youtube-dl test tracks \"'/\\ä↭<>",
'description': "test chars: \"'/\\ä↭",
'title': "youtube-dl test tracks \"'/\\ä↭<>",
},
"playlist": [
'playlist': [
{
"md5": "96ce57f24389fc8734ce47f4c1abcc55",
"info_dict": {
"id": "11885610",
"ext": "m4a",
"title": "youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': '96ce57f24389fc8734ce47f4c1abcc55',
'info_dict': {
'id': '11885610',
'ext': 'm4a',
'title': "youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "4ab26f05c1f7291ea460a3920be8021f",
"info_dict": {
"id": "11885608",
"ext": "m4a",
"title": "youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': '4ab26f05c1f7291ea460a3920be8021f',
'info_dict': {
'id': '11885608',
'ext': 'm4a',
'title': "youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "d30b5b5f74217410f4689605c35d1fd7",
"info_dict": {
"id": "11885679",
"ext": "m4a",
"title": "youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': 'd30b5b5f74217410f4689605c35d1fd7',
'info_dict': {
'id': '11885679',
'ext': 'm4a',
'title': "youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "4eb0a669317cd725f6bbd336a29f923a",
"info_dict": {
"id": "11885680",
"ext": "m4a",
"title": "youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': '4eb0a669317cd725f6bbd336a29f923a',
'info_dict': {
'id': '11885680',
'ext': 'm4a',
'title': "youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "1893e872e263a2705558d1d319ad19e8",
"info_dict": {
"id": "11885682",
"ext": "m4a",
"title": "PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': '1893e872e263a2705558d1d319ad19e8',
'info_dict': {
'id': '11885682',
'ext': 'm4a',
'title': "PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "b673c46f47a216ab1741ae8836af5899",
"info_dict": {
"id": "11885683",
"ext": "m4a",
"title": "PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': 'b673c46f47a216ab1741ae8836af5899',
'info_dict': {
'id': '11885683',
'ext': 'm4a',
'title': "PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "1d74534e95df54986da7f5abf7d842b7",
"info_dict": {
"id": "11885684",
"ext": "m4a",
"title": "phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': '1d74534e95df54986da7f5abf7d842b7',
'info_dict': {
'id': '11885684',
'ext': 'm4a',
'title': "phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
{
"md5": "f081f47af8f6ae782ed131d38b9cd1c0",
"info_dict": {
"id": "11885685",
"ext": "m4a",
"title": "phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
"uploader_id": "ytdl"
'md5': 'f081f47af8f6ae782ed131d38b9cd1c0',
'info_dict': {
'id': '11885685',
'ext': 'm4a',
'title': "phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
}
]

View File

@ -72,7 +72,7 @@ class EllenTVClipsIE(InfoExtractor):
def _extract_playlist(self, webpage):
json_string = self._search_regex(r'playerView.addClips\(\[\{(.*?)\}\]\);', webpage, 'json')
try:
return json.loads("[{" + json_string + "}]")
return json.loads('[{' + json_string + '}]')
except ValueError as ve:
raise ExtractorError('Failed to download JSON', cause=ve)

View File

@ -14,14 +14,14 @@ class EveryonesMixtapeIE(InfoExtractor):
_TESTS = [{
'url': 'http://everyonesmixtape.com/#/mix/m7m0jJAbMQi/5',
"info_dict": {
'info_dict': {
'id': '5bfseWNmlds',
'ext': 'mp4',
"title": "Passion Pit - \"Sleepyhead\" (Official Music Video)",
"uploader": "FKR.TV",
"uploader_id": "frenchkissrecords",
"description": "Music video for \"Sleepyhead\" from Passion Pit's debut EP Chunk Of Change.\nBuy on iTunes: https://itunes.apple.com/us/album/chunk-of-change-ep/id300087641\n\nDirected by The Wilderness.\n\nhttp://www.passionpitmusic.com\nhttp://www.frenchkissrecords.com",
"upload_date": "20081015"
'title': "Passion Pit - \"Sleepyhead\" (Official Music Video)",
'uploader': 'FKR.TV',
'uploader_id': 'frenchkissrecords',
'description': "Music video for \"Sleepyhead\" from Passion Pit's debut EP Chunk Of Change.\nBuy on iTunes: https://itunes.apple.com/us/album/chunk-of-change-ep/id300087641\n\nDirected by The Wilderness.\n\nhttp://www.passionpitmusic.com\nhttp://www.frenchkissrecords.com",
'upload_date': '20081015'
},
'params': {
'skip_download': True, # This is simply YouTube

View File

@ -41,7 +41,7 @@ class ExfmIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
song_id = mobj.group('id')
info_url = "http://ex.fm/api/v3/song/%s" % song_id
info_url = 'http://ex.fm/api/v3/song/%s' % song_id
info = self._download_json(info_url, song_id)['song']
song_url = info['url']
if re.match(self._SOUNDCLOUD_URL, song_url) is not None:

View File

@ -186,7 +186,7 @@ class FacebookIE(InfoExtractor):
if not video_data:
server_js_data = self._parse_json(self._search_regex(
r'handleServerJS\(({.+})\);', webpage, 'server js data'), video_id)
for item in server_js_data['instances']:
for item in server_js_data.get('instances', []):
if item[1][0] == 'VideoConfig':
video_data = video_data_list2dict(item[2][0]['videoData'])
break

View File

@ -52,7 +52,7 @@ class FazIE(InfoExtractor):
formats = []
for pref, code in enumerate(['LOW', 'HIGH', 'HQ']):
encoding = xpath_element(encodings, code)
if encoding:
if encoding is not None:
encoding_url = xpath_text(encoding, 'FILENAME')
if encoding_url:
formats.append({

View File

@ -87,7 +87,7 @@ class FC2IE(InfoExtractor):
mimi = hashlib.md5((video_id + '_gGddgPfeaf_gzyr').encode('utf-8')).hexdigest()
info_url = (
"http://video.fc2.com/ginfo.php?mimi={1:s}&href={2:s}&v={0:s}&fversion=WIN%2011%2C6%2C602%2C180&from=2&otag=0&upid={0:s}&tk=null&".
'http://video.fc2.com/ginfo.php?mimi={1:s}&href={2:s}&v={0:s}&fversion=WIN%2011%2C6%2C602%2C180&from=2&otag=0&upid={0:s}&tk=null&'.
format(video_id, mimi, compat_urllib_request.quote(refer, safe=b'').replace('.', '%2E')))
info_webpage = self._download_webpage(

View File

@ -9,6 +9,7 @@ class FOXIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.fox.com/watch/255180355939/7684182528',
'md5': 'ebd296fcc41dd4b19f8115d8461a3165',
'info_dict': {
'id': '255180355939',
'ext': 'mp4',
@ -17,10 +18,6 @@ class FOXIE(InfoExtractor):
'duration': 129,
},
'add_ie': ['ThePlatform'],
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
@ -29,7 +26,7 @@ class FOXIE(InfoExtractor):
release_url = self._parse_json(self._search_regex(
r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
video_id)['release_url'] + '&manifest=m3u'
video_id)['release_url'] + '&switch=http'
return {
'_type': 'url_transparent',

View File

@ -10,7 +10,7 @@ class FranceInterIE(InfoExtractor):
_TEST = {
'url': 'http://www.franceinter.fr/player/reecouter?play=793962',
'md5': '4764932e466e6f6c79c317d2e74f6884',
"info_dict": {
'info_dict': {
'id': '793962',
'ext': 'mp3',
'title': 'LHistoire dans les jeux vidéo',

View File

@ -289,7 +289,7 @@ class FranceTVIE(FranceTVBaseInfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id, catalogue = self._html_search_regex(
r'href="http://videos?\.francetv\.fr/video/([^@]+@[^"]+)"',
r'(?:href=|player\.setVideo\(\s*)"http://videos?\.francetv\.fr/video/([^@]+@[^"]+)"',
webpage, 'video ID').split('@')
return self._extract_video(video_id, catalogue)

View File

@ -12,8 +12,8 @@ class FreeVideoIE(InfoExtractor):
'info_dict': {
'id': 'vysukany-zadecek-22033',
'ext': 'mp4',
"title": "vysukany-zadecek-22033",
"age_limit": 18,
'title': 'vysukany-zadecek-22033',
'age_limit': 18,
},
'skip': 'Blocked outside .cz',
}

View File

@ -224,6 +224,20 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
# MPD from http://dash-mse-test.appspot.com/media.html
{
'url': 'http://yt-dash-mse-test.commondatastorage.googleapis.com/media/car-20120827-manifest.mpd',
'md5': '4b57baab2e30d6eb3a6a09f0ba57ef53',
'info_dict': {
'id': 'car-20120827-manifest',
'ext': 'mp4',
'title': 'car-20120827-manifest',
'formats': 'mincount:9',
},
'params': {
'format': 'bestvideo',
},
},
# google redirect
{
'url': 'http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCUQtwIwAA&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DcmQHVoWB5FY&ei=F-sNU-LLCaXk4QT52ICQBQ&usg=AFQjCNEw4hL29zgOohLXvpJ-Bdh2bils1Q&bvm=bv.61965928,d.bGE',
@ -1302,7 +1316,8 @@ class GenericIE(InfoExtractor):
return {
'id': video_id,
'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
'formats': self._parse_mpd_formats(doc, video_id),
'formats': self._parse_mpd_formats(
doc, video_id, mpd_base_url=url.rpartition('/')[0]),
}
except compat_xml_parse_error:
pass
@ -1413,7 +1428,7 @@ class GenericIE(InfoExtractor):
# Look for embedded Dailymotion player
matches = re.findall(
r'<(?:embed|iframe)[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
if matches:
return _playlist_from_matches(
matches, lambda m: unescapeHTML(m[1]))
@ -1558,6 +1573,11 @@ class GenericIE(InfoExtractor):
if mobj is not None:
return self.url_result(mobj.group('url'), 'VK')
# Look for embedded Odnoklassniki player
mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:odnoklassniki|ok)\.ru/videoembed/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Odnoklassniki')
# Look for embedded ivi player
mobj = re.search(r'<embed[^>]+?src=(["\'])(?P<url>https?://(?:www\.)?ivi\.ru/video/player.+?)\1', webpage)
if mobj is not None:

View File

@ -65,7 +65,7 @@ class GloboIE(InfoExtractor):
'only_matching': True,
}]
class MD5:
class MD5(object):
HEX_FORMAT_LOWERCASE = 0
HEX_FORMAT_UPPERCASE = 1
BASE64_PAD_CHARACTER_DEFAULT_COMPLIANCE = ''

View File

@ -82,7 +82,7 @@ class GoogleDriveIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'duration': duration,
'formats': formats,
}

View File

@ -11,8 +11,8 @@ class HentaiStigmaIE(InfoExtractor):
'info_dict': {
'id': 'inyouchuu-etsu-bonus',
'ext': 'mp4',
"title": "Inyouchuu Etsu Bonus",
"age_limit": 18,
'title': 'Inyouchuu Etsu Bonus',
'age_limit': 18,
}
}

View File

@ -10,8 +10,8 @@ from ..utils import (
class HotStarIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/.*?[/-](?P<id>\d{10})'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})'
_TESTS = [{
'url': 'http://www.hotstar.com/on-air-with-aib--english-1000076273',
'info_dict': {
'id': '1000076273',
@ -26,7 +26,13 @@ class HotStarIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}
}, {
'url': 'http://www.hotstar.com/sports/cricket/rajitha-sizzles-on-debut-with-329/2001477583',
'only_matching': True,
}, {
'url': 'http://www.hotstar.com/1000000515',
'only_matching': True,
}]
_GET_CONTENT_TEMPLATE = 'http://account.hotstar.com/AVS/besc?action=GetAggregatedContentDetails&channel=PCTV&contentId=%s'
_GET_CDN_TEMPLATE = 'http://getcdn.hotstar.com/AVS/besc?action=GetCDN&asJson=Y&channel=%s&id=%s&type=%s'

View File

@ -2,14 +2,194 @@
from __future__ import unicode_literals
import hashlib
import itertools
import math
import os
import random
import re
import time
import uuid
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..utils import ExtractorError
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
ohdave_rsa_encrypt,
remove_start,
sanitized_Request,
urlencode_postdata,
url_basename,
)
def md5_text(text):
return hashlib.md5(text.encode('utf-8')).hexdigest()
class IqiyiSDK(object):
def __init__(self, target, ip, timestamp):
self.target = target
self.ip = ip
self.timestamp = timestamp
@staticmethod
def split_sum(data):
return compat_str(sum(map(lambda p: int(p, 16), list(data))))
@staticmethod
def digit_sum(num):
if isinstance(num, int):
num = compat_str(num)
return compat_str(sum(map(int, num)))
def even_odd(self):
even = self.digit_sum(compat_str(self.timestamp)[::2])
odd = self.digit_sum(compat_str(self.timestamp)[1::2])
return even, odd
def preprocess(self, chunksize):
self.target = md5_text(self.target)
chunks = []
for i in range(32 // chunksize):
chunks.append(self.target[chunksize * i:chunksize * (i + 1)])
if 32 % chunksize:
chunks.append(self.target[32 - 32 % chunksize:])
return chunks, list(map(int, self.ip.split('.')))
def mod(self, modulus):
chunks, ip = self.preprocess(32)
self.target = chunks[0] + ''.join(map(lambda p: compat_str(p % modulus), ip))
def split(self, chunksize):
modulus_map = {
4: 256,
5: 10,
8: 100,
}
chunks, ip = self.preprocess(chunksize)
ret = ''
for i in range(len(chunks)):
ip_part = compat_str(ip[i] % modulus_map[chunksize]) if i < 4 else ''
if chunksize == 8:
ret += ip_part + chunks[i]
else:
ret += chunks[i] + ip_part
self.target = ret
def handle_input16(self):
self.target = md5_text(self.target)
self.target = self.split_sum(self.target[:16]) + self.target + self.split_sum(self.target[16:])
def handle_input8(self):
self.target = md5_text(self.target)
ret = ''
for i in range(4):
part = self.target[8 * i:8 * (i + 1)]
ret += self.split_sum(part) + part
self.target = ret
def handleSum(self):
self.target = md5_text(self.target)
self.target = self.split_sum(self.target) + self.target
def date(self, scheme):
self.target = md5_text(self.target)
d = time.localtime(self.timestamp)
strings = {
'y': compat_str(d.tm_year),
'm': '%02d' % d.tm_mon,
'd': '%02d' % d.tm_mday,
}
self.target += ''.join(map(lambda c: strings[c], list(scheme)))
def split_time_even_odd(self):
even, odd = self.even_odd()
self.target = odd + md5_text(self.target) + even
def split_time_odd_even(self):
even, odd = self.even_odd()
self.target = even + md5_text(self.target) + odd
def split_ip_time_sum(self):
chunks, ip = self.preprocess(32)
self.target = compat_str(sum(ip)) + chunks[0] + self.digit_sum(self.timestamp)
def split_time_ip_sum(self):
chunks, ip = self.preprocess(32)
self.target = self.digit_sum(self.timestamp) + chunks[0] + compat_str(sum(ip))
class IqiyiSDKInterpreter(object):
BASE62_TABLE = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
def __init__(self, sdk_code):
self.sdk_code = sdk_code
@classmethod
def base62(cls, num):
if num == 0:
return '0'
ret = ''
while num:
ret = cls.BASE62_TABLE[num % 62] + ret
num = num // 62
return ret
def decode_eval_codes(self):
self.sdk_code = self.sdk_code[5:-3]
mobj = re.search(
r"'([^']+)',62,(\d+),'([^']+)'\.split\('\|'\),[^,]+,{}",
self.sdk_code)
obfucasted_code, count, symbols = mobj.groups()
count = int(count)
symbols = symbols.split('|')
symbol_table = {}
while count:
count -= 1
b62count = self.base62(count)
symbol_table[b62count] = symbols[count] or b62count
self.sdk_code = re.sub(
r'\b(\w+)\b', lambda mobj: symbol_table[mobj.group(0)],
obfucasted_code)
def run(self, target, ip, timestamp):
self.decode_eval_codes()
functions = re.findall(r'input=([a-zA-Z0-9]+)\(input', self.sdk_code)
sdk = IqiyiSDK(target, ip, timestamp)
other_functions = {
'handleSum': sdk.handleSum,
'handleInput8': sdk.handle_input8,
'handleInput16': sdk.handle_input16,
'splitTimeEvenOdd': sdk.split_time_even_odd,
'splitTimeOddEven': sdk.split_time_odd_even,
'splitIpTimeSum': sdk.split_ip_time_sum,
'splitTimeIpSum': sdk.split_time_ip_sum,
}
for function in functions:
if re.match(r'mod\d+', function):
sdk.mod(int(function[3:]))
elif re.match(r'date[ymd]{3}', function):
sdk.date(function[4:])
elif re.match(r'split\d+', function):
sdk.split(int(function[5:]))
elif function in other_functions:
other_functions[function]()
else:
raise ExtractorError('Unknown funcion %s' % function)
return sdk.target
class IqiyiIE(InfoExtractor):
@ -18,6 +198,8 @@ class IqiyiIE(InfoExtractor):
_VALID_URL = r'http://(?:[^.]+\.)?iqiyi\.com/.+\.html'
_NETRC_MACHINE = 'iqiyi'
_TESTS = [{
'url': 'http://www.iqiyi.com/v_19rrojlavg.html',
'md5': '2cb594dc2781e6c941a110d8f358118b',
@ -93,6 +275,35 @@ class IqiyiIE(InfoExtractor):
}, {
'url': 'http://yule.iqiyi.com/pcb.html',
'only_matching': True,
}, {
# VIP-only video. The first 2 parts (6 minutes) are available without login
# MD5 sums omitted as values are different on Travis CI and my machine
'url': 'http://www.iqiyi.com/v_19rrny4w8w.html',
'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1',
'title': '泰坦尼克号',
},
'playlist': [{
'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1_part1',
'ext': 'f4v',
'title': '泰坦尼克号',
},
}, {
'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1_part2',
'ext': 'f4v',
'title': '泰坦尼克号',
},
}],
'expected_warnings': ['Needs a VIP account for full video'],
}, {
'url': 'http://www.iqiyi.com/a_19rrhb8ce1.html',
'info_dict': {
'id': '202918101',
'title': '灌篮高手 国语版',
},
'playlist_count': 101,
}]
_FORMATS_MAP = [
@ -104,11 +315,98 @@ class IqiyiIE(InfoExtractor):
('10', 'h1'),
]
@staticmethod
def md5_text(text):
return hashlib.md5(text.encode('utf-8')).hexdigest()
def _real_initialize(self):
self._login()
def construct_video_urls(self, data, video_id, _uuid):
@staticmethod
def _rsa_fun(data):
# public key extracted from http://static.iqiyi.com/js/qiyiV2/20160129180840/jobs/i18n/i18nIndex.js
N = 0xab86b6371b5318aaa1d3c9e612a9f1264f372323c8c0f19875b5fc3b3fd3afcc1e5bec527aa94bfa85bffc157e4245aebda05389a5357b75115ac94f074aefcd
e = 65537
return ohdave_rsa_encrypt(data, e, N)
def _login(self):
(username, password) = self._get_login_info()
# No authentication to be performed
if not username:
return True
data = self._download_json(
'http://kylin.iqiyi.com/get_token', None,
note='Get token for logging', errnote='Unable to get token for logging')
sdk = data['sdk']
timestamp = int(time.time())
target = '/apis/reglogin/login.action?lang=zh_TW&area_code=null&email=%s&passwd=%s&agenttype=1&from=undefined&keeplogin=0&piccode=&fromurl=&_pos=1' % (
username, self._rsa_fun(password.encode('utf-8')))
interp = IqiyiSDKInterpreter(sdk)
sign = interp.run(target, data['ip'], timestamp)
validation_params = {
'target': target,
'server': 'BEA3AA1908656AABCCFF76582C4C6660',
'token': data['token'],
'bird_src': 'f8d91d57af224da7893dd397d52d811a',
'sign': sign,
'bird_t': timestamp,
}
validation_result = self._download_json(
'http://kylin.iqiyi.com/validate?' + compat_urllib_parse.urlencode(validation_params), None,
note='Validate credentials', errnote='Unable to validate credentials')
MSG_MAP = {
'P00107': 'please login via the web interface and enter the CAPTCHA code',
'P00117': 'bad username or password',
}
code = validation_result['code']
if code != 'A00000':
msg = MSG_MAP.get(code)
if not msg:
msg = 'error %s' % code
if validation_result.get('msg'):
msg += ': ' + validation_result['msg']
self._downloader.report_warning('unable to log in: ' + msg)
return False
return True
def _authenticate_vip_video(self, api_video_url, video_id, tvid, _uuid, do_report_warning):
auth_params = {
# version and platform hard-coded in com/qiyi/player/core/model/remote/AuthenticationRemote.as
'version': '2.0',
'platform': 'b6c13e26323c537d',
'aid': tvid,
'tvid': tvid,
'uid': '',
'deviceId': _uuid,
'playType': 'main', # XXX: always main?
'filename': os.path.splitext(url_basename(api_video_url))[0],
}
qd_items = compat_parse_qs(compat_urllib_parse_urlparse(api_video_url).query)
for key, val in qd_items.items():
auth_params[key] = val[0]
auth_req = sanitized_Request(
'http://api.vip.iqiyi.com/services/ckn.action',
urlencode_postdata(auth_params))
# iQiyi server throws HTTP 405 error without the following header
auth_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
auth_result = self._download_json(
auth_req, video_id,
note='Downloading video authentication JSON',
errnote='Unable to download video authentication JSON')
if auth_result['code'] == 'Q00506': # requires a VIP account
if do_report_warning:
self.report_warning('Needs a VIP account for full video')
return False
return auth_result
def construct_video_urls(self, data, video_id, _uuid, tvid):
def do_xor(x, y):
a = y % 3
if a == 1:
@ -134,9 +432,10 @@ class IqiyiIE(InfoExtractor):
note='Download path key of segment %d for format %s' % (segment_index + 1, format_id)
)['t']
t = str(int(math.floor(int(tm) / (600.0))))
return self.md5_text(t + mg + x)
return md5_text(t + mg + x)
video_urls_dict = {}
need_vip_warning_report = True
for format_item in data['vp']['tkl'][0]['vs']:
if 0 < int(format_item['bid']) <= 10:
format_id = self.get_format(format_item['bid'])
@ -155,11 +454,13 @@ class IqiyiIE(InfoExtractor):
vl = segment['l']
if not vl.startswith('/'):
vl = get_encode_code(vl)
key = get_path_key(
vl.split('/')[-1].split('.')[0], format_id, segment_index)
is_vip_video = '/vip/' in vl
filesize = segment['b']
base_url = data['vp']['du'].split('/')
base_url.insert(-1, key)
if not is_vip_video:
key = get_path_key(
vl.split('/')[-1].split('.')[0], format_id, segment_index)
base_url.insert(-1, key)
base_url = '/'.join(base_url)
param = {
'su': _uuid,
@ -170,8 +471,23 @@ class IqiyiIE(InfoExtractor):
'ct': '',
'tn': str(int(time.time()))
}
api_video_url = base_url + vl + '?' + \
compat_urllib_parse.urlencode(param)
api_video_url = base_url + vl
if is_vip_video:
api_video_url = api_video_url.replace('.f4v', '.hml')
auth_result = self._authenticate_vip_video(
api_video_url, video_id, tvid, _uuid, need_vip_warning_report)
if auth_result is False:
need_vip_warning_report = False
break
param.update({
't': auth_result['data']['t'],
# cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as
'cid': 'afbe8fd3d73448c9',
'vid': video_id,
'QY00001': auth_result['data']['u'],
})
api_video_url += '?' if '?' not in api_video_url else '&'
api_video_url += compat_urllib_parse.urlencode(param)
js = self._download_json(
api_video_url, video_id,
note='Download video info of segment %d for format %s' % (segment_index + 1, format_id))
@ -195,16 +511,17 @@ class IqiyiIE(InfoExtractor):
tail = tm + tvid
param = {
'key': 'fvip',
'src': self.md5_text('youtube-dl'),
'src': md5_text('youtube-dl'),
'tvId': tvid,
'vid': video_id,
'vinfo': 1,
'tm': tm,
'enc': self.md5_text(enc_key + tail),
'enc': md5_text(enc_key + tail),
'qyid': _uuid,
'tn': random.random(),
'um': 0,
'authkey': self.md5_text(self.md5_text('') + tail),
'authkey': md5_text(md5_text('') + tail),
'k_tag': 1,
}
api_url = 'http://cache.video.qiyi.com/vms' + '?' + \
@ -218,9 +535,49 @@ class IqiyiIE(InfoExtractor):
enc_key = '6ab6d0280511493ba85594779759d4ed'
return enc_key
def _extract_playlist(self, webpage):
PAGE_SIZE = 50
links = re.findall(
r'<a[^>]+class="site-piclist_pic_link"[^>]+href="(http://www\.iqiyi\.com/.+\.html)"',
webpage)
if not links:
return
album_id = self._search_regex(
r'albumId\s*:\s*(\d+),', webpage, 'album ID')
album_title = self._search_regex(
r'data-share-title="([^"]+)"', webpage, 'album title', fatal=False)
entries = list(map(self.url_result, links))
# Start from 2 because links in the first page are already on webpage
for page_num in itertools.count(2):
pagelist_page = self._download_webpage(
'http://cache.video.qiyi.com/jp/avlist/%s/%d/%d/' % (album_id, page_num, PAGE_SIZE),
album_id,
note='Download playlist page %d' % page_num,
errnote='Failed to download playlist page %d' % page_num)
pagelist = self._parse_json(
remove_start(pagelist_page, 'var tvInfoJs='), album_id)
vlist = pagelist['data']['vlist']
for item in vlist:
entries.append(self.url_result(item['vurl']))
if len(vlist) < PAGE_SIZE:
break
return self.playlist_result(entries, album_id, album_title)
def _real_extract(self, url):
webpage = self._download_webpage(
url, 'temp_id', note='download video page')
# There's no simple way to determine whether an URL is a playlist or not
# So detect it
playlist_result = self._extract_playlist(webpage)
if playlist_result:
return playlist_result
tvid = self._search_regex(
r'data-player-tvid\s*=\s*[\'"](\d+)', webpage, 'tvid')
video_id = self._search_regex(
@ -236,16 +593,13 @@ class IqiyiIE(InfoExtractor):
if raw_data['code'] != 'A000000':
raise ExtractorError('Unable to load data. Error code: ' + raw_data['code'])
if not raw_data['data']['vp']['tkl']:
raise ExtractorError('No support iQiqy VIP video')
data = raw_data['data']
title = data['vi']['vn']
# generate video_urls_dict
video_urls_dict = self.construct_video_urls(
data, video_id, _uuid)
data, video_id, _uuid, tvid)
# construct info
entries = []

View File

@ -28,7 +28,7 @@ class KankanIE(InfoExtractor):
title = self._search_regex(r'(?:G_TITLE=|G_MOVIE_TITLE = )[\'"](.+?)[\'"]', webpage, 'video title')
surls = re.search(r'surls:\[\'.+?\'\]|lurl:\'.+?\.flv\'', webpage).group(0)
gcids = re.findall(r"http://.+?/.+?/(.+?)/", surls)
gcids = re.findall(r'http://.+?/.+?/(.+?)/', surls)
gcid = gcids[-1]
info_url = 'http://p2s.cl.kankan.com/getCdnresource_flv?gcid=%s' % gcid

View File

@ -1,86 +1,125 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import random
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
sanitized_Request,
unified_strdate,
urlencode_postdata,
xpath_element,
xpath_text,
)
class Laola1TvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/(?P<lang>[a-z]+)-(?P<portal>[a-z]+)/.*?/(?P<id>[0-9]+)\.html'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?laola1\.tv/(?P<lang>[a-z]+)-(?P<portal>[a-z]+)/[^/]+/(?P<slug>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie/227883.html',
'info_dict': {
'id': '227883',
'ext': 'mp4',
'display_id': 'straubing-tigers-koelner-haie',
'ext': 'flv',
'title': 'Straubing Tigers - Kölner Haie',
'categories': ['Eishockey'],
'upload_date': '20140912',
'is_live': False,
'categories': ['Eishockey'],
},
'params': {
'skip_download': True,
}
}
}, {
'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie',
'info_dict': {
'id': '464602',
'display_id': 'straubing-tigers-koelner-haie',
'ext': 'flv',
'title': 'Straubing Tigers - Kölner Haie',
'upload_date': '20160129',
'is_live': False,
'categories': ['Eishockey'],
},
'params': {
'skip_download': True,
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('slug')
lang = mobj.group('lang')
portal = mobj.group('portal')
webpage = self._download_webpage(url, video_id)
iframe_url = self._search_regex(
r'<iframe[^>]*?class="main_tv_player"[^>]*?src="([^"]+)"',
webpage, 'iframe URL')
webpage = self._download_webpage(url, display_id)
iframe = self._download_webpage(
iframe_url, video_id, note='Downloading iframe')
flashvars_m = re.findall(
r'flashvars\.([_a-zA-Z0-9]+)\s*=\s*"([^"]*)";', iframe)
flashvars = dict((m[0], m[1]) for m in flashvars_m)
iframe_url = self._search_regex(
r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
webpage, 'iframe url')
video_id = self._search_regex(
r'videoid=(\d+)', iframe_url, 'video id')
iframe = self._download_webpage(compat_urlparse.urljoin(
url, iframe_url), display_id, 'Downloading iframe')
partner_id = self._search_regex(
r'partnerid\s*:\s*"([^"]+)"', iframe, 'partner id')
r'partnerid\s*:\s*(["\'])(?P<partner_id>.+?)\1',
iframe, 'partner id', group='partner_id')
xml_url = ('http://www.laola1.tv/server/hd_video.php?' +
'play=%s&partner=%s&portal=%s&v5ident=&lang=%s' % (
video_id, partner_id, portal, lang))
hd_doc = self._download_xml(xml_url, video_id)
hd_doc = self._download_xml(
'http://www.laola1.tv/server/hd_video.php?%s'
% compat_urllib_parse.urlencode({
'play': video_id,
'partner': partner_id,
'portal': portal,
'lang': lang,
'v5ident': '',
}), display_id)
title = xpath_text(hd_doc, './/video/title', fatal=True)
flash_url = xpath_text(hd_doc, './/video/url', fatal=True)
uploader = xpath_text(hd_doc, './/video/meta_organistation')
is_live = xpath_text(hd_doc, './/video/islive') == 'true'
_v = lambda x, **k: xpath_text(hd_doc, './/video/' + x, **k)
title = _v('title', fatal=True)
categories = xpath_text(hd_doc, './/video/meta_sports')
if categories:
categories = categories.split(',')
req = sanitized_Request(
'https://club.laola1.tv/sp/laola1/api/v3/user/session/premium/player/stream-access?%s' %
compat_urllib_parse.urlencode({
'videoId': video_id,
'target': '2',
'label': 'laola1tv',
'area': _v('area'),
}),
urlencode_postdata(
dict((i, v) for i, v in enumerate(_v('req_liga_abos').split(',')))))
ident = random.randint(10000000, 99999999)
token_url = '%s&ident=%s&klub=0&unikey=0&timestamp=%s&auth=%s' % (
flash_url, ident, flashvars['timestamp'], flashvars['auth'])
token_url = self._download_json(req, display_id)['data']['stream-access'][0]
token_doc = self._download_xml(token_url, display_id, 'Downloading token')
token_doc = self._download_xml(
token_url, video_id, note='Downloading token')
token_attrib = token_doc.find('.//token').attrib
if token_attrib.get('auth') in ('blocked', 'restricted'):
token_attrib = xpath_element(token_doc, './/token').attrib
token_auth = token_attrib['auth']
if token_auth in ('blocked', 'restricted', 'error'):
raise ExtractorError(
'Token error: %s' % token_attrib.get('comment'), expected=True)
'Token error: %s' % token_attrib['comment'], expected=True)
video_url = '%s?hdnea=%s&hdcore=3.2.0' % (
token_attrib['url'], token_attrib['auth'])
formats = self._extract_f4m_formats(
'%s?hdnea=%s&hdcore=3.2.0' % (token_attrib['url'], token_auth),
video_id, f4m_id='hds')
categories_str = _v('meta_sports')
categories = categories_str.split(',') if categories_str else []
return {
'id': video_id,
'is_live': is_live,
'display_id': display_id,
'title': title,
'url': video_url,
'uploader': uploader,
'upload_date': unified_strdate(_v('time_date')),
'uploader': _v('meta_organisation'),
'categories': categories,
'ext': 'mp4',
'is_live': _v('islive') == 'true',
'formats': formats,
}

View File

@ -47,7 +47,7 @@ class LiveLeakIE(InfoExtractor):
'info_dict': {
'id': '801_1409392012',
'ext': 'mp4',
'description': "Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.",
'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
'uploader': 'bony333',
'title': 'Crazy Hungarian tourist films close call waterspout in Croatia'
}

View File

@ -4,6 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
remove_end,
)
class MailRuIE(InfoExtractor):
@ -34,14 +38,30 @@ class MailRuIE(InfoExtractor):
'id': '46843144_1263',
'ext': 'mp4',
'title': 'Samsung Galaxy S5 Hammer Smash Fail Battery Explosion',
'timestamp': 1397217632,
'upload_date': '20140411',
'uploader': 'hitech',
'timestamp': 1397039888,
'upload_date': '20140409',
'uploader': 'hitech@corp.mail.ru',
'uploader_id': 'hitech@corp.mail.ru',
'duration': 245,
},
'skip': 'Not accessible from Travis CI server',
},
{
# only available via metaUrl API
'url': 'http://my.mail.ru/mail/720pizle/video/_myvideo/502.html',
'md5': '3b26d2491c6949d031a32b96bd97c096',
'info_dict': {
'id': '56664382_502',
'ext': 'mp4',
'title': ':8336',
'timestamp': 1449094163,
'upload_date': '20151202',
'uploader': '720pizle@mail.ru',
'uploader_id': '720pizle@mail.ru',
'duration': 6001,
},
'skip': 'Not accessible from Travis CI server',
}
]
def _real_extract(self, url):
@ -51,32 +71,55 @@ class MailRuIE(InfoExtractor):
if not video_id:
video_id = mobj.group('idv2prefix') + mobj.group('idv2suffix')
video_data = self._download_json(
'http://api.video.mail.ru/videos/%s.json?new=1' % video_id, video_id, 'Downloading video JSON')
webpage = self._download_webpage(url, video_id)
author = video_data['author']
uploader = author['name']
uploader_id = author.get('id') or author.get('email')
view_count = video_data.get('views_count')
video_data = None
page_config = self._parse_json(self._search_regex(
r'(?s)<script[^>]+class="sp-video__page-config"[^>]*>(.+?)</script>',
webpage, 'page config', default='{}'), video_id, fatal=False)
if page_config:
meta_url = page_config.get('metaUrl') or page_config.get('video', {}).get('metaUrl')
if meta_url:
video_data = self._download_json(
meta_url, video_id, 'Downloading video meta JSON', fatal=False)
# Fallback old approach
if not video_data:
video_data = self._download_json(
'http://api.video.mail.ru/videos/%s.json?new=1' % video_id,
video_id, 'Downloading video JSON')
formats = []
for f in video_data['videos']:
video_url = f.get('url')
if not video_url:
continue
format_id = f.get('key')
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', format_id, 'height', default=None)) if format_id else None
formats.append({
'url': video_url,
'format_id': format_id,
'height': height,
})
self._sort_formats(formats)
meta_data = video_data['meta']
content_id = '%s_%s' % (
meta_data.get('accId', ''), meta_data['itemId'])
title = meta_data['title']
if title.endswith('.mp4'):
title = title[:-4]
thumbnail = meta_data['poster']
duration = meta_data['duration']
timestamp = meta_data['timestamp']
title = remove_end(meta_data['title'], '.mp4')
formats = [
{
'url': video['url'],
'format_id': video['key'],
'height': int(video['key'].rstrip('p'))
} for video in video_data['videos']
]
self._sort_formats(formats)
author = video_data.get('author')
uploader = author.get('name')
uploader_id = author.get('id') or author.get('email')
view_count = int_or_none(video_data.get('viewsCount') or video_data.get('views_count'))
acc_id = meta_data.get('accId')
item_id = meta_data.get('itemId')
content_id = '%s_%s' % (acc_id, item_id) if acc_id and item_id else video_id
thumbnail = meta_data.get('poster')
duration = int_or_none(meta_data.get('duration'))
timestamp = int_or_none(meta_data.get('timestamp'))
return {
'id': content_id,

View File

@ -38,7 +38,7 @@ class MofosexIE(InfoExtractor):
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = "-".join(format)
format = '-'.join(format)
age_limit = self._rta_search(webpage)

View File

@ -11,6 +11,7 @@ from ..utils import (
ExtractorError,
find_xpath_attr,
fix_xml_ampersands,
float_or_none,
HEADRequest,
sanitized_Request,
unescapeHTML,
@ -110,7 +111,8 @@ class MTVServicesInfoExtractor(InfoExtractor):
uri = itemdoc.find('guid').text
video_id = self._id_from_uri(uri)
self.report_extraction(video_id)
mediagen_url = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content'))).attrib['url']
content_el = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content')))
mediagen_url = content_el.attrib['url']
# Remove the templates, like &device={device}
mediagen_url = re.sub(r'&[^=]*?={.*?}(?=(&|$))', '', mediagen_url)
if 'acceptMethods' not in mediagen_url:
@ -165,6 +167,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
'id': video_id,
'thumbnail': self._get_thumbnail_url(uri, itemdoc),
'description': description,
'duration': float_or_none(content_el.attrib.get('duration')),
}
def _get_feed_query(self, uri):

View File

@ -18,8 +18,8 @@ class MySpassIE(InfoExtractor):
'info_dict': {
'id': '11741',
'ext': 'mp4',
"description": "Wer kann in die Fu\u00dfstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?",
"title": "Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2",
'description': 'Wer kann in die Fu\u00dfstapfen von Wolfgang Kubicki treten und die Mehrheit der Zuschauer hinter sich versammeln? Wird vielleicht sogar die Absolute Mehrheit geknackt und der Jackpot von 200.000 Euro mit nach Hause genommen?',
'title': 'Absolute Mehrheit vom 17.02.2013 - Die Highlights, Teil 2',
},
}

View File

@ -19,6 +19,7 @@ from ..utils import (
class MyVideoIE(InfoExtractor):
_WORKING = False
_VALID_URL = r'http://(?:www\.)?myvideo\.de/(?:[^/]+/)?watch/(?P<id>[0-9]+)/[^?/]+.*'
IE_NAME = 'myvideo'
_TEST = {

View File

@ -57,7 +57,7 @@ class NBCIE(InfoExtractor):
{
# This video has expired but with an escaped embedURL
'url': 'http://www.nbc.com/parenthood/episode-guide/season-5/just-like-at-home/515',
'skip': 'Expired'
'only_matching': True,
}
]

View File

@ -18,14 +18,14 @@ class NerdCubedFeedIE(InfoExtractor):
}
def _real_extract(self, url):
feed = self._download_json(url, url, "Downloading NerdCubed JSON feed")
feed = self._download_json(url, url, 'Downloading NerdCubed JSON feed')
entries = [{
'_type': 'url',
'title': feed_entry['title'],
'uploader': feed_entry['source']['name'] if feed_entry['source'] else None,
'upload_date': datetime.datetime.strptime(feed_entry['date'], '%Y-%m-%d').strftime('%Y%m%d'),
'url': "http://www.youtube.com/watch?v=" + feed_entry['youtube_id'],
'url': 'http://www.youtube.com/watch?v=' + feed_entry['youtube_id'],
} for feed_entry in feed]
return {

View File

@ -0,0 +1,69 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
xpath_text,
)
class NozIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?noz\.de/video/(?P<id>[0-9]+)/'
_TESTS = [{
'url': 'http://www.noz.de/video/25151/32-Deutschland-gewinnt-Badminton-Lnderspiel-in-Melle',
'info_dict': {
'id': '25151',
'ext': 'mp4',
'duration': 215,
'title': '3:2 - Deutschland gewinnt Badminton-Länderspiel in Melle',
'description': 'Vor rund 370 Zuschauern gewinnt die deutsche Badminton-Nationalmannschaft am Donnerstag ein EM-Vorbereitungsspiel gegen Frankreich in Melle. Video Moritz Frankenberg.',
'thumbnail': 're:^http://.*\.jpg',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
description = self._og_search_description(webpage)
edge_url = self._html_search_regex(
r'<script\s+(?:type="text/javascript"\s+)?src="(.*?/videojs_.*?)"',
webpage, 'edge URL')
edge_content = self._download_webpage(edge_url, 'meta configuration')
config_url_encoded = self._search_regex(
r'so\.addVariable\("config_url","[^,]*,(.*?)"',
edge_content, 'config URL'
)
config_url = compat_urllib_parse_unquote(config_url_encoded)
doc = self._download_xml(config_url, 'video configuration')
title = xpath_text(doc, './/title')
thumbnail = xpath_text(doc, './/article/thumbnail/url')
duration = int_or_none(xpath_text(
doc, './/article/movie/file/duration'))
formats = []
for qnode in doc.findall('.//article/movie/file/qualities/qual'):
video_node = qnode.find('./html_urls/video_url[@format="video/mp4"]')
if video_node is None:
continue # auto
formats.append({
'url': video_node.text,
'format_name': xpath_text(qnode, './name'),
'format_id': xpath_text(qnode, './id'),
'height': int_or_none(xpath_text(qnode, './height')),
'width': int_or_none(xpath_text(qnode, './width')),
'tbr': int_or_none(xpath_text(qnode, './bitrate'), scale=1000),
})
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'title': title,
'duration': duration,
'description': description,
'thumbnail': thumbnail,
}

View File

@ -112,6 +112,7 @@ class ORFTVthekIE(InfoExtractor):
% geo_str),
fatal=False)
self._check_formats(formats, video_id)
self._sort_formats(formats)
upload_date = unified_strdate(sd['created_date'])

View File

@ -4,10 +4,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
js_to_json,
strip_jsonp,
unified_strdate,
US_RATINGS,
@ -199,7 +201,7 @@ class PBSIE(InfoExtractor):
'id': '2365006249',
'ext': 'mp4',
'title': 'Constitution USA with Peter Sagal - A More Perfect Union',
'description': 'md5:ba0c207295339c8d6eced00b7c363c6a',
'description': 'md5:36f341ae62e251b8f5bd2b754b95a071',
'duration': 3190,
},
'params': {
@ -213,7 +215,7 @@ class PBSIE(InfoExtractor):
'id': '2365297690',
'ext': 'mp4',
'title': 'FRONTLINE - Losing Iraq',
'description': 'md5:f5bfbefadf421e8bb8647602011caf8e',
'description': 'md5:4d3eaa01f94e61b3e73704735f1196d9',
'duration': 5050,
},
'params': {
@ -227,7 +229,7 @@ class PBSIE(InfoExtractor):
'id': '2201174722',
'ext': 'mp4',
'title': 'PBS NewsHour - Cyber Schools Gain Popularity, but Quality Questions Persist',
'description': 'md5:5871c15cba347c1b3d28ac47a73c7c28',
'description': 'md5:95a19f568689d09a166dff9edada3301',
'duration': 801,
},
},
@ -237,8 +239,8 @@ class PBSIE(InfoExtractor):
'info_dict': {
'id': '2365297708',
'ext': 'mp4',
'description': 'md5:68d87ef760660eb564455eb30ca464fe',
'title': 'Great Performances - Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full',
'description': 'md5:657897370e09e2bc6bf0f8d2cd313c6b',
'duration': 6559,
'thumbnail': 're:^https?://.*\.jpg$',
},
@ -278,7 +280,7 @@ class PBSIE(InfoExtractor):
'display_id': 'player',
'ext': 'mp4',
'title': 'American Experience - Death and the Civil War, Chapter 1',
'description': 'American Experience, TVs most-watched history series, brings to life the compelling stories from our past that inform our understanding of the world today.',
'description': 'md5:1b80a74e0380ed2a4fb335026de1600d',
'duration': 682,
'thumbnail': 're:^https?://.*\.jpg$',
},
@ -287,20 +289,19 @@ class PBSIE(InfoExtractor):
},
},
{
'url': 'http://video.pbs.org/video/2365367186/',
'url': 'http://www.pbs.org/video/2365245528/',
'info_dict': {
'id': '2365367186',
'display_id': '2365367186',
'id': '2365245528',
'display_id': '2365245528',
'ext': 'mp4',
'title': 'To Catch A Comet - Full Episode',
'description': 'On November 12, 2014, billions of kilometers from Earth, spacecraft orbiter Rosetta and lander Philae did what no other had dared to attempt \u2014 land on the volatile surface of a comet as it zooms around the sun at 67,000 km/hr. The European Space Agency hopes this mission can help peer into our past and unlock secrets of our origins.',
'duration': 3342,
'title': 'FRONTLINE - United States of Secrets (Part One)',
'description': 'md5:55756bd5c551519cc4b7703e373e217e',
'duration': 6851,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # requires ffmpeg
},
'skip': 'Expired',
},
{
# Video embedded in iframe containing angle brackets as attribute's value (e.g.
@ -312,7 +313,7 @@ class PBSIE(InfoExtractor):
'display_id': 'a-chefs-life-season-3-episode-5-prickly-business',
'ext': 'mp4',
'title': "A Chef's Life - Season 3, Ep. 5: Prickly Business",
'description': 'md5:61db2ddf27c9912f09c241014b118ed1',
'description': 'md5:54033c6baa1f9623607c6e2ed245888b',
'duration': 1480,
'thumbnail': 're:^https?://.*\.jpg$',
},
@ -328,7 +329,7 @@ class PBSIE(InfoExtractor):
'display_id': 'the-atomic-artists',
'ext': 'mp4',
'title': 'FRONTLINE - The Atomic Artists',
'description': 'md5:f5bfbefadf421e8bb8647602011caf8e',
'description': 'md5:1a2481e86b32b2e12ec1905dd473e2c1',
'duration': 723,
'thumbnail': 're:^https?://.*\.jpg$',
},
@ -336,6 +337,21 @@ class PBSIE(InfoExtractor):
'skip_download': True, # requires ffmpeg
},
},
{
# Serves hd only via wigget/partnerplayer page
'url': 'http://www.pbs.org/video/2365641075/',
'info_dict': {
'id': '2365641075',
'ext': 'mp4',
'title': 'FRONTLINE - Netanyahu at War',
'duration': 6852,
'thumbnail': 're:^https?://.*\.jpg$',
'formats': 'mincount:8',
},
'params': {
'skip_download': True, # requires ffmpeg
},
},
{
'url': 'http://player.pbs.org/widget/partnerplayer/2365297708/?start=0&end=0&chapterbar=false&endscreen=false&topbar=true',
'only_matching': True,
@ -365,10 +381,14 @@ class PBSIE(InfoExtractor):
webpage, 'upload date', default=None))
# tabbed frontline videos
tabbed_videos = re.findall(
r'<div[^>]+class="videotab[^"]*"[^>]+vid="(\d+)"', webpage)
if tabbed_videos:
return tabbed_videos, presumptive_id, upload_date
MULTI_PART_REGEXES = (
r'<div[^>]+class="videotab[^"]*"[^>]+vid="(\d+)"',
r'<a[^>]+href=["\']#video-\d+["\'][^>]+data-coveid=["\'](\d+)',
)
for p in MULTI_PART_REGEXES:
tabbed_videos = re.findall(p, webpage)
if tabbed_videos:
return tabbed_videos, presumptive_id, upload_date
MEDIA_ID_REGEXES = [
r"div\s*:\s*'videoembed'\s*,\s*mediaid\s*:\s*'(\d+)'", # frontline video embed
@ -432,22 +452,54 @@ class PBSIE(InfoExtractor):
for vid_id in video_id]
return self.playlist_result(entries, display_id)
info = self._download_json(
'http://player.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
display_id)
info = None
redirects = []
redirect_urls = set()
def extract_redirect_urls(info):
for encoding_name in ('recommended_encoding', 'alternate_encoding'):
redirect = info.get(encoding_name)
if not redirect:
continue
redirect_url = redirect.get('url')
if redirect_url and redirect_url not in redirect_urls:
redirects.append(redirect)
redirect_urls.add(redirect_url)
try:
video_info = self._download_json(
'http://player.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
display_id, 'Downloading video info JSON')
extract_redirect_urls(video_info)
info = video_info
except ExtractorError as e:
# videoInfo API may not work for some videos
if not isinstance(e.cause, compat_HTTPError) or e.cause.code != 404:
raise
# Player pages may also serve different qualities
for page in ('widget/partnerplayer', 'portalplayer'):
player = self._download_webpage(
'http://player.pbs.org/%s/%s' % (page, video_id),
display_id, 'Downloading %s page' % page, fatal=False)
if player:
video_info = self._parse_json(
self._search_regex(
r'(?s)PBS\.videoData\s*=\s*({.+?});\n',
player, '%s video data' % page, default='{}'),
display_id, transform_source=js_to_json, fatal=False)
if video_info:
extract_redirect_urls(video_info)
if not info:
info = video_info
formats = []
for encoding_name in ('recommended_encoding', 'alternate_encoding'):
redirect = info.get(encoding_name)
if not redirect:
continue
redirect_url = redirect.get('url')
if not redirect_url:
continue
for num, redirect in enumerate(redirects):
redirect_id = redirect.get('eeid')
redirect_info = self._download_json(
redirect_url + '?format=json', display_id,
'Downloading %s video url info' % encoding_name)
'%s?format=json' % redirect['url'], display_id,
'Downloading %s video url info' % (redirect_id or num))
if redirect_info['status'] == 'error':
raise ExtractorError(
@ -466,8 +518,9 @@ class PBSIE(InfoExtractor):
else:
formats.append({
'url': format_url,
'format_id': redirect.get('eeid'),
'format_id': redirect_id,
})
self._remove_duplicate_formats(formats)
self._sort_formats(formats)
rating_str = info.get('rating')
@ -493,7 +546,7 @@ class PBSIE(InfoExtractor):
'id': video_id,
'display_id': display_id,
'title': info['title'],
'description': info['program'].get('description'),
'description': info.get('description') or info.get('program', {}).get('description'),
'thumbnail': info.get('image_url'),
'duration': int_or_none(info.get('duration')),
'age_limit': age_limit,

View File

@ -0,0 +1,51 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class PlaysTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?plays\.tv/video/(?P<id>[0-9a-f]{18})'
_TEST = {
'url': 'http://plays.tv/video/56af17f56c95335490/when-you-outplay-the-azir-wall',
'md5': 'dfeac1198506652b5257a62762cec7bc',
'info_dict': {
'id': '56af17f56c95335490',
'ext': 'mp4',
'title': 'When you outplay the Azir wall',
'description': 'Posted by Bjergsen',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
content = self._parse_json(
self._search_regex(
r'R\.bindContent\(({.+?})\);', webpage,
'content'), video_id)['content']
mpd_url, sources = re.search(
r'(?s)<video[^>]+data-mpd="([^"]+)"[^>]*>(.+?)</video>',
content).groups()
formats = self._extract_mpd_formats(
self._proto_relative_url(mpd_url), video_id, mpd_id='DASH')
for format_id, height, format_url in re.findall(r'<source\s+res="((\d+)h?)"\s+src="([^"]+)"', sources):
formats.append({
'url': self._proto_relative_url(format_url),
'format_id': 'http-' + format_id,
'height': int_or_none(height),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
}

View File

@ -11,6 +11,7 @@ from ..compat import (
)
from ..utils import (
ExtractorError,
int_or_none,
sanitized_Request,
str_to_int,
)
@ -23,13 +24,18 @@ class PornHubIE(InfoExtractor):
_VALID_URL = r'https?://(?:[a-z]+\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)(?P<id>[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
'md5': '882f488fa1f0026f023f33576004a2ed',
'md5': '1e19b41231a02eba417839222ac9d58e',
'info_dict': {
'id': '648719015',
'ext': 'mp4',
"uploader": "Babes",
"title": "Seductive Indian beauty strips down and fingers her pink pussy",
"age_limit": 18
'title': 'Seductive Indian beauty strips down and fingers her pink pussy',
'uploader': 'Babes',
'duration': 361,
'view_count': int,
'like_count': int,
'dislike_count': int,
'comment_count': int,
'age_limit': 18,
}
}, {
'url': 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d',
@ -67,13 +73,23 @@ class PornHubIE(InfoExtractor):
'PornHub said: %s' % error_msg,
expected=True, video_id=video_id)
video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
flashvars = self._parse_json(
self._search_regex(
r'var\s+flashv1ars_\d+\s*=\s*({.+?});', webpage, 'flashvars', default='{}'),
video_id)
if flashvars:
video_title = flashvars.get('video_title')
thumbnail = flashvars.get('image_url')
duration = int_or_none(flashvars.get('video_duration'))
else:
video_title, thumbnail, duration = [None] * 3
if not video_title:
video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
video_uploader = self._html_search_regex(
r'(?s)From:&nbsp;.+?<(?:a href="/users/|a href="/channels/|span class="username)[^>]+>(.+?)<',
webpage, 'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, 'thumbnail', fatal=False)
if thumbnail:
thumbnail = compat_urllib_parse_unquote(thumbnail)
view_count = self._extract_count(
r'<span class="count">([\d,\.]+)</span> views', webpage, 'view')
@ -95,7 +111,7 @@ class PornHubIE(InfoExtractor):
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2]
format = "-".join(format)
format = '-'.join(format)
m = re.match(r'^(?P<height>[0-9]+)[pP]-(?P<tbr>[0-9]+)[kK]$', format)
if m is None:
@ -120,6 +136,7 @@ class PornHubIE(InfoExtractor):
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'duration': duration,
'view_count': view_count,
'like_count': like_count,
'dislike_count': dislike_count,
@ -129,7 +146,31 @@ class PornHubIE(InfoExtractor):
}
class PornHubPlaylistIE(InfoExtractor):
class PornHubPlaylistBaseIE(InfoExtractor):
def _extract_entries(self, webpage):
return [
self.url_result('http://www.pornhub.com/%s' % video_url, PornHubIE.ie_key())
for video_url in set(re.findall(
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"', webpage))
]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = self._extract_entries(webpage)
playlist = self._parse_json(
self._search_regex(
r'playlistObject\s*=\s*({.+?});', webpage, 'playlist'),
playlist_id)
return self.playlist_result(
entries, playlist_id, playlist.get('title'), playlist.get('description'))
class PornHubPlaylistIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/playlist/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.pornhub.com/playlist/6201671',
@ -140,21 +181,20 @@ class PornHubPlaylistIE(InfoExtractor):
'playlist_mincount': 35,
}]
class PornHubUserVideosIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/users/(?P<id>[^/]+)/videos'
_TESTS = [{
'url': 'http://www.pornhub.com/users/rushandlia/videos',
'info_dict': {
'id': 'rushandlia',
},
'playlist_mincount': 13,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
user_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
webpage = self._download_webpage(url, user_id)
entries = [
self.url_result('http://www.pornhub.com/%s' % video_url, 'PornHub')
for video_url in set(re.findall(
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"', webpage))
]
playlist = self._parse_json(
self._search_regex(
r'playlistObject\s*=\s*({.+?});', webpage, 'playlist'),
playlist_id)
return self.playlist_result(
entries, playlist_id, playlist.get('title'), playlist.get('description'))
return self.playlist_result(self._extract_entries(webpage), user_id)

View File

@ -56,7 +56,7 @@ class PornoVoisinesIE(InfoExtractor):
r'<h1>(.+?)</h1>', webpage, 'title', flags=re.DOTALL)
description = self._html_search_regex(
r'<article id="descriptif">(.+?)</article>',
webpage, "description", fatal=False, flags=re.DOTALL)
webpage, 'description', fatal=False, flags=re.DOTALL)
thumbnail = self._search_regex(
r'<div id="mediaspace%s">\s*<img src="/?([^"]+)"' % video_id,

View File

@ -28,16 +28,16 @@ class RadioBremenIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
meta_url = "http://www.radiobremen.de/apps/php/mediathek/metadaten.php?id=%s" % video_id
meta_url = 'http://www.radiobremen.de/apps/php/mediathek/metadaten.php?id=%s' % video_id
meta_doc = self._download_webpage(
meta_url, video_id, 'Downloading metadata')
title = self._html_search_regex(
r"<h1.*>(?P<title>.+)</h1>", meta_doc, "title")
r'<h1.*>(?P<title>.+)</h1>', meta_doc, 'title')
description = self._html_search_regex(
r"<p>(?P<description>.*)</p>", meta_doc, "description", fatal=False)
r'<p>(?P<description>.*)</p>', meta_doc, 'description', fatal=False)
duration = parse_duration(self._html_search_regex(
r"L&auml;nge:</td>\s+<td>(?P<duration>[0-9]+:[0-9]+)</td>",
meta_doc, "duration", fatal=False))
r'L&auml;nge:</td>\s+<td>(?P<duration>[0-9]+:[0-9]+)</td>',
meta_doc, 'duration', fatal=False))
page_doc = self._download_webpage(
url, video_id, 'Downloading video information')
@ -51,7 +51,7 @@ class RadioBremenIE(InfoExtractor):
formats = [{
'url': video_url,
'ext': 'mp4',
'width': int(mobj.group("width")),
'width': int(mobj.group('width')),
}]
return {
'id': video_id,

View File

@ -16,9 +16,9 @@ class RadioFranceIE(InfoExtractor):
'info_dict': {
'id': 'one-one',
'ext': 'ogg',
"title": "One to one",
"description": "Plutôt que d'imaginer la radio de demain comme technologie ou comme création de contenu, je veux montrer que quelles que soient ses évolutions, j'ai l'intime conviction que la radio continuera d'être un grand média de proximité pour les auditeurs.",
"uploader": "Thomas Hercouët",
'title': 'One to one',
'description': "Plutôt que d'imaginer la radio de demain comme technologie ou comme création de contenu, je veux montrer que quelles que soient ses évolutions, j'ai l'intime conviction que la radio continuera d'être un grand média de proximité pour les auditeurs.",
'uploader': 'Thomas Hercouët',
},
}

View File

@ -18,11 +18,11 @@ class RBMARadioIE(InfoExtractor):
'info_dict': {
'id': 'ford-lopatin-live-at-primavera-sound-2011',
'ext': 'mp3',
"uploader_id": "ford-lopatin",
"location": "Spain",
"description": "Joel Ford and Daniel Oneohtrix Point Never Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.",
"uploader": "Ford & Lopatin",
"title": "Live at Primavera Sound 2011",
'uploader_id': 'ford-lopatin',
'location': 'Spain',
'description': 'Joel Ford and Daniel Oneohtrix Point Never Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.',
'uploader': 'Ford & Lopatin',
'title': 'Live at Primavera Sound 2011',
},
}

View File

@ -12,12 +12,12 @@ class ReverbNationIE(InfoExtractor):
'url': 'http://www.reverbnation.com/alkilados/song/16965047-mona-lisa',
'md5': '3da12ebca28c67c111a7f8b262d3f7a7',
'info_dict': {
"id": "16965047",
"ext": "mp3",
"title": "MONA LISA",
"uploader": "ALKILADOS",
"uploader_id": "216429",
"thumbnail": "re:^https://gp1\.wac\.edgecastcdn\.net/.*?\.jpg$"
'id': '16965047',
'ext': 'mp3',
'title': 'MONA LISA',
'uploader': 'ALKILADOS',
'uploader_id': '216429',
'thumbnail': 're:^https://gp1\.wac\.edgecastcdn\.net/.*?\.jpg$'
},
}]

View File

@ -8,13 +8,13 @@ from .common import InfoExtractor
class RingTVIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?ringtv\.craveonline\.com/(?P<type>news|videos/video)/(?P<id>[^/?#]+)'
_TEST = {
"url": "http://ringtv.craveonline.com/news/310833-luis-collazo-says-victor-ortiz-better-not-quit-on-jan-30",
"md5": "d25945f5df41cdca2d2587165ac28720",
"info_dict": {
'url': 'http://ringtv.craveonline.com/news/310833-luis-collazo-says-victor-ortiz-better-not-quit-on-jan-30',
'md5': 'd25945f5df41cdca2d2587165ac28720',
'info_dict': {
'id': '857645',
'ext': 'mp4',
"title": 'Video: Luis Collazo says Victor Ortiz "better not quit on Jan. 30" - Ring TV',
"description": 'Luis Collazo is excited about his Jan. 30 showdown with fellow former welterweight titleholder Victor Ortiz at Barclays Center in his hometown of Brooklyn. The SuperBowl week fight headlines a Golden Boy Live! card on Fox Sports 1.',
'title': 'Video: Luis Collazo says Victor Ortiz "better not quit on Jan. 30" - Ring TV',
'description': 'Luis Collazo is excited about his Jan. 30 showdown with fellow former welterweight titleholder Victor Ortiz at Barclays Center in his hometown of Brooklyn. The SuperBowl week fight headlines a Golden Boy Live! card on Fox Sports 1.',
}
}
@ -32,8 +32,8 @@ class RingTVIE(InfoExtractor):
description = self._html_search_regex(
r'addthis:description="([^"]+)"',
webpage, 'description', fatal=False)
final_url = "http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/conversion/%s.mp4" % video_id
thumbnail_url = "http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/snapshots/%s.jpg" % video_id
final_url = 'http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/conversion/%s.mp4' % video_id
thumbnail_url = 'http://ringtv.craveonline.springboardplatform.com/storage/ringtv.craveonline.com/snapshots/%s.jpg' % video_id
return {
'id': video_id,

View File

@ -43,7 +43,7 @@ class RteIE(InfoExtractor):
r'<meta name="thumbnail" content="uri:irus:(.*?)" />', webpage, 'thumbnail')
thumbnail = 'http://img.rasset.ie/' + thumbnail_id + '.jpg'
feeds_url = self._html_search_meta("feeds-prefix", webpage, 'feeds url') + video_id
feeds_url = self._html_search_meta('feeds-prefix', webpage, 'feeds url') + video_id
json_string = self._download_json(feeds_url, video_id)
# f4m_url = server + relative_url

View File

@ -63,7 +63,7 @@ class RTL2IE(InfoExtractor):
download_url = video_info['streamurl']
download_url = download_url.replace('\\', '')
stream_url = 'mp4:' + self._html_search_regex(r'ondemand/(.*)', download_url, 'stream URL')
rtmp_conn = ["S:connect", "O:1", "NS:pageUrl:" + url, "NB:fpad:0", "NN:videoFunction:1", "O:0"]
rtmp_conn = ['S:connect', 'O:1', 'NS:pageUrl:' + url, 'NB:fpad:0', 'NN:videoFunction:1', 'O:0']
formats = [{
'url': download_url,

View File

@ -0,0 +1,138 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
parse_age_limit,
)
class ScreenJunkiesIE(InfoExtractor):
_VALID_URL = r'http://www.screenjunkies.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
_TESTS = [{
'url': 'http://www.screenjunkies.com/video/best-quentin-tarantino-movie-2841915',
'md5': '5c2b686bec3d43de42bde9ec047536b0',
'info_dict': {
'id': '2841915',
'display_id': 'best-quentin-tarantino-movie',
'ext': 'mp4',
'title': 'Best Quentin Tarantino Movie',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 3671,
'age_limit': 13,
'tags': list,
},
}, {
'url': 'http://www.screenjunkies.com/video/honest-trailers-the-dark-knight',
'info_dict': {
'id': '2348808',
'display_id': 'honest-trailers-the-dark-knight',
'ext': 'mp4',
'title': "Honest Trailers: 'The Dark Knight'",
'thumbnail': 're:^https?://.*\.jpg',
'age_limit': 10,
'tags': list,
},
}, {
# requires subscription but worked around
'url': 'http://www.screenjunkies.com/video/knocking-dead-ep-1-the-show-so-far-3003285',
'info_dict': {
'id': '3003285',
'display_id': 'knocking-dead-ep-1-the-show-so-far',
'ext': 'mp4',
'title': 'Knocking Dead Ep 1: State of The Dead Recap',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 3307,
'age_limit': 13,
'tags': list,
},
}]
_DEFAULT_BITRATES = (48, 150, 496, 864, 2240)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
if not video_id:
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
(r'src=["\']/embed/(\d+)', r'data-video-content-id=["\'](\d+)'),
webpage, 'video id')
webpage = self._download_webpage(
'http://www.screenjunkies.com/embed/%s' % video_id,
display_id, 'Downloading video embed page')
embed_vars = self._parse_json(
self._search_regex(
r'(?s)embedVars\s*=\s*({.+?})\s*</script>', webpage, 'embed vars'),
display_id)
title = embed_vars['contentName']
formats = []
bitrates = []
for f in embed_vars.get('media', []):
if not f.get('uri') or f.get('mediaPurpose') != 'play':
continue
bitrate = int_or_none(f.get('bitRate'))
if bitrate:
bitrates.append(bitrate)
formats.append({
'url': f['uri'],
'format_id': 'http-%d' % bitrate if bitrate else 'http',
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'tbr': bitrate,
'format': 'mp4',
})
if not bitrates:
# When subscriptionLevel > 0, i.e. plus subscription is required
# media list will be empty. However, hds and hls uris are still
# available. We can grab them assuming bitrates to be default.
bitrates = self._DEFAULT_BITRATES
auth_token = embed_vars.get('AuthToken')
def construct_manifest_url(base_url, ext):
pieces = [base_url]
pieces.extend([compat_str(b) for b in bitrates])
pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token))
return ','.join(pieces)
if bitrates and auth_token:
hds_url = embed_vars.get('hdsUri')
if hds_url:
f4m_formats = self._extract_f4m_formats(
construct_manifest_url(hds_url, 'f4m'),
display_id, f4m_id='hds', fatal=False)
if len(f4m_formats) == len(bitrates):
for f, bitrate in zip(f4m_formats, bitrates):
if not f.get('tbr'):
f['format_id'] = 'hds-%d' % bitrate
f['tbr'] = bitrate
# TODO: fix f4m downloader to handle manifests without bitrates if possible
# formats.extend(f4m_formats)
hls_url = embed_vars.get('hlsUri')
if hls_url:
formats.extend(self._extract_m3u8_formats(
construct_manifest_url(hls_url, 'm3u8'),
display_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'thumbnail': embed_vars.get('thumbUri'),
'duration': int_or_none(embed_vars.get('videoLengthInSeconds')) or None,
'age_limit': parse_age_limit(embed_vars.get('audienceRating')),
'tags': embed_vars.get('tags', '').split(','),
'formats': formats,
}

View File

@ -40,7 +40,7 @@ class ScreenwaveMediaIE(InfoExtractor):
re.sub(
r'(?s)/\*.*?\*/', '',
self._search_regex(
r"sources\s*:\s*(\[[^\]]+?\])", playerconfig,
r'sources\s*:\s*(\[[^\]]+?\])', playerconfig,
'sources',
).replace(
"' + thisObj.options.videoserver + '",

View File

@ -15,37 +15,37 @@ from ..compat import (
class SenateISVPIE(InfoExtractor):
_COMM_MAP = [
["ag", "76440", "http://ag-f.akamaihd.net"],
["aging", "76442", "http://aging-f.akamaihd.net"],
["approps", "76441", "http://approps-f.akamaihd.net"],
["armed", "76445", "http://armed-f.akamaihd.net"],
["banking", "76446", "http://banking-f.akamaihd.net"],
["budget", "76447", "http://budget-f.akamaihd.net"],
["cecc", "76486", "http://srs-f.akamaihd.net"],
["commerce", "80177", "http://commerce1-f.akamaihd.net"],
["csce", "75229", "http://srs-f.akamaihd.net"],
["dpc", "76590", "http://dpc-f.akamaihd.net"],
["energy", "76448", "http://energy-f.akamaihd.net"],
["epw", "76478", "http://epw-f.akamaihd.net"],
["ethics", "76449", "http://ethics-f.akamaihd.net"],
["finance", "76450", "http://finance-f.akamaihd.net"],
["foreign", "76451", "http://foreign-f.akamaihd.net"],
["govtaff", "76453", "http://govtaff-f.akamaihd.net"],
["help", "76452", "http://help-f.akamaihd.net"],
["indian", "76455", "http://indian-f.akamaihd.net"],
["intel", "76456", "http://intel-f.akamaihd.net"],
["intlnarc", "76457", "http://intlnarc-f.akamaihd.net"],
["jccic", "85180", "http://jccic-f.akamaihd.net"],
["jec", "76458", "http://jec-f.akamaihd.net"],
["judiciary", "76459", "http://judiciary-f.akamaihd.net"],
["rpc", "76591", "http://rpc-f.akamaihd.net"],
["rules", "76460", "http://rules-f.akamaihd.net"],
["saa", "76489", "http://srs-f.akamaihd.net"],
["smbiz", "76461", "http://smbiz-f.akamaihd.net"],
["srs", "75229", "http://srs-f.akamaihd.net"],
["uscc", "76487", "http://srs-f.akamaihd.net"],
["vetaff", "76462", "http://vetaff-f.akamaihd.net"],
["arch", "", "http://ussenate-f.akamaihd.net/"]
['ag', '76440', 'http://ag-f.akamaihd.net'],
['aging', '76442', 'http://aging-f.akamaihd.net'],
['approps', '76441', 'http://approps-f.akamaihd.net'],
['armed', '76445', 'http://armed-f.akamaihd.net'],
['banking', '76446', 'http://banking-f.akamaihd.net'],
['budget', '76447', 'http://budget-f.akamaihd.net'],
['cecc', '76486', 'http://srs-f.akamaihd.net'],
['commerce', '80177', 'http://commerce1-f.akamaihd.net'],
['csce', '75229', 'http://srs-f.akamaihd.net'],
['dpc', '76590', 'http://dpc-f.akamaihd.net'],
['energy', '76448', 'http://energy-f.akamaihd.net'],
['epw', '76478', 'http://epw-f.akamaihd.net'],
['ethics', '76449', 'http://ethics-f.akamaihd.net'],
['finance', '76450', 'http://finance-f.akamaihd.net'],
['foreign', '76451', 'http://foreign-f.akamaihd.net'],
['govtaff', '76453', 'http://govtaff-f.akamaihd.net'],
['help', '76452', 'http://help-f.akamaihd.net'],
['indian', '76455', 'http://indian-f.akamaihd.net'],
['intel', '76456', 'http://intel-f.akamaihd.net'],
['intlnarc', '76457', 'http://intlnarc-f.akamaihd.net'],
['jccic', '85180', 'http://jccic-f.akamaihd.net'],
['jec', '76458', 'http://jec-f.akamaihd.net'],
['judiciary', '76459', 'http://judiciary-f.akamaihd.net'],
['rpc', '76591', 'http://rpc-f.akamaihd.net'],
['rules', '76460', 'http://rules-f.akamaihd.net'],
['saa', '76489', 'http://srs-f.akamaihd.net'],
['smbiz', '76461', 'http://smbiz-f.akamaihd.net'],
['srs', '75229', 'http://srs-f.akamaihd.net'],
['uscc', '76487', 'http://srs-f.akamaihd.net'],
['vetaff', '76462', 'http://vetaff-f.akamaihd.net'],
['arch', '', 'http://ussenate-f.akamaihd.net/']
]
_IE_NAME = 'senate.gov'
_VALID_URL = r'http://www\.senate\.gov/isvp/?\?(?P<qs>.+)'

View File

@ -13,8 +13,8 @@ class SlutloadIE(InfoExtractor):
'info_dict': {
'id': 'TD73btpBqSxc',
'ext': 'mp4',
"title": "virginie baisee en cam",
"age_limit": 18,
'title': 'virginie baisee en cam',
'age_limit': 18,
'thumbnail': 're:https?://.*?\.jpg'
}
}

View File

@ -170,7 +170,7 @@ class SmotriIE(InfoExtractor):
'getvideoinfo': '1',
}
video_password = self._downloader.params.get('videopassword', None)
video_password = self._downloader.params.get('videopassword')
if video_password:
video_form['pass'] = hashlib.md5(video_password.encode('utf-8')).hexdigest()
@ -356,7 +356,7 @@ class SmotriBroadcastIE(InfoExtractor):
url = 'http://smotri.com/broadcast/view/url/?ticket=%s' % ticket
broadcast_password = self._downloader.params.get('videopassword', None)
broadcast_password = self._downloader.params.get('videopassword')
if broadcast_password:
url += '&pass=%s' % hashlib.md5(broadcast_password.encode('utf-8')).hexdigest()

View File

@ -43,7 +43,7 @@ class SnotrIE(InfoExtractor):
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
video_url = "http://cdn.videos.snotr.com/%s.flv" % video_id
video_url = 'http://cdn.videos.snotr.com/%s.flv' % video_id
view_count = str_to_int(self._html_search_regex(
r'<p>\n<strong>Views:</strong>\n([\d,\.]+)</p>',

View File

@ -222,7 +222,7 @@ class SoundcloudIE(InfoExtractor):
full_title = track_id
token = mobj.group('secret_token')
if token:
info_json_url += "&secret_token=" + token
info_json_url += '&secret_token=' + token
elif mobj.group('player'):
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
real_url = query['url'][0]

View File

@ -22,23 +22,23 @@ class SteamIE(InfoExtractor):
_VIDEO_PAGE_TEMPLATE = 'http://store.steampowered.com/video/%s/'
_AGECHECK_TEMPLATE = 'http://store.steampowered.com/agecheck/video/%s/?snr=1_agecheck_agecheck__age-gate&ageDay=1&ageMonth=January&ageYear=1970'
_TESTS = [{
"url": "http://store.steampowered.com/video/105600/",
"playlist": [
'url': 'http://store.steampowered.com/video/105600/',
'playlist': [
{
"md5": "f870007cee7065d7c76b88f0a45ecc07",
"info_dict": {
'md5': 'f870007cee7065d7c76b88f0a45ecc07',
'info_dict': {
'id': '81300',
'ext': 'flv',
"title": "Terraria 1.1 Trailer",
'title': 'Terraria 1.1 Trailer',
'playlist_index': 1,
}
},
{
"md5": "61aaf31a5c5c3041afb58fb83cbb5751",
"info_dict": {
'md5': '61aaf31a5c5c3041afb58fb83cbb5751',
'info_dict': {
'id': '80859',
'ext': 'flv',
"title": "Terraria Trailer",
'title': 'Terraria Trailer',
'playlist_index': 2,
}
}

View File

@ -27,10 +27,10 @@ class TenPlayIE(InfoExtractor):
}
_video_fields = [
"id", "name", "shortDescription", "longDescription", "creationDate",
"publishedDate", "lastModifiedDate", "customFields", "videoStillURL",
"thumbnailURL", "referenceId", "length", "playsTotal",
"playsTrailingWeek", "renditions", "captioning", "startDate", "endDate"]
'id', 'name', 'shortDescription', 'longDescription', 'creationDate',
'publishedDate', 'lastModifiedDate', 'customFields', 'videoStillURL',
'thumbnailURL', 'referenceId', 'length', 'playsTotal',
'playsTrailingWeek', 'renditions', 'captioning', 'startDate', 'endDate']
def _real_extract(self, url):
webpage = self._download_webpage(url, url)

View File

@ -50,6 +50,4 @@ class TF1IE(InfoExtractor):
wat_id = self._html_search_regex(
r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
webpage, 'wat id', group='id')
wat_info = self._download_json(
'http://www.wat.tv/interface/contentv3/%s' % wat_id, video_id)
return self.url_result(wat_info['media']['url'], 'Wat')
return self.url_result('wat:%s' % wat_id, 'Wat')

View File

@ -20,8 +20,8 @@ from ..utils import (
int_or_none,
sanitized_Request,
unsmuggle_url,
url_basename,
xpath_with_ns,
mimetype2ext,
)
default_ns = 'http://www.w3.org/2005/SMIL21/Language'
@ -69,7 +69,7 @@ class ThePlatformBaseIE(InfoExtractor):
for caption in captions:
lang, src, mime = caption.get('lang', 'en'), caption.get('src'), caption.get('type')
subtitles[lang] = [{
'ext': 'srt' if mime == 'text/srt' else 'ttml',
'ext': mimetype2ext(mime),
'url': src,
}]
@ -283,8 +283,8 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
first_video_id = None
duration = None
for item in entry['media$content']:
smil_url = item['plfile$url'] + '&format=SMIL&Tracking=true&Embedded=true&formats=MPEG4,F4M'
cur_video_id = url_basename(smil_url)
smil_url = item['plfile$url'] + '&format=SMIL&mbr=true'
cur_video_id = ThePlatformIE._match_id(smil_url)
if first_video_id is None:
first_video_id = cur_video_id
duration = float_or_none(item.get('plfile$duration'))

View File

@ -48,22 +48,22 @@ class TheSixtyOneIE(InfoExtractor):
]
_DECODE_MAP = {
"x": "a",
"m": "b",
"w": "c",
"q": "d",
"n": "e",
"p": "f",
"a": "0",
"h": "1",
"e": "2",
"u": "3",
"s": "4",
"i": "5",
"o": "6",
"y": "7",
"r": "8",
"c": "9"
'x': 'a',
'm': 'b',
'w': 'c',
'q': 'd',
'n': 'e',
'p': 'f',
'a': '0',
'h': '1',
'e': '2',
'u': '3',
's': '4',
'i': '5',
'o': '6',
'y': '7',
'r': '8',
'c': '9'
}
def _real_extract(self, url):

View File

@ -38,12 +38,12 @@ class TrailerAddictIE(InfoExtractor):
# Presence of (no)watchplus function indicates HD quality is available
if re.search(r'function (no)?watchplus()', webpage):
fvar = "fvarhd"
fvar = 'fvarhd'
else:
fvar = "fvar"
fvar = 'fvar'
info_url = "http://www.traileraddict.com/%s.php?tid=%s" % (fvar, str(video_id))
info_webpage = self._download_webpage(info_url, video_id, "Downloading the info webpage")
info_url = 'http://www.traileraddict.com/%s.php?tid=%s' % (fvar, str(video_id))
info_webpage = self._download_webpage(info_url, video_id, 'Downloading the info webpage')
final_url = self._search_regex(r'&fileurl=(.+)',
info_webpage, 'Download url').replace('%3F', '?')

View File

@ -49,7 +49,7 @@ class TudouIE(InfoExtractor):
info_url = 'http://v2.tudou.com/f?id=' + compat_str(video_id)
if quality:
info_url += '&hd' + quality
xml_data = self._download_xml(info_url, video_id, "Opening the info XML page")
xml_data = self._download_xml(info_url, video_id, 'Opening the info XML page')
final_url = xml_data.text
return final_url

View File

@ -14,13 +14,19 @@ from ..utils import (
)
class TwitterCardIE(InfoExtractor):
class TwitterBaseIE(InfoExtractor):
def _get_vmap_video_url(self, vmap_url, video_id):
vmap_data = self._download_xml(vmap_url, video_id)
return xpath_text(vmap_data, './/MediaFile').strip()
class TwitterCardIE(TwitterBaseIE):
IE_NAME = 'twitter:card'
_VALID_URL = r'https?://(?:www\.)?twitter\.com/i/cards/tfw/v1/(?P<id>\d+)'
_TESTS = [
{
'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
'md5': '4fa26a35f9d1bf4b646590ba8e84be19',
# MD5 checksums are different in different places
'info_dict': {
'id': '560070183650213889',
'ext': 'mp4',
@ -42,7 +48,7 @@ class TwitterCardIE(InfoExtractor):
},
{
'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977',
'md5': 'b6f35e8b08a0bec6c8af77a2f4b3a814',
'md5': 'd4724ffe6d2437886d004fa5de1043b3',
'info_dict': {
'id': 'dq4Oj5quskI',
'ext': 'mp4',
@ -62,8 +68,8 @@ class TwitterCardIE(InfoExtractor):
'ext': 'mp4',
'upload_date': '20151113',
'uploader_id': '1189339351084113920',
'uploader': '@ArsenalTerje',
'title': 'Vine by @ArsenalTerje',
'uploader': 'ArsenalTerje',
'title': 'Vine by ArsenalTerje',
},
'add_ie': ['Vine'],
}
@ -96,10 +102,8 @@ class TwitterCardIE(InfoExtractor):
video_id)
if 'playlist' not in config:
if 'vmapUrl' in config:
vmap_data = self._download_xml(config['vmapUrl'], video_id)
video_url = xpath_text(vmap_data, './/MediaFile').strip()
formats.append({
'url': video_url,
'url': self._get_vmap_video_url(config['vmapUrl'], video_id),
})
break # same video regardless of UA
continue
@ -138,7 +142,7 @@ class TwitterIE(InfoExtractor):
_TESTS = [{
'url': 'https://twitter.com/freethenipple/status/643211948184596480',
'md5': 'db6612ec5d03355953c3ca9250c97e5e',
# MD5 checksums are different in different places
'info_dict': {
'id': '643211948184596480',
'ext': 'mp4',
@ -161,6 +165,7 @@ class TwitterIE(InfoExtractor):
'uploader': 'Gifs',
'uploader_id': 'giphz',
},
'expected_warnings': ['height', 'width'],
}, {
'url': 'https://twitter.com/starwars/status/665052190608723968',
'md5': '39b7199856dee6cd4432e72c74bc69d4',
@ -208,21 +213,82 @@ class TwitterIE(InfoExtractor):
return info
mobj = re.search(r'''(?x)
<video[^>]+class="animated-gif"[^>]+
(?:data-height="(?P<height>\d+)")?[^>]+
(?:data-width="(?P<width>\d+)")?[^>]+
(?:poster="(?P<poster>[^"]+)")?[^>]*>\s*
<video[^>]+class="animated-gif"(?P<more_info>[^>]+)>\s*
<source[^>]+video-src="(?P<url>[^"]+)"
''', webpage)
if mobj:
more_info = mobj.group('more_info')
height = int_or_none(self._search_regex(
r'data-height="(\d+)"', more_info, 'height', fatal=False))
width = int_or_none(self._search_regex(
r'data-width="(\d+)"', more_info, 'width', fatal=False))
thumbnail = self._search_regex(
r'poster="([^"]+)"', more_info, 'poster', fatal=False)
info.update({
'id': twid,
'url': mobj.group('url'),
'height': int_or_none(mobj.group('height')),
'width': int_or_none(mobj.group('width')),
'thumbnail': mobj.group('poster'),
'height': height,
'width': width,
'thumbnail': thumbnail,
})
return info
raise ExtractorError('There\'s not video in this tweet.')
raise ExtractorError('There\'s no video in this tweet.')
class TwitterAmplifyIE(TwitterBaseIE):
IE_NAME = 'twitter:amplify'
_VALID_URL = 'https?://amp\.twimg\.com/v/(?P<id>[0-9a-f\-]{36})'
_TEST = {
'url': 'https://amp.twimg.com/v/0ba0c3c7-0af3-4c0a-bed5-7efd1ffa2951',
'md5': '7df102d0b9fd7066b86f3159f8e81bf6',
'info_dict': {
'id': '0ba0c3c7-0af3-4c0a-bed5-7efd1ffa2951',
'ext': 'mp4',
'title': 'Twitter Video',
'thumbnail': 're:^https?://.*',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
vmap_url = self._html_search_meta(
'twitter:amplify:vmap', webpage, 'vmap url')
video_url = self._get_vmap_video_url(vmap_url, video_id)
thumbnails = []
thumbnail = self._html_search_meta(
'twitter:image:src', webpage, 'thumbnail', fatal=False)
def _find_dimension(target):
w = int_or_none(self._html_search_meta(
'twitter:%s:width' % target, webpage, fatal=False))
h = int_or_none(self._html_search_meta(
'twitter:%s:height' % target, webpage, fatal=False))
return w, h
if thumbnail:
thumbnail_w, thumbnail_h = _find_dimension('image')
thumbnails.append({
'url': thumbnail,
'width': thumbnail_w,
'height': thumbnail_h,
})
video_w, video_h = _find_dimension('player')
formats = [{
'url': video_url,
'width': video_w,
'height': video_h,
}]
return {
'id': video_id,
'title': 'Twitter Video',
'formats': formats,
'thumbnails': thumbnails,
}

View File

@ -47,7 +47,7 @@ class Vbox7IE(InfoExtractor):
title = self._html_search_regex(r'<title>(.*)</title>',
webpage, 'title').split('/')[0].strip()
info_url = "http://vbox7.com/play/magare.do"
info_url = 'http://vbox7.com/play/magare.do'
data = compat_urllib_parse.urlencode({'as3': '1', 'vid': video_id})
info_request = sanitized_Request(info_url, data)
info_request.add_header('Content-Type', 'application/x-www-form-urlencoded')

View File

@ -1,6 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urlparse,
)
from ..utils import (
float_or_none,
int_or_none,
@ -12,10 +16,10 @@ class ViddlerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?viddler\.com/(?:v|embed|player)/(?P<id>[a-z0-9]+)'
_TESTS = [{
'url': 'http://www.viddler.com/v/43903784',
'md5': 'ae43ad7cb59431ce043f0ff7fa13cbf4',
'md5': '9eee21161d2c7f5b39690c3e325fab2f',
'info_dict': {
'id': '43903784',
'ext': 'mp4',
'ext': 'mov',
'title': 'Video Made Easy',
'description': 'md5:6a697ebd844ff3093bd2e82c37b409cd',
'uploader': 'viddler',
@ -29,10 +33,10 @@ class ViddlerIE(InfoExtractor):
}
}, {
'url': 'http://www.viddler.com/v/4d03aad9/',
'md5': 'faa71fbf70c0bee7ab93076fd007f4b0',
'md5': 'f12c5a7fa839c47a79363bfdf69404fb',
'info_dict': {
'id': '4d03aad9',
'ext': 'mp4',
'ext': 'ts',
'title': 'WALL-TO-GORTAT',
'upload_date': '20150126',
'uploader': 'deadspin',
@ -42,10 +46,10 @@ class ViddlerIE(InfoExtractor):
}
}, {
'url': 'http://www.viddler.com/player/221ebbbd/0/',
'md5': '0defa2bd0ea613d14a6e9bd1db6be326',
'md5': '740511f61d3d1bb71dc14a0fe01a1c10',
'info_dict': {
'id': '221ebbbd',
'ext': 'mp4',
'ext': 'mov',
'title': 'LETeens-Grammar-snack-third-conditional',
'description': ' ',
'upload_date': '20140929',
@ -54,16 +58,42 @@ class ViddlerIE(InfoExtractor):
'view_count': int,
'comment_count': int,
}
}, {
# secret protected
'url': 'http://www.viddler.com/v/890c0985?secret=34051570',
'info_dict': {
'id': '890c0985',
'ext': 'mp4',
'title': 'Complete Property Training - Traineeships',
'description': ' ',
'upload_date': '20130606',
'uploader': 'TiffanyBowtell',
'timestamp': 1370496993,
'view_count': int,
'comment_count': int,
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
json_url = (
'http://api.viddler.com/api/v2/viddler.videos.getPlaybackDetails.json?video_id=%s&key=v0vhrt7bg2xq1vyxhkct' %
video_id)
query = {
'video_id': video_id,
'key': 'v0vhrt7bg2xq1vyxhkct',
}
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
secret = qs.get('secret', [None])[0]
if secret:
query['secret'] = secret
headers = {'Referer': 'http://static.cdn-ec.viddler.com/js/arpeggio/v2/embed.html'}
request = sanitized_Request(json_url, None, headers)
request = sanitized_Request(
'http://api.viddler.com/api/v2/viddler.videos.getPlaybackDetails.json?%s'
% compat_urllib_parse.urlencode(query), None, headers)
data = self._download_json(request, video_id)['video']
formats = []

View File

@ -26,7 +26,7 @@ class VideoPremiumIE(InfoExtractor):
webpage_url = 'http://videopremium.tv/' + video_id
webpage = self._download_webpage(webpage_url, video_id)
if re.match(r"^<html><head><script[^>]*>window.location\s*=", webpage):
if re.match(r'^<html><head><script[^>]*>window.location\s*=', webpage):
# Download again, we need a cookie
webpage = self._download_webpage(
webpage_url, video_id,
@ -37,10 +37,10 @@ class VideoPremiumIE(InfoExtractor):
return {
'id': video_id,
'url': "rtmp://e%d.md.iplay.md/play" % random.randint(1, 16),
'play_path': "mp4:%s.f4v" % video_id,
'page_url': "http://videopremium.tv/" + video_id,
'player_url': "http://videopremium.tv/uplayer/uppod.swf",
'url': 'rtmp://e%d.md.iplay.md/play' % random.randint(1, 16),
'play_path': 'mp4:%s.f4v' % video_id,
'page_url': 'http://videopremium.tv/' + video_id,
'player_url': 'http://videopremium.tv/uplayer/uppod.swf',
'ext': 'f4v',
'title': video_title,
}

View File

@ -57,7 +57,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
def _extract_xsrft_and_vuid(self, webpage):
xsrft = self._search_regex(
r'xsrft\s*[=:]\s*(?P<q>["\'])(?P<xsrft>.+?)(?P=q)',
r'(?:(?P<q1>["\'])xsrft(?P=q1)\s*:|xsrft\s*[=:])\s*(?P<q>["\'])(?P<xsrft>.+?)(?P=q)',
webpage, 'login token', group='xsrft')
vuid = self._search_regex(
r'["\']vuid["\']\s*:\s*(["\'])(?P<vuid>.+?)\1',
@ -232,7 +232,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
return mobj.group(1)
def _verify_video_password(self, url, video_id, webpage):
password = self._downloader.params.get('videopassword', None)
password = self._downloader.params.get('videopassword')
if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
token, vuid = self._extract_xsrft_and_vuid(webpage)
@ -252,7 +252,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'Verifying the password', 'Wrong password')
def _verify_player_video_password(self, url, video_id):
password = self._downloader.params.get('videopassword', None)
password = self._downloader.params.get('videopassword')
if password is None:
raise ExtractorError('This video is protected by a password, use the --video-password option')
data = urlencode_postdata(encode_dict({'password': password}))
@ -368,16 +368,16 @@ class VimeoIE(VimeoBaseInfoExtractor):
{'force_feature_id': True}), 'Vimeo')
# Extract title
video_title = config["video"]["title"]
video_title = config['video']['title']
# Extract uploader and uploader_id
video_uploader = config["video"]["owner"]["name"]
video_uploader_id = config["video"]["owner"]["url"].split('/')[-1] if config["video"]["owner"]["url"] else None
video_uploader = config['video']['owner']['name']
video_uploader_id = config['video']['owner']['url'].split('/')[-1] if config['video']['owner']['url'] else None
# Extract video thumbnail
video_thumbnail = config["video"].get("thumbnail")
video_thumbnail = config['video'].get('thumbnail')
if video_thumbnail is None:
video_thumbs = config["video"].get("thumbs")
video_thumbs = config['video'].get('thumbs')
if video_thumbs and isinstance(video_thumbs, dict):
_, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]
@ -401,7 +401,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
self._downloader.report_warning('Cannot find video description')
# Extract video duration
video_duration = int_or_none(config["video"].get("duration"))
video_duration = int_or_none(config['video'].get('duration'))
# Extract upload date
video_upload_date = None
@ -516,7 +516,7 @@ class VimeoChannelIE(VimeoBaseInfoExtractor):
if not login_form:
return webpage
password = self._downloader.params.get('videopassword', None)
password = self._downloader.params.get('videopassword')
if password is None:
raise ExtractorError('This album is protected by a password, use the --video-password option', expected=True)
fields = self._hidden_inputs(login_form)
@ -703,10 +703,10 @@ class VimeoLikesIE(InfoExtractor):
_TEST = {
'url': 'https://vimeo.com/user755559/likes/',
'playlist_mincount': 293,
"info_dict": {
'info_dict': {
'id': 'user755559_likes',
"description": "See all the videos urza likes",
"title": 'Videos urza likes',
'description': 'See all the videos urza likes',
'title': 'Videos urza likes',
},
}

View File

@ -119,7 +119,7 @@ class VineIE(InfoExtractor):
class VineUserIE(InfoExtractor):
IE_NAME = 'vine:user'
_VALID_URL = r'(?:https?://)?vine\.co/(?P<u>u/)?(?P<user>[^/]+)/?(\?.*)?$'
_VINE_BASE_URL = "https://vine.co/"
_VINE_BASE_URL = 'https://vine.co/'
_TESTS = [
{
'url': 'https://vine.co/Visa',
@ -139,7 +139,7 @@ class VineUserIE(InfoExtractor):
user = mobj.group('user')
u = mobj.group('u')
profile_url = "%sapi/users/profiles/%s%s" % (
profile_url = '%sapi/users/profiles/%s%s' % (
self._VINE_BASE_URL, 'vanity/' if not u else '', user)
profile_data = self._download_json(
profile_url, user, note='Downloading user profile data')
@ -147,7 +147,7 @@ class VineUserIE(InfoExtractor):
user_id = profile_data['data']['userId']
timeline_data = []
for pagenum in itertools.count(1):
timeline_url = "%sapi/timelines/users/%s?page=%s&size=100" % (
timeline_url = '%sapi/timelines/users/%s?page=%s&size=100' % (
self._VINE_BASE_URL, user_id, pagenum)
timeline_page = self._download_json(
timeline_url, user, note='Downloading page %d' % pagenum)

View File

@ -73,11 +73,16 @@ class VRTIE(InfoExtractor):
if mobj:
formats.extend(self._extract_m3u8_formats(
'%s/%s' % (mobj.group('server'), mobj.group('path')),
video_id, 'mp4', m3u8_id='hls'))
video_id, 'mp4', m3u8_id='hls', fatal=False))
mobj = re.search(r'data-video-src="(?P<src>[^"]+)"', webpage)
if mobj:
formats.extend(self._extract_f4m_formats(
'%s/manifest.f4m' % mobj.group('src'), video_id, f4m_id='hds'))
'%s/manifest.f4m' % mobj.group('src'),
video_id, f4m_id='hds', fatal=False))
if not formats and 'data-video-geoblocking="true"' in webpage:
self.raise_geo_restricted('This video is only available in Belgium')
self._sort_formats(formats)
title = self._og_search_title(webpage)

View File

@ -12,7 +12,7 @@ from ..utils import (
class WatIE(InfoExtractor):
_VALID_URL = r'http://www\.wat\.tv/video/(?P<display_id>.*)-(?P<short_id>.*?)_.*?\.html'
_VALID_URL = r'(?:wat:(?P<real_id>\d{8})|http://www\.wat\.tv/video/(?P<display_id>.*)-(?P<short_id>.*?)_.*?\.html)'
IE_NAME = 'wat.tv'
_TESTS = [
{
@ -54,10 +54,12 @@ class WatIE(InfoExtractor):
def real_id_for_chapter(chapter):
return chapter['tc_start'].split('-')[0]
mobj = re.match(self._VALID_URL, url)
short_id = mobj.group('short_id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id or short_id)
real_id = self._search_regex(r'xtpage = ".*-(.*?)";', webpage, 'real id')
real_id = mobj.group('real_id')
if not real_id:
short_id = mobj.group('short_id')
webpage = self._download_webpage(url, display_id or short_id)
real_id = self._search_regex(r'xtpage = ".*-(.*?)";', webpage, 'real id')
video_info = self.download_video_info(real_id)

View File

@ -8,12 +8,12 @@ from .common import InfoExtractor
class WorldStarHipHopIE(InfoExtractor):
_VALID_URL = r'https?://(?:www|m)\.worldstar(?:candy|hiphop)\.com/(?:videos|android)/video\.php\?v=(?P<id>.*)'
_TESTS = [{
"url": "http://www.worldstarhiphop.com/videos/video.php?v=wshh6a7q1ny0G34ZwuIO",
"md5": "9d04de741161603bf7071bbf4e883186",
"info_dict": {
"id": "wshh6a7q1ny0G34ZwuIO",
"ext": "mp4",
"title": "KO Of The Week: MMA Fighter Gets Knocked Out By Swift Head Kick!"
'url': 'http://www.worldstarhiphop.com/videos/video.php?v=wshh6a7q1ny0G34ZwuIO',
'md5': '9d04de741161603bf7071bbf4e883186',
'info_dict': {
'id': 'wshh6a7q1ny0G34ZwuIO',
'ext': 'mp4',
'title': 'KO Of The Week: MMA Fighter Gets Knocked Out By Swift Head Kick!'
}
}, {
'url': 'http://m.worldstarhiphop.com/android/video.php?v=wshh6a7q1ny0G34ZwuIO',
@ -21,7 +21,7 @@ class WorldStarHipHopIE(InfoExtractor):
'info_dict': {
'id': 'wshh6a7q1ny0G34ZwuIO',
'ext': 'mp4',
"title": "KO Of The Week: MMA Fighter Gets Knocked Out By Swift Head Kick!"
'title': 'KO Of The Week: MMA Fighter Gets Knocked Out By Swift Head Kick!'
}
}]

Some files were not shown because too many files have changed in this diff Show More