Compare commits

...

83 Commits

Author SHA1 Message Date
97bc05116e Merge branch 'master' into totalwebcasting 2018-01-07 15:03:28 +01:00
0a5b1295b7 [motherless:group] Relax entry extraction and add a fallback scenario 2018-01-07 00:31:53 +07:00
a133eb7764 [motherless:group] Capture leading slash of video path 2018-01-07 00:02:41 +07:00
f12628f934 [mitele] Fix extraction (closes #15186) 2018-01-06 23:58:00 +07:00
45283afdec [motherless] Add support for groups 2018-01-06 23:33:40 +07:00
b7c74c0403 [lynda] Relax _VALID_URL (closes #15185) 2018-01-06 23:12:30 +07:00
0b0870f9d0 [soundcloud] Fallback to avatar picture for thumbnail (closes #12878) 2018-01-05 08:25:42 +07:00
c2f18e1c49 [ChangeLog] Update after #15137
[skip ci]
2018-01-04 22:28:00 +08:00
da35331c6c [youku] Fix list extraction.(close #15135) (#15137)
* [youku] Fix list extraction.(close #15135)

Change-Id: I2e9c920143f4f16012252625943a8f18b8ff40eb

* [youku] Remove KeyError try-except

Change-Id: Ic46327905cbef1356b7b12d5eb3db5d9746ca338
2018-01-04 22:25:28 +08:00
de329f64ab [openload] Fix extraction (closes #15166) 2018-01-04 13:26:08 +08:00
75ba0efb52 [lynda] Skip invalid subtitles (closes #15159) 2018-01-03 16:41:28 +07:00
f0c6c2bce2 [twitch] Pass video id to url_result when extracting playlist 2018-01-03 16:22:55 +07:00
9650c3e91d [rtve.es:alacarta] Fix extraction of some new URLs 2018-01-02 21:12:39 +01:00
b5e531f31a [acast] Fix extraction 2018-01-02 23:32:17 +07:00
7a6c204fcb [travis] Add Jython build 2018-01-02 21:13:41 +07:00
d7cd9a9e84 [utils] Fix youtube-dl under PyPy3 on Windows 2018-01-01 22:48:27 +07:00
54009c246e [travis] Add PyPy builds 2018-01-01 21:54:28 +07:00
b300cda476 [YoutubeDL] Output python implementation in debug header 2018-01-01 21:52:24 +07:00
04cf1a191a release 2017.12.31 2017-12-31 04:30:49 +07:00
c95c08a856 [ChangeLog] Actualize 2017-12-31 04:28:01 +07:00
126f225bcf [extractor/common] Add container meta field for formats extracted in _parse_mpd_formats 2017-12-31 04:04:09 +07:00
4f5cf31977 [slutload] Add support for mobile URLs 2017-12-31 01:41:07 +07:00
77341dae14 [abc:iview] Improve extraction and bypass geo restriction (closes #14782) 2017-12-31 01:27:28 +07:00
2e65e7db9e [abc:iview] Fix extraction (closes #14711)
ABC dropped unmetering, so change to metered hls urls which
require auth.
2017-12-31 01:27:22 +07:00
538d4f8681 [downloader/hls] Use HTTP headers for key request 2017-12-31 01:15:35 +07:00
620ee8712e [openload] Fix extraction (closes #15118) 2017-12-30 15:03:13 +08:00
2ca7ed41fe [mediasite] Improve extraction and code style, add support for DASH (closes #11185, closes #14343, refs #5428) 2017-12-30 08:04:43 +07:00
8056c8542d [mediasite] Add extractor, subsume sandia and collegerama extractors 2017-12-30 07:23:41 +07:00
2501d41ef4 [common] use AACL as the default fourcc when AudioTag is 255 2017-12-30 07:22:07 +07:00
d97cb84b31 [ufctv] Add new extractor(closes #14520) 2017-12-30 00:30:41 +01:00
2c8e11b4af [pluralsight] Fix missing first line of subtitles (closes #11118) 2017-12-30 05:56:47 +07:00
d2c5b5a951 [openload] Fallback on f-page extraction (closes #14665, closes #14879) 2017-12-30 05:53:56 +07:00
580f3c79d5 [vimeo] Improve password protected videos extraction (closes #15114) 2017-12-30 03:54:14 +07:00
9d6ac71c27 [extractor/common] Fix extraction of DASH formats with the same representation id (closes #15111) 2017-12-29 23:14:56 +07:00
84f085d4bd [aws] fix canonical/signed headers generation in python 2(closes #15102) 2017-12-29 00:13:40 +01:00
a491fd0c6f release 2017.12.28 2017-12-28 23:12:56 +07:00
99277daaac [ChangeLog] Actualize 2017-12-28 23:10:42 +07:00
640788f6f4 [internazionale] Improve extraction (closes #14973) 2017-12-27 23:27:48 +07:00
1ae0f0a21d [internazionale] Add extractor 2017-12-27 23:27:43 +07:00
616bb95b28 [playtvak] Relax video regex and make description optional 2017-12-27 22:57:26 +07:00
be069839b4 [filmweb] improve extraction 2017-12-26 19:41:08 +01:00
a14001a5a1 [Filmweb] Add extractor 2017-12-26 15:19:37 +01:00
db145ee54a [espn] Add new extractor for http://fivethirtyeight.com(closes #6864) 2017-12-26 14:20:21 +01:00
45d20488f1 [umg:de] Add new extractor(closes #11582)(closes #11584) 2017-12-26 12:32:04 +01:00
0f897e0929 [espn] add support for espnfc and extract more formats(closes #8053) 2017-12-25 23:29:09 +01:00
173558ce96 [ChangeLog] Update after #15065 2017-12-25 22:06:18 +08:00
d3ca283235 [youku] Add test case.
Some playlist has no data-id value.

Change-Id: I97455f2907f08bda03b538cdc13ec827e2f8ce26
2017-12-25 22:02:47 +08:00
d99a1000c7 [youku] Fix list extraction.(close #15065)
Change-Id: I578fdc5b69509bdcd8d3191e3917afe47c234ff6
2017-12-25 22:02:47 +08:00
a75419586b [openload] Remove a confusing exception
If phantomjs is not installed, there's an error besides the missing
phantomjs exception:

Exception ignored in: <bound method PhantomJSwrapper.__del__ of <youtube_dl.extractor.openload.PhantomJSwrapper object at 0x7f8ad5e78278>>
Traceback (most recent call last):
  File "/home/yen/Projects/youtube-dl/youtube_dl/extractor/openload.py", line 142, in __del__
    os.remove(self._TMP_FILES[name].name)
AttributeError: 'PhantomJSwrapper' object has no attribute '_TMP_FILES'
2017-12-24 20:47:42 +08:00
273c23d960 [openload] Add support for oload.stream (closes #15070) 2017-12-24 13:53:27 +07:00
b954e72c87 [ChangeLog] typo 2017-12-23 23:42:02 +08:00
116561697d [ChangeLog] Update after #14903 2017-12-23 23:41:24 +08:00
0e25a1a278 [youku] Update ccode
Change-Id: Id397e814e81ff560506d68563b7409eebbe5943d
2017-12-23 23:34:42 +08:00
307a7588b0 release 2017.12.23 2017-12-23 21:24:18 +07:00
c2f2f8b120 [kaltura] Fix typo 2017-12-23 21:22:41 +07:00
f5a6321107 [ChangeLog] Actualize 2017-12-23 21:17:53 +07:00
69d69da98a [kaltura] Add another embed pattern for entry_id
For cases when player configuration map is setup via indexing operator, e.g. kalturaPlayerConfiguration_1_lre6rg3i_10[entry_id] = 1_lre6rg3i (see https://www.heise.de/video/artikel/odcast-c-t-uplink-20-1-Apple-CarPlay-vs-Android-Auto-Galileo-3D-Sound-erklaert-3919694.html)
2017-12-23 21:17:53 +07:00
5c5e60cff8 [voot] Fix video identification 2017-12-23 21:17:53 +07:00
2132edaa03 [extractor/common] Move X-Forwarded-For setup code into _request_webpage 2017-12-23 21:17:53 +07:00
4b7dd1705a [7plus] Add new extractor(closes #15043) 2017-12-23 13:22:20 +01:00
9e3682d555 [MANIFEST.in] Include all test data in PyPI package 2017-12-22 23:53:27 +07:00
3e191da6d9 [Makefile] Add AUTHORS to youtube-dl.tar.gz 2017-12-22 23:46:08 +07:00
963d237d26 Add LICENSE, AUTHORS and ChangeLog to PyPI package (closes #15054) 2017-12-22 23:38:16 +07:00
d2d766bc6d [animeondemand] Fix typo 2017-12-20 23:18:14 +07:00
17c3aced5d [animeondemand] Relax login error regex 2017-12-19 22:53:04 +07:00
78466fcab5 [shahid] add support for show pages(closes #7401) 2017-12-19 02:00:38 +01:00
3961c6cb9d [YoutubeDL] Add support for playlist_uploader and playlist_uploader_id in output template (closes #11427, #15018) 2017-12-19 03:53:44 +07:00
07aeced68e [youtube] Extract uploader, uploader_id and uploader_url for playlists (#11427, #15018) 2017-12-19 03:51:28 +07:00
c10c93238e [extractor/common] Introduce uploader, uploader_id and uploader_url meta fields for playlists (#11427, #15018) 2017-12-19 03:51:03 +07:00
4a109f81bc [afreecatv] Improve format extraction (closes #15019) 2017-12-19 00:38:39 +07:00
99081da90c [downloader/fragment] Encode filename of fragment being removed (closes #15020) 2017-12-18 03:31:53 +07:00
7e81010987 [cspan] add support for audio only pages and catch page errors(closes #14995) 2017-12-17 19:15:59 +01:00
549bb416f5 [mailru] Fix issues and improve (closes #14904) 2017-12-17 18:38:27 +07:00
25475dfab3 [mailru] Add support for embed URLs 2017-12-17 18:37:03 +07:00
3dfa9ec213 [crunchyroll] Future-proof XML element checks(closes #15013) 2017-12-17 09:15:44 +01:00
06dbcd7be4 [cbslocal] Fix timestamp extraction (closes #14999, closes #15000) 2017-12-16 21:57:30 +07:00
b555ae9bf1 [utils] Add another date format pattern (#14999) 2017-12-16 21:56:16 +07:00
c402e7f3a0 [discoverygo] correct ttml subtitle extension 2017-12-16 12:55:44 +01:00
498a8a4ca5 [vk] Make view count optional (closes #14979) 2017-12-15 22:53:56 +07:00
d05ba4b89e [disney] skip Apple FairPlay formats(#14982) 2017-12-15 09:28:07 +01:00
23f511f5c7 [voot] sort formats 2017-12-15 09:05:59 +01:00
1c4804ef9b [voot] fix format extraction(closes #14758) 2017-12-14 23:05:43 +01:00
7608a91ee7 [totalwebcasting] Add new extractor 2017-01-11 18:51:25 -05:00
61 changed files with 1687 additions and 551 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.12.14*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.12.31*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.12.14** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.12.31**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.12.14 [debug] youtube-dl version 2017.12.31
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -7,16 +7,21 @@ python:
- "3.4" - "3.4"
- "3.5" - "3.5"
- "3.6" - "3.6"
- "pypy"
- "pypy3"
sudo: false sudo: false
env: env:
- YTDL_TEST_SET=core - YTDL_TEST_SET=core
- YTDL_TEST_SET=download - YTDL_TEST_SET=download
matrix: matrix:
include:
- env: JYTHON=true; YTDL_TEST_SET=core
- env: JYTHON=true; YTDL_TEST_SET=download
fast_finish: true fast_finish: true
allow_failures: allow_failures:
- env: YTDL_TEST_SET=download - env: YTDL_TEST_SET=download
- env: JYTHON=true; YTDL_TEST_SET=core
- env: JYTHON=true; YTDL_TEST_SET=download
before_install:
- if [ "$JYTHON" == "true" ]; then ./devscripts/install_jython.sh; export PATH="$HOME/jython/bin:$PATH"; fi
script: ./devscripts/run_tests.sh script: ./devscripts/run_tests.sh
notifications:
email:
- filippo.valsorda@gmail.com
- yasoob.khld@gmail.com

View File

@ -1,3 +1,83 @@
version <unreleased>
Extractors
* [youku] Fix list extraction (#15135)
* [openload] Fix extraction (#15166)
* [rtve.es:alacarta] Fix extraction of some new URLs
version 2017.12.31
Core
+ [extractor/common] Add container meta field for formats extracted
in _parse_mpd_formats (#13616)
+ [downloader/hls] Use HTTP headers for key request
* [common] Use AACL as the default fourcc when AudioTag is 255
* [extractor/common] Fix extraction of DASH formats with the same
representation id (#15111)
Extractors
+ [slutload] Add support for mobile URLs (#14806)
* [abc:iview] Bypass geo restriction
* [abc:iview] Fix extraction (#14711, #14782, #14838, #14917, #14963, #14985,
#15035, #15057, #15061, #15071, #15095, #15106)
* [openload] Fix extraction (#15118)
- [sandia] Remove extractor
- [collegerama] Remove extractor
+ [mediasite] Add support for sites based on Mediasite Video Platform (#5428,
#11185, #14343)
+ [ufctv] Add support for ufc.tv (#14520)
* [pluralsight] Fix missing first line of subtitles (#11118)
* [openload] Fallback on f-page extraction (#14665, #14879)
* [vimeo] Improve password protected videos extraction (#15114)
* [aws] Fix canonical/signed headers generation on python 2 (#15102)
version 2017.12.28
Extractors
+ [internazionale] Add support for internazionale.it (#14973)
* [playtvak] Relax video regular expression and make description optional
(#15037)
+ [filmweb] Add support for filmweb.no (#8773, #10368)
+ [23video] Add support for 23video.com
+ [espn] Add support for fivethirtyeight.com (#6864)
+ [umg:de] Add support for universal-music.de (#11582, #11584)
+ [espn] Add support for espnfc and extract more formats (#8053)
* [youku] Update ccode (#14880)
+ [openload] Add support for oload.stream (#15070)
* [youku] Fix list extraction (#15065)
version 2017.12.23
Core
* [extractor/common] Move X-Forwarded-For setup code into _request_webpage
+ [YoutubeDL] Add support for playlist_uploader and playlist_uploader_id in
output template (#11427, #15018)
+ [extractor/common] Introduce uploader, uploader_id and uploader_url
meta fields for playlists (#11427, #15018)
* [downloader/fragment] Encode filename of fragment being removed (#15020)
+ [utils] Add another date format pattern (#14999)
Extractors
+ [kaltura] Add another embed pattern for entry_id
+ [7plus] Add support for 7plus.com.au (#15043)
* [animeondemand] Relax login error regular expression
+ [shahid] Add support for show pages (#7401)
+ [youtube] Extract uploader, uploader_id and uploader_url for playlists
(#11427, #15018)
* [afreecatv] Improve format extraction (#15019)
+ [cspan] Add support for audio only pages and catch page errors (#14995)
+ [mailru] Add support for embed URLs (#14904)
* [crunchyroll] Future-proof XML element checks (#15013)
* [cbslocal] Fix timestamp extraction (#14999, #15000)
* [discoverygo] Correct TTML subtitle extension
* [vk] Make view count optional (#14979)
* [disney] Skip Apple FairPlay formats (#14982)
* [voot] Fix format extraction (#14758)
version 2017.12.14 version 2017.12.14
Core Core
@ -148,8 +228,8 @@ Extractors
+ [fxnetworks] Extract series metadata (#14603) + [fxnetworks] Extract series metadata (#14603)
+ [younow] Add support for younow.com (#9255, #9432, #12436) + [younow] Add support for younow.com (#9255, #9432, #12436)
* [dctptv] Fix extraction (#14599) * [dctptv] Fix extraction (#14599)
* [youtube] Restrict embed regex (#14600) * [youtube] Restrict embed regular expression (#14600)
* [vimeo] Restrict iframe embed regex (#14600) * [vimeo] Restrict iframe embed regular expression (#14600)
* [soundgasm] Improve extraction (#14588) * [soundgasm] Improve extraction (#14588)
- [myvideo] Remove extractor (#8557) - [myvideo] Remove extractor (#8557)
+ [nbc] Add support for classic-tv videos (#14575) + [nbc] Add support for classic-tv videos (#14575)

View File

@ -1,7 +1,9 @@
include README.md include README.md
include test/*.py include LICENSE
include test/*.json include AUTHORS
include ChangeLog
include youtube-dl.bash-completion include youtube-dl.bash-completion
include youtube-dl.fish include youtube-dl.fish
include youtube-dl.1 include youtube-dl.1
recursive-include docs Makefile conf.py *.rst recursive-include docs Makefile conf.py *.rst
recursive-include test *

View File

@ -110,7 +110,7 @@ _EXTRACTOR_FILES = $(shell find youtube_dl/extractor -iname '*.py' -and -not -in
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES) youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@ $(PYTHON) devscripts/make_lazy_extractors.py $@
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish ChangeLog youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish ChangeLog AUTHORS
@tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \ @tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
--exclude '*.DS_Store' \ --exclude '*.DS_Store' \
--exclude '*.kate-swp' \ --exclude '*.kate-swp' \
@ -122,7 +122,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude 'docs/_build' \ --exclude 'docs/_build' \
-- \ -- \
bin devscripts test youtube_dl docs \ bin devscripts test youtube_dl docs \
ChangeLog LICENSE README.md README.txt \ ChangeLog AUTHORS LICENSE README.md README.txt \
Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \ Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \
youtube-dl.zsh youtube-dl.fish setup.py setup.cfg \ youtube-dl.zsh youtube-dl.fish setup.py setup.cfg \
youtube-dl youtube-dl

View File

@ -539,6 +539,8 @@ The basic usage is not to set any template arguments when downloading a single f
- `playlist_index` (numeric): Index of the video in the playlist padded with leading zeros according to the total length of the playlist - `playlist_index` (numeric): Index of the video in the playlist padded with leading zeros according to the total length of the playlist
- `playlist_id` (string): Playlist identifier - `playlist_id` (string): Playlist identifier
- `playlist_title` (string): Playlist title - `playlist_title` (string): Playlist title
- `playlist_uploader` (string): Full name of the playlist uploader
- `playlist_uploader_id` (string): Nickname or id of the playlist uploader
Available for the video that belongs to some logical chapter or section: Available for the video that belongs to some logical chapter or section:

5
devscripts/install_jython.sh Executable file
View File

@ -0,0 +1,5 @@
#!/bin/bash
wget http://central.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar
java -jar jython-installer-2.7.1.jar -s -d "$HOME/jython"
$HOME/jython/bin/jython -m pip install nose

View File

@ -3,6 +3,7 @@
- **1up.com** - **1up.com**
- **20min** - **20min**
- **220.ro** - **220.ro**
- **23video**
- **24video** - **24video**
- **3qsdn**: 3Q SDN - **3qsdn**: 3Q SDN
- **3sat** - **3sat**
@ -10,6 +11,7 @@
- **56.com** - **56.com**
- **5min** - **5min**
- **6play** - **6play**
- **7plus**
- **8tracks** - **8tracks**
- **91porn** - **91porn**
- **9c9media** - **9c9media**
@ -169,7 +171,6 @@
- **CNN** - **CNN**
- **CNNArticle** - **CNNArticle**
- **CNNBlogs** - **CNNBlogs**
- **CollegeRama**
- **ComCarCoff** - **ComCarCoff**
- **ComedyCentral** - **ComedyCentral**
- **ComedyCentralFullEpisodes** - **ComedyCentralFullEpisodes**
@ -268,6 +269,8 @@
- **Fczenit** - **Fczenit**
- **filmon** - **filmon**
- **filmon:channel** - **filmon:channel**
- **Filmweb**
- **FiveThirtyEight**
- **FiveTV** - **FiveTV**
- **Flickr** - **Flickr**
- **Flipagram** - **Flipagram**
@ -358,6 +361,7 @@
- **InfoQ** - **InfoQ**
- **Instagram** - **Instagram**
- **instagram:user**: Instagram user profile - **instagram:user**: Instagram user profile
- **Internazionale**
- **InternetVideoArchive** - **InternetVideoArchive**
- **IPrima** - **IPrima**
- **iqiyi**: 爱奇艺 - **iqiyi**: 爱奇艺
@ -444,6 +448,7 @@
- **media.ccc.de** - **media.ccc.de**
- **Medialaan** - **Medialaan**
- **Mediaset** - **Mediaset**
- **Mediasite**
- **Medici** - **Medici**
- **megaphone.fm**: megaphone.fm embedded players - **megaphone.fm**: megaphone.fm embedded players
- **Meipai**: 美拍 - **Meipai**: 美拍
@ -712,7 +717,6 @@
- **safari**: safaribooksonline.com online video - **safari**: safaribooksonline.com online video
- **safari:api** - **safari:api**
- **safari:course**: safaribooksonline.com online courses - **safari:course**: safaribooksonline.com online courses
- **Sandia**: Sandia National Laboratories
- **Sapo**: SAPO Vídeos - **Sapo**: SAPO Vídeos
- **savefrom.net** - **savefrom.net**
- **SBS**: sbs.com.au - **SBS**: sbs.com.au
@ -728,6 +732,7 @@
- **Servus** - **Servus**
- **Sexu** - **Sexu**
- **Shahid** - **Shahid**
- **ShahidShow**
- **Shared**: shared.sx - **Shared**: shared.sx
- **ShowRoomLive** - **ShowRoomLive**
- **Sina** - **Sina**
@ -886,7 +891,9 @@
- **udemy** - **udemy**
- **udemy:course** - **udemy:course**
- **UDNEmbed**: 聯合影音 - **UDNEmbed**: 聯合影音
- **UFCTV**
- **UKTVPlay** - **UKTVPlay**
- **umg:de**: Universal Music Deutschland
- **Unistra** - **Unistra**
- **Unity** - **Unity**
- **uol.com.br** - **uol.com.br**

View File

@ -109,6 +109,7 @@ setup(
author_email='ytdl@yt-dl.org', author_email='ytdl@yt-dl.org',
maintainer='Sergey M.', maintainer='Sergey M.',
maintainer_email='dstftw@gmail.com', maintainer_email='dstftw@gmail.com',
license='Unlicense',
packages=[ packages=[
'youtube_dl', 'youtube_dl',
'youtube_dl.extractor', 'youtube_dl.downloader', 'youtube_dl.extractor', 'youtube_dl.downloader',

View File

@ -493,9 +493,20 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
_TEST_CASES = [ _TEST_CASES = [
( (
# https://github.com/rg3/youtube-dl/issues/13919 # https://github.com/rg3/youtube-dl/issues/13919
# Also tests duplicate representation ids, see
# https://github.com/rg3/youtube-dl/issues/15111
'float_duration', 'float_duration',
'http://unknown/manifest.mpd', 'http://unknown/manifest.mpd',
[{ [{
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'm4a',
'format_id': '318597',
'format_note': 'DASH audio',
'protocol': 'http_dash_segments',
'acodec': 'mp4a.40.2',
'vcodec': 'none',
'tbr': 61.587,
}, {
'manifest_url': 'http://unknown/manifest.mpd', 'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4', 'ext': 'mp4',
'format_id': '318597', 'format_id': '318597',

View File

@ -343,6 +343,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100) self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
self.assertEqual(unified_timestamp('2017-03-30T17:52:41Q'), 1490896361) self.assertEqual(unified_timestamp('2017-03-30T17:52:41Q'), 1490896361)
self.assertEqual(unified_timestamp('Sep 11, 2013 | 5:49 AM'), 1378878540) self.assertEqual(unified_timestamp('Sep 11, 2013 | 5:49 AM'), 1378878540)
self.assertEqual(unified_timestamp('December 15, 2017 at 7:49 am'), 1513324140)
def test_determine_ext(self): def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4') self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')

View File

@ -975,6 +975,8 @@ class YoutubeDL(object):
'playlist': playlist, 'playlist': playlist,
'playlist_id': ie_result.get('id'), 'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'), 'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': i + playliststart, 'playlist_index': i + playliststart,
'extractor': ie_result['extractor'], 'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'], 'webpage_url': ie_result['webpage_url'],
@ -2231,8 +2233,16 @@ class YoutubeDL(object):
sys.exc_clear() sys.exc_clear()
except Exception: except Exception:
pass pass
self._write_string('[debug] Python version %s - %s\n' % (
platform.python_version(), platform_name())) def python_implementation():
impl_name = platform.python_implementation()
if impl_name == 'PyPy' and hasattr(sys, 'pypy_version_info'):
return impl_name + ' version %d.%d.%d' % sys.pypy_version_info[:3]
return impl_name
self._write_string('[debug] Python version %s (%s) - %s\n' % (
platform.python_version(), python_implementation(),
platform_name()))
exe_versions = FFmpegPostProcessor.get_versions(self) exe_versions = FFmpegPostProcessor.get_versions(self)
exe_versions['rtmpdump'] = rtmpdump_version() exe_versions['rtmpdump'] = rtmpdump_version()

View File

@ -3,12 +3,14 @@ from __future__ import unicode_literals
import binascii import binascii
import collections import collections
import ctypes
import email import email
import getpass import getpass
import io import io
import itertools import itertools
import optparse import optparse
import os import os
import platform
import re import re
import shlex import shlex
import shutil import shutil
@ -2906,6 +2908,24 @@ except ImportError: # not 2.6+ or is 3.x
except ImportError: except ImportError:
compat_zip = zip compat_zip = zip
if platform.python_implementation() == 'PyPy' and sys.pypy_version_info < (5, 4, 0):
# PyPy2 prior to version 5.4.0 expects byte strings as Windows function
# names, see the original PyPy issue [1] and the youtube-dl one [2].
# 1. https://bitbucket.org/pypy/pypy/issues/2360/windows-ctypescdll-typeerror-function-name
# 2. https://github.com/rg3/youtube-dl/pull/4392
def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
real = ctypes.WINFUNCTYPE(*args, **kwargs)
def resf(tpl, *args, **kwargs):
funcname, dll = tpl
return real((str(funcname), dll), *args, **kwargs)
return resf
else:
def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
return ctypes.WINFUNCTYPE(*args, **kwargs)
__all__ = [ __all__ = [
'compat_HTMLParseError', 'compat_HTMLParseError',
'compat_HTMLParser', 'compat_HTMLParser',
@ -2914,6 +2934,7 @@ __all__ = [
'compat_chr', 'compat_chr',
'compat_cookiejar', 'compat_cookiejar',
'compat_cookies', 'compat_cookies',
'compat_ctypes_WINFUNCTYPE',
'compat_etree_fromstring', 'compat_etree_fromstring',
'compat_etree_register_namespace', 'compat_etree_register_namespace',
'compat_expanduser', 'compat_expanduser',

View File

@ -112,7 +112,7 @@ class FragmentFD(FileDownloader):
if self.__do_ytdl_file(ctx): if self.__do_ytdl_file(ctx):
self._write_ytdl_file(ctx) self._write_ytdl_file(ctx)
if not self.params.get('keep_fragments', False): if not self.params.get('keep_fragments', False):
os.remove(ctx['fragment_filename_sanitized']) os.remove(encodeFilename(ctx['fragment_filename_sanitized']))
del ctx['fragment_filename_sanitized'] del ctx['fragment_filename_sanitized']
def _prepare_frag_download(self, ctx): def _prepare_frag_download(self, ctx):

View File

@ -163,7 +163,8 @@ class HlsFD(FragmentFD):
return False return False
if decrypt_info['METHOD'] == 'AES-128': if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence) iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(decrypt_info['URI']).read() decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(
self._prepare_url(info_dict, decrypt_info['URI'])).read()
frag_content = AES.new( frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content) decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
self._append_fragment(ctx, frag_content) self._append_fragment(ctx, frag_content)

View File

@ -1,6 +1,9 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import hashlib
import hmac
import re import re
import time
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..compat import compat_str
@ -10,6 +13,7 @@ from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
try_get, try_get,
update_url_query,
) )
@ -101,21 +105,24 @@ class ABCIE(InfoExtractor):
class ABCIViewIE(InfoExtractor): class ABCIViewIE(InfoExtractor):
IE_NAME = 'abc.net.au:iview' IE_NAME = 'abc.net.au:iview'
_VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)' _VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'
_GEO_COUNTRIES = ['AU']
# ABC iview programs are normally available for 14 days only. # ABC iview programs are normally available for 14 days only.
_TESTS = [{ _TESTS = [{
'url': 'http://iview.abc.net.au/programs/diaries-of-a-broken-mind/ZX9735A001S00', 'url': 'http://iview.abc.net.au/programs/call-the-midwife/ZW0898A003S00',
'md5': 'cde42d728b3b7c2b32b1b94b4a548afc', 'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
'info_dict': { 'info_dict': {
'id': 'ZX9735A001S00', 'id': 'ZW0898A003S00',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Diaries Of A Broken Mind', 'title': 'Series 5 Ep 3',
'description': 'md5:7de3903874b7a1be279fe6b68718fc9e', 'description': 'md5:e0ef7d4f92055b86c4f33611f180ed79',
'upload_date': '20161010', 'upload_date': '20171228',
'uploader_id': 'abc2', 'uploader_id': 'abc1',
'timestamp': 1476064920, 'timestamp': 1514499187,
},
'params': {
'skip_download': True,
}, },
'skip': 'Video gone',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -126,20 +133,30 @@ class ABCIViewIE(InfoExtractor):
title = video_params.get('title') or video_params['seriesTitle'] title = video_params.get('title') or video_params['seriesTitle']
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program') stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
format_urls = [ house_number = video_params.get('episodeHouseNumber')
try_get(stream, lambda x: x['hds-unmetered'], compat_str)] path = '/auth/hls/sign?ts={0}&hn={1}&d=android-mobile'.format(
int(time.time()), house_number)
sig = hmac.new(
'android.content.res.Resources'.encode('utf-8'),
path.encode('utf-8'), hashlib.sha256).hexdigest()
token = self._download_webpage(
'http://iview.abc.net.au{0}&sig={1}'.format(path, sig), video_id)
# May have higher quality video def tokenize_url(url, token):
sd_url = try_get( return update_url_query(url, {
stream, lambda x: x['streams']['hds']['sd'], compat_str) 'hdnea': token,
if sd_url: })
format_urls.append(sd_url.replace('metered', 'um'))
formats = [] for sd in ('sd', 'sd-low'):
for format_url in format_urls: sd_url = try_get(
if format_url: stream, lambda x: x['streams']['hls'][sd], compat_str)
formats.extend( if not sd_url:
self._extract_akamai_formats(format_url, video_id)) continue
formats = self._extract_m3u8_formats(
tokenize_url(sd_url, token), video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
if formats:
break
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}

View File

@ -8,7 +8,7 @@ from .common import InfoExtractor
from ..compat import compat_str from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_iso8601, unified_timestamp,
OnDemandPagedList, OnDemandPagedList,
) )
@ -32,7 +32,7 @@ class ACastIE(InfoExtractor):
}, { }, {
# test with multiple blings # test with multiple blings
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna', 'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'md5': '55c0097badd7095f494c99a172f86501', 'md5': 'e87d5b8516cd04c0d81b6ee1caca28d0',
'info_dict': { 'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9', 'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3', 'ext': 'mp3',
@ -40,23 +40,24 @@ class ACastIE(InfoExtractor):
'timestamp': 1477346700, 'timestamp': 1477346700,
'upload_date': '20161024', 'upload_date': '20161024',
'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4', 'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4',
'duration': 2797, 'duration': 2766,
} }
}] }]
def _real_extract(self, url): def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups() channel, display_id = re.match(self._VALID_URL, url).groups()
cast_data = self._download_json( cast_data = self._download_json(
'https://embed.acast.com/api/acasts/%s/%s' % (channel, display_id), display_id) 'https://play-api.acast.com/splash/%s/%s' % (channel, display_id), display_id)
e = cast_data['result']['episode']
return { return {
'id': compat_str(cast_data['id']), 'id': compat_str(e['id']),
'display_id': display_id, 'display_id': display_id,
'url': [b['audio'] for b in cast_data['blings'] if b['type'] == 'BlingAudio'][0], 'url': e['mediaUrl'],
'title': cast_data['name'], 'title': e['name'],
'description': cast_data.get('description'), 'description': e.get('description'),
'thumbnail': cast_data.get('image'), 'thumbnail': e.get('image'),
'timestamp': parse_iso8601(cast_data.get('publishingDate')), 'timestamp': unified_timestamp(e.get('publishingDate')),
'duration': int_or_none(cast_data.get('duration')), 'duration': int_or_none(e.get('duration')),
} }

View File

@ -228,10 +228,19 @@ class AfreecaTVIE(InfoExtractor):
r'^(\d{8})_', key, 'upload date', default=None) r'^(\d{8})_', key, 'upload date', default=None)
file_duration = int_or_none(file_element.get('duration')) file_duration = int_or_none(file_element.get('duration'))
format_id = key if key else '%s_%s' % (video_id, file_num) format_id = key if key else '%s_%s' % (video_id, file_num)
formats = self._extract_m3u8_formats( if determine_ext(file_url) == 'm3u8':
file_url, video_id, 'mp4', entry_protocol='m3u8_native', formats = self._extract_m3u8_formats(
m3u8_id='hls', file_url, video_id, 'mp4', entry_protocol='m3u8_native',
note='Downloading part %d m3u8 information' % file_num) m3u8_id='hls',
note='Downloading part %d m3u8 information' % file_num)
else:
formats = [{
'url': file_url,
'format_id': 'http',
}]
if not formats:
continue
self._sort_formats(formats)
file_info = common_entry.copy() file_info = common_entry.copy()
file_info.update({ file_info.update({
'id': format_id, 'id': format_id,

View File

@ -85,8 +85,8 @@ class AnimeOnDemandIE(InfoExtractor):
if all(p not in response for p in ('>Logout<', 'href="/users/sign_out"')): if all(p not in response for p in ('>Logout<', 'href="/users/sign_out"')):
error = self._search_regex( error = self._search_regex(
r'<p class="alert alert-danger">(.+?)</p>', r'<p[^>]+\bclass=(["\'])(?:(?!\1).)*\balert\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</p>',
response, 'error', default=None) response, 'error', default=None, group='error')
if error: if error:
raise ExtractorError('Unable to login: %s' % error, expected=True) raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')

View File

@ -0,0 +1,78 @@
# coding: utf-8
from __future__ import unicode_literals
import datetime
import hashlib
import hmac
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlencode
class AWSIE(InfoExtractor):
_AWS_ALGORITHM = 'AWS4-HMAC-SHA256'
_AWS_REGION = 'us-east-1'
def _aws_execute_api(self, aws_dict, video_id, query=None):
query = query or {}
amz_date = datetime.datetime.utcnow().strftime('%Y%m%dT%H%M%SZ')
date = amz_date[:8]
headers = {
'Accept': 'application/json',
'Host': self._AWS_PROXY_HOST,
'X-Amz-Date': amz_date,
'X-Api-Key': self._AWS_API_KEY
}
session_token = aws_dict.get('session_token')
if session_token:
headers['X-Amz-Security-Token'] = session_token
def aws_hash(s):
return hashlib.sha256(s.encode('utf-8')).hexdigest()
# Task 1: http://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html
canonical_querystring = compat_urllib_parse_urlencode(query)
canonical_headers = ''
for header_name, header_value in sorted(headers.items()):
canonical_headers += '%s:%s\n' % (header_name.lower(), header_value)
signed_headers = ';'.join([header.lower() for header in sorted(headers.keys())])
canonical_request = '\n'.join([
'GET',
aws_dict['uri'],
canonical_querystring,
canonical_headers,
signed_headers,
aws_hash('')
])
# Task 2: http://docs.aws.amazon.com/general/latest/gr/sigv4-create-string-to-sign.html
credential_scope_list = [date, self._AWS_REGION, 'execute-api', 'aws4_request']
credential_scope = '/'.join(credential_scope_list)
string_to_sign = '\n'.join([self._AWS_ALGORITHM, amz_date, credential_scope, aws_hash(canonical_request)])
# Task 3: http://docs.aws.amazon.com/general/latest/gr/sigv4-calculate-signature.html
def aws_hmac(key, msg):
return hmac.new(key, msg.encode('utf-8'), hashlib.sha256)
def aws_hmac_digest(key, msg):
return aws_hmac(key, msg).digest()
def aws_hmac_hexdigest(key, msg):
return aws_hmac(key, msg).hexdigest()
k_signing = ('AWS4' + aws_dict['secret_key']).encode('utf-8')
for value in credential_scope_list:
k_signing = aws_hmac_digest(k_signing, value)
signature = aws_hmac_hexdigest(k_signing, string_to_sign)
# Task 4: http://docs.aws.amazon.com/general/latest/gr/sigv4-add-signature-to-request.html
headers['Authorization'] = ', '.join([
'%s Credential=%s/%s' % (self._AWS_ALGORITHM, aws_dict['access_key'], credential_scope),
'SignedHeaders=%s' % signed_headers,
'Signature=%s' % signature,
])
return self._download_json(
'https://%s%s%s' % (self._AWS_PROXY_HOST, aws_dict['uri'], '?' + canonical_querystring if canonical_querystring else ''),
video_id, headers=headers)

View File

@ -464,7 +464,7 @@ class BrightcoveNewIE(AdobePassIE):
'timestamp': 1441391203, 'timestamp': 1441391203,
'upload_date': '20150904', 'upload_date': '20150904',
'uploader_id': '929656772001', 'uploader_id': '929656772001',
'formats': 'mincount:22', 'formats': 'mincount:20',
}, },
}, { }, {
# with rtmp streams # with rtmp streams
@ -478,7 +478,7 @@ class BrightcoveNewIE(AdobePassIE):
'timestamp': 1433556729, 'timestamp': 1433556729,
'upload_date': '20150606', 'upload_date': '20150606',
'uploader_id': '4036320279001', 'uploader_id': '4036320279001',
'formats': 'mincount:41', 'formats': 'mincount:39',
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
@ -564,59 +564,7 @@ class BrightcoveNewIE(AdobePassIE):
return entries return entries
def _real_extract(self, url): def _parse_brightcove_metadata(self, json_data, video_id):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
account_id, player_id, embed, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(
'http://players.brightcove.net/%s/%s_%s/index.min.js'
% (account_id, player_id, embed), video_id)
policy_key = None
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
if not policy_key:
policy_key = self._search_regex(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s' % (account_id, video_id)
try:
json_data = self._download_json(api_url, video_id, headers={
'Accept': 'application/json;pk=%s' % policy_key
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
message = json_data.get('message') or json_data['error_code']
if json_data.get('error_subcode') == 'CLIENT_GEO':
self.raise_geo_restricted(msg=message)
raise ExtractorError(message, expected=True)
raise
errors = json_data.get('errors')
if errors and errors[0].get('error_subcode') == 'TVE_AUTH':
custom_fields = json_data['custom_fields']
tve_token = self._extract_mvpd_auth(
smuggled_data['source_url'], video_id,
custom_fields['bcadobepassrequestorid'],
custom_fields['bcadobepassresourceid'])
json_data = self._download_json(
api_url, video_id, headers={
'Accept': 'application/json;pk=%s' % policy_key
}, query={
'tveToken': tve_token,
})
title = json_data['name'].strip() title = json_data['name'].strip()
formats = [] formats = []
@ -682,6 +630,7 @@ class BrightcoveNewIE(AdobePassIE):
}) })
formats.append(f) formats.append(f)
errors = json_data.get('errors')
if not formats and errors: if not formats and errors:
error = errors[0] error = errors[0]
raise ExtractorError( raise ExtractorError(
@ -708,9 +657,64 @@ class BrightcoveNewIE(AdobePassIE):
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'), 'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': duration, 'duration': duration,
'timestamp': parse_iso8601(json_data.get('published_at')), 'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id, 'uploader_id': json_data.get('account_id'),
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'tags': json_data.get('tags', []), 'tags': json_data.get('tags', []),
'is_live': is_live, 'is_live': is_live,
} }
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
account_id, player_id, embed, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(
'http://players.brightcove.net/%s/%s_%s/index.min.js'
% (account_id, player_id, embed), video_id)
policy_key = None
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
if not policy_key:
policy_key = self._search_regex(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s' % (account_id, video_id)
try:
json_data = self._download_json(api_url, video_id, headers={
'Accept': 'application/json;pk=%s' % policy_key
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
message = json_data.get('message') or json_data['error_code']
if json_data.get('error_subcode') == 'CLIENT_GEO':
self.raise_geo_restricted(msg=message)
raise ExtractorError(message, expected=True)
raise
errors = json_data.get('errors')
if errors and errors[0].get('error_subcode') == 'TVE_AUTH':
custom_fields = json_data['custom_fields']
tve_token = self._extract_mvpd_auth(
smuggled_data['source_url'], video_id,
custom_fields['bcadobepassrequestorid'],
custom_fields['bcadobepassresourceid'])
json_data = self._download_json(
api_url, video_id, headers={
'Accept': 'application/json;pk=%s' % policy_key
}, query={
'tveToken': tve_token,
})
return self._parse_brightcove_metadata(json_data, video_id)

View File

@ -91,12 +91,10 @@ class CBSLocalIE(AnvatoIE):
info_dict = self._extract_anvato_videos(webpage, display_id) info_dict = self._extract_anvato_videos(webpage, display_id)
time_str = self._html_search_regex( timestamp = unified_timestamp(self._html_search_regex(
r'class="entry-date">([^<]+)<', webpage, 'released date', default=None) r'class="(?:entry|post)-date"[^>]*>([^<]+)', webpage,
if time_str: 'released date', default=None)) or parse_iso8601(
timestamp = unified_timestamp(time_str) self._html_search_meta('uploadDate', webpage))
else:
timestamp = parse_iso8601(self._html_search_meta('uploadDate', webpage))
info_dict.update({ info_dict.update({
'display_id': display_id, 'display_id': display_id,

View File

@ -1,93 +0,0 @@
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
sanitized_Request,
)
class CollegeRamaIE(InfoExtractor):
_VALID_URL = r'https?://collegerama\.tudelft\.nl/Mediasite/Play/(?P<id>[\da-f]+)'
_TESTS = [
{
'url': 'https://collegerama.tudelft.nl/Mediasite/Play/585a43626e544bdd97aeb71a0ec907a01d',
'md5': '481fda1c11f67588c0d9d8fbdced4e39',
'info_dict': {
'id': '585a43626e544bdd97aeb71a0ec907a01d',
'ext': 'mp4',
'title': 'Een nieuwe wereld: waarden, bewustzijn en techniek van de mensheid 2.0.',
'description': '',
'thumbnail': r're:^https?://.*\.jpg(?:\?.*?)?$',
'duration': 7713.088,
'timestamp': 1413309600,
'upload_date': '20141014',
},
},
{
'url': 'https://collegerama.tudelft.nl/Mediasite/Play/86a9ea9f53e149079fbdb4202b521ed21d?catalog=fd32fd35-6c99-466c-89d4-cd3c431bc8a4',
'md5': 'ef1fdded95bdf19b12c5999949419c92',
'info_dict': {
'id': '86a9ea9f53e149079fbdb4202b521ed21d',
'ext': 'wmv',
'title': '64ste Vakantiecursus: Afvalwater',
'description': 'md5:7fd774865cc69d972f542b157c328305',
'thumbnail': r're:^https?://.*\.jpg(?:\?.*?)?$',
'duration': 10853,
'timestamp': 1326446400,
'upload_date': '20120113',
},
},
]
def _real_extract(self, url):
video_id = self._match_id(url)
player_options_request = {
'getPlayerOptionsRequest': {
'ResourceId': video_id,
'QueryString': '',
}
}
request = sanitized_Request(
'http://collegerama.tudelft.nl/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions',
json.dumps(player_options_request))
request.add_header('Content-Type', 'application/json')
player_options = self._download_json(request, video_id)
presentation = player_options['d']['Presentation']
title = presentation['Title']
description = presentation.get('Description')
thumbnail = None
duration = float_or_none(presentation.get('Duration'), 1000)
timestamp = int_or_none(presentation.get('UnixTime'), 1000)
formats = []
for stream in presentation['Streams']:
for video in stream['VideoUrls']:
thumbnail_url = stream.get('ThumbnailUrl')
if thumbnail_url:
thumbnail = 'http://collegerama.tudelft.nl' + thumbnail_url
format_id = video['MediaType']
if format_id == 'SS':
continue
formats.append({
'url': video['Location'],
'format_id': format_id,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}

View File

@ -301,8 +301,9 @@ class InfoExtractor(object):
There must be a key "entries", which is a list, an iterable, or a PagedList There must be a key "entries", which is a list, an iterable, or a PagedList
object, each element of which is a valid dictionary by this specification. object, each element of which is a valid dictionary by this specification.
Additionally, playlists can have "title", "description" and "id" attributes Additionally, playlists can have "id", "title", "description", "uploader",
with the same semantics as videos (see above). "uploader_id", "uploader_url" attributes with the same semantics as videos
(see above).
_type "multi_video" indicates that there are multiple videos that _type "multi_video" indicates that there are multiple videos that
@ -494,6 +495,16 @@ class InfoExtractor(object):
self.to_screen('%s' % (note,)) self.to_screen('%s' % (note,))
else: else:
self.to_screen('%s: %s' % (video_id, note)) self.to_screen('%s: %s' % (video_id, note))
# Some sites check X-Forwarded-For HTTP header in order to figure out
# the origin of the client behind proxy. This allows bypassing geo
# restriction by faking this header's value to IP that belongs to some
# geo unrestricted country. We will do so once we encounter any
# geo restriction error.
if self._x_forwarded_for_ip:
if 'X-Forwarded-For' not in headers:
headers['X-Forwarded-For'] = self._x_forwarded_for_ip
if isinstance(url_or_request, compat_urllib_request.Request): if isinstance(url_or_request, compat_urllib_request.Request):
url_or_request = update_Request( url_or_request = update_Request(
url_or_request, data=data, headers=headers, query=query) url_or_request, data=data, headers=headers, query=query)
@ -523,15 +534,6 @@ class InfoExtractor(object):
if isinstance(url_or_request, (compat_str, str)): if isinstance(url_or_request, (compat_str, str)):
url_or_request = url_or_request.partition('#')[0] url_or_request = url_or_request.partition('#')[0]
# Some sites check X-Forwarded-For HTTP header in order to figure out
# the origin of the client behind proxy. This allows bypassing geo
# restriction by faking this header's value to IP that belongs to some
# geo unrestricted country. We will do so once we encounter any
# geo restriction error.
if self._x_forwarded_for_ip:
if 'X-Forwarded-For' not in headers:
headers['X-Forwarded-For'] = self._x_forwarded_for_ip
urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query) urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query)
if urlh is False: if urlh is False:
assert not fatal assert not fatal
@ -1878,6 +1880,7 @@ class InfoExtractor(object):
'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None, 'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None,
'format_note': 'DASH %s' % content_type, 'format_note': 'DASH %s' % content_type,
'filesize': filesize, 'filesize': filesize,
'container': mimetype2ext(mime_type) + '_dash',
} }
f.update(parse_codecs(representation_attrib.get('codecs'))) f.update(parse_codecs(representation_attrib.get('codecs')))
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info) representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
@ -2005,16 +2008,14 @@ class InfoExtractor(object):
f['url'] = initialization_url f['url'] = initialization_url
f['fragments'].append({location_key(initialization_url): initialization_url}) f['fragments'].append({location_key(initialization_url): initialization_url})
f['fragments'].extend(representation_ms_info['fragments']) f['fragments'].extend(representation_ms_info['fragments'])
try: # According to [1, 5.3.5.2, Table 7, page 35] @id of Representation
existing_format = next( # is not necessarily unique within a Period thus formats with
fo for fo in formats # the same `format_id` are quite possible. There are numerous examples
if fo['format_id'] == representation_id) # of such manifests (see https://github.com/rg3/youtube-dl/issues/15111,
except StopIteration: # https://github.com/rg3/youtube-dl/issues/13919)
full_info = formats_dict.get(representation_id, {}).copy() full_info = formats_dict.get(representation_id, {}).copy()
full_info.update(f) full_info.update(f)
formats.append(full_info) formats.append(full_info)
else:
existing_format.update(f)
else: else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type) self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
return formats return formats
@ -2054,7 +2055,7 @@ class InfoExtractor(object):
stream_timescale = int_or_none(stream.get('TimeScale')) or timescale stream_timescale = int_or_none(stream.get('TimeScale')) or timescale
stream_name = stream.get('Name') stream_name = stream.get('Name')
for track in stream.findall('QualityLevel'): for track in stream.findall('QualityLevel'):
fourcc = track.get('FourCC') fourcc = track.get('FourCC', 'AACL' if track.get('AudioTag') == '255' else None)
# TODO: add support for WVC1 and WMAP # TODO: add support for WVC1 and WMAP
if fourcc not in ('H264', 'AVC1', 'AACL'): if fourcc not in ('H264', 'AVC1', 'AACL'):
self.report_warning('%s is not a supported codec' % fourcc) self.report_warning('%s is not a supported codec' % fourcc)

View File

@ -392,7 +392,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'Downloading subtitles for ' + sub_name, data={ 'Downloading subtitles for ' + sub_name, data={
'subtitle_script_id': sub_id, 'subtitle_script_id': sub_id,
}) })
if not sub_doc: if sub_doc is None:
continue continue
sid = sub_doc.get('id') sid = sub_doc.get('id')
iv = xpath_text(sub_doc, 'iv', 'subtitle iv') iv = xpath_text(sub_doc, 'iv', 'subtitle iv')
@ -479,9 +479,9 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'video_quality': stream_quality, 'video_quality': stream_quality,
'current_page': url, 'current_page': url,
}) })
if streamdata: if streamdata is not None:
stream_info = streamdata.find('./{default}preload/stream_info') stream_info = streamdata.find('./{default}preload/stream_info')
if stream_info: if stream_info is not None:
stream_infos.append(stream_info) stream_infos.append(stream_info)
stream_info = self._call_rpc_api( stream_info = self._call_rpc_api(
'VideoEncode_GetStreamInfo', video_id, 'VideoEncode_GetStreamInfo', video_id,
@ -490,7 +490,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'video_format': stream_format, 'video_format': stream_format,
'video_encode_quality': stream_quality, 'video_encode_quality': stream_quality,
}) })
if stream_info: if stream_info is not None:
stream_infos.append(stream_info) stream_infos.append(stream_info)
for stream_info in stream_infos: for stream_info in stream_infos:
video_encode_id = xpath_text(stream_info, './video_encode_id') video_encode_id = xpath_text(stream_info, './video_encode_id')

View File

@ -4,13 +4,14 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none,
unescapeHTML,
find_xpath_attr,
smuggle_url,
determine_ext, determine_ext,
ExtractorError, ExtractorError,
extract_attributes, extract_attributes,
find_xpath_attr,
get_element_by_class,
int_or_none,
smuggle_url,
unescapeHTML,
) )
from .senateisvp import SenateISVPIE from .senateisvp import SenateISVPIE
from .ustream import UstreamIE from .ustream import UstreamIE
@ -68,6 +69,10 @@ class CSpanIE(InfoExtractor):
'uploader': 'HouseCommittee', 'uploader': 'HouseCommittee',
'uploader_id': '12987475', 'uploader_id': '12987475',
}, },
}, {
# Audio Only
'url': 'https://www.c-span.org/video/?437336-1/judiciary-antitrust-competition-policy-consumer-rights',
'only_matching': True,
}] }]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s' BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s'
@ -111,7 +116,15 @@ class CSpanIE(InfoExtractor):
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
surl = smuggle_url(senate_isvp_url, {'force_title': title}) surl = smuggle_url(senate_isvp_url, {'force_title': title})
return self.url_result(surl, 'SenateISVP', video_id, title) return self.url_result(surl, 'SenateISVP', video_id, title)
video_id = self._search_regex(
r'jwsetup\.clipprog\s*=\s*(\d+);',
webpage, 'jwsetup program id', default=None)
if video_id:
video_type = 'program'
if video_type is None or video_id is None: if video_type is None or video_id is None:
error_message = get_element_by_class('VLplayer-error-message', webpage)
if error_message:
raise ExtractorError(error_message)
raise ExtractorError('unable to find video id and type') raise ExtractorError('unable to find video id and type')
def get_text_attr(d, attr): def get_text_attr(d, attr):
@ -138,7 +151,7 @@ class CSpanIE(InfoExtractor):
entries = [] entries = []
for partnum, f in enumerate(files): for partnum, f in enumerate(files):
formats = [] formats = []
for quality in f['qualities']: for quality in f.get('qualities', []):
formats.append({ formats.append({
'format_id': '%s-%sp' % (get_text_attr(quality, 'bitrate'), get_text_attr(quality, 'height')), 'format_id': '%s-%sp' % (get_text_attr(quality, 'bitrate'), get_text_attr(quality, 'height')),
'url': unescapeHTML(get_text_attr(quality, 'file')), 'url': unescapeHTML(get_text_attr(quality, 'file')),

View File

@ -5,6 +5,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext,
extract_attributes, extract_attributes,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
@ -73,7 +74,11 @@ class DiscoveryGoBaseIE(InfoExtractor):
not subtitle_url.startswith('http')): not subtitle_url.startswith('http')):
continue continue
lang = caption.get('fileLang', 'en') lang = caption.get('fileLang', 'en')
subtitles.setdefault(lang, []).append({'url': subtitle_url}) ext = determine_ext(subtitle_url)
subtitles.setdefault(lang, []).append({
'url': subtitle_url,
'ext': 'ttml' if ext == 'xml' else ext,
})
return { return {
'id': video_id, 'id': video_id,

View File

@ -10,6 +10,7 @@ from ..utils import (
compat_str, compat_str,
determine_ext, determine_ext,
ExtractorError, ExtractorError,
update_url_query,
) )
@ -108,9 +109,16 @@ class DisneyIE(InfoExtractor):
continue continue
tbr = int_or_none(flavor.get('bitrate')) tbr = int_or_none(flavor.get('bitrate'))
if tbr == 99999: if tbr == 99999:
formats.extend(self._extract_m3u8_formats( # wrong ks(Kaltura Signature) causes 404 Error
flavor_url = update_url_query(flavor_url, {'ks': ''})
m3u8_formats = self._extract_m3u8_formats(
flavor_url, video_id, 'mp4', flavor_url, video_id, 'mp4',
m3u8_id=flavor_format, fatal=False)) m3u8_id=flavor_format, fatal=False)
for f in m3u8_formats:
# Apple FairPlay
if '/fpshls/' in f['url']:
continue
formats.append(f)
continue continue
format_id = [] format_id = []
if flavor_format: if flavor_format:

View File

@ -1,6 +1,9 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from .once import OnceIE
from ..compat import compat_str from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
@ -9,22 +12,27 @@ from ..utils import (
) )
class ESPNIE(InfoExtractor): class ESPNIE(OnceIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:
(?:(?:\w+\.)+)?espn\.go|
(?:www\.)?espn
)\.com/
(?: (?:
(?: (?:
video/clip| (?:
watch/player (?:(?:\w+\.)+)?espn\.go|
) (?:www\.)?espn
(?: )\.com/
\?.*?\bid=| (?:
/_/id/ (?:
) video/(?:clip|iframe/twitter)|
watch/player
)
(?:
.*?\?.*?\bid=|
/_/id/
)
)
)|
(?:www\.)espnfc\.(?:com|us)/(?:video/)?[^/]+/\d+/video/
) )
(?P<id>\d+) (?P<id>\d+)
''' '''
@ -77,6 +85,15 @@ class ESPNIE(InfoExtractor):
}, { }, {
'url': 'http://www.espn.com/video/clip/_/id/17989860', 'url': 'http://www.espn.com/video/clip/_/id/17989860',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://espn.go.com/video/iframe/twitter/?cms=espn&id=10365079',
'only_matching': True,
}, {
'url': 'http://www.espnfc.us/video/espn-fc-tv/86/video/3319154/nashville-unveiled-as-the-newest-club-in-mls',
'only_matching': True,
}, {
'url': 'http://www.espnfc.com/english-premier-league/23/video/3324163/premier-league-in-90-seconds-golden-tweets',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -93,7 +110,9 @@ class ESPNIE(InfoExtractor):
def traverse_source(source, base_source_id=None): def traverse_source(source, base_source_id=None):
for source_id, source in source.items(): for source_id, source in source.items():
if isinstance(source, compat_str): if source_id == 'alert':
continue
elif isinstance(source, compat_str):
extract_source(source, base_source_id) extract_source(source, base_source_id)
elif isinstance(source, dict): elif isinstance(source, dict):
traverse_source( traverse_source(
@ -106,7 +125,9 @@ class ESPNIE(InfoExtractor):
return return
format_urls.add(source_url) format_urls.add(source_url)
ext = determine_ext(source_url) ext = determine_ext(source_url)
if ext == 'smil': if OnceIE.suitable(source_url):
formats.extend(self._extract_once_formats(source_url))
elif ext == 'smil':
formats.extend(self._extract_smil_formats( formats.extend(self._extract_smil_formats(
source_url, video_id, fatal=False)) source_url, video_id, fatal=False))
elif ext == 'f4m': elif ext == 'f4m':
@ -117,12 +138,24 @@ class ESPNIE(InfoExtractor):
source_url, video_id, 'mp4', entry_protocol='m3u8_native', source_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=source_id, fatal=False)) m3u8_id=source_id, fatal=False))
else: else:
formats.append({ f = {
'url': source_url, 'url': source_url,
'format_id': source_id, 'format_id': source_id,
}) }
mobj = re.search(r'(\d+)p(\d+)_(\d+)k\.', source_url)
if mobj:
f.update({
'height': int(mobj.group(1)),
'fps': int(mobj.group(2)),
'tbr': int(mobj.group(3)),
})
if source_id == 'mezzanine':
f['preference'] = 1
formats.append(f)
traverse_source(clip['links']['source']) links = clip.get('links', {})
traverse_source(links.get('source', {}))
traverse_source(links.get('mobile', {}))
self._sort_formats(formats) self._sort_formats(formats)
description = clip.get('caption') or clip.get('description') description = clip.get('caption') or clip.get('description')
@ -144,9 +177,6 @@ class ESPNIE(InfoExtractor):
class ESPNArticleIE(InfoExtractor): class ESPNArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/(?:[^/]+/)*(?P<id>[^/]+)' _VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://espn.go.com/video/iframe/twitter/?cms=espn&id=10365079',
'only_matching': True,
}, {
'url': 'http://espn.go.com/nba/recap?gameId=400793786', 'url': 'http://espn.go.com/nba/recap?gameId=400793786',
'only_matching': True, 'only_matching': True,
}, { }, {
@ -175,3 +205,34 @@ class ESPNArticleIE(InfoExtractor):
return self.url_result( return self.url_result(
'http://espn.go.com/video/clip?id=%s' % video_id, ESPNIE.ie_key()) 'http://espn.go.com/video/clip?id=%s' % video_id, ESPNIE.ie_key())
class FiveThirtyEightIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fivethirtyeight\.com/features/(?P<id>[^/?#]+)'
_TEST = {
'url': 'http://fivethirtyeight.com/features/how-the-6-8-raiders-can-still-make-the-playoffs/',
'info_dict': {
'id': '21846851',
'ext': 'mp4',
'title': 'FiveThirtyEight: The Raiders can still make the playoffs',
'description': 'Neil Paine breaks down the simplest scenario that will put the Raiders into the playoffs at 8-8.',
'timestamp': 1513960621,
'upload_date': '20171222',
},
'params': {
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'data-video-id=["\'](?P<id>\d+)',
webpage, 'video id', group='id')
return self.url_result(
'http://espn.go.com/video/clip?id=%s' % video_id, ESPNIE.ie_key())

View File

@ -205,7 +205,6 @@ from .cnn import (
CNNArticleIE, CNNArticleIE,
) )
from .coub import CoubIE from .coub import CoubIE
from .collegerama import CollegeRamaIE
from .comedycentral import ( from .comedycentral import (
ComedyCentralFullEpisodesIE, ComedyCentralFullEpisodesIE,
ComedyCentralIE, ComedyCentralIE,
@ -322,6 +321,7 @@ from .escapist import EscapistIE
from .espn import ( from .espn import (
ESPNIE, ESPNIE,
ESPNArticleIE, ESPNArticleIE,
FiveThirtyEightIE,
) )
from .esri import EsriVideoIE from .esri import EsriVideoIE
from .etonline import ETOnlineIE from .etonline import ETOnlineIE
@ -344,6 +344,7 @@ from .filmon import (
FilmOnIE, FilmOnIE,
FilmOnChannelIE, FilmOnChannelIE,
) )
from .filmweb import FilmwebIE
from .firsttv import FirstTVIE from .firsttv import FirstTVIE
from .fivemin import FiveMinIE from .fivemin import FiveMinIE
from .fivetv import FiveTVIE from .fivetv import FiveTVIE
@ -464,6 +465,7 @@ from .indavideo import (
) )
from .infoq import InfoQIE from .infoq import InfoQIE
from .instagram import InstagramIE, InstagramUserIE from .instagram import InstagramIE, InstagramUserIE
from .internazionale import InternazionaleIE
from .internetvideoarchive import InternetVideoArchiveIE from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE from .iprima import IPrimaIE
from .iqiyi import IqiyiIE from .iqiyi import IqiyiIE
@ -573,6 +575,7 @@ from .massengeschmacktv import MassengeschmackTVIE
from .matchtv import MatchTVIE from .matchtv import MatchTVIE
from .mdr import MDRIE from .mdr import MDRIE
from .mediaset import MediasetIE from .mediaset import MediasetIE
from .mediasite import MediasiteIE
from .medici import MediciIE from .medici import MediciIE
from .megaphone import MegaphoneIE from .megaphone import MegaphoneIE
from .meipai import MeipaiIE from .meipai import MeipaiIE
@ -606,7 +609,10 @@ from .mofosex import MofosexIE
from .mojvideo import MojvideoIE from .mojvideo import MojvideoIE
from .moniker import MonikerIE from .moniker import MonikerIE
from .morningstar import MorningstarIE from .morningstar import MorningstarIE
from .motherless import MotherlessIE from .motherless import (
MotherlessIE,
MotherlessGroupIE
)
from .motorsport import MotorsportIE from .motorsport import MotorsportIE
from .movieclips import MovieClipsIE from .movieclips import MovieClipsIE
from .moviezine import MoviezineIE from .moviezine import MoviezineIE
@ -909,7 +915,6 @@ from .rutube import (
from .rutv import RUTVIE from .rutv import RUTVIE
from .ruutu import RuutuIE from .ruutu import RuutuIE
from .ruv import RuvIE from .ruv import RuvIE
from .sandia import SandiaIE
from .safari import ( from .safari import (
SafariIE, SafariIE,
SafariApiIE, SafariApiIE,
@ -926,8 +931,12 @@ from .senateisvp import SenateISVPIE
from .sendtonews import SendtoNewsIE from .sendtonews import SendtoNewsIE
from .servingsys import ServingSysIE from .servingsys import ServingSysIE
from .servus import ServusIE from .servus import ServusIE
from .sevenplus import SevenPlusIE
from .sexu import SexuIE from .sexu import SexuIE
from .shahid import ShahidIE from .shahid import (
ShahidIE,
ShahidShowIE,
)
from .shared import ( from .shared import (
SharedIE, SharedIE,
VivoIE, VivoIE,
@ -1059,6 +1068,7 @@ from .tnaflix import (
from .toggle import ToggleIE from .toggle import ToggleIE
from .tonline import TOnlineIE from .tonline import TOnlineIE
from .toongoggles import ToonGogglesIE from .toongoggles import ToonGogglesIE
from .totalwebcasting import TotalWebCastingIE
from .toutv import TouTvIE from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE from .traileraddict import TrailerAddictIE
@ -1115,6 +1125,7 @@ from .tvplayer import TVPlayerIE
from .tweakers import TweakersIE from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE
from .twitch import ( from .twitch import (
TwitchVideoIE, TwitchVideoIE,
TwitchChapterIE, TwitchChapterIE,
@ -1137,8 +1148,10 @@ from .udemy import (
UdemyCourseIE UdemyCourseIE
) )
from .udn import UDNEmbedIE from .udn import UDNEmbedIE
from .ufctv import UFCTVIE
from .uktvplay import UKTVPlayIE from .uktvplay import UKTVPlayIE
from .digiteka import DigitekaIE from .digiteka import DigitekaIE
from .umg import UMGDeIE
from .unistra import UnistraIE from .unistra import UnistraIE
from .unity import UnityIE from .unity import UnityIE
from .uol import UOLIE from .uol import UOLIE

View File

@ -0,0 +1,42 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class FilmwebIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?filmweb\.no/(?P<type>trailere|filmnytt)/article(?P<id>\d+)\.ece'
_TEST = {
'url': 'http://www.filmweb.no/trailere/article1264921.ece',
'md5': 'e353f47df98e557d67edaceda9dece89',
'info_dict': {
'id': '13033574',
'ext': 'mp4',
'title': 'Det som en gang var',
'upload_date': '20160316',
'timestamp': 1458140101,
'uploader_id': '12639966',
'uploader': 'Live Roaldset',
}
}
def _real_extract(self, url):
article_type, article_id = re.match(self._VALID_URL, url).groups()
if article_type == 'filmnytt':
webpage = self._download_webpage(url, article_id)
article_id = self._search_regex(r'data-videoid="(\d+)"', webpage, 'article id')
embed_code = self._download_json(
'https://www.filmweb.no/template_v2/ajax/json_trailerEmbed.jsp',
article_id, query={
'articleId': article_id,
})['embedCode']
iframe_url = self._proto_relative_url(self._search_regex(
r'<iframe[^>]+src="([^"]+)', embed_code, 'iframe url'))
return {
'_type': 'url_transparent',
'id': article_id,
'url': iframe_url,
'ie_key': 'TwentyThreeVideo',
}

View File

@ -100,6 +100,7 @@ from .megaphone import MegaphoneIE
from .vzaar import VzaarIE from .vzaar import VzaarIE
from .channel9 import Channel9IE from .channel9 import Channel9IE
from .vshare import VShareIE from .vshare import VShareIE
from .mediasite import MediasiteIE
class GenericIE(InfoExtractor): class GenericIE(InfoExtractor):
@ -1925,6 +1926,18 @@ class GenericIE(InfoExtractor):
'title': 'vl14062007715967', 'title': 'vl14062007715967',
'ext': 'mp4', 'ext': 'mp4',
} }
},
{
'url': 'http://www.heidelberg-laureate-forum.org/blog/video/lecture-friday-september-23-2016-sir-c-antony-r-hoare/',
'md5': 'aecd089f55b1cb5a59032cb049d3a356',
'info_dict': {
'id': '90227f51a80c4d8f86c345a7fa62bd9a1d',
'ext': 'mp4',
'title': 'Lecture: Friday, September 23, 2016 - Sir Tony Hoare',
'description': 'md5:5a51db84a62def7b7054df2ade403c6c',
'timestamp': 1474354800,
'upload_date': '20160920',
}
} }
# { # {
# # TODO: find another test # # TODO: find another test
@ -2883,6 +2896,16 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches( return self.playlist_from_matches(
vshare_urls, video_id, video_title, ie=VShareIE.ie_key()) vshare_urls, video_id, video_title, ie=VShareIE.ie_key())
# Look for Mediasite embeds
mediasite_urls = MediasiteIE._extract_urls(webpage)
if mediasite_urls:
entries = [
self.url_result(smuggle_url(
compat_urlparse.urljoin(url, mediasite_url),
{'UrlReferrer': url}), ie=MediasiteIE.ie_key())
for mediasite_url in mediasite_urls]
return self.playlist_result(entries, video_id, video_title)
def merge_dicts(dict1, dict2): def merge_dicts(dict1, dict2):
merged = {} merged = {}
for k, v in dict1.items(): for k, v in dict1.items():

View File

@ -0,0 +1,64 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_timestamp
class InternazionaleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?internazionale\.it/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.internazionale.it/video/2015/02/19/richard-linklater-racconta-una-scena-di-boyhood',
'md5': '3e39d32b66882c1218e305acbf8348ca',
'info_dict': {
'id': '265968',
'display_id': 'richard-linklater-racconta-una-scena-di-boyhood',
'ext': 'mp4',
'title': 'Richard Linklater racconta una scena di Boyhood',
'description': 'md5:efb7e5bbfb1a54ae2ed5a4a015f0e665',
'timestamp': 1424354635,
'upload_date': '20150219',
'thumbnail': r're:^https?://.*\.jpg$',
},
'params': {
'format': 'bestvideo',
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
DATA_RE = r'data-%s=(["\'])(?P<value>(?:(?!\1).)+)\1'
title = self._search_regex(
DATA_RE % 'video-title', webpage, 'title', default=None,
group='value') or self._og_search_title(webpage)
video_id = self._search_regex(
DATA_RE % 'job-id', webpage, 'video id', group='value')
video_path = self._search_regex(
DATA_RE % 'video-path', webpage, 'video path', group='value')
video_base = 'https://video.internazionale.it/%s/%s.' % (video_path, video_id)
formats = self._extract_m3u8_formats(
video_base + 'm3u8', display_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
formats.extend(self._extract_mpd_formats(
video_base + 'mpd', display_id, mpd_id='dash', fatal=False))
self._sort_formats(formats)
timestamp = unified_timestamp(self._html_search_meta(
'article:published_time', webpage, 'timestamp'))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage),
'timestamp': timestamp,
'formats': formats,
}

View File

@ -125,9 +125,12 @@ class KalturaIE(InfoExtractor):
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)* (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
(?P=q1).*? (?P=q1).*?
(?: (?:
entry_?[Ii]d| (?:
(?P<q2>["'])entry_?[Ii]d(?P=q2) entry_?[Ii]d|
)\s*:\s* (?P<q2>["'])entry_?[Ii]d(?P=q2)
)\s*:\s*|
\[\s*(?P<q2_1>["'])entry_?[Ii]d(?P=q2_1)\s*\]\s*=\s*
)
(?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3) (?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
''', webpage) or ''', webpage) or
re.search( re.search(

View File

@ -94,7 +94,15 @@ class LyndaBaseIE(InfoExtractor):
class LyndaIE(LyndaBaseIE): class LyndaIE(LyndaBaseIE):
IE_NAME = 'lynda' IE_NAME = 'lynda'
IE_DESC = 'lynda.com videos' IE_DESC = 'lynda.com videos'
_VALID_URL = r'https?://(?:www\.)?(?:lynda\.com|educourse\.ga)/(?:[^/]+/[^/]+/(?P<course_id>\d+)|player/embed)/(?P<id>\d+)' _VALID_URL = r'''(?x)
https?://
(?:www\.)?(?:lynda\.com|educourse\.ga)/
(?:
(?:[^/]+/){2,3}(?P<course_id>\d+)|
player/embed
)/
(?P<id>\d+)
'''
_TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]' _TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'
@ -113,6 +121,9 @@ class LyndaIE(LyndaBaseIE):
}, { }, {
'url': 'https://educourse.ga/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html', 'url': 'https://educourse.ga/Bootstrap-tutorials/Using-exercise-files/110885/114408-4.html',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Willkommen-Grundlagen-guten-Gestaltung/393570/393572-4.html',
'only_matching': True,
}] }]
def _raise_unavailable(self, video_id): def _raise_unavailable(self, video_id):
@ -244,8 +255,9 @@ class LyndaIE(LyndaBaseIE):
def _get_subtitles(self, video_id): def _get_subtitles(self, video_id):
url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
subs = self._download_json(url, None, False) subs = self._download_json(url, None, False)
if subs: fixed_subs = self._fix_subtitles(subs)
return {'en': [{'ext': 'srt', 'data': self._fix_subtitles(subs)}]} if fixed_subs:
return {'en': [{'ext': 'srt', 'data': fixed_subs}]}
else: else:
return {} return {}
@ -256,7 +268,15 @@ class LyndaCourseIE(LyndaBaseIE):
# Course link equals to welcome/introduction video link of same course # Course link equals to welcome/introduction video link of same course
# We will recognize it as course link # We will recognize it as course link
_VALID_URL = r'https?://(?:www|m)\.(?:lynda\.com|educourse\.ga)/(?P<coursepath>[^/]+/[^/]+/(?P<courseid>\d+))-\d\.html' _VALID_URL = r'https?://(?:www|m)\.(?:lynda\.com|educourse\.ga)/(?P<coursepath>(?:[^/]+/){2,3}(?P<courseid>\d+))-2\.html'
_TESTS = [{
'url': 'https://www.lynda.com/Graphic-Design-tutorials/Grundlagen-guten-Gestaltung/393570-2.html',
'only_matching': True,
}, {
'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Grundlagen-guten-Gestaltung/393570-2.html',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)

View File

@ -13,8 +13,15 @@ from ..utils import (
class MailRuIE(InfoExtractor): class MailRuIE(InfoExtractor):
IE_NAME = 'mailru' IE_NAME = 'mailru'
IE_DESC = 'Видео@Mail.Ru' IE_DESC = 'Видео@Mail.Ru'
_VALID_URL = r'https?://(?:(?:www|m)\.)?my\.mail\.ru/(?:video/.*#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|(?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html)' _VALID_URL = r'''(?x)
https?://
(?:(?:www|m)\.)?my\.mail\.ru/
(?:
video/.*\#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|
(?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
(?:video/embed|\+/video/meta)/(?P<metaid>\d+)
)
'''
_TESTS = [ _TESTS = [
{ {
'url': 'http://my.mail.ru/video/top#video=/mail/sonypicturesrus/75/76', 'url': 'http://my.mail.ru/video/top#video=/mail/sonypicturesrus/75/76',
@ -23,7 +30,7 @@ class MailRuIE(InfoExtractor):
'id': '46301138_76', 'id': '46301138_76',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Новый Человек-Паук. Высокое напряжение. Восстание Электро', 'title': 'Новый Человек-Паук. Высокое напряжение. Восстание Электро',
'timestamp': 1393232740, 'timestamp': 1393235077,
'upload_date': '20140224', 'upload_date': '20140224',
'uploader': 'sonypicturesrus', 'uploader': 'sonypicturesrus',
'uploader_id': 'sonypicturesrus@mail.ru', 'uploader_id': 'sonypicturesrus@mail.ru',
@ -40,7 +47,7 @@ class MailRuIE(InfoExtractor):
'title': 'Samsung Galaxy S5 Hammer Smash Fail Battery Explosion', 'title': 'Samsung Galaxy S5 Hammer Smash Fail Battery Explosion',
'timestamp': 1397039888, 'timestamp': 1397039888,
'upload_date': '20140409', 'upload_date': '20140409',
'uploader': 'hitech@corp.mail.ru', 'uploader': 'hitech',
'uploader_id': 'hitech@corp.mail.ru', 'uploader_id': 'hitech@corp.mail.ru',
'duration': 245, 'duration': 245,
}, },
@ -65,28 +72,42 @@ class MailRuIE(InfoExtractor):
{ {
'url': 'http://m.my.mail.ru/mail/3sktvtr/video/_myvideo/138.html', 'url': 'http://m.my.mail.ru/mail/3sktvtr/video/_myvideo/138.html',
'only_matching': True, 'only_matching': True,
},
{
'url': 'https://my.mail.ru/video/embed/7949340477499637815',
'only_matching': True,
},
{
'url': 'http://my.mail.ru/+/video/meta/7949340477499637815',
'only_matching': True,
} }
] ]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('idv1') meta_id = mobj.group('metaid')
if not video_id: video_id = None
video_id = mobj.group('idv2prefix') + mobj.group('idv2suffix') if meta_id:
meta_url = 'https://my.mail.ru/+/video/meta/%s' % meta_id
webpage = self._download_webpage(url, video_id) else:
video_id = mobj.group('idv1')
if not video_id:
video_id = mobj.group('idv2prefix') + mobj.group('idv2suffix')
webpage = self._download_webpage(url, video_id)
page_config = self._parse_json(self._search_regex(
r'(?s)<script[^>]+class="sp-video__page-config"[^>]*>(.+?)</script>',
webpage, 'page config', default='{}'), video_id, fatal=False)
if page_config:
meta_url = page_config.get('metaUrl') or page_config.get('video', {}).get('metaUrl')
else:
meta_url = None
video_data = None video_data = None
if meta_url:
page_config = self._parse_json(self._search_regex( video_data = self._download_json(
r'(?s)<script[^>]+class="sp-video__page-config"[^>]*>(.+?)</script>', meta_url, video_id or meta_id, 'Downloading video meta JSON',
webpage, 'page config', default='{}'), video_id, fatal=False) fatal=not video_id)
if page_config:
meta_url = page_config.get('metaUrl') or page_config.get('video', {}).get('metaUrl')
if meta_url:
video_data = self._download_json(
meta_url, video_id, 'Downloading video meta JSON', fatal=False)
# Fallback old approach # Fallback old approach
if not video_data: if not video_data:

View File

@ -0,0 +1,214 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
ExtractorError,
float_or_none,
mimetype2ext,
unescapeHTML,
unsmuggle_url,
urljoin,
)
class MediasiteIE(InfoExtractor):
_VALID_URL = r'(?xi)https?://[^/]+/Mediasite/Play/(?P<id>[0-9a-f]{32,34})(?P<query>\?[^#]+|)'
_TESTS = [
{
'url': 'https://hitsmediaweb.h-its.org/mediasite/Play/2db6c271681e4f199af3c60d1f82869b1d',
'info_dict': {
'id': '2db6c271681e4f199af3c60d1f82869b1d',
'ext': 'mp4',
'title': 'Lecture: Tuesday, September 20, 2016 - Sir Andrew Wiles',
'description': 'Sir Andrew Wiles: “Equations in arithmetic”\\n\\nI will describe some of the interactions between modern number theory and the problem of solving equations in rational numbers or integers\\u0027.',
'timestamp': 1474268400.0,
'upload_date': '20160919',
},
},
{
'url': 'http://mediasite.uib.no/Mediasite/Play/90bb363295d945d6b548c867d01181361d?catalog=a452b7df-9ae1-46b7-a3ba-aceeb285f3eb',
'info_dict': {
'id': '90bb363295d945d6b548c867d01181361d',
'ext': 'mp4',
'upload_date': '20150429',
'title': '5) IT-forum 2015-Dag 1 - Dungbeetle - How and why Rain created a tiny bug tracker for Unity',
'timestamp': 1430311380.0,
},
},
{
'url': 'https://collegerama.tudelft.nl/Mediasite/Play/585a43626e544bdd97aeb71a0ec907a01d',
'md5': '481fda1c11f67588c0d9d8fbdced4e39',
'info_dict': {
'id': '585a43626e544bdd97aeb71a0ec907a01d',
'ext': 'mp4',
'title': 'Een nieuwe wereld: waarden, bewustzijn en techniek van de mensheid 2.0.',
'description': '',
'thumbnail': r're:^https?://.*\.jpg(?:\?.*)?$',
'duration': 7713.088,
'timestamp': 1413309600,
'upload_date': '20141014',
},
},
{
'url': 'https://collegerama.tudelft.nl/Mediasite/Play/86a9ea9f53e149079fbdb4202b521ed21d?catalog=fd32fd35-6c99-466c-89d4-cd3c431bc8a4',
'md5': 'ef1fdded95bdf19b12c5999949419c92',
'info_dict': {
'id': '86a9ea9f53e149079fbdb4202b521ed21d',
'ext': 'wmv',
'title': '64ste Vakantiecursus: Afvalwater',
'description': 'md5:7fd774865cc69d972f542b157c328305',
'thumbnail': r're:^https?://.*\.jpg(?:\?.*?)?$',
'duration': 10853,
'timestamp': 1326446400,
'upload_date': '20120113',
},
},
{
'url': 'http://digitalops.sandia.gov/Mediasite/Play/24aace4429fc450fb5b38cdbf424a66e1d',
'md5': '9422edc9b9a60151727e4b6d8bef393d',
'info_dict': {
'id': '24aace4429fc450fb5b38cdbf424a66e1d',
'ext': 'mp4',
'title': 'Xyce Software Training - Section 1',
'description': r're:(?s)SAND Number: SAND 2013-7800.{200,}',
'upload_date': '20120409',
'timestamp': 1333983600,
'duration': 7794,
}
}
]
# look in Mediasite.Core.js (Mediasite.ContentStreamType[*])
_STREAM_TYPES = {
0: 'video1', # the main video
2: 'slide',
3: 'presentation',
4: 'video2', # screencast?
5: 'video3',
}
@staticmethod
def _extract_urls(webpage):
return [
unescapeHTML(mobj.group('url'))
for mobj in re.finditer(
r'(?xi)<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:(?:https?:)?//[^/]+)?/Mediasite/Play/[0-9a-f]{32,34}(?:\?.*?)?)\1',
webpage)]
def _real_extract(self, url):
url, data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url)
resource_id = mobj.group('id')
query = mobj.group('query')
webpage, urlh = self._download_webpage_handle(url, resource_id) # XXX: add UrlReferrer?
redirect_url = compat_str(urlh.geturl())
# XXX: might have also extracted UrlReferrer and QueryString from the html
service_path = compat_urlparse.urljoin(redirect_url, self._html_search_regex(
r'<div[^>]+\bid=["\']ServicePath[^>]+>(.+?)</div>', webpage, resource_id,
default='/Mediasite/PlayerService/PlayerService.svc/json'))
player_options = self._download_json(
'%s/GetPlayerOptions' % service_path, resource_id,
headers={
'Content-type': 'application/json; charset=utf-8',
'X-Requested-With': 'XMLHttpRequest',
},
data=json.dumps({
'getPlayerOptionsRequest': {
'ResourceId': resource_id,
'QueryString': query,
'UrlReferrer': data.get('UrlReferrer', ''),
'UseScreenReader': False,
}
}).encode('utf-8'))['d']
presentation = player_options['Presentation']
title = presentation['Title']
if presentation is None:
raise ExtractorError(
'Mediasite says: %s' % player_options['PlayerPresentationStatusMessage'],
expected=True)
thumbnails = []
formats = []
for snum, Stream in enumerate(presentation['Streams']):
stream_type = Stream.get('StreamType')
if stream_type is None:
continue
video_urls = Stream.get('VideoUrls')
if not isinstance(video_urls, list):
video_urls = []
stream_id = self._STREAM_TYPES.get(
stream_type, 'type%u' % stream_type)
stream_formats = []
for unum, VideoUrl in enumerate(video_urls):
video_url = VideoUrl.get('Location')
if not video_url or not isinstance(video_url, compat_str):
continue
# XXX: if Stream.get('CanChangeScheme', False), switch scheme to HTTP/HTTPS
media_type = VideoUrl.get('MediaType')
if media_type == 'SS':
stream_formats.extend(self._extract_ism_formats(
video_url, resource_id,
ism_id='%s-%u.%u' % (stream_id, snum, unum),
fatal=False))
elif media_type == 'Dash':
stream_formats.extend(self._extract_mpd_formats(
video_url, resource_id,
mpd_id='%s-%u.%u' % (stream_id, snum, unum),
fatal=False))
else:
stream_formats.append({
'format_id': '%s-%u.%u' % (stream_id, snum, unum),
'url': video_url,
'ext': mimetype2ext(VideoUrl.get('MimeType')),
})
# TODO: if Stream['HasSlideContent']:
# synthesise an MJPEG video stream '%s-%u.slides' % (stream_type, snum)
# from Stream['Slides']
# this will require writing a custom downloader...
# disprefer 'secondary' streams
if stream_type != 0:
for fmt in stream_formats:
fmt['preference'] = -1
thumbnail_url = Stream.get('ThumbnailUrl')
if thumbnail_url:
thumbnails.append({
'id': '%s-%u' % (stream_id, snum),
'url': urljoin(redirect_url, thumbnail_url),
'preference': -1 if stream_type != 0 else 0,
})
formats.extend(stream_formats)
self._sort_formats(formats)
# XXX: Presentation['Presenters']
# XXX: Presentation['Transcript']
return {
'id': resource_id,
'title': title,
'description': presentation.get('Description'),
'duration': float_or_none(presentation.get('Duration'), 1000),
'timestamp': float_or_none(presentation.get('UnixTime'), 1000),
'formats': formats,
'thumbnails': thumbnails,
}

View File

@ -1,13 +1,13 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import uuid import uuid
from .common import InfoExtractor from .common import InfoExtractor
from .ooyala import OoyalaIE from .ooyala import OoyalaIE
from ..compat import ( from ..compat import (
compat_str, compat_str,
compat_urllib_parse_urlencode,
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
@ -42,31 +42,33 @@ class MiTeleBaseIE(InfoExtractor):
duration = int_or_none(mmc.get('duration')) duration = int_or_none(mmc.get('duration'))
for location in mmc['locations']: for location in mmc['locations']:
gat = self._proto_relative_url(location.get('gat'), 'http:') gat = self._proto_relative_url(location.get('gat'), 'http:')
bas = location.get('bas') gcp = location.get('gcp')
loc = location.get('loc')
ogn = location.get('ogn') ogn = location.get('ogn')
if None in (gat, bas, loc, ogn): if None in (gat, gcp, ogn):
continue continue
token_data = { token_data = {
'bas': bas, 'gcp': gcp,
'icd': loc,
'ogn': ogn, 'ogn': ogn,
'sta': '0', 'sta': 0,
} }
media = self._download_json( media = self._download_json(
'%s/?%s' % (gat, compat_urllib_parse_urlencode(token_data)), gat, video_id, data=json.dumps(token_data).encode('utf-8'),
video_id, 'Downloading %s JSON' % location['loc']) headers={
file_ = media.get('file') 'Content-Type': 'application/json;charset=utf-8',
if not file_: 'Referer': url,
})
stream = media.get('stream') or media.get('file')
if not stream:
continue continue
ext = determine_ext(file_) ext = determine_ext(stream)
if ext == 'f4m': if ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18', stream + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
video_id, f4m_id='hds', fatal=False)) video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8': elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
file_, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)) stream, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
return { return {

View File

@ -4,8 +4,11 @@ import datetime
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
InAdvancePagedList,
orderedSet,
str_to_int, str_to_int,
unified_strdate, unified_strdate,
) )
@ -114,3 +117,86 @@ class MotherlessIE(InfoExtractor):
'age_limit': age_limit, 'age_limit': age_limit,
'url': video_url, 'url': video_url,
} }
class MotherlessGroupIE(InfoExtractor):
_VALID_URL = 'https?://(?:www\.)?motherless\.com/gv?/(?P<id>[a-z0-9_]+)'
_TESTS = [{
'url': 'http://motherless.com/g/movie_scenes',
'info_dict': {
'id': 'movie_scenes',
'title': 'Movie Scenes',
'description': 'Hot and sexy scenes from "regular" movies... '
'Beautiful actresses fully nude... A looot of '
'skin! :)Enjoy!',
},
'playlist_mincount': 662,
}, {
'url': 'http://motherless.com/gv/sex_must_be_funny',
'info_dict': {
'id': 'sex_must_be_funny',
'title': 'Sex must be funny',
'description': 'Sex can be funny. Wide smiles,laugh, games, fun of '
'any kind!'
},
'playlist_mincount': 9,
}]
@classmethod
def suitable(cls, url):
return (False if MotherlessIE.suitable(url)
else super(MotherlessGroupIE, cls).suitable(url))
def _extract_entries(self, webpage, base):
entries = []
for mobj in re.finditer(
r'href="(?P<href>/[^"]+)"[^>]*>(?:\s*<img[^>]+alt="[^-]+-\s(?P<title>[^"]+)")?',
webpage):
video_url = compat_urlparse.urljoin(base, mobj.group('href'))
if not MotherlessIE.suitable(video_url):
continue
video_id = MotherlessIE._match_id(video_url)
title = mobj.group('title')
entries.append(self.url_result(
video_url, ie=MotherlessIE.ie_key(), video_id=video_id,
video_title=title))
# Alternative fallback
if not entries:
entries = [
self.url_result(
compat_urlparse.urljoin(base, '/' + video_id),
ie=MotherlessIE.ie_key(), video_id=video_id)
for video_id in orderedSet(re.findall(
r'data-codename=["\']([A-Z0-9]+)', webpage))]
return entries
def _real_extract(self, url):
group_id = self._match_id(url)
page_url = compat_urlparse.urljoin(url, '/gv/%s' % group_id)
webpage = self._download_webpage(page_url, group_id)
title = self._search_regex(
r'<title>([\w\s]+\w)\s+-', webpage, 'title', fatal=False)
description = self._html_search_meta(
'description', webpage, fatal=False)
page_count = self._int(self._search_regex(
r'(\d+)</(?:a|span)><(?:a|span)[^>]+>\s*NEXT',
webpage, 'page_count'), 'page_count')
PAGE_SIZE = 80
def _get_page(idx):
webpage = self._download_webpage(
page_url, group_id, query={'page': idx + 1},
note='Downloading page %d/%d' % (idx + 1, page_count)
)
for entry in self._extract_entries(webpage, url):
yield entry
playlist = InAdvancePagedList(_get_page, page_count, PAGE_SIZE)
return {
'_type': 'playlist',
'id': group_id,
'title': title,
'description': description,
'entries': playlist
}

View File

@ -112,6 +112,8 @@ class PhantomJSwrapper(object):
return get_exe_version('phantomjs', version_re=r'([0-9.]+)') return get_exe_version('phantomjs', version_re=r'([0-9.]+)')
def __init__(self, extractor, required_version=None, timeout=10000): def __init__(self, extractor, required_version=None, timeout=10000):
self._TMP_FILES = {}
self.exe = check_executable('phantomjs', ['-v']) self.exe = check_executable('phantomjs', ['-v'])
if not self.exe: if not self.exe:
raise ExtractorError('PhantomJS executable not found in PATH, ' raise ExtractorError('PhantomJS executable not found in PATH, '
@ -130,7 +132,6 @@ class PhantomJSwrapper(object):
self.options = { self.options = {
'timeout': timeout, 'timeout': timeout,
} }
self._TMP_FILES = {}
for name in self._TMP_FILE_NAMES: for name in self._TMP_FILE_NAMES:
tmp = tempfile.NamedTemporaryFile(delete=False) tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close() tmp.close()
@ -140,7 +141,7 @@ class PhantomJSwrapper(object):
for name in self._TMP_FILE_NAMES: for name in self._TMP_FILE_NAMES:
try: try:
os.remove(self._TMP_FILES[name].name) os.remove(self._TMP_FILES[name].name)
except (IOError, OSError): except (IOError, OSError, KeyError):
pass pass
def _save_cookies(self, url): def _save_cookies(self, url):
@ -242,7 +243,7 @@ class PhantomJSwrapper(object):
class OpenloadIE(InfoExtractor): class OpenloadIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:openload\.(?:co|io|link)|oload\.tv)/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)' _VALID_URL = r'https?://(?:www\.)?(?:openload\.(?:co|io|link)|oload\.(?:tv|stream))/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://openload.co/f/kUEfGclsU9o', 'url': 'https://openload.co/f/kUEfGclsU9o',
@ -283,12 +284,20 @@ class OpenloadIE(InfoExtractor):
# for title and ext # for title and ext
'url': 'https://openload.co/embed/Sxz5sADo82g/', 'url': 'https://openload.co/embed/Sxz5sADo82g/',
'only_matching': True, 'only_matching': True,
}, {
# unavailable via https://openload.co/embed/e-Ixz9ZR5L0/ but available
# via https://openload.co/f/e-Ixz9ZR5L0/
'url': 'https://openload.co/f/e-Ixz9ZR5L0/',
'only_matching': True,
}, { }, {
'url': 'https://oload.tv/embed/KnG-kKZdcfY/', 'url': 'https://oload.tv/embed/KnG-kKZdcfY/',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://www.openload.link/f/KnG-kKZdcfY', 'url': 'http://www.openload.link/f/KnG-kKZdcfY',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://oload.stream/f/KnG-kKZdcfY',
'only_matching': True,
}] }]
_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36' _USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
@ -301,20 +310,34 @@ class OpenloadIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
url = 'https://openload.co/embed/%s/' % video_id url_pattern = 'https://openload.co/%%s/%s/' % video_id
headers = { headers = {
'User-Agent': self._USER_AGENT, 'User-Agent': self._USER_AGENT,
} }
webpage = self._download_webpage(url, video_id, headers=headers) for path in ('embed', 'f'):
page_url = url_pattern % path
if 'File not found' in webpage or 'deleted by the owner' in webpage: last = path == 'f'
raise ExtractorError('File not found', expected=True, video_id=video_id) webpage = self._download_webpage(
page_url, video_id, 'Downloading %s webpage' % path,
headers=headers, fatal=last)
if not webpage:
continue
if 'File not found' in webpage or 'deleted by the owner' in webpage:
if not last:
continue
raise ExtractorError('File not found', expected=True, video_id=video_id)
break
phantom = PhantomJSwrapper(self, required_version='2.0') phantom = PhantomJSwrapper(self, required_version='2.0')
webpage, _ = phantom.get(url, html=webpage, video_id=video_id, headers=headers) webpage, _ = phantom.get(page_url, html=webpage, video_id=video_id, headers=headers)
decoded_id = get_element_by_id('streamurl', webpage) decoded_id = (get_element_by_id('streamurl', webpage) or
get_element_by_id('streamuri', webpage) or
get_element_by_id('streamurj', webpage))
if not decoded_id:
raise ExtractorError('Can\'t find stream URL', video_id=video_id)
video_url = 'https://openload.co/stream/%s?mime=true' % decoded_id video_url = 'https://openload.co/stream/%s?mime=true' % decoded_id
@ -323,7 +346,7 @@ class OpenloadIE(InfoExtractor):
'title', default=None) or self._html_search_meta( 'title', default=None) or self._html_search_meta(
'description', webpage, 'title', fatal=True) 'description', webpage, 'title', fatal=True)
entries = self._parse_html5_media_entries(url, webpage, video_id) entries = self._parse_html5_media_entries(page_url, webpage, video_id)
entry = entries[0] if entries else {} entry = entries[0] if entries else {}
subtitles = entry.get('subtitles') subtitles = entry.get('subtitles')

View File

@ -24,7 +24,7 @@ class PlaytvakIE(InfoExtractor):
'id': 'A150730_150323_hodinovy-manzel_kuko', 'id': 'A150730_150323_hodinovy-manzel_kuko',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Vyžeňte vosy a sršně ze zahrady', 'title': 'Vyžeňte vosy a sršně ze zahrady',
'description': 'md5:f93d398691044d303bc4a3de62f3e976', 'description': 'md5:4436e61b7df227a093778efb7e373571',
'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$', 'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$',
'duration': 279, 'duration': 279,
'timestamp': 1438732860, 'timestamp': 1438732860,
@ -36,9 +36,19 @@ class PlaytvakIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'A150624_164934_planespotting_cat', 'id': 'A150624_164934_planespotting_cat',
'ext': 'flv', 'ext': 'flv',
'title': 're:^Přímý přenos iDNES.cz [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'title': 're:^Planespotting [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'Sledujte provoz na ranveji Letiště Václava Havla v Praze', 'description': 'Sledujte provoz na ranveji Letiště Václava Havla v Praze',
'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$', 'is_live': True,
},
'params': {
'skip_download': True, # requires rtmpdump
},
}, { # another live stream, this one without Misc.videoFLV
'url': 'https://slowtv.playtvak.cz/zive-sledujte-vlaky-v-primem-prenosu-dwi-/hlavni-nadrazi.aspx?c=A151218_145728_hlavni-nadrazi_plap',
'info_dict': {
'id': 'A151218_145728_hlavni-nadrazi_plap',
'ext': 'flv',
'title': 're:^Hlavní nádraží [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'is_live': True, 'is_live': True,
}, },
'params': { 'params': {
@ -95,7 +105,7 @@ class PlaytvakIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
info_url = self._html_search_regex( info_url = self._html_search_regex(
r'Misc\.videoFLV\(\s*{\s*data\s*:\s*"([^"]+)"', webpage, 'info url') r'Misc\.video(?:FLV)?\(\s*{\s*data\s*:\s*"([^"]+)"', webpage, 'info url')
parsed_url = compat_urlparse.urlparse(info_url) parsed_url = compat_urlparse.urlparse(info_url)
@ -160,7 +170,7 @@ class PlaytvakIE(InfoExtractor):
if is_live: if is_live:
title = self._live_title(title) title = self._live_title(title)
description = self._og_search_description(webpage, default=None) or self._html_search_meta( description = self._og_search_description(webpage, default=None) or self._html_search_meta(
'description', webpage, 'description') 'description', webpage, 'description', default=None)
timestamp = None timestamp = None
duration = None duration = None
if not is_live: if not is_live:

View File

@ -171,12 +171,12 @@ class PluralsightIE(PluralsightBaseIE):
for num, current in enumerate(subs): for num, current in enumerate(subs):
current = subs[num] current = subs[num]
start, text = ( start, text = (
float_or_none(dict_get(current, TIME_OFFSET_KEYS)), float_or_none(dict_get(current, TIME_OFFSET_KEYS, skip_false_values=False)),
dict_get(current, TEXT_KEYS)) dict_get(current, TEXT_KEYS))
if start is None or text is None: if start is None or text is None:
continue continue
end = duration if num == len(subs) - 1 else float_or_none( end = duration if num == len(subs) - 1 else float_or_none(
dict_get(subs[num + 1], TIME_OFFSET_KEYS)) dict_get(subs[num + 1], TIME_OFFSET_KEYS, skip_false_values=False))
if end is None: if end is None:
continue continue
srt += os.linesep.join( srt += os.linesep.join(

View File

@ -31,6 +31,9 @@ def _decrypt_url(png):
hash_index = data.index('#') hash_index = data.index('#')
alphabet_data = data[:hash_index] alphabet_data = data[:hash_index]
url_data = data[hash_index + 1:] url_data = data[hash_index + 1:]
if url_data[0] == 'H' and url_data[3] == '%':
# remove useless HQ%% at the start
url_data = url_data[4:]
alphabet = [] alphabet = []
e = 0 e = 0

View File

@ -1,65 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
int_or_none,
mimetype2ext,
)
class SandiaIE(InfoExtractor):
IE_DESC = 'Sandia National Laboratories'
_VALID_URL = r'https?://digitalops\.sandia\.gov/Mediasite/Play/(?P<id>[0-9a-f]+)'
_TEST = {
'url': 'http://digitalops.sandia.gov/Mediasite/Play/24aace4429fc450fb5b38cdbf424a66e1d',
'md5': '9422edc9b9a60151727e4b6d8bef393d',
'info_dict': {
'id': '24aace4429fc450fb5b38cdbf424a66e1d',
'ext': 'mp4',
'title': 'Xyce Software Training - Section 1',
'description': 're:(?s)SAND Number: SAND 2013-7800.{200,}',
'upload_date': '20120409',
'timestamp': 1333983600,
'duration': 7794,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
presentation_data = self._download_json(
'http://digitalops.sandia.gov/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions',
video_id, data=json.dumps({
'getPlayerOptionsRequest': {
'ResourceId': video_id,
'QueryString': '',
}
}), headers={
'Content-Type': 'application/json; charset=utf-8',
})['d']['Presentation']
title = presentation_data['Title']
formats = []
for stream in presentation_data.get('Streams', []):
for fd in stream.get('VideoUrls', []):
formats.append({
'format_id': fd['MediaType'],
'format_note': fd['MimeType'].partition('/')[2],
'ext': mimetype2ext(fd['MimeType']),
'url': fd['Location'],
'protocol': 'f4m' if fd['MimeType'] == 'video/x-mp4-fragmented' else None,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': presentation_data.get('Description'),
'formats': formats,
'timestamp': int_or_none(presentation_data.get('UnixTime'), 1000),
'duration': int_or_none(presentation_data.get('Duration'), 1000),
}

View File

@ -1,13 +1,11 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import datetime
import json import json
import hashlib import hashlib
import hmac
import re import re
from .common import InfoExtractor from .aws import AWSIE
from .anvato import AnvatoIE from .anvato import AnvatoIE
from ..utils import ( from ..utils import (
smuggle_url, smuggle_url,
@ -16,7 +14,7 @@ from ..utils import (
) )
class ScrippsNetworksWatchIE(InfoExtractor): class ScrippsNetworksWatchIE(AWSIE):
IE_NAME = 'scrippsnetworks:watch' IE_NAME = 'scrippsnetworks:watch'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
@ -64,44 +62,27 @@ class ScrippsNetworksWatchIE(InfoExtractor):
'travelchannel': 'trav', 'travelchannel': 'trav',
'geniuskitchen': 'genius', 'geniuskitchen': 'genius',
} }
_SNI_HOST = 'web.api.video.snidigital.com'
_AWS_REGION = 'us-east-1'
_AWS_IDENTITY_ID_JSON = json.dumps({
'IdentityId': '%s:7655847c-0ae7-4d9b-80d6-56c062927eb3' % _AWS_REGION
})
_AWS_USER_AGENT = 'aws-sdk-js/2.80.0 callback'
_AWS_API_KEY = 'E7wSQmq0qK6xPrF13WmzKiHo4BQ7tip4pQcSXVl1' _AWS_API_KEY = 'E7wSQmq0qK6xPrF13WmzKiHo4BQ7tip4pQcSXVl1'
_AWS_SERVICE = 'execute-api' _AWS_PROXY_HOST = 'web.api.video.snidigital.com'
_AWS_REQUEST = 'aws4_request'
_AWS_SIGNED_HEADERS = ';'.join([
'host', 'x-amz-date', 'x-amz-security-token', 'x-api-key'])
_AWS_CANONICAL_REQUEST_TEMPLATE = '''GET
%(uri)s
host:%(host)s _AWS_USER_AGENT = 'aws-sdk-js/2.80.0 callback'
x-amz-date:%(date)s
x-amz-security-token:%(token)s
x-api-key:%(key)s
%(signed_headers)s
%(payload_hash)s'''
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
site_id, video_id = mobj.group('site', 'id') site_id, video_id = mobj.group('site', 'id')
def aws_hash(s): aws_identity_id_json = json.dumps({
return hashlib.sha256(s.encode('utf-8')).hexdigest() 'IdentityId': '%s:7655847c-0ae7-4d9b-80d6-56c062927eb3' % self._AWS_REGION
}).encode('utf-8')
token = self._download_json( token = self._download_json(
'https://cognito-identity.us-east-1.amazonaws.com/', video_id, 'https://cognito-identity.%s.amazonaws.com/' % self._AWS_REGION, video_id,
data=self._AWS_IDENTITY_ID_JSON.encode('utf-8'), data=aws_identity_id_json,
headers={ headers={
'Accept': '*/*', 'Accept': '*/*',
'Content-Type': 'application/x-amz-json-1.1', 'Content-Type': 'application/x-amz-json-1.1',
'Referer': url, 'Referer': url,
'X-Amz-Content-Sha256': aws_hash(self._AWS_IDENTITY_ID_JSON), 'X-Amz-Content-Sha256': hashlib.sha256(aws_identity_id_json).hexdigest(),
'X-Amz-Target': 'AWSCognitoIdentityService.GetOpenIdToken', 'X-Amz-Target': 'AWSCognitoIdentityService.GetOpenIdToken',
'X-Amz-User-Agent': self._AWS_USER_AGENT, 'X-Amz-User-Agent': self._AWS_USER_AGENT,
})['Token'] })['Token']
@ -124,64 +105,12 @@ x-api-key:%(key)s
sts, './/{https://sts.amazonaws.com/doc/2011-06-15/}%s' % key, sts, './/{https://sts.amazonaws.com/doc/2011-06-15/}%s' % key,
fatal=True) fatal=True)
access_key_id = get('AccessKeyId') mcp_id = self._aws_execute_api({
secret_access_key = get('SecretAccessKey') 'uri': '/1/web/brands/%s/episodes/scrid/%s' % (self._SNI_TABLE[site_id], video_id),
session_token = get('SessionToken') 'access_key': get('AccessKeyId'),
'secret_key': get('SecretAccessKey'),
# Task 1: http://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html 'session_token': get('SessionToken'),
uri = '/1/web/brands/%s/episodes/scrid/%s' % (self._SNI_TABLE[site_id], video_id) }, video_id)['results'][0]['mcpId']
datetime_now = datetime.datetime.utcnow().strftime('%Y%m%dT%H%M%SZ')
date = datetime_now[:8]
canonical_string = self._AWS_CANONICAL_REQUEST_TEMPLATE % {
'uri': uri,
'host': self._SNI_HOST,
'date': datetime_now,
'token': session_token,
'key': self._AWS_API_KEY,
'signed_headers': self._AWS_SIGNED_HEADERS,
'payload_hash': aws_hash(''),
}
# Task 2: http://docs.aws.amazon.com/general/latest/gr/sigv4-create-string-to-sign.html
credential_string = '/'.join([date, self._AWS_REGION, self._AWS_SERVICE, self._AWS_REQUEST])
string_to_sign = '\n'.join([
'AWS4-HMAC-SHA256', datetime_now, credential_string,
aws_hash(canonical_string)])
# Task 3: http://docs.aws.amazon.com/general/latest/gr/sigv4-calculate-signature.html
def aws_hmac(key, msg):
return hmac.new(key, msg.encode('utf-8'), hashlib.sha256)
def aws_hmac_digest(key, msg):
return aws_hmac(key, msg).digest()
def aws_hmac_hexdigest(key, msg):
return aws_hmac(key, msg).hexdigest()
k_secret = 'AWS4' + secret_access_key
k_date = aws_hmac_digest(k_secret.encode('utf-8'), date)
k_region = aws_hmac_digest(k_date, self._AWS_REGION)
k_service = aws_hmac_digest(k_region, self._AWS_SERVICE)
k_signing = aws_hmac_digest(k_service, self._AWS_REQUEST)
signature = aws_hmac_hexdigest(k_signing, string_to_sign)
auth_header = ', '.join([
'AWS4-HMAC-SHA256 Credential=%s' % '/'.join(
[access_key_id, date, self._AWS_REGION, self._AWS_SERVICE, self._AWS_REQUEST]),
'SignedHeaders=%s' % self._AWS_SIGNED_HEADERS,
'Signature=%s' % signature,
])
mcp_id = self._download_json(
'https://%s%s' % (self._SNI_HOST, uri), video_id, headers={
'Accept': '*/*',
'Referer': url,
'Authorization': auth_header,
'X-Amz-Date': datetime_now,
'X-Amz-Security-Token': session_token,
'X-Api-Key': self._AWS_API_KEY,
})['results'][0]['mcpId']
return self.url_result( return self.url_result(
smuggle_url( smuggle_url(

View File

@ -0,0 +1,67 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .brightcove import BrightcoveNewIE
from ..utils import update_url_query
class SevenPlusIE(BrightcoveNewIE):
IE_NAME = '7plus'
_VALID_URL = r'https?://(?:www\.)?7plus\.com\.au/(?P<path>[^?]+\?.*?\bepisode-id=(?P<id>[^&#]+))'
_TESTS = [{
'url': 'https://7plus.com.au/BEAT?episode-id=BEAT-001',
'info_dict': {
'id': 'BEAT-001',
'ext': 'mp4',
'title': 'S1 E1 - Help / Lucy In The Sky With Diamonds',
'description': 'md5:37718bea20a8eedaca7f7361af566131',
'uploader_id': '5303576322001',
'upload_date': '20171031',
'timestamp': 1509440068,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
}
}, {
'url': 'https://7plus.com.au/UUUU?episode-id=AUMS43-001',
'only_matching': True,
}]
def _real_extract(self, url):
path, episode_id = re.match(self._VALID_URL, url).groups()
media = self._download_json(
'https://videoservice.swm.digital/playback', episode_id, query={
'appId': '7plus',
'deviceType': 'web',
'platformType': 'web',
'accountId': 5303576322001,
'referenceId': 'ref:' + episode_id,
'deliveryId': 'csai',
'videoType': 'vod',
})['media']
for source in media.get('sources', {}):
src = source.get('src')
if not src:
continue
source['src'] = update_url_query(src, {'rule': ''})
info = self._parse_brightcove_metadata(media, episode_id)
content = self._download_json(
'https://component-cdn.swm.digital/content/' + path,
episode_id, headers={
'market-id': 4,
}, fatal=False) or {}
for item in content.get('items', {}):
if item.get('componentData', {}).get('componentType') == 'infoPanel':
for src_key, dst_key in [('title', 'title'), ('shortSynopsis', 'description')]:
value = item.get(src_key)
if value:
info[dst_key] = value
return info

View File

@ -1,22 +1,53 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import json import json
import math
import re
from .common import InfoExtractor from .aws import AWSIE
from ..compat import compat_HTTPError from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
clean_html,
ExtractorError, ExtractorError,
InAdvancePagedList,
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
str_or_none, str_or_none,
urlencode_postdata, urlencode_postdata,
clean_html,
) )
class ShahidIE(InfoExtractor): class ShahidBaseIE(AWSIE):
_AWS_PROXY_HOST = 'api2.shahid.net'
_AWS_API_KEY = '2RRtuMHx95aNI1Kvtn2rChEuwsCogUd4samGPjLh'
def _handle_error(self, e):
fail_data = self._parse_json(
e.cause.read().decode('utf-8'), None, fatal=False)
if fail_data:
faults = fail_data.get('faults', [])
faults_message = ', '.join([clean_html(fault['userMessage']) for fault in faults if fault.get('userMessage')])
if faults_message:
raise ExtractorError(faults_message, expected=True)
def _call_api(self, path, video_id, request=None):
query = {}
if request:
query['request'] = json.dumps(request)
try:
return self._aws_execute_api({
'uri': '/proxy/v2/' + path,
'access_key': 'AKIAI6X4TYCIXM2B7MUQ',
'secret_key': '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',
}, video_id, query)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
self._handle_error(e)
raise
class ShahidIE(ShahidBaseIE):
_NETRC_MACHINE = 'shahid' _NETRC_MACHINE = 'shahid'
_VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)' _VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
@ -41,34 +72,25 @@ class ShahidIE(InfoExtractor):
'only_matching': True 'only_matching': True
}] }]
def _api2_request(self, *args, **kwargs):
try:
return self._download_json(*args, **kwargs)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
fail_data = self._parse_json(
e.cause.read().decode('utf-8'), None, fatal=False)
if fail_data:
faults = fail_data.get('faults', [])
faults_message = ', '.join([clean_html(fault['userMessage']) for fault in faults if fault.get('userMessage')])
if faults_message:
raise ExtractorError(faults_message, expected=True)
raise
def _real_initialize(self): def _real_initialize(self):
email, password = self._get_login_info() email, password = self._get_login_info()
if email is None: if email is None:
return return
user_data = self._api2_request( try:
'https://shahid.mbc.net/wd/service/users/login', user_data = self._download_json(
None, 'Logging in', data=json.dumps({ 'https://shahid.mbc.net/wd/service/users/login',
'email': email, None, 'Logging in', data=json.dumps({
'password': password, 'email': email,
'basic': 'false', 'password': password,
}).encode('utf-8'), headers={ 'basic': 'false',
'Content-Type': 'application/json; charset=UTF-8', }).encode('utf-8'), headers={
})['user'] 'Content-Type': 'application/json; charset=UTF-8',
})['user']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
self._handle_error(e)
raise
self._download_webpage( self._download_webpage(
'https://shahid.mbc.net/populateContext', 'https://shahid.mbc.net/populateContext',
@ -81,25 +103,13 @@ class ShahidIE(InfoExtractor):
'sessionId': user_data['sessionId'], 'sessionId': user_data['sessionId'],
})) }))
def _get_api_data(self, response):
data = response.get('data', {})
error = data.get('error')
if error:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, '\n'.join(error.values())),
expected=True)
return data
def _real_extract(self, url): def _real_extract(self, url):
page_type, video_id = re.match(self._VALID_URL, url).groups() page_type, video_id = re.match(self._VALID_URL, url).groups()
if page_type == 'clip': if page_type == 'clip':
page_type = 'episode' page_type = 'episode'
playout = self._api2_request( playout = self._call_api(
'https://api2.shahid.net/proxy/v2/playout/url/' + video_id, 'playout/url/' + video_id, video_id)['playout']
video_id, 'Downloading player JSON')['playout']
if playout.get('drm'): if playout.get('drm'):
raise ExtractorError('This video is DRM protected.', expected=True) raise ExtractorError('This video is DRM protected.', expected=True)
@ -107,13 +117,27 @@ class ShahidIE(InfoExtractor):
formats = self._extract_m3u8_formats(playout['url'], video_id, 'mp4') formats = self._extract_m3u8_formats(playout['url'], video_id, 'mp4')
self._sort_formats(formats) self._sort_formats(formats)
video = self._get_api_data(self._download_json( # video = self._call_api(
# 'product/id', video_id, {
# 'id': video_id,
# 'productType': 'ASSET',
# 'productSubType': page_type.upper()
# })['productModel']
response = self._download_json(
'http://api.shahid.net/api/v1_1/%s/%s' % (page_type, video_id), 'http://api.shahid.net/api/v1_1/%s/%s' % (page_type, video_id),
video_id, 'Downloading video JSON', query={ video_id, 'Downloading video JSON', query={
'apiKey': 'sh@hid0nlin3', 'apiKey': 'sh@hid0nlin3',
'hash': 'b2wMCTHpSmyxGqQjJFOycRmLSex+BpTK/ooxy6vHaqs=', 'hash': 'b2wMCTHpSmyxGqQjJFOycRmLSex+BpTK/ooxy6vHaqs=',
}))[page_type] })
data = response.get('data', {})
error = data.get('error')
if error:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, '\n'.join(error.values())),
expected=True)
video = data[page_type]
title = video['title'] title = video['title']
categories = [ categories = [
category['name'] category['name']
@ -135,3 +159,57 @@ class ShahidIE(InfoExtractor):
'episode_id': video_id, 'episode_id': video_id,
'formats': formats, 'formats': formats,
} }
class ShahidShowIE(ShahidBaseIE):
_VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:show|serie)s/[^/]+/(?:show|series)-(?P<id>\d+)'
_TESTS = [{
'url': 'https://shahid.mbc.net/ar/shows/%D8%B1%D8%A7%D9%85%D8%B2-%D9%82%D8%B1%D8%B4-%D8%A7%D9%84%D8%A8%D8%AD%D8%B1/show-79187',
'info_dict': {
'id': '79187',
'title': 'رامز قرش البحر',
'description': 'md5:c88fa7e0f02b0abd39d417aee0d046ff',
},
'playlist_mincount': 32,
}, {
'url': 'https://shahid.mbc.net/ar/series/How-to-live-Longer-(The-Big-Think)/series-291861',
'only_matching': True
}]
_PAGE_SIZE = 30
def _real_extract(self, url):
show_id = self._match_id(url)
product = self._call_api(
'playableAsset', show_id, {'showId': show_id})['productModel']
playlist = product['playlist']
playlist_id = playlist['id']
show = product.get('show', {})
def page_func(page_num):
playlist = self._call_api(
'product/playlist', show_id, {
'playListId': playlist_id,
'pageNumber': page_num,
'pageSize': 30,
'sorts': [{
'order': 'DESC',
'type': 'SORTDATE'
}],
})
for product in playlist.get('productList', {}).get('products', []):
product_url = product.get('productUrl', []).get('url')
if not product_url:
continue
yield self.url_result(
product_url, 'Shahid',
str_or_none(product.get('id')),
product.get('title'))
entries = InAdvancePagedList(
page_func,
math.ceil(playlist['count'] / self._PAGE_SIZE),
self._PAGE_SIZE)
return self.playlist_result(
entries, show_id, show.get('title'), show.get('description'))

View File

@ -1,11 +1,13 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
class SlutloadIE(InfoExtractor): class SlutloadIE(InfoExtractor):
_VALID_URL = r'^https?://(?:\w+\.)?slutload\.com/video/[^/]+/(?P<id>[^/]+)/?$' _VALID_URL = r'^https?://(?:\w+\.)?slutload\.com/video/[^/]+/(?P<id>[^/]+)/?$'
_TEST = { _TESTS = [{
'url': 'http://www.slutload.com/video/virginie-baisee-en-cam/TD73btpBqSxc/', 'url': 'http://www.slutload.com/video/virginie-baisee-en-cam/TD73btpBqSxc/',
'md5': '868309628ba00fd488cf516a113fd717', 'md5': '868309628ba00fd488cf516a113fd717',
'info_dict': { 'info_dict': {
@ -15,11 +17,17 @@ class SlutloadIE(InfoExtractor):
'age_limit': 18, 'age_limit': 18,
'thumbnail': r're:https?://.*?\.jpg' 'thumbnail': r're:https?://.*?\.jpg'
} }
} }, {
# mobile site
'url': 'http://mobile.slutload.com/video/masturbation-solo/fviFLmc6kzJ/',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
desktop_url = re.sub(r'^(https?://)mobile\.', r'\1', url)
webpage = self._download_webpage(desktop_url, video_id)
video_title = self._html_search_regex(r'<h1><strong>([^<]+)</strong>', video_title = self._html_search_regex(r'<h1><strong>([^<]+)</strong>',
webpage, 'title').strip() webpage, 'title').strip()

View File

@ -136,6 +136,25 @@ class SoundcloudIE(InfoExtractor):
'license': 'all-rights-reserved', 'license': 'all-rights-reserved',
}, },
}, },
# no album art, use avatar pic for thumbnail
{
'url': 'https://soundcloud.com/garyvee/sideways-prod-mad-real',
'md5': '59c7872bc44e5d99b7211891664760c2',
'info_dict': {
'id': '309699954',
'ext': 'mp3',
'title': 'Sideways (Prod. Mad Real)',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'uploader': 'garyvee',
'upload_date': '20170226',
'duration': 207,
'thumbnail': r're:https?://.*\.jpg',
'license': 'all-rights-reserved',
},
'params': {
'skip_download': True,
},
},
] ]
_CLIENT_ID = 'c6CU49JDMapyrQo06UxU9xouB9ZVzqCn' _CLIENT_ID = 'c6CU49JDMapyrQo06UxU9xouB9ZVzqCn'
@ -160,7 +179,7 @@ class SoundcloudIE(InfoExtractor):
name = full_title or track_id name = full_title or track_id
if quiet: if quiet:
self.report_extraction(name) self.report_extraction(name)
thumbnail = info.get('artwork_url') thumbnail = info.get('artwork_url') or info.get('user', {}).get('avatar_url')
if isinstance(thumbnail, compat_str): if isinstance(thumbnail, compat_str):
thumbnail = thumbnail.replace('-large', '-t500x500') thumbnail = thumbnail.replace('-large', '-t500x500')
ext = 'mp3' ext = 'mp3'

View File

@ -0,0 +1,50 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class TotalWebCastingIE(InfoExtractor):
IE_NAME = 'totalwebcasting.com'
_VALID_URL = r'https?://www\.totalwebcasting\.com/view/\?func=VOFF.*'
_TEST = {
'url': 'https://www.totalwebcasting.com/view/?func=VOFF&id=columbia&date=2017-01-04&seq=1',
'info_dict': {
'id': '270e1c415d443924485f547403180906731570466a42740764673853041316737548',
'title': 'Real World Cryptography Conference 2017',
'description': 'md5:47a31e91ed537a2bb0d3a091659dc80c',
},
'playlist_count': 6,
}
def _real_extract(self, url):
params = url.split('?', 1)[1]
webpage = self._download_webpage(url, params)
aprm = self._search_regex(r"startVideo\('(\w+)'", webpage, 'aprm')
VLEV = self._download_json("https://www.totalwebcasting.com/view/?func=VLEV&aprm=%s&style=G" % aprm, aprm)
parts = []
for s in VLEV["aiTimes"].values():
n = int(s[:-5])
if n == 99:
continue
if n not in parts:
parts.append(n)
parts.sort()
title = VLEV["title"]
entries = []
for p in parts:
VLEV = self._download_json("https://www.totalwebcasting.com/view/?func=VLEV&aprm=%s&style=G&refP=1&nf=%d&time=1&cs=1&ns=1" % (aprm, p), aprm)
for s in VLEV["playerObj"]["clip"]["sources"]:
if s["type"] != "video/mp4":
continue
entries.append({
"id": "%s_part%d" % (aprm, p),
"url": "https:" + s["src"],
"title": title,
})
return {
'_type': 'multi_video',
'id': aprm,
'entries': entries,
'title': title,
'description': VLEV.get("desc"),
}

View File

@ -0,0 +1,77 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class TwentyThreeVideoIE(InfoExtractor):
IE_NAME = '23video'
_VALID_URL = r'https?://video\.(?P<domain>twentythree\.net|23video\.com|filmweb\.no)/v\.ihtml/player\.html\?(?P<query>.*?\bphoto(?:_|%5f)id=(?P<id>\d+).*)'
_TEST = {
'url': 'https://video.twentythree.net/v.ihtml/player.html?showDescriptions=0&source=site&photo%5fid=20448876&autoPlay=1',
'md5': '75fcf216303eb1dae9920d651f85ced4',
'info_dict': {
'id': '20448876',
'ext': 'mp4',
'title': 'Video Marketing Minute: Personalized Video',
'timestamp': 1513855354,
'upload_date': '20171221',
'uploader_id': '12258964',
'uploader': 'Rasmus Bysted',
}
}
def _real_extract(self, url):
domain, query, photo_id = re.match(self._VALID_URL, url).groups()
base_url = 'https://video.%s' % domain
photo_data = self._download_json(
base_url + '/api/photo/list?' + query, photo_id, query={
'format': 'json',
}, transform_source=lambda s: self._search_regex(r'(?s)({.+})', s, 'photo data'))['photo']
title = photo_data['title']
formats = []
audio_path = photo_data.get('audio_download')
if audio_path:
formats.append({
'format_id': 'audio',
'url': base_url + audio_path,
'filesize': int_or_none(photo_data.get('audio_size')),
'vcodec': 'none',
})
def add_common_info_to_list(l, template, id_field, id_value):
f_base = template % id_value
f_path = photo_data.get(f_base + 'download')
if not f_path:
return
l.append({
id_field: id_value,
'url': base_url + f_path,
'width': int_or_none(photo_data.get(f_base + 'width')),
'height': int_or_none(photo_data.get(f_base + 'height')),
'filesize': int_or_none(photo_data.get(f_base + 'size')),
})
for f in ('mobile_high', 'medium', 'hd', '1080p', '4k'):
add_common_info_to_list(formats, 'video_%s_', 'format_id', f)
thumbnails = []
for t in ('quad16', 'quad50', 'quad75', 'quad100', 'small', 'portrait', 'standard', 'medium', 'large', 'original'):
add_common_info_to_list(thumbnails, '%s_', 'id', t)
return {
'id': photo_id,
'title': title,
'timestamp': int_or_none(photo_data.get('creation_date_epoch')),
'duration': int_or_none(photo_data.get('video_length')),
'view_count': int_or_none(photo_data.get('view_count')),
'comment_count': int_or_none(photo_data.get('number_of_comments')),
'uploader_id': photo_data.get('user_id'),
'uploader': photo_data.get('display_name'),
'thumbnails': thumbnails,
'formats': formats,
}

View File

@ -358,9 +358,16 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
break break
offset += limit offset += limit
return self.playlist_result( return self.playlist_result(
[self.url_result(entry) for entry in orderedSet(entries)], [self._make_url_result(entry) for entry in orderedSet(entries)],
channel_id, channel_name) channel_id, channel_name)
def _make_url_result(self, url):
try:
video_id = 'v%s' % TwitchVodIE._match_id(url)
return self.url_result(url, TwitchVodIE.ie_key(), video_id=video_id)
except AssertionError:
return self.url_result(url)
def _extract_playlist_page(self, response): def _extract_playlist_page(self, response):
videos = response.get('videos') videos = response.get('videos')
return [video['url'] for video in videos] if videos else [] return [video['url'] for video in videos] if videos else []

View File

@ -0,0 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
parse_duration,
parse_iso8601,
)
class UFCTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ufc\.tv/video/(?P<id>[^/]+)'
_TEST = {
'url': 'https://www.ufc.tv/video/ufc-219-countdown-full-episode',
'info_dict': {
'id': '34167',
'ext': 'mp4',
'title': 'UFC 219 Countdown: Full Episode',
'description': 'md5:26d4e8bf4665ae5878842d7050c3c646',
'timestamp': 1513962360,
'upload_date': '20171222',
},
'params': {
# m3u8 download
'skip_download': True,
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
video_data = self._download_json(url, display_id, query={
'format': 'json',
})
video_id = str(video_data['id'])
title = video_data['name']
m3u8_url = self._download_json(
'https://www.ufc.tv/service/publishpoint', video_id, query={
'type': 'video',
'format': 'json',
'id': video_id,
}, headers={
'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0_1 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A402 Safari/604.1',
})['path']
m3u8_url = m3u8_url.replace('_iphone.', '.')
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'duration': parse_duration(video_data.get('runtime')),
'timestamp': parse_iso8601(video_data.get('releaseDate')),
'formats': formats,
}

103
youtube_dl/extractor/umg.py Normal file
View File

@ -0,0 +1,103 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_filesize,
parse_iso8601,
)
class UMGDeIE(InfoExtractor):
IE_NAME = 'umg:de'
IE_DESC = 'Universal Music Deutschland'
_VALID_URL = r'https?://(?:www\.)?universal-music\.de/[^/]+/videos/[^/?#]+-(?P<id>\d+)'
_TEST = {
'url': 'https://www.universal-music.de/sido/videos/jedes-wort-ist-gold-wert-457803',
'md5': 'ebd90f48c80dcc82f77251eb1902634f',
'info_dict': {
'id': '457803',
'ext': 'mp4',
'title': 'Jedes Wort ist Gold wert',
'timestamp': 1513591800,
'upload_date': '20171218',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'https://api.universal-music.de/graphql',
video_id, query={
'query': '''{
universalMusic(channel:16) {
video(id:%s) {
headline
formats {
formatId
url
type
width
height
mimeType
fileSize
}
duration
createdDate
}
}
}''' % video_id})['data']['universalMusic']['video']
title = video_data['headline']
hls_url_template = 'http://mediadelivery.universal-music-services.de/vod/mp4:autofill/storage/' + '/'.join(list(video_id)) + '/content/%s/file/playlist.m3u8'
thumbnails = []
formats = []
def add_m3u8_format(format_id):
m3u8_formats = self._extract_m3u8_formats(
hls_url_template % format_id, video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal='False')
if m3u8_formats and m3u8_formats[0].get('height'):
formats.extend(m3u8_formats)
for f in video_data.get('formats', []):
f_url = f.get('url')
mime_type = f.get('mimeType')
if not f_url or mime_type == 'application/mxf':
continue
fmt = {
'url': f_url,
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'filesize': parse_filesize(f.get('fileSize')),
}
f_type = f.get('type')
if f_type == 'Image':
thumbnails.append(fmt)
elif f_type == 'Video':
format_id = f.get('formatId')
if format_id:
fmt['format_id'] = format_id
if mime_type == 'video/mp4':
add_m3u8_format(format_id)
urlh = self._request_webpage(f_url, video_id, fatal=False)
if urlh:
first_byte = urlh.read(1)
if first_byte not in (b'F', b'\x00'):
continue
formats.append(fmt)
if not formats:
for format_id in (867, 836, 940):
add_m3u8_format(format_id)
self._sort_formats(formats, ('width', 'height', 'filesize', 'tbr'))
return {
'id': video_id,
'title': title,
'duration': int_or_none(video_data.get('duration')),
'timestamp': parse_iso8601(video_data.get('createdDate'), ' '),
'thumbnails': thumbnails,
'formats': formats,
}

View File

@ -468,11 +468,12 @@ class VimeoIE(VimeoBaseInfoExtractor):
request = sanitized_Request(url, headers=headers) request = sanitized_Request(url, headers=headers)
try: try:
webpage, urlh = self._download_webpage_handle(request, video_id) webpage, urlh = self._download_webpage_handle(request, video_id)
redirect_url = compat_str(urlh.geturl())
# Some URLs redirect to ondemand can't be extracted with # Some URLs redirect to ondemand can't be extracted with
# this extractor right away thus should be passed through # this extractor right away thus should be passed through
# ondemand extractor (e.g. https://vimeo.com/73445910) # ondemand extractor (e.g. https://vimeo.com/73445910)
if VimeoOndemandIE.suitable(urlh.geturl()): if VimeoOndemandIE.suitable(redirect_url):
return self.url_result(urlh.geturl(), VimeoOndemandIE.ie_key()) return self.url_result(redirect_url, VimeoOndemandIE.ie_key())
except ExtractorError as ee: except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403: if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
errmsg = ee.cause.read() errmsg = ee.cause.read()
@ -541,15 +542,15 @@ class VimeoIE(VimeoBaseInfoExtractor):
if re.search(r'<form[^>]+?id="pw_form"', webpage) is not None: if re.search(r'<form[^>]+?id="pw_form"', webpage) is not None:
if '_video_password_verified' in data: if '_video_password_verified' in data:
raise ExtractorError('video password verification failed!') raise ExtractorError('video password verification failed!')
self._verify_video_password(url, video_id, webpage) self._verify_video_password(redirect_url, video_id, webpage)
return self._real_extract( return self._real_extract(
smuggle_url(url, {'_video_password_verified': 'verified'})) smuggle_url(redirect_url, {'_video_password_verified': 'verified'}))
else: else:
raise ExtractorError('Unable to extract info section', raise ExtractorError('Unable to extract info section',
cause=e) cause=e)
else: else:
if config.get('view') == 4: if config.get('view') == 4:
config = self._verify_player_video_password(url, video_id) config = self._verify_player_video_password(redirect_url, video_id)
def is_rented(): def is_rented():
if '>You rented this title.<' in webpage: if '>You rented this title.<' in webpage:

View File

@ -414,7 +414,7 @@ class VKIE(VKBaseIE):
view_count = str_to_int(self._search_regex( view_count = str_to_int(self._search_regex(
r'class=["\']mv_views_count[^>]+>\s*([\d,.]+)', r'class=["\']mv_views_count[^>]+>\s*([\d,.]+)',
info_page, 'view count', fatal=False)) info_page, 'view count', default=None))
formats = [] formats = []
for format_id, format_url in data.items(): for format_id, format_url in data.items():

View File

@ -2,7 +2,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
@ -21,7 +20,6 @@ class VootIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ishq Ka Rang Safed - Season 01 - Episode 340', 'title': 'Ishq Ka Rang Safed - Season 01 - Episode 340',
'description': 'md5:06291fbbbc4dcbe21235c40c262507c1', 'description': 'md5:06291fbbbc4dcbe21235c40c262507c1',
'uploader_id': 'batchUser',
'timestamp': 1472162937, 'timestamp': 1472162937,
'upload_date': '20160825', 'upload_date': '20160825',
'duration': 1146, 'duration': 1146,
@ -63,6 +61,10 @@ class VootIE(InfoExtractor):
entry_id = media['EntryId'] entry_id = media['EntryId']
title = media['MediaName'] title = media['MediaName']
formats = self._extract_m3u8_formats(
'https://cdnapisec.kaltura.com/p/1982551/playManifest/pt/https/f/applehttp/t/web/e/' + entry_id,
video_id, 'mp4', m3u8_id='hls')
self._sort_formats(formats)
description, series, season_number, episode, episode_number = [None] * 5 description, series, season_number, episode, episode_number = [None] * 5
@ -82,9 +84,8 @@ class VootIE(InfoExtractor):
episode_number = int_or_none(value) episode_number = int_or_none(value)
return { return {
'_type': 'url_transparent', 'extractor_key': 'Kaltura',
'url': 'kaltura:1982551:%s' % entry_id, 'id': entry_id,
'ie_key': KalturaIE.ie_key(),
'title': title, 'title': title,
'description': description, 'description': description,
'series': series, 'series': series,
@ -95,4 +96,5 @@ class VootIE(InfoExtractor):
'duration': int_or_none(media.get('Duration')), 'duration': int_or_none(media.get('Duration')),
'view_count': int_or_none(media.get('ViewCounter')), 'view_count': int_or_none(media.get('ViewCounter')),
'like_count': int_or_none(media.get('like_counter')), 'like_count': int_or_none(media.get('like_counter')),
'formats': formats,
} }

View File

@ -154,7 +154,7 @@ class YoukuIE(InfoExtractor):
# request basic data # request basic data
basic_data_params = { basic_data_params = {
'vid': video_id, 'vid': video_id,
'ccode': '0501', 'ccode': '0507',
'client_ip': '192.168.1.1', 'client_ip': '192.168.1.1',
'utid': cna, 'utid': cna,
'client_ts': time.time() / 1000, 'client_ts': time.time() / 1000,
@ -241,13 +241,23 @@ class YoukuShowIE(InfoExtractor):
# Ongoing playlist. The initial page is the last one # Ongoing playlist. The initial page is the last one
'url': 'http://list.youku.com/show/id_za7c275ecd7b411e1a19e.html', 'url': 'http://list.youku.com/show/id_za7c275ecd7b411e1a19e.html',
'only_matching': True, 'only_matching': True,
}, {
# No data-id value.
'url': 'http://list.youku.com/show/id_zefbfbd61237fefbfbdef.html',
'only_matching': True,
}, {
# Wrong number of reload_id.
'url': 'http://list.youku.com/show/id_z20eb4acaf5c211e3b2ad.html',
'only_matching': True,
}] }]
def _extract_entries(self, playlist_data_url, show_id, note, query): def _extract_entries(self, playlist_data_url, show_id, note, query):
query['callback'] = 'cb' query['callback'] = 'cb'
playlist_data = self._download_json( playlist_data = self._download_json(
playlist_data_url, show_id, query=query, note=note, playlist_data_url, show_id, query=query, note=note,
transform_source=lambda s: js_to_json(strip_jsonp(s)))['html'] transform_source=lambda s: js_to_json(strip_jsonp(s))).get('html')
if playlist_data is None:
return [None, None]
drama_list = (get_element_by_class('p-drama-grid', playlist_data) or drama_list = (get_element_by_class('p-drama-grid', playlist_data) or
get_element_by_class('p-drama-half-row', playlist_data)) get_element_by_class('p-drama-half-row', playlist_data))
if drama_list is None: if drama_list is None:
@ -276,9 +286,9 @@ class YoukuShowIE(InfoExtractor):
r'<div[^>]+id="(reload_\d+)', first_page, 'first page reload id') r'<div[^>]+id="(reload_\d+)', first_page, 'first page reload id')
# The first reload_id has the same items as first_page # The first reload_id has the same items as first_page
reload_ids = re.findall('<li[^>]+data-id="([^"]+)">', first_page) reload_ids = re.findall('<li[^>]+data-id="([^"]+)">', first_page)
entries.extend(initial_entries)
for idx, reload_id in enumerate(reload_ids): for idx, reload_id in enumerate(reload_ids):
if reload_id == first_page_reload_id: if reload_id == first_page_reload_id:
entries.extend(initial_entries)
continue continue
_, new_entries = self._extract_entries( _, new_entries = self._extract_entries(
'http://list.youku.com/show/episode', show_id, 'http://list.youku.com/show/episode', show_id,
@ -287,8 +297,8 @@ class YoukuShowIE(InfoExtractor):
'id': page_config['showid'], 'id': page_config['showid'],
'stage': reload_id, 'stage': reload_id,
}) })
entries.extend(new_entries) if new_entries is not None:
entries.extend(new_entries)
desc = self._html_search_meta('description', webpage, fatal=False) desc = self._html_search_meta('description', webpage, fatal=False)
playlist_title = desc.split(',')[0] if desc else None playlist_title = desc.split(',')[0] if desc else None
detail_li = get_element_by_class('p-intro', webpage) detail_li = get_element_by_class('p-intro', webpage)

View File

@ -2270,6 +2270,19 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
r'(?s)<h1 class="pl-header-title[^"]*"[^>]*>\s*(.*?)\s*</h1>', r'(?s)<h1 class="pl-header-title[^"]*"[^>]*>\s*(.*?)\s*</h1>',
page, 'title', default=None) page, 'title', default=None)
_UPLOADER_BASE = r'class=["\']pl-header-details[^>]+>\s*<li>\s*<a[^>]+\bhref='
uploader = self._search_regex(
r'%s["\']/(?:user|channel)/[^>]+>([^<]+)' % _UPLOADER_BASE,
page, 'uploader', default=None)
mobj = re.search(
r'%s(["\'])(?P<path>/(?:user|channel)/(?P<uploader_id>.+?))\1' % _UPLOADER_BASE,
page)
if mobj:
uploader_id = mobj.group('uploader_id')
uploader_url = compat_urlparse.urljoin(url, mobj.group('path'))
else:
uploader_id = uploader_url = None
has_videos = True has_videos = True
if not playlist_title: if not playlist_title:
@ -2280,8 +2293,15 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
except StopIteration: except StopIteration:
has_videos = False has_videos = False
return has_videos, self.playlist_result( playlist = self.playlist_result(
self._entries(page, playlist_id), playlist_id, playlist_title) self._entries(page, playlist_id), playlist_id, playlist_title)
playlist.update({
'uploader': uploader,
'uploader_id': uploader_id,
'uploader_url': uploader_url,
})
return has_videos, playlist
def _check_download_just_video(self, url, playlist_id): def _check_download_just_video(self, url, playlist_id):
# Check if it's a video-specific URL # Check if it's a video-specific URL

View File

@ -39,6 +39,7 @@ from .compat import (
compat_HTMLParser, compat_HTMLParser,
compat_basestring, compat_basestring,
compat_chr, compat_chr,
compat_ctypes_WINFUNCTYPE,
compat_etree_fromstring, compat_etree_fromstring,
compat_expanduser, compat_expanduser,
compat_html_entities, compat_html_entities,
@ -159,6 +160,8 @@ DATE_FORMATS = (
'%Y-%m-%dT%H:%M', '%Y-%m-%dT%H:%M',
'%b %d %Y at %H:%M', '%b %d %Y at %H:%M',
'%b %d %Y at %H:%M:%S', '%b %d %Y at %H:%M:%S',
'%B %d %Y at %H:%M',
'%B %d %Y at %H:%M:%S',
) )
DATE_FORMATS_DAY_FIRST = list(DATE_FORMATS) DATE_FORMATS_DAY_FIRST = list(DATE_FORMATS)
@ -1328,24 +1331,24 @@ def _windows_write_string(s, out):
if fileno not in WIN_OUTPUT_IDS: if fileno not in WIN_OUTPUT_IDS:
return False return False
GetStdHandle = ctypes.WINFUNCTYPE( GetStdHandle = compat_ctypes_WINFUNCTYPE(
ctypes.wintypes.HANDLE, ctypes.wintypes.DWORD)( ctypes.wintypes.HANDLE, ctypes.wintypes.DWORD)(
(b'GetStdHandle', ctypes.windll.kernel32)) ('GetStdHandle', ctypes.windll.kernel32))
h = GetStdHandle(WIN_OUTPUT_IDS[fileno]) h = GetStdHandle(WIN_OUTPUT_IDS[fileno])
WriteConsoleW = ctypes.WINFUNCTYPE( WriteConsoleW = compat_ctypes_WINFUNCTYPE(
ctypes.wintypes.BOOL, ctypes.wintypes.HANDLE, ctypes.wintypes.LPWSTR, ctypes.wintypes.BOOL, ctypes.wintypes.HANDLE, ctypes.wintypes.LPWSTR,
ctypes.wintypes.DWORD, ctypes.POINTER(ctypes.wintypes.DWORD), ctypes.wintypes.DWORD, ctypes.POINTER(ctypes.wintypes.DWORD),
ctypes.wintypes.LPVOID)((b'WriteConsoleW', ctypes.windll.kernel32)) ctypes.wintypes.LPVOID)(('WriteConsoleW', ctypes.windll.kernel32))
written = ctypes.wintypes.DWORD(0) written = ctypes.wintypes.DWORD(0)
GetFileType = ctypes.WINFUNCTYPE(ctypes.wintypes.DWORD, ctypes.wintypes.DWORD)((b'GetFileType', ctypes.windll.kernel32)) GetFileType = compat_ctypes_WINFUNCTYPE(ctypes.wintypes.DWORD, ctypes.wintypes.DWORD)(('GetFileType', ctypes.windll.kernel32))
FILE_TYPE_CHAR = 0x0002 FILE_TYPE_CHAR = 0x0002
FILE_TYPE_REMOTE = 0x8000 FILE_TYPE_REMOTE = 0x8000
GetConsoleMode = ctypes.WINFUNCTYPE( GetConsoleMode = compat_ctypes_WINFUNCTYPE(
ctypes.wintypes.BOOL, ctypes.wintypes.HANDLE, ctypes.wintypes.BOOL, ctypes.wintypes.HANDLE,
ctypes.POINTER(ctypes.wintypes.DWORD))( ctypes.POINTER(ctypes.wintypes.DWORD))(
(b'GetConsoleMode', ctypes.windll.kernel32)) ('GetConsoleMode', ctypes.windll.kernel32))
INVALID_HANDLE_VALUE = ctypes.wintypes.DWORD(-1).value INVALID_HANDLE_VALUE = ctypes.wintypes.DWORD(-1).value
def not_a_console(handle): def not_a_console(handle):

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.12.14' __version__ = '2017.12.31'