Compare commits
193 commits (2017.06.23...2017.08.13)
(Commit table omitted: only abbreviated SHA1 hashes were captured, with no author, date, or message columns.)
.github/ISSUE_TEMPLATE.md (vendored, 16 changed lines)

@@ -1,16 +1,16 @@
 ## Please follow the guide below
 
 - You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
-- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
-- Use *Preview* tab to see how your issue will actually look like
+- Put an `x` into all the boxes [ ] relevant to your *issue* (like this: `[x]`)
+- Use the *Preview* tab to see what your issue will actually look like
 
 ---
 
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.06.23*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.06.23**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.08.13*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.08.13**
 
 ### Before submitting an *issue* make sure you have:
-- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
+- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
 - [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
 
 ### What is the purpose of your *issue*?

@@ -28,14 +28,14 @@
 ### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
 
-Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
+Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl -v <your command line>`), copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
 
 ```
 $ youtube-dl -v <your command line>
 [debug] System config: []
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2017.06.23
+[debug] youtube-dl version 2017.08.13
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
.github/ISSUE_TEMPLATE_tmpl.md (vendored, 12 changed lines)

@@ -1,16 +1,16 @@
 ## Please follow the guide below
 
 - You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
-- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
-- Use *Preview* tab to see how your issue will actually look like
+- Put an `x` into all the boxes [ ] relevant to your *issue* (like this: `[x]`)
+- Use the *Preview* tab to see what your issue will actually look like
 
 ---
 
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *%(version)s*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *%(version)s*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
 - [ ] I've **verified** and **I assure** that I'm running youtube-dl **%(version)s**
 
 ### Before submitting an *issue* make sure you have:
-- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
+- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
 - [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
 
 ### What is the purpose of your *issue*?

@@ -28,9 +28,9 @@
 ### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
 
-Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
+Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl -v <your command line>`), copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
 
 ```
 $ youtube-dl -v <your command line>
 [debug] System config: []
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
AUTHORS (3 added lines)

@@ -220,3 +220,6 @@ gritstub
 Adam Voss
 Mike Fährmann
 Jan Kundrát
+Giuseppe Fabiano
+Örn Guðjónsson
+Parmjit Virk
ChangeLog (198 added lines)

@@ -1,3 +1,201 @@
version 2017.08.13

Core
* [YoutubeDL] Make sure format id is not empty
* [extractor/common] Make _family_friendly_search optional
* [extractor/common] Respect source's type attribute for HTML5 media (#13892)

Extractors
* [pornhub:playlistbase] Skip videos from drop-down menu (#12819, #13902)
+ [fourtube] Add support pornerbros.com (#6022)
+ [fourtube] Add support porntube.com (#7859, #13901)
+ [fourtube] Add support fux.com
* [limelight] Improve embeds detection (#13895)
+ [reddit] Add support for v.redd.it and reddit.com (#13847)
* [aparat] Extract all formats (#13887)
* [mixcloud] Fix play info decryption (#13885)
+ [generic] Add support for vzaar embeds (#13876)


version 2017.08.09

Core
* [utils] Skip missing params in cli_bool_option (#13865)

Extractors
* [xxxymovies] Fix title extraction (#13868)
+ [nick] Add support for nick.com.pl (#13860)
* [mixcloud] Fix play info decryption (#13867)
* [20min] Fix embeds extraction (#13852)
* [dplayit] Fix extraction (#13851)
+ [niconico] Support videos with multiple formats (#13522)
+ [niconico] Support HTML5-only videos (#13806)


version 2017.08.06

Core
* Use relative paths for DASH fragments (#12990)

Extractors
* [pluralsight] Fix format selection
- [mpora] Remove extractor (#13826)
+ [voot] Add support for voot.com (#10255, #11644, #11814, #12350, #13218)
* [vlive:channel] Limit number of videos per page to 100 (#13830)
* [podomatic] Extend URL regular expression (#13827)
* [cinchcast] Extend URL regular expression
* [yandexdisk] Relax URL regular expression (#13824)
* [vidme] Extract DASH and HLS formats
- [teamfour] Remove extractor (#13782)
* [pornhd] Fix extraction (#13783)
* [udemy] Fix subtitles extraction (#13812)
* [mlb] Extend URL regular expression (#13740, #13773)
+ [pbs] Add support for new URL schema (#13801)
* [nrktv] Update API host (#13796)


version 2017.07.30.1

Core
* [downloader/hls] Use redirect URL as manifest base (#13755)
* [options] Correctly hide login info from debug outputs (#13696)

Extractors
+ [watchbox] Add support for watchbox.de (#13739)
- [clipfish] Remove extractor
+ [youjizz] Fix extraction (#13744)
+ [generic] Add support for another ooyala embed pattern (#13727)
+ [ard] Add support for lives (#13771)
* [soundcloud] Update client id
+ [soundcloud:trackstation] Add support for track stations (#13733)
* [svtplay] Use geo verification proxy for API request
* [svtplay] Update API URL (#13767)
+ [yandexdisk] Add support for yadi.sk (#13755)
+ [megaphone] Add support for megaphone.fm
* [amcnetworks] Make rating optional (#12453)
* [cloudy] Fix extraction (#13737)
+ [nickru] Add support for nickelodeon.ru
* [mtv] Improve thumbnal extraction
* [nick] Automate geo-restriction bypass (#13711)
* [niconico] Improve error reporting (#13696)


version 2017.07.23

Core
* [YoutubeDL] Improve default format specification (#13704)
* [YoutubeDL] Do not override id, extractor and extractor_key for
  url_transparent entities
* [extractor/common] Fix playlist_from_matches

Extractors
* [itv] Fix production id extraction (#13671, #13703)
* [vidio] Make duration non fatal and fix typo
* [mtv] Skip missing video parts (#13690)
* [sportbox:embed] Fix extraction
+ [npo] Add support for npo3.nl URLs (#13695)
* [dramafever] Remove video id from title (#13699)
+ [egghead:lesson] Add support for lessons (#6635)
* [funnyordie] Extract more metadata (#13677)
* [youku:show] Fix playlist extraction (#13248)
+ [dispeak] Recognize sevt subdomain (#13276)
* [adn] Improve error reporting (#13663)
* [crunchyroll] Relax series and season regex (#13659)
+ [spiegel:article] Add support for nexx iframe embeds (#13029)
+ [nexx:embed] Add support for iframe embeds
* [nexx] Improve JS embed extraction
+ [pearvideo] Add support for pearvideo.com (#13031)


version 2017.07.15

Core
* [YoutubeDL] Don't expand environment variables in meta fields (#13637)

Extractors
* [spiegeltv] Delegate extraction to nexx extractor (#13159)
+ [nexx] Add support for nexx.cloud (#10807, #13465)
* [generic] Fix rutube embeds extraction (#13641)
* [karrierevideos] Fix title extraction (#13641)
* [youtube] Don't capture YouTube Red ad for creator meta field (#13621)
* [slideshare] Fix extraction (#13617)
+ [5tv] Add another video URL pattern (#13354, #13606)
* [drtv] Make HLS and HDS extraction non fatal
* [ted] Fix subtitles extraction (#13628, #13629)
* [vine] Make sure the title won't be empty
+ [twitter] Support HLS streams in vmap URLs
+ [periscope] Support pscp.tv URLs in embedded frames
* [twitter] Extract mp4 urls via mobile API (#12726)
* [niconico] Fix authentication error handling (#12486)
* [giantbomb] Extract m3u8 formats (#13626)
+ [vlive:playlist] Add support for playlists (#13613)


version 2017.07.09

Core
+ [extractor/common] Add support for AMP tags in _parse_html5_media_entries
+ [utils] Support attributes with no values in get_elements_by_attribute

Extractors
+ [dailymail] Add support for embeds
+ [joj] Add support for joj.sk (#13268)
* [abc.net.au:iview] Extract more formats (#13492, #13489)
* [egghead:course] Fix extraction (#6635, #13370)
+ [cjsw] Add support for cjsw.com (#13525)
+ [eagleplatform] Add support for referrer protected videos (#13557)
+ [eagleplatform] Add support for another embed pattern (#13557)
* [veoh] Extend URL regular expression (#13601)
* [npo:live] Fix live stream id extraction (#13568, #13605)
* [googledrive] Fix height extraction (#13603)
+ [dailymotion] Add support for new layout (#13580)
- [yam] Remove extractor
* [xhamster] Extract all formats and fix duration extraction (#13593)
+ [xhamster] Add support for new URL schema (#13593)
* [espn] Extend URL regular expression (#13244, #13549)
* [kaltura] Fix typo in subtitles extraction (#13569)
* [vier] Adapt extraction to redesign (#13575)


version 2017.07.02

Core
* [extractor/common] Improve _json_ld

Extractors
+ [thisoldhouse] Add more fallbacks for video id
* [thisoldhouse] Fix video id extraction (#13540, #13541)
* [xfileshare] Extend format regular expression (#13536)
* [ted] Fix extraction (#13535)
+ [tastytrade] Add support for tastytrade.com (#13521)
* [dplayit] Relax video id regular expression (#13524)
+ [generic] Extract more generic metadata (#13527)
+ [bbccouk] Capture and output error message (#13501, #13518)
* [cbsnews] Relax video info regular expression (#13284, #13503)
+ [facebook] Add support for plugin video embeds and multiple embeds (#13493)
* [soundcloud] Switch to https for API requests (#13502)
* [pandatv] Switch to https for API and download URLs
+ [pandatv] Add support for https URLs (#13491)
+ [niconico] Support sp subdomain (#13494)


version 2017.06.25

Core
+ [adobepass] Add support for DIRECTV NOW (mso ATTOTT) (#13472)
* [YoutubeDL] Skip malformed formats for better extraction robustness

Extractors
+ [wsj] Add support for barrons.com (#13470)
+ [ign] Add another video id pattern (#13328)
+ [raiplay:live] Add support for live streams (#13414)
+ [redbulltv] Add support for live videos and segments (#13486)
+ [onetpl] Add support for videos embedded via pulsembed (#13482)
* [ooyala] Make more robust
* [ooyala] Skip empty format URLs (#13471, #13476)
* [hgtv.com:show] Fix typo


version 2017.06.23

Core
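One core item in the 2017.07.23 entry above, "[YoutubeDL] Improve default format specification (#13704)", corresponds to the private `_default_format_spec` helper and the `test_default_format_spec` test that appear later in this diff. A minimal sketch of what the default resolves to, assuming a youtube-dl build from this compare range is installed; the loop and prints are illustrative, not part of the diff:

```python
import youtube_dl

# With no explicit --format/-f, YoutubeDL now derives a default spec:
# 'bestvideo+bestaudio/best' when the downloaded formats could be merged
# afterwards (or when only simulating), and plain 'best' when writing to
# stdout ('-o -'), when the video is a live stream, or when ffmpeg/avconv
# is not available to perform the merge.
for params in ({'simulate': True}, {'outtmpl': '-'}, {}):
    ydl = youtube_dl.YoutubeDL(params)
    print(params, '->', ydl._default_format_spec({'is_live': False}))
```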
@@ -584,7 +584,7 @@ If you are using an output template inside a Windows batch file then you must es
 
 #### Output template examples
 
-Note on Windows you may need to use double quotes instead of single.
+Note that on Windows you may need to use double quotes instead of single.
 
 ```bash
 $ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc

@@ -671,7 +671,7 @@ If you want to preserve the old format selection behavior (prior to youtube-dl 2
 
 #### Format selection examples
 
-Note on Windows you may need to use double quotes instead of single.
+Note that on Windows you may need to use double quotes instead of single.
 
 ```bash
 # Download best mp4 format available or any other best if no mp4 available
@ -42,7 +42,7 @@
|
||||
- **Allocine**
|
||||
- **AlphaPorno**
|
||||
- **AMCNetworks**
|
||||
- **anderetijden**: npo.nl and ntr.nl
|
||||
- **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
|
||||
- **AnimeOnDemand**
|
||||
- **anitube.se**
|
||||
- **Anvato**
|
||||
@ -154,7 +154,7 @@
|
||||
- **chirbit**
|
||||
- **chirbit:profile**
|
||||
- **Cinchcast**
|
||||
- **Clipfish**
|
||||
- **CJSW**
|
||||
- **cliphunter**
|
||||
- **ClipRs**
|
||||
- **Clipsyndicate**
|
||||
@ -237,6 +237,7 @@
|
||||
- **EbaumsWorld**
|
||||
- **EchoMsk**
|
||||
- **egghead:course**: egghead.io course
|
||||
- **egghead:lesson**: egghead.io lesson
|
||||
- **eHow**
|
||||
- **Einthusan**
|
||||
- **eitb.tv**
|
||||
@ -293,6 +294,7 @@
|
||||
- **Funimation**
|
||||
- **FunnyOrDie**
|
||||
- **Fusion**
|
||||
- **Fux**
|
||||
- **FXNetworks**
|
||||
- **GameInformer**
|
||||
- **GameOne**
|
||||
@ -369,6 +371,7 @@
|
||||
- **Jamendo**
|
||||
- **JamendoAlbum**
|
||||
- **JeuxVideo**
|
||||
- **Joj**
|
||||
- **Jove**
|
||||
- **jpopsuki.tv**
|
||||
- **JWPlatform**
|
||||
@ -437,6 +440,7 @@
|
||||
- **Medialaan**
|
||||
- **Mediaset**
|
||||
- **Medici**
|
||||
- **megaphone.fm**: megaphone.fm embedded players
|
||||
- **Meipai**: 美拍
|
||||
- **MelonVOD**
|
||||
- **META**
|
||||
@ -469,7 +473,6 @@
|
||||
- **MovieFap**
|
||||
- **Moviezine**
|
||||
- **MovingImage**
|
||||
- **MPORA**
|
||||
- **MSN**
|
||||
- **mtg**: MTG services
|
||||
- **mtv**
|
||||
@ -519,6 +522,8 @@
|
||||
- **NextMedia**: 蘋果日報
|
||||
- **NextMediaActionNews**: 蘋果日報 - 動新聞
|
||||
- **NextTV**: 壹電視
|
||||
- **Nexx**
|
||||
- **NexxEmbed**
|
||||
- **nfb**: National Film Board of Canada
|
||||
- **nfl.com**
|
||||
- **NhkVod**
|
||||
@ -528,6 +533,7 @@
|
||||
- **nhl.com:videocenter:category**: NHL videocenter category
|
||||
- **nick.com**
|
||||
- **nick.de**
|
||||
- **nickelodeonru**
|
||||
- **nicknight**
|
||||
- **niconico**: ニコニコ動画
|
||||
- **NiconicoPlaylist**
|
||||
@ -549,7 +555,7 @@
|
||||
- **NowTVList**
|
||||
- **nowvideo**: NowVideo
|
||||
- **Noz**
|
||||
- **npo**: npo.nl and ntr.nl
|
||||
- **npo**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
|
||||
- **npo.nl:live**
|
||||
- **npo.nl:radio**
|
||||
- **npo.nl:radio:fragment**
|
||||
@ -593,6 +599,7 @@
|
||||
- **Patreon**
|
||||
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
|
||||
- **pcmag**
|
||||
- **PearVideo**
|
||||
- **People**
|
||||
- **periscope**: Periscope
|
||||
- **periscope:user**: Periscope user videos
|
||||
@ -615,6 +622,7 @@
|
||||
- **PolskieRadio**
|
||||
- **PolskieRadioCategory**
|
||||
- **PornCom**
|
||||
- **PornerBros**
|
||||
- **PornFlip**
|
||||
- **PornHd**
|
||||
- **PornHub**: PornHub and Thumbzilla
|
||||
@ -623,6 +631,7 @@
|
||||
- **Pornotube**
|
||||
- **PornoVoisines**
|
||||
- **PornoXO**
|
||||
- **PornTube**
|
||||
- **PressTV**
|
||||
- **PrimeShareTV**
|
||||
- **PromptFile**
|
||||
@ -644,9 +653,12 @@
|
||||
- **RadioJavan**
|
||||
- **Rai**
|
||||
- **RaiPlay**
|
||||
- **RaiPlayLive**
|
||||
- **RBMARadio**
|
||||
- **RDS**: RDS.ca
|
||||
- **RedBullTV**
|
||||
- **Reddit**
|
||||
- **RedditR**
|
||||
- **RedTube**
|
||||
- **RegioTV**
|
||||
- **RENTV**
|
||||
@ -727,6 +739,7 @@
|
||||
- **soundcloud:playlist**
|
||||
- **soundcloud:search**: Soundcloud search
|
||||
- **soundcloud:set**
|
||||
- **soundcloud:trackstation**
|
||||
- **soundcloud:user**
|
||||
- **soundgasm**
|
||||
- **soundgasm:profile**
|
||||
@ -767,13 +780,13 @@
|
||||
- **Tagesschau**
|
||||
- **tagesschau:player**
|
||||
- **Tass**
|
||||
- **TBS**
|
||||
- **TastyTrade**
|
||||
- **TBS** (Currently broken)
|
||||
- **TDSLifeway**
|
||||
- **teachertube**: teachertube.com videos
|
||||
- **teachertube:user:collection**: teachertube.com user and collection videos
|
||||
- **TeachingChannel**
|
||||
- **Teamcoco**
|
||||
- **TeamFourStar**
|
||||
- **TechTalks**
|
||||
- **techtv.mit.edu**
|
||||
- **ted**
|
||||
@ -938,13 +951,15 @@
|
||||
- **vk:wallpost**
|
||||
- **vlive**
|
||||
- **vlive:channel**
|
||||
- **vlive:playlist**
|
||||
- **Vodlocker**
|
||||
- **VODPl**
|
||||
- **VODPlatform**
|
||||
- **VoiceRepublic**
|
||||
- **Voot**
|
||||
- **VoxMedia**
|
||||
- **Vporn**
|
||||
- **vpro**: npo.nl and ntr.nl
|
||||
- **vpro**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
|
||||
- **Vrak**
|
||||
- **VRT**: deredactie.be, sporza.be, cobra.be and cobra.canvas.be
|
||||
- **vrv**
|
||||
@ -959,6 +974,7 @@
|
||||
- **washingtonpost**
|
||||
- **washingtonpost:article**
|
||||
- **wat.tv**
|
||||
- **WatchBox**
|
||||
- **WatchIndianPorn**: Watch Indian Porn
|
||||
- **WDR**
|
||||
- **wdr:mobile**
|
||||
@ -970,7 +986,7 @@
|
||||
- **wholecloud**: WholeCloud
|
||||
- **Wimp**
|
||||
- **Wistia**
|
||||
- **wnl**: npo.nl and ntr.nl
|
||||
- **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
|
||||
- **WorldStarHipHop**
|
||||
- **wrzuta.pl**
|
||||
- **wrzuta.pl:playlist**
|
||||
@ -994,7 +1010,7 @@
|
||||
- **XVideos**
|
||||
- **XXXYMovies**
|
||||
- **Yahoo**: Yahoo screen and movies
|
||||
- **Yam**: 蕃薯藤yam天空部落
|
||||
- **YandexDisk**
|
||||
- **yandexmusic:album**: Яндекс.Музыка - Альбом
|
||||
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
|
||||
- **yandexmusic:track**: Яндекс.Музыка - Трек
|
||||
|
@ -41,6 +41,7 @@ def _make_result(formats, **kwargs):
|
||||
'id': 'testid',
|
||||
'title': 'testttitle',
|
||||
'extractor': 'testex',
|
||||
'extractor_key': 'TestEx',
|
||||
}
|
||||
res.update(**kwargs)
|
||||
return res
|
||||
@ -370,6 +371,19 @@ class TestFormatSelection(unittest.TestCase):
|
||||
ydl = YDL({'format': 'best[height>360]'})
|
||||
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
|
||||
|
||||
def test_format_selection_issue_10083(self):
|
||||
# See https://github.com/rg3/youtube-dl/issues/10083
|
||||
formats = [
|
||||
{'format_id': 'regular', 'height': 360, 'url': TEST_URL},
|
||||
{'format_id': 'video', 'height': 720, 'acodec': 'none', 'url': TEST_URL},
|
||||
{'format_id': 'audio', 'vcodec': 'none', 'url': TEST_URL},
|
||||
]
|
||||
info_dict = _make_result(formats)
|
||||
|
||||
ydl = YDL({'format': 'best[height>360]/bestvideo[height>360]+bestaudio'})
|
||||
ydl.process_ie_result(info_dict.copy())
|
||||
self.assertEqual(ydl.downloaded_info_dicts[0]['format_id'], 'video+audio')
|
||||
|
||||
def test_invalid_format_specs(self):
|
||||
def assert_syntax_error(format_spec):
|
||||
ydl = YDL({'format': format_spec})
|
||||
@ -448,6 +462,17 @@ class TestFormatSelection(unittest.TestCase):
|
||||
pass
|
||||
self.assertEqual(ydl.downloaded_info_dicts, [])
|
||||
|
||||
def test_default_format_spec(self):
|
||||
ydl = YDL({'simulate': True})
|
||||
self.assertEqual(ydl._default_format_spec({}), 'bestvideo+bestaudio/best')
|
||||
|
||||
ydl = YDL({'outtmpl': '-'})
|
||||
self.assertEqual(ydl._default_format_spec({}), 'best')
|
||||
|
||||
ydl = YDL({})
|
||||
self.assertEqual(ydl._default_format_spec({}, download=False), 'bestvideo+bestaudio/best')
|
||||
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best')
|
||||
|
||||
|
||||
class TestYoutubeDL(unittest.TestCase):
|
||||
def test_subtitles(self):
|
||||
@ -527,6 +552,8 @@ class TestYoutubeDL(unittest.TestCase):
|
||||
'ext': 'mp4',
|
||||
'width': None,
|
||||
'height': 1080,
|
||||
'title1': '$PATH',
|
||||
'title2': '%PATH%',
|
||||
}
|
||||
|
||||
def fname(templ):
|
||||
@ -545,10 +572,14 @@ class TestYoutubeDL(unittest.TestCase):
|
||||
self.assertEqual(fname('%(height)0 6d.%(ext)s'), ' 01080.mp4')
|
||||
self.assertEqual(fname('%(height)0 6d.%(ext)s'), ' 01080.mp4')
|
||||
self.assertEqual(fname('%(height) 0 6d.%(ext)s'), ' 01080.mp4')
|
||||
self.assertEqual(fname('%%'), '%')
|
||||
self.assertEqual(fname('%%%%'), '%%')
|
||||
self.assertEqual(fname('%%(height)06d.%(ext)s'), '%(height)06d.mp4')
|
||||
self.assertEqual(fname('%(width)06d.%(ext)s'), 'NA.mp4')
|
||||
self.assertEqual(fname('%(width)06d.%%(ext)s'), 'NA.%(ext)s')
|
||||
self.assertEqual(fname('%%(width)06d.%(ext)s'), '%(width)06d.mp4')
|
||||
self.assertEqual(fname('Hello %(title1)s'), 'Hello $PATH')
|
||||
self.assertEqual(fname('Hello %(title2)s'), 'Hello %PATH%')
|
||||
|
||||
def test_format_note(self):
|
||||
ydl = YoutubeDL()
|
||||
@ -755,7 +786,8 @@ class TestYoutubeDL(unittest.TestCase):
|
||||
'_type': 'url_transparent',
|
||||
'url': 'foo2:',
|
||||
'ie_key': 'Foo2',
|
||||
'title': 'foo1 title'
|
||||
'title': 'foo1 title',
|
||||
'id': 'foo1_id',
|
||||
}
|
||||
|
||||
class Foo2IE(InfoExtractor):
|
||||
@ -781,6 +813,9 @@ class TestYoutubeDL(unittest.TestCase):
|
||||
downloaded = ydl.downloaded_info_dicts[0]
|
||||
self.assertEqual(downloaded['url'], TEST_URL)
|
||||
self.assertEqual(downloaded['title'], 'foo1 title')
|
||||
self.assertEqual(downloaded['id'], 'testid')
|
||||
self.assertEqual(downloaded['extractor'], 'testex')
|
||||
self.assertEqual(downloaded['extractor_key'], 'TestEx')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
test/test_options.py (new file, 26 lines)

@@ -0,0 +1,26 @@
# coding: utf-8
from __future__ import unicode_literals

# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from youtube_dl.options import _hide_login_info


class TestOptions(unittest.TestCase):
    def test_hide_login_info(self):
        self.assertEqual(_hide_login_info(['-u', 'foo', '-p', 'bar']),
                         ['-u', 'PRIVATE', '-p', 'PRIVATE'])
        self.assertEqual(_hide_login_info(['-u']), ['-u'])
        self.assertEqual(_hide_login_info(['-u', 'foo', '-u', 'bar']),
                         ['-u', 'PRIVATE', '-u', 'PRIVATE'])
        self.assertEqual(_hide_login_info(['--username=foo']),
                         ['--username=PRIVATE'])


if __name__ == '__main__':
    unittest.main()
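The new test above pins down `_hide_login_info`: values that follow credential options such as `-u`/`-p`, or that are embedded in `--username=...` style arguments, are replaced with `PRIVATE`. A small usage sketch, assuming only the behaviour shown by the assertions; the wrapper function is illustrative, not part of the diff, and the 2017.07.30.1 entry "[options] Correctly hide login info from debug outputs" is the user-facing side of the same helper:

```python
from youtube_dl.options import _hide_login_info


def loggable_command_line(argv):
    # Mask credential values before echoing a command line anywhere,
    # e.g. into logs or bug reports.
    return ' '.join(_hide_login_info(list(argv)))


print(loggable_command_line(
    ['youtube-dl', '-u', 'alice', '-p', 'hunter2', '-v',
     'https://www.youtube.com/watch?v=BaW_jenozKc']))
# Expected output (credentials masked, other arguments untouched):
# youtube-dl -u PRIVATE -p PRIVATE -v https://www.youtube.com/watch?v=BaW_jenozKc
```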
@ -98,6 +98,7 @@ from youtube_dl.compat import (
|
||||
compat_chr,
|
||||
compat_etree_fromstring,
|
||||
compat_getenv,
|
||||
compat_os_name,
|
||||
compat_setenv,
|
||||
compat_urlparse,
|
||||
compat_parse_qs,
|
||||
@ -448,7 +449,9 @@ class TestUtil(unittest.TestCase):
|
||||
|
||||
def test_shell_quote(self):
|
||||
args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')]
|
||||
self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""")
|
||||
self.assertEqual(
|
||||
shell_quote(args),
|
||||
"""ffmpeg -i 'ñ€ß'"'"'.mp4'""" if compat_os_name != 'nt' else '''ffmpeg -i "ñ€ß'.mp4"''')
|
||||
|
||||
def test_str_to_int(self):
|
||||
self.assertEqual(str_to_int('123,456'), 123456)
|
||||
@ -932,7 +935,7 @@ class TestUtil(unittest.TestCase):
|
||||
def test_args_to_str(self):
|
||||
self.assertEqual(
|
||||
args_to_str(['foo', 'ba/r', '-baz', '2 be', '']),
|
||||
'foo ba/r -baz \'2 be\' \'\''
|
||||
'foo ba/r -baz \'2 be\' \'\'' if compat_os_name != 'nt' else 'foo ba/r -baz "2 be" ""'
|
||||
)
|
||||
|
||||
def test_parse_filesize(self):
|
||||
@ -1179,6 +1182,10 @@ part 3</font></u>
|
||||
cli_bool_option(
|
||||
{'nocheckcertificate': False}, '--check-certificate', 'nocheckcertificate', 'false', 'true', '='),
|
||||
['--check-certificate=true'])
|
||||
self.assertEqual(
|
||||
cli_bool_option(
|
||||
{}, '--check-certificate', 'nocheckcertificate', 'false', 'true', '='),
|
||||
[])
|
||||
|
||||
def test_ohdave_rsa_encrypt(self):
|
||||
N = 0xab86b6371b5318aaa1d3c9e612a9f1264f372323c8c0f19875b5fc3b3fd3afcc1e5bec527aa94bfa85bffc157e4245aebda05389a5357b75115ac94f074aefcd
|
||||
@ -1228,6 +1235,12 @@ part 3</font></u>
|
||||
self.assertEqual(get_element_by_attribute('class', 'foo', html), None)
|
||||
self.assertEqual(get_element_by_attribute('class', 'no-such-foo', html), None)
|
||||
|
||||
html = '''
|
||||
<div itemprop="author" itemscope>foo</div>
|
||||
'''
|
||||
|
||||
self.assertEqual(get_element_by_attribute('itemprop', 'author', html), 'foo')
|
||||
|
||||
def test_get_elements_by_class(self):
|
||||
html = '''
|
||||
<span class="foo bar">nice</span><span class="foo bar">also nice</span>
|
||||
|
@ -26,6 +26,8 @@ import tokenize
|
||||
import traceback
|
||||
import random
|
||||
|
||||
from string import ascii_letters
|
||||
|
||||
from .compat import (
|
||||
compat_basestring,
|
||||
compat_cookiejar,
|
||||
@ -674,7 +676,19 @@ class YoutubeDL(object):
|
||||
FORMAT_RE.format(numeric_field),
|
||||
r'%({0})s'.format(numeric_field), outtmpl)
|
||||
|
||||
filename = expand_path(outtmpl % template_dict)
|
||||
# expand_path translates '%%' into '%' and '$$' into '$'
|
||||
# correspondingly that is not what we want since we need to keep
|
||||
# '%%' intact for template dict substitution step. Working around
|
||||
# with boundary-alike separator hack.
|
||||
sep = ''.join([random.choice(ascii_letters) for _ in range(32)])
|
||||
outtmpl = outtmpl.replace('%%', '%{0}%'.format(sep)).replace('$$', '${0}$'.format(sep))
|
||||
|
||||
# outtmpl should be expand_path'ed before template dict substitution
|
||||
# because meta fields may contain env variables we don't want to
|
||||
# be expanded. For example, for outtmpl "%(title)s.%(ext)s" and
|
||||
# title "Hello $PATH", we don't want `$PATH` to be expanded.
|
||||
filename = expand_path(outtmpl).replace(sep, '') % template_dict
|
||||
|
||||
# Temporary fix for #4787
|
||||
# 'Treat' all problem characters by passing filename through preferredencoding
|
||||
# to workaround encoding issues with subprocess on python2 @ Windows
|
||||
@ -846,7 +860,7 @@ class YoutubeDL(object):
|
||||
|
||||
force_properties = dict(
|
||||
(k, v) for k, v in ie_result.items() if v is not None)
|
||||
for f in ('_type', 'url', 'ie_key'):
|
||||
for f in ('_type', 'url', 'id', 'extractor', 'extractor_key', 'ie_key'):
|
||||
if f in force_properties:
|
||||
del force_properties[f]
|
||||
new_result = info.copy()
|
||||
@ -1050,6 +1064,25 @@ class YoutubeDL(object):
|
||||
return op(actual_value, comparison_value)
|
||||
return _filter
|
||||
|
||||
def _default_format_spec(self, info_dict, download=True):
|
||||
req_format_list = []
|
||||
|
||||
def can_have_partial_formats():
|
||||
if self.params.get('simulate', False):
|
||||
return True
|
||||
if not download:
|
||||
return True
|
||||
if self.params.get('outtmpl', DEFAULT_OUTTMPL) == '-':
|
||||
return False
|
||||
if info_dict.get('is_live'):
|
||||
return False
|
||||
merger = FFmpegMergerPP(self)
|
||||
return merger.available and merger.can_merge()
|
||||
if can_have_partial_formats():
|
||||
req_format_list.append('bestvideo+bestaudio')
|
||||
req_format_list.append('best')
|
||||
return '/'.join(req_format_list)
|
||||
|
||||
def build_format_selector(self, format_spec):
|
||||
def syntax_error(note, start):
|
||||
message = (
|
||||
@ -1448,18 +1481,26 @@ class YoutubeDL(object):
|
||||
if not formats:
|
||||
raise ExtractorError('No video formats found!')
|
||||
|
||||
def is_wellformed(f):
|
||||
url = f.get('url')
|
||||
valid_url = url and isinstance(url, compat_str)
|
||||
if not valid_url:
|
||||
self.report_warning(
|
||||
'"url" field is missing or empty - skipping format, '
|
||||
'there is an error in extractor')
|
||||
return valid_url
|
||||
|
||||
# Filter out malformed formats for better extraction robustness
|
||||
formats = list(filter(is_wellformed, formats))
|
||||
|
||||
formats_dict = {}
|
||||
|
||||
# We check that all the formats have the format and format_id fields
|
||||
for i, format in enumerate(formats):
|
||||
if 'url' not in format:
|
||||
raise ExtractorError('Missing "url" key in result (index %d)' % i)
|
||||
|
||||
sanitize_string_field(format, 'format_id')
|
||||
sanitize_numeric_fields(format)
|
||||
format['url'] = sanitize_url(format['url'])
|
||||
|
||||
if format.get('format_id') is None:
|
||||
if not format.get('format_id'):
|
||||
format['format_id'] = compat_str(i)
|
||||
else:
|
||||
# Sanitize format_id from characters used in format selector expression
|
||||
@ -1512,14 +1553,10 @@ class YoutubeDL(object):
|
||||
|
||||
req_format = self.params.get('format')
|
||||
if req_format is None:
|
||||
req_format_list = []
|
||||
if (self.params.get('outtmpl', DEFAULT_OUTTMPL) != '-' and
|
||||
not info_dict.get('is_live')):
|
||||
merger = FFmpegMergerPP(self)
|
||||
if merger.available and merger.can_merge():
|
||||
req_format_list.append('bestvideo+bestaudio')
|
||||
req_format_list.append('best')
|
||||
req_format = '/'.join(req_format_list)
|
||||
req_format = self._default_format_spec(info_dict, download=download)
|
||||
if self.params.get('verbose'):
|
||||
self.to_stdout('[debug] Default format spec: %s' % req_format)
|
||||
|
||||
format_selector = self.build_format_selector(req_format)
|
||||
|
||||
# While in format selection we may need to have an access to the original
|
||||
@ -1882,7 +1919,7 @@ class YoutubeDL(object):
|
||||
info_dict.get('protocol') == 'm3u8' and
|
||||
self.params.get('hls_prefer_native')):
|
||||
if fixup_policy == 'warn':
|
||||
self.report_warning('%s: malformated aac bitstream.' % (
|
||||
self.report_warning('%s: malformed AAC bitstream detected.' % (
|
||||
info_dict['id']))
|
||||
elif fixup_policy == 'detect_or_warn':
|
||||
fixup_pp = FFmpegFixupM3u8PP(self)
|
||||
@ -1891,7 +1928,7 @@ class YoutubeDL(object):
|
||||
info_dict['__postprocessors'].append(fixup_pp)
|
||||
else:
|
||||
self.report_warning(
|
||||
'%s: malformated aac bitstream. %s'
|
||||
'%s: malformed AAC bitstream detected. %s'
|
||||
% (info_dict['id'], INSTALL_FFMPEG_MESSAGE))
|
||||
else:
|
||||
assert fixup_policy in ('ignore', 'never')
|
||||
|
@ -2,6 +2,7 @@ from __future__ import unicode_literals
|
||||
|
||||
from .fragment import FragmentFD
|
||||
from ..compat import compat_urllib_error
|
||||
from ..utils import urljoin
|
||||
|
||||
|
||||
class DashSegmentsFD(FragmentFD):
|
||||
@ -12,12 +13,13 @@ class DashSegmentsFD(FragmentFD):
|
||||
FD_NAME = 'dashsegments'
|
||||
|
||||
def real_download(self, filename, info_dict):
|
||||
segments = info_dict['fragments'][:1] if self.params.get(
|
||||
fragment_base_url = info_dict.get('fragment_base_url')
|
||||
fragments = info_dict['fragments'][:1] if self.params.get(
|
||||
'test', False) else info_dict['fragments']
|
||||
|
||||
ctx = {
|
||||
'filename': filename,
|
||||
'total_frags': len(segments),
|
||||
'total_frags': len(fragments),
|
||||
}
|
||||
|
||||
self._prepare_and_start_frag_download(ctx)
|
||||
@ -26,7 +28,7 @@ class DashSegmentsFD(FragmentFD):
|
||||
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
|
||||
|
||||
frag_index = 0
|
||||
for i, segment in enumerate(segments):
|
||||
for i, fragment in enumerate(fragments):
|
||||
frag_index += 1
|
||||
if frag_index <= ctx['fragment_index']:
|
||||
continue
|
||||
@ -36,7 +38,11 @@ class DashSegmentsFD(FragmentFD):
|
||||
count = 0
|
||||
while count <= fragment_retries:
|
||||
try:
|
||||
success, frag_content = self._download_fragment(ctx, segment['url'], info_dict)
|
||||
fragment_url = fragment.get('url')
|
||||
if not fragment_url:
|
||||
assert fragment_base_url
|
||||
fragment_url = urljoin(fragment_base_url, fragment['path'])
|
||||
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict)
|
||||
if not success:
|
||||
return False
|
||||
self._append_fragment(ctx, frag_content)
|
||||
|
@ -59,9 +59,9 @@ class HlsFD(FragmentFD):
|
||||
man_url = info_dict['url']
|
||||
self.to_screen('[%s] Downloading m3u8 manifest' % self.FD_NAME)
|
||||
|
||||
manifest = self.ydl.urlopen(self._prepare_url(info_dict, man_url)).read()
|
||||
|
||||
s = manifest.decode('utf-8', 'ignore')
|
||||
urlh = self.ydl.urlopen(self._prepare_url(info_dict, man_url))
|
||||
man_url = urlh.geturl()
|
||||
s = urlh.read().decode('utf-8', 'ignore')
|
||||
|
||||
if not self.can_download(s, info_dict):
|
||||
if info_dict.get('extra_param_to_segment_url'):
|
||||
|
@ -98,7 +98,7 @@ def write_piff_header(stream, params):
|
||||
|
||||
if is_audio:
|
||||
smhd_payload = s88.pack(0) # balance
|
||||
smhd_payload = u16.pack(0) # reserved
|
||||
smhd_payload += u16.pack(0) # reserved
|
||||
media_header_box = full_box(b'smhd', 0, 0, smhd_payload) # Sound Media Header
|
||||
else:
|
||||
vmhd_payload = u16.pack(0) # graphics mode
|
||||
@ -126,7 +126,6 @@ def write_piff_header(stream, params):
|
||||
if fourcc == 'AACL':
|
||||
sample_entry_box = box(b'mp4a', sample_entry_payload)
|
||||
else:
|
||||
sample_entry_payload = sample_entry_payload
|
||||
sample_entry_payload += u16.pack(0) # pre defined
|
||||
sample_entry_payload += u16.pack(0) # reserved
|
||||
sample_entry_payload += u32.pack(0) * 3 # pre defined
|
||||
|
@ -3,11 +3,13 @@ from __future__ import unicode_literals
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
js_to_json,
|
||||
int_or_none,
|
||||
parse_iso8601,
|
||||
try_get,
|
||||
)
|
||||
|
||||
|
||||
@ -124,7 +126,20 @@ class ABCIViewIE(InfoExtractor):
|
||||
title = video_params.get('title') or video_params['seriesTitle']
|
||||
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
|
||||
|
||||
formats = self._extract_akamai_formats(stream['hds-unmetered'], video_id)
|
||||
format_urls = [
|
||||
try_get(stream, lambda x: x['hds-unmetered'], compat_str)]
|
||||
|
||||
# May have higher quality video
|
||||
sd_url = try_get(
|
||||
stream, lambda x: x['streams']['hds']['sd'], compat_str)
|
||||
if sd_url:
|
||||
format_urls.append(sd_url.replace('metered', 'um'))
|
||||
|
||||
formats = []
|
||||
for format_url in format_urls:
|
||||
if format_url:
|
||||
formats.extend(
|
||||
self._extract_akamai_formats(format_url, video_id))
|
||||
self._sort_formats(formats)
|
||||
|
||||
subtitles = {}
|
||||
|
@ -107,11 +107,13 @@ class ADNIE(InfoExtractor):
|
||||
metas = options.get('metas') or {}
|
||||
title = metas.get('title') or video_info['title']
|
||||
links = player_config.get('links') or {}
|
||||
error = None
|
||||
if not links:
|
||||
links_url = player_config['linksurl']
|
||||
links_data = self._download_json(urljoin(
|
||||
self._BASE_URL, links_url), video_id)
|
||||
links = links_data.get('links') or {}
|
||||
error = links_data.get('error')
|
||||
|
||||
formats = []
|
||||
for format_id, qualities in links.items():
|
||||
@ -130,7 +132,8 @@ class ADNIE(InfoExtractor):
|
||||
for f in m3u8_formats:
|
||||
f['language'] = 'fr'
|
||||
formats.extend(m3u8_formats)
|
||||
error = options.get('error')
|
||||
if not error:
|
||||
error = options.get('error')
|
||||
if not formats and error:
|
||||
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
|
||||
self._sort_formats(formats)
|
||||
|
@ -15,6 +15,7 @@ from ..utils import (
|
||||
urlencode_postdata,
|
||||
unified_timestamp,
|
||||
ExtractorError,
|
||||
NO_DEFAULT,
|
||||
)
|
||||
|
||||
|
||||
@ -24,6 +25,11 @@ MSO_INFO = {
|
||||
'username_field': 'username',
|
||||
'password_field': 'password',
|
||||
},
|
||||
'ATTOTT': {
|
||||
'name': 'DIRECTV NOW',
|
||||
'username_field': 'email',
|
||||
'password_field': 'loginpassword',
|
||||
},
|
||||
'Rogers': {
|
||||
'name': 'Rogers',
|
||||
'username_field': 'UserName',
|
||||
@ -1316,6 +1322,8 @@ class AdobePassIE(InfoExtractor):
|
||||
_USER_AGENT = 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0'
|
||||
_MVPD_CACHE = 'ap-mvpd'
|
||||
|
||||
_DOWNLOADING_LOGIN_PAGE = 'Downloading Provider Login Page'
|
||||
|
||||
def _download_webpage_handle(self, *args, **kwargs):
|
||||
headers = kwargs.get('headers', {})
|
||||
headers.update(self.geo_verification_headers())
|
||||
@ -1365,6 +1373,21 @@ class AdobePassIE(InfoExtractor):
|
||||
'Use --ap-mso to specify Adobe Pass Multiple-system operator Identifier '
|
||||
'and --ap-username and --ap-password or --netrc to provide account credentials.', expected=True)
|
||||
|
||||
def extract_redirect_url(html, url=None, fatal=False):
|
||||
# TODO: eliminate code duplication with generic extractor and move
|
||||
# redirection code into _download_webpage_handle
|
||||
REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
|
||||
redirect_url = self._search_regex(
|
||||
r'(?i)<meta\s+(?=(?:[a-z-]+="[^"]+"\s+)*http-equiv="refresh")'
|
||||
r'(?:[a-z-]+="[^"]+"\s+)*?content="%s' % REDIRECT_REGEX,
|
||||
html, 'meta refresh redirect',
|
||||
default=NO_DEFAULT if fatal else None, fatal=fatal)
|
||||
if not redirect_url:
|
||||
return None
|
||||
if url:
|
||||
redirect_url = compat_urlparse.urljoin(url, unescapeHTML(redirect_url))
|
||||
return redirect_url
|
||||
|
||||
mvpd_headers = {
|
||||
'ap_42': 'anonymous',
|
||||
'ap_11': 'Linux i686',
|
||||
@ -1414,16 +1437,15 @@ class AdobePassIE(InfoExtractor):
|
||||
if '<form name="signin"' in provider_redirect_page:
|
||||
provider_login_page_res = provider_redirect_page_res
|
||||
elif 'http-equiv="refresh"' in provider_redirect_page:
|
||||
oauth_redirect_url = self._html_search_regex(
|
||||
r'content="0;\s*url=([^\'"]+)',
|
||||
provider_redirect_page, 'meta refresh redirect')
|
||||
oauth_redirect_url = extract_redirect_url(
|
||||
provider_redirect_page, fatal=True)
|
||||
provider_login_page_res = self._download_webpage_handle(
|
||||
oauth_redirect_url, video_id,
|
||||
'Downloading Provider Login Page')
|
||||
self._DOWNLOADING_LOGIN_PAGE)
|
||||
else:
|
||||
provider_login_page_res = post_form(
|
||||
provider_redirect_page_res,
|
||||
'Downloading Provider Login Page')
|
||||
self._DOWNLOADING_LOGIN_PAGE)
|
||||
|
||||
mvpd_confirm_page_res = post_form(
|
||||
provider_login_page_res, 'Logging in', {
|
||||
@ -1470,8 +1492,17 @@ class AdobePassIE(InfoExtractor):
|
||||
'Content-Type': 'application/x-www-form-urlencoded'
|
||||
})
|
||||
else:
|
||||
# Some providers (e.g. DIRECTV NOW) have another meta refresh
|
||||
# based redirect that should be followed.
|
||||
provider_redirect_page, urlh = provider_redirect_page_res
|
||||
provider_refresh_redirect_url = extract_redirect_url(
|
||||
provider_redirect_page, url=urlh.geturl())
|
||||
if provider_refresh_redirect_url:
|
||||
provider_redirect_page_res = self._download_webpage_handle(
|
||||
provider_refresh_redirect_url, video_id,
|
||||
'Downloading Provider Redirect Page (meta refresh)')
|
||||
provider_login_page_res = post_form(
|
||||
provider_redirect_page_res, 'Downloading Provider Login Page')
|
||||
provider_redirect_page_res, self._DOWNLOADING_LOGIN_PAGE)
|
||||
mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', {
|
||||
mso_info.get('username_field', 'username'): username,
|
||||
mso_info.get('password_field', 'password'): password,
|
||||
|
@ -3,9 +3,10 @@ from __future__ import unicode_literals
|
||||
|
||||
from .theplatform import ThePlatformIE
|
||||
from ..utils import (
|
||||
update_url_query,
|
||||
parse_age_limit,
|
||||
int_or_none,
|
||||
parse_age_limit,
|
||||
try_get,
|
||||
update_url_query,
|
||||
)
|
||||
|
||||
|
||||
@ -68,7 +69,8 @@ class AMCNetworksIE(ThePlatformIE):
|
||||
info = self._parse_theplatform_metadata(theplatform_metadata)
|
||||
video_id = theplatform_metadata['pid']
|
||||
title = theplatform_metadata['title']
|
||||
rating = theplatform_metadata['ratings'][0]['rating']
|
||||
rating = try_get(
|
||||
theplatform_metadata, lambda x: x['ratings'][0]['rating'])
|
||||
auth_required = self._search_regex(
|
||||
r'window\.authRequired\s*=\s*(true|false);',
|
||||
webpage, 'auth required')
|
||||
|
@ -3,13 +3,13 @@ from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
HEADRequest,
|
||||
int_or_none,
|
||||
mimetype2ext,
|
||||
)
|
||||
|
||||
|
||||
class AparatIE(InfoExtractor):
|
||||
_VALID_URL = r'^https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
|
||||
_VALID_URL = r'https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
|
||||
|
||||
_TEST = {
|
||||
'url': 'http://www.aparat.com/v/wP8On',
|
||||
@ -29,30 +29,41 @@ class AparatIE(InfoExtractor):
|
||||
# Note: There is an easier-to-parse configuration at
|
||||
# http://www.aparat.com/video/video/config/videohash/%video_id
|
||||
# but the URL in there does not work
|
||||
embed_url = 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id
|
||||
webpage = self._download_webpage(embed_url, video_id)
|
||||
|
||||
file_list = self._parse_json(self._search_regex(
|
||||
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage, 'file list'), video_id)
|
||||
for i, item in enumerate(file_list[0]):
|
||||
video_url = item['file']
|
||||
req = HEADRequest(video_url)
|
||||
res = self._request_webpage(
|
||||
req, video_id, note='Testing video URL %d' % i, errnote=False)
|
||||
if res:
|
||||
break
|
||||
else:
|
||||
raise ExtractorError('No working video URLs found')
|
||||
webpage = self._download_webpage(
|
||||
'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
|
||||
video_id)
|
||||
|
||||
title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, 'title')
|
||||
|
||||
file_list = self._parse_json(
|
||||
self._search_regex(
|
||||
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage,
|
||||
'file list'),
|
||||
video_id)
|
||||
|
||||
formats = []
|
||||
for item in file_list[0]:
|
||||
file_url = item.get('file')
|
||||
if not file_url:
|
||||
continue
|
||||
ext = mimetype2ext(item.get('type'))
|
||||
label = item.get('label')
|
||||
formats.append({
|
||||
'url': file_url,
|
||||
'ext': ext,
|
||||
'format_id': label or ext,
|
||||
'height': int_or_none(self._search_regex(
|
||||
r'(\d+)[pP]', label or '', 'height', default=None)),
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
thumbnail = self._search_regex(
|
||||
r'image:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'url': video_url,
|
||||
'ext': 'mp4',
|
||||
'thumbnail': thumbnail,
|
||||
'age_limit': self._family_friendly_search(webpage),
|
||||
'formats': formats,
|
||||
}
|
||||
|
@ -93,6 +93,7 @@ class ARDMediathekIE(InfoExtractor):
|
||||
|
||||
duration = int_or_none(media_info.get('_duration'))
|
||||
thumbnail = media_info.get('_previewImage')
|
||||
is_live = media_info.get('_isLive') is True
|
||||
|
||||
subtitles = {}
|
||||
subtitle_url = media_info.get('_subtitleUrl')
|
||||
@ -106,6 +107,7 @@ class ARDMediathekIE(InfoExtractor):
|
||||
'id': video_id,
|
||||
'duration': duration,
|
||||
'thumbnail': thumbnail,
|
||||
'is_live': is_live,
|
||||
'formats': formats,
|
||||
'subtitles': subtitles,
|
||||
}
|
||||
@ -166,9 +168,11 @@ class ARDMediathekIE(InfoExtractor):
|
||||
# determine video id from url
|
||||
m = re.match(self._VALID_URL, url)
|
||||
|
||||
document_id = None
|
||||
|
||||
numid = re.search(r'documentId=([0-9]+)', url)
|
||||
if numid:
|
||||
video_id = numid.group(1)
|
||||
document_id = video_id = numid.group(1)
|
||||
else:
|
||||
video_id = m.group('video_id')
|
||||
|
||||
@ -228,12 +232,16 @@ class ARDMediathekIE(InfoExtractor):
|
||||
'formats': formats,
|
||||
}
|
||||
else: # request JSON file
|
||||
if not document_id:
|
||||
video_id = self._search_regex(
|
||||
r'/play/(?:config|media)/(\d+)', webpage, 'media id')
|
||||
info = self._extract_media_info(
|
||||
'http://www.ardmediathek.de/play/media/%s' % video_id, webpage, video_id)
|
||||
'http://www.ardmediathek.de/play/media/%s' % video_id,
|
||||
webpage, video_id)
|
||||
|
||||
info.update({
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'title': self._live_title(title) if info.get('is_live') else title,
|
||||
'description': description,
|
||||
'thumbnail': thumbnail,
|
||||
})
|
||||
|
@ -43,7 +43,7 @@ class AudioBoomIE(InfoExtractor):
|
||||
|
||||
def from_clip(field):
|
||||
if clip:
|
||||
clip.get(field)
|
||||
return clip.get(field)
|
||||
|
||||
audio_url = from_clip('clipURLPriorToLoading') or self._og_search_property(
|
||||
'audio', webpage, 'audio url')
|
||||
|
@ -36,7 +36,7 @@ class BBCCoUkIE(InfoExtractor):
|
||||
(?:
|
||||
programmes/(?!articles/)|
|
||||
iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
|
||||
music/clips[/#]|
|
||||
music/(?:clips|audiovideo/popular)[/#]|
|
||||
radio/player/
|
||||
)
|
||||
(?P<id>%s)(?!/(?:episodes|broadcasts|clips))
|
||||
@ -229,8 +229,10 @@ class BBCCoUkIE(InfoExtractor):
|
||||
}, {
|
||||
'url': 'http://www.bbc.co.uk/radio/player/p03cchwf',
|
||||
'only_matching': True,
|
||||
}
|
||||
]
|
||||
}, {
|
||||
'url': 'https://www.bbc.co.uk/music/audiovideo/popular#p055bc55',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
_USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
|
||||
|
||||
@ -523,6 +525,12 @@ class BBCCoUkIE(InfoExtractor):
|
||||
|
||||
webpage = self._download_webpage(url, group_id, 'Downloading video page')
|
||||
|
||||
error = self._search_regex(
|
||||
r'<div\b[^>]+\bclass=["\']smp__message delta["\'][^>]*>([^<]+)<',
|
||||
webpage, 'error', default=None)
|
||||
if error:
|
||||
raise ExtractorError(error, expected=True)
|
||||
|
||||
programme_id = None
|
||||
duration = None
|
||||
|
||||
|
@ -84,9 +84,10 @@ class BuzzFeedIE(InfoExtractor):
|
||||
continue
|
||||
entries.append(self.url_result(video['url']))
|
||||
|
||||
facebook_url = FacebookIE._extract_url(webpage)
|
||||
if facebook_url:
|
||||
entries.append(self.url_result(facebook_url))
|
||||
facebook_urls = FacebookIE._extract_urls(webpage)
|
||||
entries.extend([
|
||||
self.url_result(facebook_url)
|
||||
for facebook_url in facebook_urls])
|
||||
|
||||
return {
|
||||
'_type': 'playlist',
|
||||
|
@ -15,19 +15,23 @@ class CBSNewsIE(CBSIE):
|
||||
|
||||
_TESTS = [
|
||||
{
|
||||
'url': 'http://www.cbsnews.com/news/tesla-and-spacex-elon-musks-industrial-empire/',
|
||||
# 60 minutes
|
||||
'url': 'http://www.cbsnews.com/news/artificial-intelligence-positioned-to-be-a-game-changer/',
|
||||
'info_dict': {
|
||||
'id': 'tesla-and-spacex-elon-musks-industrial-empire',
|
||||
'ext': 'flv',
|
||||
'title': 'Tesla and SpaceX: Elon Musk\'s industrial empire',
|
||||
'thumbnail': 'http://beta.img.cbsnews.com/i/2014/03/30/60147937-2f53-4565-ad64-1bdd6eb64679/60-0330-pelley-640x360.jpg',
|
||||
'duration': 791,
|
||||
'id': '_B6Ga3VJrI4iQNKsir_cdFo9Re_YJHE_',
|
||||
'ext': 'mp4',
|
||||
'title': 'Artificial Intelligence',
|
||||
'description': 'md5:8818145f9974431e0fb58a1b8d69613c',
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
'duration': 1606,
|
||||
'uploader': 'CBSI-NEW',
|
||||
'timestamp': 1498431900,
|
||||
'upload_date': '20170625',
|
||||
},
|
||||
'params': {
|
||||
# rtmp download
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
},
|
||||
'skip': 'Subscribers only',
|
||||
},
|
||||
{
|
||||
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
|
||||
@ -52,6 +56,22 @@ class CBSNewsIE(CBSIE):
|
||||
'skip_download': True,
|
||||
},
|
||||
},
|
||||
{
|
||||
# 48 hours
|
||||
'url': 'http://www.cbsnews.com/news/maria-ridulph-murder-will-the-nations-oldest-cold-case-to-go-to-trial-ever-get-solved/',
|
||||
'info_dict': {
|
||||
'id': 'QpM5BJjBVEAUFi7ydR9LusS69DPLqPJ1',
|
||||
'ext': 'mp4',
|
||||
'title': 'Cold as Ice',
|
||||
'description': 'Can a childhood memory of a friend\'s murder solve a 1957 cold case? "48 Hours" correspondent Erin Moriarty has the latest.',
|
||||
'upload_date': '20170604',
|
||||
'timestamp': 1496538000,
|
||||
'uploader': 'CBSI-NEW',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
},
|
||||
]
|
||||
|
||||
def _real_extract(self, url):
|
||||
@ -60,7 +80,7 @@ class CBSNewsIE(CBSIE):
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
video_info = self._parse_json(self._html_search_regex(
|
||||
r'(?:<ul class="media-list items" id="media-related-items"><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
|
||||
r'(?:<ul class="media-list items" id="media-related-items"[^>]*><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
|
||||
webpage, 'video JSON info', default='{}'), video_id, fatal=False)
|
||||
|
||||
if video_info:
|
||||
|
@ -9,12 +9,20 @@ from ..utils import (
|
||||
|
||||
|
||||
class CinchcastIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://player\.cinchcast\.com/.*?assetId=(?P<id>[0-9]+)'
|
||||
_TEST = {
|
||||
_VALID_URL = r'https?://player\.cinchcast\.com/.*?(?:assetId|show_id)=(?P<id>[0-9]+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://player.cinchcast.com/?show_id=5258197&platformId=1&assetType=single',
|
||||
'info_dict': {
|
||||
'id': '5258197',
|
||||
'ext': 'mp3',
|
||||
'title': 'Train Your Brain to Up Your Game with Coach Mandy',
|
||||
'upload_date': '20130816',
|
||||
},
|
||||
}, {
|
||||
# Actual test is run in generic, look for undergroundwellness
|
||||
'url': 'http://player.cinchcast.com/?platformId=1&assetType=single&assetId=7141703',
|
||||
'only_matching': True,
|
||||
}
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
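The widened CinchcastIE pattern now accepts `show_id=` as well as `assetId=`. A quick check with the two test URLs from the change:

```python
import re

_VALID_URL = r'https?://player\.cinchcast\.com/.*?(?:assetId|show_id)=(?P<id>[0-9]+)'

for u in ('http://player.cinchcast.com/?show_id=5258197&platformId=1&assetType=single',
          'http://player.cinchcast.com/?platformId=1&assetType=single&assetId=7141703'):
    print(re.match(_VALID_URL, u).group('id'))
# 5258197
# 7141703
```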
72
youtube_dl/extractor/cjsw.py
Normal file
@ -0,0 +1,72 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
determine_ext,
|
||||
unescapeHTML,
|
||||
)
|
||||
|
||||
|
||||
class CJSWIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?cjsw\.com/program/(?P<program>[^/]+)/episode/(?P<id>\d+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://cjsw.com/program/freshly-squeezed/episode/20170620',
|
||||
'md5': 'cee14d40f1e9433632c56e3d14977120',
|
||||
'info_dict': {
|
||||
'id': '91d9f016-a2e7-46c5-8dcb-7cbcd7437c41',
|
||||
'ext': 'mp3',
|
||||
'title': 'Freshly Squeezed – Episode June 20, 2017',
|
||||
'description': 'md5:c967d63366c3898a80d0c7b0ff337202',
|
||||
'series': 'Freshly Squeezed',
|
||||
'episode_id': '20170620',
|
||||
},
|
||||
}, {
|
||||
# no description
|
||||
'url': 'http://cjsw.com/program/road-pops/episode/20170707/',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
program, episode_id = mobj.group('program', 'id')
|
||||
audio_id = '%s/%s' % (program, episode_id)
|
||||
|
||||
webpage = self._download_webpage(url, episode_id)
|
||||
|
||||
title = unescapeHTML(self._search_regex(
|
||||
(r'<h1[^>]+class=["\']episode-header__title["\'][^>]*>(?P<title>[^<]+)',
|
||||
r'data-audio-title=(["\'])(?P<title>(?:(?!\1).)+)\1'),
|
||||
webpage, 'title', group='title'))
|
||||
|
||||
audio_url = self._search_regex(
|
||||
r'<button[^>]+data-audio-src=(["\'])(?P<url>(?:(?!\1).)+)\1',
|
||||
webpage, 'audio url', group='url')
|
||||
|
||||
audio_id = self._search_regex(
|
||||
r'/([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})\.mp3',
|
||||
audio_url, 'audio id', default=audio_id)
|
||||
|
||||
formats = [{
|
||||
'url': audio_url,
|
||||
'ext': determine_ext(audio_url, 'mp3'),
|
||||
'vcodec': 'none',
|
||||
}]
|
||||
|
||||
description = self._html_search_regex(
|
||||
r'<p>(?P<description>.+?)</p>', webpage, 'description',
|
||||
default=None)
|
||||
series = self._search_regex(
|
||||
r'data-showname=(["\'])(?P<name>(?:(?!\1).)+)\1', webpage,
|
||||
'series', default=program, group='name')
|
||||
|
||||
return {
|
||||
'id': audio_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'formats': formats,
|
||||
'series': series,
|
||||
'episode_id': episode_id,
|
||||
}
|
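In the new CJSW extractor, the UUID embedded in the audio URL replaces the `program/episode` fallback id when present. A sketch with a hypothetical stream URL (host and path are assumptions; the regex is the one above):

```python
import re

audio_url = 'https://stream.example.com/episodes/91d9f016-a2e7-46c5-8dcb-7cbcd7437c41.mp3'

audio_id = re.search(
    r'/([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})\.mp3',
    audio_url).group(1)
print(audio_id)  # 91d9f016-a2e7-46c5-8dcb-7cbcd7437c41
```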
@ -1,67 +0,0 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
unified_strdate,
|
||||
)
|
||||
|
||||
|
||||
class ClipfishIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?clipfish\.de/(?:[^/]+/)+video/(?P<id>[0-9]+)'
|
||||
_TEST = {
|
||||
'url': 'http://www.clipfish.de/special/ugly-americans/video/4343170/s01-e01-ugly-americans-date-in-der-hoelle/',
|
||||
'md5': 'b9a5dc46294154c1193e2d10e0c95693',
|
||||
'info_dict': {
|
||||
'id': '4343170',
|
||||
'ext': 'mp4',
|
||||
'title': 'S01 E01 - Ugly Americans - Date in der Hölle',
|
||||
'description': 'Mark Lilly arbeitet im Sozialdienst der Stadt New York und soll Immigranten bei ihrer Einbürgerung in die USA zur Seite stehen.',
|
||||
'upload_date': '20161005',
|
||||
'duration': 1291,
|
||||
'view_count': int,
|
||||
}
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
video_info = self._download_json(
|
||||
'http://www.clipfish.de/devapi/id/%s?format=json&apikey=hbbtv' % video_id,
|
||||
video_id)['items'][0]
|
||||
|
||||
formats = []
|
||||
|
||||
m3u8_url = video_info.get('media_videourl_hls')
|
||||
if m3u8_url:
|
||||
formats.append({
|
||||
'url': m3u8_url.replace('de.hls.fra.clipfish.de', 'hls.fra.clipfish.de'),
|
||||
'ext': 'mp4',
|
||||
'format_id': 'hls',
|
||||
})
|
||||
|
||||
mp4_url = video_info.get('media_videourl')
|
||||
if mp4_url:
|
||||
formats.append({
|
||||
'url': mp4_url,
|
||||
'format_id': 'mp4',
|
||||
'width': int_or_none(video_info.get('width')),
|
||||
'height': int_or_none(video_info.get('height')),
|
||||
'tbr': int_or_none(video_info.get('bitrate')),
|
||||
})
|
||||
|
||||
descr = video_info.get('descr')
|
||||
if descr:
|
||||
descr = descr.strip()
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': video_info['title'],
|
||||
'description': descr,
|
||||
'formats': formats,
|
||||
'thumbnail': video_info.get('media_content_thumbnail_large') or video_info.get('media_thumbnail'),
|
||||
'duration': int_or_none(video_info.get('media_length')),
|
||||
'upload_date': unified_strdate(video_info.get('pubDate')),
|
||||
'view_count': int_or_none(video_info.get('media_views'))
|
||||
}
|
@ -30,7 +30,11 @@ class CloudyIE(InfoExtractor):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(
|
||||
'http://www.cloudy.ec/embed.php?id=%s' % video_id, video_id)
|
||||
'https://www.cloudy.ec/embed.php', video_id, query={
|
||||
'id': video_id,
|
||||
'playerPage': 1,
|
||||
'autoplay': 1,
|
||||
})
|
||||
|
||||
info = self._parse_html5_media_entries(url, webpage, video_id)[0]
@ -730,12 +730,12 @@ class InfoExtractor(object):
|
||||
video_info['title'] = video_title
|
||||
return video_info
|
||||
|
||||
def playlist_from_matches(self, matches, video_id, video_title, getter=None, ie=None):
|
||||
urlrs = orderedSet(
|
||||
def playlist_from_matches(self, matches, playlist_id=None, playlist_title=None, getter=None, ie=None):
|
||||
urls = orderedSet(
|
||||
self.url_result(self._proto_relative_url(getter(m) if getter else m), ie)
|
||||
for m in matches)
|
||||
return self.playlist_result(
|
||||
urlrs, playlist_id=video_id, playlist_title=video_title)
|
||||
urls, playlist_id=playlist_id, playlist_title=playlist_title)
|
||||
|
||||
@staticmethod
|
||||
def playlist_result(entries, playlist_id=None, playlist_title=None, playlist_description=None):
|
||||
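`playlist_from_matches` keeps its behaviour, but its parameters are now named for what they are (`playlist_id`/`playlist_title`). A caller sketch under assumed context (inside some extractor's `_real_extract`, with `matches` holding regex matches whose first group is an embed URL):

```python
# Assumed context: `self` is an InfoExtractor; `matches`, `playlist_id` and
# `title` are already defined; `getter` pulls the URL out of each match.
return self.playlist_from_matches(
    matches, playlist_id=playlist_id, playlist_title=title,
    getter=lambda m: m.group(1), ie='Youtube')
```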
@ -940,7 +940,8 @@ class InfoExtractor(object):
|
||||
|
||||
def _family_friendly_search(self, html):
|
||||
# See http://schema.org/VideoObject
|
||||
family_friendly = self._html_search_meta('isFamilyFriendly', html)
|
||||
family_friendly = self._html_search_meta(
|
||||
'isFamilyFriendly', html, default=None)
|
||||
|
||||
if not family_friendly:
|
||||
return None
|
||||
@ -1002,17 +1003,17 @@ class InfoExtractor(object):
|
||||
item_type = e.get('@type')
|
||||
if expected_type is not None and expected_type != item_type:
|
||||
return info
|
||||
if item_type == 'TVEpisode':
|
||||
if item_type in ('TVEpisode', 'Episode'):
|
||||
info.update({
|
||||
'episode': unescapeHTML(e.get('name')),
|
||||
'episode_number': int_or_none(e.get('episodeNumber')),
|
||||
'description': unescapeHTML(e.get('description')),
|
||||
})
|
||||
part_of_season = e.get('partOfSeason')
|
||||
if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
|
||||
if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'):
|
||||
info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
|
||||
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
|
||||
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
|
||||
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
|
||||
info['series'] = unescapeHTML(part_of_series.get('name'))
|
||||
elif item_type == 'Article':
|
||||
info.update({
|
||||
@ -1022,10 +1023,10 @@ class InfoExtractor(object):
|
||||
})
|
||||
elif item_type == 'VideoObject':
|
||||
extract_video_object(e)
|
||||
elif item_type == 'WebPage':
|
||||
video = e.get('video')
|
||||
if isinstance(video, dict) and video.get('@type') == 'VideoObject':
|
||||
extract_video_object(video)
|
||||
continue
|
||||
video = e.get('video')
|
||||
if isinstance(video, dict) and video.get('@type') == 'VideoObject':
|
||||
extract_video_object(video)
|
||||
break
|
||||
return dict((k, v) for k, v in info.items() if v is not None)
|
||||
|
||||
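`_json_ld` now also recognises the un-prefixed schema.org types (`Episode`, `Season`/`CreativeWorkSeason`, `Series`/`CreativeWorkSeries`) and looks for a nested `video` object on any item, not just `WebPage`. A standalone sketch of the episode branch with a hypothetical JSON-LD blob of the newly accepted shape:

```python
import json

json_ld = json.loads('''{
  "@context": "http://schema.org",
  "@type": "Episode",
  "name": "Pilot",
  "episodeNumber": 1,
  "partOfSeason": {"@type": "CreativeWorkSeason", "seasonNumber": 2},
  "partOfSeries": {"@type": "Series", "name": "Example Show"}
}''')

info = {}
if json_ld.get('@type') in ('TVEpisode', 'Episode'):
    info['episode'] = json_ld.get('name')
    info['episode_number'] = json_ld.get('episodeNumber')
    season = json_ld.get('partOfSeason')
    if isinstance(season, dict) and season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'):
        info['season_number'] = season.get('seasonNumber')
    series = json_ld.get('partOfSeries') or json_ld.get('partOfTVSeries')
    if isinstance(series, dict) and series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
        info['series'] = series.get('name')
print(info)
# {'episode': 'Pilot', 'episode_number': 1, 'season_number': 2, 'series': 'Example Show'}
```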
@ -1892,9 +1893,13 @@ class InfoExtractor(object):
|
||||
'Bandwidth': bandwidth,
|
||||
}
|
||||
|
||||
def location_key(location):
|
||||
return 'url' if re.match(r'^https?://', location) else 'path'
|
||||
|
||||
if 'segment_urls' not in representation_ms_info and 'media' in representation_ms_info:
|
||||
|
||||
media_template = prepare_template('media', ('Number', 'Bandwidth', 'Time'))
|
||||
media_location_key = location_key(media_template)
|
||||
|
||||
# As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
|
||||
# can't be used at the same time
|
||||
@ -1904,7 +1909,7 @@ class InfoExtractor(object):
|
||||
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
|
||||
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
|
||||
representation_ms_info['fragments'] = [{
|
||||
'url': media_template % {
|
||||
media_location_key: media_template % {
|
||||
'Number': segment_number,
|
||||
'Bandwidth': bandwidth,
|
||||
},
|
||||
@ -1928,7 +1933,7 @@ class InfoExtractor(object):
|
||||
'Number': segment_number,
|
||||
}
|
||||
representation_ms_info['fragments'].append({
|
||||
'url': segment_url,
|
||||
media_location_key: segment_url,
|
||||
'duration': float_or_none(segment_d, representation_ms_info['timescale']),
|
||||
})
|
||||
|
||||
@ -1952,8 +1957,9 @@ class InfoExtractor(object):
|
||||
for s in representation_ms_info['s']:
|
||||
duration = float_or_none(s['d'], timescale)
|
||||
for r in range(s.get('r', 0) + 1):
|
||||
segment_uri = representation_ms_info['segment_urls'][segment_index]
|
||||
fragments.append({
|
||||
'url': representation_ms_info['segment_urls'][segment_index],
|
||||
location_key(segment_uri): segment_uri,
|
||||
'duration': duration,
|
||||
})
|
||||
segment_index += 1
|
||||
@ -1962,6 +1968,7 @@ class InfoExtractor(object):
|
||||
# No fragments key is present in this case.
|
||||
if 'fragments' in representation_ms_info:
|
||||
f.update({
|
||||
'fragment_base_url': base_url,
|
||||
'fragments': [],
|
||||
'protocol': 'http_dash_segments',
|
||||
})
|
||||
@ -1969,10 +1976,8 @@ class InfoExtractor(object):
|
||||
initialization_url = representation_ms_info['initialization_url']
|
||||
if not f.get('url'):
|
||||
f['url'] = initialization_url
|
||||
f['fragments'].append({'url': initialization_url})
|
||||
f['fragments'].append({location_key(initialization_url): initialization_url})
|
||||
f['fragments'].extend(representation_ms_info['fragments'])
|
||||
for fragment in f['fragments']:
|
||||
fragment['url'] = urljoin(base_url, fragment['url'])
|
||||
try:
|
||||
existing_format = next(
|
||||
fo for fo in formats
|
||||
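The DASH changes above store each fragment location under either `url` or `path`, so relative segment paths can later be joined against `fragment_base_url` instead of being treated as absolute URLs. The predicate on its own:

```python
import re

def location_key(location):
    # Absolute locations keep the 'url' key; relative ones become 'path'
    # and are resolved against the base URL by the downloader.
    return 'url' if re.match(r'^https?://', location) else 'path'

print(location_key('https://cdn.example.com/seg-$Number$.m4s'))  # url
print(location_key('seg-$Number$.m4s'))                          # path
```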
@ -2110,9 +2115,9 @@ class InfoExtractor(object):
|
||||
return f
|
||||
return {}
|
||||
|
||||
def _media_formats(src, cur_media_type):
|
||||
def _media_formats(src, cur_media_type, type_info={}):
|
||||
full_url = absolute_url(src)
|
||||
ext = determine_ext(full_url)
|
||||
ext = type_info.get('ext') or determine_ext(full_url)
|
||||
if ext == 'm3u8':
|
||||
is_plain_url = False
|
||||
formats = self._extract_m3u8_formats(
|
||||
@ -2132,15 +2137,18 @@ class InfoExtractor(object):
|
||||
return is_plain_url, formats
|
||||
|
||||
entries = []
|
||||
# amp-video and amp-audio are very similar to their HTML5 counterparts
|
||||
# so we will include them right here (see
|
||||
# https://www.ampproject.org/docs/reference/components/amp-video)
|
||||
media_tags = [(media_tag, media_type, '')
|
||||
for media_tag, media_type
|
||||
in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)]
|
||||
in re.findall(r'(?s)(<(?:amp-)?(video|audio)[^>]*/>)', webpage)]
|
||||
media_tags.extend(re.findall(
|
||||
# We only allow video|audio followed by a whitespace or '>'.
|
||||
# Allowing more characters may end up in significant slow down (see
|
||||
# https://github.com/rg3/youtube-dl/issues/11979, example URL:
|
||||
# http://www.porntrex.com/maps/videositemap.xml).
|
||||
r'(?s)(<(?P<tag>video|audio)(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage))
|
||||
r'(?s)(<(?P<tag>(?:amp-)?(?:video|audio))(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage))
|
||||
for media_tag, media_type, media_content in media_tags:
|
||||
media_info = {
|
||||
'formats': [],
|
||||
@ -2158,9 +2166,9 @@ class InfoExtractor(object):
|
||||
src = source_attributes.get('src')
|
||||
if not src:
|
||||
continue
|
||||
is_plain_url, formats = _media_formats(src, media_type)
|
||||
f = parse_content_type(source_attributes.get('type'))
|
||||
is_plain_url, formats = _media_formats(src, media_type, f)
|
||||
if is_plain_url:
|
||||
f = parse_content_type(source_attributes.get('type'))
|
||||
f.update(formats[0])
|
||||
media_info['formats'].append(f)
|
||||
else:
|
||||
|
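Two things change in the HTML5 media detection: `amp-video`/`amp-audio` tags are now matched alongside plain `video`/`audio`, and the MIME type from a `<source type=...>` attribute now takes precedence over guessing the container from the URL. A rough standalone approximation of that precedence (this is not the extractor's actual `parse_content_type`/`determine_ext` helper chain):

```python
import re

def pick_ext(src, mime_type=None):
    # Explicit MIME type wins; otherwise fall back to the URL suffix.
    mime_to_ext = {
        'application/x-mpegurl': 'm3u8',
        'application/dash+xml': 'mpd',
        'video/mp4': 'mp4',
    }
    if mime_type:
        ext = mime_to_ext.get(mime_type.split(';')[0].strip().lower())
        if ext:
            return ext
    m = re.search(r'\.([A-Za-z0-9]+)(?:[?#]|$)', src)
    return m.group(1).lower() if m else None

print(pick_ext('/live/stream', 'application/x-mpegURL'))  # m3u8
print(pick_ext('/media/clip.mp4'))                        # mp4
```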
@ -510,7 +510,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
|
||||
|
||||
# webpage provides more accurate data than series_title from XML
|
||||
series = self._html_search_regex(
|
||||
r'id=["\']showmedia_about_episode_num[^>]+>\s*<a[^>]+>([^<]+)',
|
||||
r'(?s)<h\d[^>]+\bid=["\']showmedia_about_episode_num[^>]+>(.+?)</h\d',
|
||||
webpage, 'series', fatal=False)
|
||||
season = xpath_text(metadata, 'series_title')
|
||||
|
||||
@ -518,7 +518,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
|
||||
episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
|
||||
|
||||
season_number = int_or_none(self._search_regex(
|
||||
r'(?s)<h4[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h4>\s*<h4>\s*Season (\d+)',
|
||||
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
|
||||
webpage, 'season number', default=None))
|
||||
|
||||
return {
|
||||
|
@ -1,6 +1,8 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
@ -12,8 +14,8 @@ from ..utils import (
|
||||
|
||||
|
||||
class DailyMailIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/video/[^/]+/video-(?P<id>[0-9]+)'
|
||||
_TEST = {
|
||||
_VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/(?:video/[^/]+/video-|embed/video/)(?P<id>[0-9]+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.dailymail.co.uk/video/tvshowbiz/video-1295863/The-Mountain-appears-sparkling-water-ad-Heavy-Bubbles.html',
|
||||
'md5': 'f6129624562251f628296c3a9ffde124',
|
||||
'info_dict': {
|
||||
@ -22,7 +24,16 @@ class DailyMailIE(InfoExtractor):
|
||||
'title': 'The Mountain appears in sparkling water ad for \'Heavy Bubbles\'',
|
||||
'description': 'md5:a93d74b6da172dd5dc4d973e0b766a84',
|
||||
}
|
||||
}
|
||||
}, {
|
||||
'url': 'http://www.dailymail.co.uk/embed/video/1295863.html',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@staticmethod
|
||||
def _extract_urls(webpage):
|
||||
return re.findall(
|
||||
r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//(?:www\.)?dailymail\.co\.uk/embed/video/\d+\.html)',
|
||||
webpage)
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
@ -147,7 +147,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
|
||||
view_count_str = self._search_regex(
|
||||
(r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
|
||||
r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
|
||||
webpage, 'view count', fatal=False)
|
||||
webpage, 'view count', default=None)
|
||||
if view_count_str:
|
||||
view_count_str = re.sub(r'\s', '', view_count_str)
|
||||
view_count = str_to_int(view_count_str)
|
||||
@ -159,7 +159,9 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
|
||||
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/rg3/youtube-dl/issues/7826
|
||||
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
|
||||
r'buildPlayer\(({.+?})\);',
|
||||
r'var\s+config\s*=\s*({.+?});'],
|
||||
r'var\s+config\s*=\s*({.+?});',
|
||||
# New layout regex (see https://github.com/rg3/youtube-dl/issues/13580)
|
||||
r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
|
||||
webpage, 'player v5', default=None)
|
||||
if player_v5:
|
||||
player = self._parse_json(player_v5, video_id)
|
||||
|
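The extra `__PLAYER_CONFIG__` pattern covers Dailymotion's new page layout (issue 13580). A quick check against a hypothetical snippet of that layout:

```python
import re

webpage = 'window.__PLAYER_CONFIG__ = {"context":{},"metadata":{"id":"x5abc12"}};'

player_v5 = re.search(r'__PLAYER_CONFIG__\s*=\s*({.+?});', webpage).group(1)
print(player_v5)  # {"context":{},"metadata":{"id":"x5abc12"}}
```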
@ -13,7 +13,7 @@ from ..utils import (
|
||||
|
||||
|
||||
class DigitallySpeakingIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
|
||||
_VALID_URL = r'https?://(?:s?evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
|
||||
|
||||
_TESTS = [{
|
||||
# From http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface
|
||||
@ -28,6 +28,10 @@ class DigitallySpeakingIE(InfoExtractor):
|
||||
# From http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC
|
||||
'url': 'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# From http://www.gdcvault.com/play/1013700/Advanced-Material
|
||||
'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _parse_mp4(self, metadata):
|
||||
|
@ -7,16 +7,18 @@ import time
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import (
|
||||
compat_urlparse,
|
||||
compat_HTTPError,
|
||||
compat_str,
|
||||
compat_urlparse,
|
||||
)
|
||||
from ..utils import (
|
||||
USER_AGENTS,
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
unified_strdate,
|
||||
remove_end,
|
||||
try_get,
|
||||
unified_strdate,
|
||||
update_url_query,
|
||||
USER_AGENTS,
|
||||
)
|
||||
|
||||
|
||||
@ -183,28 +185,44 @@ class DPlayItIE(InfoExtractor):
|
||||
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
info_url = self._search_regex(
|
||||
r'url\s*:\s*["\']((?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)',
|
||||
webpage, 'video id')
|
||||
|
||||
title = remove_end(self._og_search_title(webpage), ' | Dplay')
|
||||
|
||||
try:
|
||||
info = self._download_json(
|
||||
info_url, display_id, headers={
|
||||
'Authorization': 'Bearer %s' % self._get_cookies(url).get(
|
||||
'dplayit_token').value,
|
||||
'Referer': url,
|
||||
})
|
||||
except ExtractorError as e:
|
||||
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403):
|
||||
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
|
||||
error = info['errors'][0]
|
||||
if error.get('code') == 'access.denied.geoblocked':
|
||||
self.raise_geo_restricted(
|
||||
msg=error.get('detail'), countries=self._GEO_COUNTRIES)
|
||||
raise ExtractorError(info['errors'][0]['detail'], expected=True)
|
||||
raise
|
||||
video_id = None
|
||||
|
||||
info = self._search_regex(
|
||||
r'playback_json\s*:\s*JSON\.parse\s*\(\s*("(?:\\.|[^"\\])+?")',
|
||||
webpage, 'playback JSON', default=None)
|
||||
if info:
|
||||
for _ in range(2):
|
||||
info = self._parse_json(info, display_id, fatal=False)
|
||||
if not info:
|
||||
break
|
||||
else:
|
||||
video_id = try_get(info, lambda x: x['data']['id'])
|
||||
|
||||
if not info:
|
||||
info_url = self._search_regex(
|
||||
r'url\s*[:=]\s*["\']((?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)',
|
||||
webpage, 'info url')
|
||||
|
||||
video_id = info_url.rpartition('/')[-1]
|
||||
|
||||
try:
|
||||
info = self._download_json(
|
||||
info_url, display_id, headers={
|
||||
'Authorization': 'Bearer %s' % self._get_cookies(url).get(
|
||||
'dplayit_token').value,
|
||||
'Referer': url,
|
||||
})
|
||||
except ExtractorError as e:
|
||||
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403):
|
||||
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
|
||||
error = info['errors'][0]
|
||||
if error.get('code') == 'access.denied.geoblocked':
|
||||
self.raise_geo_restricted(
|
||||
msg=error.get('detail'), countries=self._GEO_COUNTRIES)
|
||||
raise ExtractorError(info['errors'][0]['detail'], expected=True)
|
||||
raise
|
||||
|
||||
hls_url = info['data']['attributes']['streaming']['hls']['url']
|
||||
|
||||
@ -230,7 +248,7 @@ class DPlayItIE(InfoExtractor):
|
||||
season_number = episode_number = upload_date = None
|
||||
|
||||
return {
|
||||
'id': info_url.rpartition('/')[-1],
|
||||
'id': compat_str(video_id or display_id),
|
||||
'display_id': display_id,
|
||||
'title': title,
|
||||
'description': self._og_search_description(webpage),
|
||||
|
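The dplay.it page now ships the playback info inline as `playback_json`, a JSON string wrapped in another JSON string literal, which is why the loop above parses it twice before falling back to the `videoPlaybackInfo` endpoint. The double decode in isolation (the payload below is a made-up stand-in):

```python
import json

raw = '"{\\"data\\": {\\"id\\": \\"123456\\"}}"'

info = raw
for _ in range(2):
    info = json.loads(info)
print(info['data']['id'])  # 123456
```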
@ -12,6 +12,7 @@ from ..utils import (
|
||||
ExtractorError,
|
||||
clean_html,
|
||||
int_or_none,
|
||||
remove_end,
|
||||
sanitized_Request,
|
||||
urlencode_postdata
|
||||
)
|
||||
@ -72,15 +73,15 @@ class DramaFeverIE(DramaFeverBaseIE):
|
||||
'url': 'http://www.dramafever.com/drama/4512/1/Cooking_with_Shin/',
|
||||
'info_dict': {
|
||||
'id': '4512.1',
|
||||
'ext': 'mp4',
|
||||
'title': 'Cooking with Shin 4512.1',
|
||||
'ext': 'flv',
|
||||
'title': 'Cooking with Shin',
|
||||
'description': 'md5:a8eec7942e1664a6896fcd5e1287bfd0',
|
||||
'episode': 'Episode 1',
|
||||
'episode_number': 1,
|
||||
'thumbnail': r're:^https?://.*\.jpg',
|
||||
'timestamp': 1404336058,
|
||||
'upload_date': '20140702',
|
||||
'duration': 343,
|
||||
'duration': 344,
|
||||
},
|
||||
'params': {
|
||||
# m3u8 download
|
||||
@ -90,15 +91,15 @@ class DramaFeverIE(DramaFeverBaseIE):
|
||||
'url': 'http://www.dramafever.com/drama/4826/4/Mnet_Asian_Music_Awards_2015/?ap=1',
|
||||
'info_dict': {
|
||||
'id': '4826.4',
|
||||
'ext': 'mp4',
|
||||
'title': 'Mnet Asian Music Awards 2015 4826.4',
|
||||
'ext': 'flv',
|
||||
'title': 'Mnet Asian Music Awards 2015',
|
||||
'description': 'md5:3ff2ee8fedaef86e076791c909cf2e91',
|
||||
'episode': 'Mnet Asian Music Awards 2015 - Part 3',
|
||||
'episode_number': 4,
|
||||
'thumbnail': r're:^https?://.*\.jpg',
|
||||
'timestamp': 1450213200,
|
||||
'upload_date': '20151215',
|
||||
'duration': 5602,
|
||||
'duration': 5359,
|
||||
},
|
||||
'params': {
|
||||
# m3u8 download
|
||||
@ -122,6 +123,10 @@ class DramaFeverIE(DramaFeverBaseIE):
|
||||
countries=self._GEO_COUNTRIES)
|
||||
raise
|
||||
|
||||
# title is postfixed with video id for some reason, removing
|
||||
if info.get('title'):
|
||||
info['title'] = remove_end(info['title'], video_id).strip()
|
||||
|
||||
series_id, episode_number = video_id.split('.')
|
||||
episode_info = self._download_json(
|
||||
# We only need a single episode info, so restricting page size to one episode
|
||||
|
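DramaFever's API returns titles postfixed with the video id, which the change above strips with `remove_end`. A quick check with the utility and the values from the first test case:

```python
from youtube_dl.utils import remove_end

video_id = '4512.1'
title = 'Cooking with Shin 4512.1'
print(remove_end(title, video_id).strip())  # Cooking with Shin
```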
@ -118,7 +118,7 @@ class DRTVIE(InfoExtractor):
|
||||
if target == 'HDS':
|
||||
f4m_formats = self._extract_f4m_formats(
|
||||
uri + '?hdcore=3.3.0&plugin=aasp-3.3.0.99.43',
|
||||
video_id, preference, f4m_id=format_id)
|
||||
video_id, preference, f4m_id=format_id, fatal=False)
|
||||
if kind == 'AudioResource':
|
||||
for f in f4m_formats:
|
||||
f['vcodec'] = 'none'
|
||||
@ -126,7 +126,8 @@ class DRTVIE(InfoExtractor):
|
||||
elif target == 'HLS':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
uri, video_id, 'mp4', entry_protocol='m3u8_native',
|
||||
preference=preference, m3u8_id=format_id))
|
||||
preference=preference, m3u8_id=format_id,
|
||||
fatal=False))
|
||||
else:
|
||||
bitrate = link.get('Bitrate')
|
||||
if bitrate:
|
||||
|
@ -11,6 +11,7 @@ from ..compat import (
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
unsmuggle_url,
|
||||
)
|
||||
|
||||
|
||||
@ -50,6 +51,10 @@ class EaglePlatformIE(InfoExtractor):
|
||||
'view_count': int,
|
||||
},
|
||||
'skip': 'Georestricted',
|
||||
}, {
|
||||
# referrer protected video (https://tvrain.ru/lite/teleshow/kak_vse_nachinalos/namin-418921/)
|
||||
'url': 'eagleplatform:tvrainru.media.eagleplatform.com:582306',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@staticmethod
|
||||
@ -60,16 +65,40 @@ class EaglePlatformIE(InfoExtractor):
|
||||
webpage)
|
||||
if mobj is not None:
|
||||
return mobj.group('url')
|
||||
# Basic usage embedding (see http://dultonmedia.github.io/eplayer/)
|
||||
PLAYER_JS_RE = r'''
|
||||
<script[^>]+
|
||||
src=(?P<qjs>["\'])(?:https?:)?//(?P<host>(?:(?!(?P=qjs)).)+\.media\.eagleplatform\.com)/player/player\.js(?P=qjs)
|
||||
.+?
|
||||
'''
|
||||
# "Basic usage" embedding (see http://dultonmedia.github.io/eplayer/)
|
||||
mobj = re.search(
|
||||
r'''(?xs)
|
||||
<script[^>]+
|
||||
src=(?P<q1>["\'])(?:https?:)?//(?P<host>.+?\.media\.eagleplatform\.com)/player/player\.js(?P=q1)
|
||||
.+?
|
||||
%s
|
||||
<div[^>]+
|
||||
class=(?P<q2>["\'])eagleplayer(?P=q2)[^>]+
|
||||
class=(?P<qclass>["\'])eagleplayer(?P=qclass)[^>]+
|
||||
data-id=["\'](?P<id>\d+)
|
||||
''', webpage)
|
||||
''' % PLAYER_JS_RE, webpage)
|
||||
if mobj is not None:
|
||||
return 'eagleplatform:%(host)s:%(id)s' % mobj.groupdict()
|
||||
# Generalization of "Javascript code usage", "Combined usage" and
|
||||
# "Usage without attaching to DOM" embeddings (see
|
||||
# http://dultonmedia.github.io/eplayer/)
|
||||
mobj = re.search(
|
||||
r'''(?xs)
|
||||
%s
|
||||
<script>
|
||||
.+?
|
||||
new\s+EaglePlayer\(
|
||||
(?:[^,]+\s*,\s*)?
|
||||
{
|
||||
.+?
|
||||
\bid\s*:\s*["\']?(?P<id>\d+)
|
||||
.+?
|
||||
}
|
||||
\s*\)
|
||||
.+?
|
||||
</script>
|
||||
''' % PLAYER_JS_RE, webpage)
|
||||
if mobj is not None:
|
||||
return 'eagleplatform:%(host)s:%(id)s' % mobj.groupdict()
|
||||
|
||||
@ -79,9 +108,10 @@ class EaglePlatformIE(InfoExtractor):
|
||||
if status != 200:
|
||||
raise ExtractorError(' '.join(response['errors']), expected=True)
|
||||
|
||||
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', *args, **kwargs):
|
||||
def _download_json(self, url_or_request, video_id, *args, **kwargs):
|
||||
try:
|
||||
response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
|
||||
response = super(EaglePlatformIE, self)._download_json(
|
||||
url_or_request, video_id, *args, **kwargs)
|
||||
except ExtractorError as ee:
|
||||
if isinstance(ee.cause, compat_HTTPError):
|
||||
response = self._parse_json(ee.cause.read().decode('utf-8'), video_id)
|
||||
@ -93,11 +123,24 @@ class EaglePlatformIE(InfoExtractor):
|
||||
return self._download_json(url_or_request, video_id, note)['data'][0]
|
||||
|
||||
def _real_extract(self, url):
|
||||
url, smuggled_data = unsmuggle_url(url, {})
|
||||
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
host, video_id = mobj.group('custom_host') or mobj.group('host'), mobj.group('id')
|
||||
|
||||
headers = {}
|
||||
query = {
|
||||
'id': video_id,
|
||||
}
|
||||
|
||||
referrer = smuggled_data.get('referrer')
|
||||
if referrer:
|
||||
headers['Referer'] = referrer
|
||||
query['referrer'] = referrer
|
||||
|
||||
player_data = self._download_json(
|
||||
'http://%s/api/player_data?id=%s' % (host, video_id), video_id)
|
||||
'http://%s/api/player_data' % host, video_id,
|
||||
headers=headers, query=query)
|
||||
|
||||
media = player_data['data']['playlist']['viewports'][0]['medialist'][0]
|
||||
|
||||
|
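For referrer-protected EaglePlatform embeds, the referrer is smuggled along with the URL and then sent both as a `Referer` header and as a `referrer` query parameter on the `player_data` call. A minimal sketch of that plumbing (values taken from the tvrain test above):

```python
smuggled_data = {'referrer': 'https://tvrain.ru/lite/teleshow/kak_vse_nachinalos/namin-418921/'}
video_id = '582306'

headers, query = {}, {'id': video_id}
referrer = smuggled_data.get('referrer')
if referrer:
    headers['Referer'] = referrer
    query['referrer'] = referrer
# passed on as: _download_json('http://%s/api/player_data' % host, video_id,
#                              headers=headers, query=query)
```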
@ -1,15 +1,18 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
try_get,
|
||||
unified_timestamp,
|
||||
)
|
||||
|
||||
|
||||
class EggheadCourseIE(InfoExtractor):
|
||||
IE_DESC = 'egghead.io course'
|
||||
IE_NAME = 'egghead:course'
|
||||
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[a-zA-Z_0-9-]+)'
|
||||
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[^/?#&]+)'
|
||||
_TEST = {
|
||||
'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript',
|
||||
'playlist_count': 29,
|
||||
@ -22,18 +25,60 @@ class EggheadCourseIE(InfoExtractor):
|
||||
|
||||
def _real_extract(self, url):
|
||||
playlist_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, playlist_id)
|
||||
|
||||
title = self._html_search_regex(r'<h1 class="title">([^<]+)</h1>', webpage, 'title')
|
||||
ul = self._search_regex(r'(?s)<ul class="series-lessons-list">(.*?)</ul>', webpage, 'session list')
|
||||
course = self._download_json(
|
||||
'https://egghead.io/api/v1/series/%s' % playlist_id, playlist_id)
|
||||
|
||||
found = re.findall(r'(?s)<a class="[^"]*"\s*href="([^"]+)">\s*<li class="item', ul)
|
||||
entries = [self.url_result(m) for m in found]
|
||||
entries = [
|
||||
self.url_result(
|
||||
'wistia:%s' % lesson['wistia_id'], ie='Wistia',
|
||||
video_id=lesson['wistia_id'], video_title=lesson.get('title'))
|
||||
for lesson in course['lessons'] if lesson.get('wistia_id')]
|
||||
|
||||
return self.playlist_result(
|
||||
entries, playlist_id, course.get('title'),
|
||||
course.get('description'))
|
||||
|
||||
|
||||
class EggheadLessonIE(InfoExtractor):
|
||||
IE_DESC = 'egghead.io lesson'
|
||||
IE_NAME = 'egghead:lesson'
|
||||
_VALID_URL = r'https://egghead\.io/lessons/(?P<id>[^/?#&]+)'
|
||||
_TEST = {
|
||||
'url': 'https://egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
|
||||
'info_dict': {
|
||||
'id': 'fv5yotjxcg',
|
||||
'ext': 'mp4',
|
||||
'title': 'Create linear data flow with container style types (Box)',
|
||||
'description': 'md5:9aa2cdb6f9878ed4c39ec09e85a8150e',
|
||||
'thumbnail': r're:^https?:.*\.jpg$',
|
||||
'timestamp': 1481296768,
|
||||
'upload_date': '20161209',
|
||||
'duration': 304,
|
||||
'view_count': 0,
|
||||
'tags': ['javascript', 'free'],
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
lesson_id = self._match_id(url)
|
||||
|
||||
lesson = self._download_json(
|
||||
'https://egghead.io/api/v1/lessons/%s' % lesson_id, lesson_id)
|
||||
|
||||
return {
|
||||
'_type': 'playlist',
|
||||
'id': playlist_id,
|
||||
'title': title,
|
||||
'description': self._og_search_description(webpage),
|
||||
'entries': entries,
|
||||
'_type': 'url_transparent',
|
||||
'ie_key': 'Wistia',
|
||||
'url': 'wistia:%s' % lesson['wistia_id'],
|
||||
'id': lesson['wistia_id'],
|
||||
'title': lesson.get('title'),
|
||||
'description': lesson.get('summary'),
|
||||
'thumbnail': lesson.get('thumb_nail'),
|
||||
'timestamp': unified_timestamp(lesson.get('published_at')),
|
||||
'duration': int_or_none(lesson.get('duration')),
|
||||
'view_count': int_or_none(lesson.get('plays_count')),
|
||||
'tags': try_get(lesson, lambda x: x['tag_list'], list),
|
||||
}
|
||||
|
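The course extractor no longer scrapes the lesson list from HTML; it asks the series API and turns every lesson that has a `wistia_id` into a Wistia entry. A sketch with a response shaped like the one the diff assumes:

```python
course = {
    'title': 'Professor Frisby Introduces Composable Functional JavaScript',
    'lessons': [
        {'wistia_id': 'fv5yotjxcg', 'title': 'Create linear data flow with container style types (Box)'},
        {'wistia_id': None, 'title': 'Lessons without a Wistia id are skipped'},
    ],
}
entries = ['wistia:%s' % lesson['wistia_id']
           for lesson in course['lessons'] if lesson.get('wistia_id')]
print(entries)  # ['wistia:fv5yotjxcg']
```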
@ -10,7 +10,25 @@ from ..utils import (
|
||||
|
||||
|
||||
class ESPNIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/video/clip(?:\?.*?\bid=|/_/id/)(?P<id>\d+)'
|
||||
_VALID_URL = r'''(?x)
|
||||
https?://
|
||||
(?:
|
||||
(?:(?:\w+\.)+)?espn\.go|
|
||||
(?:www\.)?espn
|
||||
)\.com/
|
||||
(?:
|
||||
(?:
|
||||
video/clip|
|
||||
watch/player
|
||||
)
|
||||
(?:
|
||||
\?.*?\bid=|
|
||||
/_/id/
|
||||
)
|
||||
)
|
||||
(?P<id>\d+)
|
||||
'''
|
||||
|
||||
_TESTS = [{
|
||||
'url': 'http://espn.go.com/video/clip?id=10365079',
|
||||
'info_dict': {
|
||||
@ -25,20 +43,34 @@ class ESPNIE(InfoExtractor):
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
# intl video, from http://www.espnfc.us/video/mls-highlights/150/video/2743663/must-see-moments-best-of-the-mls-season
|
||||
'url': 'http://espn.go.com/video/clip?id=2743663',
|
||||
'url': 'https://broadband.espn.go.com/video/clip?id=18910086',
|
||||
'info_dict': {
|
||||
'id': '2743663',
|
||||
'id': '18910086',
|
||||
'ext': 'mp4',
|
||||
'title': 'Must-See Moments: Best of the MLS season',
|
||||
'description': 'md5:4c2d7232beaea572632bec41004f0aeb',
|
||||
'timestamp': 1449446454,
|
||||
'upload_date': '20151207',
|
||||
'title': 'Kyrie spins around defender for two',
|
||||
'description': 'md5:2b0f5bae9616d26fba8808350f0d2b9b',
|
||||
'timestamp': 1489539155,
|
||||
'upload_date': '20170315',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
'expected_warnings': ['Unable to download f4m manifest'],
|
||||
}, {
|
||||
'url': 'http://nonredline.sports.espn.go.com/video/clip?id=19744672',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://cdn.espn.go.com/video/clip/_/id/19771774',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://www.espn.com/watch/player?id=19141491',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://www.espn.com/watch/player?bucketId=257&id=19505875',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://www.espn.com/watch/player/_/id/19141491',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://www.espn.com/video/clip?id=10365079',
|
||||
'only_matching': True,
|
||||
|
@ -185,7 +185,7 @@ from .chirbit import (
|
||||
ChirbitProfileIE,
|
||||
)
|
||||
from .cinchcast import CinchcastIE
|
||||
from .clipfish import ClipfishIE
|
||||
from .cjsw import CJSWIE
|
||||
from .cliphunter import CliphunterIE
|
||||
from .cliprs import ClipRsIE
|
||||
from .clipsyndicate import ClipsyndicateIE
|
||||
@ -297,7 +297,10 @@ from .dw import (
|
||||
from .eagleplatform import EaglePlatformIE
|
||||
from .ebaumsworld import EbaumsWorldIE
|
||||
from .echomsk import EchoMskIE
|
||||
from .egghead import EggheadCourseIE
|
||||
from .egghead import (
|
||||
EggheadCourseIE,
|
||||
EggheadLessonIE,
|
||||
)
|
||||
from .ehow import EHowIE
|
||||
from .eighttracks import EightTracksIE
|
||||
from .einthusan import EinthusanIE
|
||||
@ -347,7 +350,12 @@ from .flipagram import FlipagramIE
|
||||
from .folketinget import FolketingetIE
|
||||
from .footyroom import FootyRoomIE
|
||||
from .formula1 import Formula1IE
|
||||
from .fourtube import FourTubeIE
|
||||
from .fourtube import (
|
||||
FourTubeIE,
|
||||
PornTubeIE,
|
||||
PornerBrosIE,
|
||||
FuxIE,
|
||||
)
|
||||
from .fox import FOXIE
|
||||
from .fox9 import FOX9IE
|
||||
from .foxgay import FoxgayIE
|
||||
@ -469,6 +477,7 @@ from .jamendo import (
|
||||
)
|
||||
from .jeuxvideo import JeuxVideoIE
|
||||
from .jove import JoveIE
|
||||
from .joj import JojIE
|
||||
from .jwplatform import JWPlatformIE
|
||||
from .jpopsukitv import JpopsukiIE
|
||||
from .kaltura import KalturaIE
|
||||
@ -553,6 +562,7 @@ from .matchtv import MatchTVIE
|
||||
from .mdr import MDRIE
|
||||
from .mediaset import MediasetIE
|
||||
from .medici import MediciIE
|
||||
from .megaphone import MegaphoneIE
|
||||
from .meipai import MeipaiIE
|
||||
from .melonvod import MelonVODIE
|
||||
from .meta import METAIE
|
||||
@ -579,7 +589,6 @@ from .mixcloud import (
|
||||
)
|
||||
from .mlb import MLBIE
|
||||
from .mnet import MnetIE
|
||||
from .mpora import MporaIE
|
||||
from .moevideo import MoeVideoIE
|
||||
from .mofosex import MofosexIE
|
||||
from .mojvideo import MojvideoIE
|
||||
@ -651,6 +660,10 @@ from .nextmedia import (
|
||||
AppleDailyIE,
|
||||
NextTVIE,
|
||||
)
|
||||
from .nexx import (
|
||||
NexxIE,
|
||||
NexxEmbedIE,
|
||||
)
|
||||
from .nfb import NFBIE
|
||||
from .nfl import NFLIE
|
||||
from .nhk import NhkVodIE
|
||||
@ -664,6 +677,7 @@ from .nick import (
|
||||
NickIE,
|
||||
NickDeIE,
|
||||
NickNightIE,
|
||||
NickRuIE,
|
||||
)
|
||||
from .niconico import NiconicoIE, NiconicoPlaylistIE
|
||||
from .ninecninemedia import (
|
||||
@ -759,6 +773,7 @@ from .pandoratv import PandoraTVIE
|
||||
from .parliamentliveuk import ParliamentLiveUKIE
|
||||
from .patreon import PatreonIE
|
||||
from .pbs import PBSIE
|
||||
from .pearvideo import PearVideoIE
|
||||
from .people import PeopleIE
|
||||
from .periscope import (
|
||||
PeriscopeIE,
|
||||
@ -824,11 +839,16 @@ from .radiobremen import RadioBremenIE
|
||||
from .radiofrance import RadioFranceIE
|
||||
from .rai import (
|
||||
RaiPlayIE,
|
||||
RaiPlayLiveIE,
|
||||
RaiIE,
|
||||
)
|
||||
from .rbmaradio import RBMARadioIE
|
||||
from .rds import RDSIE
|
||||
from .redbulltv import RedBullTVIE
|
||||
from .reddit import (
|
||||
RedditIE,
|
||||
RedditRIE,
|
||||
)
|
||||
from .redtube import RedTubeIE
|
||||
from .regiotv import RegioTVIE
|
||||
from .rentv import (
|
||||
@ -922,8 +942,9 @@ from .soundcloud import (
|
||||
SoundcloudIE,
|
||||
SoundcloudSetIE,
|
||||
SoundcloudUserIE,
|
||||
SoundcloudTrackStationIE,
|
||||
SoundcloudPlaylistIE,
|
||||
SoundcloudSearchIE
|
||||
SoundcloudSearchIE,
|
||||
)
|
||||
from .soundgasm import (
|
||||
SoundgasmIE,
|
||||
@ -972,6 +993,7 @@ from .tagesschau import (
|
||||
TagesschauIE,
|
||||
)
|
||||
from .tass import TassIE
|
||||
from .tastytrade import TastyTradeIE
|
||||
from .tbs import TBSIE
|
||||
from .tdslifeway import TDSLifewayIE
|
||||
from .teachertube import (
|
||||
@ -980,7 +1002,6 @@ from .teachertube import (
|
||||
)
|
||||
from .teachingchannel import TeachingChannelIE
|
||||
from .teamcoco import TeamcocoIE
|
||||
from .teamfourstar import TeamFourStarIE
|
||||
from .techtalks import TechTalksIE
|
||||
from .ted import TEDIE
|
||||
from .tele13 import Tele13IE
|
||||
@ -1202,12 +1223,14 @@ from .vk import (
|
||||
)
|
||||
from .vlive import (
|
||||
VLiveIE,
|
||||
VLiveChannelIE
|
||||
VLiveChannelIE,
|
||||
VLivePlaylistIE
|
||||
)
|
||||
from .vodlocker import VodlockerIE
|
||||
from .vodpl import VODPlIE
|
||||
from .vodplatform import VODPlatformIE
|
||||
from .voicerepublic import VoiceRepublicIE
|
||||
from .voot import VootIE
|
||||
from .voxmedia import VoxMediaIE
|
||||
from .vporn import VpornIE
|
||||
from .vrt import VRTIE
|
||||
@ -1229,6 +1252,7 @@ from .washingtonpost import (
|
||||
WashingtonPostArticleIE,
|
||||
)
|
||||
from .wat import WatIE
|
||||
from .watchbox import WatchBoxIE
|
||||
from .watchindianporn import WatchIndianPornIE
|
||||
from .wdr import (
|
||||
WDRIE,
|
||||
@ -1278,12 +1302,12 @@ from .yahoo import (
|
||||
YahooIE,
|
||||
YahooSearchIE,
|
||||
)
|
||||
from .yam import YamIE
|
||||
from .yandexmusic import (
|
||||
YandexMusicTrackIE,
|
||||
YandexMusicAlbumIE,
|
||||
YandexMusicPlaylistIE,
|
||||
)
|
||||
from .yandexdisk import YandexDiskIE
|
||||
from .yesjapan import YesJapanIE
|
||||
from .yinyuetai import YinYueTaiIE
|
||||
from .ynet import YnetIE
|
||||
|
@ -203,19 +203,19 @@ class FacebookIE(InfoExtractor):
|
||||
}]
|
||||
|
||||
@staticmethod
|
||||
def _extract_url(webpage):
|
||||
mobj = re.search(
|
||||
r'<iframe[^>]+?src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1', webpage)
|
||||
if mobj is not None:
|
||||
return mobj.group('url')
|
||||
|
||||
def _extract_urls(webpage):
|
||||
urls = []
|
||||
for mobj in re.finditer(
|
||||
r'<iframe[^>]+?src=(["\'])(?P<url>https?://www\.facebook\.com/(?:video/embed|plugins/video\.php).+?)\1',
|
||||
webpage):
|
||||
urls.append(mobj.group('url'))
|
||||
# Facebook API embed
|
||||
# see https://developers.facebook.com/docs/plugins/embedded-video-player
|
||||
mobj = re.search(r'''(?x)<div[^>]+
|
||||
for mobj in re.finditer(r'''(?x)<div[^>]+
|
||||
class=(?P<q1>[\'"])[^\'"]*\bfb-(?:video|post)\b[^\'"]*(?P=q1)[^>]+
|
||||
data-href=(?P<q2>[\'"])(?P<url>(?:https?:)?//(?:www\.)?facebook.com/.+?)(?P=q2)''', webpage)
|
||||
if mobj is not None:
|
||||
return mobj.group('url')
|
||||
data-href=(?P<q2>[\'"])(?P<url>(?:https?:)?//(?:www\.)?facebook.com/.+?)(?P=q2)''', webpage):
|
||||
urls.append(mobj.group('url'))
|
||||
return urls
|
||||
|
||||
def _login(self):
|
||||
(useremail, password) = self._get_login_info()
|
||||
|
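`FacebookIE._extract_url` becomes `_extract_urls` and returns every embed found on the page. A consumer sketch under assumed context (inside another extractor's `_real_extract`, with `webpage` and `entries` already defined), mirroring the BuzzFeed change earlier in this diff:

```python
# Assumed context: `self`, `webpage` and `entries` exist in the caller.
facebook_urls = FacebookIE._extract_urls(webpage)
entries.extend(
    self.url_result(facebook_url)
    for facebook_url in facebook_urls)
```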
@ -43,7 +43,7 @@ class FiveTVIE(InfoExtractor):
|
||||
'info_dict': {
|
||||
'id': 'glavnoe',
|
||||
'ext': 'mp4',
|
||||
'title': 'Итоги недели с 8 по 14 июня 2015 года',
|
||||
'title': r're:^Итоги недели с \d+ по \d+ \w+ \d{4} года$',
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
},
|
||||
}, {
|
||||
@ -70,7 +70,8 @@ class FiveTVIE(InfoExtractor):
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
video_url = self._search_regex(
|
||||
r'<a[^>]+?href="([^"]+)"[^>]+?class="videoplayer"',
|
||||
[r'<div[^>]+?class="flowplayer[^>]+?data-href="([^"]+)"',
|
||||
r'<a[^>]+?href="([^"]+)"[^>]+?class="videoplayer"'],
|
||||
webpage, 'video url')
|
||||
|
||||
title = self._og_search_title(webpage, default=None) or self._search_regex(
|
||||
|
@ -3,39 +3,22 @@ from __future__ import unicode_literals
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_urlparse
|
||||
from ..utils import (
|
||||
parse_duration,
|
||||
parse_iso8601,
|
||||
sanitized_Request,
|
||||
str_to_int,
|
||||
)
|
||||
|
||||
|
||||
class FourTubeIE(InfoExtractor):
|
||||
IE_NAME = '4tube'
|
||||
_VALID_URL = r'https?://(?:www\.)?4tube\.com/videos/(?P<id>\d+)'
|
||||
|
||||
_TEST = {
|
||||
'url': 'http://www.4tube.com/videos/209733/hot-babe-holly-michaels-gets-her-ass-stuffed-by-black',
|
||||
'md5': '6516c8ac63b03de06bc8eac14362db4f',
|
||||
'info_dict': {
|
||||
'id': '209733',
|
||||
'ext': 'mp4',
|
||||
'title': 'Hot Babe Holly Michaels gets her ass stuffed by black',
|
||||
'uploader': 'WCP Club',
|
||||
'uploader_id': 'wcp-club',
|
||||
'upload_date': '20131031',
|
||||
'timestamp': 1383263892,
|
||||
'duration': 583,
|
||||
'view_count': int,
|
||||
'like_count': int,
|
||||
'categories': list,
|
||||
'age_limit': 18,
|
||||
}
|
||||
}
|
||||
|
||||
class FourTubeBaseIE(InfoExtractor):
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
kind, video_id, display_id = mobj.group('kind', 'id', 'display_id')
|
||||
|
||||
if kind == 'm' or not display_id:
|
||||
url = self._URL_TEMPLATE % video_id
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
title = self._html_search_meta('name', webpage)
|
||||
@ -43,10 +26,10 @@ class FourTubeIE(InfoExtractor):
|
||||
'uploadDate', webpage))
|
||||
thumbnail = self._html_search_meta('thumbnailUrl', webpage)
|
||||
uploader_id = self._html_search_regex(
|
||||
r'<a class="item-to-subscribe" href="[^"]+/channels/([^/"]+)" title="Go to [^"]+ page">',
|
||||
r'<a class="item-to-subscribe" href="[^"]+/(?:channel|user)s?/([^/"]+)" title="Go to [^"]+ page">',
|
||||
webpage, 'uploader id', fatal=False)
|
||||
uploader = self._html_search_regex(
|
||||
r'<a class="item-to-subscribe" href="[^"]+/channels/[^/"]+" title="Go to ([^"]+) page">',
|
||||
r'<a class="item-to-subscribe" href="[^"]+/(?:channel|user)s?/[^/"]+" title="Go to ([^"]+) page">',
|
||||
webpage, 'uploader', fatal=False)
|
||||
|
||||
categories_html = self._search_regex(
|
||||
@ -60,10 +43,10 @@ class FourTubeIE(InfoExtractor):
|
||||
|
||||
view_count = str_to_int(self._search_regex(
|
||||
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([0-9,]+)">',
|
||||
webpage, 'view count', fatal=False))
|
||||
webpage, 'view count', default=None))
|
||||
like_count = str_to_int(self._search_regex(
|
||||
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserLikes:([0-9,]+)">',
|
||||
webpage, 'like count', fatal=False))
|
||||
webpage, 'like count', default=None))
|
||||
duration = parse_duration(self._html_search_meta('duration', webpage))
|
||||
|
||||
media_id = self._search_regex(
|
||||
@ -87,12 +70,12 @@ class FourTubeIE(InfoExtractor):
|
||||
|
||||
token_url = 'https://tkn.kodicdn.com/{0}/desktop/{1}'.format(
|
||||
media_id, '+'.join(sources))
|
||||
headers = {
|
||||
b'Content-Type': b'application/x-www-form-urlencoded',
|
||||
b'Origin': b'https://www.4tube.com',
|
||||
}
|
||||
token_req = sanitized_Request(token_url, b'{}', headers)
|
||||
tokens = self._download_json(token_req, video_id)
|
||||
|
||||
parsed_url = compat_urlparse.urlparse(url)
|
||||
tokens = self._download_json(token_url, video_id, data=b'', headers={
|
||||
'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
|
||||
'Referer': url,
|
||||
})
|
||||
formats = [{
|
||||
'url': tokens[format]['token'],
|
||||
'format_id': format + 'p',
|
||||
@ -115,3 +98,126 @@ class FourTubeIE(InfoExtractor):
|
||||
'duration': duration,
|
||||
'age_limit': 18,
|
||||
}
|
||||
|
||||
|
||||
class FourTubeIE(FourTubeBaseIE):
|
||||
IE_NAME = '4tube'
|
||||
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?4tube\.com/(?:videos|embed)/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
|
||||
_URL_TEMPLATE = 'https://www.4tube.com/videos/%s/video'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.4tube.com/videos/209733/hot-babe-holly-michaels-gets-her-ass-stuffed-by-black',
|
||||
'md5': '6516c8ac63b03de06bc8eac14362db4f',
|
||||
'info_dict': {
|
||||
'id': '209733',
|
||||
'ext': 'mp4',
|
||||
'title': 'Hot Babe Holly Michaels gets her ass stuffed by black',
|
||||
'uploader': 'WCP Club',
|
||||
'uploader_id': 'wcp-club',
|
||||
'upload_date': '20131031',
|
||||
'timestamp': 1383263892,
|
||||
'duration': 583,
|
||||
'view_count': int,
|
||||
'like_count': int,
|
||||
'categories': list,
|
||||
'age_limit': 18,
|
||||
},
|
||||
}, {
|
||||
'url': 'http://www.4tube.com/embed/209733',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://m.4tube.com/videos/209733/hot-babe-holly-michaels-gets-her-ass-stuffed-by-black',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
|
||||
class FuxIE(FourTubeBaseIE):
|
||||
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?fux\.com/(?:video|embed)/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
|
||||
_URL_TEMPLATE = 'https://www.fux.com/video/%s/video'
|
||||
_TESTS = [{
|
||||
'url': 'https://www.fux.com/video/195359/awesome-fucking-kitchen-ends-cum-swallow',
|
||||
'info_dict': {
|
||||
'id': '195359',
|
||||
'ext': 'mp4',
|
||||
'title': 'Awesome fucking in the kitchen ends with cum swallow',
|
||||
'uploader': 'alenci2342',
|
||||
'uploader_id': 'alenci2342',
|
||||
'upload_date': '20131230',
|
||||
'timestamp': 1388361660,
|
||||
'duration': 289,
|
||||
'view_count': int,
|
||||
'like_count': int,
|
||||
'categories': list,
|
||||
'age_limit': 18,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
'url': 'https://www.fux.com/embed/195359',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://www.fux.com/video/195359/awesome-fucking-kitchen-ends-cum-swallow',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
|
||||
class PornTubeIE(FourTubeBaseIE):
|
||||
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?porntube\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
|
||||
_URL_TEMPLATE = 'https://www.porntube.com/videos/video_%s'
|
||||
_TESTS = [{
|
||||
'url': 'https://www.porntube.com/videos/teen-couple-doing-anal_7089759',
|
||||
'info_dict': {
|
||||
'id': '7089759',
|
||||
'ext': 'mp4',
|
||||
'title': 'Teen couple doing anal',
|
||||
'uploader': 'Alexy',
|
||||
'uploader_id': 'Alexy',
|
||||
'upload_date': '20150606',
|
||||
'timestamp': 1433595647,
|
||||
'duration': 5052,
|
||||
'view_count': int,
|
||||
'like_count': int,
|
||||
'categories': list,
|
||||
'age_limit': 18,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
'url': 'https://www.porntube.com/embed/7089759',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://m.porntube.com/videos/teen-couple-doing-anal_7089759',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
|
||||
class PornerBrosIE(FourTubeBaseIE):
|
||||
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?pornerbros\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
|
||||
_URL_TEMPLATE = 'https://www.pornerbros.com/videos/video_%s'
|
||||
_TESTS = [{
|
||||
'url': 'https://www.pornerbros.com/videos/skinny-brunette-takes-big-cock-down-her-anal-hole_181369',
|
||||
'md5': '6516c8ac63b03de06bc8eac14362db4f',
|
||||
'info_dict': {
|
||||
'id': '181369',
|
||||
'ext': 'mp4',
|
||||
'title': 'Skinny brunette takes big cock down her anal hole',
|
||||
'uploader': 'PornerBros HD',
|
||||
'uploader_id': 'pornerbros-hd',
|
||||
'upload_date': '20130130',
|
||||
'timestamp': 1359527401,
|
||||
'duration': 1224,
|
||||
'view_count': int,
|
||||
'categories': list,
|
||||
'age_limit': 18,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
'url': 'https://www.pornerbros.com/embed/181369',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://m.pornerbros.com/videos/skinny-brunette-takes-big-cock-down-her-anal-hole_181369',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
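With the common logic moved into `FourTubeBaseIE`, the token request derives `Origin` and `Referer` from whichever site is being extracted instead of hard-coding 4tube.com, so the same code serves 4tube, Fux, PornTube and PornerBros. The header construction on its own (URL taken from the PornTube test):

```python
from youtube_dl.compat import compat_urlparse

url = 'https://www.porntube.com/videos/teen-couple-doing-anal_7089759'
parsed_url = compat_urlparse.urlparse(url)
headers = {
    'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
    'Referer': url,
}
print(headers['Origin'])  # https://www.porntube.com
```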
@ -1,10 +1,14 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import json
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import ExtractorError
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
float_or_none,
|
||||
int_or_none,
|
||||
unified_timestamp,
|
||||
)
|
||||
|
||||
|
||||
class FunnyOrDieIE(InfoExtractor):
|
||||
@ -18,6 +22,10 @@ class FunnyOrDieIE(InfoExtractor):
|
||||
'title': 'Heart-Shaped Box: Literal Video Version',
|
||||
'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
|
||||
'thumbnail': r're:^http:.*\.jpg$',
|
||||
'uploader': 'DASjr',
|
||||
'timestamp': 1317904928,
|
||||
'upload_date': '20111006',
|
||||
'duration': 318.3,
|
||||
},
|
||||
}, {
|
||||
'url': 'http://www.funnyordie.com/embed/e402820827',
|
||||
@ -27,6 +35,8 @@ class FunnyOrDieIE(InfoExtractor):
|
||||
'title': 'Please Use This Song (Jon Lajoie)',
|
||||
'description': 'Please use this to sell something. www.jonlajoie.com',
|
||||
'thumbnail': r're:^http:.*\.jpg$',
|
||||
'timestamp': 1398988800,
|
||||
'upload_date': '20140502',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
@ -100,15 +110,53 @@ class FunnyOrDieIE(InfoExtractor):
|
||||
'url': 'http://www.funnyordie.com%s' % src,
|
||||
}]
|
||||
|
||||
post_json = self._search_regex(
|
||||
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details')
|
||||
post = json.loads(post_json)
|
||||
timestamp = unified_timestamp(self._html_search_meta(
|
||||
'uploadDate', webpage, 'timestamp', default=None))
|
||||
|
||||
uploader = self._html_search_regex(
|
||||
r'<h\d[^>]+\bclass=["\']channel-preview-name[^>]+>(.+?)</h',
|
||||
webpage, 'uploader', default=None)
|
||||
|
||||
title, description, thumbnail, duration = [None] * 4
|
||||
|
||||
medium = self._parse_json(
|
||||
self._search_regex(
|
||||
r'jsonMedium\s*=\s*({.+?});', webpage, 'JSON medium',
|
||||
default='{}'),
|
||||
video_id, fatal=False)
|
||||
if medium:
|
||||
title = medium.get('title')
|
||||
duration = float_or_none(medium.get('duration'))
|
||||
if not timestamp:
|
||||
timestamp = unified_timestamp(medium.get('publishDate'))
|
||||
|
||||
post = self._parse_json(
|
||||
self._search_regex(
|
||||
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details',
|
||||
default='{}'),
|
||||
video_id, fatal=False)
|
||||
if post:
|
||||
if not title:
|
||||
title = post.get('name')
|
||||
description = post.get('description')
|
||||
thumbnail = post.get('picture')
|
||||
|
||||
if not title:
|
||||
title = self._og_search_title(webpage)
|
||||
if not description:
|
||||
description = self._og_search_description(webpage)
|
||||
if not duration:
|
||||
duration = int_or_none(self._html_search_meta(
|
||||
('video:duration', 'duration'), webpage, 'duration', default=False))
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': post['name'],
|
||||
'description': post.get('description'),
|
||||
'thumbnail': post.get('picture'),
|
||||
'title': title,
|
||||
'description': description,
|
||||
'thumbnail': thumbnail,
|
||||
'uploader': uploader,
|
||||
'timestamp': timestamp,
|
||||
'duration': duration,
|
||||
'formats': formats,
|
||||
'subtitles': subtitles,
|
||||
}
|
||||
|
@ -36,6 +36,10 @@ from .brightcove import (
|
||||
BrightcoveLegacyIE,
|
||||
BrightcoveNewIE,
|
||||
)
|
||||
from .nexx import (
|
||||
NexxIE,
|
||||
NexxEmbedIE,
|
||||
)
|
||||
from .nbc import NBCSportsVPlayerIE
|
||||
from .ooyala import OoyalaIE
|
||||
from .rutv import RUTVIE
|
||||
@ -57,6 +61,7 @@ from .dailymotion import (
|
||||
DailymotionIE,
|
||||
DailymotionCloudIE,
|
||||
)
|
||||
from .dailymail import DailyMailIE
|
||||
from .onionstudios import OnionStudiosIE
|
||||
from .viewlift import ViewLiftEmbedIE
|
||||
from .mtv import MTVServicesEmbeddedIE
|
||||
@ -91,6 +96,9 @@ from .anvato import AnvatoIE
|
||||
from .washingtonpost import WashingtonPostIE
|
||||
from .wistia import WistiaIE
|
||||
from .mediaset import MediasetIE
|
||||
from .joj import JojIE
|
||||
from .megaphone import MegaphoneIE
|
||||
from .vzaar import VzaarIE
|
||||
|
||||
|
||||
class GenericIE(InfoExtractor):
|
||||
@ -568,6 +576,19 @@ class GenericIE(InfoExtractor):
|
||||
},
|
||||
'skip': 'movie expired',
|
||||
},
|
||||
# ooyala video embedded with http://player.ooyala.com/static/v4/production/latest/core.min.js
|
||||
{
|
||||
'url': 'http://wnep.com/2017/07/22/steampunk-fest-comes-to-honesdale/',
|
||||
'info_dict': {
|
||||
'id': 'lwYWYxYzE6V5uJMjNGyKtwwiw9ZJD7t2',
|
||||
'ext': 'mp4',
|
||||
'title': 'Steampunk Fest Comes to Honesdale',
|
||||
'duration': 43.276,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
}
|
||||
},
|
||||
# embed.ly video
|
||||
{
|
||||
'url': 'http://www.tested.com/science/weird/460206-tested-grinding-coffee-2000-frames-second/',
|
||||
@ -759,6 +780,20 @@ class GenericIE(InfoExtractor):
|
||||
},
|
||||
'add_ie': ['Dailymotion'],
|
||||
},
|
||||
# DailyMail embed
|
||||
{
|
||||
'url': 'http://www.bumm.sk/krimi/2017/07/05/biztonsagi-kamera-buktatta-le-az-agg-ferfit-utlegelo-apolot',
|
||||
'info_dict': {
|
||||
'id': '1495629',
|
||||
'ext': 'mp4',
|
||||
'title': 'Care worker punches elderly dementia patient in head 11 times',
|
||||
'description': 'md5:3a743dee84e57e48ec68bf67113199a5',
|
||||
},
|
||||
'add_ie': ['DailyMail'],
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
},
|
||||
# YouTube embed
|
||||
{
|
||||
'url': 'http://www.badzine.de/ansicht/datum/2014/06/09/so-funktioniert-die-neue-englische-badminton-liga.html',
|
||||
@ -1185,7 +1220,7 @@ class GenericIE(InfoExtractor):
|
||||
},
|
||||
'add_ie': ['Kaltura'],
|
||||
},
|
||||
# Eagle.Platform embed (generic URL)
|
||||
# EaglePlatform embed (generic URL)
|
||||
{
|
||||
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
|
||||
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
|
||||
@ -1199,8 +1234,26 @@ class GenericIE(InfoExtractor):
|
||||
'view_count': int,
|
||||
'age_limit': 0,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
},
|
||||
# ClipYou (Eagle.Platform) embed (custom URL)
|
||||
# referrer protected EaglePlatform embed
|
||||
{
|
||||
'url': 'https://tvrain.ru/lite/teleshow/kak_vse_nachinalos/namin-418921/',
|
||||
'info_dict': {
|
||||
'id': '582306',
|
||||
'ext': 'mp4',
|
||||
'title': 'Стас Намин: «Мы нарушили девственность Кремля»',
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
'duration': 3382,
|
||||
'view_count': int,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
},
|
||||
# ClipYou (EaglePlatform) embed (custom URL)
|
||||
{
|
||||
'url': 'http://muz-tv.ru/play/7129/',
|
||||
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
|
||||
@ -1212,6 +1265,9 @@ class GenericIE(InfoExtractor):
|
||||
'duration': 216,
|
||||
'view_count': int,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
},
|
||||
# Pladform embed
|
||||
{
|
||||
@ -1512,6 +1568,22 @@ class GenericIE(InfoExtractor):
|
||||
},
|
||||
'add_ie': ['BrightcoveLegacy'],
|
||||
},
|
||||
# Nexx embed
|
||||
{
|
||||
'url': 'https://www.funk.net/serien/5940e15073f6120001657956/items/593efbb173f6120001657503',
|
||||
'info_dict': {
|
||||
'id': '247746',
|
||||
'ext': 'mp4',
|
||||
'title': "Yesterday's Jam (OV)",
|
||||
'description': 'md5:09bc0984723fed34e2581624a84e05f0',
|
||||
'timestamp': 1492594816,
|
||||
'upload_date': '20170419',
|
||||
},
|
||||
'params': {
|
||||
'format': 'bestvideo',
|
||||
'skip_download': True,
|
||||
},
|
||||
},
|
||||
# Facebook <iframe> embed
|
||||
{
|
||||
'url': 'https://www.hostblogger.de/blog/archives/6181-Auto-jagt-Betonmischer.html',
|
||||
@ -1522,6 +1594,21 @@ class GenericIE(InfoExtractor):
|
||||
'title': 'Facebook video #599637780109885',
|
||||
},
|
||||
},
|
||||
# Facebook <iframe> embed, plugin video
|
||||
{
|
||||
'url': 'http://5pillarsuk.com/2017/06/07/tariq-ramadan-disagrees-with-pr-exercise-by-imams-refusing-funeral-prayers-for-london-attackers/',
|
||||
'info_dict': {
|
||||
'id': '1754168231264132',
|
||||
'ext': 'mp4',
|
||||
'title': 'About the Imams and Religious leaders refusing to perform funeral prayers for...',
|
||||
'uploader': 'Tariq Ramadan (official)',
|
||||
'timestamp': 1496758379,
|
||||
'upload_date': '20170606',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
},
|
||||
# Facebook API embed
|
||||
{
|
||||
'url': 'http://www.lothype.com/blue-stars-2016-preview-standstill-full-show/',
|
||||
@ -1698,6 +1785,21 @@ class GenericIE(InfoExtractor):
|
||||
},
|
||||
'playlist_mincount': 5,
|
||||
},
|
||||
{
|
||||
# Limelight embed (LimelightPlayerUtil.embed)
|
||||
'url': 'https://tv5.ca/videos?v=xuu8qowr291ri',
|
||||
'info_dict': {
|
||||
'id': '95d035dc5c8a401588e9c0e6bd1e9c92',
|
||||
'ext': 'mp4',
|
||||
'title': '07448641',
|
||||
'timestamp': 1499890639,
|
||||
'upload_date': '20170712',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
'add_ie': ['LimelightMedia'],
|
||||
},
|
||||
{
|
||||
'url': 'http://kron4.com/2017/04/28/standoff-with-walnut-creek-murder-suspect-ends-with-arrest/',
|
||||
'info_dict': {
|
||||
@ -1734,6 +1836,36 @@ class GenericIE(InfoExtractor):
|
||||
},
|
||||
'add_ie': [MediasetIE.ie_key()],
|
||||
},
|
||||
{
|
||||
# JOJ.sk embeds
|
||||
'url': 'https://www.noviny.sk/slovensko/238543-slovenskom-sa-prehnala-vlna-silnych-burok',
|
||||
'info_dict': {
|
||||
'id': '238543-slovenskom-sa-prehnala-vlna-silnych-burok',
|
||||
'title': 'Slovenskom sa prehnala vlna silných búrok',
|
||||
},
|
||||
'playlist_mincount': 5,
|
||||
'add_ie': [JojIE.ie_key()],
|
||||
},
|
||||
{
|
||||
# AMP embed (see https://www.ampproject.org/docs/reference/components/amp-video)
|
||||
'url': 'https://tvrain.ru/amp/418921/',
|
||||
'md5': 'cc00413936695987e8de148b67d14f1d',
|
||||
'info_dict': {
|
||||
'id': '418921',
|
||||
'ext': 'mp4',
|
||||
'title': 'Стас Намин: «Мы нарушили девственность Кремля»',
|
||||
},
|
||||
},
|
||||
{
|
||||
# vzaar embed
|
||||
'url': 'http://help.vzaar.com/article/165-embedding-video',
|
||||
'md5': '7e3919d9d2620b89e3e00bec7fe8c9d4',
|
||||
'info_dict': {
|
||||
'id': '8707641',
|
||||
'ext': 'mp4',
|
||||
'title': 'Building A Business Online: Principal Chairs Q & A',
|
||||
},
|
||||
},
|
||||
# {
|
||||
# # TODO: find another test
|
||||
# # http://schema.org/VideoObject
|
||||
@@ -2033,6 +2165,13 @@ class GenericIE(InfoExtractor):
video_description = self._og_search_description(webpage, default=None)
video_thumbnail = self._og_search_thumbnail(webpage, default=None)

info_dict.update({
'title': video_title,
'description': video_description,
'thumbnail': video_thumbnail,
'age_limit': age_limit,
})

# Look for Brightcove Legacy Studio embeds
bc_urls = BrightcoveLegacyIE._extract_brightcove_urls(webpage)
if bc_urls:
@@ -2054,6 +2193,16 @@ class GenericIE(InfoExtractor):
if bc_urls:
return self.playlist_from_matches(bc_urls, video_id, video_title, ie='BrightcoveNew')

# Look for Nexx embeds
nexx_urls = NexxIE._extract_urls(webpage)
if nexx_urls:
return self.playlist_from_matches(nexx_urls, video_id, video_title, ie=NexxIE.ie_key())

# Look for Nexx iFrame embeds
nexx_embed_urls = NexxEmbedIE._extract_urls(webpage)
if nexx_embed_urls:
return self.playlist_from_matches(nexx_embed_urls, video_id, video_title, ie=NexxEmbedIE.ie_key())

# Look for ThePlatform embeds
tp_urls = ThePlatformIE._extract_urls(webpage)
if tp_urls:
@@ -2126,6 +2275,12 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
playlists, video_id, video_title, lambda p: '//dailymotion.com/playlist/%s' % p)

# Look for DailyMail embeds
dailymail_urls = DailyMailIE._extract_urls(webpage)
if dailymail_urls:
return self.playlist_from_matches(
dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key())

# Look for embedded Wistia player
wistia_url = WistiaIE._extract_url(webpage)
if wistia_url:
@@ -2177,6 +2332,7 @@ class GenericIE(InfoExtractor):
# Look for Ooyala videos
mobj = (re.search(r'player\.ooyala\.com/[^"?]+[?#][^"]*?(?:embedCode|ec)=(?P<ec>[^"&]+)', webpage) or
re.search(r'OO\.Player\.create\([\'"].*?[\'"],\s*[\'"](?P<ec>.{32})[\'"]', webpage) or
re.search(r'OO\.Player\.create\.apply\(\s*OO\.Player\s*,\s*op\(\s*\[\s*[\'"][^\'"]*[\'"]\s*,\s*[\'"](?P<ec>.{32})[\'"]', webpage) or
re.search(r'SBN\.VideoLinkset\.ooyala\([\'"](?P<ec>.{32})[\'"]\)', webpage) or
re.search(r'data-ooyala-video-id\s*=\s*[\'"](?P<ec>.{32})[\'"]', webpage))
if mobj is not None:
@@ -2222,9 +2378,9 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'))

# Look for embedded Facebook player
facebook_url = FacebookIE._extract_url(webpage)
if facebook_url is not None:
return self.url_result(facebook_url, 'Facebook')
facebook_urls = FacebookIE._extract_urls(webpage)
if facebook_urls:
return self.playlist_from_matches(facebook_urls, video_id, video_title)

# Look for embedded VK player
mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk\.com/video_ext\.php.+?)\1', webpage)
@@ -2421,12 +2577,12 @@ class GenericIE(InfoExtractor):
if kaltura_url:
return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())

# Look for Eagle.Platform embeds
# Look for EaglePlatform embeds
eagleplatform_url = EaglePlatformIE._extract_url(webpage)
if eagleplatform_url:
return self.url_result(eagleplatform_url, EaglePlatformIE.ie_key())
return self.url_result(smuggle_url(eagleplatform_url, {'referrer': url}), EaglePlatformIE.ie_key())

# Look for ClipYou (uses Eagle.Platform) embeds
# Look for ClipYou (uses EaglePlatform) embeds
mobj = re.search(
r'<iframe[^>]+src="https?://(?P<host>media\.clipyou\.ru)/index/player\?.*\brecord_id=(?P<id>\d+).*"', webpage)
if mobj is not None:
@@ -2655,7 +2811,7 @@ class GenericIE(InfoExtractor):
rutube_urls = RutubeIE._extract_urls(webpage)
if rutube_urls:
return self.playlist_from_matches(
rutube_urls, ie=RutubeIE.ie_key())
rutube_urls, video_id, video_title, ie=RutubeIE.ie_key())

# Look for WashingtonPost embeds
wapo_urls = WashingtonPostIE._extract_urls(webpage)
@@ -2669,18 +2825,44 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
mediaset_urls, video_id, video_title, ie=MediasetIE.ie_key())

# Look for JOJ.sk embeds
joj_urls = JojIE._extract_urls(webpage)
if joj_urls:
return self.playlist_from_matches(
joj_urls, video_id, video_title, ie=JojIE.ie_key())

# Look for megaphone.fm embeds
mpfn_urls = MegaphoneIE._extract_urls(webpage)
if mpfn_urls:
return self.playlist_from_matches(
mpfn_urls, video_id, video_title, ie=MegaphoneIE.ie_key())

# Look for vzaar embeds
vzaar_urls = VzaarIE._extract_urls(webpage)
if vzaar_urls:
return self.playlist_from_matches(
vzaar_urls, video_id, video_title, ie=VzaarIE.ie_key())

def merge_dicts(dict1, dict2):
merged = {}
for k, v in dict1.items():
if v is not None:
merged[k] = v
for k, v in dict2.items():
if v is None:
continue
if (k not in merged or
(isinstance(v, compat_str) and v and
isinstance(merged[k], compat_str) and
not merged[k])):
merged[k] = v
return merged

# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject')
if json_ld.get('url'):
info_dict.update({
'title': video_title or info_dict['title'],
'description': video_description,
'thumbnail': video_thumbnail,
'age_limit': age_limit
})
info_dict.update(json_ld)
return info_dict
return merge_dicts(json_ld, info_dict)

# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
@@ -2698,9 +2880,7 @@ class GenericIE(InfoExtractor):
if jwplayer_data:
info = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, base_url=url)
if not info.get('title'):
info['title'] = video_title
return info
return merge_dicts(info, info_dict)

def check_video(vurl):
if YoutubeIE.suitable(vurl):
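The `merge_dicts` helper introduced above replaces the old unconditional `info_dict.update(json_ld)`: values from the first dict win, and the second dict only fills keys that are missing or that the first dict left as an empty string. A standalone sketch of the same logic, with `compat_str` replaced by `str` so it runs outside youtube-dl and with made-up sample dicts:

```python
def merge_dicts(dict1, dict2):
    # dict1 wins; dict2 only fills keys that are missing or empty strings in dict1.
    merged = {}
    for k, v in dict1.items():
        if v is not None:
            merged[k] = v
    for k, v in dict2.items():
        if v is None:
            continue
        if (k not in merged or
                (isinstance(v, str) and v and
                 isinstance(merged[k], str) and not merged[k])):
            merged[k] = v
    return merged


json_ld = {'title': 'From JSON-LD', 'description': ''}
info_dict = {'id': 'abc123', 'description': 'From og tags', 'thumbnail': None}
print(merge_dicts(json_ld, info_dict))
# {'title': 'From JSON-LD', 'description': 'From og tags', 'id': 'abc123'}
```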
@@ -5,9 +5,10 @@ import json

from .common import InfoExtractor
from ..utils import (
unescapeHTML,
qualities,
determine_ext,
int_or_none,
qualities,
unescapeHTML,
)


@@ -15,7 +16,7 @@ class GiantBombIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?giantbomb\.com/videos/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
_TEST = {
'url': 'http://www.giantbomb.com/videos/quick-look-destiny-the-dark-below/2300-9782/',
'md5': '57badeface303ecf6b98b812de1b9018',
'md5': 'c8ea694254a59246a42831155dec57ac',
'info_dict': {
'id': '2300-9782',
'display_id': 'quick-look-destiny-the-dark-below',
@@ -51,11 +52,16 @@ class GiantBombIE(InfoExtractor):
for format_id, video_url in video['videoStreams'].items():
if format_id == 'f4m_stream':
continue
if video_url.endswith('.f4m'):
ext = determine_ext(video_url)
if ext == 'f4m':
f4m_formats = self._extract_f4m_formats(video_url + '?hdcore=3.3.1', display_id)
if f4m_formats:
f4m_formats[0]['quality'] = quality(format_id)
formats.extend(f4m_formats)
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, display_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'url': video_url,
@@ -92,7 +92,7 @@ class GoogleDriveIE(InfoExtractor):
if resolution:
f.update({
'width': resolution[0],
'height': resolution[0],
'height': resolution[1],
})
formats.append(f)
self._sort_formats(formats)
@@ -28,7 +28,7 @@ class HGTVComShowIE(InfoExtractor):

config = self._parse_json(
self._search_regex(
r'(?s)data-(?:deferred)?-module=["\']video["\'][^>]*>.*?<script[^>]+type=["\']text/x-config["\'][^>]*>(.+?)</script',
r'(?s)data-(?:deferred-)?module=["\']video["\'][^>]*>.*?<script[^>]+type=["\']text/x-config["\'][^>]*>(.+?)</script',
webpage, 'video config'),
display_id)['channels'][0]

@@ -89,6 +89,11 @@ class IGNIE(InfoExtractor):
'url': 'http://me.ign.com/ar/angry-birds-2/106533/video/lrd-ldyy-lwl-lfylm-angry-birds',
'only_matching': True,
},
{
# videoId pattern
'url': 'http://www.ign.com/articles/2017/06/08/new-ducktales-short-donalds-birthday-doesnt-go-as-planned',
'only_matching': True,
},
]

def _find_video_id(self, webpage):
@@ -98,6 +103,8 @@ class IGNIE(InfoExtractor):
r'data-video-id="(.+?)"',
r'<object id="vid_(.+?)"',
r'<meta name="og:image" content=".*/(.+?)-(.+?)/.+.jpg"',
r'videoId"\s*:\s*"(.+?)"',
r'videoId["\']\s*:\s*["\']([^"\']+?)["\']',
]
return self._search_regex(res_id, webpage, 'video id', default=None)

@@ -59,12 +59,18 @@ class ITVIE(InfoExtractor):
def _add_sub_element(element, name):
return etree.SubElement(element, _add_ns(name))

production_id = (
params.get('data-video-autoplay-id') or
'%s#001' % (
params.get('data-video-episode-id') or
video_id.replace('a', '/')))

req_env = etree.Element(_add_ns('soapenv:Envelope'))
_add_sub_element(req_env, 'soapenv:Header')
body = _add_sub_element(req_env, 'soapenv:Body')
get_playlist = _add_sub_element(body, ('tem:GetPlaylist'))
request = _add_sub_element(get_playlist, 'tem:request')
_add_sub_element(request, 'itv:ProductionId').text = params['data-video-id']
_add_sub_element(request, 'itv:ProductionId').text = production_id
_add_sub_element(request, 'itv:RequestGuid').text = compat_str(uuid.uuid4()).upper()
vodcrid = _add_sub_element(request, 'itv:Vodcrid')
_add_sub_element(vodcrid, 'com:Id')
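The ITV change above stops hard-coding `params['data-video-id']` and instead derives the SOAP `ProductionId` from a chain of fallbacks. A small sketch of just that chain, using made-up attribute values for illustration:

```python
def build_production_id(params, video_id):
    # Prefer the autoplay id, then the episode id, and finally rebuild an id
    # from the URL slug, where 'a' stands in for the '/' of the production number.
    return (
        params.get('data-video-autoplay-id') or
        '%s#001' % (
            params.get('data-video-episode-id') or
            video_id.replace('a', '/')))


print(build_production_id({}, '2a4094'))  # -> 2/4094#001
print(build_production_id({'data-video-episode-id': '1_7680_0001'}, '2a4094'))  # -> 1_7680_0001#001
```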
youtube_dl/extractor/joj.py (new executable file, 100 lines added)
@@ -0,0 +1,100 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
    int_or_none,
    js_to_json,
    try_get,
)


class JojIE(InfoExtractor):
    _VALID_URL = r'''(?x)
                    (?:
                        joj:|
                        https?://media\.joj\.sk/embed/
                    )
                    (?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})
                '''
    _TESTS = [{
        'url': 'https://media.joj.sk/embed/a388ec4c-6019-4a4a-9312-b1bee194e932',
        'info_dict': {
            'id': 'a388ec4c-6019-4a4a-9312-b1bee194e932',
            'ext': 'mp4',
            'title': 'NOVÉ BÝVANIE',
            'thumbnail': r're:^https?://.*\.jpg$',
            'duration': 3118,
        }
    }, {
        'url': 'joj:a388ec4c-6019-4a4a-9312-b1bee194e932',
        'only_matching': True,
    }]

    @staticmethod
    def _extract_urls(webpage):
        return re.findall(
            r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//media\.joj\.sk/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
            webpage)

    def _real_extract(self, url):
        video_id = self._match_id(url)

        webpage = self._download_webpage(
            'https://media.joj.sk/embed/%s' % video_id, video_id)

        title = self._search_regex(
            (r'videoTitle\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
             r'<title>(?P<title>[^<]+)'), webpage, 'title',
            default=None, group='title') or self._og_search_title(webpage)

        bitrates = self._parse_json(
            self._search_regex(
                r'(?s)bitrates\s*=\s*({.+?});', webpage, 'bitrates',
                default='{}'),
            video_id, transform_source=js_to_json, fatal=False)

        formats = []
        for format_url in try_get(bitrates, lambda x: x['mp4'], list) or []:
            if isinstance(format_url, compat_str):
                height = self._search_regex(
                    r'(\d+)[pP]\.', format_url, 'height', default=None)
                formats.append({
                    'url': format_url,
                    'format_id': '%sp' % height if height else None,
                    'height': int(height),
                })
        if not formats:
            playlist = self._download_xml(
                'https://media.joj.sk/services/Video.php?clip=%s' % video_id,
                video_id)
            for file_el in playlist.findall('./files/file'):
                path = file_el.get('path')
                if not path:
                    continue
                format_id = file_el.get('id') or file_el.get('label')
                formats.append({
                    'url': 'http://n16.joj.sk/storage/%s' % path.replace(
                        'dat/', '', 1),
                    'format_id': format_id,
                    'height': int_or_none(self._search_regex(
                        r'(\d+)[pP]', format_id or path, 'height',
                        default=None)),
                })
        self._sort_formats(formats)

        thumbnail = self._og_search_thumbnail(webpage)

        duration = int_or_none(self._search_regex(
            r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))

        return {
            'id': video_id,
            'title': title,
            'thumbnail': thumbnail,
            'duration': duration,
            'formats': formats,
        }
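The static `_extract_urls` above is what the updated `GenericIE` calls (`joj_urls = JojIE._extract_urls(webpage)` in the generic.py hunk earlier). The same regex can be exercised on its own; the iframe snippet below is a hypothetical page fragment reusing the UUID from the test case:

```python
import re

JOJ_EMBED_RE = (
    r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//media\.joj\.sk/embed/'
    r'[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})')

webpage = ('<iframe src="https://media.joj.sk/embed/'
           'a388ec4c-6019-4a4a-9312-b1bee194e932" width="640"></iframe>')

print(re.findall(JOJ_EMBED_RE, webpage))
# ['https://media.joj.sk/embed/a388ec4c-6019-4a4a-9312-b1bee194e932']
```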
@@ -324,7 +324,7 @@ class KalturaIE(InfoExtractor):
if captions:
for caption in captions.get('objects', []):
# Continue if caption is not ready
if f.get('status') != 2:
if caption.get('status') != 2:
continue
if not caption.get('id'):
continue
@@ -48,7 +48,7 @@ class KarriereVideosIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)

title = (self._html_search_meta('title', webpage, default=None) or
self._search_regex(r'<h1 class="title">([^<]+)</h1>'))
self._search_regex(r'<h1 class="title">([^<]+)</h1>', webpage, 'video title'))

video_id = self._search_regex(
r'/config/video/(.+?)\.xml', webpage, 'video id')
@@ -26,14 +26,16 @@ class LimelightBaseIE(InfoExtractor):
'Channel': 'channel',
'ChannelList': 'channel_list',
}

def smuggle(url):
return smuggle_url(url, {'source_url': source_url})

entries = []
for kind, video_id in re.findall(
r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})',
webpage):
entries.append(cls.url_result(
smuggle_url(
'limelight:%s:%s' % (lm[kind], video_id),
{'source_url': source_url}),
smuggle('limelight:%s:%s' % (lm[kind], video_id)),
'Limelight%s' % kind, video_id))
for mobj in re.finditer(
# As per [1] class attribute should be exactly equal to
@@ -49,10 +51,15 @@ class LimelightBaseIE(InfoExtractor):
''', webpage):
kind, video_id = mobj.group('kind'), mobj.group('id')
entries.append(cls.url_result(
smuggle_url(
'limelight:%s:%s' % (kind, video_id),
{'source_url': source_url}),
smuggle('limelight:%s:%s' % (kind, video_id)),
'Limelight%s' % kind.capitalize(), video_id))
# http://support.3playmedia.com/hc/en-us/articles/115009517327-Limelight-Embedding-the-Audio-Description-Plugin-with-the-Limelight-Player-on-Your-Web-Page)
for video_id in re.findall(
r'(?s)LimelightPlayerUtil\.embed\s*\(\s*{.*?\bmediaId["\']\s*:\s*["\'](?P<id>[a-z0-9]{32})',
webpage):
entries.append(cls.url_result(
smuggle('limelight:media:%s' % video_id),
LimelightMediaIE.ie_key(), video_id))
return entries

def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
youtube_dl/extractor/megaphone.py (new file, 55 lines added)
@@ -0,0 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import js_to_json


class MegaphoneIE(InfoExtractor):
    IE_NAME = 'megaphone.fm'
    IE_DESC = 'megaphone.fm embedded players'
    _VALID_URL = r'https://player\.megaphone\.fm/(?P<id>[A-Z0-9]+)'
    _TEST = {
        'url': 'https://player.megaphone.fm/GLT9749789991?"',
        'md5': '4816a0de523eb3e972dc0dda2c191f96',
        'info_dict': {
            'id': 'GLT9749789991',
            'ext': 'mp3',
            'title': '#97 What Kind Of Idiot Gets Phished?',
            'thumbnail': 're:^https://.*\.png.*$',
            'duration': 1776.26375,
            'author': 'Reply All',
        },
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        title = self._og_search_property('audio:title', webpage)
        author = self._og_search_property('audio:artist', webpage)
        thumbnail = self._og_search_thumbnail(webpage)

        episode_json = self._search_regex(r'(?s)var\s+episode\s*=\s*(\{.+?\});', webpage, 'episode JSON')
        episode_data = self._parse_json(episode_json, video_id, js_to_json)
        video_url = self._proto_relative_url(episode_data['mediaUrl'], 'https:')

        formats = [{
            'url': video_url,
        }]

        return {
            'id': video_id,
            'thumbnail': thumbnail,
            'title': title,
            'author': author,
            'duration': episode_data['duration'],
            'formats': formats,
        }

    @classmethod
    def _extract_urls(cls, webpage):
        return [m[0] for m in re.findall(
            r'<iframe[^>]*?\ssrc=["\'](%s)' % cls._VALID_URL, webpage)]
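The new extractor above pulls a `var episode = {...}` object out of the player page and runs it through `js_to_json` before parsing. A rough standalone sketch of that step; the page fragment is hypothetical and already valid JSON, so plain `json.loads` stands in for youtube-dl's `_parse_json`/`js_to_json`:

```python
import json
import re

webpage = '''
<script>
var episode = {"mediaUrl": "//traffic.example.fm/GLT9749789991.mp3", "duration": 1776.26375};
</script>
'''

episode_json = re.search(r'(?s)var\s+episode\s*=\s*(\{.+?\});', webpage).group(1)
episode_data = json.loads(episode_json)

# Equivalent of _proto_relative_url: give protocol-relative URLs a scheme
media_url = episode_data['mediaUrl']
if media_url.startswith('//'):
    media_url = 'https:' + media_url

print(media_url, episode_data['duration'])
```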
@@ -54,15 +54,23 @@ class MixcloudIE(InfoExtractor):
}]

# See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
@staticmethod
def _decrypt_play_info(play_info):
KEY = 'pleasedontdownloadourmusictheartistswontgetpaid'

def _decrypt_play_info(self, play_info, video_id):
KEYS = (
'pleasedontdownloadourmusictheartistswontgetpaid',
'window.addEventListener = window.addEventListener || function() {};',
'(function() { return new Date().toLocaleDateString(); })()',
)
play_info = base64.b64decode(play_info.encode('ascii'))

return ''.join([
compat_chr(compat_ord(ch) ^ compat_ord(KEY[idx % len(KEY)]))
for idx, ch in enumerate(play_info)])
for num, key in enumerate(KEYS, start=1):
try:
return self._parse_json(
''.join([
compat_chr(compat_ord(ch) ^ compat_ord(key[idx % len(key)]))
for idx, ch in enumerate(play_info)]),
video_id)
except ExtractorError:
if num == len(KEYS):
raise

def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -78,8 +86,8 @@ class MixcloudIE(InfoExtractor):

encrypted_play_info = self._search_regex(
r'm-play-info="([^"]+)"', webpage, 'play info')
play_info = self._parse_json(
self._decrypt_play_info(encrypted_play_info), track_id)

play_info = self._decrypt_play_info(encrypted_play_info, track_id)

if message and 'stream_url' not in play_info:
raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
|
||||
(?:[\da-z_-]+\.)*mlb\.com/
|
||||
(?:
|
||||
(?:
|
||||
(?:.*?/)?video/(?:topic/[\da-z_-]+/)?v|
|
||||
(?:.*?/)?video/(?:topic/[\da-z_-]+/)?(?:v|.*?/c-)|
|
||||
(?:
|
||||
shared/video/embed/(?:embed|m-internal-embed)\.html|
|
||||
(?:[^/]+/)+(?:play|index)\.jsp|
|
||||
@ -84,7 +84,7 @@ class MLBIE(InfoExtractor):
|
||||
},
|
||||
{
|
||||
'url': 'http://m.mlb.com/news/article/118550098/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer',
|
||||
'md5': 'b190e70141fb9a1552a85426b4da1b5d',
|
||||
'md5': 'aafaf5b0186fee8f32f20508092f8111',
|
||||
'info_dict': {
|
||||
'id': '75609783',
|
||||
'ext': 'mp4',
|
||||
@ -94,6 +94,10 @@ class MLBIE(InfoExtractor):
|
||||
'upload_date': '20150415',
|
||||
}
|
||||
},
|
||||
{
|
||||
'url': 'https://www.mlb.com/video/hargrove-homers-off-caldwell/c-1352023483?tid=67793694',
|
||||
'only_matching': True,
|
||||
},
|
||||
{
|
||||
'url': 'http://m.mlb.com/shared/video/embed/embed.html?content_id=35692085&topic_id=6479266&width=400&height=224&property=mlb',
|
||||
'only_matching': True,
|
||||
|
@ -1,62 +0,0 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import int_or_none
|
||||
|
||||
|
||||
class MporaIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?mpora\.(?:com|de)/videos/(?P<id>[^?#/]+)'
|
||||
IE_NAME = 'MPORA'
|
||||
|
||||
_TEST = {
|
||||
'url': 'http://mpora.de/videos/AAdo8okx4wiz/embed?locale=de',
|
||||
'md5': 'a7a228473eedd3be741397cf452932eb',
|
||||
'info_dict': {
|
||||
'id': 'AAdo8okx4wiz',
|
||||
'ext': 'mp4',
|
||||
'title': 'Katy Curd - Winter in the Forest',
|
||||
'duration': 416,
|
||||
'uploader': 'Peter Newman Media',
|
||||
},
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
data_json = self._search_regex(
|
||||
[r"new FM\.Player\('[^']+',\s*(\{.*?)\).player;",
|
||||
r"new\s+FM\.Kaltura\.Player\('[^']+'\s*,\s*({.+?})\);"],
|
||||
webpage, 'json')
|
||||
data = self._parse_json(data_json, video_id)
|
||||
|
||||
uploader = data['info_overlay'].get('username')
|
||||
duration = data['video']['duration'] // 1000
|
||||
thumbnail = data['video']['encodings']['sd']['poster']
|
||||
title = data['info_overlay']['title']
|
||||
|
||||
formats = []
|
||||
for encoding_id, edata in data['video']['encodings'].items():
|
||||
for src in edata['sources']:
|
||||
width_str = self._search_regex(
|
||||
r'_([0-9]+)\.[a-zA-Z0-9]+$', src['src'],
|
||||
False, default=None)
|
||||
vcodec = src['type'].partition('/')[2]
|
||||
|
||||
formats.append({
|
||||
'format_id': encoding_id + '-' + vcodec,
|
||||
'url': src['src'],
|
||||
'vcodec': vcodec,
|
||||
'width': int_or_none(width_str),
|
||||
})
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'formats': formats,
|
||||
'uploader': uploader,
|
||||
'duration': duration,
|
||||
'thumbnail': thumbnail,
|
||||
}
|
@ -50,8 +50,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
|
||||
thumb_node = itemdoc.find(search_path)
|
||||
if thumb_node is None:
|
||||
return None
|
||||
else:
|
||||
return thumb_node.attrib['url']
|
||||
return thumb_node.get('url') or thumb_node.text or None
|
||||
|
||||
def _extract_mobile_video_formats(self, mtvn_id):
|
||||
webpage_url = self._MOBILE_TEMPLATE % mtvn_id
|
||||
@ -83,7 +82,7 @@ class MTVServicesInfoExtractor(InfoExtractor):
|
||||
hls_url = rendition.find('./src').text
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
hls_url, video_id, ext='mp4', entry_protocol='m3u8_native',
|
||||
m3u8_id='hls'))
|
||||
m3u8_id='hls', fatal=False))
|
||||
else:
|
||||
# fms
|
||||
try:
|
||||
@ -106,7 +105,8 @@ class MTVServicesInfoExtractor(InfoExtractor):
|
||||
}])
|
||||
except (KeyError, TypeError):
|
||||
raise ExtractorError('Invalid rendition field.')
|
||||
self._sort_formats(formats)
|
||||
if formats:
|
||||
self._sort_formats(formats)
|
||||
return formats
|
||||
|
||||
def _extract_subtitles(self, mdoc, mtvn_id):
|
||||
@ -133,8 +133,11 @@ class MTVServicesInfoExtractor(InfoExtractor):
|
||||
mediagen_url += 'acceptMethods='
|
||||
mediagen_url += 'hls' if use_hls else 'fms'
|
||||
|
||||
mediagen_doc = self._download_xml(mediagen_url, video_id,
|
||||
'Downloading video urls')
|
||||
mediagen_doc = self._download_xml(
|
||||
mediagen_url, video_id, 'Downloading video urls', fatal=False)
|
||||
|
||||
if mediagen_doc is False:
|
||||
return None
|
||||
|
||||
item = mediagen_doc.find('./video/item')
|
||||
if item is not None and item.get('type') == 'text':
|
||||
@ -174,6 +177,13 @@ class MTVServicesInfoExtractor(InfoExtractor):
|
||||
|
||||
formats = self._extract_video_formats(mediagen_doc, mtvn_id, video_id)
|
||||
|
||||
# Some parts of complete video may be missing (e.g. missing Act 3 in
|
||||
# http://www.southpark.de/alle-episoden/s14e01-sexual-healing)
|
||||
if not formats:
|
||||
return None
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
'title': title,
|
||||
'formats': formats,
|
||||
@ -205,9 +215,14 @@ class MTVServicesInfoExtractor(InfoExtractor):
|
||||
title = xpath_text(idoc, './channel/title')
|
||||
description = xpath_text(idoc, './channel/description')
|
||||
|
||||
entries = []
|
||||
for item in idoc.findall('.//item'):
|
||||
info = self._get_video_info(item, use_hls)
|
||||
if info:
|
||||
entries.append(info)
|
||||
|
||||
return self.playlist_result(
|
||||
[self._get_video_info(item, use_hls) for item in idoc.findall('.//item')],
|
||||
playlist_title=title, playlist_description=description)
|
||||
entries, playlist_title=title, playlist_description=description)
|
||||
|
||||
def _extract_triforce_mgid(self, webpage, data_zone=None, video_id=None):
|
||||
triforce_feed = self._parse_json(self._search_regex(
|
||||
|
youtube_dl/extractor/nexx.py (new file, 271 lines added)
@@ -0,0 +1,271 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import hashlib
|
||||
import random
|
||||
import re
|
||||
import time
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
parse_duration,
|
||||
try_get,
|
||||
urlencode_postdata,
|
||||
)
|
||||
|
||||
|
||||
class NexxIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://api\.nexx(?:\.cloud|cdn\.com)/v3/(?P<domain_id>\d+)/videos/byid/(?P<id>\d+)'
|
||||
_TESTS = [{
|
||||
# movie
|
||||
'url': 'https://api.nexx.cloud/v3/748/videos/byid/128907',
|
||||
'md5': '16746bfc28c42049492385c989b26c4a',
|
||||
'info_dict': {
|
||||
'id': '128907',
|
||||
'ext': 'mp4',
|
||||
'title': 'Stiftung Warentest',
|
||||
'alt_title': 'Wie ein Test abläuft',
|
||||
'description': 'md5:d1ddb1ef63de721132abd38639cc2fd2',
|
||||
'release_year': 2013,
|
||||
'creator': 'SPIEGEL TV',
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
'duration': 2509,
|
||||
'timestamp': 1384264416,
|
||||
'upload_date': '20131112',
|
||||
},
|
||||
'params': {
|
||||
'format': 'bestvideo',
|
||||
},
|
||||
}, {
|
||||
# episode
|
||||
'url': 'https://api.nexx.cloud/v3/741/videos/byid/247858',
|
||||
'info_dict': {
|
||||
'id': '247858',
|
||||
'ext': 'mp4',
|
||||
'title': 'Return of the Golden Child (OV)',
|
||||
'description': 'md5:5d969537509a92b733de21bae249dc63',
|
||||
'release_year': 2017,
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
'duration': 1397,
|
||||
'timestamp': 1495033267,
|
||||
'upload_date': '20170517',
|
||||
'episode_number': 2,
|
||||
'season_number': 2,
|
||||
},
|
||||
'params': {
|
||||
'format': 'bestvideo',
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
'url': 'https://api.nexxcdn.com/v3/748/videos/byid/128907',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@staticmethod
|
||||
def _extract_urls(webpage):
|
||||
# Reference:
|
||||
# 1. https://nx-s.akamaized.net/files/201510/44.pdf
|
||||
|
||||
entries = []
|
||||
|
||||
# JavaScript Integration
|
||||
mobj = re.search(
|
||||
r'<script\b[^>]+\bsrc=["\']https?://require\.nexx(?:\.cloud|cdn\.com)/(?P<id>\d+)',
|
||||
webpage)
|
||||
if mobj:
|
||||
domain_id = mobj.group('id')
|
||||
for video_id in re.findall(
|
||||
r'(?is)onPLAYReady.+?_play\.init\s*\(.+?\s*,\s*["\']?(\d+)',
|
||||
webpage):
|
||||
entries.append(
|
||||
'https://api.nexx.cloud/v3/%s/videos/byid/%s'
|
||||
% (domain_id, video_id))
|
||||
|
||||
# TODO: support more embed formats
|
||||
|
||||
return entries
|
||||
|
||||
@staticmethod
|
||||
def _extract_url(webpage):
|
||||
return NexxIE._extract_urls(webpage)[0]
|
||||
|
||||
def _handle_error(self, response):
|
||||
status = int_or_none(try_get(
|
||||
response, lambda x: x['metadata']['status']) or 200)
|
||||
if 200 <= status < 300:
|
||||
return
|
||||
raise ExtractorError(
|
||||
'%s said: %s' % (self.IE_NAME, response['metadata']['errorhint']),
|
||||
expected=True)
|
||||
|
||||
def _call_api(self, domain_id, path, video_id, data=None, headers={}):
|
||||
headers['Content-Type'] = 'application/x-www-form-urlencoded; charset=UTF-8'
|
||||
result = self._download_json(
|
||||
'https://api.nexx.cloud/v3/%s/%s' % (domain_id, path), video_id,
|
||||
'Downloading %s JSON' % path, data=urlencode_postdata(data),
|
||||
headers=headers)
|
||||
self._handle_error(result)
|
||||
return result['result']
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
domain_id, video_id = mobj.group('domain_id', 'id')
|
||||
|
||||
# Reverse engineered from JS code (see getDeviceID function)
|
||||
device_id = '%d:%d:%d%d' % (
|
||||
random.randint(1, 4), int(time.time()),
|
||||
random.randint(1e4, 99999), random.randint(1, 9))
|
||||
|
||||
result = self._call_api(domain_id, 'session/init', video_id, data={
|
||||
'nxp_devh': device_id,
|
||||
'nxp_userh': '',
|
||||
'precid': '0',
|
||||
'playlicense': '0',
|
||||
'screenx': '1920',
|
||||
'screeny': '1080',
|
||||
'playerversion': '6.0.00',
|
||||
'gateway': 'html5',
|
||||
'adGateway': '',
|
||||
'explicitlanguage': 'en-US',
|
||||
'addTextTemplates': '1',
|
||||
'addDomainData': '1',
|
||||
'addAdModel': '1',
|
||||
}, headers={
|
||||
'X-Request-Enable-Auth-Fallback': '1',
|
||||
})
|
||||
|
||||
cid = result['general']['cid']
|
||||
|
||||
# As described in [1] X-Request-Token generation algorithm is
|
||||
# as follows:
|
||||
# md5( operation + domain_id + domain_secret )
|
||||
# where domain_secret is a static value that will be given by nexx.tv
|
||||
# as per [1]. Here is how this "secret" is generated (reversed
|
||||
# from _play.api.init function, search for clienttoken). So it's
|
||||
# actually not static and not that much of a secret.
|
||||
# 1. https://nexxtvstorage.blob.core.windows.net/files/201610/27.pdf
|
||||
secret = result['device']['clienttoken'][int(device_id[0]):]
|
||||
secret = secret[0:len(secret) - int(device_id[-1])]
|
||||
|
||||
op = 'byid'
|
||||
|
||||
# Reversed from JS code for _play.api.call function (search for
|
||||
# X-Request-Token)
|
||||
request_token = hashlib.md5(
|
||||
''.join((op, domain_id, secret)).encode('utf-8')).hexdigest()
|
||||
|
||||
video = self._call_api(
|
||||
domain_id, 'videos/%s/%s' % (op, video_id), video_id, data={
|
||||
'additionalfields': 'language,channel,actors,studio,licenseby,slug,subtitle,teaser,description',
|
||||
'addInteractionOptions': '1',
|
||||
'addStatusDetails': '1',
|
||||
'addStreamDetails': '1',
|
||||
'addCaptions': '1',
|
||||
'addScenes': '1',
|
||||
'addHotSpots': '1',
|
||||
'addBumpers': '1',
|
||||
'captionFormat': 'data',
|
||||
}, headers={
|
||||
'X-Request-CID': cid,
|
||||
'X-Request-Token': request_token,
|
||||
})
|
||||
|
||||
general = video['general']
|
||||
title = general['title']
|
||||
|
||||
stream_data = video['streamdata']
|
||||
language = general.get('language_raw') or ''
|
||||
|
||||
# TODO: reverse more cdns and formats
|
||||
|
||||
cdn = stream_data['cdnType']
|
||||
assert cdn == 'azure'
|
||||
|
||||
azure_locator = stream_data['azureLocator']
|
||||
|
||||
AZURE_URL = 'http://nx-p%02d.akamaized.net/'
|
||||
|
||||
for secure in ('s', ''):
|
||||
cdn_shield = stream_data.get('cdnShieldHTTP%s' % secure.upper())
|
||||
if cdn_shield:
|
||||
azure_base = 'http%s://%s' % (secure, cdn_shield)
|
||||
break
|
||||
else:
|
||||
azure_base = AZURE_URL % int(stream_data['azureAccount'].replace('nexxplayplus', ''))
|
||||
|
||||
is_ml = ',' in language
|
||||
azure_m3u8_url = '%s%s/%s_src%s.ism/Manifest(format=m3u8-aapl)' % (
|
||||
azure_base, azure_locator, video_id, ('_manifest' if is_ml else ''))
|
||||
|
||||
protection_token = try_get(
|
||||
video, lambda x: x['protectiondata']['token'], compat_str)
|
||||
if protection_token:
|
||||
azure_m3u8_url += '?hdnts=%s' % protection_token
|
||||
|
||||
formats = self._extract_m3u8_formats(
|
||||
azure_m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
|
||||
m3u8_id='%s-hls' % cdn)
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'alt_title': general.get('subtitle'),
|
||||
'description': general.get('description'),
|
||||
'release_year': int_or_none(general.get('year')),
|
||||
'creator': general.get('studio') or general.get('studio_adref'),
|
||||
'thumbnail': try_get(
|
||||
video, lambda x: x['imagedata']['thumb'], compat_str),
|
||||
'duration': parse_duration(general.get('runtime')),
|
||||
'timestamp': int_or_none(general.get('uploaded')),
|
||||
'episode_number': int_or_none(try_get(
|
||||
video, lambda x: x['episodedata']['episode'])),
|
||||
'season_number': int_or_none(try_get(
|
||||
video, lambda x: x['episodedata']['season'])),
|
||||
'formats': formats,
|
||||
}
|
||||
|
||||
|
||||
class NexxEmbedIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://embed\.nexx(?:\.cloud|cdn\.com)/\d+/(?P<id>[^/?#&]+)'
|
||||
_TEST = {
|
||||
'url': 'http://embed.nexx.cloud/748/KC1614647Z27Y7T?autoplay=1',
|
||||
'md5': '16746bfc28c42049492385c989b26c4a',
|
||||
'info_dict': {
|
||||
'id': '161464',
|
||||
'ext': 'mp4',
|
||||
'title': 'Nervenkitzel Achterbahn',
|
||||
'alt_title': 'Karussellbauer in Deutschland',
|
||||
'description': 'md5:ffe7b1cc59a01f585e0569949aef73cc',
|
||||
'release_year': 2005,
|
||||
'creator': 'SPIEGEL TV',
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
'duration': 2761,
|
||||
'timestamp': 1394021479,
|
||||
'upload_date': '20140305',
|
||||
},
|
||||
'params': {
|
||||
'format': 'bestvideo',
|
||||
'skip_download': True,
|
||||
},
|
||||
}
|
||||
|
||||
@staticmethod
|
||||
def _extract_urls(webpage):
|
||||
# Reference:
|
||||
# 1. https://nx-s.akamaized.net/files/201510/44.pdf
|
||||
|
||||
# iFrame Embed Integration
|
||||
return [mobj.group('url') for mobj in re.finditer(
|
||||
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.nexx(?:\.cloud|cdn\.com)/\d+/(?:(?!\1).)+)\1',
|
||||
webpage)]
|
||||
|
||||
def _real_extract(self, url):
|
||||
embed_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, embed_id)
|
||||
|
||||
return self.url_result(NexxIE._extract_url(webpage), ie=NexxIE.ie_key())
|
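Per the comments in the new nexx.py above, the `X-Request-Token` is just `md5(operation + domain_id + secret)`, where the "secret" is the session's `clienttoken` trimmed at both ends by the first and last digits of the locally generated device id. A sketch of that derivation with hypothetical token and domain values:

```python
import hashlib


def build_request_token(operation, domain_id, client_token, device_id):
    # Drop int(device_id[0]) characters from the front and int(device_id[-1])
    # from the end of the client token, then hash operation + domain_id + secret.
    secret = client_token[int(device_id[0]):]
    secret = secret[:len(secret) - int(device_id[-1])]
    return hashlib.md5(
        ''.join((operation, domain_id, secret)).encode('utf-8')).hexdigest()


# Hypothetical values, matching the '%d:%d:%d%d' device id format used above
device_id = '3:1502000000:123454'
print(build_request_token('byid', '748', 'abcdefghijklmnop', device_id))
```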
@ -12,6 +12,7 @@ class NickIE(MTVServicesInfoExtractor):
|
||||
IE_NAME = 'nick.com'
|
||||
_VALID_URL = r'https?://(?:(?:www|beta)\.)?nick(?:jr)?\.com/(?:[^/]+/)?(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'
|
||||
_FEED_URL = 'http://udat.mtvnservices.com/service1/dispatch.htm'
|
||||
_GEO_COUNTRIES = ['US']
|
||||
_TESTS = [{
|
||||
'url': 'http://www.nick.com/videos/clip/alvinnn-and-the-chipmunks-112-full-episode.html',
|
||||
'playlist': [
|
||||
@ -74,7 +75,7 @@ class NickIE(MTVServicesInfoExtractor):
|
||||
|
||||
class NickDeIE(MTVServicesInfoExtractor):
|
||||
IE_NAME = 'nick.de'
|
||||
_VALID_URL = r'https?://(?:www\.)?(?P<host>nick\.de|nickelodeon\.(?:nl|at))/(?:playlist|shows)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
|
||||
_VALID_URL = r'https?://(?:www\.)?(?P<host>nick\.(?:de|com\.pl)|nickelodeon\.(?:nl|at))/[^/]+/(?:[^/]+/)*(?P<id>[^/?#&]+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.nick.de/playlist/3773-top-videos/videos/episode/17306-zu-wasser-und-zu-land-rauchende-erdnusse',
|
||||
'only_matching': True,
|
||||
@ -87,6 +88,9 @@ class NickDeIE(MTVServicesInfoExtractor):
|
||||
}, {
|
||||
'url': 'http://www.nickelodeon.at/playlist/3773-top-videos/videos/episode/77993-das-letzte-gefecht',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://www.nick.com.pl/seriale/474-spongebob-kanciastoporty/wideo/17412-teatr-to-jest-to-rodeo-oszolom',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _extract_mrss_url(self, webpage, host):
|
||||
@ -124,3 +128,21 @@ class NickNightIE(NickDeIE):
|
||||
return self._search_regex(
|
||||
r'mrss\s*:\s*(["\'])(?P<url>http.+?)\1', webpage,
|
||||
'mrss url', group='url')
|
||||
|
||||
|
||||
class NickRuIE(MTVServicesInfoExtractor):
|
||||
IE_NAME = 'nickelodeonru'
|
||||
_VALID_URL = r'https?://(?:www\.)nickelodeon\.ru/(?:playlist|shows|videos)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.nickelodeon.ru/shows/henrydanger/videos/episodes/3-sezon-15-seriya-licenziya-na-polyot/pmomfb#playlist/7airc6',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://www.nickelodeon.ru/videos/smotri-na-nickelodeon-v-iyule/g9hvh7',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
mgid = self._extract_mgid(webpage)
|
||||
return self.url_result('http://media.mtvnservices.com/embed/%s' % mgid)
|
||||
|
@ -1,23 +1,27 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
import json
|
||||
import datetime
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import (
|
||||
compat_parse_qs,
|
||||
compat_urlparse,
|
||||
)
|
||||
from ..utils import (
|
||||
determine_ext,
|
||||
dict_get,
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
float_or_none,
|
||||
parse_duration,
|
||||
parse_iso8601,
|
||||
sanitized_Request,
|
||||
xpath_text,
|
||||
determine_ext,
|
||||
remove_start,
|
||||
try_get,
|
||||
unified_timestamp,
|
||||
urlencode_postdata,
|
||||
xpath_text,
|
||||
)
|
||||
|
||||
|
||||
@ -32,12 +36,15 @@ class NiconicoIE(InfoExtractor):
|
||||
'id': 'sm22312215',
|
||||
'ext': 'mp4',
|
||||
'title': 'Big Buck Bunny',
|
||||
'thumbnail': r're:https?://.*',
|
||||
'uploader': 'takuya0301',
|
||||
'uploader_id': '2698420',
|
||||
'upload_date': '20131123',
|
||||
'timestamp': 1385182762,
|
||||
'description': '(c) copyright 2008, Blender Foundation / www.bigbuckbunny.org',
|
||||
'duration': 33,
|
||||
'view_count': int,
|
||||
'comment_count': int,
|
||||
},
|
||||
'skip': 'Requires an account',
|
||||
}, {
|
||||
@ -49,6 +56,7 @@ class NiconicoIE(InfoExtractor):
|
||||
'ext': 'swf',
|
||||
'title': '【鏡音リン】Dance on media【オリジナル】take2!',
|
||||
'description': 'md5:689f066d74610b3b22e0f1739add0f58',
|
||||
'thumbnail': r're:https?://.*',
|
||||
'uploader': 'りょうた',
|
||||
'uploader_id': '18822557',
|
||||
'upload_date': '20110429',
|
||||
@ -65,9 +73,11 @@ class NiconicoIE(InfoExtractor):
|
||||
'ext': 'unknown_video',
|
||||
'description': 'deleted',
|
||||
'title': 'ドラえもんエターナル第3話「決戦第3新東京市」<前編>',
|
||||
'thumbnail': r're:https?://.*',
|
||||
'upload_date': '20071224',
|
||||
'timestamp': int, # timestamp field has different value if logged in
|
||||
'duration': 304,
|
||||
'view_count': int,
|
||||
},
|
||||
'skip': 'Requires an account',
|
||||
}, {
|
||||
@ -77,15 +87,57 @@ class NiconicoIE(InfoExtractor):
|
||||
'ext': 'mp4',
|
||||
'title': '【第1回】RADIOアニメロミックス ラブライブ!~のぞえりRadio Garden~',
|
||||
'description': 'md5:b27d224bb0ff53d3c8269e9f8b561cf1',
|
||||
'thumbnail': r're:https?://.*',
|
||||
'timestamp': 1388851200,
|
||||
'upload_date': '20140104',
|
||||
'uploader': 'アニメロチャンネル',
|
||||
'uploader_id': '312',
|
||||
},
|
||||
'skip': 'The viewing period of the video you were searching for has expired.',
|
||||
}, {
|
||||
# video not available via `getflv`; "old" HTML5 video
|
||||
'url': 'http://www.nicovideo.jp/watch/sm1151009',
|
||||
'md5': '8fa81c364eb619d4085354eab075598a',
|
||||
'info_dict': {
|
||||
'id': 'sm1151009',
|
||||
'ext': 'mp4',
|
||||
'title': 'マスターシステム本体内蔵のスペハリのメインテーマ(PSG版)',
|
||||
'description': 'md5:6ee077e0581ff5019773e2e714cdd0b7',
|
||||
'thumbnail': r're:https?://.*',
|
||||
'duration': 184,
|
||||
'timestamp': 1190868283,
|
||||
'upload_date': '20070927',
|
||||
'uploader': 'denden2',
|
||||
'uploader_id': '1392194',
|
||||
'view_count': int,
|
||||
'comment_count': int,
|
||||
},
|
||||
'skip': 'Requires an account',
|
||||
}, {
|
||||
# "New" HTML5 video
|
||||
'url': 'http://www.nicovideo.jp/watch/sm31464864',
|
||||
'md5': '351647b4917660986dc0fa8864085135',
|
||||
'info_dict': {
|
||||
'id': 'sm31464864',
|
||||
'ext': 'mp4',
|
||||
'title': '新作TVアニメ「戦姫絶唱シンフォギアAXZ」PV 最高画質',
|
||||
'description': 'md5:e52974af9a96e739196b2c1ca72b5feb',
|
||||
'timestamp': 1498514060,
|
||||
'upload_date': '20170626',
|
||||
'uploader': 'ゲス',
|
||||
'uploader_id': '40826363',
|
||||
'thumbnail': r're:https?://.*',
|
||||
'duration': 198,
|
||||
'view_count': int,
|
||||
'comment_count': int,
|
||||
},
|
||||
'skip': 'Requires an account',
|
||||
}, {
|
||||
'url': 'http://sp.nicovideo.jp/watch/sm28964488?ss_pos=1&cp_in=wt_tg',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
_VALID_URL = r'https?://(?:www\.|secure\.)?nicovideo\.jp/watch/(?P<id>(?:[a-z]{2})?[0-9]+)'
|
||||
_VALID_URL = r'https?://(?:www\.|secure\.|sp\.)?nicovideo\.jp/watch/(?P<id>(?:[a-z]{2})?[0-9]+)'
|
||||
_NETRC_MACHINE = 'niconico'
|
||||
|
||||
def _real_initialize(self):
|
||||
@ -98,19 +150,102 @@ class NiconicoIE(InfoExtractor):
|
||||
return True
|
||||
|
||||
# Log in
|
||||
login_ok = True
|
||||
login_form_strs = {
|
||||
'mail': username,
|
||||
'mail_tel': username,
|
||||
'password': password,
|
||||
}
|
||||
login_data = urlencode_postdata(login_form_strs)
|
||||
request = sanitized_Request(
|
||||
'https://secure.nicovideo.jp/secure/login', login_data)
|
||||
login_results = self._download_webpage(
|
||||
request, None, note='Logging in', errnote='Unable to log in')
|
||||
if re.search(r'(?i)<h1 class="mb8p4">Log in error</h1>', login_results) is not None:
|
||||
urlh = self._request_webpage(
|
||||
'https://account.nicovideo.jp/api/v1/login', None,
|
||||
note='Logging in', errnote='Unable to log in',
|
||||
data=urlencode_postdata(login_form_strs))
|
||||
if urlh is False:
|
||||
login_ok = False
|
||||
else:
|
||||
parts = compat_urlparse.urlparse(urlh.geturl())
|
||||
if compat_parse_qs(parts.query).get('message', [None])[0] == 'cant_login':
|
||||
login_ok = False
|
||||
if not login_ok:
|
||||
self._downloader.report_warning('unable to log in: bad username or password')
|
||||
return False
|
||||
return True
|
||||
return login_ok
|
||||
|
||||
def _extract_format_for_quality(self, api_data, video_id, audio_quality, video_quality):
|
||||
def yesno(boolean):
|
||||
return 'yes' if boolean else 'no'
|
||||
|
||||
session_api_data = api_data['video']['dmcInfo']['session_api']
|
||||
session_api_endpoint = session_api_data['urls'][0]
|
||||
|
||||
format_id = '-'.join(map(lambda s: remove_start(s['id'], 'archive_'), [video_quality, audio_quality]))
|
||||
|
||||
session_response = self._download_json(
|
||||
session_api_endpoint['url'], video_id,
|
||||
query={'_format': 'json'},
|
||||
headers={'Content-Type': 'application/json'},
|
||||
note='Downloading JSON metadata for %s' % format_id,
|
||||
data=json.dumps({
|
||||
'session': {
|
||||
'client_info': {
|
||||
'player_id': session_api_data['player_id'],
|
||||
},
|
||||
'content_auth': {
|
||||
'auth_type': session_api_data['auth_types'][session_api_data['protocols'][0]],
|
||||
'content_key_timeout': session_api_data['content_key_timeout'],
|
||||
'service_id': 'nicovideo',
|
||||
'service_user_id': session_api_data['service_user_id']
|
||||
},
|
||||
'content_id': session_api_data['content_id'],
|
||||
'content_src_id_sets': [{
|
||||
'content_src_ids': [{
|
||||
'src_id_to_mux': {
|
||||
'audio_src_ids': [audio_quality['id']],
|
||||
'video_src_ids': [video_quality['id']],
|
||||
}
|
||||
}]
|
||||
}],
|
||||
'content_type': 'movie',
|
||||
'content_uri': '',
|
||||
'keep_method': {
|
||||
'heartbeat': {
|
||||
'lifetime': session_api_data['heartbeat_lifetime']
|
||||
}
|
||||
},
|
||||
'priority': session_api_data['priority'],
|
||||
'protocol': {
|
||||
'name': 'http',
|
||||
'parameters': {
|
||||
'http_parameters': {
|
||||
'parameters': {
|
||||
'http_output_download_parameters': {
|
||||
'use_ssl': yesno(session_api_endpoint['is_ssl']),
|
||||
'use_well_known_port': yesno(session_api_endpoint['is_well_known_port']),
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
'recipe_id': session_api_data['recipe_id'],
|
||||
'session_operation_auth': {
|
||||
'session_operation_auth_by_signature': {
|
||||
'signature': session_api_data['signature'],
|
||||
'token': session_api_data['token'],
|
||||
}
|
||||
},
|
||||
'timing_constraint': 'unlimited'
|
||||
}
|
||||
}))
|
||||
|
||||
resolution = video_quality.get('resolution', {})
|
||||
|
||||
return {
|
||||
'url': session_response['data']['session']['content_uri'],
|
||||
'format_id': format_id,
|
||||
'ext': 'mp4', # Session API are used in HTML5, which always serves mp4
|
||||
'abr': float_or_none(audio_quality.get('bitrate'), 1000),
|
||||
'vbr': float_or_none(video_quality.get('bitrate'), 1000),
|
||||
'height': resolution.get('height'),
|
||||
'width': resolution.get('width'),
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
@ -123,30 +258,84 @@ class NiconicoIE(InfoExtractor):
|
||||
if video_id.startswith('so'):
|
||||
video_id = self._match_id(handle.geturl())
|
||||
|
||||
video_info = self._download_xml(
|
||||
'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id, video_id,
|
||||
note='Downloading video info page')
|
||||
api_data = self._parse_json(self._html_search_regex(
|
||||
'data-api-data="([^"]+)"', webpage,
|
||||
'API data', default='{}'), video_id)
|
||||
|
||||
# Get flv info
|
||||
flv_info_webpage = self._download_webpage(
|
||||
'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
|
||||
video_id, 'Downloading flv info')
|
||||
def _format_id_from_url(video_url):
|
||||
return 'economy' if video_real_url.endswith('low') else 'normal'
|
||||
|
||||
flv_info = compat_urlparse.parse_qs(flv_info_webpage)
|
||||
if 'url' not in flv_info:
|
||||
if 'deleted' in flv_info:
|
||||
raise ExtractorError('The video has been deleted.',
|
||||
expected=True)
|
||||
elif 'closed' in flv_info:
|
||||
raise ExtractorError('Niconico videos now require logging in',
|
||||
expected=True)
|
||||
else:
|
||||
raise ExtractorError('Unable to find video URL')
|
||||
try:
|
||||
video_real_url = api_data['video']['smileInfo']['url']
|
||||
except KeyError: # Flash videos
|
||||
# Get flv info
|
||||
flv_info_webpage = self._download_webpage(
|
||||
'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
|
||||
video_id, 'Downloading flv info')
|
||||
|
||||
video_real_url = flv_info['url'][0]
|
||||
flv_info = compat_urlparse.parse_qs(flv_info_webpage)
|
||||
if 'url' not in flv_info:
|
||||
if 'deleted' in flv_info:
|
||||
raise ExtractorError('The video has been deleted.',
|
||||
expected=True)
|
||||
elif 'closed' in flv_info:
|
||||
raise ExtractorError('Niconico videos now require logging in',
|
||||
expected=True)
|
||||
elif 'error' in flv_info:
|
||||
raise ExtractorError('%s reports error: %s' % (
|
||||
self.IE_NAME, flv_info['error'][0]), expected=True)
|
||||
else:
|
||||
raise ExtractorError('Unable to find video URL')
|
||||
|
||||
video_info_xml = self._download_xml(
|
||||
'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id,
|
||||
video_id, note='Downloading video info page')
|
||||
|
||||
def get_video_info(items):
|
||||
if not isinstance(items, list):
|
||||
items = [items]
|
||||
for item in items:
|
||||
ret = xpath_text(video_info_xml, './/' + item)
|
||||
if ret:
|
||||
return ret
|
||||
|
||||
video_real_url = flv_info['url'][0]
|
||||
|
||||
extension = get_video_info('movie_type')
|
||||
if not extension:
|
||||
extension = determine_ext(video_real_url)
|
||||
|
||||
formats = [{
|
||||
'url': video_real_url,
|
||||
'ext': extension,
|
||||
'format_id': _format_id_from_url(video_real_url),
|
||||
}]
|
||||
else:
|
||||
formats = []
|
||||
|
||||
dmc_info = api_data['video'].get('dmcInfo')
|
||||
if dmc_info: # "New" HTML5 videos
|
||||
quality_info = dmc_info['quality']
|
||||
for audio_quality in quality_info['audios']:
|
||||
for video_quality in quality_info['videos']:
|
||||
if not audio_quality['available'] or not video_quality['available']:
|
||||
continue
|
||||
formats.append(self._extract_format_for_quality(
|
||||
api_data, video_id, audio_quality, video_quality))
|
||||
|
||||
self._sort_formats(formats)
|
||||
else: # "Old" HTML5 videos
|
||||
formats = [{
|
||||
'url': video_real_url,
|
||||
'ext': 'mp4',
|
||||
'format_id': _format_id_from_url(video_real_url),
|
||||
}]
|
||||
|
||||
def get_video_info(items):
|
||||
return dict_get(api_data['video'], items)
|
||||
|
||||
# Start extracting information
|
||||
title = xpath_text(video_info, './/title')
|
||||
title = get_video_info('title')
|
||||
if not title:
|
||||
title = self._og_search_title(webpage, default=None)
|
||||
if not title:
|
||||
@ -160,18 +349,15 @@ class NiconicoIE(InfoExtractor):
|
||||
watch_api_data = self._parse_json(watch_api_data_string, video_id) if watch_api_data_string else {}
|
||||
video_detail = watch_api_data.get('videoDetail', {})
|
||||
|
||||
extension = xpath_text(video_info, './/movie_type')
|
||||
if not extension:
|
||||
extension = determine_ext(video_real_url)
|
||||
|
||||
thumbnail = (
|
||||
xpath_text(video_info, './/thumbnail_url') or
|
||||
get_video_info(['thumbnail_url', 'thumbnailURL']) or
|
||||
self._html_search_meta('image', webpage, 'thumbnail', default=None) or
|
||||
video_detail.get('thumbnail'))
|
||||
|
||||
description = xpath_text(video_info, './/description')
|
||||
description = get_video_info('description')
|
||||
|
||||
timestamp = parse_iso8601(xpath_text(video_info, './/first_retrieve'))
|
||||
timestamp = (parse_iso8601(get_video_info('first_retrieve')) or
|
||||
unified_timestamp(get_video_info('postedDateTime')))
|
||||
if not timestamp:
|
||||
match = self._html_search_meta('datePublished', webpage, 'date published', default=None)
|
||||
if match:
|
||||
@ -181,7 +367,7 @@ class NiconicoIE(InfoExtractor):
|
||||
video_detail['postedAt'].replace('/', '-'),
|
||||
delimiter=' ', timezone=datetime.timedelta(hours=9))
|
||||
|
||||
view_count = int_or_none(xpath_text(video_info, './/view_counter'))
|
||||
view_count = int_or_none(get_video_info(['view_counter', 'viewCount']))
|
||||
if not view_count:
|
||||
match = self._html_search_regex(
|
||||
r'>Views: <strong[^>]*>([^<]+)</strong>',
|
||||
@ -190,38 +376,33 @@ class NiconicoIE(InfoExtractor):
|
||||
view_count = int_or_none(match.replace(',', ''))
|
||||
view_count = view_count or video_detail.get('viewCount')
|
||||
|
||||
comment_count = int_or_none(xpath_text(video_info, './/comment_num'))
|
||||
comment_count = (int_or_none(get_video_info('comment_num')) or
|
||||
video_detail.get('commentCount') or
|
||||
try_get(api_data, lambda x: x['thread']['commentCount']))
|
||||
if not comment_count:
|
||||
match = self._html_search_regex(
|
||||
r'>Comments: <strong[^>]*>([^<]+)</strong>',
|
||||
webpage, 'comment count', default=None)
|
||||
if match:
|
||||
comment_count = int_or_none(match.replace(',', ''))
|
||||
comment_count = comment_count or video_detail.get('commentCount')
|
||||
|
||||
duration = (parse_duration(
|
||||
xpath_text(video_info, './/length') or
|
||||
get_video_info('length') or
|
||||
self._html_search_meta(
|
||||
'video:duration', webpage, 'video duration', default=None)) or
|
||||
video_detail.get('length'))
|
||||
video_detail.get('length') or
|
||||
get_video_info('duration'))
|
||||
|
||||
webpage_url = xpath_text(video_info, './/watch_url') or url
|
||||
webpage_url = get_video_info('watch_url') or url
|
||||
|
||||
if video_info.find('.//ch_id') is not None:
|
||||
uploader_id = video_info.find('.//ch_id').text
|
||||
uploader = video_info.find('.//ch_name').text
|
||||
elif video_info.find('.//user_id') is not None:
|
||||
uploader_id = video_info.find('.//user_id').text
|
||||
uploader = video_info.find('.//user_nickname').text
|
||||
else:
|
||||
uploader_id = uploader = None
|
||||
owner = api_data.get('owner', {})
|
||||
uploader_id = get_video_info(['ch_id', 'user_id']) or owner.get('id')
|
||||
uploader = get_video_info(['ch_name', 'user_nickname']) or owner.get('nickname')
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'url': video_real_url,
|
||||
'title': title,
|
||||
'ext': extension,
|
||||
'format_id': 'economy' if video_real_url.endswith('low') else 'normal',
|
||||
'formats': formats,
|
||||
'thumbnail': thumbnail,
|
||||
'description': description,
|
||||
'uploader': uploader,
|
||||
|
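The DMC branch at the top of this hunk walks every audio/video quality combination reported by the API and skips any pair where either side is unavailable. A minimal standalone sketch of that pairing logic, not part of the upstream diff, using a made-up quality_info dict rather than the real Niconico DMC payload:

# Standalone sketch (hypothetical data, not the actual Niconico DMC response).
quality_info = {
    'audios': [
        {'id': 'archive_aac_128kbps', 'available': True},
        {'id': 'archive_aac_64kbps', 'available': False},
    ],
    'videos': [
        {'id': 'archive_h264_1080p', 'available': True},
        {'id': 'archive_h264_360p', 'available': True},
    ],
}

formats = []
for audio_quality in quality_info['audios']:
    for video_quality in quality_info['videos']:
        # Skip combinations the server reports as unavailable.
        if not audio_quality['available'] or not video_quality['available']:
            continue
        formats.append('%s-%s' % (video_quality['id'], audio_quality['id']))

print(formats)  # the two usable video/audio pairings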
@ -28,7 +28,7 @@ class NPOBaseIE(InfoExtractor):
|
||||
|
||||
class NPOIE(NPOBaseIE):
|
||||
IE_NAME = 'npo'
|
||||
IE_DESC = 'npo.nl and ntr.nl'
|
||||
IE_DESC = 'npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl'
|
||||
_VALID_URL = r'''(?x)
|
||||
(?:
|
||||
npo:|
|
||||
@ -38,7 +38,7 @@ class NPOIE(NPOBaseIE):
|
||||
npo\.nl/(?!(?:live|radio)/)(?:[^/]+/){2}|
|
||||
ntr\.nl/(?:[^/]+/){2,}|
|
||||
omroepwnl\.nl/video/fragment/[^/]+__|
|
||||
zapp\.nl/[^/]+/[^/]+/
|
||||
(?:zapp|npo3)\.nl/(?:[^/]+/){2}
|
||||
)
|
||||
)
|
||||
(?P<id>[^/?#]+)
|
||||
@ -146,6 +146,9 @@ class NPOIE(NPOBaseIE):
|
||||
}, {
|
||||
'url': 'http://www.zapp.nl/beste-vrienden-quiz/extra-video-s/WO_NTR_1067990',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://www.npo3.nl/3onderzoekt/16-09-2015/VPWON_1239870',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# live stream
|
||||
'url': 'npo:LI_NL1_4188102',
|
||||
@ -341,7 +344,7 @@ class NPOLiveIE(NPOBaseIE):
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
live_id = self._search_regex(
|
||||
r'data-prid="([^"]+)"', webpage, 'live id')
|
||||
[r'media-id="([^"]+)"', r'data-prid="([^"]+)"'], webpage, 'live id')
|
||||
|
||||
return {
|
||||
'_type': 'url_transparent',
|
||||
@@ -237,7 +237,7 @@ class NRKTVIE(NRKBaseIE):
                        (?:/\d{2}-\d{2}-\d{4})?
                        (?:\#del=(?P<part_id>\d+))?
                    ''' % _EPISODE_RE
    _API_HOST = 'psapi-we.nrk.no'
    _API_HOST = 'psapi-ne.nrk.no'

    _TESTS = [{
        'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
@ -11,6 +11,7 @@ from ..utils import (
|
||||
get_element_by_class,
|
||||
int_or_none,
|
||||
js_to_json,
|
||||
NO_DEFAULT,
|
||||
parse_iso8601,
|
||||
remove_start,
|
||||
strip_or_none,
|
||||
@ -198,6 +199,19 @@ class OnetPlIE(InfoExtractor):
|
||||
'upload_date': '20170214',
|
||||
'timestamp': 1487078046,
|
||||
},
|
||||
}, {
|
||||
# embedded via pulsembed
|
||||
'url': 'http://film.onet.pl/pensjonat-nad-rozlewiskiem-relacja-z-planu-serialu/y428n0',
|
||||
'info_dict': {
|
||||
'id': '501235.965429946',
|
||||
'ext': 'mp4',
|
||||
'title': '"Pensjonat nad rozlewiskiem": relacja z planu serialu',
|
||||
'upload_date': '20170622',
|
||||
'timestamp': 1498159955,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
'url': 'http://film.onet.pl/zwiastuny/ghost-in-the-shell-drugi-zwiastun-pl/5q6yl3',
|
||||
'only_matching': True,
|
||||
@ -212,13 +226,25 @@ class OnetPlIE(InfoExtractor):
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _search_mvp_id(self, webpage, default=NO_DEFAULT):
|
||||
return self._search_regex(
|
||||
r'data-(?:params-)?mvp=["\'](\d+\.\d+)', webpage, 'mvp id',
|
||||
default=default)
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
mvp_id = self._search_regex(
|
||||
r'data-params-mvp=["\'](\d+\.\d+)', webpage, 'mvp id')
|
||||
mvp_id = self._search_mvp_id(webpage, default=None)
|
||||
|
||||
if not mvp_id:
|
||||
pulsembed_url = self._search_regex(
|
||||
r'data-src=(["\'])(?P<url>(?:https?:)?//pulsembed\.eu/.+?)\1',
|
||||
webpage, 'pulsembed url', group='url')
|
||||
webpage = self._download_webpage(
|
||||
pulsembed_url, video_id, 'Downloading pulsembed webpage')
|
||||
mvp_id = self._search_mvp_id(webpage)
|
||||
|
||||
return self.url_result(
|
||||
'onetmvp:%s' % mvp_id, OnetMVPIE.ie_key(), video_id=mvp_id)
|
||||
|
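The Onet change above makes the mvp id lookup a two-step affair: search the article page first and, failing that, follow the pulsembed iframe and repeat the search there. A rough standalone sketch of that fallback, not part of the diff; the regexes are copied from the hunk, while the HTML snippets and the fetch stand-in are invented for illustration:

import re

def search_mvp_id(html):
    m = re.search(r'data-(?:params-)?mvp=["\'](\d+\.\d+)', html)
    return m.group(1) if m else None

def fake_download(url):
    # Stand-in for a real HTTP fetch of the pulsembed page.
    return '<div data-mvp="501235.965429946"></div>'

article_html = '<div data-src="//pulsembed.eu/abc"></div>'
mvp_id = search_mvp_id(article_html)
if not mvp_id:
    m = re.search(
        r'data-src=(["\'])(?P<url>(?:https?:)?//pulsembed\.eu/.+?)\1', article_html)
    mvp_id = search_mvp_id(fake_download(m.group('url')))
print(mvp_id)  # 501235.965429946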
@ -3,12 +3,14 @@ import re
|
||||
import base64
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
float_or_none,
|
||||
ExtractorError,
|
||||
unsmuggle_url,
|
||||
determine_ext,
|
||||
ExtractorError,
|
||||
float_or_none,
|
||||
int_or_none,
|
||||
try_get,
|
||||
unsmuggle_url,
|
||||
)
|
||||
from ..compat import compat_urllib_parse_urlencode
|
||||
|
||||
@ -39,13 +41,15 @@ class OoyalaBaseIE(InfoExtractor):
|
||||
formats = []
|
||||
if cur_auth_data['authorized']:
|
||||
for stream in cur_auth_data['streams']:
|
||||
s_url = base64.b64decode(
|
||||
stream['url']['data'].encode('ascii')).decode('utf-8')
|
||||
if s_url in urls:
|
||||
url_data = try_get(stream, lambda x: x['url']['data'], compat_str)
|
||||
if not url_data:
|
||||
continue
|
||||
s_url = base64.b64decode(url_data.encode('ascii')).decode('utf-8')
|
||||
if not s_url or s_url in urls:
|
||||
continue
|
||||
urls.append(s_url)
|
||||
ext = determine_ext(s_url, None)
|
||||
delivery_type = stream['delivery_type']
|
||||
delivery_type = stream.get('delivery_type')
|
||||
if delivery_type == 'hls' or ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
re.sub(r'/ip(?:ad|hone)/', '/all/', s_url), embed_code, 'mp4', 'm3u8_native',
|
||||
@ -65,7 +69,7 @@ class OoyalaBaseIE(InfoExtractor):
|
||||
else:
|
||||
formats.append({
|
||||
'url': s_url,
|
||||
'ext': ext or stream.get('delivery_type'),
|
||||
'ext': ext or delivery_type,
|
||||
'vcodec': stream.get('video_codec'),
|
||||
'format_id': delivery_type,
|
||||
'width': int_or_none(stream.get('width')),
|
||||
@ -136,6 +140,11 @@ class OoyalaIE(OoyalaBaseIE):
|
||||
'title': 'Divide Tool Path.mp4',
|
||||
'duration': 204.405,
|
||||
}
|
||||
},
|
||||
{
|
||||
# empty stream['url']['data']
|
||||
'url': 'http://player.ooyala.com/player.js?embedCode=w2bnZtYjE6axZ_dw1Cd0hQtXd_ige2Is',
|
||||
'only_matching': True,
|
||||
}
|
||||
]
|
||||
|
||||
|
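The Ooyala hunk above stops assuming stream['url']['data'] is always present: the nested value is fetched defensively and only then base64-decoded. A small sketch of the same idea with plain dict handling instead of youtube-dl's try_get helper; the stream dict below is invented:

import base64

stream = {
    'url': {'data': base64.b64encode(b'https://example.com/video.m3u8').decode('ascii')},
    'delivery_type': 'hls',
}

url_data = (stream.get('url') or {}).get('data')
if url_data:
    s_url = base64.b64decode(url_data.encode('ascii')).decode('utf-8')
    print(stream.get('delivery_type'), s_url)
else:
    print('no stream URL, skipping')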
@ -10,13 +10,13 @@ from ..utils import (
|
||||
|
||||
class PandaTVIE(InfoExtractor):
|
||||
IE_DESC = '熊猫TV'
|
||||
_VALID_URL = r'http://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
|
||||
_TEST = {
|
||||
'url': 'http://www.panda.tv/10091',
|
||||
_VALID_URL = r'https?://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.panda.tv/66666',
|
||||
'info_dict': {
|
||||
'id': '10091',
|
||||
'id': '66666',
|
||||
'title': 're:.+',
|
||||
'uploader': '囚徒',
|
||||
'uploader': '刘杀鸡',
|
||||
'ext': 'flv',
|
||||
'is_live': True,
|
||||
},
|
||||
@ -24,13 +24,16 @@ class PandaTVIE(InfoExtractor):
|
||||
'skip_download': True,
|
||||
},
|
||||
'skip': 'Live stream is offline',
|
||||
}
|
||||
}, {
|
||||
'url': 'https://www.panda.tv/66666',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
config = self._download_json(
|
||||
'http://www.panda.tv/api_room?roomid=%s' % video_id, video_id)
|
||||
'https://www.panda.tv/api_room?roomid=%s' % video_id, video_id)
|
||||
|
||||
error_code = config.get('errno', 0)
|
||||
if error_code is not 0:
|
||||
@ -74,7 +77,7 @@ class PandaTVIE(InfoExtractor):
|
||||
continue
|
||||
for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
|
||||
formats.append({
|
||||
'url': 'http://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s'
|
||||
'url': 'https://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s'
|
||||
% (pl, plflag1, room_key, live_panda, suffix[quality], ext),
|
||||
'format_id': '%s-%s' % (k, ext),
|
||||
'quality': quality,
|
||||
|
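For PandaTV the interesting pattern is the nested loop near the end of the hunk: for every quality level it emits both an HLS and an FLV variant by filling in a URL template. A simplified sketch of that loop, not part of the diff; the plflag1, room_key, live_panda and suffix values are invented stand-ins for fields of the api_room response, and the format id uses a plain index where the real code uses the quality key:

# Hypothetical values standing in for fields of the api_room JSON.
plflag1 = '3'
room_key = 'abcdef0123456789'
live_panda = ''
suffix = ['_small', '_mid', '']

formats = []
for quality in range(len(suffix)):
    for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
        formats.append({
            'url': 'https://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s'
                   % (pl, plflag1, room_key, live_panda, suffix[quality], ext),
            'format_id': '%d-%s' % (quality, ext),
            'quality': quality,
            'preference': pref,
        })

for f in formats:
    print(f['format_id'], f['url'])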
@ -189,7 +189,7 @@ class PBSIE(InfoExtractor):
|
||||
# Direct video URL
|
||||
(?:%s)/(?:viralplayer|video)/(?P<id>[0-9]+)/? |
|
||||
# Article with embedded player (or direct video)
|
||||
(?:www\.)?pbs\.org/(?:[^/]+/){2,5}(?P<presumptive_id>[^/]+?)(?:\.html)?/?(?:$|[?\#]) |
|
||||
(?:www\.)?pbs\.org/(?:[^/]+/){1,5}(?P<presumptive_id>[^/]+?)(?:\.html)?/?(?:$|[?\#]) |
|
||||
# Player
|
||||
(?:video|player)\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+)/
|
||||
)
|
||||
@ -345,6 +345,21 @@ class PBSIE(InfoExtractor):
|
||||
'formats': 'mincount:8',
|
||||
},
|
||||
},
|
||||
{
|
||||
# https://github.com/rg3/youtube-dl/issues/13801
|
||||
'url': 'https://www.pbs.org/video/pbs-newshour-full-episode-july-31-2017-1501539057/',
|
||||
'info_dict': {
|
||||
'id': '3003333873',
|
||||
'ext': 'mp4',
|
||||
'title': 'PBS NewsHour - full episode July 31, 2017',
|
||||
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
|
||||
'duration': 3265,
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
},
|
||||
{
|
||||
'url': 'http://player.pbs.org/widget/partnerplayer/2365297708/?start=0&end=0&chapterbar=false&endscreen=false&topbar=true',
|
||||
'only_matching': True,
|
||||
@ -433,6 +448,9 @@ class PBSIE(InfoExtractor):
|
||||
if url:
|
||||
break
|
||||
|
||||
if not url:
|
||||
url = self._og_search_url(webpage)
|
||||
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
|
||||
player_id = mobj.group('player_id')
|
||||
|
63 youtube_dl/extractor/pearvideo.py Normal file
@@ -0,0 +1,63 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    qualities,
    unified_timestamp,
)


class PearVideoIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?pearvideo\.com/video_(?P<id>\d+)'
    _TEST = {
        'url': 'http://www.pearvideo.com/video_1076290',
        'info_dict': {
            'id': '1076290',
            'ext': 'mp4',
            'title': '小浣熊在主人家玻璃上滚石头:没砸',
            'description': 'md5:01d576b747de71be0ee85eb7cac25f9d',
            'timestamp': 1494275280,
            'upload_date': '20170508',
        }
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)

        webpage = self._download_webpage(url, video_id)

        quality = qualities(
            ('ldflv', 'ld', 'sdflv', 'sd', 'hdflv', 'hd', 'src'))

        formats = [{
            'url': mobj.group('url'),
            'format_id': mobj.group('id'),
            'quality': quality(mobj.group('id')),
        } for mobj in re.finditer(
            r'(?P<id>[a-zA-Z]+)Url\s*=\s*(["\'])(?P<url>(?:https?:)?//.+?)\2',
            webpage)]
        self._sort_formats(formats)

        title = self._search_regex(
            (r'<h1[^>]+\bclass=(["\'])video-tt\1[^>]*>(?P<value>[^<]+)',
             r'<[^>]+\bdata-title=(["\'])(?P<value>(?:(?!\1).)+)\1'),
            webpage, 'title', group='value')
        description = self._search_regex(
            (r'<div[^>]+\bclass=(["\'])summary\1[^>]*>(?P<value>[^<]+)',
             r'<[^>]+\bdata-summary=(["\'])(?P<value>(?:(?!\1).)+)\1'),
            webpage, 'description', default=None,
            group='value') or self._html_search_meta('Description', webpage)
        timestamp = unified_timestamp(self._search_regex(
            r'<div[^>]+\bclass=["\']date["\'][^>]*>([^<]+)',
            webpage, 'timestamp', fatal=False))

        return {
            'id': video_id,
            'title': title,
            'description': description,
            'timestamp': timestamp,
            'formats': formats,
        }
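The new PearVideo extractor pulls its formats out of JavaScript assignments of the form sdUrl="...", hdUrl="...", and then ranks them with the qualities() helper. A standalone sketch of the same scrape, not part of the diff, run on an invented page fragment with a plain ordered list standing in for qualities():

import re

webpage = '''
var sdUrl="https://video.example.com/clip-sd.mp4";
var hdUrl="https://video.example.com/clip-hd.mp4";
var srcUrl="https://video.example.com/clip-src.mp4";
'''

order = ['ldflv', 'ld', 'sdflv', 'sd', 'hdflv', 'hd', 'src']

formats = [{
    'url': mobj.group('url'),
    'format_id': mobj.group('id'),
    'quality': order.index(mobj.group('id')) if mobj.group('id') in order else -1,
} for mobj in re.finditer(
    r'(?P<id>[a-zA-Z]+)Url\s*=\s*(["\'])(?P<url>(?:https?:)?//.+?)\2',
    webpage)]

formats.sort(key=lambda f: f['quality'])
print([f['format_id'] for f in formats])  # ['sd', 'hd', 'src']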
@@ -49,7 +49,7 @@ class PeriscopeIE(PeriscopeBaseIE):
    @staticmethod
    def _extract_url(webpage):
        mobj = re.search(
            r'<iframe[^>]+src=([\'"])(?P<url>(?:https?:)?//(?:www\.)?periscope\.tv/(?:(?!\1).)+)\1', webpage)
            r'<iframe[^>]+src=([\'"])(?P<url>(?:https?:)?//(?:www\.)?(?:periscope|pscp)\.tv/(?:(?!\1).)+)\1', webpage)
        if mobj:
            return mobj.group('url')

@@ -224,6 +224,7 @@ class PluralsightIE(PluralsightBaseIE):
        req_format_split = req_format.split('-', 1)
        if len(req_format_split) > 1:
            req_ext, req_quality = req_format_split
            req_quality = '-'.join(req_quality.split('-')[:2])
            for allowed_quality in ALLOWED_QUALITIES:
                if req_ext == allowed_quality.ext and req_quality in allowed_quality.qualities:
                    return (AllowedQuality(req_ext, (req_quality, )), )
@ -9,39 +9,46 @@ from ..utils import int_or_none
|
||||
|
||||
class PodomaticIE(InfoExtractor):
|
||||
IE_NAME = 'podomatic'
|
||||
_VALID_URL = r'^(?P<proto>https?)://(?P<channel>[^.]+)\.podomatic\.com/entry/(?P<id>[^?]+)'
|
||||
_VALID_URL = r'''(?x)
|
||||
(?P<proto>https?)://
|
||||
(?:
|
||||
(?P<channel>[^.]+)\.podomatic\.com/entry|
|
||||
(?:www\.)?podomatic\.com/podcasts/(?P<channel_2>[^/]+)/episodes
|
||||
)/
|
||||
(?P<id>[^/?#&]+)
|
||||
'''
|
||||
|
||||
_TESTS = [
|
||||
{
|
||||
'url': 'http://scienceteachingtips.podomatic.com/entry/2009-01-02T16_03_35-08_00',
|
||||
'md5': '84bb855fcf3429e6bf72460e1eed782d',
|
||||
'info_dict': {
|
||||
'id': '2009-01-02T16_03_35-08_00',
|
||||
'ext': 'mp3',
|
||||
'uploader': 'Science Teaching Tips',
|
||||
'uploader_id': 'scienceteachingtips',
|
||||
'title': '64. When the Moon Hits Your Eye',
|
||||
'duration': 446,
|
||||
}
|
||||
},
|
||||
{
|
||||
'url': 'http://ostbahnhof.podomatic.com/entry/2013-11-15T16_31_21-08_00',
|
||||
'md5': 'd2cf443931b6148e27638650e2638297',
|
||||
'info_dict': {
|
||||
'id': '2013-11-15T16_31_21-08_00',
|
||||
'ext': 'mp3',
|
||||
'uploader': 'Ostbahnhof / Techno Mix',
|
||||
'uploader_id': 'ostbahnhof',
|
||||
'title': 'Einunddreizig',
|
||||
'duration': 3799,
|
||||
}
|
||||
},
|
||||
]
|
||||
_TESTS = [{
|
||||
'url': 'http://scienceteachingtips.podomatic.com/entry/2009-01-02T16_03_35-08_00',
|
||||
'md5': '84bb855fcf3429e6bf72460e1eed782d',
|
||||
'info_dict': {
|
||||
'id': '2009-01-02T16_03_35-08_00',
|
||||
'ext': 'mp3',
|
||||
'uploader': 'Science Teaching Tips',
|
||||
'uploader_id': 'scienceteachingtips',
|
||||
'title': '64. When the Moon Hits Your Eye',
|
||||
'duration': 446,
|
||||
}
|
||||
}, {
|
||||
'url': 'http://ostbahnhof.podomatic.com/entry/2013-11-15T16_31_21-08_00',
|
||||
'md5': 'd2cf443931b6148e27638650e2638297',
|
||||
'info_dict': {
|
||||
'id': '2013-11-15T16_31_21-08_00',
|
||||
'ext': 'mp3',
|
||||
'uploader': 'Ostbahnhof / Techno Mix',
|
||||
'uploader_id': 'ostbahnhof',
|
||||
'title': 'Einunddreizig',
|
||||
'duration': 3799,
|
||||
}
|
||||
}, {
|
||||
'url': 'https://www.podomatic.com/podcasts/scienceteachingtips/episodes/2009-01-02T16_03_35-08_00',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
video_id = mobj.group('id')
|
||||
channel = mobj.group('channel')
|
||||
channel = mobj.group('channel') or mobj.group('channel_2')
|
||||
|
||||
json_url = (('%s://%s.podomatic.com/entry/embed_params/%s' +
|
||||
'?permalink=true&rtmp=0') %
|
||||
|
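The reworked Podomatic _VALID_URL accepts both the old subdomain-style links and the new www.podomatic.com/podcasts/... form, capturing the channel under two different group names. A quick standalone check of how the alternation behaves; the regex is copied from the hunk and the URLs are the ones from _TESTS above:

import re

VALID_URL = r'''(?x)
                    (?P<proto>https?)://
                        (?:
                            (?P<channel>[^.]+)\.podomatic\.com/entry|
                            (?:www\.)?podomatic\.com/podcasts/(?P<channel_2>[^/]+)/episodes
                        )/
                        (?P<id>[^/?#&]+)
                '''

for url in (
        'http://scienceteachingtips.podomatic.com/entry/2009-01-02T16_03_35-08_00',
        'https://www.podomatic.com/podcasts/scienceteachingtips/episodes/2009-01-02T16_03_35-08_00'):
    mobj = re.match(VALID_URL, url)
    channel = mobj.group('channel') or mobj.group('channel_2')
    print(channel, mobj.group('id'))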
@@ -54,7 +54,7 @@ class PornHdIE(InfoExtractor):
            r'<title>(.+?) - .*?[Pp]ornHD.*?</title>'], webpage, 'title')

        sources = self._parse_json(js_to_json(self._search_regex(
            r"(?s)'sources'\s*:\s*(\{.+?\})\s*\}[;,)]",
            r"(?s)sources'?\s*:\s*(\{.+?\})\s*\}[;,)]",
            webpage, 'sources', default='{}')), video_id)

        if not sources:
@ -227,20 +227,6 @@ class PornHubIE(InfoExtractor):
|
||||
|
||||
class PornHubPlaylistBaseIE(InfoExtractor):
|
||||
def _extract_entries(self, webpage):
|
||||
return [
|
||||
self.url_result(
|
||||
'http://www.pornhub.com/%s' % video_url,
|
||||
PornHubIE.ie_key(), video_title=title)
|
||||
for video_url, title in orderedSet(re.findall(
|
||||
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"[^>]*\s+title="([^"]+)"',
|
||||
webpage))
|
||||
]
|
||||
|
||||
def _real_extract(self, url):
|
||||
playlist_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, playlist_id)
|
||||
|
||||
# Only process container div with main playlist content skipping
|
||||
# drop-down menu that uses similar pattern for videos (see
|
||||
# https://github.com/rg3/youtube-dl/issues/11594).
|
||||
@ -248,7 +234,21 @@ class PornHubPlaylistBaseIE(InfoExtractor):
|
||||
r'(?s)(<div[^>]+class=["\']container.+)', webpage,
|
||||
'container', default=webpage)
|
||||
|
||||
entries = self._extract_entries(container)
|
||||
return [
|
||||
self.url_result(
|
||||
'http://www.pornhub.com/%s' % video_url,
|
||||
PornHubIE.ie_key(), video_title=title)
|
||||
for video_url, title in orderedSet(re.findall(
|
||||
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"[^>]*\s+title="([^"]+)"',
|
||||
container))
|
||||
]
|
||||
|
||||
def _real_extract(self, url):
|
||||
playlist_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, playlist_id)
|
||||
|
||||
entries = self._extract_entries(webpage)
|
||||
|
||||
playlist = self._parse_json(
|
||||
self._search_regex(
|
||||
|
@ -191,11 +191,12 @@ class RaiPlayIE(RaiBaseIE):
|
||||
|
||||
info = {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'title': self._live_title(title) if relinker_info.get(
|
||||
'is_live') else title,
|
||||
'alt_title': media.get('subtitle'),
|
||||
'description': media.get('description'),
|
||||
'uploader': media.get('channel'),
|
||||
'creator': media.get('editor'),
|
||||
'uploader': strip_or_none(media.get('channel')),
|
||||
'creator': strip_or_none(media.get('editor')),
|
||||
'duration': parse_duration(video.get('duration')),
|
||||
'timestamp': timestamp,
|
||||
'thumbnails': thumbnails,
|
||||
@ -208,10 +209,46 @@ class RaiPlayIE(RaiBaseIE):
|
||||
}
|
||||
|
||||
info.update(relinker_info)
|
||||
|
||||
return info
|
||||
|
||||
|
||||
class RaiPlayLiveIE(RaiBaseIE):
|
||||
_VALID_URL = r'https?://(?:www\.)?raiplay\.it/dirette/(?P<id>[^/?#&]+)'
|
||||
_TEST = {
|
||||
'url': 'http://www.raiplay.it/dirette/rainews24',
|
||||
'info_dict': {
|
||||
'id': 'd784ad40-e0ae-4a69-aa76-37519d238a9c',
|
||||
'display_id': 'rainews24',
|
||||
'ext': 'mp4',
|
||||
'title': 're:^Diretta di Rai News 24 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
|
||||
'description': 'md5:6eca31500550f9376819f174e5644754',
|
||||
'uploader': 'Rai News 24',
|
||||
'creator': 'Rai News 24',
|
||||
'is_live': True,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
display_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
video_id = self._search_regex(
|
||||
r'data-uniquename=["\']ContentItem-(%s)' % RaiBaseIE._UUID_RE,
|
||||
webpage, 'content id')
|
||||
|
||||
return {
|
||||
'_type': 'url_transparent',
|
||||
'ie_key': RaiPlayIE.ie_key(),
|
||||
'url': 'http://www.raiplay.it/dirette/ContentItem-%s.html' % video_id,
|
||||
'id': video_id,
|
||||
'display_id': display_id,
|
||||
}
|
||||
|
||||
|
||||
class RaiIE(RaiBaseIE):
|
||||
_VALID_URL = r'https?://[^/]+\.(?:rai\.(?:it|tv)|rainews\.it)/dl/.+?-(?P<id>%s)(?:-.+?)?\.html' % RaiBaseIE._UUID_RE
|
||||
_TESTS = [{
|
||||
|
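The new RaiPlayLiveIE resolves a human-readable slug such as rainews24 to the underlying content UUID by scraping a data-uniquename attribute, then hands off to RaiPlayIE via url_transparent. A tiny sketch of that lookup, not part of the diff, on an invented HTML fragment; the UUID pattern below is a generic stand-in for RaiBaseIE._UUID_RE, whose exact definition is not shown in this hunk:

import re

UUID_RE = r'[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}'

webpage = '<div data-uniquename="ContentItem-d784ad40-e0ae-4a69-aa76-37519d238a9c"></div>'

video_id = re.search(
    r'data-uniquename=["\']ContentItem-(%s)' % UUID_RE, webpage).group(1)
print('http://www.raiplay.it/dirette/ContentItem-%s.html' % video_id)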
@ -13,7 +13,7 @@ from ..utils import (
|
||||
|
||||
|
||||
class RedBullTVIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?redbull\.tv/(?:video|film)/(?P<id>AP-\w+)'
|
||||
_VALID_URL = r'https?://(?:www\.)?redbull\.tv/(?:video|film|live)/(?:AP-\w+/segment/)?(?P<id>AP-\w+)'
|
||||
_TESTS = [{
|
||||
# film
|
||||
'url': 'https://www.redbull.tv/video/AP-1Q756YYX51W11/abc-of-wrc',
|
||||
@ -42,6 +42,22 @@ class RedBullTVIE(InfoExtractor):
|
||||
'season_number': 2,
|
||||
'episode_number': 4,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
# segment
|
||||
'url': 'https://www.redbull.tv/live/AP-1R5DX49XS1W11/segment/AP-1QSAQJ6V52111/semi-finals',
|
||||
'info_dict': {
|
||||
'id': 'AP-1QSAQJ6V52111',
|
||||
'ext': 'mp4',
|
||||
'title': 'Semi Finals - Vans Park Series Pro Tour',
|
||||
'description': 'md5:306a2783cdafa9e65e39aa62f514fd97',
|
||||
'duration': 11791.991,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
'url': 'https://www.redbull.tv/film/AP-1MSKKF5T92111/in-motion',
|
||||
'only_matching': True,
|
||||
@ -82,7 +98,8 @@ class RedBullTVIE(InfoExtractor):
|
||||
title = info['title'].strip()
|
||||
|
||||
formats = self._extract_m3u8_formats(
|
||||
video['url'], video_id, 'mp4', 'm3u8_native')
|
||||
video['url'], video_id, 'mp4', entry_protocol='m3u8_native',
|
||||
m3u8_id='hls')
|
||||
self._sort_formats(formats)
|
||||
|
||||
subtitles = {}
|
||||
|
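The Red Bull TV hunk above mainly tightens the _extract_m3u8_formats call (native HLS protocol, an explicit 'hls' format id). For readers unfamiliar with what that helper produces, here is a very reduced, hand-rolled sketch that parses a master playlist into format dicts; the playlist text is invented and the real helper handles far more (codecs, audio groups, live streams):

import re

master_playlist = '''#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=1280000,RESOLUTION=640x360
https://cdn.example.com/stream_360.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=1920x1080
https://cdn.example.com/stream_1080.m3u8
'''

formats = []
lines = master_playlist.strip().splitlines()
for i, line in enumerate(lines):
    if line.startswith('#EXT-X-STREAM-INF:'):
        bandwidth = re.search(r'BANDWIDTH=(\d+)', line)
        resolution = re.search(r'RESOLUTION=(\d+)x(\d+)', line)
        formats.append({
            'format_id': 'hls-%d' % (int(bandwidth.group(1)) // 1000) if bandwidth else 'hls',
            'url': lines[i + 1],
            'protocol': 'm3u8_native',
            'width': int(resolution.group(1)) if resolution else None,
            'height': int(resolution.group(2)) if resolution else None,
            'tbr': int(bandwidth.group(1)) // 1000 if bandwidth else None,
        })

for f in formats:
    print(f['format_id'], f['height'], f['url'])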
114 youtube_dl/extractor/reddit.py Normal file
@@ -0,0 +1,114 @@
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import (
    ExtractorError,
    int_or_none,
    float_or_none,
)


class RedditIE(InfoExtractor):
    _VALID_URL = r'https?://v\.redd\.it/(?P<id>[^/?#&]+)'
    _TEST = {
        # from https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/
        'url': 'https://v.redd.it/zv89llsvexdz',
        'md5': '655d06ace653ea3b87bccfb1b27ec99d',
        'info_dict': {
            'id': 'zv89llsvexdz',
            'ext': 'mp4',
            'title': 'zv89llsvexdz',
        },
        'params': {
            'format': 'bestvideo',
        },
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)

        formats = self._extract_m3u8_formats(
            'https://v.redd.it/%s/HLSPlaylist.m3u8' % video_id, video_id,
            'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)

        formats.extend(self._extract_mpd_formats(
            'https://v.redd.it/%s/DASHPlaylist.mpd' % video_id, video_id,
            mpd_id='dash', fatal=False))

        return {
            'id': video_id,
            'title': video_id,
            'formats': formats,
        }


class RedditRIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/]+)'
    _TESTS = [{
        'url': 'https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/',
        'info_dict': {
            'id': 'zv89llsvexdz',
            'ext': 'mp4',
            'title': 'That small heart attack.',
            'thumbnail': r're:^https?://.*\.jpg$',
            'timestamp': 1501941939,
            'upload_date': '20170805',
            'uploader': 'Antw87',
            'like_count': int,
            'dislike_count': int,
            'comment_count': int,
            'age_limit': 0,
        },
        'params': {
            'format': 'bestvideo',
            'skip_download': True,
        },
    }, {
        'url': 'https://www.reddit.com/r/videos/comments/6rrwyj',
        'only_matching': True,
    }, {
        # imgur
        'url': 'https://www.reddit.com/r/MadeMeSmile/comments/6t7wi5/wait_for_it/',
        'only_matching': True,
    }, {
        # streamable
        'url': 'https://www.reddit.com/r/videos/comments/6t7sg9/comedians_hilarious_joke_about_the_guam_flag/',
        'only_matching': True,
    }, {
        # youtube
        'url': 'https://www.reddit.com/r/videos/comments/6t75wq/southern_man_tries_to_speak_without_an_accent/',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)

        data = self._download_json(
            url + '.json', video_id)[0]['data']['children'][0]['data']

        video_url = data['url']

        # Avoid recursing into the same reddit URL
        if 'reddit.com/' in video_url and '/%s/' % video_id in video_url:
            raise ExtractorError('No media found', expected=True)

        over_18 = data.get('over_18')
        if over_18 is True:
            age_limit = 18
        elif over_18 is False:
            age_limit = 0
        else:
            age_limit = None

        return {
            '_type': 'url_transparent',
            'url': video_url,
            'title': data.get('title'),
            'thumbnail': data.get('thumbnail'),
            'timestamp': float_or_none(data.get('created_utc')),
            'uploader': data.get('author'),
            'like_count': int_or_none(data.get('ups')),
            'dislike_count': int_or_none(data.get('downs')),
            'comment_count': int_or_none(data.get('num_comments')),
            'age_limit': age_limit,
        }
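RedditRIE works by appending .json to the comments URL and drilling into the first child of the first listing; the age gate is then derived from over_18. A sketch of that post-processing, not part of the diff, on a hand-written response object (no network access, structure reduced to the fields the extractor reads, vote and comment counts invented):

# Minimal stand-in for the JSON returned by <comments URL>.json
response = [{
    'data': {
        'children': [{
            'data': {
                'url': 'https://v.redd.it/zv89llsvexdz',
                'title': 'That small heart attack.',
                'author': 'Antw87',
                'created_utc': 1501941939.0,
                'ups': 1000,
                'downs': 0,
                'num_comments': 500,
                'over_18': False,
            },
        }],
    },
}]

data = response[0]['data']['children'][0]['data']

over_18 = data.get('over_18')
if over_18 is True:
    age_limit = 18
elif over_18 is False:
    age_limit = 0
else:
    age_limit = None

print(data['url'], data['title'], age_limit)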
@@ -31,7 +31,7 @@ class SlideshareIE(InfoExtractor):
        page_title = mobj.group('title')
        webpage = self._download_webpage(url, page_title)
        slideshare_obj = self._search_regex(
            r'\$\.extend\(slideshare_object,\s*(\{.*?\})\);',
            r'\$\.extend\(.*?slideshare_object,\s*(\{.*?\})\);',
            webpage, 'slideshare object')
        info = json.loads(slideshare_obj)
        if info['slideshow']['type'] != 'video':
@ -31,6 +31,7 @@ class SoundcloudIE(InfoExtractor):
|
||||
|
||||
_VALID_URL = r'''(?x)^(?:https?://)?
|
||||
(?:(?:(?:www\.|m\.)?soundcloud\.com/
|
||||
(?!stations/track)
|
||||
(?P<uploader>[\w\d-]+)/
|
||||
(?!(?:tracks|sets(?:/.+?)?|reposts|likes|spotlight)/?(?:$|[?#]))
|
||||
(?P<title>[\w\d-]+)/?
|
||||
@ -121,7 +122,7 @@ class SoundcloudIE(InfoExtractor):
|
||||
},
|
||||
]
|
||||
|
||||
_CLIENT_ID = '2t9loNQH90kzJcsFCODdigxfp325aq4z'
|
||||
_CLIENT_ID = 'JlZIsxg2hY5WnBgtn3jfS0UYCl0K8DOg'
|
||||
_IPHONE_CLIENT_ID = '376f225bf427445fc4bfb6b99b72e0bf'
|
||||
|
||||
@staticmethod
|
||||
@ -136,7 +137,7 @@ class SoundcloudIE(InfoExtractor):
|
||||
|
||||
@classmethod
|
||||
def _resolv_url(cls, url):
|
||||
return 'http://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID
|
||||
return 'https://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID
|
||||
|
||||
def _extract_info_dict(self, info, full_title=None, quiet=False, secret_token=None):
|
||||
track_id = compat_str(info['id'])
|
||||
@ -174,7 +175,7 @@ class SoundcloudIE(InfoExtractor):
|
||||
|
||||
# We have to retrieve the url
|
||||
format_dict = self._download_json(
|
||||
'http://api.soundcloud.com/i1/tracks/%s/streams' % track_id,
|
||||
'https://api.soundcloud.com/i1/tracks/%s/streams' % track_id,
|
||||
track_id, 'Downloading track url', query={
|
||||
'client_id': self._CLIENT_ID,
|
||||
'secret_token': secret_token,
|
||||
@ -236,7 +237,7 @@ class SoundcloudIE(InfoExtractor):
|
||||
track_id = mobj.group('track_id')
|
||||
|
||||
if track_id is not None:
|
||||
info_json_url = 'http://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
|
||||
info_json_url = 'https://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
|
||||
full_title = track_id
|
||||
token = mobj.group('secret_token')
|
||||
if token:
|
||||
@ -261,7 +262,7 @@ class SoundcloudIE(InfoExtractor):
|
||||
|
||||
self.report_resolve(full_title)
|
||||
|
||||
url = 'http://soundcloud.com/%s' % resolve_title
|
||||
url = 'https://soundcloud.com/%s' % resolve_title
|
||||
info_json_url = self._resolv_url(url)
|
||||
info = self._download_json(info_json_url, full_title, 'Downloading info JSON')
|
||||
|
||||
@ -290,7 +291,7 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
|
||||
'id': '2284613',
|
||||
'title': 'The Royal Concept EP',
|
||||
},
|
||||
'playlist_mincount': 6,
|
||||
'playlist_mincount': 5,
|
||||
}, {
|
||||
'url': 'https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep/token',
|
||||
'only_matching': True,
|
||||
@ -304,7 +305,7 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
|
||||
# extract simple title (uploader + slug of song title)
|
||||
slug_title = mobj.group('slug_title')
|
||||
full_title = '%s/sets/%s' % (uploader, slug_title)
|
||||
url = 'http://soundcloud.com/%s/sets/%s' % (uploader, slug_title)
|
||||
url = 'https://soundcloud.com/%s/sets/%s' % (uploader, slug_title)
|
||||
|
||||
token = mobj.group('token')
|
||||
if token:
|
||||
@ -330,7 +331,63 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
|
||||
}
|
||||
|
||||
|
||||
class SoundcloudUserIE(SoundcloudPlaylistBaseIE):
|
||||
class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE):
|
||||
_API_BASE = 'https://api.soundcloud.com'
|
||||
_API_V2_BASE = 'https://api-v2.soundcloud.com'
|
||||
|
||||
def _extract_playlist(self, base_url, playlist_id, playlist_title):
|
||||
COMMON_QUERY = {
|
||||
'limit': 50,
|
||||
'client_id': self._CLIENT_ID,
|
||||
'linked_partitioning': '1',
|
||||
}
|
||||
|
||||
query = COMMON_QUERY.copy()
|
||||
query['offset'] = 0
|
||||
|
||||
next_href = base_url + '?' + compat_urllib_parse_urlencode(query)
|
||||
|
||||
entries = []
|
||||
for i in itertools.count():
|
||||
response = self._download_json(
|
||||
next_href, playlist_id, 'Downloading track page %s' % (i + 1))
|
||||
|
||||
collection = response['collection']
|
||||
if not collection:
|
||||
break
|
||||
|
||||
def resolve_permalink_url(candidates):
|
||||
for cand in candidates:
|
||||
if isinstance(cand, dict):
|
||||
permalink_url = cand.get('permalink_url')
|
||||
entry_id = self._extract_id(cand)
|
||||
if permalink_url and permalink_url.startswith('http'):
|
||||
return permalink_url, entry_id
|
||||
|
||||
for e in collection:
|
||||
permalink_url, entry_id = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
|
||||
if permalink_url:
|
||||
entries.append(self.url_result(permalink_url, video_id=entry_id))
|
||||
|
||||
next_href = response.get('next_href')
|
||||
if not next_href:
|
||||
break
|
||||
|
||||
parsed_next_href = compat_urlparse.urlparse(response['next_href'])
|
||||
qs = compat_urlparse.parse_qs(parsed_next_href.query)
|
||||
qs.update(COMMON_QUERY)
|
||||
next_href = compat_urlparse.urlunparse(
|
||||
parsed_next_href._replace(query=compat_urllib_parse_urlencode(qs, True)))
|
||||
|
||||
return {
|
||||
'_type': 'playlist',
|
||||
'id': playlist_id,
|
||||
'title': playlist_title,
|
||||
'entries': entries,
|
||||
}
|
||||
|
||||
|
||||
class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
|
||||
_VALID_URL = r'''(?x)
|
||||
https?://
|
||||
(?:(?:www|m)\.)?soundcloud\.com/
|
||||
@ -380,21 +437,18 @@ class SoundcloudUserIE(SoundcloudPlaylistBaseIE):
|
||||
'url': 'https://soundcloud.com/grynpyret/spotlight',
|
||||
'info_dict': {
|
||||
'id': '7098329',
|
||||
'title': 'GRYNPYRET (Spotlight)',
|
||||
'title': 'Grynpyret (Spotlight)',
|
||||
},
|
||||
'playlist_mincount': 1,
|
||||
}]
|
||||
|
||||
_API_BASE = 'https://api.soundcloud.com'
|
||||
_API_V2_BASE = 'https://api-v2.soundcloud.com'
|
||||
|
||||
_BASE_URL_MAP = {
|
||||
'all': '%s/profile/soundcloud:users:%%s' % _API_V2_BASE,
|
||||
'tracks': '%s/users/%%s/tracks' % _API_BASE,
|
||||
'sets': '%s/users/%%s/playlists' % _API_V2_BASE,
|
||||
'reposts': '%s/profile/soundcloud:users:%%s/reposts' % _API_V2_BASE,
|
||||
'likes': '%s/users/%%s/likes' % _API_V2_BASE,
|
||||
'spotlight': '%s/users/%%s/spotlight' % _API_V2_BASE,
|
||||
'all': '%s/profile/soundcloud:users:%%s' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
|
||||
'tracks': '%s/users/%%s/tracks' % SoundcloudPagedPlaylistBaseIE._API_BASE,
|
||||
'sets': '%s/users/%%s/playlists' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
|
||||
'reposts': '%s/profile/soundcloud:users:%%s/reposts' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
|
||||
'likes': '%s/users/%%s/likes' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
|
||||
'spotlight': '%s/users/%%s/spotlight' % SoundcloudPagedPlaylistBaseIE._API_V2_BASE,
|
||||
}
|
||||
|
||||
_TITLE_MAP = {
|
||||
@ -410,70 +464,49 @@ class SoundcloudUserIE(SoundcloudPlaylistBaseIE):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
uploader = mobj.group('user')
|
||||
|
||||
url = 'http://soundcloud.com/%s/' % uploader
|
||||
url = 'https://soundcloud.com/%s/' % uploader
|
||||
resolv_url = self._resolv_url(url)
|
||||
user = self._download_json(
|
||||
resolv_url, uploader, 'Downloading user info')
|
||||
|
||||
resource = mobj.group('rsrc') or 'all'
|
||||
base_url = self._BASE_URL_MAP[resource] % user['id']
|
||||
|
||||
COMMON_QUERY = {
|
||||
'limit': 50,
|
||||
'client_id': self._CLIENT_ID,
|
||||
'linked_partitioning': '1',
|
||||
}
|
||||
return self._extract_playlist(
|
||||
self._BASE_URL_MAP[resource] % user['id'], compat_str(user['id']),
|
||||
'%s (%s)' % (user['username'], self._TITLE_MAP[resource]))
|
||||
|
||||
query = COMMON_QUERY.copy()
|
||||
query['offset'] = 0
|
||||
|
||||
next_href = base_url + '?' + compat_urllib_parse_urlencode(query)
|
||||
class SoundcloudTrackStationIE(SoundcloudPagedPlaylistBaseIE):
|
||||
_VALID_URL = r'https?://(?:(?:www|m)\.)?soundcloud\.com/stations/track/[^/]+/(?P<id>[^/?#&]+)'
|
||||
IE_NAME = 'soundcloud:trackstation'
|
||||
_TESTS = [{
|
||||
'url': 'https://soundcloud.com/stations/track/officialsundial/your-text',
|
||||
'info_dict': {
|
||||
'id': '286017854',
|
||||
'title': 'Track station: your-text',
|
||||
},
|
||||
'playlist_mincount': 47,
|
||||
}]
|
||||
|
||||
entries = []
|
||||
for i in itertools.count():
|
||||
response = self._download_json(
|
||||
next_href, uploader, 'Downloading track page %s' % (i + 1))
|
||||
def _real_extract(self, url):
|
||||
track_name = self._match_id(url)
|
||||
|
||||
collection = response['collection']
|
||||
if not collection:
|
||||
break
|
||||
webpage = self._download_webpage(url, track_name)
|
||||
|
||||
def resolve_permalink_url(candidates):
|
||||
for cand in candidates:
|
||||
if isinstance(cand, dict):
|
||||
permalink_url = cand.get('permalink_url')
|
||||
entry_id = self._extract_id(cand)
|
||||
if permalink_url and permalink_url.startswith('http'):
|
||||
return permalink_url, entry_id
|
||||
track_id = self._search_regex(
|
||||
r'soundcloud:track-stations:(\d+)', webpage, 'track id')
|
||||
|
||||
for e in collection:
|
||||
permalink_url, entry_id = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
|
||||
if permalink_url:
|
||||
entries.append(self.url_result(permalink_url, video_id=entry_id))
|
||||
|
||||
next_href = response.get('next_href')
|
||||
if not next_href:
|
||||
break
|
||||
|
||||
parsed_next_href = compat_urlparse.urlparse(response['next_href'])
|
||||
qs = compat_urlparse.parse_qs(parsed_next_href.query)
|
||||
qs.update(COMMON_QUERY)
|
||||
next_href = compat_urlparse.urlunparse(
|
||||
parsed_next_href._replace(query=compat_urllib_parse_urlencode(qs, True)))
|
||||
|
||||
return {
|
||||
'_type': 'playlist',
|
||||
'id': compat_str(user['id']),
|
||||
'title': '%s (%s)' % (user['username'], self._TITLE_MAP[resource]),
|
||||
'entries': entries,
|
||||
}
|
||||
return self._extract_playlist(
|
||||
'%s/stations/soundcloud:track-stations:%s/tracks'
|
||||
% (self._API_V2_BASE, track_id),
|
||||
track_id, 'Track station: %s' % track_name)
|
||||
|
||||
|
||||
class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
|
||||
_VALID_URL = r'https?://api\.soundcloud\.com/playlists/(?P<id>[0-9]+)(?:/?\?secret_token=(?P<token>[^&]+?))?$'
|
||||
IE_NAME = 'soundcloud:playlist'
|
||||
_TESTS = [{
|
||||
'url': 'http://api.soundcloud.com/playlists/4110309',
|
||||
'url': 'https://api.soundcloud.com/playlists/4110309',
|
||||
'info_dict': {
|
||||
'id': '4110309',
|
||||
'title': 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]',
|
||||
|
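The SoundCloud refactor above moves the linked_partitioning pagination into SoundcloudPagedPlaylistBaseIE._extract_playlist: request a page, collect permalink URLs, then follow next_href until it disappears, re-applying the common query each time. A compact sketch of that loop against an in-memory fake API, not part of the diff; the page contents are invented, the resolver is simplified to return only the URL, and the real code goes through _download_json and compat_urlparse:

import itertools

# Fake paged API: maps a "next_href" to (collection, next_href or None).
PAGES = {
    'page1': ([{'permalink_url': 'https://soundcloud.com/artist/track-1'},
               {'track': {'permalink_url': 'https://soundcloud.com/artist/track-2'}}], 'page2'),
    'page2': ([{'playlist': {'permalink_url': 'https://soundcloud.com/artist/sets/ep'}}], None),
}

def resolve_permalink_url(candidates):
    for cand in candidates:
        if isinstance(cand, dict):
            permalink_url = cand.get('permalink_url')
            if permalink_url and permalink_url.startswith('http'):
                return permalink_url
    return None

entries = []
next_href = 'page1'
for i in itertools.count():
    collection, next_href = PAGES[next_href]
    for e in collection:
        permalink_url = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
        if permalink_url:
            entries.append(permalink_url)
    if not next_href:
        break

print(entries)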
@ -4,6 +4,7 @@ from __future__ import unicode_literals
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from .nexx import NexxEmbedIE
|
||||
from .spiegeltv import SpiegeltvIE
|
||||
from ..compat import compat_urlparse
|
||||
from ..utils import (
|
||||
@ -121,6 +122,26 @@ class SpiegelArticleIE(InfoExtractor):
|
||||
|
||||
},
|
||||
'playlist_count': 6,
|
||||
}, {
|
||||
# Nexx iFrame embed
|
||||
'url': 'http://www.spiegel.de/sptv/spiegeltv/spiegel-tv-ueber-schnellste-katapult-achterbahn-der-welt-taron-a-1137884.html',
|
||||
'info_dict': {
|
||||
'id': '161464',
|
||||
'ext': 'mp4',
|
||||
'title': 'Nervenkitzel Achterbahn',
|
||||
'alt_title': 'Karussellbauer in Deutschland',
|
||||
'description': 'md5:ffe7b1cc59a01f585e0569949aef73cc',
|
||||
'release_year': 2005,
|
||||
'creator': 'SPIEGEL TV',
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
'duration': 2761,
|
||||
'timestamp': 1394021479,
|
||||
'upload_date': '20140305',
|
||||
},
|
||||
'params': {
|
||||
'format': 'bestvideo',
|
||||
'skip_download': True,
|
||||
},
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
@ -143,6 +164,9 @@ class SpiegelArticleIE(InfoExtractor):
|
||||
entries = [
|
||||
self.url_result(compat_urlparse.urljoin(
|
||||
self.http_scheme() + '//spiegel.de/', embed_path))
|
||||
for embed_path in embeds
|
||||
]
|
||||
return self.playlist_result(entries)
|
||||
for embed_path in embeds]
|
||||
if embeds:
|
||||
return self.playlist_result(entries)
|
||||
|
||||
return self.playlist_from_matches(
|
||||
NexxEmbedIE._extract_urls(webpage), ie=NexxEmbedIE.ie_key())
|
||||
|
@ -1,114 +1,17 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_urllib_parse_urlparse
|
||||
from ..utils import (
|
||||
determine_ext,
|
||||
float_or_none,
|
||||
)
|
||||
from .nexx import NexxIE
|
||||
|
||||
|
||||
class SpiegeltvIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?spiegel\.tv/(?:#/)?filme/(?P<id>[\-a-z0-9]+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.spiegel.tv/filme/flug-mh370/',
|
||||
'info_dict': {
|
||||
'id': 'flug-mh370',
|
||||
'ext': 'm4v',
|
||||
'title': 'Flug MH370',
|
||||
'description': 'Das Rätsel um die Boeing 777 der Malaysia-Airlines',
|
||||
'thumbnail': r're:http://.*\.jpg$',
|
||||
},
|
||||
'params': {
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
}
|
||||
}, {
|
||||
'url': 'http://www.spiegel.tv/#/filme/alleskino-die-wahrheit-ueber-maenner/',
|
||||
_VALID_URL = r'https?://(?:www\.)?spiegel\.tv/videos/(?P<id>\d+)'
|
||||
_TEST = {
|
||||
'url': 'http://www.spiegel.tv/videos/161681-flug-mh370/',
|
||||
'only_matching': True,
|
||||
}]
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
if '/#/' in url:
|
||||
url = url.replace('/#/', '/')
|
||||
video_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
title = self._html_search_regex(r'<h1.*?>(.*?)</h1>', webpage, 'title')
|
||||
|
||||
apihost = 'http://spiegeltv-ivms2-restapi.s3.amazonaws.com'
|
||||
version_json = self._download_json(
|
||||
'%s/version.json' % apihost, video_id,
|
||||
note='Downloading version information')
|
||||
version_name = version_json['version_name']
|
||||
|
||||
slug_json = self._download_json(
|
||||
'%s/%s/restapi/slugs/%s.json' % (apihost, version_name, video_id),
|
||||
video_id,
|
||||
note='Downloading object information')
|
||||
oid = slug_json['object_id']
|
||||
|
||||
media_json = self._download_json(
|
||||
'%s/%s/restapi/media/%s.json' % (apihost, version_name, oid),
|
||||
video_id, note='Downloading media information')
|
||||
uuid = media_json['uuid']
|
||||
is_wide = media_json['is_wide']
|
||||
|
||||
server_json = self._download_json(
|
||||
'http://spiegeltv-prod-static.s3.amazonaws.com/projectConfigs/projectConfig.json',
|
||||
video_id, note='Downloading server information')
|
||||
|
||||
format = '16x9' if is_wide else '4x3'
|
||||
|
||||
formats = []
|
||||
for streamingserver in server_json['streamingserver']:
|
||||
endpoint = streamingserver.get('endpoint')
|
||||
if not endpoint:
|
||||
continue
|
||||
play_path = 'mp4:%s_spiegeltv_0500_%s.m4v' % (uuid, format)
|
||||
if endpoint.startswith('rtmp'):
|
||||
formats.append({
|
||||
'url': endpoint,
|
||||
'format_id': 'rtmp',
|
||||
'app': compat_urllib_parse_urlparse(endpoint).path[1:],
|
||||
'play_path': play_path,
|
||||
'player_path': 'http://prod-static.spiegel.tv/frontend-076.swf',
|
||||
'ext': 'flv',
|
||||
'rtmp_live': True,
|
||||
})
|
||||
elif determine_ext(endpoint) == 'm3u8':
|
||||
formats.append({
|
||||
'url': endpoint.replace('[video]', play_path),
|
||||
'ext': 'm4v',
|
||||
'format_id': 'hls', # Prefer hls since it allows to workaround georestriction
|
||||
'protocol': 'm3u8',
|
||||
'preference': 1,
|
||||
'http_headers': {
|
||||
'Accept-Encoding': 'deflate', # gzip causes trouble on the server side
|
||||
},
|
||||
})
|
||||
else:
|
||||
formats.append({
|
||||
'url': endpoint,
|
||||
})
|
||||
self._check_formats(formats, video_id)
|
||||
|
||||
thumbnails = []
|
||||
for image in media_json['images']:
|
||||
thumbnails.append({
|
||||
'url': image['url'],
|
||||
'width': image['width'],
|
||||
'height': image['height'],
|
||||
})
|
||||
|
||||
description = media_json['subtitle']
|
||||
duration = float_or_none(media_json.get('duration_in_ms'), scale=1000)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'duration': duration,
|
||||
'thumbnails': thumbnails,
|
||||
'formats': formats,
|
||||
}
|
||||
return self.url_result(
|
||||
'https://api.nexx.cloud/v3/748/videos/byid/%s'
|
||||
% self._match_id(url), ie=NexxIE.ie_key())
|
||||
|
@ -4,7 +4,11 @@ from __future__ import unicode_literals
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import js_to_json
|
||||
from ..utils import (
|
||||
determine_ext,
|
||||
int_or_none,
|
||||
js_to_json,
|
||||
)
|
||||
|
||||
|
||||
class SportBoxEmbedIE(InfoExtractor):
|
||||
@ -14,8 +18,10 @@ class SportBoxEmbedIE(InfoExtractor):
|
||||
'info_dict': {
|
||||
'id': '211355',
|
||||
'ext': 'mp4',
|
||||
'title': 'В Новороссийске прошел детский турнир «Поле славы боевой»',
|
||||
'title': '211355',
|
||||
'thumbnail': r're:^https?://.*\.jpg$',
|
||||
'duration': 292,
|
||||
'view_count': int,
|
||||
},
|
||||
'params': {
|
||||
# m3u8 download
|
||||
@ -24,6 +30,9 @@ class SportBoxEmbedIE(InfoExtractor):
|
||||
}, {
|
||||
'url': 'http://news.sportbox.ru/vdl/player?nid=370908&only_player=1&autostart=false&playeri=2&height=340&width=580',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://news.sportbox.ru/vdl/player/media/193095',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@staticmethod
|
||||
@ -37,36 +46,34 @@ class SportBoxEmbedIE(InfoExtractor):
|
||||
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
wjplayer_data = self._parse_json(
|
||||
self._search_regex(
|
||||
r'(?s)wjplayer\(({.+?})\);', webpage, 'wjplayer settings'),
|
||||
video_id, transform_source=js_to_json)
|
||||
|
||||
formats = []
|
||||
|
||||
def cleanup_js(code):
|
||||
# desktop_advert_config contains complex Javascripts and we don't need it
|
||||
return js_to_json(re.sub(r'desktop_advert_config.*', '', code))
|
||||
|
||||
jwplayer_data = self._parse_json(self._search_regex(
|
||||
r'(?s)player\.setup\(({.+?})\);', webpage, 'jwplayer settings'), video_id,
|
||||
transform_source=cleanup_js)
|
||||
|
||||
hls_url = jwplayer_data.get('hls_url')
|
||||
if hls_url:
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
hls_url, video_id, ext='mp4', m3u8_id='hls'))
|
||||
|
||||
rtsp_url = jwplayer_data.get('rtsp_url')
|
||||
if rtsp_url:
|
||||
formats.append({
|
||||
'url': rtsp_url,
|
||||
'format_id': 'rtsp',
|
||||
})
|
||||
|
||||
for source in wjplayer_data['sources']:
|
||||
src = source.get('src')
|
||||
if not src:
|
||||
continue
|
||||
if determine_ext(src) == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
src, video_id, 'mp4', entry_protocol='m3u8_native',
|
||||
m3u8_id='hls', fatal=False))
|
||||
else:
|
||||
formats.append({
|
||||
'url': src,
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
title = jwplayer_data['node_title']
|
||||
thumbnail = jwplayer_data.get('image_url')
|
||||
view_count = int_or_none(self._search_regex(
|
||||
r'Просмотров\s*:\s*(\d+)', webpage, 'view count', default=None))
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'thumbnail': thumbnail,
|
||||
'title': video_id,
|
||||
'thumbnail': wjplayer_data.get('poster'),
|
||||
'duration': int_or_none(wjplayer_data.get('duration')),
|
||||
'view_count': view_count,
|
||||
'formats': formats,
|
||||
}
|
||||
|
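The rewritten SportBox embed extractor no longer reads jwplayer settings; it grabs the wjplayer({...}) argument, parses it, and iterates its sources. A reduced sketch of that extraction, not part of the diff, on an invented page snippet; json.loads stands in for youtube-dl's more forgiving js_to_json plus _parse_json pipeline:

import json
import re

webpage = '''
<script>
wjplayer({"sources": [{"src": "https://video.example.com/211355.m3u8"},
                      {"src": "https://video.example.com/211355.mp4"}],
          "poster": "https://video.example.com/211355.jpg",
          "duration": 292});
</script>
'''

wjplayer_data = json.loads(re.search(
    r'(?s)wjplayer\(({.+?})\);', webpage).group(1))

formats = []
for source in wjplayer_data['sources']:
    src = source.get('src')
    if not src:
        continue
    kind = 'hls' if src.endswith('.m3u8') else 'http'
    formats.append({'url': src, 'format_id': kind})

print(wjplayer_data.get('poster'), wjplayer_data.get('duration'), formats)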
@@ -181,7 +181,8 @@ class SVTPlayIE(SVTBaseIE):

        if video_id:
            data = self._download_json(
                'http://www.svt.se/videoplayer-api/video/%s' % video_id, video_id)
                'https://api.svt.se/videoplayer-api/video/%s' % video_id,
                video_id, headers=self.geo_verification_headers())
            info_dict = self._extract_video(data, video_id)
            if not info_dict.get('title'):
                info_dict['title'] = re.sub(
43 youtube_dl/extractor/tastytrade.py Normal file
@@ -0,0 +1,43 @@
from __future__ import unicode_literals

from .common import InfoExtractor
from .ooyala import OoyalaIE


class TastyTradeIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?tastytrade\.com/tt/shows/[^/]+/episodes/(?P<id>[^/?#&]+)'

    _TESTS = [{
        'url': 'https://www.tastytrade.com/tt/shows/market-measures/episodes/correlation-in-short-volatility-06-28-2017',
        'info_dict': {
            'id': 'F3bnlzbToeI6pLEfRyrlfooIILUjz4nM',
            'ext': 'mp4',
            'title': 'A History of Teaming',
            'description': 'md5:2a9033db8da81f2edffa4c99888140b3',
            'duration': 422.255,
        },
        'params': {
            'skip_download': True,
        },
        'add_ie': ['Ooyala'],
    }, {
        'url': 'https://www.tastytrade.com/tt/shows/daily-dose/episodes/daily-dose-06-30-2017',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)

        ooyala_code = self._search_regex(
            r'data-media-id=(["\'])(?P<code>(?:(?!\1).)+)\1',
            webpage, 'ooyala code', group='code')

        info = self._search_json_ld(webpage, display_id, fatal=False)
        info.update({
            '_type': 'url_transparent',
            'ie_key': OoyalaIE.ie_key(),
            'url': 'ooyala:%s' % ooyala_code,
            'display_id': display_id,
        })
        return info
@ -8,6 +8,9 @@ from ..utils import extract_attributes
|
||||
|
||||
|
||||
class TBSIE(TurnerBaseIE):
|
||||
# https://github.com/rg3/youtube-dl/issues/13658
|
||||
_WORKING = False
|
||||
|
||||
_VALID_URL = r'https?://(?:www\.)?(?P<site>tbs|tntdrama)\.com/videos/(?:[^/]+/)+(?P<id>[^/?#]+)\.html'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.tbs.com/videos/people-of-earth/season-1/extras/2007318/theatrical-trailer.html',
|
||||
@ -17,7 +20,8 @@ class TBSIE(TurnerBaseIE):
|
||||
'ext': 'mp4',
|
||||
'title': 'Theatrical Trailer',
|
||||
'description': 'Catch the latest comedy from TBS, People of Earth, premiering Halloween night--Monday, October 31, at 9/8c.',
|
||||
}
|
||||
},
|
||||
'skip': 'TBS videos are deleted after a while',
|
||||
}, {
|
||||
'url': 'http://www.tntdrama.com/videos/good-behavior/season-1/extras/1538823/you-better-run.html',
|
||||
'md5': 'ce53c6ead5e9f3280b4ad2031a6fab56',
|
||||
@ -26,7 +30,8 @@ class TBSIE(TurnerBaseIE):
|
||||
'ext': 'mp4',
|
||||
'title': 'You Better Run',
|
||||
'description': 'Letty Raines must figure out what she\'s running toward while running away from her past. Good Behavior premieres November 15 at 9/8c.',
|
||||
}
|
||||
},
|
||||
'skip': 'TBS videos are deleted after a while',
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
|
@ -1,48 +0,0 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from .jwplatform import JWPlatformIE
|
||||
from ..utils import unified_strdate
|
||||
|
||||
|
||||
class TeamFourStarIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?teamfourstar\.com/(?P<id>[a-z0-9\-]+)'
|
||||
_TEST = {
|
||||
'url': 'http://teamfourstar.com/tfs-abridged-parody-episode-1-2/',
|
||||
'info_dict': {
|
||||
'id': '0WdZO31W',
|
||||
'title': 'TFS Abridged Parody Episode 1',
|
||||
'description': 'md5:d60bc389588ebab2ee7ad432bda953ae',
|
||||
'ext': 'mp4',
|
||||
'timestamp': 1394168400,
|
||||
'upload_date': '20080508',
|
||||
},
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
display_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
jwplatform_url = JWPlatformIE._extract_url(webpage)
|
||||
|
||||
video_title = self._html_search_regex(
|
||||
r'<h1[^>]+class="entry-title"[^>]*>(?P<title>.+?)</h1>',
|
||||
webpage, 'title')
|
||||
video_date = unified_strdate(self._html_search_regex(
|
||||
r'<span[^>]+class="meta-date date updated"[^>]*>(?P<date>.+?)</span>',
|
||||
webpage, 'date', fatal=False))
|
||||
video_description = self._html_search_regex(
|
||||
r'(?s)<div[^>]+class="content-inner"[^>]*>.*?(?P<description><p>.+?)</div>',
|
||||
webpage, 'description', fatal=False)
|
||||
video_thumbnail = self._og_search_thumbnail(webpage)
|
||||
|
||||
return {
|
||||
'_type': 'url_transparent',
|
||||
'display_id': display_id,
|
||||
'title': video_title,
|
||||
'description': video_description,
|
||||
'upload_date': video_date,
|
||||
'thumbnail': video_thumbnail,
|
||||
'url': jwplatform_url,
|
||||
}
|
@ -6,7 +6,10 @@ import re
|
||||
from .common import InfoExtractor
|
||||
|
||||
from ..compat import compat_str
|
||||
from ..utils import int_or_none
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
try_get,
|
||||
)
|
||||
|
||||
|
||||
class TEDIE(InfoExtractor):
|
||||
@ -113,8 +116,9 @@ class TEDIE(InfoExtractor):
|
||||
}
|
||||
|
||||
def _extract_info(self, webpage):
|
||||
info_json = self._search_regex(r'q\("\w+.init",({.+})\)</script>',
|
||||
webpage, 'info json')
|
||||
info_json = self._search_regex(
|
||||
r'(?s)q\(\s*"\w+.init"\s*,\s*({.+})\)\s*</script>',
|
||||
webpage, 'info json')
|
||||
return json.loads(info_json)
|
||||
|
||||
def _real_extract(self, url):
|
||||
@ -136,11 +140,16 @@ class TEDIE(InfoExtractor):
|
||||
webpage = self._download_webpage(url, name,
|
||||
'Downloading playlist webpage')
|
||||
info = self._extract_info(webpage)
|
||||
playlist_info = info['playlist']
|
||||
|
||||
playlist_info = try_get(
|
||||
info, lambda x: x['__INITIAL_DATA__']['playlist'],
|
||||
dict) or info['playlist']
|
||||
|
||||
playlist_entries = [
|
||||
self.url_result('http://www.ted.com/talks/' + talk['slug'], self.ie_key())
|
||||
for talk in info['talks']
|
||||
for talk in try_get(
|
||||
info, lambda x: x['__INITIAL_DATA__']['talks'],
|
||||
dict) or info['talks']
|
||||
]
|
||||
return self.playlist_result(
|
||||
playlist_entries,
|
||||
@ -149,9 +158,14 @@ class TEDIE(InfoExtractor):
|
||||
|
||||
def _talk_info(self, url, video_name):
|
||||
webpage = self._download_webpage(url, video_name)
|
||||
self.report_extraction(video_name)
|
||||
|
||||
talk_info = self._extract_info(webpage)['talks'][0]
|
||||
info = self._extract_info(webpage)
|
||||
|
||||
talk_info = try_get(
|
||||
info, lambda x: x['__INITIAL_DATA__']['talks'][0],
|
||||
dict) or info['talks'][0]
|
||||
|
||||
title = talk_info['title'].strip()
|
||||
|
||||
external = talk_info.get('external')
|
||||
if external:
|
||||
@ -165,19 +179,27 @@ class TEDIE(InfoExtractor):
|
||||
'url': ext_url or external['uri'],
|
||||
}
|
||||
|
||||
native_downloads = try_get(
|
||||
talk_info, lambda x: x['downloads']['nativeDownloads'],
|
||||
dict) or talk_info['nativeDownloads']
|
||||
|
||||
formats = [{
|
||||
'url': format_url,
|
||||
'format_id': format_id,
|
||||
'format': format_id,
|
||||
} for (format_id, format_url) in talk_info['nativeDownloads'].items() if format_url is not None]
|
||||
} for (format_id, format_url) in native_downloads.items() if format_url is not None]
|
||||
if formats:
|
||||
for f in formats:
|
||||
finfo = self._NATIVE_FORMATS.get(f['format_id'])
|
||||
if finfo:
|
||||
f.update(finfo)
|
||||
|
||||
player_talk = talk_info['player_talks'][0]
|
||||
|
||||
resources_ = player_talk.get('resources') or talk_info.get('resources')
|
||||
|
||||
http_url = None
|
||||
for format_id, resources in talk_info['resources'].items():
|
||||
for format_id, resources in resources_.items():
|
||||
if format_id == 'h264':
|
||||
for resource in resources:
|
||||
h264_url = resource.get('file')
|
||||
@ -237,14 +259,11 @@ class TEDIE(InfoExtractor):
|
||||
|
||||
video_id = compat_str(talk_info['id'])
|
||||
|
||||
thumbnail = talk_info['thumb']
|
||||
if not thumbnail.startswith('http'):
|
||||
thumbnail = 'http://' + thumbnail
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': talk_info['title'].strip(),
|
||||
'uploader': talk_info['speaker'],
|
||||
'thumbnail': thumbnail,
|
||||
'title': title,
|
||||
'uploader': player_talk.get('speaker') or talk_info.get('speaker'),
|
||||
'thumbnail': player_talk.get('thumb') or talk_info.get('thumb'),
|
||||
'description': self._og_search_description(webpage),
|
||||
'subtitles': self._get_subtitles(video_id, talk_info),
|
||||
'formats': formats,
|
||||
@ -252,20 +271,22 @@ class TEDIE(InfoExtractor):
|
||||
}
|
||||
|
||||
def _get_subtitles(self, video_id, talk_info):
|
||||
languages = [lang['languageCode'] for lang in talk_info.get('languages', [])]
|
||||
if languages:
|
||||
sub_lang_list = {}
|
||||
for l in languages:
|
||||
sub_lang_list[l] = [
|
||||
{
|
||||
'url': 'http://www.ted.com/talks/subtitles/id/%s/lang/%s/format/%s' % (video_id, l, ext),
|
||||
'ext': ext,
|
||||
}
|
||||
for ext in ['ted', 'srt']
|
||||
]
|
||||
return sub_lang_list
|
||||
else:
|
||||
return {}
|
||||
sub_lang_list = {}
|
||||
for language in try_get(
|
||||
talk_info,
|
||||
(lambda x: x['downloads']['languages'],
|
||||
lambda x: x['languages']), list):
|
||||
lang_code = language.get('languageCode') or language.get('ianaCode')
|
||||
if not lang_code:
|
||||
continue
|
||||
sub_lang_list[lang_code] = [
|
||||
{
|
||||
'url': 'http://www.ted.com/talks/subtitles/id/%s/lang/%s/format/%s' % (video_id, lang_code, ext),
|
||||
'ext': ext,
|
||||
}
|
||||
for ext in ['ted', 'srt']
|
||||
]
|
||||
return sub_lang_list
|
||||
|
||||
def _watch_info(self, url, name):
|
||||
webpage = self._download_webpage(url, name)
|
||||
|
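The TED changes above lean heavily on try_get to cope with the site's move to an __INITIAL_DATA__ payload: each lookup tries the new location first and falls back to the old key. Below is a simplified stand-in for that helper (not the actual youtube_dl.utils.try_get) together with the fallback pattern used in the hunk; the payload dicts are invented:

def try_get(src, getters, expected_type=None):
    # Simplified: accept one getter or a tuple of getters, swallow lookup errors.
    if not isinstance(getters, (list, tuple)):
        getters = (getters,)
    for getter in getters:
        try:
            value = getter(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            continue
        if expected_type is None or isinstance(value, expected_type):
            return value
    return None

# New-style payload: talks live under __INITIAL_DATA__.
info_new = {'__INITIAL_DATA__': {'talks': [{'title': 'Example talk'}]}}
# Old-style payload: talks sit at the top level.
info_old = {'talks': [{'title': 'Example talk'}]}

for info in (info_new, info_old):
    talk_info = try_get(
        info, lambda x: x['__INITIAL_DATA__']['talks'][0], dict) or info['talks'][0]
    print(talk_info['title'])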
@ -2,13 +2,15 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_str
|
||||
from ..utils import try_get
|
||||
|
||||
|
||||
class ThisOldHouseIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode)/(?P<id>[^/?#]+)'
|
||||
_TESTS = [{
|
||||
'url': 'https://www.thisoldhouse.com/how-to/how-to-build-storage-bench',
|
||||
'md5': '946f05bbaa12a33f9ae35580d2dfcfe3',
|
||||
'md5': '568acf9ca25a639f0c4ff905826b662f',
|
||||
'info_dict': {
|
||||
'id': '2REGtUDQ',
|
||||
'ext': 'mp4',
|
||||
@ -28,8 +30,15 @@ class ThisOldHouseIE(InfoExtractor):
|
||||
def _real_extract(self, url):
|
||||
display_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
drupal_settings = self._parse_json(self._search_regex(
|
||||
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
|
||||
webpage, 'drupal settings'), display_id)
|
||||
video_id = drupal_settings['jwplatform']['video_id']
|
||||
video_id = self._search_regex(
|
||||
(r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1',
|
||||
r'id=(["\'])inline-video-player-(?P<id>(?:(?!\1).)+)\1'),
|
||||
webpage, 'video id', default=None, group='id')
|
||||
if not video_id:
|
||||
drupal_settings = self._parse_json(self._search_regex(
|
||||
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
|
||||
webpage, 'drupal settings'), display_id)
|
||||
video_id = try_get(
|
||||
drupal_settings, lambda x: x['jwplatform']['video_id'],
|
||||
compat_str) or list(drupal_settings['comScore'])[0]
|
||||
return self.url_result('jwplatform:' + video_id, 'JWPlatform', video_id)
|
||||
|
@@ -50,7 +50,7 @@ class TwentyMinutenIE(InfoExtractor):
    @staticmethod
    def _extract_urls(webpage):
        return [m.group('url') for m in re.finditer(
            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?(?:www\.)?20min\.ch/videoplayer/videoplayer.html\?.*?\bvideoId@\d+.*?)\1',
            r'<iframe[^>]+src=(["\'])(?P<url>(?:(?:https?:)?//)?(?:www\.)?20min\.ch/videoplayer/videoplayer.html\?.*?\bvideoId@\d+.*?)\1',
            webpage)]

    def _real_extract(self, url):
youtube_dl/extractor/twitter.py

@@ -7,20 +7,38 @@ from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
    determine_ext,
    float_or_none,
    xpath_text,
    remove_end,
    int_or_none,
    dict_get,
    ExtractorError,
    float_or_none,
    int_or_none,
    remove_end,
    try_get,
    xpath_text,
)

from .periscope import PeriscopeIE


class TwitterBaseIE(InfoExtractor):
    def _get_vmap_video_url(self, vmap_url, video_id):
    def _extract_formats_from_vmap_url(self, vmap_url, video_id):
        vmap_data = self._download_xml(vmap_url, video_id)
        return xpath_text(vmap_data, './/MediaFile').strip()
        video_url = xpath_text(vmap_data, './/MediaFile').strip()
        if determine_ext(video_url) == 'm3u8':
            return self._extract_m3u8_formats(
                video_url, video_id, ext='mp4', m3u8_id='hls',
                entry_protocol='m3u8_native')
        return [{
            'url': video_url,
        }]

    @staticmethod
    def _search_dimensions_in_video_url(a_format, video_url):
        m = re.search(r'/(?P<width>\d+)x(?P<height>\d+)/', video_url)
        if m:
            a_format.update({
                'width': int(m.group('width')),
                'height': int(m.group('height')),
            })


class TwitterCardIE(TwitterBaseIE):
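A VMAP document here is XML whose `MediaFile` element carries the actual video URL; the renamed helper returns HLS formats when that URL points to an `.m3u8` playlist and a single progressive format otherwise. A rough standalone sketch of the same branching, using plain `xml.etree` and an invented VMAP snippet instead of youtube-dl's helpers:

```python
import xml.etree.ElementTree as ET

# Invented minimal VMAP-like document, for illustration only.
VMAP = '''<vmap><MediaFile>
  https://video.example.com/amplify/720x1280/video.m3u8
</MediaFile></vmap>'''


def formats_from_vmap(vmap_xml):
    media_file = ET.fromstring(vmap_xml).find('.//MediaFile').text.strip()
    if media_file.endswith('.m3u8'):
        # The real extractor expands this into one format per HLS variant
        # via _extract_m3u8_formats(); here we just tag the protocol.
        return [{'url': media_file, 'protocol': 'm3u8_native'}]
    return [{'url': media_file}]


print(formats_from_vmap(VMAP))
```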
@@ -36,7 +54,8 @@ class TwitterCardIE(TwitterBaseIE):
            'title': 'Twitter Card',
            'thumbnail': r're:^https?://.*\.jpg$',
            'duration': 30.033,
            }
        },
        'skip': 'Video gone',
    },
    {
        'url': 'https://twitter.com/i/cards/tfw/v1/623160978427936768',
@@ -48,6 +67,7 @@ class TwitterCardIE(TwitterBaseIE):
            'thumbnail': r're:^https?://.*\.jpg',
            'duration': 80.155,
        },
        'skip': 'Video gone',
    },
    {
        'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977',
@@ -65,7 +85,7 @@ class TwitterCardIE(TwitterBaseIE):
    },
    {
        'url': 'https://twitter.com/i/cards/tfw/v1/665289828897005568',
        'md5': 'ab2745d0b0ce53319a534fccaa986439',
        'md5': '6dabeaca9e68cbb71c99c322a4b42a11',
        'info_dict': {
            'id': 'iBb2x00UVlv',
            'ext': 'mp4',
@@ -73,16 +93,17 @@ class TwitterCardIE(TwitterBaseIE):
            'uploader_id': '1189339351084113920',
            'uploader': 'ArsenalTerje',
            'title': 'Vine by ArsenalTerje',
            'timestamp': 1447451307,
        },
        'add_ie': ['Vine'],
    }, {
        'url': 'https://twitter.com/i/videos/tweet/705235433198714880',
        'md5': '3846d0a07109b5ab622425449b59049d',
        'md5': '884812a2adc8aaf6fe52b15ccbfa3b88',
        'info_dict': {
            'id': '705235433198714880',
            'ext': 'mp4',
            'title': 'Twitter web player',
            'thumbnail': r're:^https?://.*\.jpg',
            'thumbnail': r're:^https?://.*',
        },
    }, {
        'url': 'https://twitter.com/i/videos/752274308186120192',
@@ -90,6 +111,59 @@ class TwitterCardIE(TwitterBaseIE):
        },
    ]

    def _parse_media_info(self, media_info, video_id):
        formats = []
        for media_variant in media_info.get('variants', []):
            media_url = media_variant['url']
            if media_url.endswith('.m3u8'):
                formats.extend(self._extract_m3u8_formats(media_url, video_id, ext='mp4', m3u8_id='hls'))
            elif media_url.endswith('.mpd'):
                formats.extend(self._extract_mpd_formats(media_url, video_id, mpd_id='dash'))
            else:
                vbr = int_or_none(dict_get(media_variant, ('bitRate', 'bitrate')), scale=1000)
                a_format = {
                    'url': media_url,
                    'format_id': 'http-%d' % vbr if vbr else 'http',
                    'vbr': vbr,
                }
                # Reported bitRate may be zero
                if not a_format['vbr']:
                    del a_format['vbr']

                self._search_dimensions_in_video_url(a_format, media_url)

                formats.append(a_format)
        return formats

    def _extract_mobile_formats(self, username, video_id):
        webpage = self._download_webpage(
            'https://mobile.twitter.com/%s/status/%s' % (username, video_id),
            video_id, 'Downloading mobile webpage',
            headers={
                # A recent mobile UA is necessary for `gt` cookie
                'User-Agent': 'Mozilla/5.0 (Android 6.0.1; Mobile; rv:54.0) Gecko/54.0 Firefox/54.0',
            })
        main_script_url = self._html_search_regex(
            r'<script[^>]+src="([^"]+main\.[^"]+)"', webpage, 'main script URL')
        main_script = self._download_webpage(
            main_script_url, video_id, 'Downloading main script')
        bearer_token = self._search_regex(
            r'BEARER_TOKEN\s*:\s*"([^"]+)"',
            main_script, 'bearer token')
        guest_token = self._search_regex(
            r'document\.cookie\s*=\s*decodeURIComponent\("gt=(\d+)',
            webpage, 'guest token')
        api_data = self._download_json(
            'https://api.twitter.com/2/timeline/conversation/%s.json' % video_id,
            video_id, 'Downloading mobile API data',
            headers={
                'Authorization': 'Bearer ' + bearer_token,
                'x-guest-token': guest_token,
            })
        media_info = try_get(api_data, lambda o: o['globalObjects']['tweets'][video_id]
                                                  ['extended_entities']['media'][0]['video_info']) or {}
        return self._parse_media_info(media_info, video_id)
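The mobile fallback above is a three-step handshake: scrape a Bearer token from the mobile site's main script, read the `gt` guest token that the page writes via `document.cookie`, and send both as headers to the conversation API. Below is a hedged sketch of that flow with plain Python 3 `urllib`; the endpoints, regexes and response layout are taken from the hunk above, may change on Twitter's side at any time, and error handling is omitted:

```python
import json
import re
from urllib.request import Request, urlopen

UA = 'Mozilla/5.0 (Android 6.0.1; Mobile; rv:54.0) Gecko/54.0 Firefox/54.0'


def mobile_media_info(username, tweet_id):
    # 1. Fetch the mobile status page with a recent mobile UA (needed for the gt cookie).
    page = urlopen(Request(
        'https://mobile.twitter.com/%s/status/%s' % (username, tweet_id),
        headers={'User-Agent': UA})).read().decode('utf-8')

    # 2. The guest token is written into document.cookie by inline JS.
    guest_token = re.search(
        r'document\.cookie\s*=\s*decodeURIComponent\("gt=(\d+)', page).group(1)

    # 3. The Bearer token lives in the main application script.
    script_url = re.search(r'<script[^>]+src="([^"]+main\.[^"]+)"', page).group(1)
    script = urlopen(Request(script_url, headers={'User-Agent': UA})).read().decode('utf-8')
    bearer = re.search(r'BEARER_TOKEN\s*:\s*"([^"]+)"', script).group(1)

    # 4. Query the conversation timeline API with both tokens.
    api = urlopen(Request(
        'https://api.twitter.com/2/timeline/conversation/%s.json' % tweet_id,
        headers={'Authorization': 'Bearer ' + bearer, 'x-guest-token': guest_token}))
    data = json.loads(api.read().decode('utf-8'))

    tweet = data['globalObjects']['tweets'][tweet_id]
    return tweet['extended_entities']['media'][0]['video_info']
```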
    def _real_extract(self, url):
        video_id = self._match_id(url)

@@ -117,14 +191,6 @@ class TwitterCardIE(TwitterBaseIE):
        if periscope_url:
            return self.url_result(periscope_url, PeriscopeIE.ie_key())

        def _search_dimensions_in_video_url(a_format, video_url):
            m = re.search(r'/(?P<width>\d+)x(?P<height>\d+)/', video_url)
            if m:
                a_format.update({
                    'width': int(m.group('width')),
                    'height': int(m.group('height')),
                })

        video_url = config.get('video_url') or config.get('playlist', [{}])[0].get('source')

        if video_url:
@@ -135,15 +201,14 @@ class TwitterCardIE(TwitterBaseIE):
                'url': video_url,
            }

            _search_dimensions_in_video_url(f, video_url)
            self._search_dimensions_in_video_url(f, video_url)

            formats.append(f)

        vmap_url = config.get('vmapUrl') or config.get('vmap_url')
        if vmap_url:
            formats.append({
                'url': self._get_vmap_video_url(vmap_url, video_id),
            })
            formats.extend(
                self._extract_formats_from_vmap_url(vmap_url, video_id))

        media_info = None

@@ -152,29 +217,14 @@ class TwitterCardIE(TwitterBaseIE):
                media_info = entity['mediaInfo']

        if media_info:
            for media_variant in media_info['variants']:
                media_url = media_variant['url']
                if media_url.endswith('.m3u8'):
                    formats.extend(self._extract_m3u8_formats(media_url, video_id, ext='mp4', m3u8_id='hls'))
                elif media_url.endswith('.mpd'):
                    formats.extend(self._extract_mpd_formats(media_url, video_id, mpd_id='dash'))
                else:
                    vbr = int_or_none(media_variant.get('bitRate'), scale=1000)
                    a_format = {
                        'url': media_url,
                        'format_id': 'http-%d' % vbr if vbr else 'http',
                        'vbr': vbr,
                    }
                    # Reported bitRate may be zero
                    if not a_format['vbr']:
                        del a_format['vbr']

                    _search_dimensions_in_video_url(a_format, media_url)

                    formats.append(a_format)

            formats.extend(self._parse_media_info(media_info, video_id))
            duration = float_or_none(media_info.get('duration', {}).get('nanos'), scale=1e9)

        username = config.get('user', {}).get('screen_name')
        if username:
            formats.extend(self._extract_mobile_formats(username, video_id))

        self._remove_duplicate_formats(formats)
        self._sort_formats(formats)

        title = self._search_regex(r'<title>([^<]+)</title>', webpage, 'title')
@@ -255,10 +305,10 @@ class TwitterIE(InfoExtractor):
        'info_dict': {
            'id': '700207533655363584',
            'ext': 'mp4',
            'title': 'JG - BEAT PROD: @suhmeduh #Damndaniel',
            'description': 'JG on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
            'title': 'Donte - BEAT PROD: @suhmeduh #Damndaniel',
            'description': 'Donte on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
            'thumbnail': r're:^https?://.*\.jpg',
            'uploader': 'JG',
            'uploader': 'Donte',
            'uploader_id': 'jaydingeer',
        },
        'params': {
@@ -270,9 +320,11 @@ class TwitterIE(InfoExtractor):
        'info_dict': {
            'id': 'MIOxnrUteUd',
            'ext': 'mp4',
            'title': 'Dr.Pepperの飲み方 #japanese #バカ #ドクペ #電動ガン',
            'uploader': 'TAKUMA',
            'uploader_id': '1004126642786242560',
            'title': 'FilmDrunk - Vine of the day',
            'description': 'FilmDrunk on Twitter: "Vine of the day https://t.co/xmTvRdqxWf"',
            'uploader': 'FilmDrunk',
            'uploader_id': 'Filmdrunk',
            'timestamp': 1402826626,
            'upload_date': '20140615',
        },
        'add_ie': ['Vine'],
@@ -294,13 +346,28 @@ class TwitterIE(InfoExtractor):
        'info_dict': {
            'id': '1zqKVVlkqLaKB',
            'ext': 'mp4',
            'title': 'Sgt Kerry Schmidt - Ontario Provincial Police - Road rage, mischief, assault, rollover and fire in one occurrence',
            'title': 'Sgt Kerry Schmidt - LIVE on #Periscope: Road rage, mischief, assault, rollover and fire in one occurrence',
            'description': 'Sgt Kerry Schmidt on Twitter: "LIVE on #Periscope: Road rage, mischief, assault, rollover and fire in one occurrence https://t.co/EKrVgIXF3s"',
            'upload_date': '20160923',
            'uploader_id': 'OPP_HSD',
            'uploader': 'Sgt Kerry Schmidt - Ontario Provincial Police',
            'uploader': 'Sgt Kerry Schmidt',
            'timestamp': 1474613214,
        },
        'add_ie': ['Periscope'],
    }, {
        # has mp4 formats via mobile API
        'url': 'https://twitter.com/news_al3alm/status/852138619213144067',
        'info_dict': {
            'id': '852138619213144067',
            'ext': 'mp4',
            'title': 'عالم الأخبار - كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعارضين : اتقوا الله .. الظلم ظلمات يوم القيامة',
            'description': 'عالم الأخبار on Twitter: "كلمة تاريخية بجلسة الجناسي التاريخية.. النائب خالد مؤنس العتيبي للمعارضين : اتقوا الله .. الظلم ظلمات يوم القيامة https://t.co/xg6OhpyKfN"',
            'uploader': 'عالم الأخبار',
            'uploader_id': 'news_al3alm',
        },
        'params': {
            'format': 'best[format_id^=http-]',
        },
    }]

    def _real_extract(self, url):
@@ -393,7 +460,7 @@ class TwitterAmplifyIE(TwitterBaseIE):

        vmap_url = self._html_search_meta(
            'twitter:amplify:vmap', webpage, 'vmap url')
        video_url = self._get_vmap_video_url(vmap_url, video_id)
        formats = self._extract_formats_from_vmap_url(vmap_url, video_id)

        thumbnails = []
        thumbnail = self._html_search_meta(
@@ -415,11 +482,10 @@ class TwitterAmplifyIE(TwitterBaseIE):
            })

        video_w, video_h = _find_dimension('player')
        formats = [{
            'url': video_url,
        formats[0].update({
            'width': video_w,
            'height': video_h,
        }]
        })

        return {
            'id': video_id,
youtube_dl/extractor/udemy.py

@@ -15,6 +15,7 @@ from ..utils import (
    ExtractorError,
    float_or_none,
    int_or_none,
    js_to_json,
    sanitized_Request,
    unescapeHTML,
    urlencode_postdata,
@@ -268,6 +269,25 @@ class UdemyIE(InfoExtractor):
                f = add_output_format_meta(f, format_id)
                formats.append(f)

        def extract_subtitles(track_list):
            if not isinstance(track_list, list):
                return
            for track in track_list:
                if not isinstance(track, dict):
                    continue
                if track.get('kind') != 'captions':
                    continue
                src = track.get('src')
                if not src or not isinstance(src, compat_str):
                    continue
                lang = track.get('language') or track.get(
                    'srclang') or track.get('label')
                sub_dict = automatic_captions if track.get(
                    'autogenerated') is True else subtitles
                sub_dict.setdefault(lang, []).append({
                    'url': src,
                })

        download_urls = asset.get('download_urls')
        if isinstance(download_urls, dict):
            extract_formats(download_urls.get('Video'))
@@ -315,23 +335,16 @@ class UdemyIE(InfoExtractor):
                extract_formats(data.get('sources'))
                if not duration:
                    duration = int_or_none(data.get('duration'))
                tracks = data.get('tracks')
                if isinstance(tracks, list):
                    for track in tracks:
                        if not isinstance(track, dict):
                            continue
                        if track.get('kind') != 'captions':
                            continue
                        src = track.get('src')
                        if not src or not isinstance(src, compat_str):
                            continue
                        lang = track.get('language') or track.get(
                            'srclang') or track.get('label')
                        sub_dict = automatic_captions if track.get(
                            'autogenerated') is True else subtitles
                        sub_dict.setdefault(lang, []).append({
                            'url': src,
                        })
                extract_subtitles(data.get('tracks'))

        if not subtitles and not automatic_captions:
            text_tracks = self._parse_json(
                self._search_regex(
                    r'text-tracks=(["\'])(?P<data>\[.+?\])\1', view_html,
                    'text tracks', default='{}', group='data'), video_id,
                transform_source=lambda s: js_to_json(unescapeHTML(s)),
                fatal=False)
            extract_subtitles(text_tracks)

        self._sort_formats(formats, field_preference=('height', 'width', 'tbr', 'format_id'))
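The new `extract_subtitles` closure is shared between the API's `tracks` list and the `text-tracks` attribute scraped from the view HTML; the logic is just routing caption tracks into either `subtitles` or `automatic_captions`, keyed by language. A standalone sketch of that routing (the sample track list is invented):

```python
def route_caption_tracks(track_list):
    # Returns (subtitles, automatic_captions) dicts in youtube-dl's
    # {lang: [{'url': ...}, ...]} shape.
    subtitles, automatic_captions = {}, {}
    for track in track_list or []:
        if not isinstance(track, dict) or track.get('kind') != 'captions':
            continue
        src = track.get('src')
        if not src:
            continue
        lang = track.get('language') or track.get('srclang') or track.get('label')
        target = automatic_captions if track.get('autogenerated') is True else subtitles
        target.setdefault(lang, []).append({'url': src})
    return subtitles, automatic_captions


# Invented sample data for illustration
sample = [
    {'kind': 'captions', 'src': 'https://example.com/en.vtt', 'language': 'en'},
    {'kind': 'captions', 'src': 'https://example.com/en_auto.vtt', 'srclang': 'en', 'autogenerated': True},
    {'kind': 'thumbnails', 'src': 'https://example.com/sprite.jpg'},
]
print(route_caption_tracks(sample))
```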
youtube_dl/extractor/veoh.py

@@ -12,47 +12,46 @@ from ..utils import (


class VeohIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?veoh\.com/(?:watch|iphone/#_Watch)/(?P<id>(?:v|yapi-)[\da-zA-Z]+)'
    _VALID_URL = r'https?://(?:www\.)?veoh\.com/(?:watch|iphone/#_Watch)/(?P<id>(?:v|e|yapi-)[\da-zA-Z]+)'

    _TESTS = [
        {
            'url': 'http://www.veoh.com/watch/v56314296nk7Zdmz3',
            'md5': '620e68e6a3cff80086df3348426c9ca3',
            'info_dict': {
                'id': '56314296',
                'ext': 'mp4',
                'title': 'Straight Backs Are Stronger',
                'uploader': 'LUMOback',
                'description': 'At LUMOback, we believe straight backs are stronger. The LUMOback Posture & Movement Sensor: It gently vibrates when you slouch, inspiring improved posture and mobility. Use the app to track your data and improve your posture over time. ',
            },
    _TESTS = [{
        'url': 'http://www.veoh.com/watch/v56314296nk7Zdmz3',
        'md5': '620e68e6a3cff80086df3348426c9ca3',
        'info_dict': {
            'id': '56314296',
            'ext': 'mp4',
            'title': 'Straight Backs Are Stronger',
            'uploader': 'LUMOback',
            'description': 'At LUMOback, we believe straight backs are stronger. The LUMOback Posture & Movement Sensor: It gently vibrates when you slouch, inspiring improved posture and mobility. Use the app to track your data and improve your posture over time. ',
        },
        {
            'url': 'http://www.veoh.com/watch/v27701988pbTc4wzN?h1=Chile+workers+cover+up+to+avoid+skin+damage',
            'md5': '4a6ff84b87d536a6a71e6aa6c0ad07fa',
            'info_dict': {
                'id': '27701988',
                'ext': 'mp4',
                'title': 'Chile workers cover up to avoid skin damage',
                'description': 'md5:2bd151625a60a32822873efc246ba20d',
                'uploader': 'afp-news',
                'duration': 123,
            },
            'skip': 'This video has been deleted.',
    }, {
        'url': 'http://www.veoh.com/watch/v27701988pbTc4wzN?h1=Chile+workers+cover+up+to+avoid+skin+damage',
        'md5': '4a6ff84b87d536a6a71e6aa6c0ad07fa',
        'info_dict': {
            'id': '27701988',
            'ext': 'mp4',
            'title': 'Chile workers cover up to avoid skin damage',
            'description': 'md5:2bd151625a60a32822873efc246ba20d',
            'uploader': 'afp-news',
            'duration': 123,
        },
        {
            'url': 'http://www.veoh.com/watch/v69525809F6Nc4frX',
            'md5': '4fde7b9e33577bab2f2f8f260e30e979',
            'note': 'Embedded ooyala video',
            'info_dict': {
                'id': '69525809',
                'ext': 'mp4',
                'title': 'Doctors Alter Plan For Preteen\'s Weight Loss Surgery',
                'description': 'md5:f5a11c51f8fb51d2315bca0937526891',
                'uploader': 'newsy-videos',
            },
            'skip': 'This video has been deleted.',
        'skip': 'This video has been deleted.',
    }, {
        'url': 'http://www.veoh.com/watch/v69525809F6Nc4frX',
        'md5': '4fde7b9e33577bab2f2f8f260e30e979',
        'note': 'Embedded ooyala video',
        'info_dict': {
            'id': '69525809',
            'ext': 'mp4',
            'title': 'Doctors Alter Plan For Preteen\'s Weight Loss Surgery',
            'description': 'md5:f5a11c51f8fb51d2315bca0937526891',
            'uploader': 'newsy-videos',
        },
    ]
        'skip': 'This video has been deleted.',
    }, {
        'url': 'http://www.veoh.com/watch/e152215AJxZktGS',
        'only_matching': True,
    }]

    def _extract_formats(self, source):
        formats = []
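Besides the test reshuffle, `_VALID_URL` now also accepts `e`-prefixed video ids such as the new `only_matching` URL. A quick check with the updated pattern:

```python
import re

# Updated pattern from the hunk above.
VALID_URL = r'https?://(?:www\.)?veoh\.com/(?:watch|iphone/#_Watch)/(?P<id>(?:v|e|yapi-)[\da-zA-Z]+)'

for url in ('http://www.veoh.com/watch/v56314296nk7Zdmz3',
            'http://www.veoh.com/watch/e152215AJxZktGS'):
    print(re.match(VALID_URL, url).group('id'))
```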
youtube_dl/extractor/vh1.py

@@ -121,7 +121,11 @@ class VH1IE(MTVIE):
        idoc = self._download_xml(
            doc_url, video_id,
            'Downloading info', transform_source=fix_xml_ampersands)
        return self.playlist_result(
            [self._get_video_info(item) for item in idoc.findall('.//item')],
            playlist_id=video_id,
        )

        entries = []
        for item in idoc.findall('.//item'):
            info = self._get_video_info(item)
            if info:
                entries.append(info)

        return self.playlist_result(entries, playlist_id=video_id)
youtube_dl/extractor/vidio.py

@@ -56,7 +56,8 @@ class VidioIE(InfoExtractor):
        self._sort_formats(formats)

        duration = int_or_none(duration or self._search_regex(
            r'data-video-duration=(["\'])(?P<duartion>\d+)\1', webpage, 'duration'))
            r'data-video-duration=(["\'])(?P<duration>\d+)\1', webpage,
            'duration', fatal=False, group='duration'))
        thumbnail = thumbnail or self._og_search_thumbnail(webpage)

        like_count = int_or_none(self._search_regex(
youtube_dl/extractor/vidme.py

@@ -3,7 +3,10 @@ from __future__ import unicode_literals
import itertools

from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..compat import (
    compat_HTTPError,
    compat_str,
)
from ..utils import (
    ExtractorError,
    int_or_none,
@@ -161,13 +164,28 @@ class VidmeIE(InfoExtractor):
                'or for violating the terms of use.',
                expected=True)

        formats = [{
            'format_id': f.get('type'),
            'url': f['uri'],
            'width': int_or_none(f.get('width')),
            'height': int_or_none(f.get('height')),
            'preference': 0 if f.get('type', '').endswith('clip') else 1,
        } for f in video.get('formats', []) if f.get('uri')]
        formats = []
        for f in video.get('formats', []):
            format_url = f.get('uri')
            if not format_url or not isinstance(format_url, compat_str):
                continue
            format_type = f.get('type')
            if format_type == 'dash':
                formats.extend(self._extract_mpd_formats(
                    format_url, video_id, mpd_id='dash', fatal=False))
            elif format_type == 'hls':
                formats.extend(self._extract_m3u8_formats(
                    format_url, video_id, 'mp4', entry_protocol='m3u8_native',
                    m3u8_id='hls', fatal=False))
            else:
                formats.append({
                    'format_id': f.get('type'),
                    'url': format_url,
                    'width': int_or_none(f.get('width')),
                    'height': int_or_none(f.get('height')),
                    'preference': 0 if f.get('type', '').endswith(
                        'clip') else 1,
                })

        if not formats and video.get('complete_url'):
            formats.append({
youtube_dl/extractor/vier.py

@@ -15,7 +15,21 @@ from ..utils import (
class VierIE(InfoExtractor):
    IE_NAME = 'vier'
    IE_DESC = 'vier.be and vijf.be'
    _VALID_URL = r'https?://(?:www\.)?(?P<site>vier|vijf)\.be/(?:[^/]+/videos/(?P<display_id>[^/]+)(?:/(?P<id>\d+))?|video/v3/embed/(?P<embed_id>\d+))'
    _VALID_URL = r'''(?x)
                    https?://
                        (?:www\.)?(?P<site>vier|vijf)\.be/
                        (?:
                            (?:
                                [^/]+/videos|
                                video(?:/[^/]+)*
                            )/
                            (?P<display_id>[^/]+)(?:/(?P<id>\d+))?|
                            (?:
                                video/v3/embed|
                                embed/video/public
                            )/(?P<embed_id>\d+)
                        )
                    '''
    _NETRC_MACHINE = 'vier'
    _TESTS = [{
        'url': 'http://www.vier.be/planb/videos/het-wordt-warm-de-moestuin/16129',
@@ -83,6 +97,15 @@ class VierIE(InfoExtractor):
    }, {
        'url': 'http://www.vier.be/video/v3/embed/16129',
        'only_matching': True,
    }, {
        'url': 'https://www.vijf.be/embed/video/public/4093',
        'only_matching': True,
    }, {
        'url': 'https://www.vier.be/video/blockbusters/in-juli-en-augustus-summer-classics',
        'only_matching': True,
    }, {
        'url': 'https://www.vier.be/video/achter-de-rug/2017/achter-de-rug-seizoen-1-aflevering-6',
        'only_matching': True,
    }]

    def _real_initialize(self):
@@ -133,14 +156,20 @@ class VierIE(InfoExtractor):
        video_id = self._search_regex(
            [r'data-nid="(\d+)"', r'"nid"\s*:\s*"(\d+)"'],
            webpage, 'video id', default=video_id or display_id)
        application = self._search_regex(
            [r'data-application="([^"]+)"', r'"application"\s*:\s*"([^"]+)"'],
            webpage, 'application', default=site + '_vod')
        filename = self._search_regex(
            [r'data-filename="([^"]+)"', r'"filename"\s*:\s*"([^"]+)"'],
            webpage, 'filename')

        playlist_url = 'http://vod.streamcloud.be/%s/_definst_/mp4:%s.mp4/playlist.m3u8' % (application, filename)
        playlist_url = self._search_regex(
            r'data-file=(["\'])(?P<url>(?:https?:)?//[^/]+/.+?\.m3u8.*?)\1',
            webpage, 'm3u8 url', default=None, group='url')

        if not playlist_url:
            application = self._search_regex(
                [r'data-application="([^"]+)"', r'"application"\s*:\s*"([^"]+)"'],
                webpage, 'application', default=site + '_vod')
            filename = self._search_regex(
                [r'data-filename="([^"]+)"', r'"filename"\s*:\s*"([^"]+)"'],
                webpage, 'filename')
            playlist_url = 'http://vod.streamcloud.be/%s/_definst_/mp4:%s.mp4/playlist.m3u8' % (application, filename)

        formats = self._extract_wowza_formats(
            playlist_url, display_id, skip_protocols=['dash'])
        self._sort_formats(formats)
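The rewritten `_VALID_URL` is now a verbose regex and additionally covers plain `/video/...` paths and `embed/video/public` embeds. A quick check of the new `only_matching` URLs against the pattern (copied verbatim from the hunk):

```python
import re

VALID_URL = r'''(?x)
    https?://
        (?:www\.)?(?P<site>vier|vijf)\.be/
        (?:
            (?:
                [^/]+/videos|
                video(?:/[^/]+)*
            )/
            (?P<display_id>[^/]+)(?:/(?P<id>\d+))?|
            (?:
                video/v3/embed|
                embed/video/public
            )/(?P<embed_id>\d+)
        )
    '''

for url in ('https://www.vijf.be/embed/video/public/4093',
            'https://www.vier.be/video/blockbusters/in-juli-en-augustus-summer-classics'):
    m = re.match(VALID_URL, url)
    print(m.group('site'), m.group('embed_id') or m.group('display_id'))
```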
youtube_dl/extractor/vine.py

@@ -92,10 +92,12 @@ class VineIE(InfoExtractor):

        username = data.get('username')

        alt_title = 'Vine by %s' % username if username else None

        return {
            'id': video_id,
            'title': data.get('description'),
            'alt_title': 'Vine by %s' % username if username else None,
            'title': data.get('description') or alt_title or 'Vine video',
            'alt_title': alt_title,
            'thumbnail': data.get('thumbnailUrl'),
            'timestamp': unified_timestamp(data.get('created')),
            'uploader': username,
youtube_dl/extractor/vlive.py

@@ -49,6 +49,10 @@ class VLiveIE(InfoExtractor):
        },
    }]

    @classmethod
    def suitable(cls, url):
        return False if VLivePlaylistIE.suitable(url) else super(VLiveIE, cls).suitable(url)

    def _real_extract(self, url):
        video_id = self._match_id(url)

@@ -232,7 +236,12 @@ class VLiveChannelIE(InfoExtractor):
                query={
                    'app_id': app_id,
                    'channelSeq': channel_seq,
                    'maxNumOfRows': 1000,
                    # Large values of maxNumOfRows (~300 or above) may cause
                    # empty responses (see [1]), e.g. this happens for [2] that
                    # has more than 300 videos.
                    # 1. https://github.com/rg3/youtube-dl/issues/13830
                    # 2. http://channels.vlive.tv/EDBF.
                    'maxNumOfRows': 100,
                    '_': int(time.time()),
                    'pageNo': page_num
                }
@@ -261,3 +270,54 @@ class VLiveChannelIE(InfoExtractor):

        return self.playlist_result(
            entries, channel_code, channel_name)


class VLivePlaylistIE(InfoExtractor):
    IE_NAME = 'vlive:playlist'
    _VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<video_id>[0-9]+)/playlist/(?P<id>[0-9]+)'
    _TEST = {
        'url': 'http://www.vlive.tv/video/22867/playlist/22912',
        'info_dict': {
            'id': '22912',
            'title': 'Valentine Day Message from TWICE'
        },
        'playlist_mincount': 9
    }

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        video_id, playlist_id = mobj.group('video_id', 'id')

        VIDEO_URL_TEMPLATE = 'http://www.vlive.tv/video/%s'
        if self._downloader.params.get('noplaylist'):
            self.to_screen(
                'Downloading just video %s because of --no-playlist' % video_id)
            return self.url_result(
                VIDEO_URL_TEMPLATE % video_id,
                ie=VLiveIE.ie_key(), video_id=video_id)

        self.to_screen(
            'Downloading playlist %s - add --no-playlist to just download video'
            % playlist_id)

        webpage = self._download_webpage(
            'http://www.vlive.tv/video/%s/playlist/%s'
            % (video_id, playlist_id), playlist_id)

        item_ids = self._parse_json(
            self._search_regex(
                r'playlistVideoSeqs\s*=\s*(\[[^]]+\])', webpage,
                'playlist video seqs'),
            playlist_id)

        entries = [
            self.url_result(
                VIDEO_URL_TEMPLATE % item_id, ie=VLiveIE.ie_key(),
                video_id=compat_str(item_id))
            for item_id in item_ids]

        playlist_name = self._html_search_regex(
            r'<div[^>]+class="[^"]*multicam_playlist[^>]*>\s*<h3[^>]+>([^<]+)',
            webpage, 'playlist title', fatal=False)

        return self.playlist_result(entries, playlist_id, playlist_name)
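The new `suitable` override on `VLiveIE` is the usual way of keeping two extractors with overlapping URL patterns from claiming the same link: the plain video extractor steps aside whenever the more specific playlist pattern matches. A minimal sketch of the idea outside youtube-dl (the class names here are illustrative, not the real ones):

```python
import re

VIDEO_RE = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<video_id>[0-9]+)'
PLAYLIST_RE = VIDEO_RE + r'/playlist/(?P<id>[0-9]+)'


class PlaylistMatcher(object):
    @classmethod
    def suitable(cls, url):
        return re.match(PLAYLIST_RE, url) is not None


class VideoMatcher(object):
    @classmethod
    def suitable(cls, url):
        # Defer to the playlist matcher when its more specific pattern applies.
        return not PlaylistMatcher.suitable(url) and re.match(VIDEO_RE, url) is not None


print(VideoMatcher.suitable('http://www.vlive.tv/video/22867'))                 # True
print(VideoMatcher.suitable('http://www.vlive.tv/video/22867/playlist/22912'))  # False
```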
98 youtube_dl/extractor/voot.py Normal file
@@ -0,0 +1,98 @@
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
    ExtractorError,
    int_or_none,
    try_get,
    unified_timestamp,
)


class VootIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?voot\.com/(?:[^/]+/)+(?P<id>\d+)'
    _GEO_COUNTRIES = ['IN']
    _TESTS = [{
        'url': 'https://www.voot.com/shows/ishq-ka-rang-safed/1/360558/is-this-the-end-of-kamini-/441353',
        'info_dict': {
            'id': '0_8ledb18o',
            'ext': 'mp4',
            'title': 'Ishq Ka Rang Safed - Season 01 - Episode 340',
            'description': 'md5:06291fbbbc4dcbe21235c40c262507c1',
            'uploader_id': 'batchUser',
            'timestamp': 1472162937,
            'upload_date': '20160825',
            'duration': 1146,
            'series': 'Ishq Ka Rang Safed',
            'season_number': 1,
            'episode': 'Is this the end of Kamini?',
            'episode_number': 340,
            'view_count': int,
            'like_count': int,
        },
        'params': {
            'skip_download': True,
        },
        'expected_warnings': ['Failed to download m3u8 information'],
    }, {
        'url': 'https://www.voot.com/kids/characters/mighty-cat-masked-niyander-e-/400478/school-bag-disappears/440925',
        'only_matching': True,
    }, {
        'url': 'https://www.voot.com/movies/pandavas-5/424627',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)

        media_info = self._download_json(
            'https://wapi.voot.com/ws/ott/getMediaInfo.json', video_id,
            query={
                'platform': 'Web',
                'pId': 2,
                'mediaId': video_id,
            })

        status_code = try_get(media_info, lambda x: x['status']['code'], int)
        if status_code != 0:
            raise ExtractorError(media_info['status']['message'], expected=True)

        media = media_info['assets']

        entry_id = media['EntryId']
        title = media['MediaName']

        description, series, season_number, episode, episode_number = [None] * 5

        for meta in try_get(media, lambda x: x['Metas'], list) or []:
            key, value = meta.get('Key'), meta.get('Value')
            if not key or not value:
                continue
            if key == 'ContentSynopsis':
                description = value
            elif key == 'RefSeriesTitle':
                series = value
            elif key == 'RefSeriesSeason':
                season_number = int_or_none(value)
            elif key == 'EpisodeMainTitle':
                episode = value
            elif key == 'EpisodeNo':
                episode_number = int_or_none(value)

        return {
            '_type': 'url_transparent',
            'url': 'kaltura:1982551:%s' % entry_id,
            'ie_key': KalturaIE.ie_key(),
            'title': title,
            'description': description,
            'series': series,
            'season_number': season_number,
            'episode': episode,
            'episode_number': episode_number,
            'timestamp': unified_timestamp(media.get('CreationDate')),
            'duration': int_or_none(media.get('Duration')),
            'view_count': int_or_none(media.get('ViewCounter')),
            'like_count': int_or_none(media.get('like_counter')),
        }
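Most of the episode metadata in the new Voot extractor comes from the `Metas` key/value list, and the result is a `url_transparent` entry that hands playback off to the Kaltura extractor while keeping Voot's own fields. A standalone sketch of how the key/value pairs fold into fields (the payload below is invented, and the plain `int()`/`str()` casts stand in for youtube-dl's `int_or_none`):

```python
def fold_metas(metas):
    # Maps Voot's Metas key/value pairs onto youtube-dl style fields.
    fields = {}
    mapping = {
        'ContentSynopsis': ('description', str),
        'RefSeriesTitle': ('series', str),
        'RefSeriesSeason': ('season_number', int),
        'EpisodeMainTitle': ('episode', str),
        'EpisodeNo': ('episode_number', int),
    }
    for meta in metas or []:
        key, value = meta.get('Key'), meta.get('Value')
        if key in mapping and value:
            name, cast = mapping[key]
            fields[name] = cast(value)
    return fields


# Invented sample payload for illustration
print(fold_metas([
    {'Key': 'RefSeriesTitle', 'Value': 'Ishq Ka Rang Safed'},
    {'Key': 'RefSeriesSeason', 'Value': '1'},
    {'Key': 'EpisodeNo', 'Value': '340'},
]))
```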
youtube_dl/extractor/vzaar.py

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    int_or_none,
@@ -28,6 +30,12 @@ class VzaarIE(InfoExtractor):
        },
    }]

    @staticmethod
    def _extract_urls(webpage):
        return re.findall(
            r'<iframe[^>]+src=["\']((?:https?:)?//(?:view\.vzaar\.com)/[0-9]+)',
            webpage)

    def _real_extract(self, url):
        video_id = self._match_id(url)
        video_data = self._download_json(
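The new `_extract_urls` hook lets the generic extractor pick Vzaar embeds out of arbitrary pages; it is a plain `re.findall` over iframe `src` attributes. A quick check with an invented snippet:

```python
import re

# Pattern from the hunk above.
EMBED_RE = r'<iframe[^>]+src=["\']((?:https?:)?//(?:view\.vzaar\.com)/[0-9]+)'

webpage = '<div><iframe src="//view.vzaar.com/1152805/player"></iframe></div>'
print(re.findall(EMBED_RE, webpage))
```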
151 youtube_dl/extractor/watchbox.py Normal file
@@ -0,0 +1,151 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
    int_or_none,
    js_to_json,
    strip_or_none,
    try_get,
    unified_timestamp,
)


class WatchBoxIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?watchbox\.de/(?P<kind>serien|filme)/(?:[^/]+/)*[^/]+-(?P<id>\d+)'
    _TESTS = [{
        # film
        'url': 'https://www.watchbox.de/filme/free-jimmy-12325.html',
        'info_dict': {
            'id': '341368',
            'ext': 'mp4',
            'title': 'Free Jimmy',
            'description': 'md5:bcd8bafbbf9dc0ef98063d344d7cc5f6',
            'thumbnail': r're:^https?://.*\.jpg$',
            'duration': 4890,
            'age_limit': 16,
            'release_year': 2009,
        },
        'params': {
            'format': 'bestvideo',
            'skip_download': True,
        },
        'expected_warnings': ['Failed to download m3u8 information'],
    }, {
        # episode
        'url': 'https://www.watchbox.de/serien/ugly-americans-12231/staffel-1/date-in-der-hoelle-328286.html',
        'info_dict': {
            'id': '328286',
            'ext': 'mp4',
            'title': 'S01 E01 - Date in der Hölle',
            'description': 'md5:2f31c74a8186899f33cb5114491dae2b',
            'thumbnail': r're:^https?://.*\.jpg$',
            'duration': 1291,
            'age_limit': 12,
            'release_year': 2010,
            'series': 'Ugly Americans',
            'season_number': 1,
            'episode': 'Date in der Hölle',
            'episode_number': 1,
        },
        'params': {
            'format': 'bestvideo',
            'skip_download': True,
        },
        'expected_warnings': ['Failed to download m3u8 information'],
    }, {
        'url': 'https://www.watchbox.de/serien/ugly-americans-12231/staffel-2/der-ring-des-powers-328270',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        kind, video_id = mobj.group('kind', 'id')

        webpage = self._download_webpage(url, video_id)

        source = self._parse_json(
            self._search_regex(
                r'(?s)source\s*:\s*({.+?})\s*,\s*\n', webpage, 'source',
                default='{}'),
            video_id, transform_source=js_to_json, fatal=False) or {}

        video_id = compat_str(source.get('videoId') or video_id)

        devapi = self._download_json(
            'http://api.watchbox.de/devapi/id/%s' % video_id, video_id, query={
                'format': 'json',
                'apikey': 'hbbtv',
            }, fatal=False)

        item = try_get(devapi, lambda x: x['items'][0], dict) or {}

        title = item.get('title') or try_get(
            item, lambda x: x['movie']['headline_movie'],
            compat_str) or source['title']

        formats = []
        hls_url = item.get('media_videourl_hls') or source.get('hls')
        if hls_url:
            formats.extend(self._extract_m3u8_formats(
                hls_url, video_id, 'mp4', entry_protocol='m3u8_native',
                m3u8_id='hls', fatal=False))
        dash_url = item.get('media_videourl_wv') or source.get('dash')
        if dash_url:
            formats.extend(self._extract_mpd_formats(
                dash_url, video_id, mpd_id='dash', fatal=False))
        mp4_url = item.get('media_videourl')
        if mp4_url:
            formats.append({
                'url': mp4_url,
                'format_id': 'mp4',
                'width': int_or_none(item.get('width')),
                'height': int_or_none(item.get('height')),
                'tbr': int_or_none(item.get('bitrate')),
            })
        self._sort_formats(formats)

        description = strip_or_none(item.get('descr'))
        thumbnail = item.get('media_content_thumbnail_large') or source.get('poster') or item.get('media_thumbnail')
        duration = int_or_none(item.get('media_length') or source.get('length'))
        timestamp = unified_timestamp(item.get('pubDate'))
        view_count = int_or_none(item.get('media_views'))
        age_limit = int_or_none(try_get(item, lambda x: x['movie']['fsk']))
        release_year = int_or_none(try_get(item, lambda x: x['movie']['rel_year']))

        info = {
            'id': video_id,
            'title': title,
            'description': description,
            'thumbnail': thumbnail,
            'duration': duration,
            'timestamp': timestamp,
            'view_count': view_count,
            'age_limit': age_limit,
            'release_year': release_year,
            'formats': formats,
        }

        if kind.lower() == 'serien':
            series = try_get(
                item, lambda x: x['special']['title'],
                compat_str) or source.get('format')
            season_number = int_or_none(self._search_regex(
                r'^S(\d{1,2})\s*E\d{1,2}', title, 'season number',
                default=None) or self._search_regex(
                r'/staffel-(\d+)/', url, 'season number', default=None))
            episode = source.get('title')
            episode_number = int_or_none(self._search_regex(
                r'^S\d{1,2}\s*E(\d{1,2})', title, 'episode number',
                default=None))
            info.update({
                'series': series,
                'season_number': season_number,
                'episode': episode,
                'episode_number': episode_number,
            })

        return info
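For series pages, season and episode numbers are recovered from titles like `S01 E01 - Date in der Hölle`, with the season number falling back to the `/staffel-N/` URL segment. A standalone sketch of that parsing, using the same regexes as above:

```python
import re


def parse_season_episode(title, url):
    # Same regexes as the extractor above; returns (season_number, episode_number).
    season = re.search(r'^S(\d{1,2})\s*E\d{1,2}', title)
    if not season:
        season = re.search(r'/staffel-(\d+)/', url)
    episode = re.search(r'^S\d{1,2}\s*E(\d{1,2})', title)
    return (int(season.group(1)) if season else None,
            int(episode.group(1)) if episode else None)


print(parse_season_episode(
    'S01 E01 - Date in der Hölle',
    'https://www.watchbox.de/serien/ugly-americans-12231/staffel-1/date-in-der-hoelle-328286.html'))
```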
Some files were not shown because too many files have changed in this diff.