Compare commits

..

83 Commits

Author SHA1 Message Date
Sergey M․
228c1d685b release 2020.05.29 2020-05-29 03:33:13 +07:00
Sergey M․
efd72b05d2 [ChangeLog] Actualize
[ci skip]
2020-05-29 03:28:44 +07:00
Sergey M․
fe515e5c75 [ard:beta] Extend _VALID_URL (closes #25405) 2020-05-29 02:01:51 +07:00
striker.sh
1db5ab6b34 [youtube] Add support for more invidious instances (#25417) 2020-05-27 01:26:45 +07:00
Sergey M․
2791e80b60 [postprocessor/ffmpeg] Embed series metadata with --add-metadata 2020-05-23 12:28:15 +07:00
JordanWeatherby
8f841fafcd [giantbomb] Extend _VALID_URL (#25222) 2020-05-21 04:30:50 +07:00
Michael Klein
a54c5f83c0 [ard] Improve _VALID_URL (closes #25134) (#25198) 2020-05-20 04:08:08 +07:00
Sergey M․
cd13343ad8 [redtube] Improve formats extraction and extract m3u8 formats (closes #25311, closes #25321) 2020-05-20 03:39:41 +07:00
Rob
9cd5f54e31 [utils] Fix file permissions in write_json_file (closes #12471) (#25122) 2020-05-20 03:21:52 +07:00
tlsssl
9a269547f2 [indavideo] Switch to HTTPS for API request (#25191) 2020-05-20 02:13:06 +07:00
Dave Loyall
bf097a5077 [redtube] Improve title extraction (#25208) 2020-05-20 02:11:05 +07:00
Remita Amine
52c50a10af [vimeo] improve format extraction and sorting(closes #25285) 2020-05-15 15:57:06 +01:00
Remita Amine
b334732709 [soundcloud] reduce API playlist page limit(closes #25274) 2020-05-15 14:13:02 +01:00
Juan Francisco Cantero Hurtado
384bf91f88 [youtube] Add support for yewtu.be (#25226) 2020-05-14 05:54:42 +07:00
TotalCaesar659
fae11394f0 [README.md] flake8 HTTPS URL (#25230) 2020-05-14 05:53:17 +07:00
comsomisha
adc13b0748 [mailru] Fix extraction (closes #24530) (#25239) 2020-05-14 05:51:40 +07:00
Sergey M․
327593257c [bbccouk] PEP8 2020-05-14 05:11:42 +07:00
Remita Amine
9d8f3a12a6 [spike] fix Bellator mgid extraction(closes #25195) 2020-05-12 20:49:08 +01:00
Sergey M․
b002bc433a release 2020.05.08 2020-05-08 18:10:37 +07:00
Sergey M․
b74896dad1 [ChangeLog] Actualize
[ci skip]
2020-05-08 18:07:05 +07:00
Sergey M․
fa3db38333 [youtube] Improve signature cipher extraction (closes #25188) 2020-05-08 17:42:30 +07:00
Sergey M․
30fa5c6087 [iprima] Improve extraction (closes #25138) 2020-05-06 23:20:14 +07:00
Sergey M․
6c907eb33f [downloader/http] Request last data block of exact remaining size
Always request last data block of exact size remaining to download if possible not the current block size.
2020-05-05 21:43:39 +07:00
Sergey M․
f7b42518dc [downloader/http] Finish downloading once received data length matches expected
Always do this if possible, i.e. if Content-Length or expected length is known, not only in test.
This will save unnecessary last extra loop trying to read 0 bytes.
2020-05-05 21:43:39 +07:00
Remita Amine
ce7db64bf1 [uol] fix extraction(closes #22007) 2020-05-05 11:19:40 +01:00
hh0rva1h
1328305851 [orf] Add support for more radio stations (closes #24938) (#24968) 2020-05-05 06:22:50 +07:00
Sergey M․
6c22cee673 [extractor/common] Use compat_cookiejar_Cookie for _set_cookie (closes #23256, closes #24776)
To always ensure cookie name and value are bytestrings on python 2.
2020-05-05 06:00:37 +07:00
Sergey M․
6d874fee2a [compat] Introduce compat_cookiejar_Cookie 2020-05-05 05:54:10 +07:00
Sergey M․
676723e0da [dailymotion] Fix typo 2020-05-05 05:09:07 +07:00
Sergey M․
c380cc28c4 [utils] Improve cookie files support
+ Add support for UTF-8 in cookie files
* Skip malformed cookie file entries instead of crashing (invalid entry len, invalid expires at)
2020-05-05 04:21:25 +07:00
Sergey M․
f7f304910d [puhutv] Remove no longer available HTTP formats (closes #25124) 2020-05-04 21:15:19 +07:00
Sergey M․
00a41ca4c3 release 2020.05.03 2020-05-03 00:05:05 +07:00
Sergey M․
66f32ca0e1 [ChangeLog] Actualize
[ci skip]
2020-05-02 23:59:25 +07:00
Sergey M․
6ffc3cf74a [crunchyroll] Fix and improve extraction (closes #25096, closes #25060) 2020-05-02 23:42:51 +07:00
Sergey M․
4433bb0245 [extractor/common] Extract multiple JSON-LD entries 2020-05-02 23:40:30 +07:00
Sergey M․
e40c758c2a [youtube] Improve player id extraction and add tests 2020-05-02 07:18:08 +07:00
Sergey M․
011e75e641 [youtube] Use redirected video id if any (closes #25063) 2020-05-01 00:40:38 +07:00
Remita Amine
2468a6fa64 [yahoo] fix GYAO Player extraction and relax title URL regex(closes #24178)(closes #24778) 2020-04-29 14:56:32 +01:00
Remita Amine
700265bfcf [tvplay] fix Viafree extraction(closes #15189)(closes #24473)(closes #24789) 2020-04-29 13:38:58 +01:00
Sergey M․
c97f5e934f [tenplay] Relax _VALID_URL (closes #25001) 2020-04-26 12:41:33 +07:00
Sergey M․
38db9a405a [prosiebensat1] Extract series metadata 2020-04-24 02:56:10 +07:00
Philipp Stehle
2cdfe977d7 [prosiebensat1] Improve extraction and remove 7tv.de support (#24948) 2020-04-24 02:44:13 +07:00
willbeaufoy
46d0baf941 [options] Clarify doc on --exec command (closes #19087) (#24883) 2020-04-24 02:31:38 +07:00
Sergey M․
00eb865b3c [youtube] Fix DRM videos detection (refs #24736) 2020-04-11 23:05:08 +07:00
Sergey M․
2f19835726 [thisoldhouse] Improve video id extraction (closes #24549) 2020-04-11 20:07:37 +07:00
AndrewMBL
533f3e3557 [thisoldhouse] Fix video id extraction (closes #24548)
Added support for:
with of without "www."
and either  ".chorus.build" or ".com"

It now validated correctly on older URL's
```
<iframe src="https://thisoldhouse.chorus.build/videos/zype/5e33baec27d2e50001d5f52f
```
and newer ones
```
<iframe src="https://www.thisoldhouse.com/videos/zype/5e2b70e95216cc0001615120
```
2020-04-11 20:07:32 +07:00
Sergey M․
75294a5ed0 [soundcloud] Improve AAC format extraction (closes #19173, closes #24708) 2020-04-10 17:26:03 +07:00
tom
b9e5f87291 [soundcloud] Extract AAC format 2020-04-10 17:25:04 +07:00
Sergey M․
6b09401b0b [youtube] Skip broken multifeed videos (closes #24711) 2020-04-09 22:42:43 +07:00
Sergey M․
5caf88ccb4 [nova:embed] Fix extraction (closes #24700) 2020-04-09 03:52:29 +07:00
Sergey M․
dcc8522fdb [motherless] Fix extraction (closes #24699) 2020-04-09 02:14:49 +07:00
Felix Stupp
c9595ee780 [twitch:clips] Extend _VALID_URL (closes #24290) (#24642) 2020-04-07 23:21:25 +07:00
Sergey M․
91bd3bd019 [tv4] Fix ISM formats extraction (closes #24667) 2020-04-07 22:56:06 +07:00
Sergey M․
13b08034b5 [extractor/common] Skip malformed ISM manifest XMLs while extracting ISM formats (#24667) 2020-04-07 22:55:59 +07:00
Sergey M․
6a6e1a0cd8 [tele5] Fix extraction (closes #24553) 2020-04-06 02:05:06 +07:00
Sergey M․
4e7b5bba5f [mofosex] Add support for generic embeds (closes #24633) 2020-04-06 01:29:58 +07:00
Sergey M․
52c4c51556 [youporn] Add support form generic embeds 2020-04-05 20:56:14 +07:00
Sergey M․
8fae1a04eb [spankwire] Add support for generic embeds (refs #24633) 2020-04-05 20:42:56 +07:00
Sergey M․
d44a707fdd [spankwire] Fix extraction (closes #18924, closes #20648) 2020-04-05 20:42:56 +07:00
Sergey M․
049c0486bb release 2020.03.24 2020-03-24 03:14:30 +07:00
Sergey M․
30b5121a1c [ChangeLog] Actualize
[ci skip]
2020-03-24 03:12:15 +07:00
Sergey M․
b439634f0e [ChangeLog] Actualize
[ci skip]
2020-03-24 03:07:34 +07:00
Sergey M․
6e47200b6e [teachable] Update test 2020-03-24 02:57:53 +07:00
Sergey M․
38fa761a45 [teachable] Update gns3 domain 2020-03-24 02:57:48 +07:00
Sergey M․
08a27407c4 [teachable] Update upskillcourses domain
New version does not use teachable platform any longer
2020-03-24 02:57:44 +07:00
Sergey M․
be7dacf9cf [generic] Look for teachable embeds before wistia 2020-03-24 02:57:38 +07:00
Sergey M․
4560adc820 [teachable] Extract chapter metadata (closes #24421) 2020-03-24 02:57:32 +07:00
Sergey M․
63dce3094b [bilibili] Add support for player.bilibili.com (closes #24402) 2020-03-24 00:24:39 +07:00
Sergey M․
b4eb08bb03 [bilibili] Add support for new URL schema with BV ids (closes #24439, closes #24442) 2020-03-24 00:11:39 +07:00
Remita Amine
2e20cb3636 [limelight] remove disabled API requests(closes #24255) 2020-03-23 12:57:10 +01:00
Remita Amine
a6c5859d6b [soundcloud] fix download url extraction(closes #24394) 2020-03-22 09:24:26 +01:00
Sergey M․
c76cdf2382 [cbc:watch] Fix authenticated device token caching (closes #19160) 2020-03-21 01:43:13 +07:00
Devon Meunier
787c360467 [cbc:watch] Add support for authentication 2020-03-21 01:43:08 +07:00
Sergey M․
73453430c1 [hellporno] Fix extraction (closes #24399) 2020-03-21 00:59:48 +07:00
Sergey M․
158bc5ac03 [xtube] Fix typo 2020-03-14 22:58:10 +07:00
Sergey M․
4568a11802 [xtube] Fix formats extraction (closes #24348) 2020-03-14 22:57:10 +07:00
Sergey M․
4cbce88f8b [ndr] Fix extraction (closes #24326) 2020-03-14 04:58:24 +07:00
Sergey M․
541fe3eaff [nhk] Update m3u8 URL and use native hls (#24329) 2020-03-14 04:42:40 +07:00
Sergey M․
9bfe088594 [nhk] Remove obsolete rtmp formats (closes #24329) 2020-03-14 04:40:11 +07:00
Sergey M․
fcaf4d7a06 [nhk] Relax _VALID_URL (#24329) 2020-03-14 04:39:21 +07:00
Remita Amine
40b6495d40 Revert "[vimeo] fix showcase password protected video extraction(closes #24224)"
This reverts commit 12ee431676.
2020-03-13 08:59:10 +01:00
Sergey M․
f1a8511f7b [utils] Add reference to cookie file format 2020-03-10 04:59:02 +07:00
Sergey M․
042b664933 Revert "[utils] Add support for cookies with spaces used instead of tabs"
According to [1] TABs must be used as separators between fields.
Files produces by some tools with spaces as separators are considered
malformed.

1. https://curl.haxx.se/docs/http-cookies.html

This reverts commit cff99c91d1.
2020-03-10 04:53:51 +07:00
62 changed files with 1451 additions and 751 deletions

View File

@@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.08. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.05.29. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2020.03.08**
- [ ] I've verified that I'm running youtube-dl version **2020.05.29**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2020.03.08
[debug] youtube-dl version 2020.05.29
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -19,7 +19,7 @@ labels: 'site-support-request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.08. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.05.29. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2020.03.08**
- [ ] I've verified that I'm running youtube-dl version **2020.05.29**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@@ -18,13 +18,13 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.08. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.05.29. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2020.03.08**
- [ ] I've verified that I'm running youtube-dl version **2020.05.29**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.08. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.05.29. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2020.03.08**
- [ ] I've verified that I'm running youtube-dl version **2020.05.29**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2020.03.08
[debug] youtube-dl version 2020.05.29
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -19,13 +19,13 @@ labels: 'request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.03.08. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.05.29. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2020.03.08**
- [ ] I've verified that I'm running youtube-dl version **2020.05.29**
- [ ] I've searched the bugtracker for similar feature requests including closed ones

View File

@@ -153,7 +153,7 @@ After you have ensured this site is distributing its content legally, you can fo
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 youtube_dl/extractor/yourextractor.py

104
ChangeLog
View File

@@ -1,7 +1,109 @@
version 2020.05.29
Core
* [postprocessor/ffmpeg] Embed series metadata with --add-metadata
* [utils] Fix file permissions in write_json_file (#12471, #25122)
Extractors
* [ard:beta] Extend URL regular expression (#25405)
+ [youtube] Add support for more invidious instances (#25417)
* [giantbomb] Extend URL regular expression (#25222)
* [ard] Improve URL regular expression (#25134, #25198)
* [redtube] Improve formats extraction and extract m3u8 formats (#25311,
#25321)
* [indavideo] Switch to HTTPS for API request (#25191)
* [redtube] Improve title extraction (#25208)
* [vimeo] Improve format extraction and sorting (#25285)
* [soundcloud] Reduce API playlist page limit (#25274)
+ [youtube] Add support for yewtu.be (#25226)
* [mailru] Fix extraction (#24530, #25239)
* [bellator] Fix mgid extraction (#25195)
version 2020.05.08
Core
* [downloader/http] Request last data block of exact remaining size
* [downloader/http] Finish downloading once received data length matches
expected
* [extractor/common] Use compat_cookiejar_Cookie for _set_cookie to always
ensure cookie name and value are bytestrings on python 2 (#23256, #24776)
+ [compat] Introduce compat_cookiejar_Cookie
* [utils] Improve cookie files support
+ Add support for UTF-8 in cookie files
* Skip malformed cookie file entries instead of crashing (invalid entry
length, invalid expires at)
Extractors
* [youtube] Improve signature cipher extraction (#25187, #25188)
* [iprima] Improve extraction (#25138)
* [uol] Fix extraction (#22007)
+ [orf] Add support for more radio stations (#24938, #24968)
* [dailymotion] Fix typo
- [puhutv] Remove no longer available HTTP formats (#25124)
version 2020.05.03
Core
+ [extractor/common] Extract multiple JSON-LD entries
* [options] Clarify doc on --exec command (#19087, #24883)
* [extractor/common] Skip malformed ISM manifest XMLs while extracting
ISM formats (#24667)
Extractors
* [crunchyroll] Fix and improve extraction (#25096, #25060)
* [youtube] Improve player id extraction
* [youtube] Use redirected video id if any (#25063)
* [yahoo] Fix GYAO Player extraction and relax URL regular expression
(#24178, #24778)
* [tvplay] Fix Viafree extraction (#15189, #24473, #24789)
* [tenplay] Relax URL regular expression (#25001)
+ [prosiebensat1] Extract series metadata
* [prosiebensat1] Improve extraction and remove 7tv.de support (#24948)
- [prosiebensat1] Remove 7tv.de support (#24948)
* [youtube] Fix DRM videos detection (#24736)
* [thisoldhouse] Fix video id extraction (#24548, #24549)
+ [soundcloud] Extract AAC format (#19173, #24708)
* [youtube] Skip broken multifeed videos (#24711)
* [nova:embed] Fix extraction (#24700)
* [motherless] Fix extraction (#24699)
* [twitch:clips] Extend URL regular expression (#24290, #24642)
* [tv4] Fix ISM formats extraction (#24667)
* [tele5] Fix extraction (#24553)
+ [mofosex] Add support for generic embeds (#24633)
+ [youporn] Add support for generic embeds
+ [spankwire] Add support for generic embeds (#24633)
* [spankwire] Fix extraction (#18924, #20648)
version 2020.03.24
Core
- [utils] Revert support for cookie files with spaces used instead of tabs
Extractors
* [teachable] Update upskillcourses and gns3 domains
* [generic] Look for teachable embeds before wistia
+ [teachable] Extract chapter metadata (#24421)
+ [bilibili] Add support for player.bilibili.com (#24402)
+ [bilibili] Add support for new URL schema with BV ids (#24439, #24442)
* [limelight] Remove disabled API requests (#24255)
* [soundcloud] Fix download URL extraction (#24394)
+ [cbc:watch] Add support for authentication (#19160)
* [hellporno] Fix extraction (#24399)
* [xtube] Fix formats extraction (#24348)
* [ndr] Fix extraction (#24326)
* [nhk] Update m3u8 URL and use native HLS downloader (#24329)
- [nhk] Remove obsolete rtmp formats (#24329)
* [nhk] Relax URL regular expression (#24329)
- [vimeo] Revert fix showcase password protected video extraction (#24224)
version 2020.03.08
Core
+ [utils] Add support for cookie files with spaces
+ [utils] Add support for cookie files with spaces used instead of tabs
Extractors
+ [pornhub] Add support for pornhubpremium.com (#24288)

View File

@@ -434,9 +434,9 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
either the path to the binary or its
containing directory.
--exec CMD Execute a command on the file after
downloading, similar to find's -exec
syntax. Example: --exec 'adb push {}
/sdcard/Music/ && rm {}'
downloading and post-processing, similar to
find's -exec syntax. Example: --exec 'adb
push {} /sdcard/Music/ && rm {}'
--convert-subs FORMAT Convert the subtitles to other format
(currently supported: srt|ass|vtt|lrc)
@@ -1032,7 +1032,7 @@ After you have ensured this site is distributing its content legally, you can fo
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 youtube_dl/extractor/yourextractor.py

View File

@@ -98,6 +98,7 @@
- **BiliBili**
- **BilibiliAudio**
- **BilibiliAudioAlbum**
- **BiliBiliPlayer**
- **BioBioChileTV**
- **BIQLE**
- **BitChute**
@@ -496,6 +497,7 @@
- **MNetTV**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **MofosexEmbed**
- **Mojvideo**
- **Morningstar**: morningstar.com
- **Motherless**
@@ -618,11 +620,21 @@
- **Ooyala**
- **OoyalaExternal**
- **OraTV**
- **orf:burgenland**: Radio Burgenland
- **orf:fm4**: radio FM4
- **orf:fm4:story**: fm4.orf.at stories
- **orf:iptv**: iptv.ORF.at
- **orf:kaernten**: Radio Kärnten
- **orf:noe**: Radio Niederösterreich
- **orf:oberoesterreich**: Radio Oberösterreich
- **orf:oe1**: Radio Österreich 1
- **orf:oe3**: Radio Österreich 3
- **orf:salzburg**: Radio Salzburg
- **orf:steiermark**: Radio Steiermark
- **orf:tirol**: Radio Tirol
- **orf:tvthek**: ORF TVthek
- **orf:vorarlberg**: Radio Vorarlberg
- **orf:wien**: Radio Wien
- **OsnatelTV**
- **OutsideTV**
- **PacktPub**

View File

@@ -14,9 +14,6 @@ from youtube_dl.utils import YoutubeDLCookieJar
class TestYoutubeDLCookieJar(unittest.TestCase):
def __assert_cookie_has_value(self, cookiejar, key):
self.assertEqual(cookiejar._cookies['www.foobar.foobar']['/'][key].value, key + '_VALUE')
def test_keep_session_cookies(self):
cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/session_cookies.txt')
cookiejar.load(ignore_discard=True, ignore_expires=True)
@@ -35,13 +32,19 @@ class TestYoutubeDLCookieJar(unittest.TestCase):
def test_strip_httponly_prefix(self):
cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/httponly_cookies.txt')
cookiejar.load(ignore_discard=True, ignore_expires=True)
self.__assert_cookie_has_value(cookiejar, 'HTTPONLY_COOKIE')
self.__assert_cookie_has_value(cookiejar, 'JS_ACCESSIBLE_COOKIE')
def test_convert_spaces_to_tabs(self):
cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/cookie_file_with_spaces.txt')
def assert_cookie_has_value(key):
self.assertEqual(cookiejar._cookies['www.foobar.foobar']['/'][key].value, key + '_VALUE')
assert_cookie_has_value('HTTPONLY_COOKIE')
assert_cookie_has_value('JS_ACCESSIBLE_COOKIE')
def test_malformed_cookies(self):
cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/malformed_cookies.txt')
cookiejar.load(ignore_discard=True, ignore_expires=True)
self.__assert_cookie_has_value(cookiejar, 'COOKIE')
# Cookies should be empty since all malformed cookie file entries
# will be ignored
self.assertFalse(cookiejar._cookies)
if __name__ == '__main__':

View File

@@ -74,6 +74,28 @@ _TESTS = [
]
class TestPlayerInfo(unittest.TestCase):
def test_youtube_extract_player_info(self):
PLAYER_URLS = (
('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/en_US/base.js', '64dddad9'),
# obsolete
('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'),
('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'),
('https://www.youtube.com/yts/jsbin/player_ias-vflCPQUIL/en_US/base.js', 'vflCPQUIL'),
('https://www.youtube.com/yts/jsbin/player-vflzQZbt7/en_US/base.js', 'vflzQZbt7'),
('https://www.youtube.com/yts/jsbin/player-en_US-vflaxXRn1/base.js', 'vflaxXRn1'),
('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js', 'vflXGBaUN'),
('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js', 'vflKjOTVq'),
('http://s.ytimg.com/yt/swfbin/watch_as3-vflrEm9Nq.swf', 'vflrEm9Nq'),
('https://s.ytimg.com/yts/swfbin/player-vflenCdZL/watch_as3.swf', 'vflenCdZL'),
)
for player_url, expected_player_id in PLAYER_URLS:
expected_player_type = player_url.split('.')[-1]
player_type, player_id = YoutubeIE._extract_player_info(player_url)
self.assertEqual(player_type, expected_player_type)
self.assertEqual(player_id, expected_player_id)
class TestSignature(unittest.TestCase):
def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__))

View File

@@ -1,5 +0,0 @@
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file! Do not edit.
www.foobar.foobar FALSE / TRUE 2147483647 COOKIE COOKIE_VALUE

View File

@@ -0,0 +1,9 @@
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file! Do not edit.
# Cookie file entry with invalid number of fields - 6 instead of 7
www.foobar.foobar FALSE / FALSE 0 COOKIE
# Cookie file entry with invalid expires at
www.foobar.foobar FALSE / FALSE 1.7976931348623157e+308 COOKIE VALUE

View File

@@ -57,6 +57,17 @@ try:
except ImportError: # Python 2
import cookielib as compat_cookiejar
if sys.version_info[0] == 2:
class compat_cookiejar_Cookie(compat_cookiejar.Cookie):
def __init__(self, version, name, value, *args, **kwargs):
if isinstance(name, compat_str):
name = name.encode()
if isinstance(value, compat_str):
value = value.encode()
compat_cookiejar.Cookie.__init__(self, version, name, value, *args, **kwargs)
else:
compat_cookiejar_Cookie = compat_cookiejar.Cookie
try:
import http.cookies as compat_cookies
except ImportError: # Python 2
@@ -2987,6 +2998,7 @@ __all__ = [
'compat_basestring',
'compat_chr',
'compat_cookiejar',
'compat_cookiejar_Cookie',
'compat_cookies',
'compat_ctypes_WINFUNCTYPE',
'compat_etree_Element',

View File

@@ -227,7 +227,7 @@ class HttpFD(FileDownloader):
while True:
try:
# Download and write
data_block = ctx.data.read(block_size if not is_test else min(block_size, data_len - byte_counter))
data_block = ctx.data.read(block_size if data_len is None else min(block_size, data_len - byte_counter))
# socket.timeout is a subclass of socket.error but may not have
# errno set
except socket.timeout as e:
@@ -299,7 +299,7 @@ class HttpFD(FileDownloader):
'elapsed': now - ctx.start_time,
})
if is_test and byte_counter == data_len:
if data_len is not None and byte_counter == data_len:
break
if not is_test and ctx.chunk_size and ctx.data_len is not None and byte_counter < ctx.data_len:

View File

@@ -249,7 +249,7 @@ class ARDMediathekIE(ARDMediathekBaseIE):
class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TESTS = [{
# available till 14.02.2019
'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
@@ -263,6 +263,9 @@ class ARDIE(InfoExtractor):
'upload_date': '20180214',
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html',
'only_matching': True,
}, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True,
@@ -310,9 +313,9 @@ class ARDIE(InfoExtractor):
class ARDBetaMediathekIE(ARDMediathekBaseIE):
_VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/(?P<client>[^/]+)/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
'url': 'https://ardmediathek.de/ard/video/die-robuste-roswita/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f',
'info_dict': {
'display_id': 'die-robuste-roswita',
@@ -325,6 +328,15 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
'upload_date': '20191222',
'ext': 'mp4',
},
}, {
'url': 'https://beta.ardmediathek.de/ard/video/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'only_matching': True,
}, {
'url': 'https://ardmediathek.de/ard/video/saartalk/saartalk-gesellschaftsgift-haltung-gegen-hass/sr-fernsehen/Y3JpZDovL3NyLW9ubGluZS5kZS9TVF84MTY4MA/',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/ard/video/trailer/private-eyes-s01-e01/one/Y3JpZDovL3dkci5kZS9CZWl0cmFnLTE1MTgwYzczLWNiMTEtNGNkMS1iMjUyLTg5MGYzOWQxZmQ1YQ/',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/',
'only_matching': True,
@@ -336,7 +348,11 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
display_id = mobj.group('display_id') or video_id
display_id = mobj.group('display_id')
if display_id:
display_id = display_id.rstrip('/')
if not display_id:
display_id = video_id
player_page = self._download_json(
'https://api.ardmediathek.de/public-gateway',

View File

@@ -528,7 +528,7 @@ class BBCCoUkIE(InfoExtractor):
def get_programme_id(item):
def get_from_attributes(item):
for p in('identifier', 'group'):
for p in ('identifier', 'group'):
value = item.get(p)
if value and re.match(r'^[pb][\da-z]{7}$', value):
return value

View File

@@ -24,7 +24,18 @@ from ..utils import (
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)'
_VALID_URL = r'''(?x)
https?://
(?:(?:www|bangumi)\.)?
bilibili\.(?:tv|com)/
(?:
(?:
video/[aA][vV]|
anime/(?P<anime_id>\d+)/play\#
)(?P<id_bv>\d+)|
video/[bB][vV](?P<id>[^/?#&]+)
)
'''
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
@@ -92,6 +103,10 @@ class BiliBiliIE(InfoExtractor):
'skip_download': True, # Test metadata only
},
}]
}, {
# new BV video id format
'url': 'https://www.bilibili.com/video/BV1JE411F741',
'only_matching': True,
}]
_APP_KEY = 'iVGUTjsxvpLeuDCf'
@@ -109,7 +124,7 @@ class BiliBiliIE(InfoExtractor):
url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = mobj.group('id') or mobj.group('id_bv')
anime_id = mobj.group('anime_id')
webpage = self._download_webpage(url, video_id)
@@ -419,3 +434,17 @@ class BilibiliAudioAlbumIE(BilibiliAudioBaseIE):
entries, am_id, album_title, album_data.get('intro'))
return self.playlist_result(entries, am_id)
class BiliBiliPlayerIE(InfoExtractor):
_VALID_URL = r'https?://player\.bilibili\.com/player\.html\?.*?\baid=(?P<id>\d+)'
_TEST = {
'url': 'http://player.bilibili.com/player.html?aid=92494333&cid=157926707&page=1',
'only_matching': True,
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'http://www.bilibili.tv/video/av%s/' % video_id,
ie=BiliBiliIE.ie_key(), video_id=video_id)

View File

@@ -1,8 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import json
import re
from xml.sax.saxutils import escape
from .common import InfoExtractor
from ..compat import (
@@ -216,6 +218,29 @@ class CBCWatchBaseIE(InfoExtractor):
'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
}
_GEO_COUNTRIES = ['CA']
_LOGIN_URL = 'https://api.loginradius.com/identity/v2/auth/login'
_TOKEN_URL = 'https://cloud-api.loginradius.com/sso/jwt/api/token'
_API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'
_NETRC_MACHINE = 'cbcwatch'
def _signature(self, email, password):
data = json.dumps({
'email': email,
'password': password,
}).encode()
headers = {'content-type': 'application/json'}
query = {'apikey': self._API_KEY}
resp = self._download_json(self._LOGIN_URL, None, data=data, headers=headers, query=query)
access_token = resp['access_token']
# token
query = {
'access_token': access_token,
'apikey': self._API_KEY,
'jwtapp': 'jwt',
}
resp = self._download_json(self._TOKEN_URL, None, headers=headers, query=query)
return resp['signature']
def _call_api(self, path, video_id):
url = path if path.startswith('http') else self._API_BASE_URL + path
@@ -239,7 +264,8 @@ class CBCWatchBaseIE(InfoExtractor):
def _real_initialize(self):
if self._valid_device_token():
return
device = self._downloader.cache.load('cbcwatch', 'device') or {}
device = self._downloader.cache.load(
'cbcwatch', self._cache_device_key()) or {}
self._device_id, self._device_token = device.get('id'), device.get('token')
if self._valid_device_token():
return
@@ -248,16 +274,30 @@ class CBCWatchBaseIE(InfoExtractor):
def _valid_device_token(self):
return self._device_id and self._device_token
def _cache_device_key(self):
email, _ = self._get_login_info()
return '%s_device' % hashlib.sha256(email.encode()).hexdigest() if email else 'device'
def _register_device(self):
self._device_id = self._device_token = None
result = self._download_xml(
self._API_BASE_URL + 'device/register',
None, 'Acquiring device token',
data=b'<device><type>web</type></device>')
self._device_id = xpath_text(result, 'deviceId', fatal=True)
self._device_token = xpath_text(result, 'deviceToken', fatal=True)
email, password = self._get_login_info()
if email and password:
signature = self._signature(email, password)
data = '<login><token>{0}</token><device><deviceId>{1}</deviceId><type>web</type></device></login>'.format(
escape(signature), escape(self._device_id)).encode()
url = self._API_BASE_URL + 'device/login'
result = self._download_xml(
url, None, data=data,
headers={'content-type': 'application/xml'})
self._device_token = xpath_text(result, 'token', fatal=True)
else:
self._device_token = xpath_text(result, 'deviceToken', fatal=True)
self._downloader.cache.store(
'cbcwatch', 'device', {
'cbcwatch', self._cache_device_key(), {
'id': self._device_id,
'token': self._device_token,
})

View File

@@ -15,7 +15,7 @@ import time
import math
from ..compat import (
compat_cookiejar,
compat_cookiejar_Cookie,
compat_cookies,
compat_etree_Element,
compat_etree_fromstring,
@@ -1182,16 +1182,33 @@ class InfoExtractor(object):
'twitter card player')
def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
json_ld = self._search_regex(
JSON_LD_RE, html, 'JSON-LD', group='json_ld', **kwargs)
json_ld_list = list(re.finditer(JSON_LD_RE, html))
default = kwargs.get('default', NO_DEFAULT)
if not json_ld:
return default if default is not NO_DEFAULT else {}
# JSON-LD may be malformed and thus `fatal` should be respected.
# At the same time `default` may be passed that assumes `fatal=False`
# for _search_regex. Let's simulate the same behavior here as well.
fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
json_ld = []
for mobj in json_ld_list:
json_ld_item = self._parse_json(
mobj.group('json_ld'), video_id, fatal=fatal)
if not json_ld_item:
continue
if isinstance(json_ld_item, dict):
json_ld.append(json_ld_item)
elif isinstance(json_ld_item, (list, tuple)):
json_ld.extend(json_ld_item)
if json_ld:
json_ld = self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
if json_ld:
return json_ld
if default is not NO_DEFAULT:
return default
elif fatal:
raise RegexNotFoundError('Unable to extract JSON-LD')
else:
self._downloader.report_warning('unable to extract JSON-LD %s' % bug_reports_message())
return {}
def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str):
@@ -1256,10 +1273,10 @@ class InfoExtractor(object):
extract_interaction_statistic(e)
for e in json_ld:
if isinstance(e.get('@context'), compat_str) and re.match(r'^https?://schema.org/?$', e.get('@context')):
if '@context' in e:
item_type = e.get('@type')
if expected_type is not None and expected_type != item_type:
return info
continue
if item_type in ('TVEpisode', 'Episode'):
episode_name = unescapeHTML(e.get('name'))
info.update({
@@ -1293,11 +1310,17 @@ class InfoExtractor(object):
})
elif item_type == 'VideoObject':
extract_video_object(e)
continue
if expected_type is None:
continue
else:
break
video = e.get('video')
if isinstance(video, dict) and video.get('@type') == 'VideoObject':
extract_video_object(video)
break
if expected_type is None:
continue
else:
break
return dict((k, v) for k, v in info.items() if v is not None)
@staticmethod
@@ -2340,6 +2363,8 @@ class InfoExtractor(object):
if res is False:
return []
ism_doc, urlh = res
if ism_doc is None:
return []
return self._parse_ism_formats(ism_doc, urlh.geturl(), ism_id)
@@ -2818,7 +2843,7 @@ class InfoExtractor(object):
def _set_cookie(self, domain, name, value, expire_time=None, port=None,
path='/', secure=False, discard=False, rest={}, **kwargs):
cookie = compat_cookiejar.Cookie(
cookie = compat_cookiejar_Cookie(
0, name, value, port, port is not None, domain, True,
domain.startswith('.'), path, True, secure, expire_time,
discard, None, None, rest)

View File

@@ -13,6 +13,7 @@ from ..compat import (
compat_b64decode,
compat_etree_Element,
compat_etree_fromstring,
compat_str,
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
@@ -25,9 +26,9 @@ from ..utils import (
intlist_to_bytes,
int_or_none,
lowercase_escape,
merge_dicts,
remove_end,
sanitized_Request,
unified_strdate,
urlencode_postdata,
xpath_text,
)
@@ -136,6 +137,7 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
# rtmp
'skip_download': True,
},
'skip': 'Video gone',
}, {
'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
'info_dict': {
@@ -157,11 +159,12 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': {
'id': '702409',
'ext': 'mp4',
'title': 'Re:ZERO -Starting Life in Another World- Episode 5 The Morning of Our Promise Is Still Distant',
'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9',
'title': compat_str,
'description': compat_str,
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'TV TOKYO',
'upload_date': '20160508',
'uploader': 'Re:Zero Partners',
'timestamp': 1462098900,
'upload_date': '20160501',
},
'params': {
# m3u8 download
@@ -172,12 +175,13 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': {
'id': '727589',
'ext': 'mp4',
'title': "KONOSUBA -God's blessing on this wonderful world! 2 Episode 1 Give Me Deliverance From This Judicial Injustice!",
'description': 'md5:cbcf05e528124b0f3a0a419fc805ea7d',
'title': compat_str,
'description': compat_str,
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Kadokawa Pictures Inc.',
'upload_date': '20170118',
'series': "KONOSUBA -God's blessing on this wonderful world!",
'timestamp': 1484130900,
'upload_date': '20170111',
'series': compat_str,
'season': "KONOSUBA -God's blessing on this wonderful world! 2",
'season_number': 2,
'episode': 'Give Me Deliverance From This Judicial Injustice!',
@@ -200,10 +204,11 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'info_dict': {
'id': '535080',
'ext': 'mp4',
'title': '11eyes Episode 1 Red Night ~ Piros éjszaka',
'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".',
'title': compat_str,
'description': compat_str,
'uploader': 'Marvelous AQL Inc.',
'upload_date': '20091021',
'timestamp': 1255512600,
'upload_date': '20091014',
},
'params': {
# Just test metadata extraction
@@ -224,15 +229,17 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
# just test metadata extraction
'skip_download': True,
},
'skip': 'Video gone',
}, {
# A video with a vastly different season name compared to the series name
'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532',
'info_dict': {
'id': '590532',
'ext': 'mp4',
'title': 'Haiyoru! Nyaruani (ONA) Episode 1 Test',
'description': 'Mahiro and Nyaruko talk about official certification.',
'title': compat_str,
'description': compat_str,
'uploader': 'TV TOKYO',
'timestamp': 1330956000,
'upload_date': '20120305',
'series': 'Nyarko-san: Another Crawling Chaos',
'season': 'Haiyoru! Nyaruani (ONA)',
@@ -442,23 +449,21 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
webpage, 'language', default=None, group='lang')
video_title = self._html_search_regex(
r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
webpage, 'video_title')
(r'(?s)<h1[^>]*>((?:(?!<h1).)*?<(?:span[^>]+itemprop=["\']title["\']|meta[^>]+itemprop=["\']position["\'])[^>]*>(?:(?!<h1).)+?)</h1>',
r'<title>(.+?),\s+-\s+.+? Crunchyroll'),
webpage, 'video_title', default=None)
if not video_title:
video_title = re.sub(r'^Watch\s+', '', self._og_search_description(webpage))
video_title = re.sub(r' {2,}', ' ', video_title)
video_description = (self._parse_json(self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
webpage, 'description', default='{}'), video_id) or media_metadata).get('description')
if video_description:
video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
video_upload_date = self._html_search_regex(
[r'<div>Availability for free users:(.+?)</div>', r'<div>[^<>]+<span>\s*(.+?\d{4})\s*</span></div>'],
webpage, 'video_upload_date', fatal=False, flags=re.DOTALL)
if video_upload_date:
video_upload_date = unified_strdate(video_upload_date)
video_uploader = self._html_search_regex(
# try looking for both an uploader that's a link and one that's not
[r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
webpage, 'video_uploader', fatal=False)
webpage, 'video_uploader', default=False)
formats = []
for stream in media.get('streams', []):
@@ -611,14 +616,15 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
webpage, 'season number', default=None))
return {
info = self._search_json_ld(webpage, video_id, default={})
return merge_dicts({
'id': video_id,
'title': video_title,
'description': video_description,
'duration': duration,
'thumbnail': thumbnail,
'uploader': video_uploader,
'upload_date': video_upload_date,
'series': series,
'season': season,
'season_number': season_number,
@@ -626,7 +632,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'episode_number': episode_number,
'subtitles': subtitles,
'formats': formats,
}
}, info)
class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):

View File

@@ -32,7 +32,7 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
@staticmethod
def _get_cookie_value(cookies, name):
cookie = cookies.get('name')
cookie = cookies.get(name)
if cookie:
return cookie.value

View File

@@ -105,6 +105,7 @@ from .bilibili import (
BiliBiliBangumiIE,
BilibiliAudioIE,
BilibiliAudioAlbumIE,
BiliBiliPlayerIE,
)
from .biobiochiletv import BioBioChileTVIE
from .bitchute import (
@@ -635,7 +636,10 @@ from .mixcloud import (
from .mlb import MLBIE
from .mnet import MnetIE
from .moevideo import MoeVideoIE
from .mofosex import MofosexIE
from .mofosex import (
MofosexIE,
MofosexEmbedIE,
)
from .mojvideo import MojvideoIE
from .morningstar import MorningstarIE
from .motherless import (
@@ -800,6 +804,16 @@ from .orf import (
ORFFM4IE,
ORFFM4StoryIE,
ORFOE1IE,
ORFOE3IE,
ORFNOEIE,
ORFWIEIE,
ORFBGLIE,
ORFOOEIE,
ORFSTMIE,
ORFKTNIE,
ORFSBGIE,
ORFTIRIE,
ORFVBGIE,
ORFIPTVIE,
)
from .outsidetv import OutsideTVIE

View File

@@ -60,6 +60,9 @@ from .tnaflix import TNAFlixNetworkEmbedIE
from .drtuber import DrTuberIE
from .redtube import RedTubeIE
from .tube8 import Tube8IE
from .mofosex import MofosexEmbedIE
from .spankwire import SpankwireIE
from .youporn import YouPornIE
from .vimeo import VimeoIE
from .dailymotion import DailymotionIE
from .dailymail import DailyMailIE
@@ -2536,6 +2539,11 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key())
# Look for Teachable embeds, must be before Wistia
teachable_url = TeachableIE._extract_url(webpage, url)
if teachable_url:
return self.url_result(teachable_url)
# Look for embedded Wistia player
wistia_urls = WistiaIE._extract_urls(webpage)
if wistia_urls:
@@ -2710,6 +2718,21 @@ class GenericIE(InfoExtractor):
if tube8_urls:
return self.playlist_from_matches(tube8_urls, video_id, video_title, ie=Tube8IE.ie_key())
# Look for embedded Mofosex player
mofosex_urls = MofosexEmbedIE._extract_urls(webpage)
if mofosex_urls:
return self.playlist_from_matches(mofosex_urls, video_id, video_title, ie=MofosexEmbedIE.ie_key())
# Look for embedded Spankwire player
spankwire_urls = SpankwireIE._extract_urls(webpage)
if spankwire_urls:
return self.playlist_from_matches(spankwire_urls, video_id, video_title, ie=SpankwireIE.ie_key())
# Look for embedded YouPorn player
youporn_urls = YouPornIE._extract_urls(webpage)
if youporn_urls:
return self.playlist_from_matches(youporn_urls, video_id, video_title, ie=YouPornIE.ie_key())
# Look for embedded Tvigle player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@@ -3141,10 +3164,6 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())
teachable_url = TeachableIE._extract_url(webpage, url)
if teachable_url:
return self.url_result(teachable_url)
indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
if indavideo_urls:
return self.playlist_from_matches(

View File

@@ -13,10 +13,10 @@ from ..utils import (
class GiantBombIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?giantbomb\.com/videos/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?giantbomb\.com/(?:videos|shows)/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
_TESTS = [{
'url': 'http://www.giantbomb.com/videos/quick-look-destiny-the-dark-below/2300-9782/',
'md5': 'c8ea694254a59246a42831155dec57ac',
'md5': '132f5a803e7e0ab0e274d84bda1e77ae',
'info_dict': {
'id': '2300-9782',
'display_id': 'quick-look-destiny-the-dark-below',
@@ -26,7 +26,10 @@ class GiantBombIE(InfoExtractor):
'duration': 2399,
'thumbnail': r're:^https?://.*\.jpg$',
}
}
}, {
'url': 'https://www.giantbomb.com/shows/ben-stranding/2970-20212',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

View File

@@ -1,12 +1,11 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
js_to_json,
int_or_none,
merge_dicts,
remove_end,
determine_ext,
unified_timestamp,
)
@@ -14,15 +13,21 @@ class HellPornoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hellporno\.(?:com/videos|net/v)/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://hellporno.com/videos/dixie-is-posing-with-naked-ass-very-erotic/',
'md5': '1fee339c610d2049699ef2aa699439f1',
'md5': 'f0a46ebc0bed0c72ae8fe4629f7de5f3',
'info_dict': {
'id': '149116',
'display_id': 'dixie-is-posing-with-naked-ass-very-erotic',
'ext': 'mp4',
'title': 'Dixie is posing with naked ass very erotic',
'description': 'md5:9a72922749354edb1c4b6e540ad3d215',
'categories': list,
'thumbnail': r're:https?://.*\.jpg$',
'duration': 240,
'timestamp': 1398762720,
'upload_date': '20140429',
'view_count': int,
'age_limit': 18,
}
},
}, {
'url': 'http://hellporno.net/v/186271/',
'only_matching': True,
@@ -36,40 +41,36 @@ class HellPornoIE(InfoExtractor):
title = remove_end(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), ' - Hell Porno')
flashvars = self._parse_json(self._search_regex(
r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flashvars'),
display_id, transform_source=js_to_json)
info = self._parse_html5_media_entries(url, webpage, display_id)[0]
self._sort_formats(info['formats'])
video_id = flashvars.get('video_id')
thumbnail = flashvars.get('preview_url')
ext = determine_ext(flashvars.get('postfix'), 'mp4')
video_id = self._search_regex(
(r'chs_object\s*=\s*["\'](\d+)',
r'params\[["\']video_id["\']\]\s*=\s*(\d+)'), webpage, 'video id',
default=display_id)
description = self._search_regex(
r'class=["\']desc_video_view_v2[^>]+>([^<]+)', webpage,
'description', fatal=False)
categories = [
c.strip()
for c in self._html_search_meta(
'keywords', webpage, 'categories', default='').split(',')
if c.strip()]
duration = int_or_none(self._og_search_property(
'video:duration', webpage, fatal=False))
timestamp = unified_timestamp(self._og_search_property(
'video:release_date', webpage, fatal=False))
view_count = int_or_none(self._search_regex(
r'>Views\s+(\d+)', webpage, 'view count', fatal=False))
formats = []
for video_url_key in ['video_url', 'video_alt_url']:
video_url = flashvars.get(video_url_key)
if not video_url:
continue
video_text = flashvars.get('%s_text' % video_url_key)
fmt = {
'url': video_url,
'ext': ext,
'format_id': video_text,
}
m = re.search(r'^(?P<height>\d+)[pP]', video_text)
if m:
fmt['height'] = int(m.group('height'))
formats.append(fmt)
self._sort_formats(formats)
categories = self._html_search_meta(
'keywords', webpage, 'categories', default='').split(',')
return {
return merge_dicts(info, {
'id': video_id,
'display_id': display_id,
'title': title,
'thumbnail': thumbnail,
'description': description,
'categories': categories,
'duration': duration,
'timestamp': timestamp,
'view_count': view_count,
'age_limit': 18,
'formats': formats,
}
})

View File

@@ -58,7 +58,7 @@ class IndavideoEmbedIE(InfoExtractor):
video_id = self._match_id(url)
video = self._download_json(
'http://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
'https://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
video_id)['data']
title = video['title']

View File

@@ -16,12 +16,22 @@ class IPrimaIE(InfoExtractor):
_GEO_BYPASS = False
_TESTS = [{
'url': 'http://play.iprima.cz/gondici-s-r-o-33',
'url': 'https://prima.iprima.cz/particka/92-epizoda',
'info_dict': {
'id': 'p136534',
'id': 'p51388',
'ext': 'mp4',
'title': 'Gondíci s. r. o. (34)',
'description': 'md5:16577c629d006aa91f59ca8d8e7f99bd',
'title': 'Partička (92)',
'description': 'md5:859d53beae4609e6dd7796413f1b6cac',
},
'params': {
'skip_download': True, # m3u8 download
},
}, {
'url': 'https://cnn.iprima.cz/videa/70-epizoda',
'info_dict': {
'id': 'p681554',
'ext': 'mp4',
'title': 'HLAVNÍ ZPRÁVY 3.5.2020',
},
'params': {
'skip_download': True, # m3u8 download
@@ -68,9 +78,15 @@ class IPrimaIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(
webpage, default=None) or self._search_regex(
r'<h1>([^<]+)', webpage, 'title')
video_id = self._search_regex(
(r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
r'data-product="([^"]+)">'),
r'data-product="([^"]+)">',
r'id=["\']player-(p\d+)"',
r'playerId\s*:\s*["\']player-(p\d+)'),
webpage, 'real id')
playerpage = self._download_webpage(
@@ -125,8 +141,8 @@ class IPrimaIE(InfoExtractor):
return {
'id': video_id,
'title': self._og_search_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'title': title,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'formats': formats,
'description': self._og_search_description(webpage),
'description': self._og_search_description(webpage, default=None),
}

View File

@@ -18,7 +18,6 @@ from ..utils import (
class LimelightBaseIE(InfoExtractor):
_PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
_API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json'
@classmethod
def _extract_urls(cls, webpage, source_url):
@@ -70,7 +69,8 @@ class LimelightBaseIE(InfoExtractor):
try:
return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
item_id, 'Downloading PlaylistService %s JSON' % method,
fatal=fatal, headers=headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
error = self._parse_json(e.cause.read().decode(), item_id)['detail']['contentAccessPermission']
@@ -79,22 +79,22 @@ class LimelightBaseIE(InfoExtractor):
raise ExtractorError(error, expected=True)
raise
def _call_api(self, organization_id, item_id, method):
return self._download_json(
self._API_URL % (organization_id, self._API_PATH, item_id, method),
item_id, 'Downloading API %s JSON' % method)
def _extract(self, item_id, pc_method, mobile_method, meta_method, referer=None):
def _extract(self, item_id, pc_method, mobile_method, referer=None):
pc = self._call_playlist_service(item_id, pc_method, referer=referer)
metadata = self._call_api(pc['orgId'], item_id, meta_method)
mobile = self._call_playlist_service(item_id, mobile_method, fatal=False, referer=referer)
return pc, mobile, metadata
mobile = self._call_playlist_service(
item_id, mobile_method, fatal=False, referer=referer)
return pc, mobile
def _extract_info(self, pc, mobile, i, referer):
get_item = lambda x, y: try_get(x, lambda x: x[y][i], dict) or {}
pc_item = get_item(pc, 'playlistItems')
mobile_item = get_item(mobile, 'mediaList')
video_id = pc_item.get('mediaId') or mobile_item['mediaId']
title = pc_item.get('title') or mobile_item['title']
def _extract_info(self, streams, mobile_urls, properties):
video_id = properties['media_id']
formats = []
urls = []
for stream in streams:
for stream in pc_item.get('streams', []):
stream_url = stream.get('url')
if not stream_url or stream.get('drmProtected') or stream_url in urls:
continue
@@ -155,7 +155,7 @@ class LimelightBaseIE(InfoExtractor):
})
formats.append(fmt)
for mobile_url in mobile_urls:
for mobile_url in mobile_item.get('mobileUrls', []):
media_url = mobile_url.get('mobileUrl')
format_id = mobile_url.get('targetMediaPlatform')
if not media_url or format_id in ('Widevine', 'SmoothStreaming') or media_url in urls:
@@ -179,54 +179,34 @@ class LimelightBaseIE(InfoExtractor):
self._sort_formats(formats)
title = properties['title']
description = properties.get('description')
timestamp = int_or_none(properties.get('publish_date') or properties.get('create_date'))
duration = float_or_none(properties.get('duration_in_milliseconds'), 1000)
filesize = int_or_none(properties.get('total_storage_in_bytes'))
categories = [properties.get('category')]
tags = properties.get('tags', [])
thumbnails = [{
'url': thumbnail['url'],
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
} for thumbnail in properties.get('thumbnails', []) if thumbnail.get('url')]
subtitles = {}
for caption in properties.get('captions', []):
lang = caption.get('language_code')
subtitles_url = caption.get('url')
if lang and subtitles_url:
subtitles.setdefault(lang, []).append({
'url': subtitles_url,
})
closed_captions_url = properties.get('closed_captions_url')
if closed_captions_url:
subtitles.setdefault('en', []).append({
'url': closed_captions_url,
'ext': 'ttml',
})
for flag in mobile_item.get('flags'):
if flag == 'ClosedCaptions':
closed_captions = self._call_playlist_service(
video_id, 'getClosedCaptionsDetailsByMediaId',
False, referer) or []
for cc in closed_captions:
cc_url = cc.get('webvttFileUrl')
if not cc_url:
continue
lang = cc.get('languageCode') or self._search_regex(r'/[a-z]{2}\.vtt', cc_url, 'lang', default='en')
subtitles.setdefault(lang, []).append({
'url': cc_url,
})
break
get_meta = lambda x: pc_item.get(x) or mobile_item.get(x)
return {
'id': video_id,
'title': title,
'description': description,
'description': get_meta('description'),
'formats': formats,
'timestamp': timestamp,
'duration': duration,
'filesize': filesize,
'categories': categories,
'tags': tags,
'thumbnails': thumbnails,
'duration': float_or_none(get_meta('durationInMilliseconds'), 1000),
'thumbnail': get_meta('previewImageUrl') or get_meta('thumbnailImageUrl'),
'subtitles': subtitles,
}
def _extract_info_helper(self, pc, mobile, i, metadata):
return self._extract_info(
try_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or [],
try_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or [],
metadata)
class LimelightMediaIE(LimelightBaseIE):
IE_NAME = 'limelight'
@@ -251,8 +231,6 @@ class LimelightMediaIE(LimelightBaseIE):
'description': 'md5:8005b944181778e313d95c1237ddb640',
'thumbnail': r're:^https?://.*\.jpeg$',
'duration': 144.23,
'timestamp': 1244136834,
'upload_date': '20090604',
},
'params': {
# m3u8 download
@@ -268,30 +246,29 @@ class LimelightMediaIE(LimelightBaseIE):
'title': '3Play Media Overview Video',
'thumbnail': r're:^https?://.*\.jpeg$',
'duration': 78.101,
'timestamp': 1338929955,
'upload_date': '20120605',
'subtitles': 'mincount:9',
# TODO: extract all languages that were accessible via API
# 'subtitles': 'mincount:9',
'subtitles': 'mincount:1',
},
}, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'media'
_API_PATH = 'media'
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
source_url = smuggled_data.get('source_url')
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
pc, mobile, metadata = self._extract(
pc, mobile = self._extract(
video_id, 'getPlaylistByMediaId',
'getMobilePlaylistByMediaId', 'properties',
smuggled_data.get('source_url'))
'getMobilePlaylistByMediaId', source_url)
return self._extract_info_helper(pc, mobile, 0, metadata)
return self._extract_info(pc, mobile, 0, source_url)
class LimelightChannelIE(LimelightBaseIE):
@@ -313,6 +290,7 @@ class LimelightChannelIE(LimelightBaseIE):
'info_dict': {
'id': 'ab6a524c379342f9b23642917020c082',
'title': 'Javascript Sample Code',
'description': 'Javascript Sample Code - http://www.delvenetworks.com/sample-code/playerCode-demo.html',
},
'playlist_mincount': 3,
}, {
@@ -320,22 +298,23 @@ class LimelightChannelIE(LimelightBaseIE):
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'channel'
_API_PATH = 'channels'
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
channel_id = self._match_id(url)
source_url = smuggled_data.get('source_url')
pc, mobile, medias = self._extract(
pc, mobile = self._extract(
channel_id, 'getPlaylistByChannelId',
'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
'media', smuggled_data.get('source_url'))
source_url)
entries = [
self._extract_info_helper(pc, mobile, i, medias['media_list'][i])
for i in range(len(medias['media_list']))]
self._extract_info(pc, mobile, i, source_url)
for i in range(len(pc['playlistItems']))]
return self.playlist_result(entries, channel_id, pc['title'])
return self.playlist_result(
entries, channel_id, pc.get('title'), mobile.get('description'))
class LimelightChannelListIE(LimelightBaseIE):
@@ -368,10 +347,12 @@ class LimelightChannelListIE(LimelightBaseIE):
def _real_extract(self, url):
channel_list_id = self._match_id(url)
channel_list = self._call_playlist_service(channel_list_id, 'getMobileChannelListById')
channel_list = self._call_playlist_service(
channel_list_id, 'getMobileChannelListById')
entries = [
self.url_result('limelight:channel:%s' % channel['id'], 'LimelightChannel')
for channel in channel_list['channelList']]
return self.playlist_result(entries, channel_list_id, channel_list['title'])
return self.playlist_result(
entries, channel_list_id, channel_list['title'])

View File

@@ -128,6 +128,12 @@ class MailRuIE(InfoExtractor):
'http://api.video.mail.ru/videos/%s.json?new=1' % video_id,
video_id, 'Downloading video JSON')
headers = {}
video_key = self._get_cookies('https://my.mail.ru').get('video_key')
if video_key:
headers['Cookie'] = 'video_key=%s' % video_key.value
formats = []
for f in video_data['videos']:
video_url = f.get('url')
@@ -140,6 +146,7 @@ class MailRuIE(InfoExtractor):
'url': video_url,
'format_id': format_id,
'height': height,
'http_headers': headers,
})
self._sort_formats(formats)

View File

@@ -1,5 +1,8 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
str_to_int,
@@ -54,3 +57,23 @@ class MofosexIE(KeezMoviesIE):
})
return info
class MofosexEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.mofosex.com/embed/?videoid=318131&referrer=KM',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=\d+)',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'http://www.mofosex.com/videos/{0}/{0}.html'.format(video_id),
ie=MofosexIE.ie_key(), video_id=video_id)

View File

@@ -26,7 +26,7 @@ class MotherlessIE(InfoExtractor):
'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
'upload_date': '20100913',
'uploader_id': 'famouslyfuckedup',
'thumbnail': r're:http://.*\.jpg',
'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18,
}
}, {
@@ -40,7 +40,7 @@ class MotherlessIE(InfoExtractor):
'game', 'hairy'],
'upload_date': '20140622',
'uploader_id': 'Sulivana7x',
'thumbnail': r're:http://.*\.jpg',
'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18,
},
'skip': '404',
@@ -54,7 +54,7 @@ class MotherlessIE(InfoExtractor):
'categories': ['superheroine heroine superher'],
'upload_date': '20140827',
'uploader_id': 'shade0230',
'thumbnail': r're:http://.*\.jpg',
'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18,
}
}, {
@@ -76,7 +76,8 @@ class MotherlessIE(InfoExtractor):
raise ExtractorError('Video %s is for friends only' % video_id, expected=True)
title = self._html_search_regex(
r'id="view-upload-title">\s+([^<]+)<', webpage, 'title')
(r'(?s)<div[^>]+\bclass=["\']media-meta-title[^>]+>(.+?)</div>',
r'id="view-upload-title">\s+([^<]+)<'), webpage, 'title')
video_url = (self._html_search_regex(
(r'setup\(\{\s*["\']file["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
r'fileurl\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'),
@@ -84,14 +85,15 @@ class MotherlessIE(InfoExtractor):
or 'http://cdn4.videos.motherlessmedia.com/videos/%s.mp4?fs=opencloud' % video_id)
age_limit = self._rta_search(webpage)
view_count = str_to_int(self._html_search_regex(
r'<strong>Views</strong>\s+([^<]+)<',
(r'>(\d+)\s+Views<', r'<strong>Views</strong>\s+([^<]+)<'),
webpage, 'view count', fatal=False))
like_count = str_to_int(self._html_search_regex(
r'<strong>Favorited</strong>\s+([^<]+)<',
(r'>(\d+)\s+Favorites<', r'<strong>Favorited</strong>\s+([^<]+)<'),
webpage, 'like count', fatal=False))
upload_date = self._html_search_regex(
r'<strong>Uploaded</strong>\s+([^<]+)<', webpage, 'upload date')
(r'class=["\']count[^>]+>(\d+\s+[a-zA-Z]{3}\s+\d{4})<',
r'<strong>Uploaded</strong>\s+([^<]+)<'), webpage, 'upload date')
if 'Ago' in upload_date:
days = int(re.search(r'([0-9]+)', upload_date).group(1))
upload_date = (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d')

View File

@@ -7,6 +7,7 @@ from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
merge_dicts,
parse_iso8601,
qualities,
try_get,
@@ -87,21 +88,25 @@ class NDRIE(NDRBaseIE):
def _extract_embed(self, webpage, display_id):
embed_url = self._html_search_meta(
'embedURL', webpage, 'embed URL', fatal=True)
'embedURL', webpage, 'embed URL',
default=None) or self._search_regex(
r'\bembedUrl["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'embed URL', group='url')
description = self._search_regex(
r'<p[^>]+itemprop="description">([^<]+)</p>',
webpage, 'description', default=None) or self._og_search_description(webpage)
timestamp = parse_iso8601(
self._search_regex(
r'<span[^>]+itemprop="(?:datePublished|uploadDate)"[^>]+content="([^"]+)"',
webpage, 'upload date', fatal=False))
return {
webpage, 'upload date', default=None))
info = self._search_json_ld(webpage, display_id, default={})
return merge_dicts({
'_type': 'url_transparent',
'url': embed_url,
'display_id': display_id,
'description': description,
'timestamp': timestamp,
}
}, info)
class NJoyIE(NDRBaseIE):

View File

@@ -6,7 +6,7 @@ from .common import InfoExtractor
class NhkVodIE(InfoExtractor):
_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[a-z]+-\d{8}-\d+)'
_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[^/]+?-\d{8}-\d+)'
# Content available only for a limited period of time. Visit
# https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
_TESTS = [{
@@ -30,6 +30,9 @@ class NhkVodIE(InfoExtractor):
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/',
'only_matching': True,
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/j_art-20150903-1/',
'only_matching': True,
}]
_API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7a/episode/%s/%s/all%s.json'
@@ -82,15 +85,9 @@ class NhkVodIE(InfoExtractor):
audio = episode['audio']
audio_path = audio['audio']
info['formats'] = self._extract_m3u8_formats(
'https://nhks-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
episode_id, 'm4a', m3u8_id='hls', fatal=False)
for proto in ('rtmpt', 'rtmp'):
info['formats'].append({
'ext': 'flv',
'format_id': proto,
'url': '%s://flv.nhk.or.jp/ondemand/mp4:flv%s' % (proto, audio_path),
'vcodec': 'none',
})
'https://nhkworld-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
episode_id, 'm4a', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False)
for f in info['formats']:
f['language'] = lang
return info

View File

@@ -6,6 +6,7 @@ import re
from .common import InfoExtractor
from ..utils import (
clean_html,
determine_ext,
int_or_none,
js_to_json,
qualities,
@@ -33,42 +34,76 @@ class NovaEmbedIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
bitrates = self._parse_json(
self._search_regex(
r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
video_id, transform_source=js_to_json)
QUALITIES = ('lq', 'mq', 'hq', 'hd')
quality_key = qualities(QUALITIES)
duration = None
formats = []
for format_id, format_list in bitrates.items():
if not isinstance(format_list, list):
format_list = [format_list]
for format_url in format_list:
format_url = url_or_none(format_url)
if not format_url:
continue
if format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
continue
f = {
'url': format_url,
}
f_id = format_id
for quality in QUALITIES:
if '%s.mp4' % quality in format_url:
f_id += '-%s' % quality
f.update({
'quality': quality_key(quality),
'format_note': quality.upper(),
player = self._parse_json(
self._search_regex(
r'Player\.init\s*\([^,]+,\s*({.+?})\s*,\s*{.+?}\s*\)\s*;',
webpage, 'player', default='{}'), video_id, fatal=False)
if player:
for format_id, format_list in player['tracks'].items():
if not isinstance(format_list, list):
format_list = [format_list]
for format_dict in format_list:
if not isinstance(format_dict, dict):
continue
format_url = url_or_none(format_dict.get('src'))
format_type = format_dict.get('type')
ext = determine_ext(format_url)
if (format_type == 'application/x-mpegURL'
or format_id == 'HLS' or ext == 'm3u8'):
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
elif (format_type == 'application/dash+xml'
or format_id == 'DASH' or ext == 'mpd'):
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
else:
formats.append({
'url': format_url,
})
break
f['format_id'] = f_id
formats.append(f)
duration = int_or_none(player.get('duration'))
else:
# Old path, not actual as of 08.04.2020
bitrates = self._parse_json(
self._search_regex(
r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
video_id, transform_source=js_to_json)
QUALITIES = ('lq', 'mq', 'hq', 'hd')
quality_key = qualities(QUALITIES)
for format_id, format_list in bitrates.items():
if not isinstance(format_list, list):
format_list = [format_list]
for format_url in format_list:
format_url = url_or_none(format_url)
if not format_url:
continue
if format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
continue
f = {
'url': format_url,
}
f_id = format_id
for quality in QUALITIES:
if '%s.mp4' % quality in format_url:
f_id += '-%s' % quality
f.update({
'quality': quality_key(quality),
'format_note': quality.upper(),
})
break
f['format_id'] = f_id
formats.append(f)
self._sort_formats(formats)
title = self._og_search_title(
@@ -81,7 +116,8 @@ class NovaEmbedIE(InfoExtractor):
r'poster\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='value')
duration = int_or_none(self._search_regex(
r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))
r'videoDuration\s*:\s*(\d+)', webpage, 'duration',
default=duration))
return {
'id': video_id,

View File

@@ -162,13 +162,12 @@ class ORFTVthekIE(InfoExtractor):
class ORFRadioIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
station = mobj.group('station')
show_date = mobj.group('date')
show_id = mobj.group('show')
data = self._download_json(
'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s'
% (station, show_id, show_date), show_id)
% (self._API_STATION, show_id, show_date), show_id)
entries = []
for info in data['streams']:
@@ -183,7 +182,7 @@ class ORFRadioIE(InfoExtractor):
duration = end - start if end and start else None
entries.append({
'id': loop_stream_id.replace('.mp3', ''),
'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, loop_stream_id),
'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id),
'title': title,
'description': clean_html(data.get('subtitle')),
'duration': duration,
@@ -205,6 +204,8 @@ class ORFFM4IE(ORFRadioIE):
IE_NAME = 'orf:fm4'
IE_DESC = 'radio FM4'
_VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>4\w+)'
_API_STATION = 'fm4'
_LOOP_STATION = 'fm4'
_TEST = {
'url': 'http://fm4.orf.at/player/20170107/4CC',
@@ -223,10 +224,142 @@ class ORFFM4IE(ORFRadioIE):
}
class ORFNOEIE(ORFRadioIE):
IE_NAME = 'orf:noe'
IE_DESC = 'Radio Niederösterreich'
_VALID_URL = r'https?://(?P<station>noe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'noe'
_LOOP_STATION = 'oe2n'
_TEST = {
'url': 'https://noe.orf.at/player/20200423/NGM',
'only_matching': True,
}
class ORFWIEIE(ORFRadioIE):
IE_NAME = 'orf:wien'
IE_DESC = 'Radio Wien'
_VALID_URL = r'https?://(?P<station>wien)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'wie'
_LOOP_STATION = 'oe2w'
_TEST = {
'url': 'https://wien.orf.at/player/20200423/WGUM',
'only_matching': True,
}
class ORFBGLIE(ORFRadioIE):
IE_NAME = 'orf:burgenland'
IE_DESC = 'Radio Burgenland'
_VALID_URL = r'https?://(?P<station>burgenland)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'bgl'
_LOOP_STATION = 'oe2b'
_TEST = {
'url': 'https://burgenland.orf.at/player/20200423/BGM',
'only_matching': True,
}
class ORFOOEIE(ORFRadioIE):
IE_NAME = 'orf:oberoesterreich'
IE_DESC = 'Radio Oberösterreich'
_VALID_URL = r'https?://(?P<station>ooe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'ooe'
_LOOP_STATION = 'oe2o'
_TEST = {
'url': 'https://ooe.orf.at/player/20200423/OGMO',
'only_matching': True,
}
class ORFSTMIE(ORFRadioIE):
IE_NAME = 'orf:steiermark'
IE_DESC = 'Radio Steiermark'
_VALID_URL = r'https?://(?P<station>steiermark)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'stm'
_LOOP_STATION = 'oe2st'
_TEST = {
'url': 'https://steiermark.orf.at/player/20200423/STGMS',
'only_matching': True,
}
class ORFKTNIE(ORFRadioIE):
IE_NAME = 'orf:kaernten'
IE_DESC = 'Radio Kärnten'
_VALID_URL = r'https?://(?P<station>kaernten)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'ktn'
_LOOP_STATION = 'oe2k'
_TEST = {
'url': 'https://kaernten.orf.at/player/20200423/KGUMO',
'only_matching': True,
}
class ORFSBGIE(ORFRadioIE):
IE_NAME = 'orf:salzburg'
IE_DESC = 'Radio Salzburg'
_VALID_URL = r'https?://(?P<station>salzburg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'sbg'
_LOOP_STATION = 'oe2s'
_TEST = {
'url': 'https://salzburg.orf.at/player/20200423/SGUM',
'only_matching': True,
}
class ORFTIRIE(ORFRadioIE):
IE_NAME = 'orf:tirol'
IE_DESC = 'Radio Tirol'
_VALID_URL = r'https?://(?P<station>tirol)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'tir'
_LOOP_STATION = 'oe2t'
_TEST = {
'url': 'https://tirol.orf.at/player/20200423/TGUMO',
'only_matching': True,
}
class ORFVBGIE(ORFRadioIE):
IE_NAME = 'orf:vorarlberg'
IE_DESC = 'Radio Vorarlberg'
_VALID_URL = r'https?://(?P<station>vorarlberg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'vbg'
_LOOP_STATION = 'oe2v'
_TEST = {
'url': 'https://vorarlberg.orf.at/player/20200423/VGUM',
'only_matching': True,
}
class ORFOE3IE(ORFRadioIE):
IE_NAME = 'orf:oe3'
IE_DESC = 'Radio Österreich 3'
_VALID_URL = r'https?://(?P<station>oe3)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'oe3'
_LOOP_STATION = 'oe3'
_TEST = {
'url': 'https://oe3.orf.at/player/20200424/3WEK',
'only_matching': True,
}
class ORFOE1IE(ORFRadioIE):
IE_NAME = 'orf:oe1'
IE_DESC = 'Radio Österreich 1'
_VALID_URL = r'https?://(?P<station>oe1)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
_API_STATION = 'oe1'
_LOOP_STATION = 'oe1'
_TEST = {
'url': 'http://oe1.orf.at/player/20170108/456544',

View File

@@ -20,20 +20,16 @@ class PokemonIE(InfoExtractor):
'ext': 'mp4',
'title': 'The Ol Raise and Switch!',
'description': 'md5:7db77f7107f98ba88401d3adc80ff7af',
'timestamp': 1511824728,
'upload_date': '20171127',
},
'add_id': ['LimelightMedia'],
}, {
# no data-video-title
'url': 'https://www.pokemon.com/us/pokemon-episodes/pokemon-movies/pokemon-the-rise-of-darkrai-2008',
'url': 'https://www.pokemon.com/fr/episodes-pokemon/films-pokemon/pokemon-lascension-de-darkrai-2008',
'info_dict': {
'id': '99f3bae270bf4e5097274817239ce9c8',
'id': 'dfbaf830d7e54e179837c50c0c6cc0e1',
'ext': 'mp4',
'title': 'Pokémon: The Rise of Darkrai',
'description': 'md5:ea8fbbf942e1e497d54b19025dd57d9d',
'timestamp': 1417778347,
'upload_date': '20141205',
'title': "Pokémon : L'ascension de Darkrai",
'description': 'md5:d1dbc9e206070c3e14a06ff557659fb5',
},
'add_id': ['LimelightMedia'],
'params': {

View File

@@ -11,6 +11,7 @@ from ..utils import (
determine_ext,
float_or_none,
int_or_none,
merge_dicts,
unified_strdate,
)
@@ -175,7 +176,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
(?:
(?:beta\.)?
(?:
prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|advopedia
)\.(?:de|at|ch)|
ran\.de|fem\.com|advopedia\.de|galileo\.tv/video
)
@@ -193,10 +194,14 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
'info_dict': {
'id': '2104602',
'ext': 'mp4',
'title': 'Episode 18 - Staffel 2',
'title': 'CIRCUS HALLIGALLI - Episode 18 - Staffel 2',
'description': 'md5:8733c81b702ea472e069bc48bb658fc1',
'upload_date': '20131231',
'duration': 5845.04,
'series': 'CIRCUS HALLIGALLI',
'season_number': 2,
'episode': 'Episode 18 - Staffel 2',
'episode_number': 18,
},
},
{
@@ -300,8 +305,9 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
'info_dict': {
'id': '2572814',
'ext': 'mp4',
'title': 'Andreas Kümmert: Rocket Man',
'title': 'The Voice of Germany - Andreas Kümmert: Rocket Man',
'description': 'md5:6ddb02b0781c6adf778afea606652e38',
'timestamp': 1382041620,
'upload_date': '20131017',
'duration': 469.88,
},
@@ -310,7 +316,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
},
},
{
'url': 'http://www.fem.com/wellness/videos/wellness-video-clip-kurztripps-zum-valentinstag.html',
'url': 'http://www.fem.com/videos/beauty-lifestyle/kurztrips-zum-valentinstag',
'info_dict': {
'id': '2156342',
'ext': 'mp4',
@@ -332,19 +338,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
'playlist_count': 2,
'skip': 'This video is unavailable',
},
{
'url': 'http://www.7tv.de/circus-halligalli/615-best-of-circus-halligalli-ganze-folge',
'info_dict': {
'id': '4187506',
'ext': 'mp4',
'title': 'Best of Circus HalliGalli',
'description': 'md5:8849752efd90b9772c9db6fdf87fb9e9',
'upload_date': '20151229',
},
'params': {
'skip_download': True,
},
},
{
# title in <h2 class="subtitle">
'url': 'http://www.prosieben.de/stars/oscar-award/videos/jetzt-erst-enthuellt-das-geheimnis-von-emma-stones-oscar-robe-clip',
@@ -421,7 +414,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
r'<div[^>]+id="veeseoDescription"[^>]*>(.+?)</div>',
]
_UPLOAD_DATE_REGEXES = [
r'<meta property="og:published_time" content="(.+?)">',
r'<span>\s*(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}) \|\s*<span itemprop="duration"',
r'<footer>\s*(\d{2}\.\d{2}\.\d{4}) \d{2}:\d{2} Uhr',
r'<span style="padding-left: 4px;line-height:20px; color:#404040">(\d{2}\.\d{2}\.\d{4})</span>',
@@ -451,17 +443,21 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
if description is None:
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
upload_date = unified_strdate(self._html_search_regex(
self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None))
upload_date = unified_strdate(
self._html_search_meta('og:published_time', webpage,
'upload date', default=None)
or self._html_search_regex(self._UPLOAD_DATE_REGEXES,
webpage, 'upload date', default=None))
info.update({
json_ld = self._search_json_ld(webpage, clip_id, default={})
return merge_dicts(info, {
'id': clip_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
})
return info
}, json_ld)
def _extract_playlist(self, url, webpage):
playlist_id = self._html_search_regex(

View File

@@ -82,17 +82,6 @@ class PuhuTVIE(InfoExtractor):
urls = []
formats = []
def add_http_from_hls(m3u8_f):
http_url = m3u8_f['url'].replace('/hls/', '/mp4/').replace('/chunklist.m3u8', '.mp4')
if http_url != m3u8_f['url']:
f = m3u8_f.copy()
f.update({
'format_id': f['format_id'].replace('hls', 'http'),
'protocol': 'http',
'url': http_url,
})
formats.append(f)
for video in videos['data']['videos']:
media_url = url_or_none(video.get('url'))
if not media_url or media_url in urls:
@@ -101,12 +90,9 @@ class PuhuTVIE(InfoExtractor):
playlist = video.get('is_playlist')
if (video.get('stream_type') == 'hls' and playlist is True) or 'playlist.m3u8' in media_url:
m3u8_formats = self._extract_m3u8_formats(
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False)
for m3u8_f in m3u8_formats:
formats.append(m3u8_f)
add_http_from_hls(m3u8_f)
m3u8_id='hls', fatal=False))
continue
quality = int_or_none(video.get('quality'))
@@ -128,8 +114,6 @@ class PuhuTVIE(InfoExtractor):
format_id += '-%sp' % quality
f['format_id'] = format_id
formats.append(f)
if is_hls:
add_http_from_hls(f)
self._sort_formats(formats)
creator = try_get(

View File

@@ -4,6 +4,7 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
merge_dicts,
@@ -57,7 +58,7 @@ class RedTubeIE(InfoExtractor):
if not info.get('title'):
info['title'] = self._html_search_regex(
(r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
(r'<h(\d)[^>]+class="(?:video_title_text|videoTitle|video_title)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
webpage, 'title', group='title',
default=None) or self._og_search_title(webpage)
@@ -77,7 +78,7 @@ class RedTubeIE(InfoExtractor):
})
medias = self._parse_json(
self._search_regex(
r'mediaDefinition\s*:\s*(\[.+?\])', webpage,
r'mediaDefinition["\']?\s*:\s*(\[.+?}\s*\])', webpage,
'media definitions', default='{}'),
video_id, fatal=False)
if medias and isinstance(medias, list):
@@ -85,6 +86,12 @@ class RedTubeIE(InfoExtractor):
format_url = url_or_none(media.get('videoUrl'))
if not format_url:
continue
if media.get('format') == 'hls' or determine_ext(format_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
continue
format_id = media.get('quality')
formats.append({
'url': format_url,

View File

@@ -27,6 +27,7 @@ from ..utils import (
unified_timestamp,
update_url_query,
url_or_none,
urlhandle_detect_ext,
)
@@ -96,7 +97,7 @@ class SoundcloudIE(InfoExtractor):
'repost_count': int,
}
},
# not streamable song, preview
# geo-restricted
{
'url': 'https://soundcloud.com/the-concept-band/goldrushed-mastered?in=the-concept-band/sets/the-royal-concept-ep',
'info_dict': {
@@ -108,17 +109,13 @@ class SoundcloudIE(InfoExtractor):
'uploader_id': '9615865',
'timestamp': 1337635207,
'upload_date': '20120521',
'duration': 30,
'duration': 227.155,
'license': 'all-rights-reserved',
'view_count': int,
'like_count': int,
'comment_count': int,
'repost_count': int,
},
'params': {
# rtmp
'skip_download': True,
},
},
# private link
{
@@ -229,7 +226,6 @@ class SoundcloudIE(InfoExtractor):
'skip_download': True,
},
},
# not available via api.soundcloud.com/i1/tracks/id/streams
{
'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
@@ -250,11 +246,14 @@ class SoundcloudIE(InfoExtractor):
'comment_count': int,
'repost_count': int,
},
'expected_warnings': ['Unable to download JSON metadata'],
}
},
{
# with AAC HQ format available via OAuth token
'url': 'https://soundcloud.com/wandw/the-chainsmokers-ft-daya-dont-let-me-down-ww-remix-1',
'only_matching': True,
},
]
_API_BASE = 'https://api.soundcloud.com/'
_API_V2_BASE = 'https://api-v2.soundcloud.com/'
_BASE_URL = 'https://soundcloud.com/'
_IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'
@@ -316,10 +315,9 @@ class SoundcloudIE(InfoExtractor):
def _resolv_url(cls, url):
return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url
def _extract_info_dict(self, info, full_title=None, secret_token=None, version=2):
def _extract_info_dict(self, info, full_title=None, secret_token=None):
track_id = compat_str(info['id'])
title = info['title']
track_base_url = self._API_BASE + 'tracks/%s' % track_id
format_urls = set()
formats = []
@@ -328,21 +326,22 @@ class SoundcloudIE(InfoExtractor):
query['secret_token'] = secret_token
if info.get('downloadable') and info.get('has_downloads_left'):
format_url = update_url_query(
info.get('download_url') or track_base_url + '/download', query)
format_urls.add(format_url)
if version == 2:
v1_info = self._download_json(
track_base_url, track_id, query=query, fatal=False) or {}
else:
v1_info = info
formats.append({
'format_id': 'download',
'ext': v1_info.get('original_format') or 'mp3',
'filesize': int_or_none(v1_info.get('original_content_size')),
'url': format_url,
'preference': 10,
})
download_url = update_url_query(
self._API_V2_BASE + 'tracks/' + track_id + '/download', query)
redirect_url = (self._download_json(download_url, track_id, fatal=False) or {}).get('redirectUri')
if redirect_url:
urlh = self._request_webpage(
HEADRequest(redirect_url), track_id, fatal=False)
if urlh:
format_url = urlh.geturl()
format_urls.add(format_url)
formats.append({
'format_id': 'download',
'ext': urlhandle_detect_ext(urlh) or 'mp3',
'filesize': int_or_none(urlh.headers.get('Content-Length')),
'url': format_url,
'preference': 10,
})
def invalid_url(url):
return not url or url in format_urls
@@ -356,6 +355,9 @@ class SoundcloudIE(InfoExtractor):
format_id_list = []
if protocol:
format_id_list.append(protocol)
ext = f.get('ext')
if ext == 'aac':
f['abr'] = '256'
for k in ('ext', 'abr'):
v = f.get(k)
if v:
@@ -366,9 +368,13 @@ class SoundcloudIE(InfoExtractor):
abr = f.get('abr')
if abr:
f['abr'] = int(abr)
if protocol == 'hls':
protocol = 'm3u8' if ext == 'aac' else 'm3u8_native'
else:
protocol = 'http'
f.update({
'format_id': '_'.join(format_id_list),
'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
'protocol': protocol,
'preference': -10 if preview else None,
})
formats.append(f)
@@ -406,42 +412,11 @@ class SoundcloudIE(InfoExtractor):
}, 'http' if protocol == 'progressive' else protocol,
t.get('snipped') or '/preview/' in format_url)
if not formats:
# Old API, does not work for some tracks (e.g.
# https://soundcloud.com/giovannisarani/mezzo-valzer)
# and might serve preview URLs (e.g.
# http://www.soundcloud.com/snbrn/ele)
format_dict = self._download_json(
track_base_url + '/streams', track_id,
'Downloading track url', query=query, fatal=False) or {}
for key, stream_url in format_dict.items():
if invalid_url(stream_url):
continue
format_urls.add(stream_url)
mobj = re.search(r'(http|hls)_([^_]+)_(\d+)_url', key)
if mobj:
protocol, ext, abr = mobj.groups()
add_format({
'abr': abr,
'ext': ext,
'url': stream_url,
}, protocol)
if not formats:
# We fallback to the stream_url in the original info, this
# cannot be always used, sometimes it can give an HTTP 404 error
urlh = self._request_webpage(
HEADRequest(info.get('stream_url') or track_base_url + '/stream'),
track_id, query=query, fatal=False)
if urlh:
stream_url = urlh.geturl()
if not invalid_url(stream_url):
add_format({'url': stream_url}, 'http')
for f in formats:
f['vcodec'] = 'none'
if not formats and info.get('policy') == 'BLOCK':
self.raise_geo_restricted()
self._sort_formats(formats)
user = info.get('user') or {}
@@ -511,16 +486,10 @@ class SoundcloudIE(InfoExtractor):
resolve_title += '/%s' % token
info_json_url = self._resolv_url(self._BASE_URL + resolve_title)
version = 2
info = self._download_json(
info_json_url, full_title, 'Downloading info JSON', query=query, fatal=False)
if not info:
info = self._download_json(
info_json_url.replace(self._API_V2_BASE, self._API_BASE),
full_title, 'Downloading info JSON', query=query)
version = 1
info_json_url, full_title, 'Downloading info JSON', query=query)
return self._extract_info_dict(info, full_title, token, version)
return self._extract_info_dict(info, full_title, token)
class SoundcloudPlaylistBaseIE(SoundcloudIE):
@@ -590,7 +559,7 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
class SoundcloudPagedPlaylistBaseIE(SoundcloudIE):
def _extract_playlist(self, base_url, playlist_id, playlist_title):
COMMON_QUERY = {
'limit': 2000000000,
'limit': 80000,
'linked_partitioning': '1',
}

View File

@@ -3,34 +3,47 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
)
from ..utils import (
sanitized_Request,
float_or_none,
int_or_none,
merge_dicts,
str_or_none,
str_to_int,
unified_strdate,
url_or_none,
)
from ..aes import aes_decrypt_text
class SpankwireIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<id>[0-9]+)/?)'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?spankwire\.com/
(?:
[^/]+/video|
EmbedPlayer\.aspx/?\?.*?\bArticleId=
)
(?P<id>\d+)
'''
_TESTS = [{
# download URL pattern: */<height>P_<tbr>K_<video_id>.mp4
'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
'md5': '8bbfde12b101204b39e4b9fe7eb67095',
'md5': '5aa0e4feef20aad82cbcae3aed7ab7cd',
'info_dict': {
'id': '103545',
'ext': 'mp4',
'title': 'Buckcherry`s X Rated Music Video Crazy Bitch',
'description': 'Crazy Bitch X rated music video.',
'duration': 222,
'uploader': 'oreusz',
'uploader_id': '124697',
'upload_date': '20070507',
'timestamp': 1178587885,
'upload_date': '20070508',
'average_rating': float,
'view_count': int,
'comment_count': int,
'age_limit': 18,
}
'categories': list,
'tags': list,
},
}, {
# download URL pattern: */mp4_<format_id>_<video_id>.mp4
'url': 'http://www.spankwire.com/Titcums-Compiloation-I/video1921551/',
@@ -45,83 +58,125 @@ class SpankwireIE(InfoExtractor):
'upload_date': '20150822',
'age_limit': 18,
},
'params': {
'proxy': '127.0.0.1:8118'
},
'skip': 'removed',
}, {
'url': 'https://www.spankwire.com/EmbedPlayer.aspx/?ArticleId=156156&autostart=true',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?spankwire\.com/EmbedPlayer\.aspx/?\?.*?\bArticleId=\d+)',
webpage)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
req = sanitized_Request('http://www.' + mobj.group('url'))
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video = self._download_json(
'https://www.spankwire.com/api/video/%s.json' % video_id, video_id)
title = self._html_search_regex(
r'<h1>([^<]+)', webpage, 'title')
description = self._html_search_regex(
r'(?s)<div\s+id="descriptionContent">(.+?)</div>',
webpage, 'description', fatal=False)
thumbnail = self._html_search_regex(
r'playerData\.screenShot\s*=\s*["\']([^"\']+)["\']',
webpage, 'thumbnail', fatal=False)
uploader = self._html_search_regex(
r'by:\s*<a [^>]*>(.+?)</a>',
webpage, 'uploader', fatal=False)
uploader_id = self._html_search_regex(
r'by:\s*<a href="/(?:user/viewProfile|Profile\.aspx)\?.*?UserId=(\d+).*?"',
webpage, 'uploader id', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r'</a> on (.+?) at \d+:\d+',
webpage, 'upload date', fatal=False))
view_count = str_to_int(self._html_search_regex(
r'<div id="viewsCounter"><span>([\d,\.]+)</span> views</div>',
webpage, 'view count', fatal=False))
comment_count = str_to_int(self._html_search_regex(
r'<span\s+id="spCommentCount"[^>]*>([\d,\.]+)</span>',
webpage, 'comment count', fatal=False))
videos = re.findall(
r'playerData\.cdnPath([0-9]{3,})\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage)
heights = [int(video[0]) for video in videos]
video_urls = list(map(compat_urllib_parse_unquote, [video[1] for video in videos]))
if webpage.find(r'flashvars\.encrypted = "true"') != -1:
password = self._search_regex(
r'flashvars\.video_title = "([^"]+)',
webpage, 'password').replace('+', ' ')
video_urls = list(map(
lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'),
video_urls))
title = video['title']
formats = []
for height, video_url in zip(heights, video_urls):
path = compat_urllib_parse_urlparse(video_url).path
m = re.search(r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', path)
if m:
tbr = int(m.group('tbr'))
height = int(m.group('height'))
else:
tbr = None
formats.append({
'url': video_url,
'format_id': '%dp' % height,
'height': height,
'tbr': tbr,
videos = video.get('videos')
if isinstance(videos, dict):
for format_id, format_url in videos.items():
video_url = url_or_none(format_url)
if not format_url:
continue
height = int_or_none(self._search_regex(
r'(\d+)[pP]', format_id, 'height', default=None))
m = re.search(
r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', video_url)
if m:
tbr = int(m.group('tbr'))
height = height or int(m.group('height'))
else:
tbr = None
formats.append({
'url': video_url,
'format_id': '%dp' % height if height else format_id,
'height': height,
'tbr': tbr,
})
m3u8_url = url_or_none(video.get('HLS'))
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(formats, ('height', 'tbr', 'width', 'format_id'))
view_count = str_to_int(video.get('viewed'))
thumbnails = []
for preference, t in enumerate(('', '2x'), start=0):
thumbnail_url = url_or_none(video.get('poster%s' % t))
if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
'preference': preference,
})
self._sort_formats(formats)
age_limit = self._rta_search(webpage)
def extract_names(key):
entries_list = video.get(key)
if not isinstance(entries_list, list):
return
entries = []
for entry in entries_list:
name = str_or_none(entry.get('name'))
if name:
entries.append(name)
return entries
return {
categories = extract_names('categories')
tags = extract_names('tags')
uploader = None
info = {}
webpage = self._download_webpage(
'https://www.spankwire.com/_/video%s/' % video_id, video_id,
fatal=False)
if webpage:
info = self._search_json_ld(webpage, video_id, default={})
thumbnail_url = None
if 'thumbnail' in info:
thumbnail_url = url_or_none(info['thumbnail'])
del info['thumbnail']
if not thumbnail_url:
thumbnail_url = self._og_search_thumbnail(webpage)
if thumbnail_url:
thumbnails.append({
'url': thumbnail_url,
'preference': 10,
})
uploader = self._html_search_regex(
r'(?s)by\s*<a[^>]+\bclass=["\']uploaded__by[^>]*>(.+?)</a>',
webpage, 'uploader', fatal=False)
if not view_count:
view_count = str_to_int(self._search_regex(
r'data-views=["\']([\d,.]+)', webpage, 'view count',
fatal=False))
return merge_dicts({
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'description': video.get('description'),
'duration': int_or_none(video.get('duration')),
'thumbnails': thumbnails,
'uploader': uploader,
'uploader_id': uploader_id,
'upload_date': upload_date,
'uploader_id': str_or_none(video.get('userId')),
'timestamp': int_or_none(video.get('time_approved_on')),
'average_rating': float_or_none(video.get('rating')),
'view_count': view_count,
'comment_count': comment_count,
'comment_count': int_or_none(video.get('comments')),
'age_limit': 18,
'categories': categories,
'tags': tags,
'formats': formats,
'age_limit': age_limit,
}
}, info)

View File

@@ -8,15 +8,10 @@ class BellatorIE(MTVServicesInfoExtractor):
_TESTS = [{
'url': 'http://www.bellator.com/fight/atwr7k/bellator-158-michael-page-vs-evangelista-cyborg',
'info_dict': {
'id': 'b55e434e-fde1-4a98-b7cc-92003a034de4',
'ext': 'mp4',
'title': 'Douglas Lima vs. Paul Daley - Round 1',
'description': 'md5:805a8dd29310fd611d32baba2f767885',
},
'params': {
# m3u8 download
'skip_download': True,
'title': 'Michael Page vs. Evangelista Cyborg',
'description': 'md5:0d917fc00ffd72dd92814963fc6cbb05',
},
'playlist_count': 3,
}, {
'url': 'http://www.bellator.com/video-clips/bw6k7n/bellator-158-foundations-michael-venom-page',
'only_matching': True,
@@ -25,6 +20,9 @@ class BellatorIE(MTVServicesInfoExtractor):
_FEED_URL = 'http://www.bellator.com/feeds/mrss/'
_GEO_COUNTRIES = ['US']
def _extract_mgid(self, webpage):
return self._extract_triforce_mgid(webpage)
class ParamountNetworkIE(MTVServicesInfoExtractor):
_VALID_URL = r'https?://(?:www\.)?paramountnetwork\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)'

View File

@@ -7,7 +7,9 @@ from .wistia import WistiaIE
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
get_element_by_class,
strip_or_none,
urlencode_postdata,
urljoin,
)
@@ -19,8 +21,8 @@ class TeachableBaseIE(InfoExtractor):
_SITES = {
# Only notable ones here
'upskillcourses.com': 'upskill',
'academy.gns3.com': 'gns3',
'v1.upskillcourses.com': 'upskill',
'gns3.teachable.com': 'gns3',
'academyhacker.com': 'academyhacker',
'stackskills.com': 'stackskills',
'market.saleshacker.com': 'saleshacker',
@@ -109,27 +111,29 @@ class TeachableIE(TeachableBaseIE):
''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
_TESTS = [{
'url': 'http://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
'url': 'https://gns3.teachable.com/courses/gns3-certified-associate/lectures/6842364',
'info_dict': {
'id': 'uzw6zw58or',
'ext': 'mp4',
'title': 'Welcome to the Course!',
'description': 'md5:65edb0affa582974de4625b9cdea1107',
'duration': 138.763,
'timestamp': 1479846621,
'upload_date': '20161122',
'id': 'untlgzk1v7',
'ext': 'bin',
'title': 'Overview',
'description': 'md5:071463ff08b86c208811130ea1c2464c',
'duration': 736.4,
'timestamp': 1542315762,
'upload_date': '20181115',
'chapter': 'Welcome',
'chapter_number': 1,
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://upskillcourses.com/courses/119763/lectures/1747100',
'url': 'http://v1.upskillcourses.com/courses/119763/lectures/1747100',
'only_matching': True,
}, {
'url': 'https://academy.gns3.com/courses/423415/lectures/6885939',
'url': 'https://gns3.teachable.com/courses/423415/lectures/6885939',
'only_matching': True,
}, {
'url': 'teachable:https://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
'url': 'teachable:https://v1.upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
'only_matching': True,
}]
@@ -173,11 +177,34 @@ class TeachableIE(TeachableBaseIE):
title = self._og_search_title(webpage, default=None)
chapter = None
chapter_number = None
section_item = self._search_regex(
r'(?s)(?P<li><li[^>]+\bdata-lecture-id=["\']%s[^>]+>.+?</li>)' % video_id,
webpage, 'section item', default=None, group='li')
if section_item:
chapter_number = int_or_none(self._search_regex(
r'data-ss-position=["\'](\d+)', section_item, 'section id',
default=None))
if chapter_number is not None:
sections = []
for s in re.findall(
r'(?s)<div[^>]+\bclass=["\']section-title[^>]+>(.+?)</div>', webpage):
section = strip_or_none(clean_html(s))
if not section:
sections = []
break
sections.append(section)
if chapter_number <= len(sections):
chapter = sections[chapter_number - 1]
entries = [{
'_type': 'url_transparent',
'url': wistia_url,
'ie_key': WistiaIE.ie_key(),
'title': title,
'chapter': chapter,
'chapter_number': chapter_number,
} for wistia_url in wistia_urls]
return self.playlist_result(entries, video_id, title)
@@ -192,20 +219,20 @@ class TeachableCourseIE(TeachableBaseIE):
/(?:courses|p)/(?:enrolled/)?(?P<id>[^/?#&]+)
''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
_TESTS = [{
'url': 'http://upskillcourses.com/courses/essential-web-developer-course/',
'url': 'http://v1.upskillcourses.com/courses/essential-web-developer-course/',
'info_dict': {
'id': 'essential-web-developer-course',
'title': 'The Essential Web Developer Course (Free)',
},
'playlist_count': 192,
}, {
'url': 'http://upskillcourses.com/courses/119763/',
'url': 'http://v1.upskillcourses.com/courses/119763/',
'only_matching': True,
}, {
'url': 'http://upskillcourses.com/courses/enrolled/119763',
'url': 'http://v1.upskillcourses.com/courses/enrolled/119763',
'only_matching': True,
}, {
'url': 'https://academy.gns3.com/courses/enrolled/423415',
'url': 'https://gns3.teachable.com/courses/enrolled/423415',
'only_matching': True,
}, {
'url': 'teachable:https://learn.vrdev.school/p/gear-vr-developer-mini',

View File

@@ -1,9 +1,19 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .jwplatform import JWPlatformIE
from .nexx import NexxIE
from ..compat import compat_urlparse
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
NO_DEFAULT,
try_get,
)
class Tele5IE(InfoExtractor):
@@ -44,14 +54,49 @@ class Tele5IE(InfoExtractor):
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
video_id = (qs.get('vid') or qs.get('ve_id') or [None])[0]
if not video_id:
NEXX_ID_RE = r'\d{6,}'
JWPLATFORM_ID_RE = r'[a-zA-Z0-9]{8}'
def nexx_result(nexx_id):
return self.url_result(
'https://api.nexx.cloud/v3/759/videos/byid/%s' % nexx_id,
ie=NexxIE.ie_key(), video_id=nexx_id)
nexx_id = jwplatform_id = None
if video_id:
if re.match(NEXX_ID_RE, video_id):
return nexx_result(video_id)
elif re.match(JWPLATFORM_ID_RE, video_id):
jwplatform_id = video_id
if not nexx_id:
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._html_search_regex(
(r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](\d+)',
r'\s+id\s*=\s*["\']player_(\d{6,})',
r'\bdata-id\s*=\s*["\'](\d{6,})'), webpage, 'video id')
def extract_id(pattern, name, default=NO_DEFAULT):
return self._html_search_regex(
(r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](%s)' % pattern,
r'\s+id\s*=\s*["\']player_(%s)' % pattern,
r'\bdata-id\s*=\s*["\'](%s)' % pattern), webpage, name,
default=default)
nexx_id = extract_id(NEXX_ID_RE, 'nexx id', default=None)
if nexx_id:
return nexx_result(nexx_id)
if not jwplatform_id:
jwplatform_id = extract_id(JWPLATFORM_ID_RE, 'jwplatform id')
media = self._download_json(
'https://cdn.jwplayer.com/v2/media/' + jwplatform_id,
display_id)
nexx_id = try_get(
media, lambda x: x['playlist'][0]['nexx_id'], compat_str)
if nexx_id:
return nexx_result(nexx_id)
return self.url_result(
'https://api.nexx.cloud/v3/759/videos/byid/%s' % video_id,
ie=NexxIE.ie_key(), video_id=video_id)
'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),
video_id=jwplatform_id)

View File

@@ -38,8 +38,6 @@ class TeleQuebecIE(TeleQuebecBaseIE):
'ext': 'mp4',
'title': 'Un petit choc et puis repart!',
'description': 'md5:b04a7e6b3f74e32d7b294cffe8658374',
'upload_date': '20180222',
'timestamp': 1519326631,
},
'params': {
'skip_download': True,

View File

@@ -10,8 +10,8 @@ from ..utils import (
class TenPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?10play\.com\.au/[^/]+/episodes/[^/]+/[^/]+/(?P<id>tpv\d{6}[a-z]{5})'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?:[^/]+/)+(?P<id>tpv\d{6}[a-z]{5})'
_TESTS = [{
'url': 'https://10play.com.au/masterchef/episodes/season-1/masterchef-s1-ep-1/tpv190718kwzga',
'info_dict': {
'id': '6060533435001',
@@ -27,7 +27,10 @@ class TenPlayIE(InfoExtractor):
'format': 'bestvideo',
'skip_download': True,
}
}
}, {
'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/2199827728001/cN6vRtRQt_default/index.html?videoId=%s'
def _real_extract(self, url):

View File

@@ -17,14 +17,12 @@ class TFOIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
_TEST = {
'url': 'http://www.tfo.org/en/universe/tfo-247/100463871/video-game-hackathon',
'md5': '47c987d0515561114cf03d1226a9d4c7',
'md5': 'cafbe4f47a8dae0ca0159937878100d6',
'info_dict': {
'id': '100463871',
'id': '7da3d50e495c406b8fc0b997659cc075',
'ext': 'mp4',
'title': 'Video Game Hackathon',
'description': 'md5:558afeba217c6c8d96c60e5421795c07',
'upload_date': '20160212',
'timestamp': 1455310233,
}
}

View File

@@ -31,6 +31,10 @@ class ThisOldHouseIE(InfoExtractor):
}, {
'url': 'https://www.thisoldhouse.com/21113884/s41-e13-paradise-lost',
'only_matching': True,
}, {
# iframe www.thisoldhouse.com
'url': 'https://www.thisoldhouse.com/21083431/seaside-transformation-the-westerly-project',
'only_matching': True,
}]
_ZYPE_TMPL = 'https://player.zype.com/embed/%s.html?api_key=hsOk_yMSPYNrT22e9pu8hihLXjaZf0JW5jsOWv4ZqyHJFvkJn6rtToHl09tbbsbe'
@@ -38,6 +42,6 @@ class ThisOldHouseIE(InfoExtractor):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'<iframe[^>]+src=[\'"](?:https?:)?//thisoldhouse\.chorus\.build/videos/zype/([0-9a-f]{24})',
r'<iframe[^>]+src=[\'"](?:https?:)?//(?:www\.)?thisoldhouse\.(?:chorus\.build|com)/videos/zype/([0-9a-f]{24})',
webpage, 'video id')
return self.url_result(self._ZYPE_TMPL % video_id, 'Zype', video_id)

View File

@@ -99,7 +99,7 @@ class TV4IE(InfoExtractor):
manifest_url.replace('.m3u8', '.f4m'),
video_id, f4m_id='hds', fatal=False))
formats.extend(self._extract_ism_formats(
re.sub(r'\.ism/.+?\.m3u8', r'.ism/Manifest', manifest_url),
re.sub(r'\.ism/.*?\.m3u8', r'.ism/Manifest', manifest_url),
video_id, ism_id='mss', fatal=False))
if not formats and info.get('is_geo_restricted'):

View File

@@ -6,7 +6,6 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urlparse,
)
from ..utils import (
@@ -15,9 +14,7 @@ from ..utils import (
int_or_none,
parse_iso8601,
qualities,
smuggle_url,
try_get,
unsmuggle_url,
update_url_query,
url_or_none,
)
@@ -235,11 +232,6 @@ class TVPlayIE(InfoExtractor):
]
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
video_id = self._match_id(url)
geo_country = self._search_regex(
r'https?://[^/]+\.([a-z]{2})', url,
@@ -285,8 +277,6 @@ class TVPlayIE(InfoExtractor):
'ext': ext,
}
if video_url.startswith('rtmp'):
if smuggled_data.get('skip_rtmp'):
continue
m = re.search(
r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
if not m:
@@ -347,115 +337,80 @@ class ViafreeIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
viafree\.
(?:
(?:dk|no)/programmer|
se/program
)
/(?:[^/]+/)+(?P<id>[^/?#&]+)
viafree\.(?P<country>dk|no|se)
/(?P<id>program(?:mer)?/(?:[^/]+/)+[^/?#&]+)
'''
_TESTS = [{
'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
'info_dict': {
'id': '395375',
'id': '757786',
'ext': 'mp4',
'title': 'Husräddarna S02E02',
'description': 'md5:4db5c933e37db629b5a2f75dfb34829e',
'series': 'Husräddarna',
'season': 'Säsong 2',
'title': 'Det beste vorspielet - Sesong 2 - Episode 1',
'description': 'md5:b632cb848331404ccacd8cd03e83b4c3',
'series': 'Det beste vorspielet',
'season_number': 2,
'duration': 2576,
'timestamp': 1400596321,
'upload_date': '20140520',
'duration': 1116,
'timestamp': 1471200600,
'upload_date': '20160814',
},
'params': {
'skip_download': True,
},
'add_ie': [TVPlayIE.ie_key()],
}, {
# with relatedClips
'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-1',
'info_dict': {
'id': '758770',
'ext': 'mp4',
'title': 'Sommaren med YouTube-stjärnorna S01E01',
'description': 'md5:2bc69dce2c4bb48391e858539bbb0e3f',
'series': 'Sommaren med YouTube-stjärnorna',
'season': 'Säsong 1',
'season_number': 1,
'duration': 1326,
'timestamp': 1470905572,
'upload_date': '20160811',
},
'params': {
'skip_download': True,
},
'add_ie': [TVPlayIE.ie_key()],
'only_matching': True,
}, {
# Different og:image URL schema
'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-2',
'only_matching': True,
}, {
'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
'only_matching': True,
}, {
'url': 'http://www.viafree.dk/programmer/reality/paradise-hotel/saeson-7/episode-5',
'only_matching': True,
}]
_GEO_BYPASS = False
@classmethod
def suitable(cls, url):
return False if TVPlayIE.suitable(url) else super(ViafreeIE, cls).suitable(url)
def _real_extract(self, url):
video_id = self._match_id(url)
country, path = re.match(self._VALID_URL, url).groups()
content = self._download_json(
'https://viafree-content.mtg-api.com/viafree-content/v1/%s/path/%s' % (country, path), path)
program = content['_embedded']['viafreeBlocks'][0]['_embedded']['program']
guid = program['guid']
meta = content['meta']
title = meta['title']
webpage = self._download_webpage(url, video_id)
try:
stream_href = self._download_json(
program['_links']['streamLink']['href'], guid,
headers=self.geo_verification_headers())['embedded']['prioritizedStreams'][0]['links']['stream']['href']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
self.raise_geo_restricted(countries=[country])
raise
data = self._parse_json(
self._search_regex(
r'(?s)window\.App\s*=\s*({.+?})\s*;\s*</script',
webpage, 'data', default='{}'),
video_id, transform_source=lambda x: re.sub(
r'(?s)function\s+[a-zA-Z_][\da-zA-Z_]*\s*\([^)]*\)\s*{[^}]*}\s*',
'null', x), fatal=False)
formats = self._extract_m3u8_formats(stream_href, guid, 'mp4')
self._sort_formats(formats)
episode = program.get('episode') or {}
video_id = None
if data:
video_id = try_get(
data, lambda x: x['context']['dispatcher']['stores'][
'ContentPageProgramStore']['currentVideo']['id'],
compat_str)
# Fallback #1 (extract from og:image URL schema)
if not video_id:
thumbnail = self._og_search_thumbnail(webpage, default=None)
if thumbnail:
video_id = self._search_regex(
# Patterns seen:
# http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/inbox/765166/a2e95e5f1d735bab9f309fa345cc3f25.jpg
# http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/seasons/15204/758770/4a5ba509ca8bc043e1ebd1a76131cdf2.jpg
r'https?://[^/]+/imagecache/(?:[^/]+/)+(\d{6,})/',
thumbnail, 'video id', default=None)
# Fallback #2. Extract from raw JSON string.
# May extract wrong video id if relatedClips is present.
if not video_id:
video_id = self._search_regex(
r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](\d{6,})',
webpage, 'video id')
return self.url_result(
smuggle_url(
'mtg:%s' % video_id,
{
'geo_countries': [
compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]],
# rtmp host mtgfs.fplive.net for viafree is unresolvable
'skip_rtmp': True,
}),
ie=TVPlayIE.ie_key(), video_id=video_id)
return {
'id': guid,
'title': title,
'thumbnail': meta.get('image'),
'description': meta.get('description'),
'series': episode.get('seriesTitle'),
'episode_number': int_or_none(episode.get('episodeNumber')),
'season_number': int_or_none(episode.get('seasonNumber')),
'duration': int_or_none(try_get(program, lambda x: x['video']['duration']['milliseconds']), 1000),
'timestamp': parse_iso8601(try_get(program, lambda x: x['availability']['start'])),
'formats': formats,
}
class TVPlayHomeIE(InfoExtractor):

View File

@@ -643,7 +643,14 @@ class TwitchStreamIE(TwitchBaseIE):
class TwitchClipsIE(TwitchBaseIE):
IE_NAME = 'twitch:clips'
_VALID_URL = r'https?://(?:clips\.twitch\.tv/(?:embed\?.*?\bclip=|(?:[^/]+/)*)|(?:www\.)?twitch\.tv/[^/]+/clip/)(?P<id>[^/?#&]+)'
_VALID_URL = r'''(?x)
https?://
(?:
clips\.twitch\.tv/(?:embed\?.*?\bclip=|(?:[^/]+/)*)|
(?:(?:www|go|m)\.)?twitch\.tv/[^/]+/clip/
)
(?P<id>[^/?#&]+)
'''
_TESTS = [{
'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat',
@@ -669,6 +676,12 @@ class TwitchClipsIE(TwitchBaseIE):
}, {
'url': 'https://clips.twitch.tv/embed?clip=InquisitiveBreakableYogurtJebaited',
'only_matching': True,
}, {
'url': 'https://m.twitch.tv/rossbroadcast/clip/ConfidentBraveHumanChefFrank',
'only_matching': True,
}, {
'url': 'https://go.twitch.tv/rossbroadcast/clip/ConfidentBraveHumanChefFrank',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -2,12 +2,17 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse_urlencode,
)
from ..utils import (
clean_html,
int_or_none,
parse_duration,
parse_iso8601,
qualities,
update_url_query,
str_or_none,
)
@@ -16,21 +21,25 @@ class UOLIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?uol\.com\.br/.*?(?:(?:mediaId|v)=|view/(?:[a-z0-9]+/)?|video(?:=|/(?:\d{4}/\d{2}/\d{2}/)?))(?P<id>\d+|[\w-]+-[A-Z0-9]+)'
_TESTS = [{
'url': 'http://player.mais.uol.com.br/player_video_v3.swf?mediaId=15951931',
'md5': '25291da27dc45e0afb5718a8603d3816',
'md5': '4f1e26683979715ff64e4e29099cf020',
'info_dict': {
'id': '15951931',
'ext': 'mp4',
'title': 'Miss simpatia é encontrada morta',
'description': 'md5:3f8c11a0c0556d66daf7e5b45ef823b2',
'timestamp': 1470421860,
'upload_date': '20160805',
}
}, {
'url': 'http://tvuol.uol.com.br/video/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
'md5': 'e41a2fb7b7398a3a46b6af37b15c00c9',
'md5': '2850a0e8dfa0a7307e04a96c5bdc5bc2',
'info_dict': {
'id': '15954259',
'ext': 'mp4',
'title': 'Incêndio destrói uma das maiores casas noturnas de Londres',
'description': 'Em Londres, um incêndio destruiu uma das maiores boates da cidade. Não há informações sobre vítimas.',
'timestamp': 1470674520,
'upload_date': '20160808',
}
}, {
'url': 'http://mais.uol.com.br/static/uolplayer/index.html?mediaId=15951931',
@@ -55,91 +64,55 @@ class UOLIE(InfoExtractor):
'only_matching': True,
}]
_FORMATS = {
'2': {
'width': 640,
'height': 360,
},
'5': {
'width': 1280,
'height': 720,
},
'6': {
'width': 426,
'height': 240,
},
'7': {
'width': 1920,
'height': 1080,
},
'8': {
'width': 192,
'height': 144,
},
'9': {
'width': 568,
'height': 320,
},
'11': {
'width': 640,
'height': 360,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
media_id = None
if video_id.isdigit():
media_id = video_id
if not media_id:
embed_page = self._download_webpage(
'https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id,
video_id, 'Downloading embed page', fatal=False)
if embed_page:
media_id = self._search_regex(
(r'uol\.com\.br/(\d+)', r'mediaId=(\d+)'),
embed_page, 'media id', default=None)
if not media_id:
webpage = self._download_webpage(url, video_id)
media_id = self._search_regex(r'mediaId=(\d+)', webpage, 'media id')
video_data = self._download_json(
'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % media_id,
media_id)['item']
# https://api.mais.uol.com.br/apiuol/v4/player/data/[MEDIA_ID]
'https://api.mais.uol.com.br/apiuol/v3/media/detail/' + video_id,
video_id)['item']
media_id = compat_str(video_data['mediaId'])
title = video_data['title']
ver = video_data.get('revision', 2)
query = {
'ver': video_data.get('numRevision', 2),
'r': 'http://mais.uol.com.br',
}
for k in ('token', 'sign'):
v = video_data.get(k)
if v:
query[k] = v
uol_formats = self._download_json(
'https://croupier.mais.uol.com.br/v3/formats/%s/jsonp' % media_id,
media_id)
quality = qualities(['mobile', 'WEBM', '360p', '720p', '1080p'])
formats = []
for f in video_data.get('formats', []):
for format_id, f in uol_formats.items():
if not isinstance(f, dict):
continue
f_url = f.get('url') or f.get('secureUrl')
if not f_url:
continue
query = {
'ver': ver,
'r': 'http://mais.uol.com.br',
}
for k in ('token', 'sign'):
v = f.get(k)
if v:
query[k] = v
f_url = update_url_query(f_url, query)
format_id = str_or_none(f.get('id'))
if format_id == '10':
formats.extend(self._extract_m3u8_formats(
f_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
format_id = format_id
if format_id == 'HLS':
m3u8_formats = self._extract_m3u8_formats(
f_url, media_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False)
encoded_query = compat_urllib_parse_urlencode(query)
for m3u8_f in m3u8_formats:
m3u8_f['extra_param_to_segment_url'] = encoded_query
m3u8_f['url'] = update_url_query(m3u8_f['url'], query)
formats.extend(m3u8_formats)
continue
fmt = {
formats.append({
'format_id': format_id,
'url': f_url,
'source_preference': 1,
}
fmt.update(self._FORMATS.get(format_id, {}))
formats.append(fmt)
self._sort_formats(formats, ('height', 'width', 'source_preference', 'tbr', 'ext'))
'quality': quality(format_id),
'preference': -1,
})
self._sort_formats(formats)
tags = []
for tag in video_data.get('tags', []):
@@ -148,12 +121,24 @@ class UOLIE(InfoExtractor):
continue
tags.append(tag_description)
thumbnails = []
for q in ('Small', 'Medium', 'Wmedium', 'Large', 'Wlarge', 'Xlarge'):
q_url = video_data.get('thumb' + q)
if not q_url:
continue
thumbnails.append({
'id': q,
'url': q_url,
})
return {
'id': media_id,
'title': title,
'description': clean_html(video_data.get('desMedia')),
'thumbnail': video_data.get('thumbnail'),
'duration': int_or_none(video_data.get('durationSeconds')) or parse_duration(video_data.get('duration')),
'description': clean_html(video_data.get('description')),
'thumbnails': thumbnails,
'duration': parse_duration(video_data.get('duration')),
'tags': tags,
'formats': formats,
'timestamp': parse_iso8601(video_data.get('publishDate'), ' '),
'view_count': int_or_none(video_data.get('viewsQtty')),
}

View File

@@ -140,28 +140,28 @@ class VimeoBaseInfoExtractor(InfoExtractor):
})
# TODO: fix handling of 308 status code returned for live archive manifest requests
sep_pattern = r'/sep/video/'
for files_type in ('hls', 'dash'):
for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items():
manifest_url = cdn_data.get('url')
if not manifest_url:
continue
format_id = '%s-%s' % (files_type, cdn_name)
if files_type == 'hls':
formats.extend(self._extract_m3u8_formats(
manifest_url, video_id, 'mp4',
'm3u8' if is_live else 'm3u8_native', m3u8_id=format_id,
note='Downloading %s m3u8 information' % cdn_name,
fatal=False))
elif files_type == 'dash':
mpd_pattern = r'/%s/(?:sep/)?video/' % video_id
mpd_manifest_urls = []
if re.search(mpd_pattern, manifest_url):
for suffix, repl in (('', 'video'), ('_sep', 'sep/video')):
mpd_manifest_urls.append((format_id + suffix, re.sub(
mpd_pattern, '/%s/%s/' % (video_id, repl), manifest_url)))
else:
mpd_manifest_urls = [(format_id, manifest_url)]
for f_id, m_url in mpd_manifest_urls:
sep_manifest_urls = []
if re.search(sep_pattern, manifest_url):
for suffix, repl in (('', 'video'), ('_sep', 'sep/video')):
sep_manifest_urls.append((format_id + suffix, re.sub(
sep_pattern, '/%s/' % repl, manifest_url)))
else:
sep_manifest_urls = [(format_id, manifest_url)]
for f_id, m_url in sep_manifest_urls:
if files_type == 'hls':
formats.extend(self._extract_m3u8_formats(
m_url, video_id, 'mp4',
'm3u8' if is_live else 'm3u8_native', m3u8_id=f_id,
note='Downloading %s m3u8 information' % cdn_name,
fatal=False))
elif files_type == 'dash':
if 'json=1' in m_url:
real_m_url = (self._download_json(m_url, video_id, fatal=False) or {}).get('url')
if real_m_url:
@@ -170,11 +170,6 @@ class VimeoBaseInfoExtractor(InfoExtractor):
m_url.replace('/master.json', '/master.mpd'), video_id, f_id,
'Downloading %s MPD information' % cdn_name,
fatal=False)
for f in mpd_formats:
if f.get('vcodec') == 'none':
f['preference'] = -50
elif f.get('acodec') == 'none':
f['preference'] = -40
formats.extend(mpd_formats)
live_archive = live_event.get('archive') or {}
@@ -186,6 +181,12 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'preference': 1,
})
for f in formats:
if f.get('vcodec') == 'none':
f['preference'] = -50
elif f.get('acodec') == 'none':
f['preference'] = -40
subtitles = {}
text_tracks = config['request'].get('text_tracks')
if text_tracks:
@@ -585,7 +586,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
url = 'https://vimeo.com/' + video_id
elif is_player:
url = 'https://player.vimeo.com/video/' + video_id
elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf', '/album/', '/showcase/')):
elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
url = 'https://vimeo.com/' + video_id
try:

View File

@@ -98,9 +98,9 @@ class XTubeIE(InfoExtractor):
title = config.get('title')
thumbnail = config.get('poster')
duration = int_or_none(config.get('duration'))
sources = config.get('sources')
sources = config.get('sources') or config.get('format')
if isinstance(sources, dict):
if not isinstance(sources, dict):
sources = self._parse_json(self._search_regex(
r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
webpage, 'sources', group='sources'), video_id,

View File

@@ -12,6 +12,7 @@ from ..compat import (
)
from ..utils import (
clean_html,
ExtractorError,
int_or_none,
mimetype2ext,
parse_iso8601,
@@ -368,31 +369,47 @@ class YahooGyaOPlayerIE(InfoExtractor):
'url': 'https://gyao.yahoo.co.jp/episode/%E3%81%8D%E3%81%AE%E3%81%86%E4%BD%95%E9%A3%9F%E3%81%B9%E3%81%9F%EF%BC%9F%20%E7%AC%AC2%E8%A9%B1%202019%2F4%2F12%E6%94%BE%E9%80%81%E5%88%86/5cb02352-b725-409e-9f8d-88f947a9f682',
'only_matching': True,
}]
_GEO_BYPASS = False
def _real_extract(self, url):
video_id = self._match_id(url).replace('/', ':')
video = self._download_json(
'https://gyao.yahoo.co.jp/dam/v1/videos/' + video_id,
video_id, query={
'fields': 'longDescription,title,videoId',
}, headers={
'X-User-Agent': 'Unknown Pc GYAO!/2.0.0 Web',
})
headers = self.geo_verification_headers()
headers['Accept'] = 'application/json'
resp = self._download_json(
'https://gyao.yahoo.co.jp/apis/playback/graphql', video_id, query={
'appId': 'dj00aiZpPUNJeDh2cU1RazU3UCZzPWNvbnN1bWVyc2VjcmV0Jng9NTk-',
'query': '''{
content(parameter: {contentId: "%s", logicaAgent: PC_WEB}) {
video {
delivery {
id
}
title
}
}
}''' % video_id,
}, headers=headers)
content = resp['data']['content']
if not content:
msg = resp['errors'][0]['message']
if msg == 'not in japan':
self.raise_geo_restricted(countries=['JP'])
raise ExtractorError(msg)
video = content['video']
return {
'_type': 'url_transparent',
'id': video_id,
'title': video['title'],
'url': smuggle_url(
'http://players.brightcove.net/4235717419001/SyG5P0gjb_default/index.html?videoId=' + video['videoId'],
'http://players.brightcove.net/4235717419001/SyG5P0gjb_default/index.html?videoId=' + video['delivery']['id'],
{'geo_countries': ['JP']}),
'description': video.get('longDescription'),
'ie_key': BrightcoveNewIE.ie_key(),
}
class YahooGyaOIE(InfoExtractor):
IE_NAME = 'yahoo:gyao'
_VALID_URL = r'https?://(?:gyao\.yahoo\.co\.jp/(?:p|title/[^/]+)|streaming\.yahoo\.co\.jp/p/y)/(?P<id>\d+/v\d+|[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_VALID_URL = r'https?://(?:gyao\.yahoo\.co\.jp/(?:p|title(?:/[^/]+)?)|streaming\.yahoo\.co\.jp/p/y)/(?P<id>\d+/v\d+|[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
'url': 'https://gyao.yahoo.co.jp/p/00449/v03102/',
'info_dict': {
@@ -405,6 +422,9 @@ class YahooGyaOIE(InfoExtractor):
}, {
'url': 'https://gyao.yahoo.co.jp/title/%E3%81%97%E3%82%83%E3%81%B9%E3%81%8F%E3%82%8A007/5b025a49-b2e5-4dc7-945c-09c6634afacf',
'only_matching': True,
}, {
'url': 'https://gyao.yahoo.co.jp/title/5b025a49-b2e5-4dc7-945c-09c6634afacf',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -5,7 +5,6 @@ import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
sanitized_Request,
str_to_int,
unescapeHTML,
unified_strdate,
@@ -15,7 +14,7 @@ from ..aes import aes_decrypt_text
class YouPornIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?youporn\.com/watch/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?youporn\.com/(?:watch|embed)/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
_TESTS = [{
'url': 'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'md5': '3744d24c50438cf5b6f6d59feb5055c2',
@@ -57,16 +56,28 @@ class YouPornIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.youporn.com/embed/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'only_matching': True,
}, {
'url': 'http://www.youporn.com/watch/505835',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?youporn\.com/embed/\d+)',
webpage)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
display_id = mobj.group('display_id') or video_id
request = sanitized_Request(url)
request.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(request, display_id)
webpage = self._download_webpage(
'http://www.youporn.com/watch/%s' % video_id, display_id,
headers={'Cookie': 'age_verified=1'})
title = self._html_search_regex(
r'(?s)<div[^>]+class=["\']watchVideoTitle[^>]+>(.+?)</div>',

View File

@@ -388,8 +388,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
(?:www\.)?invidious\.drycat\.fr/|
(?:www\.)?tube\.poal\.co/|
(?:www\.)?vid\.wxzm\.sx/|
(?:www\.)?yewtu\.be/|
(?:www\.)?yt\.elukerio\.org/|
(?:www\.)?yt\.lelux\.fi/|
(?:www\.)?invidious\.ggc-project\.de/|
(?:www\.)?yt\.maisputain\.ovh/|
(?:www\.)?invidious\.13ad\.de/|
(?:www\.)?invidious\.toot\.koeln/|
(?:www\.)?invidious\.fdn\.fr/|
(?:www\.)?watch\.nettohikari\.com/|
(?:www\.)?kgg2m7yk5aybusll\.onion/|
(?:www\.)?qklhadlycap4cnod\.onion/|
(?:www\.)?axqzx4s6s54s32yentfqojs3x5i7faxza6xo3ehd4bzzsg2ii4fv2iid\.onion/|
@@ -397,6 +404,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
(?:www\.)?fz253lmuao3strwbfbmx46yu7acac2jz27iwtorgmbqlkurlclmancad\.onion/|
(?:www\.)?invidious\.l4qlywnpwqsluw65ts7md3khrivpirse744un3x7mlskqauz5pyuzgqd\.onion/|
(?:www\.)?owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya\.b32\.i2p/|
(?:www\.)?4l2dgddgsrkf2ous66i6seeyi6etzfgrue332grh2n7madpwopotugyd\.onion/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls
(?: # the various things that can precede the ID:
@@ -426,6 +434,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
(?(1).+)? # if we found the ID, everything can follow
$""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE}
_NEXT_URL_RE = r'[\?&]next_url=([^&]+)'
_PLAYER_INFO_RE = (
r'/(?P<id>[a-zA-Z0-9_-]{8,})/player_ias\.vflset(?:/[a-zA-Z]{2,3}_[a-zA-Z]{2,3})?/base\.(?P<ext>[a-z]+)$',
r'\b(?P<id>vfl[a-zA-Z0-9_-]+)\b.*?\.(?P<ext>[a-z]+)$',
)
_formats = {
'5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
'6': {'ext': 'flv', 'width': 450, 'height': 270, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
@@ -1227,6 +1239,26 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'url': 'https://www.youtubekids.com/watch?v=3b8nCWDgZ6Q',
'only_matching': True,
},
{
# invalid -> valid video id redirection
'url': 'DJztXj2GPfl',
'info_dict': {
'id': 'DJztXj2GPfk',
'ext': 'mp4',
'title': 'Panjabi MC - Mundian To Bach Ke (The Dictator Soundtrack)',
'description': 'md5:bf577a41da97918e94fa9798d9228825',
'upload_date': '20090125',
'uploader': 'Prochorowka',
'uploader_id': 'Prochorowka',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Prochorowka',
'artist': 'Panjabi MC',
'track': 'Beware of the Boys (Mundian to Bach Ke) - Motivo Hi-Lectro Remix',
'album': 'Beware of the Boys (Mundian To Bach Ke)',
},
'params': {
'skip_download': True,
},
}
]
def __init__(self, *args, **kwargs):
@@ -1253,14 +1285,18 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
""" Return a string representation of a signature """
return '.'.join(compat_str(len(part)) for part in example_sig.split('.'))
def _extract_signature_function(self, video_id, player_url, example_sig):
id_m = re.match(
r'.*?[-.](?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|(?:/[a-z]{2,3}_[A-Z]{2})?/base)?\.(?P<ext>[a-z]+)$',
player_url)
if not id_m:
@classmethod
def _extract_player_info(cls, player_url):
for player_re in cls._PLAYER_INFO_RE:
id_m = re.search(player_re, player_url)
if id_m:
break
else:
raise ExtractorError('Cannot identify player %r' % player_url)
player_type = id_m.group('ext')
player_id = id_m.group('id')
return id_m.group('ext'), id_m.group('id')
def _extract_signature_function(self, video_id, player_url, example_sig):
player_type, player_id = self._extract_player_info(player_url)
# Read from filesystem cache
func_id = '%s_%s_%s' % (
@@ -1678,7 +1714,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Get video webpage
url = proto + '://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1&bpctr=9999999999' % video_id
video_webpage = self._download_webpage(url, video_id)
video_webpage, urlh = self._download_webpage_handle(url, video_id)
qs = compat_parse_qs(compat_urllib_parse_urlparse(urlh.geturl()).query)
video_id = qs.get('v', [None])[0] or video_id
# Attempt to extract SWF player URL
mobj = re.search(r'swfConfig.*?"(https?:\\/\\/.*?watch.*?-.*?\.swf)"', video_webpage)
@@ -1840,15 +1879,26 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# fields may contain comma as well (see
# https://github.com/ytdl-org/youtube-dl/issues/8536)
feed_data = compat_parse_qs(compat_urllib_parse_unquote_plus(feed))
def feed_entry(name):
return try_get(feed_data, lambda x: x[name][0], compat_str)
feed_id = feed_entry('id')
if not feed_id:
continue
feed_title = feed_entry('title')
title = video_title
if feed_title:
title += ' (%s)' % feed_title
entries.append({
'_type': 'url_transparent',
'ie_key': 'Youtube',
'url': smuggle_url(
'%s://www.youtube.com/watch?v=%s' % (proto, feed_data['id'][0]),
{'force_singlefeed': True}),
'title': '%s (%s)' % (video_title, feed_data['title'][0]),
'title': title,
})
feed_ids.append(feed_data['id'][0])
feed_ids.append(feed_id)
self.to_screen(
'Downloading multifeed video (%s) - add --no-playlist to just download video %s'
% (', '.join(feed_ids), video_id))
@@ -1919,12 +1969,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}
for fmt in streaming_formats:
if fmt.get('drm_families'):
if fmt.get('drmFamilies') or fmt.get('drm_families'):
continue
url = url_or_none(fmt.get('url'))
if not url:
cipher = fmt.get('cipher')
cipher = fmt.get('cipher') or fmt.get('signatureCipher')
if not cipher:
continue
url_data = compat_parse_qs(cipher)
@@ -1975,22 +2025,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if self._downloader.params.get('verbose'):
if player_url is None:
player_version = 'unknown'
player_desc = 'unknown'
else:
if player_url.endswith('swf'):
player_version = self._search_regex(
r'-(.+?)(?:/watch_as3)?\.swf$', player_url,
'flash player', fatal=False)
player_desc = 'flash player %s' % player_version
else:
player_version = self._search_regex(
[r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js',
r'(?:www|player(?:_ias)?)[-.]([^/]+)(?:/[a-z]{2,3}_[A-Z]{2})?/base\.js'],
player_url,
'html5 player', fatal=False)
player_desc = 'html5 player %s' % player_version
player_type, player_version = self._extract_player_info(player_url)
player_desc = '%s player %s' % ('flash' if player_type == 'swf' else 'html5', player_version)
parts_sizes = self._signature_cache_id(encrypted_sig)
self.to_screen('{%s} signature length %s, %s' %
(format_id, parts_sizes, player_desc))

View File

@@ -853,7 +853,7 @@ def parseOpts(overrideArguments=None):
postproc.add_option(
'--exec',
metavar='CMD', dest='exec_cmd',
help='Execute a command on the file after downloading, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'')
help='Execute a command on the file after downloading and post-processing, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'')
postproc.add_option(
'--convert-subs', '--convert-subtitles',
metavar='FORMAT', dest='convertsubtitles', default=None,

View File

@@ -447,6 +447,13 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
metadata[meta_f] = info[info_f]
break
# See [1-4] for some info on media metadata/metadata supported
# by ffmpeg.
# 1. https://kdenlive.org/en/project/adding-meta-data-to-mp4-video/
# 2. https://wiki.multimedia.cx/index.php/FFmpeg_Metadata
# 3. https://kodi.wiki/view/Video_file_tagging
# 4. http://atomicparsley.sourceforge.net/mpeg-4files.html
add('title', ('track', 'title'))
add('date', 'upload_date')
add(('description', 'comment'), 'description')
@@ -457,6 +464,10 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
add('album')
add('album_artist')
add('disc', 'disc_number')
add('show', 'series')
add('season_number')
add('episode_id', ('episode', 'episode_id'))
add('episode_sort', 'episode_number')
if not metadata:
self._downloader.to_screen('[ffmpeg] There isn\'t any metadata to add')

View File

@@ -7,6 +7,7 @@ import base64
import binascii
import calendar
import codecs
import collections
import contextlib
import ctypes
import datetime
@@ -30,6 +31,7 @@ import ssl
import subprocess
import sys
import tempfile
import time
import traceback
import xml.etree.ElementTree
import zlib
@@ -1835,6 +1837,12 @@ def write_json_file(obj, fn):
os.unlink(fn)
except OSError:
pass
try:
mask = os.umask(0)
os.umask(mask)
os.chmod(tf.name, 0o666 & ~mask)
except OSError:
pass
os.rename(tf.name, fn)
except Exception:
try:
@@ -2729,15 +2737,72 @@ class YoutubeDLHTTPSHandler(compat_urllib_request.HTTPSHandler):
class YoutubeDLCookieJar(compat_cookiejar.MozillaCookieJar):
"""
See [1] for cookie file format.
1. https://curl.haxx.se/docs/http-cookies.html
"""
_HTTPONLY_PREFIX = '#HttpOnly_'
_ENTRY_LEN = 7
_HEADER = '''# Netscape HTTP Cookie File
# This file is generated by youtube-dl. Do not edit.
'''
_CookieFileEntry = collections.namedtuple(
'CookieFileEntry',
('domain_name', 'include_subdomains', 'path', 'https_only', 'expires_at', 'name', 'value'))
def save(self, filename=None, ignore_discard=False, ignore_expires=False):
"""
Save cookies to a file.
Most of the code is taken from CPython 3.8 and slightly adapted
to support cookie files with UTF-8 in both python 2 and 3.
"""
if filename is None:
if self.filename is not None:
filename = self.filename
else:
raise ValueError(compat_cookiejar.MISSING_FILENAME_TEXT)
# Store session cookies with `expires` set to 0 instead of an empty
# string
for cookie in self:
if cookie.expires is None:
cookie.expires = 0
compat_cookiejar.MozillaCookieJar.save(self, filename, ignore_discard, ignore_expires)
with io.open(filename, 'w', encoding='utf-8') as f:
f.write(self._HEADER)
now = time.time()
for cookie in self:
if not ignore_discard and cookie.discard:
continue
if not ignore_expires and cookie.is_expired(now):
continue
if cookie.secure:
secure = 'TRUE'
else:
secure = 'FALSE'
if cookie.domain.startswith('.'):
initial_dot = 'TRUE'
else:
initial_dot = 'FALSE'
if cookie.expires is not None:
expires = compat_str(cookie.expires)
else:
expires = ''
if cookie.value is None:
# cookies.txt regards 'Set-Cookie: foo' as a cookie
# with no name, whereas http.cookiejar regards it as a
# cookie with no value.
name = ''
value = cookie.name
else:
name = cookie.name
value = cookie.value
f.write(
'\t'.join([cookie.domain, initial_dot, cookie.path,
secure, expires, name, value]) + '\n')
def load(self, filename=None, ignore_discard=False, ignore_expires=False):
"""Load cookies from a file."""
@@ -2747,17 +2812,30 @@ class YoutubeDLCookieJar(compat_cookiejar.MozillaCookieJar):
else:
raise ValueError(compat_cookiejar.MISSING_FILENAME_TEXT)
def prepare_line(line):
if line.startswith(self._HTTPONLY_PREFIX):
line = line[len(self._HTTPONLY_PREFIX):]
# comments and empty lines are fine
if line.startswith('#') or not line.strip():
return line
cookie_list = line.split('\t')
if len(cookie_list) != self._ENTRY_LEN:
raise compat_cookiejar.LoadError('invalid length %d' % len(cookie_list))
cookie = self._CookieFileEntry(*cookie_list)
if cookie.expires_at and not cookie.expires_at.isdigit():
raise compat_cookiejar.LoadError('invalid expires at %s' % cookie.expires_at)
return line
cf = io.StringIO()
with open(filename) as f:
with io.open(filename, encoding='utf-8') as f:
for line in f:
if line.startswith(self._HTTPONLY_PREFIX):
line = line[len(self._HTTPONLY_PREFIX):]
# Cookie file may contain spaces instead of tabs.
# Replace all spaces with tabs to make such cookie files work
# with MozillaCookieJar.
if not line.startswith('#'):
line = re.sub(r' +', r'\t', line)
cf.write(compat_str(line))
try:
cf.write(prepare_line(line))
except compat_cookiejar.LoadError as e:
write_string(
'WARNING: skipping cookie file entry due to %s: %r\n'
% (e, line), sys.stderr)
continue
cf.seek(0)
self._really_load(cf, filename, ignore_discard, ignore_expires)
# Session cookies are denoted by either `expires` field set to

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2020.03.08'
__version__ = '2020.05.29'