Compare commits


110 Commits

Author SHA1 Message Date
96f88e91b7 release 2016.06.23 2016-06-23 04:29:34 +07:00
3331a4644d [vk] Remove unused import 2016-06-23 04:27:10 +07:00
adf1921dc1 [xnxx] Improve _VALID_URL (Closes #9858) 2016-06-23 04:26:49 +07:00
97674f0419 [xnxx] Replace test 2016-06-23 04:24:00 +07:00
rr- 73843ae8ac [xnxx] fix url regex
The pattern has changed from "video123412" to "video-o8xa19".
The changes maintain backwards compatibility with old-style URLs.
2016-06-23 04:19:55 +07:00
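For illustration, a minimal sketch of such a backwards-compatible pattern (the regex below is illustrative, not the exact one from the commit):

    import re

    # Hypothetical pattern accepting both the old numeric style
    # ("video123412") and the new style ("video-o8xa19").
    _VALID_URL = r'https?://(?:video|www)\.xnxx\.com/video-?(?P<id>[0-9a-z]+)'

    assert re.match(_VALID_URL, 'http://www.xnxx.com/video123412')
    assert re.match(_VALID_URL, 'http://www.xnxx.com/video-o8xa19')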
f2bb8c036a [vk] Modernize 2016-06-23 04:18:43 +07:00
75ca6bcee2 [vk] Workaround buggy new.vk.com Set-Cookie headers 2016-06-23 04:17:13 +07:00
089657ed1f [vimeo:album] Add paged example URL 2016-06-23 02:00:03 +07:00
b5eab86c24 [vimeo:album] Improve _VALID_URL 2016-06-23 01:56:58 +07:00
c8e3e0974b [vimeo:channel] Improve playlist extraction 2016-06-23 01:28:36 +07:00
dfc8f46e1c [vimeo:channel] Add video id to url_result
This will allow us to decide much faster that we don't want an already archived video,
and will avoid having to download webpages for each video that has already been downloaded,
thus significantly speeding up the archival of channels that have no new content.
2016-06-23 01:26:27 +07:00
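Roughly, the change amounts to passing the id through url_result; a simplified sketch of that helper (the real one lives in youtube_dl/extractor/common.py):

    def url_result(url, ie=None, video_id=None):
        # Simplified InfoExtractor.url_result: with an explicit id, the
        # download-archive check can reject an already-downloaded entry
        # without fetching its webpage first.
        result = {'_type': 'url', 'url': url, 'ie_key': ie}
        if video_id is not None:
            result['id'] = video_id
        return result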
c143ddce5d [vimeo] Override original URL only when necessary 2016-06-23 00:51:36 +07:00
169d836feb lazy-extractors: Fix after commit 6e6b9f600f
The problem was in the following code:

    class ArteTVPlus7IE(ArteTVBaseIE):

        ...

        @classmethod
        def suitable(cls, url):
            return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)

And its subclasses like ArteTVCinemaIE.

Since in the lazy_extractors.py file ArteTVCinemaIE was not a subclass of ArteTVPlus7IE, super(ArteTVPlus7IE, cls) failed.

To fix it, we have to make it a subclass. Since the order of _ALL_CLASSES is arbitrary, we must sort the classes so that base classes are defined first. We must also add base classes like YoutubeBaseInfoExtractor.
2016-06-22 19:20:50 +02:00
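A minimal reproduction of the failure mode, with toy classes standing in for the real extractors:

    class Base(object):
        @classmethod
        def suitable(cls, url):
            return True

    class Plus7(Base):
        @classmethod
        def suitable(cls, url):
            # Only valid if cls is actually a subclass of Plus7
            return super(Plus7, cls).suitable(url)

    # In the flattened lazy_extractors.py, Cinema did not inherit from
    # Plus7, so the super() call received an unrelated class:
    class Cinema(Base):
        suitable = Plus7.__dict__['suitable']

    Cinema.suitable('http://example.com')  # raises TypeError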
6ae938b295 [Vine] Extract view count 2016-06-22 23:57:35 +07:00
cf40fdf5c1 release 2016.06.22 2016-06-22 23:43:24 +07:00
23bdae0955 [svt] Various improvements
+ [svt:play] Add fallback path looking for video id and fix extraction for oppetarkiv
* [svt:base] Detect geo restriction
* [svt:base] Extract series related metadata
2016-06-22 23:36:07 +07:00
ca74c90bf5 Fix issue downloading facebook videos
youtube-dl expects the format items to be returned as a list,
but when there's only one item Facebook returns a dict instead;
this change wraps the dict in a list if necessary.
2016-06-22 12:52:15 +01:00
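The gist of the fix, as a sketch (the helper name is illustrative):

    def ensure_format_list(format_items):
        # Facebook returns a bare dict when there is a single format;
        # normalize to the list youtube-dl expects before iterating.
        if isinstance(format_items, dict):
            return [format_items]
        return format_items

    assert ensure_format_list({'src': 'a.mp4'}) == [{'src': 'a.mp4'}]
    assert ensure_format_list([{'src': 'a.mp4'}]) == [{'src': 'a.mp4'}]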
7cfc1e2a10 [gametrailers] Remove extractor
gametrailers closed (see http://www.polygon.com/2016/2/8/10944452/gametrailers-shuts-down-after-13-year-run)
2016-06-21 22:31:41 +07:00
1ac5705f62 [gamespot] extract all formats 2016-06-21 13:37:57 +01:00
e4f90ea0a7 [svt] Fix extraction for SVTPlay (closes #9809) 2016-06-21 17:55:53 +08:00
cdfc187cd5 [cbs] Remove unused import 2016-06-20 22:40:33 +07:00
feef925f49 [streamcloud] Capture error message (#9840) 2016-06-20 22:40:22 +07:00
19e2d1cdea release 2016.06.20 2016-06-20 20:50:01 +07:00
8369a4fe76 [downloader/hls] Simplify and carry long lines 2016-06-20 21:55:17 +07:00
1f749b6658 Revert "[jsinterp] Avoid double key lookup for setting new key"
This reverts commit 7c05097633.
2016-06-20 13:29:13 +02:00
819707920a [cbs] fix _VALID_URL 2016-06-19 23:55:19 +01:00
43518503a6 [cbs,cbsnews,cbssports] reduce requests while extracting all formats 2016-06-19 23:40:00 +01:00
5839d556e4 [theplatform] reduce requests for theplatform feed info extraction 2016-06-19 23:37:05 +01:00
6c83e583b3 [radiojavan] PEP8
E275 is added in pycodestyle 2.6

See https://github.com/PyCQA/pycodestyle/pull/491
2016-06-19 13:32:08 +08:00
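E275 flags a missing space after a keyword; an illustrative example (not the project's actual code):

    some_condition = True
    flag = not(some_condition)   # E275: missing whitespace after keyword
    flag = not some_condition    # OK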
6aeb64b673 Merge pull request #8201 from remitamine/hls-aes
[downloader/hls] Add support for AES-128 encrypted segments in hlsnative downloader
2016-06-19 13:25:08 +08:00
6cd64b6806 [foxsports] extract http formats 2016-06-19 05:45:48 +01:00
e154c65128 [downloader/hls] Add support for AES-128 encrypted segments in hlsnative downloader 2016-06-19 01:01:40 +01:00
a50fd6e026 release 2016.06.19.1 2016-06-19 03:57:14 +07:00
6a55bb66ee [vimeo] Fix rented videos (Closes #9830) 2016-06-19 03:56:01 +07:00
7c05097633 [jsinterp] Avoid double key lookup for setting new key
In order to add a new key to both __objects and __functions dicts in jsinterp.py, it was
necessary to first check whether the key was present and, if not, create it and
assign it a value.

However, this can be done in a single step using the dict setdefault method.
2016-06-19 03:29:45 +07:00
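In other words (illustrative names):

    objects = {}

    # Before: membership test plus assignment, two steps
    if 'window' not in objects:
        objects['window'] = {}

    # After: setdefault creates the key only when it is missing
    objects.setdefault('window', {})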
589568789f release 2016.06.19 2016-06-19 02:30:29 +07:00
7577d849a6 [r7] Fix extraction and add support for articles (Closes #9826) 2016-06-19 02:25:34 +07:00
cb23192bc4 [closertotruth] Update and improve (Closes #8680) 2016-06-19 00:35:29 +07:00
41c1023300 [closertotruth] Add extractor
Removed print statement from code.

Replaced two regex searches with the correct ones.

Removed some unnecessary semicolons.

Fixed title extraction.

Refactored everything to use search_regex.

Addressed review comments on commit 5650b0d and fixed flake8 feedback.

Improved regexes; the extractor now returns an info dict.

Added support for closertotruth interview URLs.

Added support for the episodes page.
2016-06-18 23:19:56 +07:00
90b6288cce [arte:+7] Simplify _VALID_URL 2016-06-18 22:23:48 +07:00
c1823c8ad9 [README.md] Remove 'small' from description (#9814) 2016-06-18 22:08:48 +07:00
d7c6c656c5 [arte:+7] Expand _VALID_URL (Closes #9820) 2016-06-18 21:42:17 +07:00
b0b128049a [extractors] Update references to sportschau (#9799) 2016-06-18 13:43:47 +08:00
e8f13f2637 [sportschau.de] Fix extraction and move to its own file (closes #9799) 2016-06-18 13:42:58 +08:00
b5aad37f6b [ard] Remove SportschauIE, which is now based on WDR (#9799) 2016-06-18 13:42:39 +08:00
6d0d4fc26d [wdr] Add WDRBaseIE, for Sportschau (#9799) 2016-06-18 13:40:55 +08:00
0278aa443f [br] Skip invalid tests 2016-06-18 12:53:48 +08:00
1f35745758 [azubu] Don't fail on optional fields 2016-06-18 12:39:08 +08:00
573c35272f [bbc] Skip a geo-restricted test case 2016-06-18 12:35:55 +08:00
09e3f91e40 [arte] Update _TESTS and fix for pages with multiple YouTube videos
Some tests are from #6895 and #6613
2016-06-18 12:34:58 +08:00
1b6cf16be7 [aftonbladet] Fix extraction 2016-06-18 12:27:39 +08:00
26264cb056 [adobetv] Use embedded data in the webpage
Sometimes the HTML webpage is returned even with '?format=json'
2016-06-18 12:21:40 +08:00
a72df5f36f [mtvservices] Fix ext for RTMP streams 2016-06-18 12:19:06 +08:00
c878e635de [bet] Moved to MTVServices 2016-06-18 12:17:24 +08:00
0f47cc2e92 release 2016.06.18.1 2016-06-18 06:20:34 +07:00
5fc2757682 release 2016.06.18 2016-06-18 06:00:05 +07:00
e3944c2621 [pornhd] Add working test 2016-06-18 05:50:17 +07:00
667d96480b [pornhd] Detect removed videos and modernize 2016-06-18 05:42:20 +07:00
e6fe993c31 [pornhd] Improve formats extraction 2016-06-18 05:37:53 +07:00
d0d93f76ea [pornhd] Fix metadata extraction 2016-06-18 05:30:46 +07:00
20a6a154fe [mtv] Use compat_xpath and fix FutureWarning 2016-06-18 04:46:26 +07:00
f011876076 [nickde] Add extractor (Closes #9778) 2016-06-18 04:40:48 +07:00
6929569403 [mitele] Extract series metadata and make title more robust (Closes #9758) 2016-06-18 04:06:19 +07:00
eb451890da [carambatv] Add extractor (Closes #9815) 2016-06-18 03:04:14 +07:00
ded7511a70 [bbccouk] Add support for playlists (Closes #9812) 2016-06-17 23:42:52 +07:00
d2161cade5 release 2016.06.16 2016-06-16 22:40:55 +07:00
27e5fa8198 [cda] Fix extraction (Closes #9803) 2016-06-16 22:33:12 +07:00
efbd1eb51a [wimp] Fix extraction and update _TESTS 2016-06-16 12:27:21 +08:00
369ff75081 [jwplatform] Improved JWPlayer support 2016-06-16 12:26:45 +08:00
47212f7bcb [utils] Don't transform numbers not starting with a zero
Fix test_Viidea and maybe others
2016-06-16 11:00:54 +08:00
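The regression test added below (see test/test_utils.py) boils down to: plain decimal literals must survive js_to_json unchanged; only numbers that start with a zero are candidates for octal-style conversion. For example:

    from youtube_dl.utils import js_to_json

    # Decimal literals pass through untouched
    assert js_to_json('{"foo":101}') == '{"foo":101}'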
4c93ee8d14 [imdb] Improve _VALID_URL (Closes #9788) 2016-06-15 22:34:55 +07:00
8bc4dbb1af [wrzuta.pl] Detect error and update _TESTS 2016-06-14 11:14:59 +08:00
6c3760292c [pornhub] Improve title extraction (Closes #9777) 2016-06-14 04:57:59 +07:00
4cef70db6c [devscripts/release.sh] Add flag for gpg-sign commits 2016-06-14 03:16:56 +07:00
ff4af6ec59 [lynda] Remove superfluous _NETRC_MACHINE 2016-06-14 02:49:33 +07:00
d01fb21d4c release 2016.06.14 2016-06-14 02:19:42 +07:00
a4ea28eee6 Credit @venth for wrzuta:playlist (#9341) 2016-06-14 02:15:47 +07:00
bc2a871f3e Credit @dracony for rockstargames (#9737) 2016-06-14 02:15:09 +07:00
1759672eed [wrzuta:playlist] Improve and simplify (Closes #9341) 2016-06-14 02:13:54 +07:00
fea55ef4a9 [wrzuta.pl:playlist] Added playlist extraction from wrzuta.pl 2016-06-14 02:10:48 +07:00
16b6bd01d2 [rockstargames] Improve and add Youtube fallback (Closes #9737) 2016-06-14 01:11:24 +07:00
14d0f4e0f3 Added extractor for rockstargames.com 2016-06-14 01:09:35 +07:00
778f969447 [twitch:clips] Add extractor (Closes #9767) 2016-06-14 00:06:31 +07:00
79cd8b3d8a [README.md] Suggest checking extractor code under all Python versions 2016-06-13 10:04:04 +07:00
b4663f12b1 [README.md] Update links to info dict metafields 2016-06-13 07:16:35 +07:00
b50e02c1e4 [README.md] Update links to options available for YoutubeDL 2016-06-13 07:05:32 +07:00
33b72ce64e [xfileshare] Improve removed videos detection 2016-06-13 01:19:54 +07:00
cf2bf840ba [xfileshare] Fix test 2016-06-13 01:11:14 +07:00
bccdac6874 [xfileshare:xvidstage] Add support for videos with packed codes (Closes #4335) 2016-06-13 01:11:04 +07:00
e69f9f5d68 [downloader/external] Decode error string before writing to stderr 2016-06-12 16:45:07 +07:00
77a9a9c295 release 2016.06.12 2016-06-12 12:06:48 +07:00
84dcd1c4e4 [streamcloud] Detect removed videos (Closes #3768) 2016-06-12 11:08:39 +07:00
971e3b7520 [nrk:skole] Fix extraction 2016-06-12 07:20:37 +07:00
4e79011729 [nrktv] Fix tests 2016-06-12 06:57:04 +07:00
a936ac321c [README.md] Document using output template in batch files (Closes #9717) 2016-06-12 06:39:31 +07:00
98960c911c [instagram] Extract metadata from JSON 2016-06-12 06:06:04 +07:00
329ca3bef6 [utils] Add try_get
To reduce boilerplate when accessing JSON
2016-06-12 06:05:34 +07:00
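A sketch of what try_get does (the real helper is in youtube_dl/utils.py):

    def try_get(src, getter, expected_type=None):
        # Swallow the usual "missing key/attribute" errors so call sites
        # don't need nested try/except blocks around deep JSON lookups.
        try:
            v = getter(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            pass
        else:
            if expected_type is None or isinstance(v, expected_type):
                return v

    meta = {'owner': {'username': 'someuser'}}
    assert try_get(meta, lambda x: x['owner']['username'], str) == 'someuser'
    assert try_get(meta, lambda x: x['video']['title']) is None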
2c3322e36e [youporn] Fix metadata extraction 2016-06-12 04:49:37 +07:00
80ae228b34 [matchtv] Modernize 2016-06-12 01:57:23 +07:00
6d28c408cf [viki] Do not use a fallback language for title in the first try
In test_Viki_3, 'titles' gives a Hebrew title.
2016-06-11 23:00:44 +08:00
c83b35d4aa [viki] Update _TESTS 2016-06-11 22:39:13 +08:00
94e5d6aedb [viki] Skip a geo-restricted test 2016-06-11 21:49:01 +08:00
531a74968c [vimeo] Fix extraction for VimeoReview videos 2016-06-11 21:35:08 +08:00
c5edd147d1 [generic] Remove an invalid test
Now handled by telewebion.py
2016-06-11 18:39:58 +08:00
856150d056 [telewebion] Add new extractor (closes #5135) 2016-06-11 18:39:58 +08:00
03ebea89b0 Merge pull request #9755 from vxbinaca/patch-2
[utils] Change Firefox 44 to 47
2016-06-11 17:38:45 +08:00
15d106787e [utils] Change Firefox 44 to 47
See commit title.
2016-06-11 05:36:31 -04:00
7aab3696dd [kuwo] Update _TESTS 2016-06-11 15:37:04 +08:00
47787efa2b [leeco] Recognize Le Sports URLs (fixes #9750) 2016-06-11 13:14:41 +08:00
4a420119a6 release 2016.06.11.3 2016-06-11 08:34:30 +07:00
65 changed files with 1789 additions and 809 deletions

View File: .github/ISSUE_TEMPLATE.md

@@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.11.2*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.23*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.11.2**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.23**
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.06.11.2
+[debug] youtube-dl version 2016.06.23
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File: AUTHORS

@@ -173,3 +173,5 @@ Kevin Deldycke
 inondle
 Tomáš Čech
 Déstin Reed
+Roman Tsiupa
+Artur Krysiak

View File: CONTRIBUTING.md

@@ -142,9 +142,9 @@ After you have ensured this site is distributing it's content legally, you can f
 ```
 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
-7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
+7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
-8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
+8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
-9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
+9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
 10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
         $ git add youtube_dl/extractor/extractors.py

View File: README.md

@@ -44,7 +44,7 @@ Or with [MacPorts](https://www.macports.org/):
 Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
 # DESCRIPTION
-**youtube-dl** is a small command-line program to download videos from
+**youtube-dl** is a command-line program to download videos from
 YouTube.com and a few more sites. It requires the Python interpreter, version
 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on
 your Unix box, on Windows or on Mac OS X. It is released to the public domain,
@@ -553,6 +553,10 @@ The current default template is `%(title)s-%(id)s.%(ext)s`.
 In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
+#### Output template and Windows batch files
+If you are using output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`.
 #### Output template examples
 Note on Windows you may need to use double quotes instead of single.
@@ -931,9 +935,9 @@ After you have ensured this site is distributing it's content legally, you can f
 ```
 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
-7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
+7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
-8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
+8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
-9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
+9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
 10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
         $ git add youtube_dl/extractor/extractors.py
@@ -960,7 +964,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
     ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
 ```
-Most likely, you'll want to use various options. For a list of what can be done, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L121-L269). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
+Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L128-L278). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
 Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:

View File: devscripts/make_lazy_extractors.py

@@ -14,15 +14,17 @@ if os.path.exists(lazy_extractors_filename):
     os.remove(lazy_extractors_filename)
 from youtube_dl.extractor import _ALL_CLASSES
-from youtube_dl.extractor.common import InfoExtractor
+from youtube_dl.extractor.common import InfoExtractor, SearchInfoExtractor
 with open('devscripts/lazy_load_template.py', 'rt') as f:
     module_template = f.read()
-module_contents = [module_template + '\n' + getsource(InfoExtractor.suitable)]
+module_contents = [
+    module_template + '\n' + getsource(InfoExtractor.suitable) + '\n',
+    'class LazyLoadSearchExtractor(LazyLoadExtractor):\n    pass\n']
 ie_template = '''
-class {name}(LazyLoadExtractor):
+class {name}({bases}):
     _VALID_URL = {valid_url!r}
     _module = '{module}'
 '''
@@ -34,10 +36,20 @@ make_valid_template = '''
 '''
+def get_base_name(base):
+    if base is InfoExtractor:
+        return 'LazyLoadExtractor'
+    elif base is SearchInfoExtractor:
+        return 'LazyLoadSearchExtractor'
+    else:
+        return base.__name__
 def build_lazy_ie(ie, name):
     valid_url = getattr(ie, '_VALID_URL', None)
     s = ie_template.format(
         name=name,
+        bases=', '.join(map(get_base_name, ie.__bases__)),
         valid_url=valid_url,
         module=ie.__module__)
     if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
@@ -47,12 +59,35 @@ def build_lazy_ie(ie, name):
         s += make_valid_template.format(valid_url=ie._make_valid_url())
     return s
+# find the correct sorting and add the required base classes so that subclasses
+# can be correctly created
+classes = _ALL_CLASSES[:-1]
+ordered_cls = []
+while classes:
+    for c in classes[:]:
+        bases = set(c.__bases__) - set((object, InfoExtractor, SearchInfoExtractor))
+        stop = False
+        for b in bases:
+            if b not in classes and b not in ordered_cls:
+                if b.__name__ == 'GenericIE':
+                    exit()
+                classes.insert(0, b)
+                stop = True
+        if stop:
+            break
+        if all(b in ordered_cls for b in bases):
+            ordered_cls.append(c)
+            classes.remove(c)
+            break
+ordered_cls.append(_ALL_CLASSES[-1])
 names = []
-for ie in list(sorted(_ALL_CLASSES[:-1], key=lambda cls: cls.ie_key())) + _ALL_CLASSES[-1:]:
-    name = ie.ie_key() + 'IE'
+for ie in ordered_cls:
+    name = ie.__name__
     src = build_lazy_ie(ie, name)
     module_contents.append(src)
-    names.append(name)
+    if ie in _ALL_CLASSES:
+        names.append(name)
 module_contents.append(
     '_ALL_CLASSES = [{0}]'.format(', '.join(names)))
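The while loop added above is, in effect, a topological sort of the extractor class hierarchy; on modern Python the same ordering could be sketched with graphlib (an illustration only, since the project has to run on Python 2.6+):

    from graphlib import TopologicalSorter  # Python 3.9+

    def ordered_classes(classes):
        # Map each class to the bases that must be emitted before it.
        graph = {c: set(c.__bases__) & set(classes) for c in classes}
        return list(TopologicalSorter(graph).static_order())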

View File: devscripts/release.sh

@@ -15,6 +15,7 @@
 set -e
 skip_tests=true
+gpg_sign_commits=""
 buildserver='localhost:8142'
 while true
@@ -24,6 +25,10 @@ case "$1" in
         skip_tests=false
         shift
         ;;
+    --gpg-sign-commits|-S)
+        gpg_sign_commits="-S"
+        shift
+        ;;
     --buildserver)
         buildserver="$2"
         shift 2
@@ -69,7 +74,7 @@ sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
 /bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
 make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
 git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
-git commit -m "release $version"
+git commit $gpg_sign_commits -m "release $version"
 /bin/echo -e "\n### Now tagging, signing and pushing..."
 git tag -s -m "Release $version" "$version"
@@ -116,7 +121,7 @@ git clone --branch gh-pages --single-branch . build/gh-pages
     "$ROOT/devscripts/gh-pages/update-copyright.py"
     "$ROOT/devscripts/gh-pages/update-sites.py"
     git add *.html *.html.in update
-    git commit -m "release $version"
+    git commit $gpg_sign_commits -m "release $version"
     git push "$ROOT" gh-pages
     git push "$ORIGIN_URL" gh-pages
 )

View File: docs/supportedsites.md

@@ -44,8 +44,8 @@
 - **appletrailers:section**
 - **archive.org**: archive.org videos
 - **ARD**
-- **ARD:mediathek**
 - **ARD:mediathek**: Saarländischer Rundfunk
+- **ARD:mediathek**
 - **arte.tv**
 - **arte.tv:+7**
 - **arte.tv:cinema**
@@ -74,6 +74,8 @@
 - **bbc**: BBC
 - **bbc.co.uk**: BBC iPlayer
 - **bbc.co.uk:article**: BBC articles
+- **bbc.co.uk:iplayer:playlist**
+- **bbc.co.uk:playlist**
 - **BeatportPro**
 - **Beeg**
 - **BehindKink**
@@ -104,6 +106,8 @@
 - **canalc2.tv**
 - **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
 - **Canvas**
+- **CarambaTV**
+- **CarambaTVPage**
 - **CBC**
 - **CBCPlayer**
 - **CBS**
@@ -124,6 +128,7 @@
 - **cliphunter**
 - **ClipRs**
 - **Clipsyndicate**
+- **CloserToTruth**
 - **cloudtime**: CloudTime
 - **Cloudy**
 - **Clubic**
@@ -243,7 +248,6 @@
 - **Gamersyde**
 - **GameSpot**
 - **GameStar**
-- **Gametrailers**
 - **Gazeta**
 - **GDCVault**
 - **generic**: Generic downloader that works on some sites
@@ -432,6 +436,7 @@
 - **nhl.com:videocenter**
 - **nhl.com:videocenter:category**: NHL videocenter category
 - **nick.com**
+- **nick.de**
 - **niconico**: ニコニコ動画
 - **NiconicoPlaylist**
 - **njoy**: N-JOY
@@ -516,6 +521,7 @@
 - **qqmusic:singer**: QQ音乐 - 歌手
 - **qqmusic:toplist**: QQ音乐 - 排行榜
 - **R7**
+- **R7Article**
 - **radio.de**
 - **radiobremen**
 - **radiocanada**
@@ -535,6 +541,7 @@
 - **revision3:embed**
 - **RICE**
 - **RingTV**
+- **RockstarGames**
 - **RottenTomatoes**
 - **Roxwel**
 - **RTBF**
@@ -647,6 +654,7 @@
 - **Telegraaf**
 - **TeleMB**
 - **TeleTask**
+- **Telewebion**
 - **TF1**
 - **TheIntercept**
 - **ThePlatform**
@@ -698,6 +706,7 @@
 - **TVPlay**: TV3Play and related services
 - **Tweakers**
 - **twitch:chapter**
+- **twitch:clips**
 - **twitch:past_broadcasts**
 - **twitch:profile**
 - **twitch:stream**
@@ -792,10 +801,11 @@
 - **WNL**
 - **WorldStarHipHop**
 - **wrzuta.pl**
+- **wrzuta.pl:playlist**
 - **WSJ**: Wall Street Journal
 - **XBef**
 - **XboxClips**
-- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To
+- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE
 - **XHamster**
 - **XHamsterEmbed**
 - **xiami:album**: 虾米音乐 - 专辑

View File: test/test_utils.py

@@ -640,6 +640,9 @@ class TestUtil(unittest.TestCase):
             "1":{"src":"skipped", "type": "application/vnd.apple.mpegURL"}
         }''')
+        inp = '''{"foo":101}'''
+        self.assertEqual(js_to_json(inp), '''{"foo":101}''')
     def test_js_to_json_edgecases(self):
         on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
         self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})

View File: youtube_dl/downloader/external.py

@@ -85,7 +85,7 @@ class ExternalFD(FileDownloader):
             cmd, stderr=subprocess.PIPE)
         _, stderr = p.communicate()
         if p.returncode != 0:
-            self.to_stderr(stderr)
+            self.to_stderr(stderr.decode('utf-8', 'replace'))
         return p.returncode
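The one-line fix matters because on Python 3 the captured stderr is bytes; writing it to a text stream would raise TypeError. Illustration:

    import sys

    stderr = b'error: fragment download failed\n'
    # Decode with 'replace' so undecodable output cannot crash the downloader
    sys.stderr.write(stderr.decode('utf-8', 'replace'))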

View File: youtube_dl/downloader/hls.py

@@ -2,14 +2,24 @@ from __future__ import unicode_literals
 import os.path
 import re
+import binascii
+try:
+    from Crypto.Cipher import AES
+    can_decrypt_frag = True
+except ImportError:
+    can_decrypt_frag = False
 from .fragment import FragmentFD
 from .external import FFmpegFD
-from ..compat import compat_urlparse
+from ..compat import (
+    compat_urlparse,
+    compat_struct_pack,
+)
 from ..utils import (
     encodeFilename,
     sanitize_open,
+    parse_m3u8_attributes,
 )
@@ -21,7 +31,7 @@ class HlsFD(FragmentFD):
     @staticmethod
     def can_download(manifest):
         UNSUPPORTED_FEATURES = (
-            r'#EXT-X-KEY:METHOD=(?!NONE)',  # encrypted streams [1]
+            r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)',  # encrypted streams [1]
             r'#EXT-X-BYTERANGE',  # playlists composed of byte ranges of media files [2]
             # Live streams heuristic does not always work (e.g. geo restricted to Germany
@@ -39,7 +49,9 @@ class HlsFD(FragmentFD):
             # 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
             # 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
         )
-        return all(not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES)
+        check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
+        check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
+        return all(check_results)
     def real_download(self, filename, info_dict):
         man_url = info_dict['url']
@@ -57,36 +69,60 @@ class HlsFD(FragmentFD):
                 fd.add_progress_hook(ph)
             return fd.real_download(filename, info_dict)
-        fragment_urls = []
+        total_frags = 0
         for line in s.splitlines():
             line = line.strip()
             if line and not line.startswith('#'):
-                segment_url = (
-                    line
-                    if re.match(r'^https?://', line)
-                    else compat_urlparse.urljoin(man_url, line))
-                fragment_urls.append(segment_url)
-                # We only download the first fragment during the test
-                if self.params.get('test', False):
-                    break
+                total_frags += 1
         ctx = {
             'filename': filename,
-            'total_frags': len(fragment_urls),
+            'total_frags': total_frags,
         }
         self._prepare_and_start_frag_download(ctx)
+        i = 0
+        media_sequence = 0
+        decrypt_info = {'METHOD': 'NONE'}
         frags_filenames = []
-        for i, frag_url in enumerate(fragment_urls):
-            frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
-            success = ctx['dl'].download(frag_filename, {'url': frag_url})
-            if not success:
-                return False
-            down, frag_sanitized = sanitize_open(frag_filename, 'rb')
-            ctx['dest_stream'].write(down.read())
-            down.close()
-            frags_filenames.append(frag_sanitized)
+        for line in s.splitlines():
+            line = line.strip()
+            if line:
+                if not line.startswith('#'):
+                    frag_url = (
+                        line
+                        if re.match(r'^https?://', line)
+                        else compat_urlparse.urljoin(man_url, line))
+                    frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
+                    success = ctx['dl'].download(frag_filename, {'url': frag_url})
+                    if not success:
+                        return False
+                    down, frag_sanitized = sanitize_open(frag_filename, 'rb')
+                    frag_content = down.read()
+                    down.close()
+                    if decrypt_info['METHOD'] == 'AES-128':
+                        iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
+                        frag_content = AES.new(
+                            decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
+                    ctx['dest_stream'].write(frag_content)
+                    frags_filenames.append(frag_sanitized)
+                    # We only download the first fragment during the test
+                    if self.params.get('test', False):
+                        break
+                    i += 1
+                    media_sequence += 1
+                elif line.startswith('#EXT-X-KEY'):
+                    decrypt_info = parse_m3u8_attributes(line[11:])
+                    if decrypt_info['METHOD'] == 'AES-128':
+                        if 'IV' in decrypt_info:
+                            decrypt_info['IV'] = binascii.unhexlify(decrypt_info['IV'][2:])
+                        if not re.match(r'^https?://', decrypt_info['URI']):
+                            decrypt_info['URI'] = compat_urlparse.urljoin(
+                                man_url, decrypt_info['URI'])
+                        decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read()
+                elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
+                    media_sequence = int(line[22:])
         self._finish_frag_download(ctx)
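Condensed from the hunk above: each media segment is decrypted with AES-128-CBC, and when the #EXT-X-KEY tag carries no IV the spec-defined default is the 16-byte big-endian media sequence number. A standalone sketch using the same PyCrypto API:

    import struct
    from Crypto.Cipher import AES  # PyCrypto, as imported in the hunk above

    def decrypt_fragment(frag_content, key, iv=None, media_sequence=0):
        if iv is None:
            # 8 zero bytes + 64-bit big-endian sequence number = 128-bit IV
            iv = struct.pack('>8xq', media_sequence)
        return AES.new(key, AES.MODE_CBC, iv).decrypt(frag_content)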

View File: youtube_dl/extractor/adobetv.py

@@ -156,7 +156,10 @@ class AdobeTVVideoIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        video_data = self._download_json(url + '?format=json', video_id)
+        webpage = self._download_webpage(url, video_id)
+        video_data = self._parse_json(self._search_regex(
+            r'var\s+bridge\s*=\s*([^;]+);', webpage, 'bridged data'), video_id)
         formats = [{
             'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')),
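The fallback shown above pulls the JSON object embedded in a `var bridge = ...;` script tag; stripped of extractor plumbing it is essentially:

    import json
    import re

    webpage = '<script>var bridge = {"title": "Demo"};</script>'
    video_data = json.loads(
        re.search(r'var\s+bridge\s*=\s*([^;]+);', webpage).group(1))
    assert video_data['title'] == 'Demo'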

View File: youtube_dl/extractor/aftonbladet.py

@@ -24,10 +24,10 @@ class AftonbladetIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
         # find internal video meta data
-        meta_url = 'http://aftonbladet-play.drlib.aptoma.no/video/%s.json'
+        meta_url = 'http://aftonbladet-play-metadata.cdn.drvideo.aptoma.no/video/%s.json'
         player_config = self._parse_json(self._html_search_regex(
             r'data-player-config="([^"]+)"', webpage, 'player config'), video_id)
-        internal_meta_id = player_config['videoId']
+        internal_meta_id = player_config['aptomaVideoId']
         internal_meta_url = meta_url % internal_meta_id
         internal_meta_json = self._download_json(
             internal_meta_url, video_id, 'Downloading video meta data')

View File: youtube_dl/extractor/ard.py

@@ -8,7 +8,6 @@ from .generic import GenericIE
 from ..utils import (
     determine_ext,
     ExtractorError,
-    get_element_by_attribute,
     qualities,
     int_or_none,
     parse_duration,
@@ -274,41 +273,3 @@ class ARDIE(InfoExtractor):
             'upload_date': upload_date,
             'thumbnail': thumbnail,
         }
-class SportschauIE(ARDMediathekIE):
-    IE_NAME = 'Sportschau'
-    _VALID_URL = r'(?P<baseurl>https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video(?P<id>[^/#?]+))\.html'
-    _TESTS = [{
-        'url': 'http://www.sportschau.de/tourdefrance/videoseppeltkokainhatnichtsmitklassischemdopingzutun100.html',
-        'info_dict': {
-            'id': 'seppeltkokainhatnichtsmitklassischemdopingzutun100',
-            'ext': 'mp4',
-            'title': 'Seppelt: "Kokain hat nichts mit klassischem Doping zu tun"',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'description': 'Der ARD-Doping Experte Hajo Seppelt gibt seine Einschätzung zum ersten Dopingfall der diesjährigen Tour de France um den Italiener Luca Paolini ab.',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        },
-    }]
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        base_url = mobj.group('baseurl')
-        webpage = self._download_webpage(url, video_id)
-        title = get_element_by_attribute('class', 'headline', webpage)
-        description = self._html_search_meta('description', webpage, 'description')
-        info = self._extract_media_info(
-            base_url + '-mc_defaultQuality-h.json', webpage, video_id)
-        info.update({
-            'title': title,
-            'description': description,
-        })
-        return info

View File: youtube_dl/extractor/arte.py

@@ -180,11 +180,14 @@ class ArteTVBaseIE(InfoExtractor):
 class ArteTVPlus7IE(ArteTVBaseIE):
     IE_NAME = 'arte.tv:+7'
-    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/[^/]+/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
         'only_matching': True,
+    }, {
+        'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22',
+        'only_matching': True,
     }]
     @classmethod
@@ -240,10 +243,10 @@ class ArteTVPlus7IE(ArteTVBaseIE):
             return self._extract_from_json_url(json_url, video_id, lang, title=title)
         # Different kind of embed URL (e.g.
         # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
-        embed_url = self._search_regex(
-            r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
-            webpage, 'embed url', group='url')
-        return self.url_result(embed_url)
+        entries = [
+            self.url_result(url)
+            for _, url in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]
+        return self.playlist_result(entries)
 # It also uses the arte_vp_url url from the webpage to extract the information
@@ -252,22 +255,17 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
     _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
     _TESTS = [{
-        'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
+        'url': 'http://creative.arte.tv/fr/episode/osmosis-episode-1',
         'info_dict': {
-            'id': '72176',
+            'id': '057405-001-A',
             'ext': 'mp4',
-            'title': 'Folge 2 - Corporate Design',
-            'upload_date': '20131004',
+            'title': 'OSMOSIS - N\'AYEZ PLUS PEUR D\'AIMER (1)',
+            'upload_date': '20150716',
         },
     }, {
         'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
-        'info_dict': {
-            'id': '160676',
-            'ext': 'mp4',
-            'title': 'Monty Python live (mostly)',
-            'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
-            'upload_date': '20140805',
-        }
+        'playlist_count': 11,
+        'add_ie': ['Youtube'],
     }, {
         'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
         'only_matching': True,
@@ -349,14 +347,13 @@ class ArteTVCinemaIE(ArteTVPlus7IE):
     _VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
     _TESTS = [{
-        'url': 'http://cinema.arte.tv/de/node/38291',
-        'md5': '6b275511a5107c60bacbeeda368c3aa1',
+        'url': 'http://cinema.arte.tv/fr/article/les-ailes-du-desir-de-julia-reck',
+        'md5': 'a5b9dd5575a11d93daf0e3f404f45438',
         'info_dict': {
-            'id': '055876-000_PWA12025-D',
+            'id': '062494-000-A',
             'ext': 'mp4',
-            'title': 'Tod auf dem Nil',
-            'upload_date': '20160122',
-            'description': 'md5:7f749bbb77d800ef2be11d54529b96bc',
+            'title': 'Film lauréat du concours web - "Les ailes du désir" de Julia Reck',
+            'upload_date': '20150807',
         },
     }]

View File: youtube_dl/extractor/azubu.py

@@ -46,6 +46,7 @@ class AzubuIE(InfoExtractor):
                 'uploader_id': 272749,
                 'view_count': int,
             },
+            'skip': 'Channel offline',
         },
     ]
@@ -56,22 +57,26 @@ class AzubuIE(InfoExtractor):
             'http://www.azubu.tv/api/video/%s' % video_id, video_id)['data']
         title = data['title'].strip()
-        description = data['description']
-        thumbnail = data['thumbnail']
-        view_count = data['view_count']
-        uploader = data['user']['username']
-        uploader_id = data['user']['id']
+        description = data.get('description')
+        thumbnail = data.get('thumbnail')
+        view_count = data.get('view_count')
+        user = data.get('user', {})
+        uploader = user.get('username')
+        uploader_id = user.get('id')
         stream_params = json.loads(data['stream_params'])
-        timestamp = float_or_none(stream_params['creationDate'], 1000)
-        duration = float_or_none(stream_params['length'], 1000)
+        timestamp = float_or_none(stream_params.get('creationDate'), 1000)
+        duration = float_or_none(stream_params.get('length'), 1000)
         renditions = stream_params.get('renditions') or []
         video = stream_params.get('FLVFullLength') or stream_params.get('videoFullLength')
         if video:
             renditions.append(video)
+        if not renditions and not user.get('channel', {}).get('is_live', True):
+            raise ExtractorError('%s said: channel is offline.' % self.IE_NAME, expected=True)
         formats = [{
             'url': fmt['url'],
             'width': fmt['frameWidth'],
View File

@ -31,7 +31,7 @@ class BBCCoUkIE(InfoExtractor):
music/clips[/#]| music/clips[/#]|
radio/player/ radio/player/
) )
(?P<id>%s) (?P<id>%s)(?!/(?:episodes|broadcasts|clips))
''' % _ID_REGEX ''' % _ID_REGEX
_MEDIASELECTOR_URLS = [ _MEDIASELECTOR_URLS = [
@ -192,6 +192,7 @@ class BBCCoUkIE(InfoExtractor):
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Now it\'s really geo-restricted',
}, { }, {
# compact player (https://github.com/rg3/youtube-dl/issues/8147) # compact player (https://github.com/rg3/youtube-dl/issues/8147)
'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player', 'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
@ -698,7 +699,9 @@ class BBCIE(BBCCoUkIE):
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
return False if BBCCoUkIE.suitable(url) or BBCCoUkArticleIE.suitable(url) else super(BBCIE, cls).suitable(url) EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
else super(BBCIE, cls).suitable(url))
def _extract_from_media_meta(self, media_meta, video_id): def _extract_from_media_meta(self, media_meta, video_id):
# Direct links to media in media metadata (e.g. # Direct links to media in media metadata (e.g.
@ -975,3 +978,72 @@ class BBCCoUkArticleIE(InfoExtractor):
r'<div[^>]+typeof="Clip"[^>]+resource="([^"]+)"', webpage)] r'<div[^>]+typeof="Clip"[^>]+resource="([^"]+)"', webpage)]
return self.playlist_result(entries, playlist_id, title, description) return self.playlist_result(entries, playlist_id, title, description)
class BBCCoUkPlaylistBaseIE(InfoExtractor):
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result(self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
for video_id in re.findall(
self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage)]
title, description = self._extract_title_and_description(webpage)
return self.playlist_result(entries, playlist_id, title, description)
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/episodes/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
_TEST = {
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance',
'description': 'French thriller serial about a missing teenager.',
},
'playlist_mincount': 6,
}
def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
description = self._search_regex(
r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
webpage, 'description', fatal=False, group='value')
return title, description
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/(?P<id>%s)/(?:episodes|broadcasts|clips)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/programmes/%s'
_VIDEO_ID_TEMPLATE = r'data-pid=["\'](%s)'
_TESTS = [{
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/clips',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance - Clips - BBC Four',
'description': 'French thriller serial about a missing teenager.',
},
'playlist_mincount': 7,
}, {
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/broadcasts/2016/06',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/clips',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/programmes/b055jkys/episodes/player',
'only_matching': True,
}]
def _extract_title_and_description(self, webpage):
title = self._og_search_title(webpage, fatal=False)
description = self._og_search_description(webpage)
return title, description

View File: youtube_dl/extractor/bet.py

@@ -1,31 +1,27 @@
 from __future__ import unicode_literals
-from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
-from ..utils import (
-    xpath_text,
-    xpath_with_ns,
-    int_or_none,
-    parse_iso8601,
-)
+from .mtv import MTVServicesInfoExtractor
+from ..utils import unified_strdate
+from ..compat import compat_urllib_parse_urlencode
-class BetIE(InfoExtractor):
+class BetIE(MTVServicesInfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?bet\.com/(?:[^/]+/)+(?P<id>.+?)\.html'
     _TESTS = [
         {
             'url': 'http://www.bet.com/news/politics/2014/12/08/in-bet-exclusive-obama-talks-race-and-racism.html',
             'info_dict': {
-                'id': 'news/national/2014/a-conversation-with-president-obama',
+                'id': '07e96bd3-8850-3051-b856-271b457f0ab8',
                 'display_id': 'in-bet-exclusive-obama-talks-race-and-racism',
                 'ext': 'flv',
                 'title': 'A Conversation With President Obama',
-                'description': 'md5:699d0652a350cf3e491cd15cc745b5da',
+                'description': 'President Obama urges persistence in confronting racism and bias.',
                 'duration': 1534,
-                'timestamp': 1418075340,
                 'upload_date': '20141208',
-                'uploader': 'admin',
                 'thumbnail': 're:(?i)^https?://.*\.jpg$',
+                'subtitles': {
+                    'en': 'mincount:2',
+                }
             },
             'params': {
                 # rtmp download
@@ -35,16 +31,17 @@ class BetIE(InfoExtractor):
         {
             'url': 'http://www.bet.com/video/news/national/2014/justice-for-ferguson-a-community-reacts.html',
             'info_dict': {
-                'id': 'news/national/2014/justice-for-ferguson-a-community-reacts',
+                'id': '9f516bf1-7543-39c4-8076-dd441b459ba9',
                 'display_id': 'justice-for-ferguson-a-community-reacts',
                 'ext': 'flv',
                 'title': 'Justice for Ferguson: A Community Reacts',
                 'description': 'A BET News special.',
                 'duration': 1696,
-                'timestamp': 1416942360,
                 'upload_date': '20141125',
-                'uploader': 'admin',
                 'thumbnail': 're:(?i)^https?://.*\.jpg$',
+                'subtitles': {
+                    'en': 'mincount:2',
+                }
             },
             'params': {
                 # rtmp download
@@ -53,57 +50,32 @@ class BetIE(InfoExtractor):
         }
     ]
+    _FEED_URL = "http://feeds.mtvnservices.com/od/feed/bet-mrss-player"
+    def _get_feed_query(self, uri):
+        return compat_urllib_parse_urlencode({
+            'uuid': uri,
+        })
+    def _extract_mgid(self, webpage):
+        return self._search_regex(r'data-uri="([^"]+)', webpage, 'mgid')
     def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
-        media_url = compat_urllib_parse_unquote(self._search_regex(
-            [r'mediaURL\s*:\s*"([^"]+)"', r"var\s+mrssMediaUrl\s*=\s*'([^']+)'"],
-            webpage, 'media URL'))
-        video_id = self._search_regex(
-            r'/video/(.*)/_jcr_content/', media_url, 'video id')
-        mrss = self._download_xml(media_url, display_id)
-        item = mrss.find('./channel/item')
-        NS_MAP = {
-            'dc': 'http://purl.org/dc/elements/1.1/',
-            'media': 'http://search.yahoo.com/mrss/',
-            'ka': 'http://kickapps.com/karss',
-        }
-        title = xpath_text(item, './title', 'title')
-        description = xpath_text(
-            item, './description', 'description', fatal=False)
-        timestamp = parse_iso8601(xpath_text(
-            item, xpath_with_ns('./dc:date', NS_MAP),
-            'upload date', fatal=False))
-        uploader = xpath_text(
-            item, xpath_with_ns('./dc:creator', NS_MAP),
-            'uploader', fatal=False)
-        media_content = item.find(
-            xpath_with_ns('./media:content', NS_MAP))
-        duration = int_or_none(media_content.get('duration'))
-        smil_url = media_content.get('url')
-        thumbnail = media_content.find(
-            xpath_with_ns('./media:thumbnail', NS_MAP)).get('url')
-        formats = self._extract_smil_formats(smil_url, display_id)
-        self._sort_formats(formats)
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'timestamp': timestamp,
-            'uploader': uploader,
-            'duration': duration,
-            'formats': formats,
-        }
+        mgid = self._extract_mgid(webpage)
+        videos_info = self._get_videos_info(mgid)
+        info_dict = videos_info['entries'][0]
+        upload_date = unified_strdate(self._html_search_meta('date', webpage))
+        description = self._html_search_meta('description', webpage)
+        info_dict.update({
+            'display_id': display_id,
+            'description': description,
+            'upload_date': upload_date,
+        })
+        return info_dict

View File

@@ -29,7 +29,8 @@ class BRIE(InfoExtractor):
                 'duration': 180,
                 'uploader': 'Reinhard Weber',
                 'upload_date': '20150422',
-            }
+            },
+            'skip': '404 not found',
         },
         {
             'url': 'http://www.br.de/nachrichten/oberbayern/inhalt/muenchner-polizeipraesident-schreiber-gestorben-100.html',
@@ -40,7 +41,8 @@ class BRIE(InfoExtractor):
                 'title': 'Manfred Schreiber ist tot',
                 'description': 'md5:b454d867f2a9fc524ebe88c3f5092d97',
                 'duration': 26,
-            }
+            },
+            'skip': '404 not found',
         },
         {
             'url': 'https://www.br-klassik.de/audio/peeping-tom-premierenkritik-dance-festival-muenchen-100.html',
@@ -51,7 +53,8 @@ class BRIE(InfoExtractor):
                 'title': 'Kurzweilig und sehr bewegend',
                 'description': 'md5:0351996e3283d64adeb38ede91fac54e',
                 'duration': 296,
-            }
+            },
+            'skip': '404 not found',
         },
         {
             'url': 'http://www.br.de/radio/bayern1/service/team/videos/team-video-erdelt100.html',

View File

@@ -0,0 +1,88 @@
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
    float_or_none,
    int_or_none,
    try_get,
)


class CarambaTVIE(InfoExtractor):
    _VALID_URL = r'(?:carambatv:|https?://video1\.carambatv\.ru/v/)(?P<id>\d+)'
    _TESTS = [{
        'url': 'http://video1.carambatv.ru/v/191910501',
        'md5': '2f4a81b7cfd5ab866ee2d7270cb34a2a',
        'info_dict': {
            'id': '191910501',
            'ext': 'mp4',
            'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
            'thumbnail': 're:^https?://.*\.jpg',
            'duration': 2678.31,
        },
    }, {
        'url': 'carambatv:191910501',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)

        video = self._download_json(
            'http://video1.carambatv.ru/v/%s/videoinfo.js' % video_id,
            video_id)

        title = video['title']

        base_url = video.get('video') or 'http://video1.carambatv.ru/v/%s/' % video_id

        formats = [{
            'url': base_url + f['fn'],
            'height': int_or_none(f.get('height')),
            'format_id': '%sp' % f['height'] if f.get('height') else None,
        } for f in video['qualities'] if f.get('fn')]
        self._sort_formats(formats)

        thumbnail = video.get('splash')
        duration = float_or_none(try_get(
            video, lambda x: x['annotations'][0]['end_time'], compat_str))

        return {
            'id': video_id,
            'title': title,
            'thumbnail': thumbnail,
            'duration': duration,
            'formats': formats,
        }


class CarambaTVPageIE(InfoExtractor):
    _VALID_URL = r'https?://carambatv\.ru/(?:[^/]+/)+(?P<id>[^/?#&]+)'
    _TEST = {
        'url': 'http://carambatv.ru/movie/bad-comedian/razborka-v-manile/',
        'md5': '',
        'info_dict': {
            'id': '191910501',
            'ext': 'mp4',
            'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
            'thumbnail': 're:^https?://.*\.jpg$',
            'duration': 2678.31,
        },
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)

        webpage = self._download_webpage(url, video_id)

        video_url = self._og_search_property('video:iframe', webpage, default=None)

        if not video_url:
            video_id = self._search_regex(
                r'(?:video_id|crmb_vuid)\s*[:=]\s*["\']?(\d+)',
                webpage, 'video id')
            video_url = 'carambatv:%s' % video_id

        return self.url_result(video_url, CarambaTVIE.ie_key())

View File

@@ -1,17 +1,13 @@
 from __future__ import unicode_literals

-import re
-
-from .theplatform import ThePlatformIE
+from .theplatform import ThePlatformFeedIE
 from ..utils import (
-    xpath_text,
-    xpath_element,
     int_or_none,
     find_xpath_attr,
 )


-class CBSBaseIE(ThePlatformIE):
+class CBSBaseIE(ThePlatformFeedIE):
     def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
         closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
         return {
@@ -21,9 +17,22 @@ class CBSBaseIE(ThePlatformIE):
         }]
     } if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []

+    def _extract_video_info(self, filter_query, video_id):
+        return self._extract_feed_info(
+            'dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id, lambda entry: {
+                'series': entry.get('cbs$SeriesTitle'),
+                'season_number': int_or_none(entry.get('cbs$SeasonNumber')),
+                'episode': entry.get('cbs$EpisodeTitle'),
+                'episode_number': int_or_none(entry.get('cbs$EpisodeNumber')),
+            }, {
+                'StreamPack': {
+                    'manifest': 'm3u',
+                }
+            })
+

 class CBSIE(CBSBaseIE):
-    _VALID_URL = r'(?:cbs:(?P<content_id>\w+)|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<display_id>[^/]+))'
+    _VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'

     _TESTS = [{
         'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
@@ -38,25 +47,7 @@ class CBSIE(CBSBaseIE):
             'upload_date': '20131127',
             'uploader': 'CBSI-NEW',
         },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
-        '_skip': 'Blocked outside the US',
-    }, {
-        'url': 'http://www.cbs.com/shows/liveonletterman/artist/221752/st-vincent/',
-        'info_dict': {
-            'id': 'WWF_5KqY3PK1',
-            'display_id': 'st-vincent',
-            'ext': 'flv',
-            'title': 'Live on Letterman - St. Vincent',
-            'description': 'Live On Letterman: St. Vincent in concert from New York\'s Ed Sullivan Theater on Tuesday, July 16, 2014.',
-            'duration': 3221,
-        },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
+        'expected_warnings': ['Failed to download m3u8 information'],
         '_skip': 'Blocked outside the US',
     }, {
         'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/',
@@ -68,44 +59,5 @@ class CBSIE(CBSBaseIE):
     TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'

     def _real_extract(self, url):
-        content_id, display_id = re.match(self._VALID_URL, url).groups()
-        if not content_id:
-            webpage = self._download_webpage(url, display_id)
-            content_id = self._search_regex(
-                [r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
-                webpage, 'content id')
-        items_data = self._download_xml(
-            'http://can.cbs.com/thunder/player/videoPlayerService.php',
-            content_id, query={'partner': 'cbs', 'contentId': content_id})
-        video_data = xpath_element(items_data, './/item')
-        title = xpath_text(video_data, 'videoTitle', 'title', True)
-        subtitles = {}
-        formats = []
-        for item in items_data.findall('.//item'):
-            pid = xpath_text(item, 'pid')
-            if not pid:
-                continue
-            tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
-            if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
-                tp_release_url += '&manifest=m3u'
-            tp_formats, tp_subtitles = self._extract_theplatform_smil(
-                tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
-            formats.extend(tp_formats)
-            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
-        self._sort_formats(formats)
-        info = self.get_metadata('dJ5BDC/media/guid/2198311517/%s' % content_id, content_id)
-        info.update({
-            'id': content_id,
-            'display_id': display_id,
-            'title': title,
-            'series': xpath_text(video_data, 'seriesTitle'),
-            'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
-            'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
-            'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
-            'thumbnail': xpath_text(video_data, 'previewImageURL'),
-            'formats': formats,
-            'subtitles': subtitles,
-        })
-        return info
+        content_id = self._match_id(url)
+        return self._extract_video_info('byGuid=%s' % content_id, content_id)

View File

@@ -30,9 +30,12 @@ class CBSNewsIE(CBSBaseIE):
         {
             'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
             'info_dict': {
-                'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
+                'id': 'SNJBOYzXiWBOvaLsdzwH8fmtP1SCd91Y',
                 'ext': 'mp4',
                 'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
+                'description': 'md5:4a6983e480542d8b333a947bfc64ddc7',
+                'upload_date': '19700101',
+                'uploader': 'CBSI-NEW',
                 'thumbnail': 're:^https?://.*\.jpg$',
                 'duration': 205,
                 'subtitles': {
@@ -58,30 +61,8 @@ class CBSNewsIE(CBSBaseIE):
                 webpage, 'video JSON info'), video_id)

         item = video_info['item'] if 'item' in video_info else video_info
-
-        title = item.get('articleTitle') or item.get('hed')
-        duration = item.get('duration')
-        thumbnail = item.get('mediaImage') or item.get('thumbnail')
-
-        subtitles = {}
-        formats = []
-        for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
-            pid = item.get('media' + format_id)
-            if not pid:
-                continue
-            release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' % pid
-            tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
-            formats.extend(tp_formats)
-            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-            'subtitles': subtitles,
-        }
+        guid = item['mpxRefId']
+        return self._extract_video_info('byGuid=%s' % guid, guid)


 class CBSNewsLiveVideoIE(InfoExtractor):
class CBSNewsLiveVideoIE(InfoExtractor): class CBSNewsLiveVideoIE(InfoExtractor):

View File

@@ -1,30 +1,28 @@
 from __future__ import unicode_literals

-import re
-
-from .common import InfoExtractor
+from .cbs import CBSBaseIE


-class CBSSportsIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
+class CBSSportsIE(CBSBaseIE):
+    _VALID_URL = r'https?://www\.cbssports\.com/video/player/[^/]+/(?P<id>\d+)'

-    _TEST = {
-        'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',
+    _TESTS = [{
+        'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',
         'info_dict': {
-            'id': '_d5_GbO8p1sT',
-            'ext': 'flv',
-            'title': 'US Open flashbacks: 1990s',
-            'description': 'Bill Macatee relives the best moments in US Open history from the 1990s.',
+            'id': '708337219968',
+            'ext': 'mp4',
+            'title': 'Ben Simmons the next LeBron? Not so fast',
+            'description': 'md5:854294f627921baba1f4b9a990d87197',
+            'timestamp': 1466293740,
+            'upload_date': '20160618',
+            'uploader': 'CBSI-NEW',
         },
-    }
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }]

     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        section = mobj.group('section')
-        video_id = mobj.group('id')
-        all_videos = self._download_json(
-            'http://www.cbssports.com/data/video/player/getVideos/%s?as=json' % section,
-            video_id)
-        # The json file contains the info of all the videos in the section
-        video_info = next(v for v in all_videos if v['pcid'] == video_id)
-        return self.url_result('theplatform:%s' % video_info['pid'], 'ThePlatform')
+        video_id = self._match_id(url)
+        return self._extract_video_info('byId=%s' % video_id, video_id)

View File

@@ -58,7 +58,8 @@ class CDAIE(InfoExtractor):
         def extract_format(page, version):
             unpacked = decode_packed_codes(page)
             format_url = self._search_regex(
-                r"url:\\'(.+?)\\'", unpacked, '%s url' % version, fatal=False)
+                r"(?:file|url)\s*:\s*(\\?[\"'])(?P<url>http.+?)\1", unpacked,
+                '%s url' % version, fatal=False, group='url')
             if not format_url:
                 return
             f = {
@@ -75,7 +76,8 @@ class CDAIE(InfoExtractor):
             info_dict['formats'].append(f)
             if not info_dict['duration']:
                 info_dict['duration'] = parse_duration(self._search_regex(
-                    r"duration:\\'(.+?)\\'", unpacked, 'duration', fatal=False))
+                    r"duration\s*:\s*(\\?[\"'])(?P<duration>.+?)\1",
+                    unpacked, 'duration', fatal=False, group='duration'))

         extract_format(webpage, 'default')
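
Both rewritten patterns use the same trick: capture the opening quote (optionally backslash-escaped) in group 1 and close the value with a backreference to it, so single- and double-quoted player configs match alike. A minimal standalone sketch of the technique, run against hypothetical sample input rather than real CDA pages:

    import re

    # Sample unpacked player JavaScript; both quote styles should match.
    samples = [
        "file: \\'http://example.com/v.mp4\\'",   # backslash-escaped single quotes
        'url: "http://example.com/v.mp4"',        # plain double quotes
    ]

    pattern = r"(?:file|url)\s*:\s*(\\?[\"'])(?P<url>http.+?)\1"

    for s in samples:
        m = re.search(pattern, s)
        if m:
            print(m.group('url'))  # -> http://example.com/v.mp4 in both cases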

View File

@@ -0,0 +1,92 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor


class CloserToTruthIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?closertotruth\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
    _TESTS = [{
        'url': 'http://closertotruth.com/series/solutions-the-mind-body-problem#video-3688',
        'info_dict': {
            'id': '0_zof1ktre',
            'display_id': 'solutions-the-mind-body-problem',
            'ext': 'mov',
            'title': 'Solutions to the Mind-Body Problem?',
            'upload_date': '20140221',
            'timestamp': 1392956007,
            'uploader_id': 'CTTXML'
        },
        'params': {
            'skip_download': True,
        },
    }, {
        'url': 'http://closertotruth.com/episodes/how-do-brains-work',
        'info_dict': {
            'id': '0_iuxai6g6',
            'display_id': 'how-do-brains-work',
            'ext': 'mov',
            'title': 'How do Brains Work?',
            'upload_date': '20140221',
            'timestamp': 1392956024,
            'uploader_id': 'CTTXML'
        },
        'params': {
            'skip_download': True,
        },
    }, {
        'url': 'http://closertotruth.com/interviews/1725',
        'info_dict': {
            'id': '1725',
            'title': 'AyaFr-002',
        },
        'playlist_mincount': 2,
    }]

    def _real_extract(self, url):
        display_id = self._match_id(url)

        webpage = self._download_webpage(url, display_id)

        partner_id = self._search_regex(
            r'<script[^>]+src=["\'].*?\b(?:partner_id|p)/(\d+)',
            webpage, 'kaltura partner_id')

        title = self._search_regex(
            r'<title>(.+?)\s*\|\s*.+?</title>', webpage, 'video title')

        select = self._search_regex(
            r'(?s)<select[^>]+id="select-version"[^>]*>(.+?)</select>',
            webpage, 'select version', default=None)
        if select:
            entry_ids = set()
            entries = []
            for mobj in re.finditer(
                    r'<option[^>]+value=(["\'])(?P<id>[0-9a-z_]+)(?:#.+?)?\1[^>]*>(?P<title>[^<]+)',
                    webpage):
                entry_id = mobj.group('id')
                if entry_id in entry_ids:
                    continue
                entry_ids.add(entry_id)
                entries.append({
                    '_type': 'url_transparent',
                    'url': 'kaltura:%s:%s' % (partner_id, entry_id),
                    'ie_key': 'Kaltura',
                    'title': mobj.group('title'),
                })
            if entries:
                return self.playlist_result(entries, display_id, title)

        entry_id = self._search_regex(
            r'<a[^>]+id=(["\'])embed-kaltura\1[^>]+data-kaltura=(["\'])(?P<id>[0-9a-z_]+)\2',
            webpage, 'kaltura entry_id', group='id')

        return {
            '_type': 'url_transparent',
            'display_id': display_id,
            'url': 'kaltura:%s:%s' % (partner_id, entry_id),
            'ie_key': 'Kaltura',
            'title': title
        }

View File

@@ -53,6 +53,7 @@ from ..utils import (
     mimetype2ext,
     update_Request,
     update_url_query,
+    parse_m3u8_attributes,
 )
@@ -1150,23 +1151,11 @@ class InfoExtractor(object):
         }]

         last_info = None
         last_media = None
-        kv_rex = re.compile(
-            r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')
         for line in m3u8_doc.splitlines():
             if line.startswith('#EXT-X-STREAM-INF:'):
-                last_info = {}
-                for m in kv_rex.finditer(line):
-                    v = m.group('val')
-                    if v.startswith('"'):
-                        v = v[1:-1]
-                    last_info[m.group('key')] = v
+                last_info = parse_m3u8_attributes(line)
             elif line.startswith('#EXT-X-MEDIA:'):
-                last_media = {}
-                for m in kv_rex.finditer(line):
-                    v = m.group('val')
-                    if v.startswith('"'):
-                        v = v[1:-1]
-                    last_media[m.group('key')] = v
+                last_media = parse_m3u8_attributes(line)
             elif line.startswith('#') or not line.strip():
                 continue
             else:
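
The two hand-rolled key/value loops collapse into a shared parse_m3u8_attributes helper imported from ..utils. Its body is not part of this diff; a minimal sketch of what such a helper presumably looks like, reusing the regex that was removed above:

    import re

    def parse_m3u8_attributes(attrib):
        # Parse 'KEY=VALUE,KEY="quoted value"' attribute lists found on
        # #EXT-X-STREAM-INF / #EXT-X-MEDIA tags into a dict.
        info = {}
        for (key, val) in re.findall(
                r'(?P<key>[a-zA-Z0-9_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)', attrib):
            if val.startswith('"'):
                val = val[1:-1]  # strip surrounding quotes
            info[key] = val
        return info

    print(parse_m3u8_attributes('#EXT-X-MEDIA:TYPE=AUDIO,NAME="English"'))
    # {'TYPE': 'AUDIO', 'NAME': 'English'}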

View File

@@ -44,7 +44,6 @@ from .archiveorg import ArchiveOrgIE
 from .ard import (
     ARDIE,
     ARDMediathekIE,
-    SportschauIE,
 )
 from .arte import (
     ArteTvIE,
@@ -71,6 +70,8 @@ from .bandcamp import BandcampIE, BandcampAlbumIE
 from .bbc import (
     BBCCoUkIE,
     BBCCoUkArticleIE,
+    BBCCoUkIPlayerPlaylistIE,
+    BBCCoUkPlaylistIE,
     BBCIE,
 )
 from .beeg import BeegIE
@@ -108,6 +109,10 @@ from .camwithher import CamWithHerIE
 from .canalplus import CanalplusIE
 from .canalc2 import Canalc2IE
 from .canvas import CanvasIE
+from .carambatv import (
+    CarambaTVIE,
+    CarambaTVPageIE,
+)
 from .cbc import (
     CBCIE,
     CBCPlayerIE,
@@ -135,6 +140,7 @@ from .cliprs import ClipRsIE
 from .clipfish import ClipfishIE
 from .cliphunter import CliphunterIE
 from .clipsyndicate import ClipsyndicateIE
+from .closertotruth import CloserToTruthIE
 from .cloudy import CloudyIE
 from .clubic import ClubicIE
 from .clyp import ClypIE
@@ -279,7 +285,6 @@ from .gameone import (
 from .gamersyde import GamersydeIE
 from .gamespot import GameSpotIE
 from .gamestar import GameStarIE
-from .gametrailers import GametrailersIE
 from .gazeta import GazetaIE
 from .gdcvault import GDCVaultIE
 from .generic import GenericIE
@@ -512,7 +517,10 @@ from .nhl import (
     NHLVideocenterCategoryIE,
     NHLIE,
 )
-from .nick import NickIE
+from .nick import (
+    NickIE,
+    NickDeIE,
+)
 from .niconico import NiconicoIE, NiconicoPlaylistIE
 from .ninegag import NineGagIE
 from .noco import NocoIE
@@ -622,7 +630,10 @@ from .qqmusic import (
     QQMusicToplistIE,
     QQMusicPlaylistIE,
 )
-from .r7 import R7IE
+from .r7 import (
+    R7IE,
+    R7ArticleIE,
+)
 from .radiocanada import (
     RadioCanadaIE,
     RadioCanadaAudioVideoIE,
@@ -649,6 +660,7 @@ from .revision3 import (
 from .rice import RICEIE
 from .ringtv import RingTVIE
 from .ro220 import Ro220IE
+from .rockstargames import RockstarGamesIE
 from .rottentomatoes import RottenTomatoesIE
 from .roxwel import RoxwelIE
 from .rtbf import RTBFIE
@@ -737,6 +749,7 @@ from .sportbox import (
     SportBoxEmbedIE,
 )
 from .sportdeutschland import SportDeutschlandIE
+from .sportschau import SportschauIE
 from .srgssr import (
     SRGSSRIE,
     SRGSSRPlayIE,
@@ -777,6 +790,7 @@ from .telecinco import TelecincoIE
 from .telegraaf import TelegraafIE
 from .telemb import TeleMBIE
 from .teletask import TeleTaskIE
+from .telewebion import TelewebionIE
 from .testurl import TestURLIE
 from .tf1 import TF1IE
 from .theintercept import TheInterceptIE
@@ -861,6 +875,7 @@ from .twitch import (
     TwitchProfileIE,
     TwitchPastBroadcastsIE,
     TwitchStreamIE,
+    TwitchClipsIE,
 )
 from .twitter import (
     TwitterCardIE,
@@ -977,7 +992,10 @@ from .weiqitv import WeiqiTVIE
 from .wimp import WimpIE
 from .wistia import WistiaIE
 from .worldstarhiphop import WorldStarHipHopIE
-from .wrzuta import WrzutaIE
+from .wrzuta import (
+    WrzutaIE,
+    WrzutaPlaylistIE,
+)
 from .wsj import WSJIE
 from .xbef import XBefIE
 from .xboxclips import XboxClipsIE

View File

@@ -239,6 +239,8 @@ class FacebookIE(InfoExtractor):
         formats = []
         for format_id, f in video_data.items():
+            if f and isinstance(f, dict):
+                f = [f]
             if not f or not isinstance(f, list):
                 continue
             for quality in ('sd', 'hd'):
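
The two added lines normalize the shape of the data before the list check: when the site returns a single object where a list is expected, wrapping it in a list keeps the downstream loop unchanged. A minimal standalone sketch of the pattern, using hypothetical payloads rather than Facebook's actual schema:

    def normalize_formats(f):
        # A lone dict is wrapped so callers can always iterate a list.
        if f and isinstance(f, dict):
            f = [f]
        if not f or not isinstance(f, list):
            return []
        return f

    print(normalize_formats({'sd_src': 'http://example.com/v_sd.mp4'}))  # wrapped in a list
    print(normalize_formats([{'sd_src': 'a'}, {'hd_src': 'b'}]))         # unchanged list
    print(normalize_formats(None))                                       # []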

View File

@@ -1,7 +1,10 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..utils import smuggle_url
+from ..utils import (
+    smuggle_url,
+    update_url_query,
+)


 class FoxSportsIE(InfoExtractor):
@@ -9,11 +12,15 @@ class FoxSportsIE(InfoExtractor):
     _TEST = {
         'url': 'http://www.foxsports.com/video?vid=432609859715',
+        'md5': 'b49050e955bebe32c301972e4012ac17',
         'info_dict': {
-            'id': 'gA0bHB3Ladz3',
-            'ext': 'flv',
+            'id': 'i0qKWsk3qJaM',
+            'ext': 'mp4',
             'title': 'Courtney Lee on going up 2-0 in series vs. Blazers',
             'description': 'Courtney Lee talks about Memphis being focused.',
+            'upload_date': '20150423',
+            'timestamp': 1429761109,
+            'uploader': 'NEWA-FNG-FOXSPORTS',
         },
         'add_ie': ['ThePlatform'],
     }
@@ -28,5 +35,8 @@ class FoxSportsIE(InfoExtractor):
                 r"data-player-config='([^']+)'", webpage, 'data player config'),
             video_id)

-        return self.url_result(smuggle_url(
-            config['releaseURL'] + '&manifest=f4m', {'force_smil_url': True}))
+        return self.url_result(smuggle_url(update_url_query(
+            config['releaseURL'], {
+                'mbr': 'true',
+                'switch': 'http',
+            }), {'force_smil_url': True}))

View File

@@ -1,19 +1,19 @@
 from __future__ import unicode_literals

 import re
-import json

-from .common import InfoExtractor
+from .once import OnceIE
 from ..compat import (
     compat_urllib_parse_unquote,
-    compat_urlparse,
 )
 from ..utils import (
     unescapeHTML,
+    url_basename,
+    dict_get,
 )


-class GameSpotIE(InfoExtractor):
+class GameSpotIE(OnceIE):
     _VALID_URL = r'https?://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
     _TESTS = [{
         'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
@@ -39,29 +39,73 @@ class GameSpotIE(InfoExtractor):
         webpage = self._download_webpage(url, page_id)
         data_video_json = self._search_regex(
             r'data-video=["\'](.*?)["\']', webpage, 'data video')
-        data_video = json.loads(unescapeHTML(data_video_json))
+        data_video = self._parse_json(unescapeHTML(data_video_json), page_id)
         streams = data_video['videoStreams']

+        manifest_url = None
         formats = []
         f4m_url = streams.get('f4m_stream')
-        if f4m_url is not None:
-            # Transform the manifest url to a link to the mp4 files
-            # they are used in mobile devices.
-            f4m_path = compat_urlparse.urlparse(f4m_url).path
-            QUALITIES_RE = r'((,\d+)+,?)'
-            qualities = self._search_regex(QUALITIES_RE, f4m_path, 'qualities').strip(',').split(',')
-            http_path = f4m_path[1:].split('/', 1)[1]
-            http_template = re.sub(QUALITIES_RE, r'%s', http_path)
-            http_template = http_template.replace('.csmil/manifest.f4m', '')
-            http_template = compat_urlparse.urljoin(
-                'http://video.gamespotcdn.com/', http_template)
-            for q in qualities:
-                formats.append({
-                    'url': http_template % q,
-                    'ext': 'mp4',
-                    'format_id': q,
-                })
-        else:
+        if f4m_url:
+            manifest_url = f4m_url
+            formats.extend(self._extract_f4m_formats(
+                f4m_url + '?hdcore=3.7.0', page_id, f4m_id='hds', fatal=False))
+        m3u8_url = streams.get('m3u8_stream')
+        if m3u8_url:
+            manifest_url = m3u8_url
+            m3u8_formats = self._extract_m3u8_formats(
+                m3u8_url, page_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False)
+            formats.extend(m3u8_formats)
+        progressive_url = dict_get(
+            streams, ('progressive_hd', 'progressive_high', 'progressive_low'))
+        if progressive_url and manifest_url:
+            qualities_basename = self._search_regex(
+                '/([^/]+)\.csmil/',
+                manifest_url, 'qualities basename', default=None)
+            if qualities_basename:
+                QUALITIES_RE = r'((,\d+)+,?)'
+                qualities = self._search_regex(
+                    QUALITIES_RE, qualities_basename,
+                    'qualities', default=None)
+                if qualities:
+                    qualities = list(map(lambda q: int(q), qualities.strip(',').split(',')))
+                    qualities.sort()
+                    http_template = re.sub(QUALITIES_RE, r'%d', qualities_basename)
+                    http_url_basename = url_basename(progressive_url)
+                    if m3u8_formats:
+                        self._sort_formats(m3u8_formats)
+                        m3u8_formats = list(filter(
+                            lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+                            m3u8_formats))
+                    if len(qualities) == len(m3u8_formats):
+                        for q, m3u8_format in zip(qualities, m3u8_formats):
+                            f = m3u8_format.copy()
+                            f.update({
+                                'url': progressive_url.replace(
+                                    http_url_basename, http_template % q),
+                                'format_id': f['format_id'].replace('hls', 'http'),
+                                'protocol': 'http',
+                            })
+                            formats.append(f)
+                    else:
+                        for q in qualities:
+                            formats.append({
+                                'url': progressive_url.replace(
+                                    http_url_basename, http_template % q),
+                                'ext': 'mp4',
+                                'format_id': 'http-%d' % q,
+                                'tbr': q,
+                            })
+
+        onceux_json = self._search_regex(
+            r'data-onceux-options=["\'](.*?)["\']', webpage, 'data video', default=None)
+        if onceux_json:
+            onceux_url = self._parse_json(unescapeHTML(onceux_json), page_id).get('metadataUri')
+            if onceux_url:
+                formats.extend(self._extract_once_formats(re.sub(
+                    r'https?://[^/]+', 'http://once.unicornmedia.com', onceux_url).replace('ads/vmap/', '')))
+
+        if not formats:
             for quality in ['sd', 'hd']:
                 # It's actually a link to a flv file
                 flv_url = streams.get('f4m_{0}'.format(quality))
@@ -71,6 +115,7 @@ class GameSpotIE(InfoExtractor):
                     'ext': 'flv',
                     'format_id': quality,
                 })
+        self._sort_formats(formats)

         return {
             'id': data_video['guid'],
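
The progressive-URL branch above derives direct HTTP links by splicing each bitrate from the manifest's .csmil basename into a %d template. A standalone sketch of that substitution under assumed example URLs (the paths here are illustrative, not taken from the site):

    import re

    QUALITIES_RE = r'((,\d+)+,?)'
    qualities_basename = 'vid,400,800,1200,.mp4'   # hypothetical '.csmil' basename
    progressive_url = 'http://example.com/path/vid,800,.mp4'

    m = re.search(QUALITIES_RE, qualities_basename)
    qualities = sorted(int(q) for q in m.group(0).strip(',').split(','))
    http_template = re.sub(QUALITIES_RE, r'%d', qualities_basename)  # -> 'vid%d.mp4'

    for q in qualities:
        # Swap the templated basename into the progressive URL per quality.
        print(progressive_url.replace('vid,800,.mp4', http_template % q))
    # http://example.com/path/vid400.mp4, vid800.mp4, vid1200.mp4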

View File

@@ -1,62 +0,0 @@
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import (
    int_or_none,
    parse_age_limit,
    url_basename,
)


class GametrailersIE(InfoExtractor):
    _VALID_URL = r'https?://www\.gametrailers\.com/videos/view/[^/]+/(?P<id>.+)'

    _TEST = {
        'url': 'http://www.gametrailers.com/videos/view/gametrailers-com/116437-Just-Cause-3-Review',
        'md5': 'f28c4efa0bdfaf9b760f6507955b6a6a',
        'info_dict': {
            'id': '2983958',
            'ext': 'mp4',
            'display_id': '116437-Just-Cause-3-Review',
            'title': 'Just Cause 3 - Review',
            'description': 'It\'s a lot of fun to shoot at things and then watch them explode in Just Cause 3, but should there be more to the experience than that?',
        },
    }

    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)
        title = self._html_search_regex(
            r'<title>(.+?)\|', webpage, 'title').strip()
        embed_url = self._proto_relative_url(
            self._search_regex(
                r'src=\'(//embed.gametrailers.com/embed/[^\']+)\'', webpage,
                'embed url'),
            scheme='http:')
        video_id = url_basename(embed_url)
        embed_page = self._download_webpage(embed_url, video_id)
        embed_vars_json = self._search_regex(
            r'(?s)var embedVars = (\{.*?\})\s*</script>', embed_page,
            'embed vars')
        info = self._parse_json(embed_vars_json, video_id)

        formats = []
        for media in info['media']:
            if media['mediaPurpose'] == 'play':
                formats.append({
                    'url': media['uri'],
                    'height': media['height'],
                    'width:': media['width'],
                })
        self._sort_formats(formats)

        return {
            'id': video_id,
            'display_id': display_id,
            'title': title,
            'formats': formats,
            'thumbnail': info.get('thumbUri'),
            'description': self._og_search_description(webpage),
            'duration': int_or_none(info.get('videoLengthInSeconds')),
            'age_limit': parse_age_limit(info.get('audienceRating')),
        }

View File

@@ -1073,20 +1073,6 @@ class GenericIE(InfoExtractor):
                 'skip_download': True,
             }
         },
-        # Contains a SMIL manifest
-        {
-            'url': 'http://www.telewebion.com/fa/1263668/%D9%82%D8%B1%D8%B9%D9%87%E2%80%8C%DA%A9%D8%B4%DB%8C-%D9%84%DB%8C%DA%AF-%D9%82%D9%87%D8%B1%D9%85%D8%A7%D9%86%D8%A7%D9%86-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7/%2B-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84.html',
-            'info_dict': {
-                'id': 'file',
-                'ext': 'flv',
-                'title': '+ Football: Lottery Champions League Europe',
-                'uploader': 'www.telewebion.com',
-            },
-            'params': {
-                # rtmpe downloads
-                'skip_download': True,
-            }
-        },
         # Brightcove URL in single quotes
         {
             'url': 'http://www.sportsnet.ca/baseball/mlb/sn-presents-russell-martin-world-citizen/',

View File

@@ -12,7 +12,7 @@ from ..utils import (
 class ImdbIE(InfoExtractor):
     IE_NAME = 'imdb'
     IE_DESC = 'Internet Movie Database trailers'
-    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/video/[^/]+/vi(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-)vi(?P<id>\d+)'

     _TESTS = [{
         'url': 'http://www.imdb.com/video/imdb/vi2524815897',
@@ -25,6 +25,12 @@ class ImdbIE(InfoExtractor):
     }, {
         'url': 'http://www.imdb.com/video/_/vi2524815897',
         'only_matching': True,
+    }, {
+        'url': 'http://www.imdb.com/title/tt1667889/?ref_=ext_shr_eml_vi#lb-vi2524815897',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.imdb.com/title/tt1667889/#lb-vi2524815897',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
def _real_extract(self, url): def _real_extract(self, url):

View File

@@ -8,6 +8,7 @@ from ..utils import (
     int_or_none,
     limit_length,
     lowercase_escape,
+    try_get,
 )
@@ -19,10 +20,16 @@ class InstagramIE(InfoExtractor):
         'info_dict': {
             'id': 'aye83DjauH',
             'ext': 'mp4',
-            'uploader_id': 'naomipq',
             'title': 'Video by naomipq',
             'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
-        }
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1371748545,
+            'upload_date': '20130620',
+            'uploader_id': 'naomipq',
+            'uploader': 'Naomi Leonor Phan-Quang',
+            'like_count': int,
+            'comment_count': int,
+        },
     }, {
         # missing description
         'url': 'https://www.instagram.com/p/BA-pQFBG8HZ/?taken-by=britneyspears',
@@ -31,6 +38,13 @@ class InstagramIE(InfoExtractor):
             'ext': 'mp4',
             'uploader_id': 'britneyspears',
             'title': 'Video by britneyspears',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'timestamp': 1453760977,
+            'upload_date': '20160125',
+            'uploader_id': 'britneyspears',
+            'uploader': 'Britney Spears',
+            'like_count': int,
+            'comment_count': int,
         },
         'params': {
             'skip_download': True,
@@ -67,21 +81,57 @@ class InstagramIE(InfoExtractor):
         url = mobj.group('url')

         webpage = self._download_webpage(url, video_id)
-        uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
-                                         webpage, 'uploader id', fatal=False)
-        desc = self._search_regex(
-            r'"caption":"(.+?)"', webpage, 'description', default=None)
-        if desc is not None:
-            desc = lowercase_escape(desc)
+
+        (video_url, description, thumbnail, timestamp, uploader,
+         uploader_id, like_count, comment_count) = [None] * 8
+
+        shared_data = self._parse_json(
+            self._search_regex(
+                r'window\._sharedData\s*=\s*({.+?});',
+                webpage, 'shared data', default='{}'),
+            video_id, fatal=False)
+        if shared_data:
+            media = try_get(
+                shared_data, lambda x: x['entry_data']['PostPage'][0]['media'], dict)
+            if media:
+                video_url = media.get('video_url')
+                description = media.get('caption')
+                thumbnail = media.get('display_src')
+                timestamp = int_or_none(media.get('date'))
+                uploader = media.get('owner', {}).get('full_name')
+                uploader_id = media.get('owner', {}).get('username')
+                like_count = int_or_none(media.get('likes', {}).get('count'))
+                comment_count = int_or_none(media.get('comments', {}).get('count'))
+
+        if not video_url:
+            video_url = self._og_search_video_url(webpage, secure=False)
+
+        if not uploader_id:
+            uploader_id = self._search_regex(
+                r'"owner"\s*:\s*{\s*"username"\s*:\s*"(.+?)"',
+                webpage, 'uploader id', fatal=False)
+
+        if not description:
+            description = self._search_regex(
+                r'"caption"\s*:\s*"(.+?)"', webpage, 'description', default=None)
+            if description is not None:
+                description = lowercase_escape(description)
+
+        if not thumbnail:
+            thumbnail = self._og_search_thumbnail(webpage)

         return {
             'id': video_id,
-            'url': self._og_search_video_url(webpage, secure=False),
+            'url': video_url,
             'ext': 'mp4',
             'title': 'Video by %s' % uploader_id,
-            'thumbnail': self._og_search_thumbnail(webpage),
+            'description': description,
+            'thumbnail': thumbnail,
+            'timestamp': timestamp,
             'uploader_id': uploader_id,
-            'description': desc,
+            'uploader': uploader,
+            'like_count': like_count,
+            'comment_count': comment_count,
         }
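
try_get lets the new code reach several levels into the parsed _sharedData JSON without a ladder of key checks. The helper's body is not shown in this diff; a minimal sketch of the behaviour the call sites assume:

    def try_get(src, getter, expected_type=None):
        # Run getter(src); swallow the usual "path missing" errors and
        # return None unless the result matches the expected type.
        try:
            v = getter(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            return None
        if expected_type is None or isinstance(v, expected_type):
            return v
        return None

    shared_data = {'entry_data': {'PostPage': [{'media': {'video_url': 'http://example.com/v.mp4'}}]}}
    media = try_get(shared_data, lambda x: x['entry_data']['PostPage'][0]['media'], dict)
    print(media['video_url'])                                # http://example.com/v.mp4
    print(try_get(shared_data, lambda x: x['missing'][0]))   # None, no exception raised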

View File

@@ -12,9 +12,35 @@ from ..utils import (
 class JWPlatformBaseIE(InfoExtractor):
+    @staticmethod
+    def _find_jwplayer_data(webpage):
+        # TODO: Merge this with JWPlayer-related codes in generic.py
+        mobj = re.search(
+            'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\((?P<options>[^)]+)\)',
+            webpage)
+        if mobj:
+            return mobj.group('options')
+
+    def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
+        jwplayer_data = self._parse_json(
+            self._find_jwplayer_data(webpage), video_id)
+
+        return self._parse_jwplayer_data(
+            jwplayer_data, video_id, *args, **kwargs)
+
     def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None):
+        # JWPlayer backward compatibility: flattened playlists
+        # https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
+        if 'playlist' not in jwplayer_data:
+            jwplayer_data = {'playlist': [jwplayer_data]}
+
         video_data = jwplayer_data['playlist'][0]
+
+        # JWPlayer backward compatibility: flattened sources
+        # https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
+        if 'sources' not in video_data:
+            video_data['sources'] = [video_data]
+
         formats = []
         for source in video_data['sources']:
             source_url = self._proto_relative_url(source['file'])
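
The two compatibility shims accept older JWPlayer setups that pass a single item (or a single source) directly instead of nesting it under playlist/sources. A small illustration of the normalization, with hypothetical configs:

    def normalize_jwplayer_config(jwplayer_data):
        # Flattened playlist: a bare item becomes a one-entry playlist.
        if 'playlist' not in jwplayer_data:
            jwplayer_data = {'playlist': [jwplayer_data]}
        video_data = jwplayer_data['playlist'][0]
        # Flattened sources: a bare source becomes a one-entry source list.
        if 'sources' not in video_data:
            video_data['sources'] = [video_data]
        return jwplayer_data

    old_style = {'file': 'http://example.com/video.mp4'}
    new_style = {'playlist': [{'sources': [{'file': 'http://example.com/video.mp4'}]}]}
    for cfg in (old_style, new_style):
        norm = normalize_jwplayer_config(cfg)
        print(norm['playlist'][0]['sources'][0]['file'])  # same URL either way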

View File

@@ -148,8 +148,8 @@ class KuwoAlbumIE(InfoExtractor):
         'url': 'http://www.kuwo.cn/album/502294/',
         'info_dict': {
             'id': '502294',
-            'title': 'M',
-            'description': 'md5:6a7235a84cc6400ec3b38a7bdaf1d60c',
+            'title': 'Made\xa0Series\xa0《M》',
+            'description': 'md5:d463f0d8a0ff3c3ea3d6ed7452a9483f',
         },
         'playlist_count': 2,
     }
@@ -209,7 +209,7 @@ class KuwoSingerIE(InfoExtractor):
         'url': 'http://www.kuwo.cn/mingxing/bruno+mars/',
         'info_dict': {
             'id': 'bruno+mars',
-            'title': 'Bruno Mars',
+            'title': 'Bruno\xa0Mars',
         },
         'playlist_mincount': 329,
     }, {
@@ -306,7 +306,7 @@ class KuwoMvIE(KuwoBaseIE):
             'id': '6480076',
             'ext': 'mp4',
             'title': 'My HouseMV',
-            'creator': '2PM',
+            'creator': 'PM02:00',
         },
         # In this video, music URLs (anti.s) are blocked outside China and
         # USA, while the MV URL (mvurl) is available globally, so force the MV

View File

@@ -28,7 +28,7 @@ from ..utils import (
 class LeIE(InfoExtractor):
     IE_DESC = '乐视网'
-    _VALID_URL = r'https?://www\.le\.com/ptv/vplay/(?P<id>\d+)\.html'
+    _VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|sports\.le\.com/video)/(?P<id>\d+)\.html'

     _URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
@@ -69,6 +69,9 @@ class LeIE(InfoExtractor):
             'hls_prefer_native': True,
         },
         'skip': 'Only available in China',
+    }, {
+        'url': 'http://sports.le.com/video/25737697.html',
+        'only_matching': True,
     }]

     @staticmethod
@@ -196,7 +199,7 @@ class LeIE(InfoExtractor):
 class LePlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://[a-z]+\.le\.com/[a-z]+/(?P<id>[a-z0-9_]+)'
+    _VALID_URL = r'https?://[a-z]+\.le\.com/(?!video)[a-z]+/(?P<id>[a-z0-9_]+)'

     _TESTS = [{
         'url': 'http://www.le.com/tv/46177.html',

View File

@@ -95,7 +95,6 @@ class LyndaIE(LyndaBaseIE):
     IE_NAME = 'lynda'
     IE_DESC = 'lynda.com videos'
     _VALID_URL = r'https?://www\.lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)'
-    _NETRC_MACHINE = 'lynda'

     _TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'

View File

@@ -4,16 +4,12 @@ from __future__ import unicode_literals
 import random

 from .common import InfoExtractor
-from ..compat import compat_urllib_parse_urlencode
-from ..utils import (
-    sanitized_Request,
-    xpath_text,
-)
+from ..utils import xpath_text


 class MatchTVIE(InfoExtractor):
-    _VALID_URL = r'https?://matchtv\.ru/?#live-player'
-    _TEST = {
+    _VALID_URL = r'https?://matchtv\.ru(?:/on-air|/?#live-player)'
+    _TESTS = [{
         'url': 'http://matchtv.ru/#live-player',
         'info_dict': {
             'id': 'matchtv-live',
@@ -24,12 +20,16 @@ class MatchTVIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
-    }
+    }, {
+        'url': 'http://matchtv.ru/on-air/',
+        'only_matching': True,
+    }]

     def _real_extract(self, url):
         video_id = 'matchtv-live'
-        request = sanitized_Request(
-            'http://player.matchtv.ntvplus.tv/player/smil?%s' % compat_urllib_parse_urlencode({
+        video_url = self._download_json(
+            'http://player.matchtv.ntvplus.tv/player/smil', video_id,
+            query={
                 'ts': '',
                 'quality': 'SD',
                 'contentId': '561d2c0df7159b37178b4567',
@@ -40,11 +40,10 @@ class MatchTVIE(InfoExtractor):
                 'contentType': 'channel',
                 'timeShift': '0',
                 'platform': 'portal',
-            }),
+            },
             headers={
                 'Referer': 'http://player.matchtv.ntvplus.tv/embed-player/NTVEmbedPlayer.swf',
-            })
-        video_url = self._download_json(request, video_id)['data']['videoUrl']
+            })['data']['videoUrl']
         f4m_url = xpath_text(self._download_xml(video_url, video_id), './to')
         formats = self._extract_f4m_formats(f4m_url, video_id)
         self._sort_formats(formats)

View File

@@ -1,5 +1,8 @@
+# coding: utf-8
 from __future__ import unicode_literals

+import re
+
 from .common import InfoExtractor
 from ..compat import (
     compat_urllib_parse_urlencode,
@@ -8,6 +11,7 @@ from ..compat import (
 from ..utils import (
     get_element_by_attribute,
     int_or_none,
+    remove_start,
 )
@@ -15,7 +19,7 @@ class MiTeleIE(InfoExtractor):
     IE_DESC = 'mitele.es'
     _VALID_URL = r'https?://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/'

-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
         # MD5 is unstable
         'info_dict': {
@@ -24,10 +28,31 @@ class MiTeleIE(InfoExtractor):
             'ext': 'flv',
             'title': 'Tor, la web invisible',
             'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
+            'series': 'Diario de',
+            'season': 'La redacción',
+            'episode': 'Programa 144',
             'thumbnail': 're:(?i)^https?://.*\.jpg$',
             'duration': 2913,
         },
-    }
+    }, {
+        # no explicit title
+        'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/temporada-6/programa-226/',
+        'info_dict': {
+            'id': 'eLZSwoEd1S3pVyUm8lc6F',
+            'display_id': 'programa-226',
+            'ext': 'flv',
+            'title': 'Cuarto Milenio - Temporada 6 - Programa 226',
+            'description': 'md5:50daf9fadefa4e62d9fc866d0c015701',
+            'series': 'Cuarto Milenio',
+            'season': 'Temporada 6',
+            'episode': 'Programa 226',
+            'thumbnail': 're:(?i)^https?://.*\.jpg$',
+            'duration': 7312,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]

     def _real_extract(self, url):
         display_id = self._match_id(url)
@@ -70,7 +95,22 @@ class MiTeleIE(InfoExtractor):
         self._sort_formats(formats)

         title = self._search_regex(
-            r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>', webpage, 'title')
+            r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>',
+            webpage, 'title', default=None)
+
+        mobj = re.search(r'''(?sx)
+                            class="Destacado-text"[^>]*>.*?<h1>\s*
+                            <span>(?P<series>[^<]+)</span>\s*
+                            <span>(?P<season>[^<]+)</span>\s*
+                            <span>(?P<episode>[^<]+)</span>''', webpage)
+        series, season, episode = mobj.groups() if mobj else [None] * 3
+
+        if not title:
+            if mobj:
+                title = '%s - %s - %s' % (series, season, episode)
+            else:
+                title = remove_start(self._search_regex(
+                    r'<title>([^<]+)</title>', webpage, 'title'), 'Ver online ')

         video_id = self._search_regex(
             r'data-media-id\s*=\s*"([^"]+)"', webpage,
@@ -83,6 +123,9 @@ class MiTeleIE(InfoExtractor):
             'display_id': display_id,
             'title': title,
             'description': get_element_by_attribute('class', 'text', webpage),
+            'series': series,
+            'season': season,
+            'episode': episode,
             'thumbnail': thumbnail,
             'duration': duration,
             'formats': formats,

View File

@@ -6,6 +6,7 @@ from .common import InfoExtractor
 from ..compat import (
     compat_urllib_parse_urlencode,
     compat_str,
+    compat_xpath,
 )
 from ..utils import (
     ExtractorError,
@@ -84,9 +85,10 @@ class MTVServicesInfoExtractor(InfoExtractor):
             rtmp_video_url = rendition.find('./src').text
             if rtmp_video_url.endswith('siteunavail.png'):
                 continue
+            new_url = self._transform_rtmp_url(rtmp_video_url)
             formats.append({
-                'ext': ext,
-                'url': self._transform_rtmp_url(rtmp_video_url),
+                'ext': 'flv' if new_url.startswith('rtmp') else ext,
+                'url': new_url,
                 'format_id': rendition.get('bitrate'),
                 'width': int(rendition.get('width')),
                 'height': int(rendition.get('height')),
@@ -139,9 +141,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
                 itemdoc, './/{http://search.yahoo.com/mrss/}category',
                 'scheme', 'urn:mtvn:video_title')
             if title_el is None:
-                title_el = itemdoc.find('.//{http://search.yahoo.com/mrss/}title')
+                title_el = itemdoc.find(compat_xpath('.//{http://search.yahoo.com/mrss/}title'))
             if title_el is None:
-                title_el = itemdoc.find('.//title') or itemdoc.find('./title')
+                title_el = itemdoc.find(compat_xpath('.//title'))
             if title_el.text is None:
                 title_el = None

View File

@@ -3,6 +3,7 @@ from __future__ import unicode_literals

 from .mtv import MTVServicesInfoExtractor
 from ..compat import compat_urllib_parse_urlencode
+from ..utils import update_url_query


 class NickIE(MTVServicesInfoExtractor):
@@ -61,3 +62,26 @@ class NickIE(MTVServicesInfoExtractor):

     def _extract_mgid(self, webpage):
         return self._search_regex(r'data-contenturi="([^"]+)', webpage, 'mgid')
+
+
+class NickDeIE(MTVServicesInfoExtractor):
+    IE_NAME = 'nick.de'
+    _VALID_URL = r'https?://(?:www\.)?nick\.de/(?:playlist|shows)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'http://www.nick.de/playlist/3773-top-videos/videos/episode/17306-zu-wasser-und-zu-land-rauchende-erdnusse',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.nick.de/shows/342-icarly',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        mrss_url = update_url_query(self._search_regex(
+            r'data-mrss=(["\'])(?P<url>http.+?)\1', webpage, 'mrss url', group='url'),
+            {'siteKey': 'nick.de'})
+
+        return self._get_videos_info_from_url(mrss_url, video_id)

View File

@@ -163,7 +163,7 @@ class NRKTVIE(NRKBaseIE):
             'ext': 'mp4',
             'title': '20 spørsmål 23.05.2014',
             'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
-            'duration': 1741.52,
+            'duration': 1741,
         },
     }, {
         'url': 'https://tv.nrk.no/program/mdfp15000514',
@@ -173,7 +173,7 @@ class NRKTVIE(NRKBaseIE):
             'ext': 'mp4',
             'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting 24.05.2014',
             'description': 'md5:89290c5ccde1b3a24bb8050ab67fe1db',
-            'duration': 4605.08,
+            'duration': 4605,
         },
     }, {
         # single playlist video
@@ -260,30 +260,34 @@ class NRKPlaylistIE(InfoExtractor):
 class NRKSkoleIE(InfoExtractor):
     IE_DESC = 'NRK Skole'
-    _VALID_URL = r'https?://(?:www\.)?nrk\.no/skole/klippdetalj?.*\btopic=(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?nrk\.no/skole/?\?.*\bmediaId=(?P<id>\d+)'

     _TESTS = [{
-        'url': 'http://nrk.no/skole/klippdetalj?topic=nrk:klipp/616532',
-        'md5': '04cd85877cc1913bce73c5d28a47e00f',
+        'url': 'https://www.nrk.no/skole/?page=search&q=&mediaId=14099',
+        'md5': '6bc936b01f9dd8ed45bc58b252b2d9b6',
         'info_dict': {
             'id': '6021',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Genetikk og eneggede tvillinger',
             'description': 'md5:3aca25dcf38ec30f0363428d2b265f8d',
             'duration': 399,
         },
     }, {
-        'url': 'http://www.nrk.no/skole/klippdetalj?topic=nrk%3Aklipp%2F616532#embed',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.nrk.no/skole/klippdetalj?topic=urn:x-mediadb:21379',
+        'url': 'https://www.nrk.no/skole/?page=objectives&subject=naturfag&objective=K15114&mediaId=19355',
         'only_matching': True,
     }]

     def _real_extract(self, url):
-        video_id = compat_urllib_parse_unquote(self._match_id(url))
-
-        webpage = self._download_webpage(url, video_id)
-
-        nrk_id = self._search_regex(r'data-nrk-id=["\'](\d+)', webpage, 'nrk id')
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(
+            'https://mimir.nrk.no/plugin/1.0/static?mediaId=%s' % video_id,
+            video_id)
+
+        nrk_id = self._parse_json(
+            self._search_regex(
+                r'<script[^>]+type=["\']application/json["\'][^>]*>({.+?})</script>',
+                webpage, 'application json'),
+            video_id)['activeMedia']['psId']
+
         return self.url_result('nrk:%s' % nrk_id)

View File

@@ -1,19 +1,32 @@
 from __future__ import unicode_literals

 import re
-import json

 from .common import InfoExtractor
 from ..utils import (
+    ExtractorError,
     int_or_none,
     js_to_json,
-    qualities,
 )


 class PornHdIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?pornhd\.com/(?:[a-z]{2,4}/)?videos/(?P<id>\d+)(?:/(?P<display_id>.+))?'
-    _TEST = {
+    _TESTS = [{
+        'url': 'http://www.pornhd.com/videos/9864/selfie-restroom-masturbation-fun-with-chubby-cutie-hd-porn-video',
+        'md5': 'c8b964b1f0a4b5f7f28ae3a5c9f86ad5',
+        'info_dict': {
+            'id': '9864',
+            'display_id': 'selfie-restroom-masturbation-fun-with-chubby-cutie-hd-porn-video',
+            'ext': 'mp4',
+            'title': 'Restroom selfie masturbation',
+            'description': 'md5:3748420395e03e31ac96857a8f125b2b',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'view_count': int,
+            'age_limit': 18,
+        }
+    }, {
+        # removed video
         'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
         'md5': '956b8ca569f7f4d8ec563e2c41598441',
         'info_dict': {
@@ -25,8 +38,9 @@ class PornHdIE(InfoExtractor):
             'thumbnail': 're:^https?://.*\.jpg',
             'view_count': int,
             'age_limit': 18,
-        }
-    }
+        },
+        'skip': 'Not available anymore',
+    }]

     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -38,28 +52,38 @@ class PornHdIE(InfoExtractor):
         title = self._html_search_regex(
             [r'<span[^>]+class=["\']video-name["\'][^>]*>([^<]+)',
              r'<title>(.+?) - .*?[Pp]ornHD.*?</title>'], webpage, 'title')
-        description = self._html_search_regex(
-            r'<div class="description">([^<]+)</div>', webpage, 'description', fatal=False)
-        view_count = int_or_none(self._html_search_regex(
-            r'(\d+) views\s*</span>', webpage, 'view count', fatal=False))
-        thumbnail = self._search_regex(
-            r"'poster'\s*:\s*'([^']+)'", webpage, 'thumbnail', fatal=False)

-        quality = qualities(['sd', 'hd'])
-        sources = json.loads(js_to_json(self._search_regex(
+        sources = self._parse_json(js_to_json(self._search_regex(
             r"(?s)'sources'\s*:\s*(\{.+?\})\s*\}[;,)]",
-            webpage, 'sources')))
+            webpage, 'sources', default='{}')), video_id)
+
+        if not sources:
+            message = self._html_search_regex(
+                r'(?s)<(div|p)[^>]+class="no-video"[^>]*>(?P<value>.+?)</\1',
+                webpage, 'error message', group='value')
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
+
         formats = []
-        for qname, video_url in sources.items():
+        for format_id, video_url in sources.items():
             if not video_url:
                 continue
+            height = int_or_none(self._search_regex(
+                r'^(\d+)[pP]', format_id, 'height', default=None))
             formats.append({
                 'url': video_url,
-                'format_id': qname,
-                'quality': quality(qname),
+                'format_id': format_id,
+                'height': height,
             })
         self._sort_formats(formats)

+        description = self._html_search_regex(
+            r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1',
+            webpage, 'description', fatal=False, group='value')
+        view_count = int_or_none(self._html_search_regex(
+            r'(\d+) views\s*<', webpage, 'view count', fatal=False))
+        thumbnail = self._search_regex(
+            r"'poster'\s*:\s*'([^']+)'", webpage, 'thumbnail', fatal=False)
+
         return {
             'id': video_id,
             'display_id': display_id,

View File

@@ -1,3 +1,4 @@
+# coding: utf-8
from __future__ import unicode_literals

import itertools
@@ -39,7 +40,25 @@ class PornHubIE(InfoExtractor):
            'dislike_count': int,
            'comment_count': int,
            'age_limit': 18,
-        }
+        },
+    }, {
+        # non-ASCII title
+        'url': 'http://www.pornhub.com/view_video.php?viewkey=1331683002',
+        'info_dict': {
+            'id': '1331683002',
+            'ext': 'mp4',
+            'title': '重庆婷婷女王足交',
+            'uploader': 'cj397186295',
+            'duration': 1753,
+            'view_count': int,
+            'like_count': int,
+            'dislike_count': int,
+            'comment_count': int,
+            'age_limit': 18,
+        },
+        'params': {
+            'skip_download': True,
+        },
    }, {
        'url': 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d',
        'only_matching': True,
@@ -76,19 +95,25 @@ class PornHubIE(InfoExtractor):
                'PornHub said: %s' % error_msg,
                expected=True, video_id=video_id)

+        # video_title from flashvars contains whitespace instead of non-ASCII (see
+        # http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
+        # on that anymore.
+        title = self._html_search_meta(
+            'twitter:title', webpage, default=None) or self._search_regex(
+            (r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
+             r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
+             r'shareTitle\s*=\s*(["\'])(?P<title>.+?)\1'),
+            webpage, 'title', group='title')
+
        flashvars = self._parse_json(
            self._search_regex(
                r'var\s+flashvars_\d+\s*=\s*({.+?});', webpage, 'flashvars', default='{}'),
            video_id)
        if flashvars:
-            video_title = flashvars.get('video_title')
            thumbnail = flashvars.get('image_url')
            duration = int_or_none(flashvars.get('video_duration'))
        else:
-            video_title, thumbnail, duration = [None] * 3
-
-        if not video_title:
-            video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
+            title, thumbnail, duration = [None] * 3

        video_uploader = self._html_search_regex(
            r'(?s)From:&nbsp;.+?<(?:a href="/users/|a href="/channels/|span class="username)[^>]+>(.+?)<',
@@ -137,7 +162,7 @@ class PornHubIE(InfoExtractor):
        return {
            'id': video_id,
            'uploader': video_uploader,
-            'title': video_title,
+            'title': title,
            'thumbnail': thumbnail,
            'duration': duration,
            'view_count': view_count,
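Because flashvars' video_title mangles non-ASCII characters into whitespace, the title is now taken from the twitter:title meta tag with several markup regexes as fallbacks. A minimal sketch of such a first-match cascade (the pattern list and sample markup here are hypothetical):

import re

def first_match(patterns, text, group='title'):
    # Try patterns in order and return the first named-group hit, mirroring
    # the fallback cascade above (meta tag first, then markup variants).
    for p in patterns:
        m = re.search(p, text)
        if m:
            return m.group(group)
    return None

page = '<h1 class="title">重庆婷婷女王足交</h1>'  # hypothetical markup
print(first_match([
    r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
    r'shareTitle\s*=\s*(["\'])(?P<title>.+?)\1',
], page))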

View File

@@ -2,22 +2,19 @@
from __future__ import unicode_literals

from .common import InfoExtractor
-from ..utils import (
-    js_to_json,
-    unescapeHTML,
-    int_or_none,
-)
+from ..utils import int_or_none


class R7IE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://
+    _VALID_URL = r'''(?x)
+                    https?://
                    (?:
                        (?:[a-zA-Z]+)\.r7\.com(?:/[^/]+)+/idmedia/|
                        noticias\.r7\.com(?:/[^/]+)+/[^/]+-|
                        player\.r7\.com/video/i/
                    )
                    (?P<id>[\da-f]{24})
                    '''
    _TESTS = [{
        'url': 'http://videos.r7.com/policiais-humilham-suspeito-a-beira-da-morte-morre-com-dignidade-/idmedia/54e7050b0cf2ff57e0279389.html',
        'md5': '403c4e393617e8e8ddc748978ee8efde',
@@ -25,6 +22,7 @@ class R7IE(InfoExtractor):
            'id': '54e7050b0cf2ff57e0279389',
            'ext': 'mp4',
            'title': 'Policiais humilham suspeito à beira da morte: "Morre com dignidade"',
+            'description': 'md5:01812008664be76a6479aa58ec865b72',
            'thumbnail': 're:^https?://.*\.jpg$',
            'duration': 98,
            'like_count': int,
@@ -44,45 +42,72 @@ class R7IE(InfoExtractor):
    def _real_extract(self, url):
        video_id = self._match_id(url)

-        webpage = self._download_webpage(
-            'http://player.r7.com/video/i/%s' % video_id, video_id)
+        video = self._download_json(
+            'http://player-api.r7.com/video/i/%s' % video_id, video_id)

-        item = self._parse_json(js_to_json(self._search_regex(
-            r'(?s)var\s+item\s*=\s*({.+?});', webpage, 'player')), video_id)
-
-        title = unescapeHTML(item['title'])
-        thumbnail = item.get('init', {}).get('thumbUri')
-        duration = None
-
-        statistics = item.get('statistics', {})
-        like_count = int_or_none(statistics.get('likes'))
-        view_count = int_or_none(statistics.get('views'))
+        title = video['title']

        formats = []
-        for format_key, format_dict in item['playlist'][0].items():
-            src = format_dict.get('src')
-            if not src:
-                continue
-            format_id = format_dict.get('format') or format_key
-            if duration is None:
-                duration = format_dict.get('duration')
-            if '.f4m' in src:
-                formats.extend(self._extract_f4m_formats(src, video_id, preference=-1))
-            elif src.endswith('.m3u8'):
-                formats.extend(self._extract_m3u8_formats(src, video_id, 'mp4', preference=-2))
-            else:
-                formats.append({
-                    'url': src,
-                    'format_id': format_id,
-                })
+        media_url_hls = video.get('media_url_hls')
+        if media_url_hls:
+            formats.extend(self._extract_m3u8_formats(
+                media_url_hls, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+        media_url = video.get('media_url')
+        if media_url:
+            f = {
+                'url': media_url,
+                'format_id': 'http',
+            }
+            # m3u8 format always matches the http format, let's copy metadata from
+            # one to another
+            m3u8_formats = list(filter(
+                lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+                formats))
+            if len(m3u8_formats) == 1:
+                f_copy = m3u8_formats[0].copy()
+                f_copy.update(f)
+                f_copy['protocol'] = 'http'
+                f = f_copy
+            formats.append(f)
        self._sort_formats(formats)

+        description = video.get('description')
+        thumbnail = video.get('thumb')
+        duration = int_or_none(video.get('media_duration'))
+        like_count = int_or_none(video.get('likes'))
+        view_count = int_or_none(video.get('views'))
+
        return {
            'id': video_id,
            'title': title,
+            'description': description,
            'thumbnail': thumbnail,
            'duration': duration,
            'like_count': like_count,
            'view_count': view_count,
            'formats': formats,
        }
class R7ArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:[a-zA-Z]+)\.r7\.com/(?:[^/]+/)+[^/?#&]+-(?P<id>\d+)'
_TEST = {
'url': 'http://tv.r7.com/record-play/balanco-geral/videos/policiais-humilham-suspeito-a-beira-da-morte-morre-com-dignidade-16102015',
'only_matching': True,
}
@classmethod
def suitable(cls, url):
return False if R7IE.suitable(url) else super(R7ArticleIE, cls).suitable(url)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'<div[^>]+(?:id=["\']player-|class=["\']embed["\'][^>]+id=["\'])([\da-f]{24})',
webpage, 'video id')
return self.url_result('http://player.r7.com/video/i/%s' % video_id, R7IE.ie_key())
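The comment in R7IE above describes the metadata-copy trick: when the HLS manifest yields exactly one real video rendition, the progressive media_url is assumed to be that same rendition, so its height/codec metadata can be inherited. A self-contained sketch of that merge, under that single-rendition assumption:

def merge_http_with_hls(media_url, m3u8_formats):
    # If exactly one real video rendition came out of the HLS manifest, assume
    # the progressive URL is the same rendition and inherit its metadata.
    f = {'url': media_url, 'format_id': 'http'}
    candidates = [m for m in m3u8_formats
                  if m.get('vcodec') != 'none' and m.get('resolution') != 'multiple']
    if len(candidates) == 1:
        merged = dict(candidates[0])
        merged.update(f)          # the http url/format_id win
        merged['protocol'] = 'http'
        return merged
    return f

hls = [{'format_id': 'hls-598', 'height': 360, 'vcodec': 'avc1', 'url': 'http://x/i.m3u8'}]
print(merge_http_with_hls('http://x/v.mp4', hls))
# -> http format carrying height=360 and vcodec from the lone HLS rendition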

View File

@@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re

from .common import InfoExtractor
-from ..utils import(
+from ..utils import (
    unified_strdate,
    str_to_int,
)

View File

@@ -0,0 +1,69 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
)
class RockstarGamesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rockstargames\.com/videos(?:/video/|#?/?\?.*\bvideo=)(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.rockstargames.com/videos/video/11544/',
'md5': '03b5caa6e357a4bd50e3143fc03e5733',
'info_dict': {
'id': '11544',
'ext': 'mp4',
'title': 'Further Adventures in Finance and Felony Trailer',
'description': 'md5:6d31f55f30cb101b5476c4a379e324a3',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1464876000,
'upload_date': '20160602',
}
}, {
'url': 'http://www.rockstargames.com/videos#/?video=48',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
'https://www.rockstargames.com/videoplayer/videos/get-video.json',
video_id, query={
'id': video_id,
'locale': 'en_us',
})['video']
title = video['title']
formats = []
for video in video['files_processed']['video/mp4']:
if not video.get('src'):
continue
resolution = video.get('resolution')
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', resolution or '', 'height', default=None))
formats.append({
'url': self._proto_relative_url(video['src']),
'format_id': resolution,
'height': height,
})
if not formats:
youtube_id = video.get('youtube_id')
if youtube_id:
return self.url_result(youtube_id, 'Youtube')
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video.get('description'),
'thumbnail': self._proto_relative_url(video.get('screencap')),
'timestamp': parse_iso8601(video.get('created')),
'formats': formats,
}
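get-video.json returns protocol-relative 'src' URLs ('//...'), hence the _proto_relative_url calls above. A sketch of what that helper amounts to (defaulting to 'http:' as the youtube-dl utility does; the sample URL is hypothetical):

def proto_relative_url(url, scheme='http:'):
    # Prefix a scheme onto protocol-relative URLs, leave everything else alone.
    if url and url.startswith('//'):
        return scheme + url
    return url

print(proto_relative_url('//media.example/rockstar/clip.mp4'))  # hypothetical URL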

View File

@@ -0,0 +1,38 @@
# coding: utf-8
from __future__ import unicode_literals
from .wdr import WDRBaseIE
from ..utils import get_element_by_attribute
class SportschauIE(WDRBaseIE):
IE_NAME = 'Sportschau'
_VALID_URL = r'https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video-?(?P<id>[^/#?]+)\.html'
_TEST = {
'url': 'http://www.sportschau.de/uefaeuro2016/videos/video-dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100.html',
'info_dict': {
'id': 'mdb-1140188',
'display_id': 'dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100',
'ext': 'mp4',
'title': 'DFB-Team geht gut gelaunt ins Spiel gegen Polen',
'description': 'Vor dem zweiten Gruppenspiel gegen Polen herrscht gute Stimmung im deutschen Team. Insbesondere Bastian Schweinsteiger strotzt vor Optimismus nach seinem Tor gegen die Ukraine.',
'upload_date': '20160615',
},
'skip': 'Geo-restricted to Germany',
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = get_element_by_attribute('class', 'headline', webpage)
description = self._html_search_meta('description', webpage, 'description')
info = self._extract_wdr_video(webpage, video_id)
info.update({
'title': title,
'description': description,
})
return info

View File

@@ -5,7 +5,7 @@ import re
from .common import InfoExtractor
from ..utils import (
-    sanitized_Request,
+    ExtractorError,
    urlencode_postdata,
)
@@ -14,7 +14,7 @@ class StreamcloudIE(InfoExtractor):
    IE_NAME = 'streamcloud.eu'
    _VALID_URL = r'https?://streamcloud\.eu/(?P<id>[a-zA-Z0-9_-]+)(?:/(?P<fname>[^#?]*)\.html)?'

-    _TEST = {
+    _TESTS = [{
        'url': 'http://streamcloud.eu/skp9j99s4bpz/youtube-dl_test_video_____________-BaW_jenozKc.mp4.html',
        'md5': '6bea4c7fa5daaacc2a946b7146286686',
        'info_dict': {
@@ -23,7 +23,10 @@ class StreamcloudIE(InfoExtractor):
            'title': 'youtube-dl test video \'/\\ ä ↭',
        },
        'skip': 'Only available from the EU'
-    }
+    }, {
+        'url': 'http://streamcloud.eu/ua8cmfh1nbe6/NSHIP-148--KUC-NG--H264-.mp4.html',
+        'only_matching': True,
+    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
@@ -31,26 +34,36 @@ class StreamcloudIE(InfoExtractor):

        orig_webpage = self._download_webpage(url, video_id)

+        if '>File Not Found<' in orig_webpage:
+            raise ExtractorError(
+                'Video %s does not exist' % video_id, expected=True)
+
        fields = re.findall(r'''(?x)<input\s+
            type="(?:hidden|submit)"\s+
            name="([^"]+)"\s+
            (?:id="[^"]+"\s+)?
            value="([^"]*)"
            ''', orig_webpage)
-        post = urlencode_postdata(fields)

        self._sleep(12, video_id)

-        headers = {
-            b'Content-Type': b'application/x-www-form-urlencoded',
-        }
-        req = sanitized_Request(url, post, headers)
-
        webpage = self._download_webpage(
-            req, video_id, note='Downloading video page ...')
-        title = self._html_search_regex(
-            r'<h1[^>]*>([^<]+)<', webpage, 'title')
-        video_url = self._search_regex(
-            r'file:\s*"([^"]+)"', webpage, 'video URL')
+            url, video_id, data=urlencode_postdata(fields), headers={
+                b'Content-Type': b'application/x-www-form-urlencoded',
+            })
+
+        try:
+            title = self._html_search_regex(
+                r'<h1[^>]*>([^<]+)<', webpage, 'title')
+            video_url = self._search_regex(
+                r'file:\s*"([^"]+)"', webpage, 'video URL')
+        except ExtractorError:
+            message = self._html_search_regex(
+                r'(?s)<div[^>]+class=(["\']).*?msgboxinfo.*?\1[^>]*>(?P<message>.+?)</div>',
+                webpage, 'message', default=None, group='message')
+            if message:
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
+            raise

        thumbnail = self._search_regex(
            r'image:\s*"([^"]+)"', webpage, 'thumbnail URL', fatal=False)
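The streamcloud flow above is: scrape the hidden/submit form fields, wait out the site's countdown, then POST the same fields back to the same URL to receive the real player page. A standalone sketch of that flow using only the standard library (the field regex mirrors the one above):

import re
import time
try:
    from urllib.parse import urlencode
    from urllib.request import Request, urlopen
except ImportError:  # Python 2
    from urllib import urlencode
    from urllib2 import Request, urlopen

def fetch_after_countdown(url, page_html, wait=12):
    # Collect the hidden/submit form fields from the countdown page.
    fields = re.findall(
        r'(?x)<input\s+type="(?:hidden|submit)"\s+name="([^"]+)"\s+'
        r'(?:id="[^"]+"\s+)?value="([^"]*)"', page_html)
    # Honour the site's countdown before re-posting.
    time.sleep(wait)
    req = Request(url, urlencode(fields).encode('ascii'),
                  {'Content-Type': 'application/x-www-form-urlencoded'})
    return urlopen(req).read()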

View File

@@ -6,17 +6,14 @@ import re
from .common import InfoExtractor
from ..utils import (
    determine_ext,
+    dict_get,
+    int_or_none,
+    try_get,
)


class SVTBaseIE(InfoExtractor):
-    def _extract_video(self, url, video_id):
-        info = self._download_json(url, video_id)
-
-        title = info['context']['title']
-        thumbnail = info['context'].get('thumbnailImage')
-
-        video_info = info['video']
+    def _extract_video(self, video_info, video_id):
        formats = []
        for vr in video_info['videoReferences']:
            player_type = vr.get('playerType')
@@ -40,27 +37,49 @@
                    'format_id': player_type,
                    'url': vurl,
                })
+        if not formats and video_info.get('rights', {}).get('geoBlockedSweden'):
+            self.raise_geo_restricted('This video is only available in Sweden')
        self._sort_formats(formats)

        subtitles = {}
-        subtitle_references = video_info.get('subtitleReferences')
+        subtitle_references = dict_get(video_info, ('subtitles', 'subtitleReferences'))
        if isinstance(subtitle_references, list):
            for sr in subtitle_references:
                subtitle_url = sr.get('url')
+                subtitle_lang = sr.get('language', 'sv')
                if subtitle_url:
-                    subtitles.setdefault('sv', []).append({'url': subtitle_url})
+                    if determine_ext(subtitle_url) == 'm3u8':
+                        # TODO(yan12125): handle WebVTT in m3u8 manifests
+                        continue
+                    subtitles.setdefault(subtitle_lang, []).append({'url': subtitle_url})

-        duration = video_info.get('materialLength')
-        age_limit = 18 if video_info.get('inappropriateForChildren') else 0
+        title = video_info.get('title')
+
+        series = video_info.get('programTitle')
+        season_number = int_or_none(video_info.get('season'))
+        episode = video_info.get('episodeTitle')
+        episode_number = int_or_none(video_info.get('episodeNumber'))
+
+        duration = int_or_none(dict_get(video_info, ('materialLength', 'contentDuration')))
+        age_limit = None
+        adult = dict_get(
+            video_info, ('inappropriateForChildren', 'blockedForChildren'),
+            skip_false_values=False)
+        if adult is not None:
+            age_limit = 18 if adult else 0

        return {
            'id': video_id,
            'title': title,
            'formats': formats,
            'subtitles': subtitles,
-            'thumbnail': thumbnail,
            'duration': duration,
            'age_limit': age_limit,
+            'series': series,
+            'season_number': season_number,
+            'episode': episode,
+            'episode_number': episode_number,
        }

@@ -68,11 +87,11 @@ class SVTIE(SVTBaseIE):
    _VALID_URL = r'https?://(?:www\.)?svt\.se/wd\?(?:.*?&)?widgetId=(?P<widget_id>\d+)&.*?\barticleId=(?P<id>\d+)'
    _TEST = {
        'url': 'http://www.svt.se/wd?widgetId=23991&sectionId=541&articleId=2900353&type=embed&contextSectionId=123&autostart=false',
-        'md5': '9648197555fc1b49e3dc22db4af51d46',
+        'md5': '33e9a5d8f646523ce0868ecfb0eed77d',
        'info_dict': {
            'id': '2900353',
-            'ext': 'flv',
+            'ext': 'mp4',
-            'title': 'Här trycker Jagr till Giroux (under SVT-intervjun)',
+            'title': 'Stjärnorna skojar till det - under SVT-intervjun',
            'duration': 27,
            'age_limit': 0,
        },
@@ -89,15 +108,20 @@ class SVTIE(SVTBaseIE):
        mobj = re.match(self._VALID_URL, url)
        widget_id = mobj.group('widget_id')
        article_id = mobj.group('id')
-        return self._extract_video(
+
+        info = self._download_json(
            'http://www.svt.se/wd?widgetId=%s&articleId=%s&format=json&type=embed&output=json' % (widget_id, article_id),
            article_id)
+
+        info_dict = self._extract_video(info['video'], article_id)
+        info_dict['title'] = info['context']['title']
+        return info_dict


class SVTPlayIE(SVTBaseIE):
    IE_DESC = 'SVT Play and Öppet arkiv'
-    _VALID_URL = r'https?://(?:www\.)?(?P<host>svtplay|oppetarkiv)\.se/video/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/video/(?P<id>[0-9]+)'
-    _TEST = {
+    _TESTS = [{
        'url': 'http://www.svtplay.se/video/5996901/flygplan-till-haile-selassie/flygplan-till-haile-selassie-2',
        'md5': '2b6704fe4a28801e1a098bbf3c5ac611',
        'info_dict': {
@@ -113,12 +137,47 @@ class SVTPlayIE(SVTBaseIE):
                }]
            },
        },
-    }
+    }, {
+        # geo restricted to Sweden
+        'url': 'http://www.oppetarkiv.se/video/5219710/trollflojten',
+        'only_matching': True,
+    }]

    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        host = mobj.group('host')
-        return self._extract_video(
-            'http://www.%s.se/video/%s?output=json' % (host, video_id),
-            video_id)
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        data = self._parse_json(
+            self._search_regex(
+                r'root\["__svtplay"\]\s*=\s*([^;]+);',
+                webpage, 'embedded data', default='{}'),
+            video_id, fatal=False)
+
+        thumbnail = self._og_search_thumbnail(webpage)
+
+        if data:
+            video_info = try_get(
+                data, lambda x: x['context']['dispatcher']['stores']['VideoTitlePageStore']['data']['video'],
+                dict)
+            if video_info:
+                info_dict = self._extract_video(video_info, video_id)
+                info_dict.update({
+                    'title': data['context']['dispatcher']['stores']['MetaStore']['title'],
+                    'thumbnail': thumbnail,
+                })
+                return info_dict
+
+        video_id = self._search_regex(
+            r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
+            webpage, 'video id', default=None)
+
+        if video_id:
+            data = self._download_json(
+                'http://www.svt.se/videoplayer-api/video/%s' % video_id, video_id)
+            info_dict = self._extract_video(data, video_id)
+            if not info_dict.get('title'):
+                info_dict['title'] = re.sub(
+                    r'\s*\|\s*.+?$', '',
+                    info_dict.get('episode') or self._og_search_title(webpage))
+            return info_dict
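Several of the new SVT fields rely on dict_get to bridge the two JSON schemas (the svt.se widget vs. the videoplayer API), and skip_false_values=False is what lets an explicit blockedForChildren=False produce age_limit 0 rather than None. A close paraphrase of the utility, with that case worked through:

def dict_get(d, key_or_keys, default=None, skip_false_values=True):
    # Try several keys in order; by default falsy values ('' / 0 / []) count
    # as missing, and None always does.
    if isinstance(key_or_keys, (list, tuple)):
        for key in key_or_keys:
            if key not in d or d[key] is None or (skip_false_values and not d[key]):
                continue
            return d[key]
        return default
    return d.get(key_or_keys, default)

info = {'blockedForChildren': False}
# skip_false_values=False lets an explicit False through, so the caller can
# distinguish "not adult" (age_limit 0) from "unknown" (age_limit None).
print(dict_get(info, ('inappropriateForChildren', 'blockedForChildren'),
               skip_false_values=False))  # False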

View File

@@ -0,0 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class TelewebionIE(InfoExtractor):
_VALID_URL = r'https?://www\.telewebion\.com/#!/episode/(?P<id>\d+)'
_TEST = {
'url': 'http://www.telewebion.com/#!/episode/1263668/',
'info_dict': {
'id': '1263668',
'ext': 'mp4',
'title': 'قرعه\u200cکشی لیگ قهرمانان اروپا',
'thumbnail': 're:^https?://.*\.jpg',
'view_count': int,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
secure_token = self._download_webpage(
'http://m.s2.telewebion.com/op/op?action=getSecurityToken', video_id)
episode_details = self._download_json(
'http://m.s2.telewebion.com/op/op', video_id,
query={'action': 'getEpisodeDetails', 'episode_id': video_id})
m3u8_url = 'http://m.s1.telewebion.com/smil/%s.m3u8?filepath=%s&m3u8=1&secure_token=%s' % (
video_id, episode_details['file_path'], secure_token)
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', m3u8_id='hls')
picture_paths = [
episode_details.get('picture_path'),
episode_details.get('large_picture_path'),
]
thumbnails = [{
'url': picture_path,
'preference': idx,
} for idx, picture_path in enumerate(picture_paths) if picture_path is not None]
return {
'id': video_id,
'title': episode_details['title'],
'formats': formats,
'thumbnails': thumbnails,
'view_count': episode_details.get('view_count'),
}

View File

@@ -277,9 +277,9 @@ class ThePlatformIE(ThePlatformBaseIE):

class ThePlatformFeedIE(ThePlatformBaseIE):
-    _URL_TEMPLATE = '%s//feed.theplatform.com/f/%s/%s?form=json&byGuid=%s'
+    _URL_TEMPLATE = '%s//feed.theplatform.com/f/%s/%s?form=json&%s'
-    _VALID_URL = r'https?://feed\.theplatform\.com/f/(?P<provider_id>[^/]+)/(?P<feed_id>[^?/]+)\?(?:[^&]+&)*byGuid=(?P<id>[a-zA-Z0-9_]+)'
+    _VALID_URL = r'https?://feed\.theplatform\.com/f/(?P<provider_id>[^/]+)/(?P<feed_id>[^?/]+)\?(?:[^&]+&)*(?P<filter>by(?:Gui|I)d=(?P<id>[\w-]+))'
-    _TEST = {
+    _TESTS = [{
        # From http://player.theplatform.com/p/7wvmTC/MSNBCEmbeddedOffSite?guid=n_hardball_5biden_140207
        'url': 'http://feed.theplatform.com/f/7wvmTC/msnbc_video-p-test?form=json&pretty=true&range=-40&byGuid=n_hardball_5biden_140207',
        'md5': '6e32495b5073ab414471b615c5ded394',
@@ -295,32 +295,38 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
            'categories': ['MSNBC/Issues/Democrats', 'MSNBC/Issues/Elections/Election 2016'],
            'uploader': 'NBCU-NEWS',
        },
-    }
+    }]

-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-
-        video_id = mobj.group('id')
-        provider_id = mobj.group('provider_id')
-        feed_id = mobj.group('feed_id')
-
-        real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, video_id)
-        feed = self._download_json(real_url, video_id)
-        entry = feed['entries'][0]
+    def _extract_feed_info(self, provider_id, feed_id, filter_query, video_id, custom_fields=None, asset_types_query={}):
+        real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, filter_query)
+        entry = self._download_json(real_url, video_id)['entries'][0]

        formats = []
        subtitles = {}
        first_video_id = None
        duration = None
+        asset_types = []
        for item in entry['media$content']:
-            smil_url = item['plfile$url'] + '&mbr=true'
+            smil_url = item['plfile$url']
            cur_video_id = ThePlatformIE._match_id(smil_url)
            if first_video_id is None:
                first_video_id = cur_video_id
                duration = float_or_none(item.get('plfile$duration'))
-            cur_formats, cur_subtitles = self._extract_theplatform_smil(smil_url, video_id, 'Downloading SMIL data for %s' % cur_video_id)
-            formats.extend(cur_formats)
-            subtitles = self._merge_subtitles(subtitles, cur_subtitles)
+            for asset_type in item['plfile$assetTypes']:
+                if asset_type in asset_types:
+                    continue
+                asset_types.append(asset_type)
+                query = {
+                    'mbr': 'true',
+                    'formats': item['plfile$format'],
+                    'assetTypes': asset_type,
+                }
+                if asset_type in asset_types_query:
+                    query.update(asset_types_query[asset_type])
+                cur_formats, cur_subtitles = self._extract_theplatform_smil(update_url_query(
+                    smil_url, query), video_id, 'Downloading SMIL data for %s' % asset_type)
+                formats.extend(cur_formats)
+                subtitles = self._merge_subtitles(subtitles, cur_subtitles)

        self._sort_formats(formats)
@@ -344,5 +350,17 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
            'timestamp': timestamp,
            'categories': categories,
        })
+        if custom_fields:
+            ret.update(custom_fields(entry))

        return ret
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+
+        video_id = mobj.group('id')
+        provider_id = mobj.group('provider_id')
+        feed_id = mobj.group('feed_id')
+        filter_query = mobj.group('filter')
+
+        return self._extract_feed_info(provider_id, feed_id, filter_query, video_id)
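Instead of one SMIL request per media file, the feed extractor now issues one request per distinct assetType, tagging each URL with mbr/formats/assetTypes parameters. A sketch in the spirit of youtube-dl's update_url_query helper, merging new parameters into an existing query string (the release URL here is hypothetical):

try:
    from urllib.parse import urlencode, urlparse, parse_qs, urlunparse
except ImportError:  # Python 2
    from urllib import urlencode
    from urlparse import urlparse, parse_qs, urlunparse

def update_url_query(url, query):
    # Merge new parameters into the URL's existing query string.
    parsed = urlparse(url)
    qs = parse_qs(parsed.query)
    qs.update({k: [v] for k, v in query.items()})
    return urlunparse(parsed._replace(query=urlencode(qs, doseq=True)))

smil_url = 'http://link.theplatform.com/s/7wvmTC/abcdEFGHijkl'  # hypothetical release URL
print(update_url_query(smil_url, {
    'mbr': 'true',
    'formats': 'MPEG4,M3U',
    'assetTypes': 'medium_video_s3',
}))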

View File

@@ -16,6 +16,7 @@ from ..compat import (
from ..utils import (
    ExtractorError,
    int_or_none,
+    js_to_json,
    orderedSet,
    parse_duration,
    parse_iso8601,
@@ -454,3 +455,45 @@ class TwitchStreamIE(TwitchBaseIE):
        'formats': formats,
        'is_live': True,
    }
class TwitchClipsIE(InfoExtractor):
IE_NAME = 'twitch:clips'
_VALID_URL = r'https?://clips\.twitch\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://clips.twitch.tv/ea/AggressiveCobraPoooound',
'md5': '761769e1eafce0ffebfb4089cb3847cd',
'info_dict': {
'id': 'AggressiveCobraPoooound',
'ext': 'mp4',
'title': 'EA Play 2016 Live from the Novo Theatre',
'thumbnail': 're:^https?://.*\.jpg',
'creator': 'EA',
'uploader': 'stereotype_',
'uploader_id': 'stereotype_',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
clip = self._parse_json(
self._search_regex(
r'(?s)clipInfo\s*=\s*({.+?});', webpage, 'clip info'),
video_id, transform_source=js_to_json)
video_url = clip['clip_video_url']
title = clip['channel_title']
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'creator': clip.get('broadcaster_display_name') or clip.get('broadcaster_login'),
'uploader': clip.get('curator_login'),
'uploader_id': clip.get('curator_display_name'),
}

View File

@@ -101,10 +101,13 @@
            self.report_warning('Unable to get session token, login has probably failed')

    @staticmethod
-    def dict_selection(dict_obj, preferred_key):
+    def dict_selection(dict_obj, preferred_key, allow_fallback=True):
        if preferred_key in dict_obj:
            return dict_obj.get(preferred_key)

+        if not allow_fallback:
+            return
+
        filtered_dict = list(filter(None, [dict_obj.get(k) for k in dict_obj.keys()]))
        return filtered_dict[0] if filtered_dict else None
@@ -127,7 +130,7 @@ class VikiIE(VikiBaseIE):
    }, {
        # clip
        'url': 'http://www.viki.com/videos/1067139v-the-avengers-age-of-ultron-press-conference',
-        'md5': '86c0b5dbd4d83a6611a79987cc7a1989',
+        'md5': 'feea2b1d7b3957f70886e6dfd8b8be84',
        'info_dict': {
            'id': '1067139v',
            'ext': 'mp4',
@@ -156,17 +159,18 @@ class VikiIE(VikiBaseIE):
        'params': {
            # m3u8 download
            'skip_download': True,
-        }
+        },
+        'skip': 'Blocked in the US',
    }, {
        # episode
        'url': 'http://www.viki.com/videos/44699v-boys-over-flowers-episode-1',
-        'md5': '190f3ef426005ba3a080a63325955bc3',
+        'md5': '1f54697dabc8f13f31bf06bb2e4de6db',
        'info_dict': {
            'id': '44699v',
            'ext': 'mp4',
            'title': 'Boys Over Flowers - Episode 1',
-            'description': 'md5:52617e4f729c7d03bfd4bcbbb6e946f2',
+            'description': 'md5:b89cf50038b480b88b5b3c93589a9076',
-            'duration': 4155,
+            'duration': 4204,
            'timestamp': 1270496524,
            'upload_date': '20100405',
            'uploader': 'group8',
@@ -196,7 +200,7 @@ class VikiIE(VikiBaseIE):
    }, {
        # non-English description
        'url': 'http://www.viki.com/videos/158036v-love-in-magic',
-        'md5': '1713ae35df5a521b31f6dc40730e7c9c',
+        'md5': '013dc282714e22acf9447cad14ff1208',
        'info_dict': {
            'id': '158036v',
            'ext': 'mp4',
@@ -217,7 +221,7 @@ class VikiIE(VikiBaseIE):

        self._check_errors(video)

-        title = self.dict_selection(video.get('titles', {}), 'en')
+        title = self.dict_selection(video.get('titles', {}), 'en', allow_fallback=False)
        if not title:
            title = 'Episode %d' % video.get('number') if video.get('type') == 'episode' else video.get('id') or video_id
            container_titles = video.get('container', {}).get('titles', {})
@@ -302,7 +306,7 @@ class VikiChannelIE(VikiBaseIE):
            'title': 'Boys Over Flowers',
            'description': 'md5:ecd3cff47967fe193cff37c0bec52790',
        },
-        'playlist_count': 70,
+        'playlist_mincount': 71,
    }, {
        'url': 'http://www.viki.com/tv/1354c-poor-nastya-complete',
        'info_dict': {

View File

@@ -8,6 +8,7 @@ import itertools
from .common import InfoExtractor
from ..compat import (
    compat_HTTPError,
+    compat_str,
    compat_urlparse,
)
from ..utils import (
@@ -24,6 +25,7 @@ from ..utils import (
    urlencode_postdata,
    unescapeHTML,
    parse_filesize,
+    try_get,
)

@@ -66,6 +68,69 @@ class VimeoBaseInfoExtractor(InfoExtractor):
    def _set_vimeo_cookie(self, name, value):
        self._set_cookie('vimeo.com', name, value)
def _vimeo_sort_formats(self, formats):
# Bitrates are completely broken. Single m3u8 may contain entries in kbps and bps
# at the same time without actual units specified. This leads to wrong sorting.
self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'format_id'))
def _parse_config(self, config, video_id):
# Extract title
video_title = config['video']['title']
# Extract uploader, uploader_url and uploader_id
video_uploader = config['video'].get('owner', {}).get('name')
video_uploader_url = config['video'].get('owner', {}).get('url')
video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
# Extract video thumbnail
video_thumbnail = config['video'].get('thumbnail')
if video_thumbnail is None:
video_thumbs = config['video'].get('thumbs')
if video_thumbs and isinstance(video_thumbs, dict):
_, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]
# Extract video duration
video_duration = int_or_none(config['video'].get('duration'))
formats = []
config_files = config['video'].get('files') or config['request'].get('files', {})
for f in config_files.get('progressive', []):
video_url = f.get('url')
if not video_url:
continue
formats.append({
'url': video_url,
'format_id': 'http-%s' % f.get('quality'),
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'fps': int_or_none(f.get('fps')),
'tbr': int_or_none(f.get('bitrate')),
})
m3u8_url = config_files.get('hls', {}).get('url')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
subtitles = {}
text_tracks = config['request'].get('text_tracks')
if text_tracks:
for tt in text_tracks:
subtitles[tt['lang']] = [{
'ext': 'vtt',
'url': 'https://vimeo.com' + tt['url'],
}]
return {
'title': video_title,
'uploader': video_uploader,
'uploader_id': video_uploader_id,
'uploader_url': video_uploader_url,
'thumbnail': video_thumbnail,
'duration': video_duration,
'formats': formats,
'subtitles': subtitles,
}
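The comment above is why _vimeo_sort_formats excludes tbr from the comparison: the reported bitrates mix kbps and bps, so ranking falls back to preference/height/width/fps/format_id. A much-simplified sketch of sorting on such a field preference (youtube-dl's real _sort_formats does far more normalization):

def vimeo_sort_key(f, fields=('preference', 'height', 'width', 'fps', 'format_id')):
    # Missing fields sort as -1; tbr is deliberately not in the tuple because
    # the values are unreliable.
    return tuple(f.get(k) if f.get(k) is not None else -1 for k in fields)

formats = [
    {'format_id': 'http-360p', 'height': 360, 'tbr': 790000},  # bogus bps value
    {'format_id': 'http-720p', 'height': 720, 'tbr': 2500},    # kbps value
]
formats.sort(key=vimeo_sort_key)  # ascending, best format last
print([f['format_id'] for f in formats])  # ['http-360p', 'http-720p']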
class VimeoIE(VimeoBaseInfoExtractor):
    """Information extractor for vimeo.com."""
@@ -81,7 +146,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                        \.
                    )?
                    vimeo(?P<pro>pro)?\.com/
-                    (?!channels/[^/?#]+/?(?:$|[?#])|[^/]+/review/|(?:album|ondemand)/)
+                    (?!(?:channels|album)/[^/?#]+/?(?:$|[?#])|[^/]+/review/|ondemand/)
                    (?:.*?/)?
                    (?:
                        (?:
@@ -153,7 +218,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                'uploader_id': 'user18948128',
                'uploader': 'Jaime Marquínez Ferrándiz',
                'duration': 10,
-                'description': 'This is "youtube-dl password protected test video" by Jaime Marquínez Ferrándiz on Vimeo, the home for high quality videos and the people\u2026',
+                'description': 'This is "youtube-dl password protected test video" by on Vimeo, the home for high quality videos and the people who love them.',
            },
            'params': {
                'videopassword': 'youtube-dl',
@@ -162,8 +227,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
        {
            'url': 'http://vimeo.com/channels/keypeele/75629013',
            'md5': '2f86a05afe9d7abc0b9126d229bbe15d',
-            'note': 'Video is freely available via original URL '
-                    'and protected with password when accessed via http://vimeo.com/75629013',
            'info_dict': {
                'id': '75629013',
                'ext': 'mp4',
@@ -207,7 +270,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
        {
            # contains original format
            'url': 'https://vimeo.com/33951933',
-            'md5': '53c688fa95a55bf4b7293d37a89c5c53',
+            'md5': '2d9f5475e0537f013d0073e812ab89e6',
            'info_dict': {
                'id': '33951933',
                'ext': 'mp4',
@@ -219,6 +282,29 @@ class VimeoIE(VimeoBaseInfoExtractor):
                'description': 'md5:ae23671e82d05415868f7ad1aec21147',
            },
        },
{
# only available via https://vimeo.com/channels/tributes/6213729 and
# not via https://vimeo.com/6213729
'url': 'https://vimeo.com/channels/tributes/6213729',
'info_dict': {
'id': '6213729',
'ext': 'mp4',
'title': 'Vimeo Tribute: The Shining',
'uploader': 'Casey Donahue',
'uploader_url': 're:https?://(?:www\.)?vimeo\.com/caseydonahue',
'uploader_id': 'caseydonahue',
'upload_date': '20090821',
'description': 'md5:bdbf314014e58713e6e5b66eb252f4a6',
},
'params': {
'skip_download': True,
},
'expected_warnings': ['Unable to download JSON metadata'],
},
{
'url': 'http://vimeo.com/moogaloop.swf?clip_id=2539741',
'only_matching': True,
},
        {
            'url': 'https://vimeo.com/109815029',
            'note': 'Video not completely processed, "failed" seed status',
@@ -228,6 +314,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
            'url': 'https://vimeo.com/groups/travelhd/videos/22439234',
            'only_matching': True,
        },
+        {
+            'url': 'https://vimeo.com/album/2632481/video/79010983',
+            'only_matching': True,
+        },
        {
            # source file returns 403: Forbidden
            'url': 'https://vimeo.com/7809605',
@@ -304,7 +394,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
        orig_url = url
        if mobj.group('pro') or mobj.group('player'):
            url = 'https://player.vimeo.com/video/' + video_id
-        else:
+        elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
            url = 'https://vimeo.com/' + video_id

        # Retrieve video webpage to extract further information
@@ -382,28 +472,24 @@ class VimeoIE(VimeoBaseInfoExtractor):
        if config.get('view') == 4:
            config = self._verify_player_video_password(url, video_id)

-        if '>You rented this title.<' in webpage:
+        def is_rented():
+            if '>You rented this title.<' in webpage:
+                return True
+            if config.get('user', {}).get('purchased'):
+                return True
+            label = try_get(
+                config, lambda x: x['video']['vod']['purchase_options'][0]['label_string'], compat_str)
+            if label and label.startswith('You rented this'):
+                return True
+            return False
+
+        if is_rented():
            feature_id = config.get('video', {}).get('vod', {}).get('feature_id')
            if feature_id and not data.get('force_feature_id', False):
                return self.url_result(smuggle_url(
                    'https://player.vimeo.com/player/%s' % feature_id,
                    {'force_feature_id': True}), 'Vimeo')

-        # Extract title
-        video_title = config['video']['title']
-
-        # Extract uploader, uploader_url and uploader_id
-        video_uploader = config['video'].get('owner', {}).get('name')
-        video_uploader_url = config['video'].get('owner', {}).get('url')
-        video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
-
-        # Extract video thumbnail
-        video_thumbnail = config['video'].get('thumbnail')
-        if video_thumbnail is None:
-            video_thumbs = config['video'].get('thumbs')
-            if video_thumbs and isinstance(video_thumbs, dict):
-                _, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]

        # Extract video description
        video_description = self._html_search_regex(
@@ -423,9 +509,6 @@ class VimeoIE(VimeoBaseInfoExtractor):
        if not video_description and not mobj.group('player'):
            self._downloader.report_warning('Cannot find video description')

-        # Extract video duration
-        video_duration = int_or_none(config['video'].get('duration'))
-
        # Extract upload date
        video_upload_date = None
        mobj = re.search(r'<time[^>]+datetime="([^"]+)"', webpage)
@@ -463,53 +546,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
                    'format_id': source_name,
                    'preference': 1,
                })

-        config_files = config['video'].get('files') or config['request'].get('files', {})
-        for f in config_files.get('progressive', []):
-            video_url = f.get('url')
-            if not video_url:
-                continue
-            formats.append({
-                'url': video_url,
-                'format_id': 'http-%s' % f.get('quality'),
-                'width': int_or_none(f.get('width')),
-                'height': int_or_none(f.get('height')),
-                'fps': int_or_none(f.get('fps')),
-                'tbr': int_or_none(f.get('bitrate')),
-            })
-        m3u8_url = config_files.get('hls', {}).get('url')
-        if m3u8_url:
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
-        # Bitrates are completely broken. Single m3u8 may contain entries in kbps and
-        # bps at the same time without actual units specified. This leads to wrong sorting.
-        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'format_id'))
-
-        subtitles = {}
-        text_tracks = config['request'].get('text_tracks')
-        if text_tracks:
-            for tt in text_tracks:
-                subtitles[tt['lang']] = [{
-                    'ext': 'vtt',
-                    'url': 'https://vimeo.com' + tt['url'],
-                }]
-
-        return {
+        info_dict = self._parse_config(config, video_id)
+        formats.extend(info_dict['formats'])
+        self._vimeo_sort_formats(formats)
+        info_dict.update({
            'id': video_id,
-            'uploader': video_uploader,
-            'uploader_url': video_uploader_url,
-            'uploader_id': video_uploader_id,
-            'upload_date': video_upload_date,
-            'title': video_title,
-            'thumbnail': video_thumbnail,
-            'description': video_description,
-            'duration': video_duration,
            'formats': formats,
+            'upload_date': video_upload_date,
+            'description': video_description,
            'webpage_url': url,
            'view_count': view_count,
            'like_count': like_count,
            'comment_count': comment_count,
-            'subtitles': subtitles,
-        }
+        })
+
+        return info_dict


class VimeoOndemandIE(VimeoBaseInfoExtractor):
@@ -603,8 +655,21 @@ class VimeoChannelIE(VimeoBaseInfoExtractor):
        webpage = self._login_list_password(page_url, list_id, webpage)
        yield self._extract_list_title(webpage)

-        for video_id in re.findall(r'id="clip_(\d+?)"', webpage):
-            yield self.url_result('https://vimeo.com/%s' % video_id, 'Vimeo')
+        # Try extracting href first since not all videos are available via
+        # short https://vimeo.com/id URL (e.g. https://vimeo.com/channels/tributes/6213729)
+        clips = re.findall(
+            r'id="clip_(\d+)"[^>]*>\s*<a[^>]+href="(/(?:[^/]+/)*\1)', webpage)
+        if clips:
+            for video_id, video_url in clips:
+                yield self.url_result(
+                    compat_urlparse.urljoin(base_url, video_url),
+                    VimeoIE.ie_key(), video_id=video_id)
+        # More relaxed fallback
+        else:
+            for video_id in re.findall(r'id=["\']clip_(\d+)', webpage):
+                yield self.url_result(
+                    'https://vimeo.com/%s' % video_id,
+                    VimeoIE.ie_key(), video_id=video_id)

        if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
            break
@@ -641,7 +706,7 @@ class VimeoUserIE(VimeoChannelIE):

class VimeoAlbumIE(VimeoChannelIE):
    IE_NAME = 'vimeo:album'
-    _VALID_URL = r'https://vimeo\.com/album/(?P<id>\d+)'
+    _VALID_URL = r'https://vimeo\.com/album/(?P<id>\d+)(?:$|[?#]|/(?!video))'
    _TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
    _TESTS = [{
        'url': 'https://vimeo.com/album/2632481',
@@ -661,6 +726,13 @@ class VimeoAlbumIE(VimeoChannelIE):
        'params': {
            'videopassword': 'youtube-dl',
        }
+    }, {
+        'url': 'https://vimeo.com/album/2632481/sort:plays/format:thumbnail',
+        'only_matching': True,
+    }, {
+        # TODO: respect page number
+        'url': 'https://vimeo.com/album/2632481/page:2/sort:plays/format:thumbnail',
+        'only_matching': True,
+    }]

    def _page_url(self, base_url, pagenum):
@@ -692,7 +764,7 @@ class VimeoGroupsIE(VimeoAlbumIE):
        return self._extract_videos(name, 'https://vimeo.com/groups/%s' % name)


-class VimeoReviewIE(InfoExtractor):
+class VimeoReviewIE(VimeoBaseInfoExtractor):
    IE_NAME = 'vimeo:review'
    IE_DESC = 'Review pages on vimeo'
    _VALID_URL = r'https://vimeo\.com/[^/]+/review/(?P<id>[^/]+)'
@@ -704,6 +776,7 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
            'ext': 'mp4',
            'title': "DICK HARDWICK 'Comedian'",
            'uploader': 'Richard Hardwick',
+            'uploader_id': 'user21297594',
        }
    }, {
        'note': 'video player needs Referer',
@@ -716,14 +789,18 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
            'uploader': 'DevWeek Events',
            'duration': 2773,
            'thumbnail': 're:^https?://.*\.jpg$',
+            'uploader_id': 'user22258446',
        }
    }]

    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        player_url = 'https://player.vimeo.com/player/' + video_id
-        return self.url_result(player_url, 'Vimeo', video_id)
+        video_id = self._match_id(url)
+        config = self._download_json(
+            'https://player.vimeo.com/video/%s/config' % video_id, video_id)
+        info_dict = self._parse_config(config, video_id)
+        self._vimeo_sort_formats(info_dict['formats'])
+        info_dict['id'] = video_id
+        return info_dict


class VimeoWatchLaterIE(VimeoChannelIE):

View File

@@ -24,6 +24,7 @@
            'upload_date': '20130519',
            'uploader': 'Jack Dorsey',
            'uploader_id': '76',
+            'view_count': int,
            'like_count': int,
            'comment_count': int,
            'repost_count': int,
@@ -39,6 +40,7 @@ class VineIE(InfoExtractor):
            'upload_date': '20140815',
            'uploader': 'Mars Ruiz',
            'uploader_id': '1102363502380728320',
+            'view_count': int,
            'like_count': int,
            'comment_count': int,
            'repost_count': int,
@@ -54,6 +56,7 @@ class VineIE(InfoExtractor):
            'upload_date': '20130430',
            'uploader': 'Z3k3',
            'uploader_id': '936470460173008896',
+            'view_count': int,
            'like_count': int,
            'comment_count': int,
            'repost_count': int,
@@ -71,6 +74,7 @@ class VineIE(InfoExtractor):
            'upload_date': '20150705',
            'uploader': 'Pimry_zaa',
            'uploader_id': '1135760698325307392',
+            'view_count': int,
            'like_count': int,
            'comment_count': int,
            'repost_count': int,
@@ -109,6 +113,7 @@ class VineIE(InfoExtractor):
            'upload_date': unified_strdate(data.get('created')),
            'uploader': username,
            'uploader_id': data.get('userIdStr'),
+            'view_count': int_or_none(data.get('loops', {}).get('count')),
            'like_count': int_or_none(data.get('likes', {}).get('count')),
            'comment_count': int_or_none(data.get('comments', {}).get('count')),
            'repost_count': int_or_none(data.get('reposts', {}).get('count')),

View File

@@ -3,6 +3,7 @@ from __future__ import unicode_literals

import re
import json
+import sys

from .common import InfoExtractor
from ..compat import compat_str
@@ -10,7 +11,6 @@ from ..utils import (
    ExtractorError,
    int_or_none,
    orderedSet,
-    sanitized_Request,
    str_to_int,
    unescapeHTML,
    unified_strdate,
@@ -190,7 +190,7 @@ class VKIE(InfoExtractor):
        if username is None:
            return

-        login_page = self._download_webpage(
+        login_page, url_handle = self._download_webpage_handle(
            'https://vk.com', None, 'Downloading login page')

        login_form = self._hidden_inputs(login_page)
@@ -200,11 +200,26 @@
            'pass': password.encode('cp1251'),
        })

-        request = sanitized_Request(
-            'https://login.vk.com/?act=login',
-            urlencode_postdata(login_form))
+        # https://new.vk.com/ serves two remixlhk cookies with the same name in
+        # its Set-Cookie header and expects the first one to be set, not the
+        # second (see https://github.com/rg3/youtube-dl/issues/9841#issuecomment-227871201).
+        # Per RFC 6265 the newer cookie is the one that ends up in the cookie
+        # store, which is what actually happens. Work around this VK issue by
+        # manually resetting remixlhk to the first value.
+        cookies = url_handle.headers.get('Set-Cookie')
+        if sys.version_info[0] >= 3:
+            cookies = cookies.encode('iso-8859-1')
+        cookies = cookies.decode('utf-8')
+        remixlhk = re.search(r'remixlhk=(.+?);.*?\bdomain=(.+?)(?:[,;]|$)', cookies)
+        if remixlhk:
+            value, domain = remixlhk.groups()
+            self._set_cookie(domain, 'remixlhk', value)

        login_page = self._download_webpage(
-            request, None, note='Logging in as %s' % username)
+            'https://login.vk.com/?act=login', None,
+            note='Logging in as %s' % username,
+            data=urlencode_postdata(login_form))

        if re.search(r'onLoginFailed', login_page):
            raise ExtractorError(
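The workaround depends on re.search returning the first remixlhk occurrence in the combined Set-Cookie header, i.e. the cookie VK actually expects, which is then re-set over the RFC 6265 winner. A standalone sketch with a hypothetical header value:

import re

def first_remixlhk(set_cookie_header):
    # re.search finds the first remixlhk in the combined header, which is the
    # one that must win despite RFC 6265 favouring the later duplicate.
    m = re.search(r'remixlhk=(.+?);.*?\bdomain=(.+?)(?:[,;]|$)', set_cookie_header)
    if m:
        return m.group(1), m.group(2)

header = ('remixlhk=good; domain=.vk.com; path=/, '
          'remixlhk=stale; domain=.vk.com; path=/')  # hypothetical header value
print(first_remixlhk(header))  # ('good', '.vk.com')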

View File

@@ -15,7 +15,87 @@ from ..utils import (
)


-class WDRIE(InfoExtractor):
+class WDRBaseIE(InfoExtractor):
def _extract_wdr_video(self, webpage, display_id):
# for wdr.de the data-extension is in a tag with the class "mediaLink"
# for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
# for wdrmaus it's in a link to the page in a multiline "videoLink"-tag
json_metadata = self._html_search_regex(
r'class=(?:"(?:mediaLink|wdrrPlayerPlayBtn)\b[^"]*"[^>]+|"videoLink\b[^"]*"[\s]*>\n[^\n]*)data-extension="([^"]+)"',
webpage, 'media link', default=None, flags=re.MULTILINE)
if not json_metadata:
return
media_link_obj = self._parse_json(json_metadata, display_id,
transform_source=js_to_json)
jsonp_url = media_link_obj['mediaObj']['url']
metadata = self._download_json(
jsonp_url, 'metadata', transform_source=strip_jsonp)
metadata_tracker_data = metadata['trackerData']
metadata_media_resource = metadata['mediaResource']
formats = []
# check if the metadata contains a direct URL to a file
for kind, media_resource in metadata_media_resource.items():
if kind not in ('dflt', 'alt'):
continue
for tag_name, medium_url in media_resource.items():
if tag_name not in ('videoURL', 'audioURL'):
continue
ext = determine_ext(medium_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
medium_url, display_id, 'mp4', 'm3u8_native',
m3u8_id='hls'))
elif ext == 'f4m':
manifest_url = update_url_query(
medium_url, {'hdcore': '3.2.0', 'plugin': 'aasp-3.2.0.77.18'})
formats.extend(self._extract_f4m_formats(
manifest_url, display_id, f4m_id='hds', fatal=False))
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
medium_url, 'stream', fatal=False))
else:
a_format = {
'url': medium_url
}
if ext == 'unknown_video':
urlh = self._request_webpage(
medium_url, display_id, note='Determining extension')
ext = urlhandle_detect_ext(urlh)
a_format['ext'] = ext
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
caption_url = metadata_media_resource.get('captionURL')
if caption_url:
subtitles['de'] = [{
'url': caption_url,
'ext': 'ttml',
}]
title = metadata_tracker_data['trackerClipTitle']
return {
'id': metadata_tracker_data.get('trackerClipId', display_id),
'display_id': display_id,
'title': title,
'alt_title': metadata_tracker_data.get('trackerClipSubcategory'),
'formats': formats,
'subtitles': subtitles,
'upload_date': unified_strdate(metadata_tracker_data.get('trackerClipAirTime')),
}
class WDRIE(WDRBaseIE):
    _CURRENT_MAUS_URL = r'https?://(?:www\.)wdrmaus.de/(?:[^/]+/){1,2}[^/?#]+\.php5'
    _PAGE_REGEX = r'/(?:mediathek/)?[^/]+/(?P<type>[^/]+)/(?P<display_id>.+)\.html'
    _VALID_URL = r'(?P<page_url>https?://(?:www\d\.)?wdr\d?\.de)' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL
@@ -91,10 +171,10 @@ class WDRIE(InfoExtractor):
        },
        {
            'url': 'http://www.wdrmaus.de/sachgeschichten/sachgeschichten/achterbahn.php5',
-            # HDS download, MD5 is unstable
+            'md5': '803138901f6368ee497b4d195bb164f2',
            'info_dict': {
                'id': 'mdb-186083',
-                'ext': 'flv',
+                'ext': 'mp4',
                'upload_date': '20130919',
                'title': 'Sachgeschichte - Achterbahn ',
                'description': '- Die Sendung mit der Maus -',
@@ -120,14 +200,9 @@
        display_id = mobj.group('display_id')
        webpage = self._download_webpage(url, display_id)

-        # for wdr.de the data-extension is in a tag with the class "mediaLink"
-        # for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
-        # for wdrmaus it's in a link to the page in a multiline "videoLink"-tag
-        json_metadata = self._html_search_regex(
-            r'class=(?:"(?:mediaLink|wdrrPlayerPlayBtn)\b[^"]*"[^>]+|"videoLink\b[^"]*"[\s]*>\n[^\n]*)data-extension="([^"]+)"',
-            webpage, 'media link', default=None, flags=re.MULTILINE)
+        info_dict = self._extract_wdr_video(webpage, display_id)

-        if not json_metadata:
+        if not info_dict:
            entries = [
                self.url_result(page_url + href[0], 'WDR')
                for href in re.findall(
@@ -140,86 +215,22 @@

            raise ExtractorError('No downloadable streams found', expected=True)

-        media_link_obj = self._parse_json(json_metadata, display_id,
-                                          transform_source=js_to_json)
-        jsonp_url = media_link_obj['mediaObj']['url']
-
-        metadata = self._download_json(
-            jsonp_url, 'metadata', transform_source=strip_jsonp)
-
-        metadata_tracker_data = metadata['trackerData']
-        metadata_media_resource = metadata['mediaResource']
-
-        formats = []
-
-        # check if the metadata contains a direct URL to a file
-        for kind, media_resource in metadata_media_resource.items():
-            if kind not in ('dflt', 'alt'):
-                continue
-
-            for tag_name, medium_url in media_resource.items():
-                if tag_name not in ('videoURL', 'audioURL'):
-                    continue
-
-                ext = determine_ext(medium_url)
-                if ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        medium_url, display_id, 'mp4', 'm3u8_native',
-                        m3u8_id='hls'))
-                elif ext == 'f4m':
-                    manifest_url = update_url_query(
-                        medium_url, {'hdcore': '3.2.0', 'plugin': 'aasp-3.2.0.77.18'})
-                    formats.extend(self._extract_f4m_formats(
-                        manifest_url, display_id, f4m_id='hds', fatal=False))
-                elif ext == 'smil':
-                    formats.extend(self._extract_smil_formats(
-                        medium_url, 'stream', fatal=False))
-                else:
-                    a_format = {
-                        'url': medium_url
-                    }
-                    if ext == 'unknown_video':
-                        urlh = self._request_webpage(
-                            medium_url, display_id, note='Determining extension')
-                        ext = urlhandle_detect_ext(urlh)
-                        a_format['ext'] = ext
-                    formats.append(a_format)
-
-        self._sort_formats(formats)
-
-        subtitles = {}
-        caption_url = metadata_media_resource.get('captionURL')
-        if caption_url:
-            subtitles['de'] = [{
-                'url': caption_url,
-                'ext': 'ttml',
-            }]
-
-        title = metadata_tracker_data.get('trackerClipTitle')
        is_live = url_type == 'live'

        if is_live:
-            title = self._live_title(title)
-            upload_date = None
-        elif 'trackerClipAirTime' in metadata_tracker_data:
-            upload_date = metadata_tracker_data['trackerClipAirTime']
-        else:
-            upload_date = self._html_search_meta('DC.Date', webpage, 'upload date')
-
-        if upload_date:
-            upload_date = unified_strdate(upload_date)
-
-        return {
-            'id': metadata_tracker_data.get('trackerClipId', display_id),
-            'display_id': display_id,
-            'title': title,
-            'alt_title': metadata_tracker_data.get('trackerClipSubcategory'),
-            'formats': formats,
-            'upload_date': upload_date,
+            info_dict.update({
+                'title': self._live_title(info_dict['title']),
+                'upload_date': None,
+            })
+        elif 'upload_date' not in info_dict:
+            info_dict['upload_date'] = unified_strdate(self._html_search_meta('DC.Date', webpage, 'upload date'))

+        info_dict.update({
            'description': self._html_search_meta('Description', webpage),
            'is_live': is_live,
-            'subtitles': subtitles,
-        }
+        })
+
+        return info_dict


class WDRMobileIE(InfoExtractor):

youtube_dl/extractor/wimp.py

@@ -1,29 +1,33 @@
 from __future__ import unicode_literals

-from .common import InfoExtractor
 from .youtube import YoutubeIE
+from .jwplatform import JWPlatformBaseIE


-class WimpIE(InfoExtractor):
+class WimpIE(JWPlatformBaseIE):
     _VALID_URL = r'https?://(?:www\.)?wimp\.com/(?P<id>[^/]+)'
     _TESTS = [{
-        'url': 'http://www.wimp.com/maruexhausted/',
+        'url': 'http://www.wimp.com/maru-is-exhausted/',
         'md5': 'ee21217ffd66d058e8b16be340b74883',
         'info_dict': {
-            'id': 'maruexhausted',
+            'id': 'maru-is-exhausted',
             'ext': 'mp4',
             'title': 'Maru is exhausted.',
             'description': 'md5:57e099e857c0a4ea312542b684a869b8',
         }
     }, {
         'url': 'http://www.wimp.com/clowncar/',
-        'md5': '4e2986c793694b55b37cf92521d12bb4',
+        'md5': '5c31ad862a90dc5b1f023956faec13fe',
         'info_dict': {
-            'id': 'clowncar',
+            'id': 'cG4CEr2aiSg',
             'ext': 'webm',
-            'title': 'It\'s like a clown car.',
-            'description': 'md5:0e56db1370a6e49c5c1d19124c0d2fb2',
+            'title': 'Basset hound clown car...incredible!',
+            'description': '5 of my Bassets crawled in this dog loo! www.bellinghambassets.com\n\nFor licensing/usage please contact: licensing(at)jukinmediadotcom',
+            'upload_date': '20140303',
+            'uploader': 'Gretchen Hoey',
+            'uploader_id': 'gretchenandjeff1',
         },
+        'add_ie': ['Youtube'],
     }]

     def _real_extract(self, url):
@@ -41,14 +45,13 @@ class WimpIE(InfoExtractor):
             'ie_key': YoutubeIE.ie_key(),
         }

-        video_url = self._search_regex(
-            r'<video[^>]+>\s*<source[^>]+src=(["\'])(?P<url>.+?)\1',
-            webpage, 'video URL', group='url')
+        info_dict = self._extract_jwplayer_data(
+            webpage, video_id, require_title=False)

-        return {
+        info_dict.update({
             'id': video_id,
-            'url': video_url,
             'title': self._og_search_title(webpage),
-            'thumbnail': self._og_search_thumbnail(webpage),
             'description': self._og_search_description(webpage),
-        }
+        })
+
+        return info_dict

youtube_dl/extractor/wrzuta.py

@@ -5,8 +5,10 @@ import re

 from .common import InfoExtractor
 from ..utils import (
+    ExtractorError,
     int_or_none,
     qualities,
+    remove_start,
 )
@@ -26,16 +28,17 @@ class WrzutaIE(InfoExtractor):
             'uploader_id': 'laboratoriumdextera',
             'description': 'md5:7fb5ef3c21c5893375fda51d9b15d9cd',
         },
+        'skip': 'Redirected to wrzuta.pl',
     }, {
-        'url': 'http://jolka85.wrzuta.pl/audio/063jOPX5ue2/liber_natalia_szroeder_-_teraz_ty',
-        'md5': 'bc78077859bea7bcfe4295d7d7fc9025',
+        'url': 'http://vexling.wrzuta.pl/audio/01xBFabGXu6/james_horner_-_into_the_na_39_vi_world_bonus',
+        'md5': 'f80564fb5a2ec6ec59705ae2bf2ba56d',
         'info_dict': {
-            'id': '063jOPX5ue2',
-            'ext': 'ogg',
-            'title': 'Liber & Natalia Szroeder - Teraz Ty',
-            'duration': 203,
-            'uploader_id': 'jolka85',
-            'description': 'md5:2d2b6340f9188c8c4cd891580e481096',
+            'id': '01xBFabGXu6',
+            'ext': 'mp3',
+            'title': 'James Horner - Into The Na\'vi World [Bonus]',
+            'description': 'md5:30a70718b2cd9df3120fce4445b0263b',
+            'duration': 95,
+            'uploader_id': 'vexling',
         },
     }]
@@ -45,7 +48,10 @@ class WrzutaIE(InfoExtractor):
         typ = mobj.group('typ')
         uploader = mobj.group('uploader')

-        webpage = self._download_webpage(url, video_id)
+        webpage, urlh = self._download_webpage_handle(url, video_id)
+
+        if urlh.geturl() == 'http://www.wrzuta.pl/':
+            raise ExtractorError('Video removed', expected=True)

         quality = qualities(['SD', 'MQ', 'HQ', 'HD'])
@@ -80,3 +86,73 @@ class WrzutaIE(InfoExtractor):
             'description': self._og_search_description(webpage),
             'age_limit': embedpage.get('minimalAge', 0),
         }
+
+
+class WrzutaPlaylistIE(InfoExtractor):
+    """
+    Extracts wrzuta.pl playlist entries in the following steps:
+    * read the playlist size from the playlist webpage
+    * collect the entries linked directly in that page (the playlist is
+      split in two parts: the first is rendered in the page itself, the
+      second is fetched on demand by an AJAX call and rendered from the
+      AJAX response)
+    * if fewer entries were collected than the playlist size indicates,
+      use the AJAX call to fetch the remaining ones
+    """
+    IE_NAME = 'wrzuta.pl:playlist'
+    _VALID_URL = r'https?://(?P<uploader>[0-9a-zA-Z]+)\.wrzuta\.pl/playlista/(?P<id>[0-9a-zA-Z]+)'
+    _TESTS = [{
+        'url': 'http://miromak71.wrzuta.pl/playlista/7XfO4vE84iR/moja_muza',
+        'playlist_mincount': 14,
+        'info_dict': {
+            'id': '7XfO4vE84iR',
+            'title': 'Moja muza',
+        },
+    }, {
+        'url': 'http://heroesf70.wrzuta.pl/playlista/6Nj3wQHx756/lipiec_-_lato_2015_muzyka_swiata',
+        'playlist_mincount': 144,
+        'info_dict': {
+            'id': '6Nj3wQHx756',
+            'title': 'Lipiec - Lato 2015 Muzyka Świata',
+        },
+    }, {
+        'url': 'http://miromak71.wrzuta.pl/playlista/7XfO4vE84iR',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        playlist_id = mobj.group('id')
+        uploader = mobj.group('uploader')
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        playlist_size = int_or_none(self._html_search_regex(
+            (r'<div[^>]+class=["\']playlist-counter["\'][^>]*>\d+/(\d+)',
+             r'<div[^>]+class=["\']all-counter["\'][^>]*>(.+?)</div>'),
+            webpage, 'playlist size', default=None))
+
+        playlist_title = remove_start(
+            self._og_search_title(webpage), 'Playlista: ')
+
+        entries = []
+        if playlist_size:
+            entries = [
+                self.url_result(entry_url)
+                for _, entry_url in re.findall(
+                    r'<a[^>]+href=(["\'])(http.+?)\1[^>]+class=["\']playlist-file-page',
+                    webpage)]
+            if playlist_size > len(entries):
+                playlist_content = self._download_json(
+                    'http://%s.wrzuta.pl/xhr/get_playlist_offset/%s' % (uploader, playlist_id),
+                    playlist_id,
+                    'Downloading playlist JSON',
+                    'Unable to download playlist JSON')
+                entries.extend([
+                    self.url_result(entry['filelink'])
+                    for entry in playlist_content.get('files', []) if entry.get('filelink')])
+
+        return self.playlist_result(entries, playlist_id, playlist_title)
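
The AJAX fallback above fires when the page renders fewer entries than the playlist counter reports. A minimal standalone sketch of that second phase, assuming the same get_playlist_offset endpoint and the {"files": [{"filelink": ...}]} response shape used by the extractor (error handling omitted):

    import json
    import urllib.request

    def remaining_entries(uploader, playlist_id):
        # Same AJAX endpoint the extractor falls back to; the response
        # shape is taken from the code above.
        url = 'http://%s.wrzuta.pl/xhr/get_playlist_offset/%s' % (uploader, playlist_id)
        with urllib.request.urlopen(url) as resp:
            data = json.loads(resp.read().decode('utf-8'))
        # Keep only entries that actually carry a link, as the extractor does.
        return [f['filelink'] for f in data.get('files', []) if f.get('filelink')]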

youtube_dl/extractor/xfileshare.py

@@ -5,8 +5,10 @@ import re

 from .common import InfoExtractor
 from ..utils import (
+    decode_packed_codes,
     ExtractorError,
     int_or_none,
+    NO_DEFAULT,
     sanitized_Request,
     urlencode_postdata,
 )
@@ -23,20 +25,24 @@ class XFileShareIE(InfoExtractor):
         ('thevideobee.to', 'TheVideoBee'),
         ('vidto.me', 'Vidto'),
         ('streamin.to', 'Streamin.To'),
+        ('xvidstage.com', 'XVIDSTAGE'),
     )

     IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1])
     _VALID_URL = (r'https?://(?P<host>(?:www\.)?(?:%s))/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
                   % '|'.join(re.escape(site) for site in list(zip(*_SITES))[0]))

-    _FILE_NOT_FOUND_REGEX = r'>(?:404 - )?File Not Found<'
+    _FILE_NOT_FOUND_REGEXES = (
+        r'>(?:404 - )?File Not Found<',
+        r'>The file was removed by administrator<',
+    )

     _TESTS = [{
         'url': 'http://gorillavid.in/06y9juieqpmi',
         'md5': '5ae4a3580620380619678ee4875893ba',
         'info_dict': {
             'id': '06y9juieqpmi',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ',
             'thumbnail': 're:http://.*\.jpg',
         },
@@ -78,6 +84,17 @@ class XFileShareIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Big Buck Bunny trailer',
         },
+    }, {
+        'url': 'http://xvidstage.com/e0qcnl03co6z',
+        'info_dict': {
+            'id': 'e0qcnl03co6z',
+            'ext': 'mp4',
+            'title': 'Chucky Prank 2015.mp4',
+        },
+    }, {
+        # removed by administrator
+        'url': 'http://xvidstage.com/amfy7atlkx25',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -87,7 +104,7 @@ class XFileShareIE(InfoExtractor):
         url = 'http://%s/%s' % (mobj.group('host'), video_id)
         webpage = self._download_webpage(url, video_id)

-        if re.search(self._FILE_NOT_FOUND_REGEX, webpage) is not None:
+        if any(re.search(p, webpage) for p in self._FILE_NOT_FOUND_REGEXES):
             raise ExtractorError('Video %s does not exist' % video_id, expected=True)

         fields = self._hidden_inputs(webpage)
@@ -113,10 +130,23 @@ class XFileShareIE(InfoExtractor):
             r'>Watch (.+) ',
             r'<h2 class="video-page-head">([^<]+)</h2>'],
             webpage, 'title', default=None) or self._og_search_title(webpage)).strip()

-        video_url = self._search_regex(
-            [r'file\s*:\s*["\'](http[^"\']+)["\'],',
-             r'file_link\s*=\s*\'(https?:\/\/[0-9a-zA-z.\/\-_]+)'],
-            webpage, 'file url')
+        def extract_video_url(default=NO_DEFAULT):
+            return self._search_regex(
+                (r'file\s*:\s*(["\'])(?P<url>http.+?)\1,',
+                 r'file_link\s*=\s*(["\'])(?P<url>http.+?)\1',
+                 r'addVariable\((\\?["\'])file\1\s*,\s*(\\?["\'])(?P<url>http.+?)\2\)',
+                 r'<embed[^>]+src=(["\'])(?P<url>http.+?)\1'),
+                webpage, 'file url', default=default, group='url')
+
+        video_url = extract_video_url(default=None)
+
+        if not video_url:
+            webpage = decode_packed_codes(self._search_regex(
+                r"(}\('(.+)',(\d+),(\d+),'[^']*\b(?:file|embed)\b[^']*'\.split\('\|'\))",
+                webpage, 'packed code'))
+            video_url = extract_video_url()

         thumbnail = self._search_regex(
             r'image\s*:\s*["\'](http[^"\']+)["\'],', webpage, 'thumbnail', default=None)
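
The new fallback handles pages that hide the player setup behind Dean Edwards style eval(function(p,a,c,k,e,d){...}) packing: the regex captures the packed payload (requiring file or embed among its symbols), decode_packed_codes substitutes the symbol table back into the obfuscated source, and the URL regexes are retried on the result. A toy round trip with a hand-made blob (not taken from any real site):

    from youtube_dl.utils import decode_packed_codes

    # Tokens 0..2 (base 3, 3 symbols) map back to the words in the
    # symbol list, recovering:  var file=http://x/v.mp4;
    packed = "}('2 1=0;',3,3,'http://x/v.mp4|file|var'.split('|'))"
    print(decode_packed_codes(packed))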

youtube_dl/extractor/xnxx.py

@@ -6,17 +6,23 @@ from ..compat import compat_urllib_parse_unquote

 class XNXXIE(InfoExtractor):
-    _VALID_URL = r'^https?://(?:video|www)\.xnxx\.com/video(?P<id>[0-9]+)/(.*)'
-    _TEST = {
-        'url': 'http://video.xnxx.com/video1135332/lida_naked_funny_actress_5_',
-        'md5': '0831677e2b4761795f68d417e0b7b445',
+    _VALID_URL = r'https?://(?:video|www)\.xnxx\.com/video-?(?P<id>[0-9a-z]+)/'
+    _TESTS = [{
+        'url': 'http://www.xnxx.com/video-55awb78/skyrim_test_video',
+        'md5': 'ef7ecee5af78f8b03dca2cf31341d3a0',
         'info_dict': {
-            'id': '1135332',
+            'id': '55awb78',
             'ext': 'flv',
-            'title': 'lida » Naked Funny Actress (5)',
+            'title': 'Skyrim Test Video',
             'age_limit': 18,
-        }
-    }
+        },
+    }, {
+        'url': 'http://video.xnxx.com/video1135332/lida_naked_funny_actress_5_',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.xnxx.com/video-55awb78/',
+        'only_matching': True,
+    }]

     def _real_extract(self, url):
         video_id = self._match_id(url)
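
The relaxed _VALID_URL accepts both the legacy all-digit IDs and the newer dashed alphanumeric ones; a quick check against the two test URLs above:

    import re

    pattern = r'https?://(?:video|www)\.xnxx\.com/video-?(?P<id>[0-9a-z]+)/'
    for u in ('http://video.xnxx.com/video1135332/lida_naked_funny_actress_5_',
              'http://www.xnxx.com/video-55awb78/skyrim_test_video'):
        print(re.match(pattern, u).group('id'))
    # 1135332
    # 55awb78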

youtube_dl/extractor/youporn.py

@@ -17,7 +17,7 @@ class YouPornIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?youporn\.com/watch/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
     _TESTS = [{
         'url': 'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
-        'md5': '71ec5fcfddacf80f495efa8b6a8d9a89',
+        'md5': '3744d24c50438cf5b6f6d59feb5055c2',
         'info_dict': {
             'id': '505835',
             'display_id': 'sex-ed-is-it-safe-to-masturbate-daily',
@@ -121,21 +121,21 @@ class YouPornIE(InfoExtractor):
             webpage, 'thumbnail', fatal=False, group='thumbnail')

         uploader = self._html_search_regex(
-            r'(?s)<div[^>]+class=["\']videoInfoBy(?:\s+[^"\']+)?["\'][^>]*>\s*By:\s*</div>(.+?)</(?:a|div)>',
+            r'(?s)<div[^>]+class=["\']submitByLink["\'][^>]*>(.+?)</div>',
             webpage, 'uploader', fatal=False)
         upload_date = unified_strdate(self._html_search_regex(
-            r'(?s)<div[^>]+class=["\']videoInfoTime["\'][^>]*>(.+?)</div>',
+            r'(?s)<div[^>]+class=["\']videoInfo(?:Date|Time)["\'][^>]*>(.+?)</div>',
             webpage, 'upload date', fatal=False))

         age_limit = self._rta_search(webpage)

         average_rating = int_or_none(self._search_regex(
-            r'<div[^>]+class=["\']videoInfoRating["\'][^>]*>\s*<div[^>]+class=["\']videoRatingPercentage["\'][^>]*>(\d+)%</div>',
+            r'<div[^>]+class=["\']videoRatingPercentage["\'][^>]*>(\d+)%</div>',
             webpage, 'average rating', fatal=False))

         view_count = str_to_int(self._search_regex(
-            r'(?s)<div[^>]+class=["\']videoInfoViews["\'][^>]*>.*?([\d,.]+)\s*</div>',
-            webpage, 'view count', fatal=False))
+            r'(?s)<div[^>]+class=(["\']).*?\bvideoInfoViews\b.*?\1[^>]*>.*?(?P<count>[\d,.]+)<',
+            webpage, 'view count', fatal=False, group='count'))
         comment_count = str_to_int(self._search_regex(
             r'>All [Cc]omments? \(([\d,.]+)\)',
             webpage, 'comment count', fatal=False))

youtube_dl/utils.py

@@ -76,7 +76,7 @@ def register_socks_protocols():
 compiled_regex_type = type(re.compile(''))

 std_headers = {
-    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/44.0 (Chrome)',
+    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)',
     'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
     'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
     'Accept-Encoding': 'gzip, deflate',
@@ -1901,6 +1901,16 @@ def dict_get(d, key_or_keys, default=None, skip_false_values=True):
     return d.get(key_or_keys, default)


+def try_get(src, getter, expected_type=None):
+    try:
+        v = getter(src)
+    except (AttributeError, KeyError, TypeError, IndexError):
+        pass
+    else:
+        if expected_type is None or isinstance(v, expected_type):
+            return v
+
+
 def encode_compat_str(string, encoding=preferredencoding(), errors='strict'):
     return string if isinstance(string, compat_str) else compat_str(string, encoding, errors)
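
The new try_get helper condenses the chained .get()/index guards extractors need for deeply nested, often missing JSON: the getter runs inside a try block, lookup errors yield None, and an optional type check filters out wrong-shaped values. Illustrative use (the payload is made up):

    data = {'player': {'tracks': [{'url': 'http://example.com/v.mp4'}]}}

    try_get(data, lambda x: x['player']['tracks'][0]['url'], compat_str)
    # -> 'http://example.com/v.mp4'
    try_get(data, lambda x: x['player']['hls'][0]['url'], compat_str)
    # -> None (the KeyError is swallowed)
    try_get(data, lambda x: x['player']['tracks'], compat_str)
    # -> None (present, but a list rather than compat_str)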
@@ -1960,7 +1970,7 @@ def js_to_json(code):
         '(?:[^'\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^'\\]*'|
         /\*.*?\*/|,(?=\s*[\]}])|
         [a-zA-Z_][.a-zA-Z_0-9]*|
-        (?:0[xX][0-9a-fA-F]+|0+[0-7]+)(?:\s*:)?|
+        \b(?:0[xX][0-9a-fA-F]+|0+[0-7]+)(?:\s*:)?|
         [0-9]+(?=\s*:)
         ''', fix_kv, code)
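
The added \b stops the octal/hex branch from matching inside a longer decimal literal: without it, 0+[0-7]+ can latch onto the trailing 00 of 100 and rewrite it as an octal number. Reduced to the relevant branch:

    import re

    s = 'var x = 100;'
    print(re.search(r'0+[0-7]+', s))    # matches the inner '00'
    print(re.search(r'\b0+[0-7]+', s))  # None; 100 is left untouched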
@@ -2842,3 +2852,12 @@ def decode_packed_codes(code):
     return re.sub(
         r'\b(\w+)\b', lambda mobj: symbol_table[mobj.group(0)],
         obfucasted_code)
+
+
+def parse_m3u8_attributes(attrib):
+    info = {}
+    for (key, val) in re.findall(r'(?P<key>[A-Z0-9-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)', attrib):
+        if val.startswith('"'):
+            val = val[1:-1]
+        info[key] = val
+    return info
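
parse_m3u8_attributes splits an EXT-X attribute list into a dict, keeping quoted values intact even when they contain commas:

    parse_m3u8_attributes(
        'BANDWIDTH=1280000,RESOLUTION=720x480,CODECS="avc1.4d401e,mp4a.40.2"')
    # -> {'BANDWIDTH': '1280000',
    #     'RESOLUTION': '720x480',
    #     'CODECS': 'avc1.4d401e,mp4a.40.2'}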

youtube_dl/version.py

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2016.06.11.2'
+__version__ = '2016.06.23'