Compare commits


1 Commits

Author SHA1 Message Date
4ea8d2ee76 release 2016.06.04 2016-06-04 22:42:10 +07:00
494 changed files with 8997 additions and 25387 deletions


@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.11.14.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.04*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.11.14.1** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.04**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.11.14.1 [debug] youtube-dl version 2016.06.04
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}
@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
### Description of your *issue*, suggested solution and other information ### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible. Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* requires account credentials please provide them or explain how one can obtain them. If work on your *issue* required an account credentials please provide them or explain how one can obtain them.


@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
### Description of your *issue*, suggested solution and other information ### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible. Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* requires account credentials please provide them or explain how one can obtain them. If work on your *issue* required an account credentials please provide them or explain how one can obtain them.


@ -1,27 +0,0 @@
## Please follow the guide below
- You will be asked some questions, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *pull request* (like that [x])
- Use *Preview* tab to see how your *pull request* will actually look like
---
### Before submitting a *pull request* make sure you have:
- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:
- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)
- [ ] I am not the original author of this code but it is in public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
### What is the purpose of your *pull request*?
- [ ] Bug fix
- [ ] Improvement
- [ ] New extractor
- [ ] New feature
---
### Description of your *pull request* and other information
Explanation of your *pull request* in arbitrary form goes here. Please make sure the description explains the purpose and effect of your *pull request* and is worded well enough to be understood. Provide as much context and examples as possible.

2
.gitignore vendored

@ -29,8 +29,6 @@ updates_key.pem
*.m4a *.m4a
*.m4v *.m4v
*.mp3 *.mp3
*.3gp
*.wav
*.part *.part
*.swp *.swp
test/testdata test/testdata


@ -7,6 +7,9 @@ python:
- "3.4" - "3.4"
- "3.5" - "3.5"
sudo: false sudo: false
install:
- bash ./devscripts/install_srelay.sh
- export PATH=$PATH:$(pwd)/tmp/srelay-0.4.8b6
script: nosetests test --verbose script: nosetests test --verbose
notifications: notifications:
email: email:

19
AUTHORS

@ -26,7 +26,7 @@ Albert Kim
Pierre Rudloff Pierre Rudloff
Huarong Huo Huarong Huo
Ismael Mejía Ismael Mejía
Steffan Donal Steffan 'Ruirize' James
Andras Elso Andras Elso
Jelle van der Waa Jelle van der Waa
Marcin Cieślak Marcin Cieślak
@ -173,20 +173,3 @@ Kevin Deldycke
inondle inondle
Tomáš Čech Tomáš Čech
Déstin Reed Déstin Reed
Roman Tsiupa
Artur Krysiak
Jakub Adam Wieczorek
Aleksandar Topuzović
Nehal Patel
Rob van Bekkum
Petr Zvoníček
Pratyush Singh
Aleksander Nitecki
Sebastian Blunt
Matěj Cepl
Xie Yanbo
Philip Xu
John Hawkinson
Rich Leeper
Zhong Jianxin
Thor77


@ -12,7 +12,7 @@ $ youtube-dl -v <your command line>
[debug] Proxy map: {} [debug] Proxy map: {}
... ...
``` ```
**Do not post screenshots of verbose logs; only plain text is acceptable.** **Do not post screenshots of verbose log only plain text is acceptable.**
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever. The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
@ -46,7 +46,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough? ### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem. Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report? ### Is there enough context in your bug report?
@ -66,7 +66,7 @@ Only post features that you (or an incapacitated friend you can personally talk
### Is your question about youtube-dl? ### Is your question about youtube-dl?
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug. It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
# DEVELOPER INSTRUCTIONS # DEVELOPER INSTRUCTIONS
@ -85,7 +85,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need If you want to create a build of youtube-dl yourself, you'll need
* python * python
* make (only GNU make is supported) * make (both GNU make and BSD make are supported)
* pandoc * pandoc
* zip * zip
* nosetests * nosetests
@ -97,17 +97,9 @@ If you want to add support for a new site, first of all **make sure** this site
After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`): After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
1. [Fork this repository](https://github.com/rg3/youtube-dl/fork) 1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
2. Check out the source code with: 2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
3. Start a new git branch with
cd youtube-dl
git checkout -b yourextractor
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`: 4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
```python ```python
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
@ -150,149 +142,17 @@ After you have ensured this site is distributing it's content legally, you can f
``` ```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+. 8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this: 9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py $ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py $ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor' $ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor $ git push origin yourextractor
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it. 11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions! In any case, thank you very much for your contributions!
## youtube-dl coding conventions
This section introduces guidelines for writing idiomatic, robust and future-proof extractor code.
Extractors are fragile by nature: they depend on the layout of source data provided by third-party media hosters outside your control, and that layout tends to change. As an extractor implementer, your task is not only to write code that extracts media links and metadata correctly, but also to minimize dependency on the source's layout and to anticipate potential future changes. This matters because it keeps the extractor from breaking on minor layout changes, and so keeps old youtube-dl versions working. Even though such breakage is easily fixed by releasing a new version of youtube-dl with the fix incorporated, all previous versions remain broken in every repository and distro package that is not as prompt in fetching the update from us. Needless to say, some non-rolling-release distros may never receive an update at all.
### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
- `id` (media identifier)
- `title` (media title)
- `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones is considered **optional**. That means that extraction should be **tolerant** of situations where the sources for these fields may be unavailable (even if they are always available at the moment) and **future-proof**, so that a missing optional field does not break the extraction of the general-purpose mandatory fields.
#### Example
Say you have some source dictionary `meta` that you've fetched as JSON with an HTTP request and it has a key `summary`:
```python
meta = self._download_json(url, video_id)
```
Assume at this point `meta`'s layout is:
```python
{
    ...
    "summary": "some fancy summary text",
    ...
}
```
Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield, you should be prepared for this key to be missing from the `meta` dict, so you should extract it like:
```python
description = meta.get('summary') # correct
```
and not like:
```python
description = meta['summary'] # incorrect
```
The latter will break the extraction process with a `KeyError` if `summary` disappears from `meta` at some later time, whereas with the former approach extraction will just go ahead with `description` set to `None`, which is perfectly fine (remember that `None` is equivalent to the absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
```python
description = self._search_regex(
    r'<span[^>]+id="title"[^>]*>([^<]+)<',
    webpage, 'description', fatal=False)
```
With `fatal` set to `False`, if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
You can also pass `default=<some fallback value>`, for example:
```python
description = self._search_regex(
    r'<span[^>]+id="title"[^>]*>([^<]+)<',
    webpage, 'description', default=None)
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
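The three behaviours described above (fail hard, warn and continue, fall back silently) can be tried out without youtube-dl. The following is only a sketch: `search_optional` is a hypothetical stand-in written with the standard `re` module, not youtube-dl's `_search_regex`, and the `webpage` string is made up.

```python
import re

_NO_DEFAULT = object()  # sentinel meaning "no default was supplied"


def search_optional(pattern, text, name, default=_NO_DEFAULT, fatal=True):
    """Hypothetical stand-in mimicking the behaviour described above."""
    match = re.search(pattern, text)
    if match:
        return match.group(1)
    if default is not _NO_DEFAULT:
        return default  # silent fallback, no warning
    if fatal:
        raise ValueError('Unable to extract %s' % name)
    print('WARNING: unable to extract %s' % name)  # non-fatal: warn, go on
    return None


webpage = '<div>no description here</div>'  # made-up page without the span
pattern = r'<span[^>]+id="title"[^>]*>([^<]+)<'

warned = search_optional(pattern, webpage, 'description', fatal=False)
silent = search_optional(pattern, webpage, 'description', default=None)
```

Both calls leave the variable set to `None`; the only difference in this sketch is whether a warning is printed along the way.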
### Provide fallbacks
When extracting metadata, try to do so from multiple sources. For example, if `title` is present in several places, try extracting it from at least some of them. This makes the extractor more future-proof in case some of the sources become unavailable.
#### Example
Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
```python
title = meta['title']
```
If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
Assume that you have another source you can extract `title` from, for example the `og:title` HTML meta tag of a `webpage`. In this case you can provide a fallback scenario:
```python
title = meta.get('title') or self._og_search_title(webpage)
```
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
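The fallback chain can be exercised in isolation. In the sketch below, `og_title` is a hypothetical helper built on the standard `re` module that stands in for `self._og_search_title`; the `meta` dict and the `webpage` string are invented for illustration.

```python
import re


def og_title(webpage):
    """Hypothetical helper: return the og:title meta content, or None."""
    match = re.search(
        r'<meta[^>]+property=(["\'])og:title\1[^>]+content=(["\'])(?P<t>.*?)\2',
        webpage)
    return match.group('t') if match else None


webpage = '<meta property="og:title" content="Fallback title">'
meta = {}  # assume the hoster dropped "title" from its JSON

# The first source is missing, so the second one supplies the value.
title = meta.get('title') or og_title(webpage)
```

Because `or` also skips empty strings, an empty `title` in `meta` falls through to the second source in the same way as a missing one.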
### Make regular expressions flexible
When using regular expressions try to write them fuzzy and flexible.
#### Example
Say you need to extract `title` from the following HTML code:
```html
<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
```
The code for that task should look similar to:
```python
title = self._search_regex(
    r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
Or even better:
```python
title = self._search_regex(
    r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
    webpage, 'title', group='title')
```
Note how this tolerates potential changes in the `style` attribute's value or a switch from double to single quotes around the `class` attribute value.
The code definitely should not look like:
```python
title = self._search_regex(
    r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
    webpage, 'title')
```
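To see why the rigid pattern is discouraged, both patterns can be run against the markup above and against a slightly changed version of it. This is a sketch using the standard `re` module directly; the `redesigned` markup is a hypothetical site change invented for the comparison.

```python
import re

FLEXIBLE = r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)'
RIGID = (r'<span style="position: absolute; left: 910px; width: 90px; '
         r'float: right; z-index: 9999;" class="title">(.*?)</span>')

original = ('<span style="position: absolute; left: 910px; width: 90px; '
            'float: right; z-index: 9999;" class="title">some fancy title</span>')
# Hypothetical redesign: new style value, single quotes around the class.
redesigned = "<span style=\"left: 0\" class='title'>some fancy title</span>"

flexible_hits = [bool(re.search(FLEXIBLE, html)) for html in (original, redesigned)]
rigid_hits = [bool(re.search(RIGID, html)) for html in (original, redesigned)]
```

Here the flexible pattern matches both versions of the markup, while the rigid one matches only the original.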
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
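The idea behind these helpers can be sketched as follows. The two functions below are simplified illustrations with hypothetical names, not the actual `utils` code: the real helpers accept more parameters, and unlike this sketch they may not swallow malformed strings. The `meta` dict is made up.

```python
def int_or_none_sketch(value, default=None):
    """Simplified illustration of the idea; not the actual `utils` code."""
    if value is None or value == '':
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def float_or_none_sketch(value, default=None):
    """Simplified illustration of the idea; not the actual `utils` code."""
    if value is None:
        return default
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


# Made-up metadata: some values are strings, some missing, some junk.
meta = {'views': '1024', 'duration': '12.5', 'likes': None, 'height': 'n/a'}

view_count = int_or_none_sketch(meta.get('views'))
duration = float_or_none_sketch(meta.get('duration'))
like_count = int_or_none_sketch(meta.get('likes'))
height = int_or_none_sketch(meta.get('height'))
```

Missing or unusable values end up as `None` instead of raising, which matches how optional metafields are meant to be treated.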

970
ChangeLog

@ -1,970 +0,0 @@
version 2016.11.14.1
Core
+ [downoader/fragment,f4m,hls] Respect HTTP headers from info dict
* [extractor/common] Fix media templates with Bandwidth substitution pattern in
MPD manifests (#11175)
* [extractor/common] Improve thumbnail extraction from JSON-LD
Extractors
+ [nrk] Workaround geo restriction
+ [nrk] Improve error detection and messages
+ [afreecatv] Add support for vod.afreecatv.com (#11174)
* [cda] Fix and improve extraction (#10929, #10936)
* [plays] Fix extraction (#11165)
* [eagleplatform] Fix extraction (#11160)
+ [audioboom] Recognize /posts/ URLs (#11149)
version 2016.11.08.1
Extractors
* [espn:article] Fix support for espn.com articles
* [franceculture] Fix extraction (#11140)
version 2016.11.08
Extractors
* [tmz:article] Fix extraction (#11052)
* [espn] Fix extraction (#11041)
* [mitele] Fix extraction after website redesign (#10824)
- [ard] Remove age restriction check (#11129)
* [generic] Improve support for pornhub.com embeds (#11100)
+ [generic] Add support for redtube.com embeds (#11099)
+ [generic] Add support for drtuber.com embeds (#11098)
+ [redtube] Add support for embed URLs
+ [drtuber] Add support for embed URLs
+ [yahoo] Improve content id extraction (#11088)
* [toutv] Relax URL regular expression (#11121)
version 2016.11.04
Core
* [extractor/common] Tolerate malformed RESOLUTION attribute in m3u8
manifests (#11113)
* [downloader/ism] Fix AVC Decoder Configuration Record
Extractors
+ [fox9] Add support for fox9.com (#11110)
+ [anvato] Extract more metadata and improve formats extraction
* [vodlocker] Improve removed videos detection (#11106)
+ [vzaar] Add support for vzaar.com (#11093)
+ [vice] Add support for uplynk preplay videos (#11101)
* [tubitv] Fix extraction (#11061)
+ [shahid] Add support for authentication (#11091)
+ [radiocanada] Add subtitles support (#11096)
+ [generic] Add support for ISM manifests
version 2016.11.02
Core
+ Add basic support for Smooth Streaming protocol (#8118, #10969)
* Improve MPD manifest base URL extraction (#10909, #11079)
* Fix --match-filter for int-like strings (#11082)
Extractors
+ [mva] Add support for ISM formats
+ [msn] Add support for ISM formats
+ [onet] Add support for ISM formats
+ [tvp] Add support for ISM formats
+ [nicknight] Add support for nicknight sites (#10769)
version 2016.10.30
Extractors
* [facebook] Improve 1080P video detection (#11073)
* [imgur] Recognize /r/ URLs (#11071)
* [beeg] Fix extraction (#11069)
* [openload] Fix extraction (#10408)
* [gvsearch] Modernize and fix search request (#11051)
* [adultswim] Fix extraction (#10979)
+ [nobelprize] Add support for nobelprize.org (#9999)
* [hornbunny] Fix extraction (#10981)
* [tvp] Improve video id extraction (#10585)
version 2016.10.26
Extractors
+ [rentv] Add support for ren.tv (#10620)
+ [ard] Detect unavailable videos (#11018)
* [vk] Fix extraction (#11022)
version 2016.10.25
Core
* Running youtube-dl in the background is fixed (#10996, #10706, #955)
Extractors
+ [jamendo] Add support for jamendo.com (#10132, #10736)
+ [pandatv] Add support for panda.tv (#10736)
+ [dotsub] Support Vimeo embed (#10964)
* [litv] Fix extraction
+ [vimeo] Delegate ondemand redirects to ondemand extractor (#10994)
* [vivo] Fix extraction (#11003)
+ [twitch:stream] Add support for rebroadcasts (#10995)
* [pluralsight] Fix subtitles conversion (#10990)
version 2016.10.21.1
Extractors
+ [pluralsight] Process all clip URLs (#10984)
version 2016.10.21
Core
- Disable thumbnails embedding in mkv
+ Add support for Comcast multiple-system operator (#10819)
Extractors
* [pluralsight] Adapt to new API (#10972)
* [openload] Fix extraction (#10408, #10971)
+ [natgeo] Extract m3u8 formats (#10959)
version 2016.10.19
Core
+ [utils] Expose PACKED_CODES_RE
+ [extractor/common] Extract non smil wowza mpd manifests
+ [extractor/common] Detect f4m audio-only formats
Extractors
* [vidzi] Fix extraction (#10908, #10952)
* [urplay] Fix subtitles extraction
+ [urplay] Add support for urskola.se (#10915)
+ [orf] Add subtitles support (#10939)
* [youtube] Fix --no-playlist behavior for youtu.be/id URLs (#10896)
* [nrk] Relax URL regular expression (#10928)
+ [nytimes] Add support for podcasts (#10926)
* [pluralsight] Relax URL regular expression (#10941)
version 2016.10.16
Core
* [postprocessor/ffmpeg] Return correct filepath and ext in updated information
in FFmpegExtractAudioPP (#10879)
Extractors
+ [ruutu] Add support for supla.fi (#10849)
+ [theoperaplatform] Add support for theoperaplatform.eu (#10914)
* [lynda] Fix height for prioritized streams
+ [lynda] Add fallback extraction scenario
* [lynda] Switch to https (#10916)
+ [huajiao] New extractor (#10917)
* [cmt] Fix mgid extraction (#10813)
+ [safari:course] Add support for techbus.safaribooksonline.com
* [orf:tvthek] Fix extraction and modernize (#10898)
* [chirbit] Fix extraction of user profile pages
* [carambatv] Fix extraction
* [canalplus] Fix extraction for some videos
* [cbsinteractive] Fix extraction for cnet.com
* [parliamentliveuk] Lower case URLs are now recognized (#10912)
version 2016.10.12
Core
+ Support HTML media elements without child nodes
* [Makefile] Support for GNU make < 4 is fixed; BSD make dropped (#9387)
Extractors
* [dailymotion] Fix extraction (#10901)
* [vimeo:review] Fix extraction (#10900)
* [nhl] Correctly handle invalid formats (#10713)
* [footyroom] Fix extraction (#10810)
* [abc.net.au:iview] Fix for standalone (non series) videos (#10895)
+ [hbo] Add support for episode pages (#10892)
* [allocine] Fix extraction (#10860)
+ [nextmedia] Recognize action news on AppleDaily
* [lego] Improve info extraction and bypass geo restriction (#10872)
version 2016.10.07
Extractors
+ [iprima] Detect geo restriction
* [facebook] Fix video extraction (#10846)
+ [commonprotocols] Support direct MMS links (#10838)
+ [generic] Add support for multiple vimeo embeds (#10862)
+ [nzz] Add support for nzz.ch (#4407)
+ [npo] Detect geo restriction
+ [npo] Add support for 2doc.nl (#10842)
+ [lego] Add support for lego.com (#10369)
+ [tonline] Add support for t-online.de (#10376)
* [techtalks] Relax URL regular expression (#10840)
* [youtube:live] Extend URL regular expression (#10839)
+ [theweatherchannel] Add support for weather.com (#7188)
+ [thisoldhouse] Add support for thisoldhouse.com (#10837)
+ [nhl] Add support for wch2016.com (#10833)
* [pornoxo] Use JWPlatform to improve metadata extraction
version 2016.10.02
Core
* Fix possibly lost extended attributes during post-processing
+ Support pyxattr as well as python-xattr for --xattrs and
--xattr-set-filesize (#9054)
Extractors
+ [jwplatform] Support DASH streams in JWPlayer
+ [jwplatform] Support old-style JWPlayer playlists
+ [byutv:event] Add extractor
* [periscope:user] Fix extraction (#10820)
* [dctp] Fix extraction (#10734)
+ [instagram] Extract video dimensions (#10790)
+ [tvland] Extend URL regular expression (#10812)
+ [vgtv] Add support for tv.aftonbladet.se (#10800)
- [aftonbladet] Remove extractor
* [vk] Fix timestamp and view count extraction (#10760)
+ [vk] Add support for running and finished live streams (#10799)
+ [leeco] Recognize more Le Sports URLs (#10794)
+ [instagram] Extract comments (#10788)
+ [ketnet] Extract mzsource formats (#10770)
* [limelight:media] Improve HTTP formats extraction
version 2016.09.27
Core
+ Add hdcore query parameter to akamai f4m formats
+ Delegate HLS live streams downloading to ffmpeg
+ Improved support for HTML5 subtitles
Extractors
+ [vk] Add support for dailymotion embeds (#10661)
* [promptfile] Fix extraction (#10634)
* [kaltura] Speed up embed regular expressions (#10764)
+ [npo] Add support for anderetijden.nl (#10754)
+ [prosiebensat1] Add support for advopedia sites
* [mwave] Relax URL regular expression (#10735, #10748)
* [prosiebensat1] Fix playlist support (#10745)
+ [prosiebensat1] Add support for sat1gold sites (#10745)
+ [cbsnews:livevideo] Fix extraction and extract m3u8 formats
+ [brightcove:new] Add support for live streams
* [soundcloud] Generalize playlist entries extraction (#10733)
+ [mtv] Add support for new URL schema (#8169, #9808)
* [einthusan] Fix extraction (#10714)
+ [twitter] Support Periscope embeds (#10737)
+ [openload] Support subtitles (#10625)
version 2016.09.24
Core
+ Add support for watchTVeverywhere.com authentication provider based MSOs for
Adobe Pass authentication (#10709)
Extractors
+ [soundcloud:playlist] Provide video id for early playlist entries (#10733)
+ [prosiebensat1] Add support for kabeleinsdoku (#10732)
* [cbs] Extract info from thunder videoPlayerService (#10728)
* [openload] Fix extraction (#10408)
+ [ustream] Support the new HLS streams (#10698)
+ [ooyala] Extract all HLS formats
+ [cartoonnetwork] Add support for Adobe Pass authentication
+ [soundcloud] Extract license metadata
+ [fox] Add support for Adobe Pass authentication (#8584)
+ [tbs] Add support for Adobe Pass authentication (#10642, #10222)
+ [trutv] Add support for Adobe Pass authentication (#10519)
+ [turner] Add support for Adobe Pass authentication
version 2016.09.19
Extractors
+ [crunchyroll] Check if already authenticated (#10700)
- [twitch:stream] Remove fallback to profile extraction when stream is offline
* [thisav] Improve title extraction (#10682)
* [vyborymos] Improve station info extraction
version 2016.09.18
Core
+ Introduce manifest_url and fragments fields in formats dictionary for
fragmented media
+ Provide manifest_url field for DASH segments, HLS and HDS
+ Provide fragments field for DASH segments
* Rework DASH segments downloader to use fragments field
+ Add helper method for Wowza Streaming Engine formats extraction
Extractors
+ [vyborymos] Add extractor for vybory.mos.ru (#10692)
+ [xfileshare] Add title regular expression for streamin.to (#10646)
+ [globo:article] Add support for multiple videos (#10653)
+ [thisav] Recognize HTML5 videos (#10447)
* [jwplatform] Improve JWPlayer detection
+ [mangomolo] Add support for Mangomolo embeds
+ [toutv] Add support for authentication (#10669)
* [franceinter] Fix upload date extraction
* [tv4] Fix HLS and HDS formats extraction (#10659)
version 2016.09.15
Core
* Improve _hidden_inputs
+ Introduce improved explicit Adobe Pass support
+ Add --ap-mso to provide multiple-system operator identifier
+ Add --ap-username to provide MSO account username
+ Add --ap-password to provide MSO account password
+ Add --ap-list-mso to list all supported MSOs
+ Add support for Rogers Cable multiple-system operator (#10606)
Extractors
* [crunchyroll] Fix authentication (#10655)
* [twitch] Fix API calls (#10654, #10660)
+ [bellmedia] Add support for more Bell Media Television sites
* [franceinter] Fix extraction (#10538, #2105)
* [kuwo] Improve error detection (#10650)
+ [go] Add support for free full episodes (#10439)
* [bilibili] Fix extraction for specific videos (#10647)
* [nhk] Fix extraction (#10633)
* [kaltura] Improve audio detection
* [kaltura] Skip chun format
+ [vimeo:ondemand] Pass Referer along with embed URL (#10624)
+ [nbc] Add support for NBC Olympics (#10361)
version 2016.09.11.1
Extractors
+ [tube8] Extract categories and tags (#10579)
+ [pornhub] Extract categories and tags (#10499)
* [openload] Temporary fix (#10408)
+ [foxnews] Add support for Fox News articles (#10598)
* [viafree] Improve video id extraction (#10615)
* [iwara] Fix extraction after relaunch (#10462, #3215)
+ [tfo] Add extractor for tfo.org
* [lrt] Fix audio extraction (#10566)
* [9now] Fix extraction (#10561)
+ [canalplus] Add support for c8.fr (#10577)
* [newgrounds] Fix uploader extraction (#10584)
+ [polskieradio:category] Add support for category lists (#10576)
+ [ketnet] Add extractor for ketnet.be (#10343)
+ [canvas] Add support for een.be (#10605)
+ [telequebec] Add extractor for telequebec.tv (#1999)
* [parliamentliveuk] Fix extraction (#9137)
version 2016.09.08
Extractors
+ [jwplatform] Extract height from format label
+ [yahoo] Extract Brightcove Legacy Studio embeds (#9345)
* [videomore] Fix extraction (#10592)
* [foxgay] Fix extraction (#10480)
+ [rmcdecouverte] Add extractor for rmcdecouverte.bfmtv.com (#9709)
* [gamestar] Fix metadata extraction (#10479)
* [puls4] Fix extraction (#10583)
+ [cctv] Add extractor for CCTV and CNTV (#8153)
+ [lci] Add extractor for lci.fr (#10573)
+ [wat] Extract DASH formats
+ [viafree] Improve video id detection (#10569)
+ [trutv] Add extractor for trutv.com (#10519)
+ [nick] Add support for nickelodeon.nl (#10559)
+ [abcotvs:clips] Add support for clips.abcotvs.com
+ [abcotvs] Add support for ABC Owned Television Stations sites (#9551)
+ [miaopai] Add extractor for miaopai.com (#10556)
* [gamestar] Fix metadata extraction (#10479)
+ [bilibili] Add support for episodes (#10190)
+ [tvnoe] Add extractor for tvnoe.cz (#10524)
version 2016.09.04.1
Core
* In DASH downloader if the first segment fails, abort the whole download
process to prevent throttling (#10497)
+ Add support for --skip-unavailable-fragments and --fragment-retries in
hlsnative downloader (#10165, #10448).
+ Add support for --skip-unavailable-fragments in DASH downloader
+ Introduce --skip-unavailable-fragments option for fragment based downloaders
that allows skipping fragments unavailable due to an HTTP error
* Fix extraction of video/audio entries with src attribute in
_parse_html5_media_entries (#10540)
Extractors
* [theplatform] Relax URL regular expression (#10546)
* [youtube:playlist] Extend URL regular expression
* [rottentomatoes] Delegate extraction to internetvideoarchive extractor
* [internetvideoarchive] Extract all formats
* [pornvoisines] Fix extraction (#10469)
* [rottentomatoes] Fix extraction (#10467)
* [espn] Extend URL regular expression (#10549)
* [vimple] Extend URL regular expression (#10547)
* [youtube:watchlater] Fix extraction (#10544)
* [youjizz] Fix extraction (#10437)
+ [foxnews] Add support for FoxNews Insider (#10445)
+ [fc2] Recognize Flash player URLs (#10512)
version 2016.09.03
Core
* Restore usage of NAME attribute from EXT-X-MEDIA tag for formats codes in
_extract_m3u8_formats (#10522)
* Handle semicolon in mimetype2ext
Extractors
+ [youtube] Add support for rental videos' previews (#10532)
* [youtube:playlist] Fallback to video extraction for video/playlist URLs when
no playlist is actually served (#10537)
+ [drtv] Add support for dr.dk/nyheder (#10536)
+ [facebook:plugins:video] Add extractor (#10530)
+ [go] Add extractor for *.go.com sites
* [adobepass] Check for authz_token expiration (#10527)
* [nytimes] Improve extraction
* [thestar] Fix extraction (#10465)
* [glide] Fix extraction (#10478)
- [exfm] Remove extractor (#10482)
* [youporn] Fix categories and tags extraction (#10521)
+ [curiositystream] Add extractor for app.curiositystream.com
- [thvideo] Remove extractor (#10464)
* [movingimage] Fix for the new site name (#10466)
+ [cbs] Add support for once formats (#10515)
* [limelight] Skip ism and duplicate manifests
+ [porncom] Extract categories and tags (#10510)
+ [facebook] Extract timestamp (#10508)
+ [yahoo] Extract more formats
version 2016.08.31
Extractors
* [soundcloud] Fix URL regular expression to avoid clashes with sets (#10505)
* [bandcamp:album] Fix title extraction (#10455)
* [pyvideo] Fix extraction (#10468)
+ [ctv] Add support for tsn.ca, bnn.ca and thecomedynetwork.ca (#10016)
* [9c9media] Extract more metadata
* [9c9media] Fix multiple stacks extraction (#10016)
* [adultswim] Improve video info extraction (#10492)
* [vodplatform] Improve embed regular expression
- [played] Remove extractor (#10470)
+ [tbs] Add extractor for tbs.com and tntdrama.com (#10222)
+ [cartoonnetwork] Add extractor for cartoonnetwork.com (#10110)
* [adultswim] Rework in terms of turner extractor
* [cnn] Rework in terms of turner extractor
* [nba] Rework in terms of turner extractor
+ [turner] Add base extractor for Turner Broadcasting System based sites
* [bilibili] Fix extraction (#10375)
* [openload] Fix extraction (#10408)
version 2016.08.28
Core
+ Add warning message that ffmpeg doesn't support SOCKS
* Improve thumbnail sorting
+ Extract formats from #EXT-X-MEDIA tags in _extract_m3u8_formats
* Fill IV with leading zeros for IVs shorter than 16 octets in hlsnative
+ Add ac-3 to the list of audio codecs in parse_codecs
Extractors
* [periscope:user] Fix extraction (#10453)
* [douyutv] Fix extraction (#10153, #10318, #10444)
+ [nhk:vod] Add extractor for www3.nhk.or.jp on demand (#4437, #10424)
- [trutube] Remove extractor (#10438)
+ [usanetwork] Add extractor for usanetwork.com
* [crackle] Fix extraction (#10333)
* [spankbang] Fix description and uploader extraction (#10339)
* [discoverygo] Detect cable provider restricted videos (#10425)
+ [cbc] Add support for watch.cbc.ca
* [kickstarter] Silence the warning for og:description (#10415)
* [mtvservices:embedded] Fix extraction for the new 'edge' player (#10363)
version 2016.08.24.1
Extractors
+ [pluralsight] Add support for subtitles (#9681)
version 2016.08.24
Extractors
* [youtube] Fix authentication (#10392)
* [openload] Fix extraction (#10408)
+ [bravotv] Add support for Adobe Pass (#10407)
* [bravotv] Fix clip info extraction (#10407)
* [eagleplatform] Improve embedded videos detection (#10409)
* [awaan] Fix extraction
* [mtvservices:embedded] Update config URL
+ [abc:iview] Add extractor (#6148)
version 2016.08.22
Core
* Improve formats and subtitles extension auto calculation
+ Recognize full unit names in parse_filesize
+ Add support for m3u8 manifests in HTML5 multimedia tags
* Fix octal/hexadecimal number detection in js_to_json
Extractors
+ [ivi] Add support for 720p and 1080p
+ [charlierose] Add new extractor (#10382)
* [1tv] Fix extraction (#9249)
* [twitch] Renew authentication
* [kaltura] Improve subtitles extension calculation
+ [zingmp3] Add support for video clips
* [zingmp3] Fix extraction (#10041)
* [kaltura] Improve subtitles extraction (#10279)
* [cultureunplugged] Fix extraction (#10330)
+ [cnn] Add support for money.cnn.com (#2797)
* [cbsnews] Fix extraction (#10362)
* [cbs] Fix extraction (#10393)
+ [litv] Support 'promo' URLs (#10385)
* [snotr] Fix extraction (#10338)
* [n-tv.de] Fix extraction (#10331)
* [globo:article] Relax URL and video id regular expressions (#10379)
version 2016.08.19
Core
- Remove output template description from --help
* Recognize lowercase units in parse_filesize
Extractors
+ [porncom] Add extractor for porn.com (#2251, #10251)
+ [generic] Add support for DBTV embeds
* [vk:wallpost] Fix audio extraction for new site layout
* [vk] Fix authentication
+ [hgtvcom:show] Add extractor for hgtv.com shows (#10365)
+ [discoverygo] Add support for other GO network sites
version 2016.08.17
Core
+ Add _get_netrc_login_info
Extractors
* [mofosex] Extract all formats (#10335)
+ [generic] Add support for vbox7 embeds
+ [vbox7] Add support for embed URLs
+ [viafree] Add extractor (#10358)
+ [mtg] Add support for viafree URLs (#10358)
* [theplatform] Extract all subtitles per language
+ [xvideos] Fix HLS extraction (#10356)
+ [amcnetworks] Add extractor
+ [bbc:playlist] Add support for pagination (#10349)
+ [fxnetworks] Add extractor (#9462)
* [cbslocal] Fix extraction for SendtoNews-based videos
* [sendtonews] Fix extraction
* [jwplatform] Extract video id from JWPlayer data
- [zippcast] Remove extractor (#10332)
+ [viceland] Add extractor (#8799)
+ [adobepass] Add base extractor for Adobe Pass Authentication
* [life:embed] Improve extraction
* [vgtv] Detect geo restricted videos (#10348)
+ [uplynk] Add extractor
* [xiami] Fix extraction (#10342)
version 2016.08.13
Core
* Show progress for curl external downloader
* Forward more options to curl external downloader
Extractors
* [pbs] Fix description extraction
* [franceculture] Fix extraction (#10324)
* [pornotube] Fix extraction (#10322)
* [4tube] Fix metadata extraction (#10321)
* [imgur] Fix width and height extraction (#10325)
* [expotv] Improve extraction
+ [vbox7] Fix extraction (#10309)
- [tapely] Remove extractor (#10323)
* [muenchentv] Fix extraction (#10313)
+ [24video] Add support for .me and .xxx TLDs
* [24video] Fix comment count extraction
* [sunporno] Add support for embed URLs
* [sunporno] Fix metadata extraction (#10316)
+ [hgtv] Add extractor for hgtv.ca (#3999)
- [pbs] Remove request to unavailable API
+ [pbs] Add support for high quality HTTP formats
+ [crunchyroll] Add support for HLS formats (#10301)
version 2016.08.12
Core
* Subtitles are now written as is. Newline conversions are disabled. (#10268)
+ Recognize more formats in unified_timestamp
Extractors
- [goldenmoustache] Remove extractor (#10298)
* [drtuber] Improve title extraction
* [drtuber] Make dislike count optional (#10297)
* [chirbit] Fix extraction (#10296)
* [francetvinfo] Relax URL regular expression
* [rtlnl] Relax URL regular expression (#10282)
* [formula1] Relax URL regular expression (#10283)
* [wat] Improve extraction (#10281)
* [ctsnews] Fix extraction
version 2016.08.10
Core
* Make --metadata-from-title non fatal when title does not match the pattern
* Introduce options for randomized sleep before each download
--min-sleep-interval and --max-sleep-interval (#9930)
* Respect default in _search_json_ld
Extractors
+ [uol] Add extractor for uol.com.br (#4263)
* [rbmaradio] Fix extraction and extract all formats (#10242)
+ [sonyliv] Add extractor for sonyliv.com (#10258)
* [aparat] Fix extraction
* [cwtv] Extract HTTP formats
+ [rozhlas] Add extractor for prehravac.rozhlas.cz (#10253)
* [kuwo:singer] Fix extraction
version 2016.08.07
Core
+ Add support for TV Parental Guidelines ratings in parse_age_limit
+ Add decode_png (#9706)
+ Add support for partOfTVSeries in JSON-LD
* Lower master M3U8 manifest preference for better format sorting
Extractors
+ [discoverygo] Add extractor (#10245)
* [flipagram] Make JSON-LD extraction non fatal
* [generic] Make JSON-LD extraction non fatal
+ [bbc] Add support for morph embeds (#10239)
* [tnaflixnetworkbase] Improve title extraction
* [tnaflix] Fix metadata extraction (#10249)
* [fox] Fix theplatform release URL query
* [openload] Fix extraction (#9706)
* [bbc] Skip duplicate manifest URLs
* [bbc] Improve format code
+ [bbc] Add support for DASH and F4M
* [bbc] Improve format sorting and listing
* [bbc] Improve playlist extraction
+ [pokemon] Add extractor (#10093)
+ [condenast] Add fallback scenario for video info extraction
version 2016.08.06
Core
* Add support for JSON-LD root list entries (#10203)
* Improve unified_timestamp
* Lower preference of RTSP formats in generic sorting
+ Add support for multiple properties in _og_search_property
* Improve password hiding from verbose output
Extractors
+ [adultswim] Add support for trailers (#10235)
* [archiveorg] Improve extraction (#10219)
+ [jwplatform] Add support for playlists
+ [jwplatform] Add support for relative URLs
* [jwplatform] Improve audio detection
+ [tvplay] Capture and output native error message
+ [tvplay] Extract series metadata
+ [tvplay] Add support for subtitles (#10194)
* [tvp] Improve extraction (#7799)
* [cbslocal] Fix timestamp parsing (#10213)
+ [naver] Add support for subtitles (#8096)
* [naver] Improve extraction
* [condenast] Improve extraction
* [engadget] Relax URL regular expression
* [5min] Fix extraction
+ [nationalgeographic] Add support for Episode Guide
+ [kaltura] Add support for subtitles
* [kaltura] Optimize network requests
+ [vodplatform] Add extractor for vod-platform.net
- [gamekings] Remove extractor
* [limelight] Extract HTTP formats
* [ntvru] Fix extraction
+ [comedycentral] Re-add :tds and :thedailyshow shortnames
version 2016.08.01
Fixed/improved extractors
- [yandexmusic:track] Adapt to changes in track location JSON (#10193)
- [bloomberg] Support another form of player (#10187)
- [limelight] Skip DRM protected videos
- [safari] Relax regular expressions for URL matching (#10202)
- [cwtv] Add support for cwtvpr.com (#10196)
version 2016.07.30
Fixed/improved extractors
- [twitch:clips] Sort formats
- [tv2] Use m3u8_native
- [tv2:article] Fix video detection (#10188)
- rtve (#10076)
- [dailymotion:playlist] Optimize download archive processing (#10180)
version 2016.07.28
Fixed/improved extractors
- shared (#10170)
- soundcloud (#10179)
- twitch (#9767)
version 2016.07.26.2
Fixed/improved extractors
- smotri
- camdemy
- mtv
- comedycentral
- cmt
- cbc
- mgtv
- orf
version 2016.07.24
New extractors
- arkena (#8682)
- lcp (#8682)
Fixed/improved extractors
- facebook (#10151)
- dailymail
- telegraaf
- dcn
- onet
- tvp
Miscellaneous
- Support $Time$ in DASH manifests
version 2016.07.22
New extractors
- odatv (#9285)
Fixed/improved extractors
- bbc
- youjizz (#10131)
- youtube (#10140)
- pornhub (#10138)
- eporner (#10139)
version 2016.07.17
New extractors
- nintendo (#9986)
- streamable (#9122)
Fixed/improved extractors
- ard (#10095)
- mtv
- comedycentral (#10101)
- viki (#10098)
- spike (#10106)
Miscellaneous
- Improved twitter player detection (#10090)
version 2016.07.16
New extractors
- ninenow (#5181)
Fixed/improved extractors
- rtve (#10076)
- brightcove
- 3qsdn
- syfy (#9087, #3820, #2388)
- youtube (#10083)
Miscellaneous
- Fix subtitle embedding for video-only and audio-only files (#10081)
version 2016.07.13
New extractors
- rudo
Fixed/improved extractors
- biobiochiletv
- tvplay
- dbtv
- brightcove
- tmz
- youtube (#10059)
- shahid (#10062)
- vk
- ellentv (#10067)
version 2016.07.11
New extractors
- roosterteeth (#9864)
Fixed/improved extractors
- miomio (#9605)
- vuclip
- youtube
- vidzi (#10058)
version 2016.07.09.2
Fixed/improved extractors
- vimeo (#1638)
- facebook (#10048)
- lynda (#10047)
- animeondemand
Fixed/improved features
- Embedding subtitles no longer throws an error with problematic inputs (#9063)
version 2016.07.09.1
Fixed/improved extractors
- youtube
- ard
- srmediatek (#9373)
version 2016.07.09
New extractors
- Flipagram (#9898)
Fixed/improved extractors
- telecinco
- toutv
- radiocanada
- tweakers (#9516)
- lynda
- nick (#7542)
- polskieradio (#10028)
- le
- facebook (#9851)
- mgtv
- animeondemand (#10031)
Fixed/improved features
- `--postprocessor-args` and `--downloader-args` now accepts non-ASCII inputs
on non-Windows systems
version 2016.07.07
New extractors
- kamcord (#10001)
Fixed/improved extractors
- spiegel (#10018)
- metacafe (#8539, #3253)
- onet (#9950)
- francetv (#9955)
- brightcove (#9965)
- daum (#9972)
version 2016.07.06
Fixed/improved extractors
- youtube (#10007, #10009)
- xuite
- stitcher
- spiegel
- slideshare
- sandia
- rtvnh
- prosiebensat1
- onionstudios
version 2016.07.05
Fixed/improved extractors
- brightcove
- yahoo (#9995)
- pornhub (#9997)
- iqiyi
- kaltura (#5557)
- la7
Changed features
- Rename --cn-verification-proxy to --geo-verification-proxy
Miscellaneous
- Add script for displaying downloads statistics
version 2016.07.03.1
Fixed/improved extractors
- theplatform
- aenetworks
- nationalgeographic
- hrti (#9482)
- facebook (#5701)
- buzzfeed (#5701)
- rai (#8617, #9157, #9232, #8552, #8551)
- nationalgeographic (#9991)
- iqiyi
version 2016.07.03
New extractors
- hrti (#9482)
Fixed/improved extractors
- vk (#9981)
- facebook (#9938)
- xtube (#9953, #9961)
version 2016.07.02
New extractors
- fusion (#9958)
Fixed/improved extractors
- twitch (#9975)
- vine (#9970)
- periscope (#9967)
- pornhub (#8696)
version 2016.07.01
New extractors
- 9c9media
- ctvnews (#2156)
- ctv (#4077)
Fixed/improved extractors
- rds
- meta (#8789)
- pornhub (#9964)
- sixplay (#2183)
New features
- Accept quoted strings across multiple lines (#9940)


@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean: clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete find . -name "*.pyc" -delete
find . -name "*.class" -delete find . -name "*.class" -delete
@ -12,7 +12,7 @@ SHAREDIR ?= $(PREFIX)/share
PYTHON ?= /usr/bin/env python PYTHON ?= /usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local # set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi) SYSCONFDIR != if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
install -d $(DESTDIR)$(BINDIR) install -d $(DESTDIR)$(BINDIR)
@ -90,11 +90,11 @@ fish-completion: youtube-dl.fish
lazy-extractors: youtube_dl/extractor/lazy_extractors.py lazy-extractors: youtube_dl/extractor/lazy_extractors.py
_EXTRACTOR_FILES = $(shell find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py') _EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py'
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES) youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@ $(PYTHON) devscripts/make_lazy_extractors.py $@
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish ChangeLog youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
@tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \ @tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
--exclude '*.DS_Store' \ --exclude '*.DS_Store' \
--exclude '*.kate-swp' \ --exclude '*.kate-swp' \
@ -107,7 +107,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude 'docs/_build' \ --exclude 'docs/_build' \
-- \ -- \
bin devscripts test youtube_dl docs \ bin devscripts test youtube_dl docs \
ChangeLog LICENSE README.md README.txt \ LICENSE README.md README.txt \
Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \ Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \
youtube-dl.zsh youtube-dl.fish setup.py \ youtube-dl.zsh youtube-dl.fish setup.py \
youtube-dl youtube-dl

354
README.md

@ -17,7 +17,7 @@ youtube-dl - download videos from youtube.com or other video platforms
To install it right away for all UNIX users (Linux, OS X, etc.), type: To install it right away for all UNIX users (Linux, OS X, etc.), type:
sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl sudo curl https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl sudo chmod a+rx /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget: If you do not have curl, you can alternatively use a recent wget:
@ -27,24 +27,18 @@ If you do not have curl, you can alternatively use a recent wget:
Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`). Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).
You can also use pip: OS X users can install **youtube-dl** with [Homebrew](http://brew.sh/).
sudo pip install --upgrade youtube-dl
This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
OS X users can install youtube-dl with [Homebrew](http://brew.sh/):
brew install youtube-dl brew install youtube-dl
Or with [MacPorts](https://www.macports.org/): You can also use pip:
sudo port install youtube-dl sudo pip install youtube-dl
Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html). Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
# DESCRIPTION # DESCRIPTION
**youtube-dl** is a command-line program to download videos from **youtube-dl** is a small command-line program to download videos from
YouTube.com and a few more sites. It requires the Python interpreter, version YouTube.com and a few more sites. It requires the Python interpreter, version
2.6, 2.7, or 3.2+, and it is not platform specific. It should work on 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on
your Unix box, on Windows or on Mac OS X. It is released to the public domain, your Unix box, on Windows or on Mac OS X. It is released to the public domain,
@ -89,8 +83,6 @@ which means you can modify it, redistribute it or use it however you like.
--mark-watched Mark videos watched (YouTube only) --mark-watched Mark videos watched (YouTube only)
--no-mark-watched Do not mark videos watched (YouTube only) --no-mark-watched Do not mark videos watched (YouTube only)
--no-color Do not emit color codes in output --no-color Do not emit color codes in output
--abort-on-unavailable-fragment Abort downloading when some fragment is not
available
## Network Options: ## Network Options:
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy. --proxy URL Use the specified HTTP/HTTPS/SOCKS proxy.
@ -105,9 +97,9 @@ which means you can modify it, redistribute it or use it however you like.
(experimental) (experimental)
-6, --force-ipv6 Make all connections via IPv6 -6, --force-ipv6 Make all connections via IPv6
(experimental) (experimental)
--geo-verification-proxy URL Use this proxy to verify the IP address for --cn-verification-proxy URL Use this proxy to verify the IP address for
some geo-restricted sites. The default some Chinese sites. The default proxy
proxy specified by --proxy (or none, if the specified by --proxy (or none, if the
options is not present) is used for the options is not present) is used for the
actual downloading. (experimental) actual downloading. (experimental)
@ -175,10 +167,7 @@ which means you can modify it, redistribute it or use it however you like.
-R, --retries RETRIES Number of retries (default is 10), or -R, --retries RETRIES Number of retries (default is 10), or
"infinite". "infinite".
--fragment-retries RETRIES Number of retries for a fragment (default --fragment-retries RETRIES Number of retries for a fragment (default
is 10), or "infinite" (DASH and hlsnative is 10), or "infinite" (DASH only)
only)
--skip-unavailable-fragments Skip unavailable fragments (DASH and
hlsnative only)
--buffer-size SIZE Size of download buffer (e.g. 1024 or 16K) --buffer-size SIZE Size of download buffer (e.g. 1024 or 16K)
(default is 1024) (default is 1024)
--no-resize-buffer Do not automatically adjust the buffer --no-resize-buffer Do not automatically adjust the buffer
@ -206,8 +195,32 @@ which means you can modify it, redistribute it or use it however you like.
-a, --batch-file FILE File containing URLs to download ('-' for -a, --batch-file FILE File containing URLs to download ('-' for
stdin) stdin)
--id Use only video ID in file name --id Use only video ID in file name
-o, --output TEMPLATE Output filename template, see the "OUTPUT -o, --output TEMPLATE Output filename template. Use %(title)s to
TEMPLATE" for all the info get the title, %(uploader)s for the
uploader name, %(uploader_id)s for the
uploader nickname if different,
%(autonumber)s to get an automatically
incremented number, %(ext)s for the
filename extension, %(format)s for the
format description (like "22 - 1280x720" or
"HD"), %(format_id)s for the unique id of
the format (like YouTube's itags: "137"),
%(upload_date)s for the upload date
(YYYYMMDD), %(extractor)s for the provider
(youtube, metacafe, etc), %(id)s for the
video id, %(playlist_title)s,
%(playlist_id)s, or %(playlist)s (=title if
present, ID otherwise) for the playlist the
video is in, %(playlist_index)s for the
position in the playlist. %(height)s and
%(width)s for the width and height of the
video format. %(resolution)s for a textual
description of the resolution of the video
format. %% for a literal percent. Use - to
output to stdout. Can also be used to
download to a different directory, for
example with -o '/my/downloads/%(uploader)s
/%(title)s-%(id)s.%(ext)s' .
--autonumber-size NUMBER Specify the number of digits in --autonumber-size NUMBER Specify the number of digits in
%(autonumber)s when it is present in output %(autonumber)s when it is present in output
filename template or --auto-number option filename template or --auto-number option
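The `%(name)s` sequences accepted by `-o` have the same shape as Python's `%`-style formatting with a mapping. The following is a minimal sketch of how such a template could be expanded, not youtube-dl's actual code: `expand_template` and the sample `info` dictionary are hypothetical, and the real program also sanitizes the resulting file name.

```python
# Hypothetical metadata for one video; field names follow the -o help text.
info = {
    'title': 'Example video',
    'id': 'abc123',
    'ext': 'mp4',
    'uploader': 'someone',
}


def expand_template(template, info):
    """Expand %(field)s placeholders using Python %-formatting (illustrative helper)."""
    # '%%' in the template becomes a literal percent sign.
    return template % info


print(expand_template('/my/downloads/%(uploader)s/%(title)s-%(id)s.%(ext)s', info))
```

A field that is missing from the dictionary raises `KeyError` in this sketch, so a real implementation would need a fallback value for unknown fields.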
@ -236,7 +249,7 @@ which means you can modify it, redistribute it or use it however you like.
--write-info-json Write video metadata to a .info.json file --write-info-json Write video metadata to a .info.json file
--write-annotations Write video annotations to a --write-annotations Write video annotations to a
.annotations.xml file .annotations.xml file
--load-info-json FILE JSON file containing the video information --load-info FILE JSON file containing the video information
(created with the "--write-info-json" (created with the "--write-info-json"
option) option)
--cookies FILE File to read cookies from and dump cookie --cookies FILE File to read cookies from and dump cookie
@ -311,15 +324,7 @@ which means you can modify it, redistribute it or use it however you like.
bidirectional text support. Requires bidiv bidirectional text support. Requires bidiv
or fribidi executable in PATH or fribidi executable in PATH
--sleep-interval SECONDS Number of seconds to sleep before each --sleep-interval SECONDS Number of seconds to sleep before each
download when used alone or a lower bound download.
of a range for randomized sleep before each
download (minimum possible number of
seconds to sleep) when used along with
--max-sleep-interval.
--max-sleep-interval SECONDS Upper bound of a range for randomized sleep
before each download (maximum possible
number of seconds to sleep). Must only be
used along with --min-sleep-interval.
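The interaction between `--sleep-interval` and `--max-sleep-interval` described above can be sketched as follows. This is an illustration under assumptions rather than youtube-dl's implementation: `pick_sleep_seconds` is a hypothetical helper, and drawing the delay uniformly from the range is one plausible reading of "randomized sleep".

```python
import random


def pick_sleep_seconds(min_interval, max_interval=None):
    """Return how long to sleep before a download (illustrative helper)."""
    if max_interval is None:
        # Only --sleep-interval given: always sleep for that fixed time.
        return float(min_interval)
    if max_interval < min_interval:
        raise ValueError('upper bound must not be smaller than lower bound')
    # Both bounds given: pick a random delay between them (inclusive).
    return random.uniform(min_interval, max_interval)


print(pick_sleep_seconds(5))
print(pick_sleep_seconds(2, 8))
```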
## Video Format Options: ## Video Format Options:
-f, --format FORMAT Video format code, see the "FORMAT -f, --format FORMAT Video format code, see the "FORMAT
@ -358,17 +363,6 @@ which means you can modify it, redistribute it or use it however you like.
-n, --netrc Use .netrc authentication data -n, --netrc Use .netrc authentication data
--video-password PASSWORD Video password (vimeo, smotri, youku) --video-password PASSWORD Video password (vimeo, smotri, youku)
## Adobe Pass Options:
--ap-mso MSO Adobe Pass multiple-system operator (TV
provider) identifier, use --ap-list-mso for
a list of available MSOs
--ap-username USERNAME Multiple-system operator account login
--ap-password PASSWORD Multiple-system operator account password.
If this option is left out, youtube-dl will
ask interactively.
--ap-list-mso List all supported multiple-system
operators
## Post-processing Options: ## Post-processing Options:
-x, --extract-audio Convert video files to audio-only files -x, --extract-audio Convert video files to audio-only files
(requires ffmpeg or avconv and ffprobe or (requires ffmpeg or avconv and ffprobe or
@ -424,22 +418,13 @@ which means you can modify it, redistribute it or use it however you like.
# CONFIGURATION # CONFIGURATION
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that by default configuration file may not exist so you may need to create it yourself. You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory: For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
``` ```
# Lines starting with # are comments
# Always extract audio
-x -x
# Do not copy the mtime
--no-mtime --no-mtime
# Use this proxy
--proxy 127.0.0.1:3128 --proxy 127.0.0.1:3128
# Save all videos under Movies directory in your home directory
-o ~/Movies/%(title)s.%(ext)s -o ~/Movies/%(title)s.%(ext)s
``` ```
@ -449,12 +434,12 @@ You can use `--ignore-config` if you want to disable the configuration file for
### Authentication with `.netrc` file ### Authentication with `.netrc` file
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you: You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
``` ```
touch $HOME/.netrc touch $HOME/.netrc
chmod a-rwx,u+rw $HOME/.netrc chmod a-rwx,u+rw $HOME/.netrc
``` ```
After that you can add credentials for an extractor in the following format, where *extractor* is the name of the extractor in lowercase: After that you can add credentials for extractor in the following format, where *extractor* is the name of extractor in lowercase:
``` ```
machine <extractor> login <login> password <password> machine <extractor> login <login> password <password>
``` ```
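To check that an entry in this format is readable, a minimal sketch using Python's standard `netrc` module is shown below. The helper name `read_credentials` is hypothetical and is not part of youtube-dl; the sketch only assumes the `machine <extractor> login <login> password <password>` layout shown above.

```python
import netrc


def read_credentials(netrc_path, extractor):
    """Return (login, password) for the given extractor, or None if absent."""
    # Passing an explicit path parses that file instead of ~/.netrc
    auth = netrc.netrc(netrc_path).authenticators(extractor)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password
```

For example, `read_credentials(os.path.expanduser('~/.netrc'), 'youtube')` would return the login and password stored for the `youtube` machine entry, if one exists.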
@ -520,9 +505,6 @@ The basic usage is not to set any template arguments when downloading a single f
- `autonumber`: Five-digit number that will be increased with each download, starting at zero - `autonumber`: Five-digit number that will be increased with each download, starting at zero
- `playlist`: Name or id of the playlist that contains the video - `playlist`: Name or id of the playlist that contains the video
- `playlist_index`: Index of the video in the playlist padded with leading zeros according to the total length of the playlist - `playlist_index`: Index of the video in the playlist padded with leading zeros according to the total length of the playlist
- `playlist_id`: Playlist identifier
- `playlist_title`: Playlist title
Available for the video that belongs to some logical chapter or section: Available for the video that belongs to some logical chapter or section:
- `chapter`: Name or title of the chapter the video belongs to - `chapter`: Name or title of the chapter the video belongs to
@ -550,22 +532,18 @@ Available for the media that is a track or a part of a music album:
- `disc_number`: Number of the disc or other physical medium the track belongs to - `disc_number`: Number of the disc or other physical medium the track belongs to
- `release_year`: Year (YYYY) when the album was released - `release_year`: Year (YYYY) when the album was released
Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with `NA`. Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`.
For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj`, this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory. For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
Output templates can also contain arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'` which will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you. Output template can also contain arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'` that will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you.
To use percent literals in an output template use `%%`. To output to stdout use `-o -`. To specify percent literal in output template use `%%`. To output to stdout use `-o -`.
The current default template is `%(title)s-%(id)s.%(ext)s`. The current default template is `%(title)s-%(id)s.%(ext)s`.
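The template syntax follows Python's `%`-style string formatting with named fields, so the substitution described above can be reproduced in a few lines. This is a sketch, not youtube-dl's own code: `expand_template` is a hypothetical helper, and the `NA` fallback is modelled here with a `defaultdict`.

```python
from collections import defaultdict


def expand_template(template, info):
    """Fill an output template from a metadata dict, using 'NA' for missing fields."""
    return template % defaultdict(lambda: 'NA', info)


info = {'title': 'youtube-dl test video', 'id': 'BaW_jenozKcj', 'ext': 'mp4'}

# prints: youtube-dl test video-BaW_jenozKcj.mp4
print(expand_template('%(title)s-%(id)s.%(ext)s', info))

# 'uploader' is missing, and '%%' yields a literal percent sign
# prints: NA - 100% - youtube-dl test video.mp4
print(expand_template('%(uploader)s - 100%% - %(title)s.%(ext)s', info))
```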
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title: In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
#### Output template and Windows batch files
If you are using an output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`.
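If you generate batch files from a script, the doubling rule can be applied mechanically. The sketch below uses a hypothetical helper, `escape_for_batch`, and assumes that only `%(name)...` field references need escaping; it does not handle templates that already contain `%%` literals.

```python
import re


def escape_for_batch(template):
    """Double the percent sign of every %(field) reference, leaving %VAR% intact."""
    # A '%' is only doubled when it is immediately followed by '('
    return re.sub(r'%(?=\()', '%%', template)
```

With this sketch, `%(title)s-%(id)s.%(ext)s` becomes `%%(title)s-%%(id)s.%%(ext)s`, while an environment variable such as `%HOMEPATH%` is left unchanged.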
#### Output template examples #### Output template examples
Note that on Windows you may need to use double quotes instead of single. Note that on Windows you may need to use double quotes instead of single.
@ -597,7 +575,7 @@ $ youtube-dl -o - BaW_jenozKc
By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dl will guess it for you by **default**. By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dl will guess it for you by **default**.
But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so-called *format selection* based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more. But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so called *format selection* based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more.
The general syntax for format selection is `--format FORMAT` or shorter `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download. The general syntax for format selection is `--format FORMAT` or shorter `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download.
@ -605,21 +583,21 @@ The general syntax for format selection is `--format FORMAT` or shorter `-f FORM
The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific. The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download the best quality format of a particular file extension served as a single file, e.g. `-f webm` will download the best quality format with the `webm` extension served as a single file. You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download best quality format of particular file extension served as a single file, e.g. `-f webm` will download best quality format with `webm` extension served as a single file.
You can also use special names to select particular edge case formats: You can also use special names to select particular edge case format:
- `best`: Select the best quality format represented by a single file with video and audio. - `best`: Select best quality format represented by single file with video and audio
- `worst`: Select the worst quality format represented by a single file with video and audio. - `worst`: Select worst quality format represented by single file with video and audio
- `bestvideo`: Select the best quality video-only format (e.g. DASH video). May not be available. - `bestvideo`: Select best quality video only format (e.g. DASH video), may not be available
- `worstvideo`: Select the worst quality video-only format. May not be available. - `worstvideo`: Select worst quality video only format, may not be available
- `bestaudio`: Select the best quality audio-only format. May not be available. - `bestaudio`: Select best quality audio only format, may not be available
- `worstaudio`: Select the worst quality audio-only format. May not be available. - `worstaudio`: Select worst quality audio only format, may not be available
For example, to download the worst quality video-only format you can use `-f worstvideo`. For example, to download worst quality video only format you can use `-f worstvideo`.
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left-associative, i.e. formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download. If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left-associative, i.e. formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or a more sophisticated example combined with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`. If you want to download several formats of the same video use comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or more sophisticated example combined with precedence feature `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
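The slash and comma rules above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `pick_formats` is a hypothetical helper, not youtube-dl's actual selector parser, and it only understands plain format codes (no `best`, filters, `+` or parentheses).

```python
def pick_formats(selector, available):
    """Resolve a selector such as '22/17/18' or '22,17,18' against available codes."""
    chosen = []
    # A comma separates independent requests; each one yields one download
    for group in selector.split(','):
        # A slash lists alternatives in order of preference, left to right
        for candidate in group.split('/'):
            if candidate in available:
                chosen.append(candidate)
                break
        else:
            raise ValueError('no suitable formats available for %r' % group)
    return chosen
```

For instance, `pick_formats('22/17/18', {'17', '18'})` picks `17` because 22 is not available, while `pick_formats('22,17,18', {'22', '17', '18'})` returns all three codes.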
You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`). You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).
@ -641,15 +619,15 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
- `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native` - `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
- `format_id`: A short description of the format - `format_id`: A short description of the format
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster. Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by video hoster.
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
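The effect of the question mark can be sketched as follows. The names `passes` and `filter_formats` are hypothetical and the format dictionaries are made-up sample data; this is not youtube-dl's filter implementation, only a model of the rule that unknown values are excluded unless `?` is given.

```python
import operator

OPERATORS = {
    '<': operator.lt, '<=': operator.le,
    '>': operator.gt, '>=': operator.ge,
    '=': operator.eq, '!=': operator.ne,
}


def passes(fmt, field, op, value, allow_unknown=False):
    """Check one condition; allow_unknown models the '?' after the operator."""
    actual = fmt.get(field)
    if actual is None:
        return allow_unknown
    return OPERATORS[op](actual, value)


def filter_formats(formats, conditions):
    """Keep the formats that satisfy every condition."""
    return [f for f in formats if all(passes(f, *c) for c in conditions)]


# Models "[height <=? 720][tbr>500]"
conditions = [('height', '<=', 720, True), ('tbr', '>', 500, False)]
formats = [
    {'format_id': 'a', 'height': 720, 'tbr': 1000},
    {'format_id': 'b', 'height': 1080, 'tbr': 2000},
    {'format_id': 'c', 'tbr': 800},      # height unknown, kept because of '?'
    {'format_id': 'd', 'height': 480},   # tbr unknown, excluded
]
selected = [f['format_id'] for f in filter_formats(formats, conditions)]
```

Here `selected` contains `a` and `c`: format `b` is too tall, and format `d` is dropped because its bitrate is not known and the `tbr` condition has no `?`.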
You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg/avconv. You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download best video only format, best audio only format and mux them together with ffmpeg/avconv.
Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`. Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
Since the end of April 2015 and version 2015.04.26, youtube-dl uses `-f bestvideo+bestaudio/best` as the default format selection (see [#5447](https://github.com/rg3/youtube-dl/issues/5447), [#5456](https://github.com/rg3/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed. Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see [#5447](https://github.com/rg3/youtube-dl/issues/5447), [#5456](https://github.com/rg3/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. 
If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl. If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl.
@ -669,11 +647,7 @@ $ youtube-dl -f 'best[filesize<50M]'
# Download best format available via direct link over HTTP/HTTPS protocol # Download best format available via direct link over HTTP/HTTPS protocol
$ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]' $ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'
# Download the best video format and the best audio format without merging them
$ youtube-dl -f 'bestvideo,bestaudio' -o '%(title)s.f%(format_id)s.%(ext)s'
``` ```
Note that in the last example, an output template is recommended as bestvideo and bestaudio may have the same file name.
# VIDEO SELECTION # VIDEO SELECTION
@ -728,7 +702,7 @@ Add a file exclusion for `youtube-dl.exe` in Windows Defender settings.
YouTube changed their playlist format in March 2014 and later on, so you'll need at least youtube-dl 2014.07.25 to download all YouTube videos. YouTube changed their playlist format in March 2014 and later on, so you'll need at least youtube-dl 2014.07.25 to download all YouTube videos.
If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging people](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update. If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging guys](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update.
### I'm getting an error when trying to use output template: `error: using output template conflicts with using title, video ID or auto number` ### I'm getting an error when trying to use output template: `error: using output template conflicts with using title, video ID or auto number`
@ -754,11 +728,11 @@ Videos or video formats streamed via RTMP protocol can only be downloaded when [
### I have downloaded a video but how can I play it? ### I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org/) or [mplayer](http://www.mplayerhq.hu/). Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser. ### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser.
It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies and/or HTTP headers. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used, use `--dump-user-agent` to see the one in use by youtube-dl. You can also get necessary cookies and HTTP headers from JSON output obtained with `--dump-json`. It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used, use `--dump-user-agent` to see the one in use by youtube-dl.
It may be beneficial to use IPv6; in some cases, the restrictions are only applied to IPv4. Some services (sometimes only for a subset of videos) do not restrict the video URL by IP address, cookie, or user-agent, but these are the exception rather than the rule. It may be beneficial to use IPv6; in some cases, the restrictions are only applied to IPv4. Some services (sometimes only for a subset of videos) do not restrict the video URL by IP address, cookie, or user-agent, but these are the exception rather than the rule.
@ -836,42 +810,10 @@ Either prepend `http://www.youtube.com/watch?v=` or separate the ID from the opt
### How do I pass cookies to youtube-dl? ### How do I pass cookies to youtube-dl?
Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
Passing cookies to youtube-dl is a good way to work around login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare). Passing cookies to youtube-dl is a good way to work around login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
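If you are unsure whether a cookies file is in the expected Mozilla/Netscape format, you can try loading it with Python's standard library before passing it to youtube-dl. This is a sketch for Python 3; `load_cookies` is a hypothetical helper, and a file that loads here is not guaranteed to be accepted by every site.

```python
import http.cookiejar


def load_cookies(path):
    """Load a Mozilla/Netscape cookies file, raising LoadError if it is malformed."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session and expired cookies too, so the result mirrors the file
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar
```

A missing or wrong header line makes `load` raise `http.cookiejar.LoadError`, which is a quick way to spot a file that was exported in a different format.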
### How do I stream directly to media player?
You will first need to tell youtube-dl to stream media to stdout with `-o -`, and also tell your media player to read from stdin (it must be capable of this for streaming) and then pipe the former to the latter. For example, streaming to [vlc](http://www.videolan.org/) can be achieved with:
youtube-dl -o - "http://www.youtube.com/watch?v=BaW_jenozKcj" | vlc -
### How do I download only new videos from a playlist?
Use the download-archive feature. With this feature you should initially download the complete playlist with `--download-archive /path/to/download/archive/file.txt`, which will record the identifiers of all the videos in a special file. Each subsequent run with the same `--download-archive` will download only new videos and skip all videos that have been downloaded before. Note that only successful downloads are recorded in the file.
For example, at first,
youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
will download the complete `PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re` playlist and create a file `archive.txt`. Each subsequent run will only download new videos if any:
youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
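The idea behind the archive file can be sketched in Python as follows. This is a conceptual model only: `load_archive` and `download_new` are hypothetical helpers, `download` stands for whatever performs the actual download, and the one-identifier-per-line layout is an assumption made for the sketch rather than a description of youtube-dl's archive format.

```python
import os


def load_archive(path):
    """Return the set of identifiers already recorded in the archive file."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def download_new(video_ids, archive_path, download):
    """Download only identifiers missing from the archive and record each success."""
    seen = load_archive(archive_path)
    fetched = []
    for video_id in video_ids:
        if video_id in seen:
            continue
        download(video_id)  # if this raises, the identifier is not recorded
        with open(archive_path, 'a') as f:
            f.write(video_id + '\n')
        seen.add(video_id)
        fetched.append(video_id)
    return fetched
```

Running `download_new` twice with the same archive path fetches everything the first time and only the newly added identifiers the second time.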
### Should I add `--hls-prefer-native` into my config?
When youtube-dl detects an HLS video, it can download it either with the built-in downloader or ffmpeg. Since many HLS streams are slightly invalid and ffmpeg/youtube-dl each handle some invalid cases better than the other, there is an option to switch the downloader if needed.
When youtube-dl knows that one particular downloader works better for a given website, that downloader will be picked. Otherwise, youtube-dl will pick the best downloader for general compatibility, which at the moment happens to be ffmpeg. This choice may change in future versions of youtube-dl, with improvements of the built-in downloader and/or ffmpeg.
In particular, the generic extractor (used when your website is not in the [list of supported sites by youtube-dl](http://rg3.github.io/youtube-dl/supportedsites.html)) cannot mandate one specific downloader.
If you put either `--hls-prefer-native` or `--hls-prefer-ffmpeg` into your configuration, a different subset of videos will fail to download correctly. Instead, it is much better to [file an issue](https://yt-dl.org/bug) or a pull request which details why the native or the ffmpeg HLS downloader is a better choice for your use case.
### Can you add support for this anime video site, or site which shows current movies for free? ### Can you add support for this anime video site, or site which shows current movies for free?
As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube-dl. As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube-dl.
@ -900,12 +842,6 @@ It is *not* possible to detect whether a URL is supported or not. That's because
If you want to find out whether a given URL is supported, simply call youtube-dl with it. If you get no videos back, chances are the URL is either not referring to a video or unsupported. You can find out which by examining the output (if you run youtube-dl on the console) or catching an `UnsupportedError` exception if you run it from a Python program. If you want to find out whether a given URL is supported, simply call youtube-dl with it. If you get no videos back, chances are the URL is either not referring to a video or unsupported. You can find out which by examining the output (if you run youtube-dl on the console) or catching an `UnsupportedError` exception if you run it from a Python program.
# Why do I need to go through that much red tape when filing bugs?
Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem was already reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download and many more simple, easy-to-avoid problems, many of which were totally unrelated to youtube-dl.
youtube-dl is an open-source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident to be able to reproduce the issue without asking the reporter repeatedly. As such, the output of `youtube-dl -v YOUR_URL_HERE` is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube-dl is current.
# DEVELOPER INSTRUCTIONS # DEVELOPER INSTRUCTIONS
Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution. Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
@ -923,7 +859,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need If you want to create a build of youtube-dl yourself, you'll need
* python * python
* make (only GNU make is supported) * make (both GNU make and BSD make are supported)
* pandoc * pandoc
* zip * zip
* nosetests * nosetests
@ -935,17 +871,9 @@ If you want to add support for a new site, first of all **make sure** this site
After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`): After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):
1. [Fork this repository](https://github.com/rg3/youtube-dl/fork) 1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
2. Check out the source code with: 2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
3. Start a new git branch with
cd youtube-dl
git checkout -b yourextractor
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`: 4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
```python ```python
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
@ -988,152 +916,20 @@ After you have ensured this site is distributing it's content legally, you can f
``` ```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+. 8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this: 9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py $ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py $ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor' $ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor $ git push origin yourextractor
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it. 11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions! In any case, thank you very much for your contributions!
## youtube-dl coding conventions
This section introduces guidelines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hosters out of your control and this layout tends to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize dependency on the source's layout and even to make the code foresee potential future changes and be ready for that. This is important because it will allow the extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by releasing a new version of youtube-dl with a fix incorporated, all the previous versions become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say, some non-rolling-release distros may never receive an update at all.
### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
- `id` (media identifier)
- `title` (media title)
- `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones is considered **optional**. That means that extraction should be **tolerant** of situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example
Say you have some source dictionary `meta` that you've fetched as JSON with an HTTP request and it has a key `summary`:
```python
meta = self._download_json(url, video_id)
```
Assume at this point `meta`'s layout is:
```python
{
...
"summary": "some fancy summary text",
...
}
```
Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be prepared for this key to be missing from the `meta` dict, so you should extract it like:
```python
description = meta.get('summary') # correct
```
and not like:
```python
description = meta['summary'] # incorrect
```
The latter will break the extraction process with a `KeyError` if `summary` disappears from `meta` at some later time, but with the former approach extraction will just go ahead with `description` set to `None`, which is perfectly fine (remember `None` is equivalent to the absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', fatal=False)
```
With `fatal` set to `False`, if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
You can also pass `default=<some fallback value>`, for example:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', default=None)
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
### Provide fallbacks
When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
#### Example
Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
```python
title = meta['title']
```
If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
Assume that you have another source you can extract `title` from, for example the `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
```python
title = meta.get('title') or self._og_search_title(webpage)
```
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
### Make regular expressions flexible
When using regular expressions try to write them fuzzy and flexible.
#### Example
Say you need to extract `title` from the following HTML code:
```html
<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
```
The code for that task should look similar to:
```python
title = self._search_regex(
r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
Or even better:
```python
title = self._search_regex(
r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
webpage, 'title', group='title')
```
Note how you tolerate potential changes in the `style` attribute's value or a switch from double quotes to single quotes for the `class` attribute.
The code definitely should not look like:
```python
title = self._search_regex(
    r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
    webpage, 'title')
```
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
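A minimal sketch of the idea, assuming a hypothetical `meta` dict whose numeric values arrive as strings or are missing. The two helpers below are simplified stand-ins written for this example only; the real `int_or_none` and `float_or_none` in `youtube_dl/utils.py` accept additional arguments.

```python
def int_or_none(value, default=None):
    """Simplified stand-in: convert to int, or return default on failure."""
    if value is None or value == '':
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def float_or_none(value, default=None):
    """Simplified stand-in: convert to float, or return default on failure."""
    if value is None or value == '':
        return default
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


# Hypothetical metadata as it might arrive from a JSON API.
meta = {'duration': '213.5', 'view_count': '1024', 'like_count': 'n/a'}

info = {
    'duration': float_or_none(meta.get('duration')),
    'view_count': int_or_none(meta.get('view_count')),
    # Unparsable or missing values become None instead of raising.
    'like_count': int_or_none(meta.get('like_count')),
    'comment_count': int_or_none(meta.get('comment_count')),
}
print(info)
```

Since all of these fields are optional, a value of `None` simply means the data is absent and extraction carries on.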
# EMBEDDING YOUTUBE-DL # EMBEDDING YOUTUBE-DL
youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/rg3/youtube-dl/issues/new). youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/rg3/youtube-dl/issues/new).
@ -1149,7 +945,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc']) ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
``` ```
Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L128-L278). For a start, if you want to intercept youtube-dl's output, set a `logger` object. Most likely, you'll want to use various options. For a list of what can be done, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L121-L269). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file: Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:
@ -1190,7 +986,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
# BUGS # BUGS
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)). Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this: **Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
``` ```
@ -1206,7 +1002,7 @@ $ youtube-dl -v <your command line>
[debug] Proxy map: {} [debug] Proxy map: {}
... ...
``` ```
**Do not post screenshots of verbose logs; only plain text is acceptable.** **Do not post screenshots of verbose log only plain text is acceptable.**
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever. The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
@ -1240,7 +1036,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
### Why are existing options not enough? ### Why are existing options not enough?
Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem. Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
### Is there enough context in your bug report? ### Is there enough context in your bug report?
@ -1260,7 +1056,7 @@ Only post features that you (or an incapacitated friend you can personally talk
### Is your question about youtube-dl? ### Is your question about youtube-dl?
It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug. It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different or even the reporter's own application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
# COPYRIGHT # COPYRIGHT


@ -13,7 +13,6 @@ import os.path
sys.path.insert(0, os.path.dirname(os.path.dirname((os.path.abspath(__file__))))) sys.path.insert(0, os.path.dirname(os.path.dirname((os.path.abspath(__file__)))))
from youtube_dl.compat import ( from youtube_dl.compat import (
compat_input,
compat_http_server, compat_http_server,
compat_str, compat_str,
compat_urlparse, compat_urlparse,
@ -31,6 +30,11 @@ try:
except ImportError: # Python 2 except ImportError: # Python 2
import SocketServer as compat_socketserver import SocketServer as compat_socketserver
try:
compat_input = raw_input
except NameError: # Python 3
compat_input = input
class BuildHTTPServer(compat_socketserver.ThreadingMixIn, compat_http_server.HTTPServer): class BuildHTTPServer(compat_socketserver.ThreadingMixIn, compat_http_server.HTTPServer):
allow_reuse_address = True allow_reuse_address = True


@ -1,111 +0,0 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import base64
import json
import mimetypes
import netrc
import optparse
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.compat import (
compat_basestring,
compat_input,
compat_getpass,
compat_print,
compat_urllib_request,
)
from youtube_dl.utils import (
make_HTTPS_handler,
sanitized_Request,
)
class GitHubReleaser(object):
_API_URL = 'https://api.github.com/repos/rg3/youtube-dl/releases'
_UPLOADS_URL = 'https://uploads.github.com/repos/rg3/youtube-dl/releases/%s/assets?name=%s'
_NETRC_MACHINE = 'github.com'
def __init__(self, debuglevel=0):
self._init_github_account()
https_handler = make_HTTPS_handler({}, debuglevel=debuglevel)
self._opener = compat_urllib_request.build_opener(https_handler)
def _init_github_account(self):
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
self._username = info[0]
self._password = info[2]
compat_print('Using GitHub credentials found in .netrc...')
return
else:
compat_print('No GitHub credentials found in .netrc')
except (IOError, netrc.NetrcParseError):
compat_print('Unable to parse .netrc')
self._username = compat_input(
'Type your GitHub username or email address and press [Return]: ')
self._password = compat_getpass(
'Type your GitHub password and press [Return]: ')
def _call(self, req):
if isinstance(req, compat_basestring):
req = sanitized_Request(req)
# Authorizing manually since GitHub does not respond with 401 with
# WWW-Authenticate header set (see
# https://developer.github.com/v3/#basic-authentication)
b64 = base64.b64encode(
('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
req.add_header('Authorization', 'Basic %s' % b64)
response = self._opener.open(req).read().decode('utf-8')
return json.loads(response)
def list_releases(self):
return self._call(self._API_URL)
def create_release(self, tag_name, name=None, body='', draft=False, prerelease=False):
data = {
'tag_name': tag_name,
'target_commitish': 'master',
'name': name,
'body': body,
'draft': draft,
'prerelease': prerelease,
}
req = sanitized_Request(self._API_URL, json.dumps(data).encode('utf-8'))
return self._call(req)
def create_asset(self, release_id, asset):
asset_name = os.path.basename(asset)
url = self._UPLOADS_URL % (release_id, asset_name)
# Our files are small enough to be loaded directly into memory.
data = open(asset, 'rb').read()
req = sanitized_Request(url, data)
mime_type, _ = mimetypes.guess_type(asset_name)
req.add_header('Content-Type', mime_type or 'application/octet-stream')
return self._call(req)
def main():
parser = optparse.OptionParser(usage='%prog VERSION BUILDPATH')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected a version and a build directory')
version, build_path = args
releaser = GitHubReleaser()
new_release = releaser.create_release(version, name='youtube-dl %s' % version)
release_id = new_release['id']
for asset in os.listdir(build_path):
compat_print('Uploading %s...' % asset)
releaser.create_asset(release_id, os.path.join(build_path, asset))
if __name__ == '__main__':
main()


@ -15,9 +15,13 @@ data = urllib.request.urlopen(URL).read()
with open('download.html.in', 'r', encoding='utf-8') as tmplf: with open('download.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read() template = tmplf.read()
md5sum = hashlib.md5(data).hexdigest()
sha1sum = hashlib.sha1(data).hexdigest()
sha256sum = hashlib.sha256(data).hexdigest() sha256sum = hashlib.sha256(data).hexdigest()
template = template.replace('@PROGRAM_VERSION@', version) template = template.replace('@PROGRAM_VERSION@', version)
template = template.replace('@PROGRAM_URL@', URL) template = template.replace('@PROGRAM_URL@', URL)
template = template.replace('@PROGRAM_MD5SUM@', md5sum)
template = template.replace('@PROGRAM_SHA1SUM@', sha1sum)
template = template.replace('@PROGRAM_SHA256SUM@', sha256sum) template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0]) template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0])
template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1]) template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1])

devscripts/install_srelay.sh Executable file

@ -0,0 +1,8 @@
#!/bin/bash
mkdir -p tmp && cd tmp
wget -N http://downloads.sourceforge.net/project/socks-relay/socks-relay/srelay-0.4.8/srelay-0.4.8b6.tar.gz
tar zxvf srelay-0.4.8b6.tar.gz
cd srelay-0.4.8b6
./configure
make


@ -1,4 +1,4 @@
# coding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re


@ -14,17 +14,15 @@ if os.path.exists(lazy_extractors_filename):
os.remove(lazy_extractors_filename) os.remove(lazy_extractors_filename)
from youtube_dl.extractor import _ALL_CLASSES from youtube_dl.extractor import _ALL_CLASSES
from youtube_dl.extractor.common import InfoExtractor, SearchInfoExtractor from youtube_dl.extractor.common import InfoExtractor
with open('devscripts/lazy_load_template.py', 'rt') as f: with open('devscripts/lazy_load_template.py', 'rt') as f:
module_template = f.read() module_template = f.read()
module_contents = [ module_contents = [module_template + '\n' + getsource(InfoExtractor.suitable)]
module_template + '\n' + getsource(InfoExtractor.suitable) + '\n',
'class LazyLoadSearchExtractor(LazyLoadExtractor):\n pass\n']
ie_template = ''' ie_template = '''
class {name}({bases}): class {name}(LazyLoadExtractor):
_VALID_URL = {valid_url!r} _VALID_URL = {valid_url!r}
_module = '{module}' _module = '{module}'
''' '''
@ -36,20 +34,10 @@ make_valid_template = '''
''' '''
def get_base_name(base):
if base is InfoExtractor:
return 'LazyLoadExtractor'
elif base is SearchInfoExtractor:
return 'LazyLoadSearchExtractor'
else:
return base.__name__
def build_lazy_ie(ie, name): def build_lazy_ie(ie, name):
valid_url = getattr(ie, '_VALID_URL', None) valid_url = getattr(ie, '_VALID_URL', None)
s = ie_template.format( s = ie_template.format(
name=name, name=name,
bases=', '.join(map(get_base_name, ie.__bases__)),
valid_url=valid_url, valid_url=valid_url,
module=ie.__module__) module=ie.__module__)
if ie.suitable.__func__ is not InfoExtractor.suitable.__func__: if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
@ -59,35 +47,12 @@ def build_lazy_ie(ie, name):
s += make_valid_template.format(valid_url=ie._make_valid_url()) s += make_valid_template.format(valid_url=ie._make_valid_url())
return s return s
# find the correct sorting and add the required base classes so that subclasses
# can be correctly created
classes = _ALL_CLASSES[:-1]
ordered_cls = []
while classes:
for c in classes[:]:
bases = set(c.__bases__) - set((object, InfoExtractor, SearchInfoExtractor))
stop = False
for b in bases:
if b not in classes and b not in ordered_cls:
if b.__name__ == 'GenericIE':
exit()
classes.insert(0, b)
stop = True
if stop:
break
if all(b in ordered_cls for b in bases):
ordered_cls.append(c)
classes.remove(c)
break
ordered_cls.append(_ALL_CLASSES[-1])
names = [] names = []
for ie in ordered_cls: for ie in list(sorted(_ALL_CLASSES[:-1], key=lambda cls: cls.ie_key())) + _ALL_CLASSES[-1:]:
name = ie.__name__ name = ie.ie_key() + 'IE'
src = build_lazy_ie(ie, name) src = build_lazy_ie(ie, name)
module_contents.append(src) module_contents.append(src)
if ie in _ALL_CLASSES: names.append(name)
names.append(name)
module_contents.append( module_contents.append(
'_ALL_CLASSES = [{0}]'.format(', '.join(names))) '_ALL_CLASSES = [{0}]'.format(', '.join(names)))


@ -54,21 +54,17 @@ def filter_options(readme):
if in_options: if in_options:
if line.lstrip().startswith('-'): if line.lstrip().startswith('-'):
split = re.split(r'\s{2,}', line.lstrip()) option, description = re.split(r'\s{2,}', line.lstrip())
# Description string may start with `-` as well. If there is split_option = option.split(' ')
# only one piece then it's a description bit not an option.
if len(split) > 1:
option, description = split
split_option = option.split(' ')
if not split_option[-1].startswith('-'): # metavar if not split_option[-1].startswith('-'): # metavar
option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]]) option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
# Pandoc's definition_lists. See http://pandoc.org/README.html # Pandoc's definition_lists. See http://pandoc.org/README.html
# for more information. # for more information.
ret += '\n%s\n: %s\n' % (option, description) ret += '\n%s\n: %s\n' % (option, description)
continue else:
ret += line.lstrip() + '\n' ret += line.lstrip() + '\n'
else: else:
ret += line + '\n' ret += line + '\n'
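The difference between the two versions of this hunk can be sketched as follows. The `parse_option_line` helper and the sample lines are hypothetical; the sketch only shows that `re.split(r'\s{2,}', ...)` yields a single piece for a description line that happens to start with `-`, so unpacking the result into two names without a length check would raise `ValueError`.

```python
import re


def parse_option_line(line):
    """Return (option, description), or None for a description-only line."""
    pieces = re.split(r'\s{2,}', line.lstrip())
    if len(pieces) != 2:
        # A single piece means the line continues a description,
        # even though it starts with '-'.
        return None
    option, description = pieces
    split_option = option.split(' ')
    if not split_option[-1].startswith('-'):  # metavar
        option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
    return option, description


print(parse_option_line('    -o, --output TEMPLATE            Output filename template'))
print(parse_option_line('                                     -- continues a description'))
```

This sketch checks for exactly two pieces, which is slightly stricter than the `len(split) > 1` test in the hunk.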


@ -15,7 +15,6 @@
set -e set -e
skip_tests=true skip_tests=true
gpg_sign_commits=""
buildserver='localhost:8142' buildserver='localhost:8142'
while true while true
@ -25,10 +24,6 @@ case "$1" in
skip_tests=false skip_tests=false
shift shift
;; ;;
--gpg-sign-commits|-S)
gpg_sign_commits="-S"
shift
;;
--buildserver) --buildserver)
buildserver="$2" buildserver="$2"
shift 2 shift 2
@ -60,9 +55,6 @@ if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; e
if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi
read -p "Is ChangeLog up to date? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
/bin/echo -e "\n### First of all, testing..." /bin/echo -e "\n### First of all, testing..."
make clean make clean
if $skip_tests ; then if $skip_tests ; then
@ -74,13 +66,10 @@ fi
/bin/echo -e "\n### Changing version in version.py..." /bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Changing version in ChangeLog..."
sed -i "s/<unreleased>/$version/" ChangeLog
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..." /bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py ChangeLog git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
git commit $gpg_sign_commits -m "release $version" git commit -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..." /bin/echo -e "\n### Now tagging, signing and pushing..."
git tag -s -m "Release $version" "$version" git tag -s -m "Release $version" "$version"
@ -106,16 +95,15 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
(cd build/$version/ && sha256sum $RELEASE_FILES > SHA2-256SUMS) (cd build/$version/ && sha256sum $RELEASE_FILES > SHA2-256SUMS)
(cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS) (cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)
/bin/echo -e "\n### Signing and uploading the new binaries to GitHub..." /bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..."
for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
ROOT=$(pwd) ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/"
python devscripts/create-github-release.py $version "$ROOT/build/$version"
ssh ytdl@yt-dl.org "sh html/update_latest.sh $version" ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
/bin/echo -e "\n### Now switching to gh-pages..." /bin/echo -e "\n### Now switching to gh-pages..."
git clone --branch gh-pages --single-branch . build/gh-pages git clone --branch gh-pages --single-branch . build/gh-pages
ROOT=$(pwd)
( (
set -e set -e
ORIGIN_URL=$(git config --get remote.origin.url) ORIGIN_URL=$(git config --get remote.origin.url)
@ -127,7 +115,7 @@ git clone --branch gh-pages --single-branch . build/gh-pages
"$ROOT/devscripts/gh-pages/update-copyright.py" "$ROOT/devscripts/gh-pages/update-copyright.py"
"$ROOT/devscripts/gh-pages/update-sites.py" "$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update git add *.html *.html.in update
git commit $gpg_sign_commits -m "release $version" git commit -m "release $version"
git push "$ROOT" gh-pages git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages git push "$ORIGIN_URL" gh-pages
) )


@ -1,47 +0,0 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import itertools
import json
import os
import re
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.compat import (
compat_print,
compat_urllib_request,
)
from youtube_dl.utils import format_bytes
def format_size(bytes):
return '%s (%d bytes)' % (format_bytes(bytes), bytes)
total_bytes = 0
for page in itertools.count(1):
releases = json.loads(compat_urllib_request.urlopen(
'https://api.github.com/repos/rg3/youtube-dl/releases?page=%s' % page
).read().decode('utf-8'))
if not releases:
break
for release in releases:
compat_print(release['name'])
for asset in release['assets']:
asset_name = asset['name']
total_bytes += asset['download_count'] * asset['size']
if all(not re.match(p, asset_name) for p in (
r'^youtube-dl$',
r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
r'^youtube-dl\.exe$')):
continue
compat_print(
' %s size: %s downloads: %d'
% (asset_name, format_size(asset['size']), asset['download_count']))
compat_print('total downloads traffic: %s' % format_size(total_bytes))
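
Reviewer note: the removed script reports per-asset download counts only for the three release artifacts, while the traffic total is accumulated over every asset before the filter runs. The sketch below isolates that logic so it can be checked without calling the GitHub API. The names `REPORTED_ASSET_PATTERNS`, `is_reported_asset` and `total_traffic` are hypothetical helpers, not part of the repository; the regular expressions are copied from the script.

```python
import re

# Patterns copied from the removed script: only these asset names are printed.
REPORTED_ASSET_PATTERNS = (
    r'^youtube-dl$',
    r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
    r'^youtube-dl\.exe$',
)


def is_reported_asset(asset_name):
    """Return True if the asset name matches any of the script's patterns."""
    return any(re.match(p, asset_name) for p in REPORTED_ASSET_PATTERNS)


def total_traffic(assets):
    """Sum download_count * size over all assets, filtered or not."""
    return sum(a['download_count'] * a['size'] for a in assets)
```

Signature files and checksum lists fall outside the patterns, so they contribute to the total but are not listed individually.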

@ -1,4 +1,4 @@
# coding: utf-8 # -*- coding: utf-8 -*-
# #
# youtube-dl documentation build configuration file, created by # youtube-dl documentation build configuration file, created by
# sphinx-quickstart on Fri Mar 14 21:05:43 2014. # sphinx-quickstart on Fri Mar 14 21:05:43 2014.

@ -13,16 +13,11 @@
- **5min** - **5min**
- **8tracks** - **8tracks**
- **91porn** - **91porn**
- **9c9media**
- **9c9media:stack**
- **9gag** - **9gag**
- **9now.com.au**
- **abc.net.au** - **abc.net.au**
- **abc.net.au:iview** - **Abc7News**
- **abcnews** - **abcnews**
- **abcnews:video** - **abcnews:video**
- **abcotvs**: ABC Owned Television Stations
- **abcotvs:clips**
- **AcademicEarth:Course** - **AcademicEarth:Course**
- **acast** - **acast**
- **acast:channel** - **acast:channel**
@ -33,13 +28,11 @@
- **AdobeTVVideo** - **AdobeTVVideo**
- **AdultSwim** - **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- **AfreecaTV**: afreecatv.com - **Aftonbladet**
- **AirMozilla** - **AirMozilla**
- **AlJazeera** - **AlJazeera**
- **Allocine** - **Allocine**
- **AlphaPorno** - **AlphaPorno**
- **AMCNetworks**
- **anderetijden**: npo.nl and ntr.nl
- **AnimeOnDemand** - **AnimeOnDemand**
- **anitube.se** - **anitube.se**
- **AnySex** - **AnySex**
@ -51,7 +44,7 @@
- **archive.org**: archive.org videos - **archive.org**: archive.org videos
- **ARD** - **ARD**
- **ARD:mediathek** - **ARD:mediathek**
- **Arkena** - **ARD:mediathek**: Saarländischer Rundfunk
- **arte.tv** - **arte.tv**
- **arte.tv:+7** - **arte.tv:+7**
- **arte.tv:cinema** - **arte.tv:cinema**
@ -70,10 +63,6 @@
- **audiomack** - **audiomack**
- **audiomack:album** - **audiomack:album**
- **auroravid**: AuroraVid - **auroravid**: AuroraVid
- **AWAAN**
- **awaan:live**
- **awaan:season**
- **awaan:video**
- **Azubu** - **Azubu**
- **AzubuLive** - **AzubuLive**
- **BaiduVideo**: 百度视频 - **BaiduVideo**: 百度视频
@ -84,12 +73,9 @@
- **bbc**: BBC - **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer - **bbc.co.uk**: BBC iPlayer
- **bbc.co.uk:article**: BBC articles - **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist** - **BeatportPro**
- **bbc.co.uk:playlist**
- **Beatport**
- **Beeg** - **Beeg**
- **BehindKink** - **BehindKink**
- **BellMedia**
- **Bet** - **Bet**
- **Bigflix** - **Bigflix**
- **Bild**: Bild.de - **Bild**: Bild.de
@ -111,31 +97,23 @@
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen - **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BuzzFeed** - **BuzzFeed**
- **BYUtv** - **BYUtv**
- **BYUtvEvent**
- **Camdemy** - **Camdemy**
- **CamdemyFolder** - **CamdemyFolder**
- **CamWithHer** - **CamWithHer**
- **canalc2.tv** - **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv - **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas** - **Canvas**
- **CarambaTV** - **CBC**
- **CarambaTVPage** - **CBCPlayer**
- **CartoonNetwork**
- **cbc.ca**
- **cbc.ca:player**
- **cbc.ca:watch**
- **cbc.ca:watch:video**
- **CBS** - **CBS**
- **CBSInteractive** - **CBSInteractive**
- **CBSLocal** - **CBSLocal**
- **cbsnews**: CBS News - **CBSNews**: CBS News
- **cbsnews:livevideo**: CBS News Live Videos - **CBSNewsLiveVideo**: CBS News Live Videos
- **CBSSports** - **CBSSports**
- **CCTV**
- **CDA** - **CDA**
- **CeskaTelevize** - **CeskaTelevize**
- **channel9**: Channel 9 - **channel9**: Channel 9
- **CharlieRose**
- **Chaturbate** - **Chaturbate**
- **Chilloutzone** - **Chilloutzone**
- **chirbit** - **chirbit**
@ -145,7 +123,6 @@
- **cliphunter** - **cliphunter**
- **ClipRs** - **ClipRs**
- **Clipsyndicate** - **Clipsyndicate**
- **CloserToTruth**
- **cloudtime**: CloudTime - **cloudtime**: CloudTime
- **Cloudy** - **Cloudy**
- **Clubic** - **Clubic**
@ -158,8 +135,7 @@
- **CollegeRama** - **CollegeRama**
- **ComCarCoff** - **ComCarCoff**
- **ComedyCentral** - **ComedyCentral**
- **ComedyCentralShortname** - **ComedyCentralShows**: The Daily Show / The Colbert Report
- **ComedyCentralTV**
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED - **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **Coub** - **Coub**
- **Cracked** - **Cracked**
@ -171,11 +147,8 @@
- **CSNNE** - **CSNNE**
- **CSpan**: C-SPAN - **CSpan**: C-SPAN
- **CtsNews**: 華視新聞 - **CtsNews**: 華視新聞
- **CTVNews**
- **culturebox.francetvinfo.fr** - **culturebox.francetvinfo.fr**
- **CultureUnplugged** - **CultureUnplugged**
- **curiositystream**
- **curiositystream:collection**
- **CWTV** - **CWTV**
- **DailyMail** - **DailyMail**
- **dailymotion** - **dailymotion**
@ -187,6 +160,10 @@
- **daum.net:playlist** - **daum.net:playlist**
- **daum.net:user** - **daum.net:user**
- **DBTV** - **DBTV**
- **DCN**
- **dcn:live**
- **dcn:season**
- **dcn:video**
- **DctpTv** - **DctpTv**
- **DeezerPlaylist** - **DeezerPlaylist**
- **defense.gouv.fr** - **defense.gouv.fr**
@ -195,7 +172,6 @@
- **DigitallySpeaking** - **DigitallySpeaking**
- **Digiteka** - **Digiteka**
- **Discovery** - **Discovery**
- **DiscoveryGo**
- **Dotsub** - **Dotsub**
- **DouyuTV**: 斗鱼 - **DouyuTV**: 斗鱼
- **DPlay** - **DPlay**
@ -225,37 +201,32 @@
- **EroProfile** - **EroProfile**
- **Escapist** - **Escapist**
- **ESPN** - **ESPN**
- **ESPNArticle**
- **EsriVideo** - **EsriVideo**
- **Europa** - **Europa**
- **EveryonesMixtape** - **EveryonesMixtape**
- **exfm**: ex.fm
- **ExpoTV** - **ExpoTV**
- **ExtremeTube** - **ExtremeTube**
- **EyedoTV** - **EyedoTV**
- **facebook** - **facebook**
- **FacebookPluginsVideo**
- **faz.net** - **faz.net**
- **fc2** - **fc2**
- **fc2:embed**
- **Fczenit** - **Fczenit**
- **features.aol.com** - **features.aol.com**
- **fernsehkritik.tv** - **fernsehkritik.tv**
- **Firstpost** - **Firstpost**
- **FiveTV** - **FiveTV**
- **Flickr** - **Flickr**
- **Flipagram**
- **Folketinget**: Folketinget (ft.dk; Danish parliament) - **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom** - **FootyRoom**
- **Formula1** - **Formula1**
- **FOX** - **FOX**
- **FOX9**
- **Foxgay** - **Foxgay**
- **foxnews**: Fox News and Fox Business Video - **FoxNews**: Fox News and Fox Business Video
- **foxnews:article**
- **foxnews:insider**
- **FoxSports** - **FoxSports**
- **france2.fr:generation-quoi** - **france2.fr:generation-quoi**
- **FranceCulture** - **FranceCulture**
- **FranceCultureEmission**
- **FranceInter** - **FranceInter**
- **francetv**: France 2, 3, 4, 5 and Ô - **francetv**: France 2, 3, 4, 5 and Ô
- **francetvinfo.fr** - **francetvinfo.fr**
@ -264,14 +235,14 @@
- **FreeVideo** - **FreeVideo**
- **Funimation** - **Funimation**
- **FunnyOrDie** - **FunnyOrDie**
- **Fusion**
- **FXNetworks**
- **GameInformer** - **GameInformer**
- **Gamekings**
- **GameOne** - **GameOne**
- **gameone:playlist** - **gameone:playlist**
- **Gamersyde** - **Gamersyde**
- **GameSpot** - **GameSpot**
- **GameStar** - **GameStar**
- **Gametrailers**
- **Gazeta** - **Gazeta**
- **GDCVault** - **GDCVault**
- **generic**: Generic downloader that works on some sites - **generic**: Generic downloader that works on some sites
@ -281,9 +252,8 @@
- **Glide**: Glide mobile video messages (glide.me) - **Glide**: Glide mobile video messages (glide.me)
- **Globo** - **Globo**
- **GloboArticle** - **GloboArticle**
- **Go**
- **GodTube** - **GodTube**
- **GodTV** - **GoldenMoustache**
- **Golem** - **Golem**
- **GoogleDrive** - **GoogleDrive**
- **Goshgay** - **Goshgay**
@ -291,16 +261,12 @@
- **Groupon** - **Groupon**
- **Hark** - **Hark**
- **HBO** - **HBO**
- **HBOEpisode**
- **HearThisAt** - **HearThisAt**
- **Heise** - **Heise**
- **HellPorno** - **HellPorno**
- **Helsinki**: helsinki.fi - **Helsinki**: helsinki.fi
- **HentaiStigma** - **HentaiStigma**
- **HGTV**
- **hgtv.com:show**
- **HistoricFilms** - **HistoricFilms**
- **history:topic**: History.com Topic
- **hitbox** - **hitbox**
- **hitbox:live** - **hitbox:live**
- **HornBunny** - **HornBunny**
@ -308,9 +274,6 @@
- **HotStar** - **HotStar**
- **Howcast** - **Howcast**
- **HowStuffWorks** - **HowStuffWorks**
- **HRTi**
- **HRTiPlaylist**
- **Huajiao**: 花椒直播
- **HuffPost**: Huffington Post - **HuffPost**: Huffington Post
- **Hypem** - **Hypem**
- **Iconosquare** - **Iconosquare**
@ -332,23 +295,18 @@
- **ivi**: ivi.ru - **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations - **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV - **ivideon**: Ivideon TV
- **Iwara**
- **Izlesene** - **Izlesene**
- **Jamendo**
- **JamendoAlbum**
- **JeuxVideo** - **JeuxVideo**
- **Jove** - **Jove**
- **jpopsuki.tv** - **jpopsuki.tv**
- **JWPlatform** - **JWPlatform**
- **Kaltura** - **Kaltura**
- **Kamcord**
- **KanalPlay**: Kanal 5/9/11 Play - **KanalPlay**: Kanal 5/9/11 Play
- **Kankan** - **Kankan**
- **Karaoketv** - **Karaoketv**
- **KarriereVideos** - **KarriereVideos**
- **keek** - **keek**
- **KeezMovies** - **KeezMovies**
- **Ketnet**
- **KhanAcademy** - **KhanAcademy**
- **KickStarter** - **KickStarter**
- **KonserthusetPlay** - **KonserthusetPlay**
@ -362,15 +320,11 @@
- **kuwo:mv**: 酷我音乐 - MV - **kuwo:mv**: 酷我音乐 - MV
- **kuwo:singer**: 酷我音乐 - 歌手 - **kuwo:singer**: 酷我音乐 - 歌手
- **kuwo:song**: 酷我音乐 - **kuwo:song**: 酷我音乐
- **la7.it** - **la7.tv**
- **Laola1Tv** - **Laola1Tv**
- **LCI**
- **Lcp**
- **LcpPlay**
- **Le**: 乐视网 - **Le**: 乐视网
- **Learnr** - **Learnr**
- **Lecture2Go** - **Lecture2Go**
- **LEGO**
- **Lemonde** - **Lemonde**
- **LePlaylist** - **LePlaylist**
- **LetvCloud**: 乐视云 - **LetvCloud**: 乐视云
@ -396,17 +350,13 @@
- **mailru**: Видео@Mail.Ru - **mailru**: Видео@Mail.Ru
- **MakersChannel** - **MakersChannel**
- **MakerTV** - **MakerTV**
- **mangomolo:live**
- **mangomolo:video**
- **MatchTV** - **MatchTV**
- **MDR**: MDR.DE and KiKA - **MDR**: MDR.DE and KiKA
- **media.ccc.de** - **media.ccc.de**
- **META**
- **metacafe** - **metacafe**
- **Metacritic** - **Metacritic**
- **Mgoon** - **Mgoon**
- **MGTV**: 芒果TV - **MGTV**: 芒果TV
- **MiaoPai**
- **Minhateca** - **Minhateca**
- **MinistryGrid** - **MinistryGrid**
- **Minoto** - **Minoto**
@ -428,13 +378,11 @@
- **MovieClips** - **MovieClips**
- **MovieFap** - **MovieFap**
- **Moviezine** - **Moviezine**
- **MovingImage**
- **MPORA** - **MPORA**
- **MSN** - **MSNBC**
- **mtg**: MTG services - **MTV**
- **mtv**
- **mtv.de** - **mtv.de**
- **mtv:video** - **mtviggy.com**
- **mtvservices:embedded** - **mtvservices:embedded**
- **MuenchenTV**: münchen.tv - **MuenchenTV**: münchen.tv
- **MusicPlayOn** - **MusicPlayOn**
@ -450,13 +398,11 @@
- **MyVidster** - **MyVidster**
- **n-tv.de** - **n-tv.de**
- **natgeo** - **natgeo**
- **natgeo:episodeguide** - **natgeo:channel**
- **natgeo:video**
- **Naver** - **Naver**
- **NBA** - **NBA**
- **NBC** - **NBC**
- **NBCNews** - **NBCNews**
- **NBCOlympics**
- **NBCSports** - **NBCSports**
- **NBCSportsVPlayer** - **NBCSportsVPlayer**
- **ndr**: NDR.de - Norddeutscher Rundfunk - **ndr**: NDR.de - Norddeutscher Rundfunk
@ -476,22 +422,18 @@
- **Newstube** - **Newstube**
- **NextMedia**: 蘋果日報 - **NextMedia**: 蘋果日報
- **NextMediaActionNews**: 蘋果日報 - 動新聞 - **NextMediaActionNews**: 蘋果日報 - 動新聞
- **nextmovie.com**
- **nfb**: National Film Board of Canada - **nfb**: National Film Board of Canada
- **nfl.com** - **nfl.com**
- **NhkVod**
- **nhl.com** - **nhl.com**
- **nhl.com:news**: NHL news - **nhl.com:news**: NHL news
- **nhl.com:videocenter** - **nhl.com:videocenter**
- **nhl.com:videocenter:category**: NHL videocenter category - **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com** - **nick.com**
- **nick.de**
- **nicknight**
- **niconico**: ニコニコ動画 - **niconico**: ニコニコ動画
- **NiconicoPlaylist** - **NiconicoPlaylist**
- **Nintendo**
- **njoy**: N-JOY - **njoy**: N-JOY
- **njoy:embed** - **njoy:embed**
- **NobelPrize**
- **Noco** - **Noco**
- **Normalboots** - **Normalboots**
- **NosVideo** - **NosVideo**
@ -516,14 +458,10 @@
- **Nuvid** - **Nuvid**
- **NYTimes** - **NYTimes**
- **NYTimesArticle** - **NYTimesArticle**
- **NZZ**
- **ocw.mit.edu** - **ocw.mit.edu**
- **OdaTV**
- **Odnoklassniki** - **Odnoklassniki**
- **OktoberfestTV** - **OktoberfestTV**
- **on.aol.com** - **on.aol.com**
- **onet.tv**
- **onet.tv:channel**
- **OnionStudios** - **OnionStudios**
- **Ooyala** - **Ooyala**
- **OoyalaExternal** - **OoyalaExternal**
@ -533,7 +471,6 @@
- **orf:iptv**: iptv.ORF.at - **orf:iptv**: iptv.ORF.at
- **orf:oe1**: Radio Österreich 1 - **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek - **orf:tvthek**: ORF TVthek
- **PandaTV**: 熊猫TV
- **pandora.tv**: 판도라TV - **pandora.tv**: 판도라TV
- **parliamentlive.tv**: UK parliament videos - **parliamentlive.tv**: UK parliament videos
- **Patreon** - **Patreon**
@ -548,6 +485,7 @@
- **Pinkbike** - **Pinkbike**
- **Pladform** - **Pladform**
- **play.fm** - **play.fm**
- **played.to**
- **PlaysTV** - **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz - **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid** - **Playvid**
@ -557,12 +495,8 @@
- **plus.google**: Google Plus - **plus.google**: Google Plus
- **pluzz.francetv.fr** - **pluzz.francetv.fr**
- **podomatic** - **podomatic**
- **Pokemon**
- **PolskieRadio**
- **PolskieRadioCategory**
- **PornCom**
- **PornHd** - **PornHd**
- **PornHub**: PornHub and Thumbzilla - **PornHub**
- **PornHubPlaylist** - **PornHubPlaylist**
- **PornHubUserVideos** - **PornHubUserVideos**
- **Pornotube** - **Pornotube**
@ -580,7 +514,6 @@
- **qqmusic:singer**: QQ音乐 - 歌手 - **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜 - **qqmusic:toplist**: QQ音乐 - 排行榜
- **R7** - **R7**
- **R7Article**
- **radio.de** - **radio.de**
- **radiobremen** - **radiobremen**
- **radiocanada** - **radiocanada**
@ -593,8 +526,6 @@
- **RDS**: RDS.ca - **RDS**: RDS.ca
- **RedTube** - **RedTube**
- **RegioTV** - **RegioTV**
- **RENTV**
- **RENTVArticle**
- **Restudy** - **Restudy**
- **Reuters** - **Reuters**
- **ReverbNation** - **ReverbNation**
@ -602,12 +533,8 @@
- **revision3:embed** - **revision3:embed**
- **RICE** - **RICE**
- **RingTV** - **RingTV**
- **RMCDecouverte**
- **RockstarGames**
- **RoosterTeeth**
- **RottenTomatoes** - **RottenTomatoes**
- **Roxwel** - **Roxwel**
- **Rozhlas**
- **RTBF** - **RTBF**
- **rte**: Raidió Teilifís Éireann TV - **rte**: Raidió Teilifís Éireann TV
- **rte:radio**: Raidió Teilifís Éireann radio - **rte:radio**: Raidió Teilifís Éireann radio
@ -618,9 +545,7 @@
- **rtve.es:alacarta**: RTVE a la carta - **rtve.es:alacarta**: RTVE a la carta
- **rtve.es:infantil**: RTVE infantil - **rtve.es:infantil**: RTVE infantil
- **rtve.es:live**: RTVE.es live streams - **rtve.es:live**: RTVE.es live streams
- **rtve.es:television**
- **RTVNH** - **RTVNH**
- **Rudo**
- **RUHD** - **RUHD**
- **RulePorn** - **RulePorn**
- **rutube**: Rutube videos - **rutube**: Rutube videos
@ -650,13 +575,11 @@
- **ServingSys** - **ServingSys**
- **Sexu** - **Sexu**
- **Shahid** - **Shahid**
- **Shared**: shared.sx - **Shared**: shared.sx and vivo.sx
- **ShareSix** - **ShareSix**
- **Sina** - **Sina**
- **SixPlay**
- **skynewsarabia:article**
- **skynewsarabia:video** - **skynewsarabia:video**
- **SkySports** - **skynewsarabia:video**
- **Slideshare** - **Slideshare**
- **Slutload** - **Slutload**
- **smotri**: Smotri.com - **smotri**: Smotri.com
@ -665,7 +588,6 @@
- **smotri:user**: Smotri.com user videos - **smotri:user**: Smotri.com user videos
- **Snotr** - **Snotr**
- **Sohu** - **Sohu**
- **SonyLIV**
- **soundcloud** - **soundcloud**
- **soundcloud:playlist** - **soundcloud:playlist**
- **soundcloud:search**: Soundcloud search - **soundcloud:search**: Soundcloud search
@ -689,13 +611,12 @@
- **SportBoxEmbed** - **SportBoxEmbed**
- **SportDeutschland** - **SportDeutschland**
- **Sportschau** - **Sportschau**
- **sr:mediathek**: Saarländischer Rundfunk
- **SRGSSR** - **SRGSSR**
- **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites - **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
- **SSA**
- **stanfordoc**: Stanford Open ClassRoom - **stanfordoc**: Stanford Open ClassRoom
- **Steam** - **Steam**
- **Stitcher** - **Stitcher**
- **Streamable**
- **streamcloud.eu** - **streamcloud.eu**
- **StreamCZ** - **StreamCZ**
- **StreetVoice** - **StreetVoice**
@ -705,11 +626,10 @@
- **SWRMediathek** - **SWRMediathek**
- **Syfy** - **Syfy**
- **SztvHu** - **SztvHu**
- **t-online.de**
- **Tagesschau** - **Tagesschau**
- **tagesschau:player** - **tagesschau:player**
- **Tapely**
- **Tass** - **Tass**
- **TBS**
- **TDSLifeway** - **TDSLifeway**
- **teachertube**: teachertube.com videos - **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos - **teachertube:user:collection**: teachertube.com user and collection videos
@ -724,22 +644,18 @@
- **Telecinco**: telecinco.es, cuatro.com and mediaset.es - **Telecinco**: telecinco.es, cuatro.com and mediaset.es
- **Telegraaf** - **Telegraaf**
- **TeleMB** - **TeleMB**
- **TeleQuebec**
- **TeleTask** - **TeleTask**
- **Telewebion**
- **TF1** - **TF1**
- **TFO**
- **TheIntercept** - **TheIntercept**
- **theoperaplatform**
- **ThePlatform** - **ThePlatform**
- **ThePlatformFeed** - **ThePlatformFeed**
- **TheScene** - **TheScene**
- **TheSixtyOne** - **TheSixtyOne**
- **TheStar** - **TheStar**
- **TheWeatherChannel**
- **ThisAmericanLife** - **ThisAmericanLife**
- **ThisAV** - **ThisAV**
- **ThisOldHouse** - **THVideo**
- **THVideoPlaylist**
- **tinypic**: tinypic.com videos - **tinypic**: tinypic.com videos
- **tlc.de** - **tlc.de**
- **TMZ** - **TMZ**
@ -747,13 +663,13 @@
- **TNAFlix** - **TNAFlix**
- **TNAFlixNetworkEmbed** - **TNAFlixNetworkEmbed**
- **toggle** - **toggle**
- **Tosh**: Tosh.0
- **tou.tv** - **tou.tv**
- **Toypics**: Toypics user profile - **Toypics**: Toypics user profile
- **ToypicsUser**: Toypics user profile - **ToypicsUser**: Toypics user profile
- **TrailerAddict** (Currently broken) - **TrailerAddict** (Currently broken)
- **Trilulilu** - **Trilulilu**
- **TruTV** - **trollvids**
- **TruTube**
- **Tube8** - **Tube8**
- **TubiTv** - **TubiTv**
- **tudou** - **tudou**
@ -775,13 +691,11 @@
- **TVCArticle** - **TVCArticle**
- **tvigle**: Интернет-телевидение Tvigle.ru - **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com** - **tvland.com**
- **TVNoe**
- **tvp**: Telewizja Polska - **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series** - **tvp:series**
- **TVPlay**: TV3Play and related services
- **Tweakers** - **Tweakers**
- **twitch:chapter** - **twitch:chapter**
- **twitch:clips**
- **twitch:past_broadcasts** - **twitch:past_broadcasts**
- **twitch:profile** - **twitch:profile**
- **twitch:stream** - **twitch:stream**
@ -794,12 +708,7 @@
- **udemy:course** - **udemy:course**
- **UDNEmbed**: 聯合影音 - **UDNEmbed**: 聯合影音
- **Unistra** - **Unistra**
- **uol.com.br**
- **uplynk**
- **uplynk:preplay**
- **Urort**: NRK P3 Urørt - **Urort**: NRK P3 Urørt
- **URPlay**
- **USANetwork**
- **USAToday** - **USAToday**
- **ustream** - **ustream**
- **ustream:channel** - **ustream:channel**
@ -815,11 +724,8 @@
- **VevoPlaylist** - **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet - **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com** - **vh1.com**
- **Viafree**
- **Vice** - **Vice**
- **Viceland**
- **ViceShow** - **ViceShow**
- **Vidbit**
- **Viddler** - **Viddler**
- **video.google:search**: Google Video search - **video.google:search**: Google Video search
- **video.mit.edu** - **video.mit.edu**
@ -832,7 +738,6 @@
- **VideoPremium** - **VideoPremium**
- **VideoTt**: video.tt - Your True Tube (Currently broken) - **VideoTt**: video.tt - Your True Tube (Currently broken)
- **videoweed**: VideoWeed - **videoweed**: VideoWeed
- **Vidio**
- **vidme** - **vidme**
- **vidme:user** - **vidme:user**
- **vidme:user:likes** - **vidme:user:likes**
@ -857,13 +762,10 @@
- **Vimple**: Vimple - one-click video hosting - **Vimple**: Vimple - one-click video hosting
- **Vine** - **Vine**
- **vine:user** - **vine:user**
- **Vivo**: vivo.sx
- **vk**: VK - **vk**: VK
- **vk:uservideos**: VK - User's Videos - **vk:uservideos**: VK - User's Videos
- **vk:wallpost**
- **vlive** - **vlive**
- **Vodlocker** - **Vodlocker**
- **VODPlatform**
- **VoiceRepublic** - **VoiceRepublic**
- **VoxMedia** - **VoxMedia**
- **Vporn** - **Vporn**
@ -871,8 +773,7 @@
- **VRT** - **VRT**
- **vube**: Vube.com - **vube**: Vube.com
- **VuClip** - **VuClip**
- **VyboryMos** - **vulture.com**
- **Vzaar**
- **Walla** - **Walla**
- **washingtonpost** - **washingtonpost**
- **washingtonpost:article** - **washingtonpost:article**
@ -880,20 +781,21 @@
- **WatchIndianPorn**: Watch Indian Porn - **WatchIndianPorn**: Watch Indian Porn
- **WDR** - **WDR**
- **wdr:mobile** - **wdr:mobile**
- **WDRMaus**: Sendung mit der Maus
- **WebOfStories** - **WebOfStories**
- **WebOfStoriesPlaylist** - **WebOfStoriesPlaylist**
- **Weibo**
- **WeiqiTV**: WQTV - **WeiqiTV**: WQTV
- **wholecloud**: WholeCloud - **wholecloud**: WholeCloud
- **Wimp** - **Wimp**
- **Wistia** - **Wistia**
- **wnl**: npo.nl and ntr.nl - **WNL**
- **WorldStarHipHop** - **WorldStarHipHop**
- **wrzuta.pl** - **wrzuta.pl**
- **wrzuta.pl:playlist**
- **WSJ**: Wall Street Journal - **WSJ**: Wall Street Journal
- **XBef** - **XBef**
- **XboxClips** - **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE - **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To
- **XHamster** - **XHamster**
- **XHamsterEmbed** - **XHamsterEmbed**
- **xiami:album**: 虾米音乐 - 专辑 - **xiami:album**: 虾米音乐 - 专辑
@ -918,7 +820,6 @@
- **Ynet** - **Ynet**
- **YouJizz** - **YouJizz**
- **youku**: 优酷 - **youku**: 优酷
- **youku:show**
- **YouPorn** - **YouPorn**
- **YourUpload** - **YourUpload**
- **youtube**: YouTube.com - **youtube**: YouTube.com
@ -932,7 +833,6 @@
- **youtube:search**: YouTube.com searches - **youtube:search**: YouTube.com searches
- **youtube:search:date**: YouTube.com searches, newest videos first - **youtube:search:date**: YouTube.com searches, newest videos first
- **youtube:search_url**: YouTube.com search URLs - **youtube:search_url**: YouTube.com search URLs
- **youtube:shared**
- **youtube:show**: YouTube.com (multi-season) shows - **youtube:show**: YouTube.com (multi-season) shows
- **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication) - **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword) - **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)
@ -940,4 +840,6 @@
- **Zapiks** - **Zapiks**
- **ZDF** - **ZDF**
- **ZDFChannel** - **ZDFChannel**
- **zingmp3**: mp3.zing.vn - **zingmp3:album**: mp3.zing.vn albums
- **zingmp3:song**: mp3.zing.vn songs
- **ZippCast**

@ -1,5 +1,5 @@
#!/usr/bin/env python #!/usr/bin/env python
# coding: utf-8 # -*- coding: utf-8 -*-
from __future__ import print_function from __future__ import print_function
@ -21,37 +21,25 @@ try:
import py2exe import py2exe
except ImportError: except ImportError:
if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe': if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
print('Cannot import py2exe', file=sys.stderr) print("Cannot import py2exe", file=sys.stderr)
exit(1) exit(1)
py2exe_options = { py2exe_options = {
'bundle_files': 1, "bundle_files": 1,
'compressed': 1, "compressed": 1,
'optimize': 2, "optimize": 2,
'dist_dir': '.', "dist_dir": '.',
'dll_excludes': ['w9xpopen.exe', 'crypt32.dll'], "dll_excludes": ['w9xpopen.exe', 'crypt32.dll'],
} }
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
DESCRIPTION = 'YouTube video downloader'
LONG_DESCRIPTION = 'Command-line program to download videos from YouTube.com and other video sites'
py2exe_console = [{ py2exe_console = [{
'script': './youtube_dl/__main__.py', "script": "./youtube_dl/__main__.py",
'dest_base': 'youtube-dl', "dest_base": "youtube-dl",
'version': __version__,
'description': DESCRIPTION,
'comments': LONG_DESCRIPTION,
'product_name': 'youtube-dl',
'product_version': __version__,
}] }]
py2exe_params = { py2exe_params = {
'console': py2exe_console, 'console': py2exe_console,
'options': {'py2exe': py2exe_options}, 'options': {"py2exe": py2exe_options},
'zipfile': None 'zipfile': None
} }
@ -84,7 +72,7 @@ else:
params['scripts'] = ['bin/youtube-dl'] params['scripts'] = ['bin/youtube-dl']
class build_lazy_extractors(Command): class build_lazy_extractors(Command):
description = 'Build the extractor lazy loading module' description = "Build the extractor lazy loading module"
user_options = [] user_options = []
def initialize_options(self): def initialize_options(self):
@ -99,11 +87,16 @@ class build_lazy_extractors(Command):
dry_run=self.dry_run, dry_run=self.dry_run,
) )
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
setup( setup(
name='youtube_dl', name='youtube_dl',
version=__version__, version=__version__,
description=DESCRIPTION, description='YouTube video downloader',
long_description=LONG_DESCRIPTION, long_description='Small command-line program to download videos from'
' YouTube.com and other video sites.',
url='https://github.com/rg3/youtube-dl', url='https://github.com/rg3/youtube-dl',
author='Ricardo Garcia', author='Ricardo Garcia',
author_email='ytdl@yt-dl.org', author_email='ytdl@yt-dl.org',
@ -119,17 +112,16 @@ setup(
# test_requires = ['nosetest'], # test_requires = ['nosetest'],
classifiers=[ classifiers=[
'Topic :: Multimedia :: Video', "Topic :: Multimedia :: Video",
'Development Status :: 5 - Production/Stable', "Development Status :: 5 - Production/Stable",
'Environment :: Console', "Environment :: Console",
'License :: Public Domain', "License :: Public Domain",
'Programming Language :: Python :: 2.6', "Programming Language :: Python :: 2.6",
'Programming Language :: Python :: 2.7', "Programming Language :: Python :: 2.7",
'Programming Language :: Python :: 3', "Programming Language :: Python :: 3",
'Programming Language :: Python :: 3.2', "Programming Language :: Python :: 3.2",
'Programming Language :: Python :: 3.3', "Programming Language :: Python :: 3.3",
'Programming Language :: Python :: 3.4', "Programming Language :: Python :: 3.4",
'Programming Language :: Python :: 3.5',
], ],
cmdclass={'build_lazy_extractors': build_lazy_extractors}, cmdclass={'build_lazy_extractors': build_lazy_extractors},
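
Reviewer note: both sides of this diff read `__version__` by executing `youtube_dl/version.py` rather than importing the package; only the position of that statement changes. A minimal sketch of the same technique follows, assuming a version file that does nothing but assign `__version__`. `read_version` and `demo` are hypothetical names, and the temporary file stands in for the real `version.py`.

```python
import os
import shutil
import tempfile


def read_version(version_path):
    """Execute a version file in an isolated namespace and return __version__."""
    namespace = {}
    with open(version_path) as f:
        source = f.read()
    # Passing the path to compile() keeps tracebacks pointing at the real file.
    exec(compile(source, version_path, 'exec'), namespace)
    return namespace['__version__']


def demo():
    # Hypothetical stand-in for youtube_dl/version.py in a temp directory.
    tmp_dir = tempfile.mkdtemp()
    try:
        path = os.path.join(tmp_dir, 'version.py')
        with open(path, 'w') as f:
            f.write("__version__ = '2016.06.04'\n")
        return read_version(path)
    finally:
        shutil.rmtree(tmp_dir)
```

Unlike the bare `exec(...)` in `setup.py`, this variant uses an explicit namespace dict so the version string does not leak into module globals, and it closes the file handle.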

@ -11,7 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL from test.helper import FakeYDL
from youtube_dl.extractor.common import InfoExtractor from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError
class TestIE(InfoExtractor): class TestIE(InfoExtractor):
@ -48,9 +48,6 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._og_search_property('foobar', html), 'Foo') self.assertEqual(ie._og_search_property('foobar', html), 'Foo')
self.assertEqual(ie._og_search_property('test1', html), 'foo > < bar') self.assertEqual(ie._og_search_property('test1', html), 'foo > < bar')
self.assertEqual(ie._og_search_property('test2', html), 'foo >//< bar') self.assertEqual(ie._og_search_property('test2', html), 'foo >//< bar')
self.assertEqual(ie._og_search_property(('test0', 'test1'), html), 'foo > < bar')
self.assertRaises(RegexNotFoundError, ie._og_search_property, 'test0', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._og_search_property, ('test0', 'test00'), html, None, fatal=True)
def test_html_search_meta(self): def test_html_search_meta(self):
ie = self.ie ie = self.ie
@ -69,11 +66,6 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._html_search_meta('d', html), '4') self.assertEqual(ie._html_search_meta('d', html), '4')
self.assertEqual(ie._html_search_meta('e', html), '5') self.assertEqual(ie._html_search_meta('e', html), '5')
self.assertEqual(ie._html_search_meta('f', html), '6') self.assertEqual(ie._html_search_meta('f', html), '6')
self.assertEqual(ie._html_search_meta(('a', 'b', 'c'), html), '1')
self.assertEqual(ie._html_search_meta(('c', 'b', 'a'), html), '3')
self.assertEqual(ie._html_search_meta(('z', 'x', 'c'), html), '3')
self.assertRaises(RegexNotFoundError, ie._html_search_meta, 'z', html, None, fatal=True)
self.assertRaises(RegexNotFoundError, ie._html_search_meta, ('z', 'x'), html, None, fatal=True)
def test_download_json(self): def test_download_json(self):
uri = encode_data_uri(b'{"foo": "blah"}', 'application/json') uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')

@ -335,40 +335,6 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], f1['format_id']) self.assertEqual(downloaded['format_id'], f1['format_id'])
def test_audio_only_extractor_format_selection(self):
# For extractors with incomplete formats (all formats are audio-only or
# video-only) best and worst should fallback to corresponding best/worst
# video-only or audio-only formats (as per
# https://github.com/rg3/youtube-dl/pull/5556)
formats = [
{'format_id': 'low', 'ext': 'mp3', 'preference': 1, 'vcodec': 'none', 'url': TEST_URL},
{'format_id': 'high', 'ext': 'mp3', 'preference': 2, 'vcodec': 'none', 'url': TEST_URL},
]
info_dict = _make_result(formats)
ydl = YDL({'format': 'best'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'high')
ydl = YDL({'format': 'worst'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'low')
def test_format_not_available(self):
formats = [
{'format_id': 'regular', 'ext': 'mp4', 'height': 360, 'url': TEST_URL},
{'format_id': 'video', 'ext': 'mp4', 'height': 720, 'acodec': 'none', 'url': TEST_URL},
]
info_dict = _make_result(formats)
# This must fail since complete video-audio format does not match filter
# and extractor does not provide incomplete only formats (i.e. only
# video-only or audio-only).
ydl = YDL({'format': 'best[height>360]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
def test_invalid_format_specs(self): def test_invalid_format_specs(self):
def assert_syntax_error(format_spec): def assert_syntax_error(format_spec):
ydl = YDL({'format': format_spec}) ydl = YDL({'format': format_spec})
@ -605,7 +571,6 @@ class TestYoutubeDL(unittest.TestCase):
'extractor': 'TEST', 'extractor': 'TEST',
'duration': 30, 'duration': 30,
'filesize': 10 * 1024, 'filesize': 10 * 1024,
'playlist_id': '42',
} }
second = { second = {
'id': '2', 'id': '2',
@ -615,7 +580,6 @@ class TestYoutubeDL(unittest.TestCase):
'duration': 10, 'duration': 10,
'description': 'foo', 'description': 'foo',
'filesize': 5 * 1024, 'filesize': 5 * 1024,
'playlist_id': '43',
} }
videos = [first, second] videos = [first, second]
@ -652,10 +616,6 @@ class TestYoutubeDL(unittest.TestCase):
res = get_videos(f) res = get_videos(f)
self.assertEqual(res, ['1']) self.assertEqual(res, ['1'])
f = match_filter_func('playlist_id = 42')
res = get_videos(f)
self.assertEqual(res, ['1'])
def test_playlist_items_selection(self): def test_playlist_items_selection(self):
entries = [{ entries = [{
'id': compat_str(i), 'id': compat_str(i),

View File

@ -6,7 +6,6 @@ from __future__ import unicode_literals
import os import os
import sys import sys
import unittest import unittest
import collections
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
@ -101,6 +100,8 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch(':ytsubs', ['youtube:subscriptions']) self.assertMatch(':ytsubs', ['youtube:subscriptions'])
self.assertMatch(':ytsubscriptions', ['youtube:subscriptions']) self.assertMatch(':ytsubscriptions', ['youtube:subscriptions'])
self.assertMatch(':ythistory', ['youtube:history']) self.assertMatch(':ythistory', ['youtube:history'])
self.assertMatch(':thedailyshow', ['ComedyCentralShows'])
self.assertMatch(':tds', ['ComedyCentralShows'])
def test_vimeo_matching(self): def test_vimeo_matching(self):
self.assertMatch('https://vimeo.com/channels/tributes', ['vimeo:channel']) self.assertMatch('https://vimeo.com/channels/tributes', ['vimeo:channel'])
@ -129,15 +130,6 @@ class TestAllURLsMatching(unittest.TestCase):
'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html', 'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html',
['Yahoo']) ['Yahoo'])
def test_no_duplicated_ie_names(self):
name_accu = collections.defaultdict(list)
for ie in self.ies:
name_accu[ie.IE_NAME.lower()].append(type(ie).__name__)
for (ie_name, ie_list) in name_accu.items():
self.assertEqual(
len(ie_list), 1,
'Multiple extractors with the same IE_NAME "%s" (%s)' % (ie_name, ', '.join(ie_list)))
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
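
The removed `test_no_duplicated_ie_names` check groups extractors by lower-cased `IE_NAME` and fails when a name is shared. A minimal, self-contained sketch of that grouping is shown below; the extractor classes `FooIE`, `FooPlaylistIE` and `BarIE` and the helper `find_duplicated_ie_names` are hypothetical stand-ins for illustration, not part of youtube-dl.

```python
import collections


class FooIE(object):  # hypothetical extractor
    IE_NAME = 'foo'


class FooPlaylistIE(object):  # hypothetical extractor reusing the same name
    IE_NAME = 'Foo'


class BarIE(object):  # hypothetical extractor with a unique name
    IE_NAME = 'bar'


def find_duplicated_ie_names(extractors):
    """Map each lower-cased IE_NAME used more than once to its class names."""
    name_accu = collections.defaultdict(list)
    for ie in extractors:
        name_accu[ie.IE_NAME.lower()].append(type(ie).__name__)
    return dict(
        (name, classes) for name, classes in name_accu.items()
        if len(classes) > 1)


duplicates = find_duplicated_ie_names([FooIE(), FooPlaylistIE(), BarIE()])
print(duplicates)
```

Comparing names case-insensitively matches the test above, which lower-cases `IE_NAME` before grouping.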

View File

@ -87,8 +87,6 @@ class TestCompat(unittest.TestCase):
def test_compat_shlex_split(self): def test_compat_shlex_split(self):
self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two']) self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])
self.assertEqual(compat_shlex_split('-option "one\ntwo" \n -flag'), ['-option', 'one\ntwo', '-flag'])
self.assertEqual(compat_shlex_split('-val 中文'), ['-val', '中文'])
def test_compat_etree_fromstring(self): def test_compat_etree_fromstring(self):
xml = ''' xml = '''

View File

@ -87,7 +87,7 @@ class TestHTTP(unittest.TestCase):
ydl = YoutubeDL({'logger': FakeLogger()}) ydl = YoutubeDL({'logger': FakeLogger()})
r = ydl.extract_info('http://localhost:%d/302' % self.port) r = ydl.extract_info('http://localhost:%d/302' % self.port)
self.assertEqual(r['entries'][0]['url'], 'http://localhost:%d/vid.mp4' % self.port) self.assertEqual(r['url'], 'http://localhost:%d/vid.mp4' % self.port)
class TestHTTPS(unittest.TestCase): class TestHTTPS(unittest.TestCase):
@ -111,7 +111,7 @@ class TestHTTPS(unittest.TestCase):
ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True}) ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True})
r = ydl.extract_info('https://localhost:%d/video.html' % self.port) r = ydl.extract_info('https://localhost:%d/video.html' % self.port)
self.assertEqual(r['entries'][0]['url'], 'https://localhost:%d/vid.mp4' % self.port) self.assertEqual(r['url'], 'https://localhost:%d/vid.mp4' % self.port)
def _build_proxy_handler(name): def _build_proxy_handler(name):
@ -138,27 +138,27 @@ class TestProxy(unittest.TestCase):
self.proxy_thread.daemon = True self.proxy_thread.daemon = True
self.proxy_thread.start() self.proxy_thread.start()
self.geo_proxy = compat_http_server.HTTPServer( self.cn_proxy = compat_http_server.HTTPServer(
('localhost', 0), _build_proxy_handler('geo')) ('localhost', 0), _build_proxy_handler('cn'))
self.geo_port = http_server_port(self.geo_proxy) self.cn_port = http_server_port(self.cn_proxy)
self.geo_proxy_thread = threading.Thread(target=self.geo_proxy.serve_forever) self.cn_proxy_thread = threading.Thread(target=self.cn_proxy.serve_forever)
self.geo_proxy_thread.daemon = True self.cn_proxy_thread.daemon = True
self.geo_proxy_thread.start() self.cn_proxy_thread.start()
def test_proxy(self): def test_proxy(self):
geo_proxy = 'localhost:{0}'.format(self.geo_port) cn_proxy = 'localhost:{0}'.format(self.cn_port)
ydl = YoutubeDL({ ydl = YoutubeDL({
'proxy': 'localhost:{0}'.format(self.port), 'proxy': 'localhost:{0}'.format(self.port),
'geo_verification_proxy': geo_proxy, 'cn_verification_proxy': cn_proxy,
}) })
url = 'http://foo.com/bar' url = 'http://foo.com/bar'
response = ydl.urlopen(url).read().decode('utf-8') response = ydl.urlopen(url).read().decode('utf-8')
self.assertEqual(response, 'normal: {0}'.format(url)) self.assertEqual(response, 'normal: {0}'.format(url))
req = compat_urllib_request.Request(url) req = compat_urllib_request.Request(url)
req.add_header('Ytdl-request-proxy', geo_proxy) req.add_header('Ytdl-request-proxy', cn_proxy)
response = ydl.urlopen(req).read().decode('utf-8') response = ydl.urlopen(req).read().decode('utf-8')
self.assertEqual(response, 'geo: {0}'.format(url)) self.assertEqual(response, 'cn: {0}'.format(url))
def test_proxy_with_idn(self): def test_proxy_with_idn(self):
ydl = YoutubeDL({ ydl = YoutubeDL({

View File

@ -33,18 +33,14 @@ from youtube_dl.utils import (
ExtractorError, ExtractorError,
find_xpath_attr, find_xpath_attr,
fix_xml_ampersands, fix_xml_ampersands,
get_element_by_class,
InAdvancePagedList, InAdvancePagedList,
intlist_to_bytes, intlist_to_bytes,
is_html, is_html,
js_to_json, js_to_json,
limit_length, limit_length,
mimetype2ext,
month_by_name,
ohdave_rsa_encrypt, ohdave_rsa_encrypt,
OnDemandPagedList, OnDemandPagedList,
orderedSet, orderedSet,
parse_age_limit,
parse_duration, parse_duration,
parse_filesize, parse_filesize,
parse_count, parse_count,
@ -64,14 +60,11 @@ from youtube_dl.utils import (
timeconvert, timeconvert,
unescapeHTML, unescapeHTML,
unified_strdate, unified_strdate,
unified_timestamp,
unsmuggle_url, unsmuggle_url,
uppercase_escape, uppercase_escape,
lowercase_escape, lowercase_escape,
url_basename, url_basename,
base_url,
urlencode_postdata, urlencode_postdata,
urshift,
update_url_query, update_url_query,
version_tuple, version_tuple,
xpath_with_ns, xpath_with_ns,
@ -85,7 +78,6 @@ from youtube_dl.utils import (
cli_option, cli_option,
cli_valueless_option, cli_valueless_option,
cli_bool_option, cli_bool_option,
parse_codecs,
) )
from youtube_dl.compat import ( from youtube_dl.compat import (
compat_chr, compat_chr,
@ -257,8 +249,6 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unescapeHTML('&#47;'), '/') self.assertEqual(unescapeHTML('&#47;'), '/')
self.assertEqual(unescapeHTML('&eacute;'), 'é') self.assertEqual(unescapeHTML('&eacute;'), 'é')
self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;') self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
# HTML5 entities
self.assertEqual(unescapeHTML('&period;&apos;'), '.\'')
def test_date_from_str(self): def test_date_from_str(self):
self.assertEqual(date_from_str('yesterday'), date_from_str('now-1day')) self.assertEqual(date_from_str('yesterday'), date_from_str('now-1day'))
@ -291,30 +281,7 @@ class TestUtil(unittest.TestCase):
'20150202') '20150202')
self.assertEqual(unified_strdate('Feb 14th 2016 5:45PM'), '20160214') self.assertEqual(unified_strdate('Feb 14th 2016 5:45PM'), '20160214')
self.assertEqual(unified_strdate('25-09-2014'), '20140925') self.assertEqual(unified_strdate('25-09-2014'), '20140925')
self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227')
self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None) self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_strdate('Feb 7, 2016 at 6:35 pm'), '20160207')
def test_unified_timestamps(self):
self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
self.assertEqual(unified_timestamp('8/7/2009'), 1247011200)
self.assertEqual(unified_timestamp('Dec 14, 2012'), 1355443200)
self.assertEqual(unified_timestamp('2012/10/11 01:56:38 +0000'), 1349920598)
self.assertEqual(unified_timestamp('1968 12 10'), -33436800)
self.assertEqual(unified_timestamp('1968-12-10'), -33436800)
self.assertEqual(unified_timestamp('28/01/2014 21:00:00 +0100'), 1390939200)
self.assertEqual(
unified_timestamp('11/26/2014 11:30:00 AM PST', day_first=False),
1417001400)
self.assertEqual(
unified_timestamp('2/2/2015 6:47:40 PM', day_first=False),
1422902860)
self.assertEqual(unified_timestamp('Feb 14th 2016 5:45PM'), 1455471900)
self.assertEqual(unified_timestamp('25-09-2014'), 1411603200)
self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
def test_determine_ext(self): def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4') self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@ -414,12 +381,6 @@ class TestUtil(unittest.TestCase):
self.assertEqual(res_url, url) self.assertEqual(res_url, url)
self.assertEqual(res_data, None) self.assertEqual(res_data, None)
smug_url = smuggle_url(url, {'a': 'b'})
smug_smug_url = smuggle_url(smug_url, {'c': 'd'})
res_url, res_data = unsmuggle_url(smug_smug_url)
self.assertEqual(res_url, url)
self.assertEqual(res_data, {'a': 'b', 'c': 'd'})
def test_shell_quote(self): def test_shell_quote(self):
args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')] args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')]
self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""") self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""")
@ -438,27 +399,6 @@ class TestUtil(unittest.TestCase):
url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'), url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'),
'trailer.mp4') 'trailer.mp4')
def test_base_url(self):
self.assertEqual(base_url('http://foo.de/'), 'http://foo.de/')
self.assertEqual(base_url('http://foo.de/bar'), 'http://foo.de/')
self.assertEqual(base_url('http://foo.de/bar/'), 'http://foo.de/bar/')
self.assertEqual(base_url('http://foo.de/bar/baz'), 'http://foo.de/bar/')
self.assertEqual(base_url('http://foo.de/bar/baz?x=z/x/c'), 'http://foo.de/bar/')
def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None)
self.assertEqual(parse_age_limit(False), None)
self.assertEqual(parse_age_limit('invalid'), None)
self.assertEqual(parse_age_limit(0), 0)
self.assertEqual(parse_age_limit(18), 18)
self.assertEqual(parse_age_limit(21), 21)
self.assertEqual(parse_age_limit(22), None)
self.assertEqual(parse_age_limit('18'), 18)
self.assertEqual(parse_age_limit('18+'), 18)
self.assertEqual(parse_age_limit('PG-13'), 13)
self.assertEqual(parse_age_limit('TV-14'), 14)
self.assertEqual(parse_age_limit('TV-MA'), 17)
def test_parse_duration(self): def test_parse_duration(self):
self.assertEqual(parse_duration(None), None) self.assertEqual(parse_duration(None), None)
self.assertEqual(parse_duration(False), None) self.assertEqual(parse_duration(False), None)
@ -637,45 +577,6 @@ class TestUtil(unittest.TestCase):
limit_length('foo bar baz asd', 12).startswith('foo bar')) limit_length('foo bar baz asd', 12).startswith('foo bar'))
self.assertTrue('...' in limit_length('foo bar baz asd', 12)) self.assertTrue('...' in limit_length('foo bar baz asd', 12))
def test_mimetype2ext(self):
self.assertEqual(mimetype2ext(None), None)
self.assertEqual(mimetype2ext('video/x-flv'), 'flv')
self.assertEqual(mimetype2ext('application/x-mpegURL'), 'm3u8')
self.assertEqual(mimetype2ext('text/vtt'), 'vtt')
self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
def test_month_by_name(self):
self.assertEqual(month_by_name(None), None)
self.assertEqual(month_by_name('December', 'en'), 12)
self.assertEqual(month_by_name('décembre', 'fr'), 12)
self.assertEqual(month_by_name('December'), 12)
self.assertEqual(month_by_name('décembre'), None)
self.assertEqual(month_by_name('Unknown', 'unknown'), None)
def test_parse_codecs(self):
self.assertEqual(parse_codecs(''), {})
self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
'vcodec': 'avc1.77.30',
'acodec': 'mp4a.40.2',
})
self.assertEqual(parse_codecs('mp4a.40.2'), {
'vcodec': 'none',
'acodec': 'mp4a.40.2',
})
self.assertEqual(parse_codecs('mp4a.40.5,avc1.42001e'), {
'vcodec': 'avc1.42001e',
'acodec': 'mp4a.40.5',
})
self.assertEqual(parse_codecs('avc3.640028'), {
'vcodec': 'avc3.640028',
'acodec': 'none',
})
self.assertEqual(parse_codecs(', h264,,newcodec,aac'), {
'vcodec': 'h264',
'acodec': 'aac',
})
def test_escape_rfc3986(self): def test_escape_rfc3986(self):
reserved = "!*'();:@&=+$,/?#[]" reserved = "!*'();:@&=+$,/?#[]"
unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~' unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~'
@ -737,12 +638,6 @@ class TestUtil(unittest.TestCase):
"1":{"src":"skipped", "type": "application/vnd.apple.mpegURL"} "1":{"src":"skipped", "type": "application/vnd.apple.mpegURL"}
}''') }''')
inp = '''{"foo":101}'''
self.assertEqual(js_to_json(inp), '''{"foo":101}''')
inp = '''{"duration": "00:01:07"}'''
self.assertEqual(js_to_json(inp), '''{"duration": "00:01:07"}''')
def test_js_to_json_edgecases(self): def test_js_to_json_edgecases(self):
on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}") on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"}) self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
@ -848,10 +743,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_filesize('2 MiB'), 2097152) self.assertEqual(parse_filesize('2 MiB'), 2097152)
self.assertEqual(parse_filesize('5 GB'), 5000000000) self.assertEqual(parse_filesize('5 GB'), 5000000000)
self.assertEqual(parse_filesize('1.2Tb'), 1200000000000) self.assertEqual(parse_filesize('1.2Tb'), 1200000000000)
self.assertEqual(parse_filesize('1.2tb'), 1200000000000)
self.assertEqual(parse_filesize('1,24 KB'), 1240) self.assertEqual(parse_filesize('1,24 KB'), 1240)
self.assertEqual(parse_filesize('1,24 kb'), 1240)
self.assertEqual(parse_filesize('8.5 megabytes'), 8500000)
def test_parse_count(self): def test_parse_count(self):
self.assertEqual(parse_count(None), None) self.assertEqual(parse_count(None), None)
@ -1002,7 +894,6 @@ The first line
self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128']) self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128'])
self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), []) self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), [])
self.assertEqual(cli_option({}, '--proxy', 'proxy'), []) self.assertEqual(cli_option({}, '--proxy', 'proxy'), [])
self.assertEqual(cli_option({'retries': 10}, '--retries', 'retries'), ['--retries', '10'])
def test_cli_valueless_option(self): def test_cli_valueless_option(self):
self.assertEqual(cli_valueless_option( self.assertEqual(cli_valueless_option(
@ -1063,17 +954,5 @@ The first line
self.assertRaises(ValueError, encode_base_n, 0, 70) self.assertRaises(ValueError, encode_base_n, 0, 70)
self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table) self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table)
def test_urshift(self):
self.assertEqual(urshift(3, 1), 1)
self.assertEqual(urshift(-3, 1), 2147483646)
def test_get_element_by_class(self):
html = '''
<span class="foo bar">nice</span>
'''
self.assertEqual(get_element_by_class('foo', html), 'nice')
self.assertEqual(get_element_by_class('no-such-class', html), None)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
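
The removed `test_base_url` and `test_urshift` cases pin down the expected behaviour of two small helpers. The sketch below is one possible way to satisfy those assertions; the regular expression and the 32-bit wrap-around are assumptions chosen to match the test data and may differ from the code in `youtube_dl.utils`.

```python
import re


def base_url(url):
    # Keep everything up to and including the last '/' that appears
    # before any query string or fragment.
    return re.match(r'https?://[^?#&]+/', url).group()


def urshift(val, n):
    # Unsigned right shift for 32-bit values: negative inputs are first
    # mapped to their two's-complement unsigned equivalent.
    return val >> n if val >= 0 else (val + 0x100000000) >> n


print(base_url('http://foo.de/bar/baz?x=z/x/c'))
print(urshift(-3, 1))
```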

View File

@ -1,70 +0,0 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
import unittest
import sys
import os
import subprocess
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
rootDir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
class TestVerboseOutput(unittest.TestCase):
def test_private_info_arg(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'--username', 'johnsmith@gmail.com',
'--password', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'--username' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'--password' in serr)
self.assertTrue(b'secret' not in serr)
def test_private_info_shortarg(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'-u', 'johnsmith@gmail.com',
'-p', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'-u' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'-p' in serr)
self.assertTrue(b'secret' not in serr)
def test_private_info_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'--username=johnsmith@gmail.com',
'--password=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'--username' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'--password' in serr)
self.assertTrue(b'secret' not in serr)
def test_private_info_shortarg_eq(self):
outp = subprocess.Popen(
[
sys.executable, 'youtube_dl/__main__.py', '-v',
'-u=johnsmith@gmail.com',
'-p=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sout, serr = outp.communicate()
self.assertTrue(b'-u' in serr)
self.assertTrue(b'johnsmith' not in serr)
self.assertTrue(b'-p' in serr)
self.assertTrue(b'secret' not in serr)
if __name__ == '__main__':
unittest.main()
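
The four removed tests cover the same requirement from different angles: option names may appear in verbose output, but their values must not, whether they are passed as a separate argument or in `--option=value` form. The following sketch shows one way such masking could work; `hide_login_info`, `PRIVATE_OPTS` and the `PRIVATE` placeholder are illustrative names assumed here, not a confirmed youtube-dl API.

```python
# Options whose values should never be echoed (assumed list for illustration).
PRIVATE_OPTS = set(['-u', '--username', '-p', '--password'])


def hide_login_info(args):
    """Return a copy of args with the values of private options masked."""
    hidden = []
    mask_next = False
    for arg in args:
        if mask_next:
            # Previous argument was a private option given without '='.
            hidden.append('PRIVATE')
            mask_next = False
            continue
        name, sep, _ = arg.partition('=')
        if name in PRIVATE_OPTS:
            if sep:
                # '--password=secret' style: keep the name, mask the value.
                hidden.append(name + '=PRIVATE')
            else:
                hidden.append(name)
                mask_next = True
        else:
            hidden.append(arg)
    return hidden


print(hide_login_info(['-v', '--username', 'johnsmith@gmail.com', '-p=secret']))
```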

View File

@ -1,11 +1,10 @@
#!/usr/bin/env python #!/usr/bin/env python
# coding: utf-8 # -*- coding: utf-8 -*-
from __future__ import absolute_import, unicode_literals from __future__ import absolute_import, unicode_literals
import collections import collections
import contextlib import contextlib
import copy
import datetime import datetime
import errno import errno
import fileinput import fileinput
@ -131,9 +130,6 @@ class YoutubeDL(object):
username: Username for authentication purposes. username: Username for authentication purposes.
password: Password for authentication purposes. password: Password for authentication purposes.
videopassword: Password for accessing a video. videopassword: Password for accessing a video.
ap_mso: Adobe Pass multiple-system operator identifier.
ap_username: Multiple-system operator account username.
ap_password: Multiple-system operator account password.
usenetrc: Use netrc for authentication instead. usenetrc: Use netrc for authentication instead.
verbose: Print additional info to stdout. verbose: Print additional info to stdout.
quiet: Do not print messages to stdout. quiet: Do not print messages to stdout.
@ -200,8 +196,8 @@ class YoutubeDL(object):
prefer_insecure: Use HTTP instead of HTTPS to retrieve information. prefer_insecure: Use HTTP instead of HTTPS to retrieve information.
At the moment, this is only supported by YouTube. At the moment, this is only supported by YouTube.
proxy: URL of the proxy server to use proxy: URL of the proxy server to use
geo_verification_proxy: URL of the proxy to use for IP address verification cn_verification_proxy: URL of the proxy to use for IP address verification
on geo-restricted sites. (Experimental) on Chinese sites. (Experimental)
socket_timeout: Time to wait for unresponsive hosts, in seconds socket_timeout: Time to wait for unresponsive hosts, in seconds
bidi_workaround: Work around buggy terminals without bidirectional text bidi_workaround: Work around buggy terminals without bidirectional text
support, using fribidi support, using fribidi
@ -252,16 +248,7 @@ class YoutubeDL(object):
source_address: (Experimental) Client-side IP address to bind to. source_address: (Experimental) Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the call_home: Boolean, true iff we are allowed to contact the
youtube-dl servers for debugging. youtube-dl servers for debugging.
sleep_interval: Number of seconds to sleep before each download when sleep_interval: Number of seconds to sleep before each download.
used alone or a lower bound of a range for randomized
sleep before each download (minimum possible number
of seconds to sleep) when used along with
max_sleep_interval.
max_sleep_interval:Upper bound of a range for randomized sleep before each
download (maximum possible number of seconds to sleep).
Must only be used along with sleep_interval.
Actual sleep time will be a random float from range
[sleep_interval; max_sleep_interval].
listformats: Print an overview of available video formats and exit. listformats: Print an overview of available video formats and exit.
list_thumbnails: Print a table of all thumbnails and exit. list_thumbnails: Print a table of all thumbnails and exit.
match_filter: A function that gets called with the info_dict of match_filter: A function that gets called with the info_dict of
@ -317,11 +304,6 @@ class YoutubeDL(object):
self.params.update(params) self.params.update(params)
self.cache = Cache(self) self.cache = Cache(self)
if self.params.get('cn_verification_proxy') is not None:
self.report_warning('--cn-verification-proxy is deprecated. Use --geo-verification-proxy instead.')
if self.params.get('geo_verification_proxy') is None:
self.params['geo_verification_proxy'] = self.params['cn_verification_proxy']
if params.get('bidi_workaround', False): if params.get('bidi_workaround', False):
try: try:
import pty import pty
@ -1064,9 +1046,9 @@ class YoutubeDL(object):
if isinstance(selector, list): if isinstance(selector, list):
fs = [_build_selector_function(s) for s in selector] fs = [_build_selector_function(s) for s in selector]
def selector_function(ctx): def selector_function(formats):
for f in fs: for f in fs:
for format in f(ctx): for format in f(formats):
yield format yield format
return selector_function return selector_function
elif selector.type == GROUP: elif selector.type == GROUP:
@ -1074,17 +1056,17 @@ class YoutubeDL(object):
elif selector.type == PICKFIRST: elif selector.type == PICKFIRST:
fs = [_build_selector_function(s) for s in selector.selector] fs = [_build_selector_function(s) for s in selector.selector]
def selector_function(ctx): def selector_function(formats):
for f in fs: for f in fs:
picked_formats = list(f(ctx)) picked_formats = list(f(formats))
if picked_formats: if picked_formats:
return picked_formats return picked_formats
return [] return []
elif selector.type == SINGLE: elif selector.type == SINGLE:
format_spec = selector.selector format_spec = selector.selector
def selector_function(ctx): def selector_function(formats):
formats = list(ctx['formats']) formats = list(formats)
if not formats: if not formats:
return return
if format_spec == 'all': if format_spec == 'all':
@ -1097,10 +1079,9 @@ class YoutubeDL(object):
if f.get('vcodec') != 'none' and f.get('acodec') != 'none'] if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
if audiovideo_formats: if audiovideo_formats:
yield audiovideo_formats[format_idx] yield audiovideo_formats[format_idx]
# for extractors with incomplete formats (audio only (soundcloud) # for audio only (soundcloud) or video only (imgur) urls, select the best/worst audio format
# or video only (imgur)) we will fallback to best/worst elif (all(f.get('acodec') != 'none' for f in formats) or
# {video,audio}-only format all(f.get('vcodec') != 'none' for f in formats)):
elif ctx['incomplete_formats']:
yield formats[format_idx] yield formats[format_idx]
elif format_spec == 'bestaudio': elif format_spec == 'bestaudio':
audio_formats = [ audio_formats = [
@ -1174,18 +1155,17 @@ class YoutubeDL(object):
} }
video_selector, audio_selector = map(_build_selector_function, selector.selector) video_selector, audio_selector = map(_build_selector_function, selector.selector)
def selector_function(ctx): def selector_function(formats):
for pair in itertools.product( formats = list(formats)
video_selector(copy.deepcopy(ctx)), audio_selector(copy.deepcopy(ctx))): for pair in itertools.product(video_selector(formats), audio_selector(formats)):
yield _merge(pair) yield _merge(pair)
filters = [self._build_format_filter(f) for f in selector.filters] filters = [self._build_format_filter(f) for f in selector.filters]
def final_selector(ctx): def final_selector(formats):
ctx_copy = copy.deepcopy(ctx)
for _filter in filters: for _filter in filters:
ctx_copy['formats'] = list(filter(_filter, ctx_copy['formats'])) formats = list(filter(_filter, formats))
return selector_function(ctx_copy) return selector_function(formats)
return final_selector return final_selector
stream = io.BytesIO(format_spec.encode('utf-8')) stream = io.BytesIO(format_spec.encode('utf-8'))
@ -1243,10 +1223,6 @@ class YoutubeDL(object):
if 'title' not in info_dict: if 'title' not in info_dict:
raise ExtractorError('Missing "title" field in extractor result') raise ExtractorError('Missing "title" field in extractor result')
if not isinstance(info_dict['id'], compat_str):
self.report_warning('"id" field is not a string - forcing string conversion')
info_dict['id'] = compat_str(info_dict['id'])
if 'playlist' not in info_dict: if 'playlist' not in info_dict:
# It isn't part of a playlist # It isn't part of a playlist
info_dict['playlist'] = None info_dict['playlist'] = None
@ -1259,10 +1235,8 @@ class YoutubeDL(object):
info_dict['thumbnails'] = thumbnails = [{'url': thumbnail}] info_dict['thumbnails'] = thumbnails = [{'url': thumbnail}]
if thumbnails: if thumbnails:
thumbnails.sort(key=lambda t: ( thumbnails.sort(key=lambda t: (
t.get('preference') if t.get('preference') is not None else -1, t.get('preference'), t.get('width'), t.get('height'),
t.get('width') if t.get('width') is not None else -1, t.get('id'), t.get('url')))
t.get('height') if t.get('height') is not None else -1,
t.get('id') if t.get('id') is not None else '', t.get('url')))
for i, t in enumerate(thumbnails): for i, t in enumerate(thumbnails):
t['url'] = sanitize_url(t['url']) t['url'] = sanitize_url(t['url'])
if t.get('width') and t.get('height'): if t.get('width') and t.get('height'):
@ -1304,7 +1278,7 @@ class YoutubeDL(object):
for subtitle_format in subtitle: for subtitle_format in subtitle:
if subtitle_format.get('url'): if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url']) subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if subtitle_format.get('ext') is None: if 'ext' not in subtitle_format:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower() subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
if self.params.get('listsubtitles', False): if self.params.get('listsubtitles', False):
@ -1359,7 +1333,7 @@ class YoutubeDL(object):
note=' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '', note=' ({0})'.format(format['format_note']) if format.get('format_note') is not None else '',
) )
# Automatically determine file extension if missing # Automatically determine file extension if missing
if format.get('ext') is None: if 'ext' not in format:
format['ext'] = determine_ext(format['url']).lower() format['ext'] = determine_ext(format['url']).lower()
# Automatically determine protocol if missing (useful for format # Automatically determine protocol if missing (useful for format
# selection purposes) # selection purposes)
@ -1394,34 +1368,7 @@ class YoutubeDL(object):
req_format_list.append('best') req_format_list.append('best')
req_format = '/'.join(req_format_list) req_format = '/'.join(req_format_list)
format_selector = self.build_format_selector(req_format) format_selector = self.build_format_selector(req_format)
formats_to_download = list(format_selector(formats))
# While in format selection we may need to have an access to the original
# format set in order to calculate some metrics or do some processing.
# For now we need to be able to guess whether original formats provided
# by extractor are incomplete or not (i.e. whether extractor provides only
# video-only or audio-only formats) for proper formats selection for
# extractors with such incomplete formats (see
# https://github.com/rg3/youtube-dl/pull/5556).
# Since formats may be filtered during format selection and may not match
# the original formats the results may be incorrect. Thus original formats
# or pre-calculated metrics should be passed to format selection routines
# as well.
# We will pass a context object containing all necessary additional data
# instead of just formats.
# This fixes incorrect format selection issue (see
# https://github.com/rg3/youtube-dl/issues/10083).
incomplete_formats = (
# All formats are video-only or
all(f.get('vcodec') != 'none' and f.get('acodec') == 'none' for f in formats) or
# all formats are audio-only
all(f.get('vcodec') == 'none' and f.get('acodec') != 'none' for f in formats))
ctx = {
'formats': formats,
'incomplete_formats': incomplete_formats,
}
formats_to_download = list(format_selector(ctx))
if not formats_to_download: if not formats_to_download:
raise ExtractorError('requested format not available', raise ExtractorError('requested format not available',
expected=True) expected=True)
@ -1608,9 +1555,7 @@ class YoutubeDL(object):
self.to_screen('[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format)) self.to_screen('[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format))
else: else:
self.to_screen('[info] Writing video subtitles to: ' + sub_filename) self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
# Use newline='' to prevent conversion of newline characters with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
# See https://github.com/rg3/youtube-dl/issues/10268
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
subfile.write(sub_data) subfile.write(sub_data)
except (OSError, IOError): except (OSError, IOError):
self.report_error('Cannot write subtitles file ' + sub_filename) self.report_error('Cannot write subtitles file ' + sub_filename)
@ -1658,7 +1603,7 @@ class YoutubeDL(object):
video_ext, audio_ext = audio.get('ext'), video.get('ext') video_ext, audio_ext = audio.get('ext'), video.get('ext')
if video_ext and audio_ext: if video_ext and audio_ext:
COMPATIBLE_EXTS = ( COMPATIBLE_EXTS = (
('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'ismv', 'isma'), ('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v'),
('webm') ('webm')
) )
for exts in COMPATIBLE_EXTS: for exts in COMPATIBLE_EXTS:
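
The two sides of this diff differ in where "incomplete formats" is decided: one side computes it once from the extractor's original format list and passes it along in a `ctx` dict, the other re-derives it from whatever list reaches the selector, which may already be filtered. The sketch below is a simplified illustration of why that matters; `is_incomplete` and `select` are hypothetical helpers, and the format list is assumed to be sorted from worst to best.

```python
def is_incomplete(formats):
    """True when every format is video-only or every format is audio-only."""
    return (
        all(f.get('vcodec') != 'none' and f.get('acodec') == 'none' for f in formats) or
        all(f.get('vcodec') == 'none' and f.get('acodec') != 'none' for f in formats))


def select(format_spec, ctx):
    """Pick 'best' or 'worst' from ctx['formats'], or None if nothing fits."""
    formats = list(ctx['formats'])
    if not formats:
        return None
    format_idx = 0 if format_spec == 'worst' else -1
    audiovideo = [
        f for f in formats
        if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
    if audiovideo:
        return audiovideo[format_idx]
    # Fall back to a video-only or audio-only format only when the
    # extractor never offered a complete one in the first place.
    if ctx['incomplete_formats']:
        return formats[format_idx]
    return None


formats = [
    {'format_id': 'regular', 'ext': 'mp4', 'height': 360},
    {'format_id': 'video', 'ext': 'mp4', 'height': 720, 'acodec': 'none'},
]
# Equivalent of the filter in 'best[height>360]'.
filtered = [f for f in formats if f.get('height', 0) > 360]

# Flag computed from the original list: no fallback, nothing is selected.
from_original = select('best', {
    'formats': filtered, 'incomplete_formats': is_incomplete(formats)})
# Flag computed from the filtered list: the video-only format slips through.
from_filtered = select('best', {
    'formats': filtered, 'incomplete_formats': is_incomplete(filtered)})
print(from_original, from_filtered)
```

This mirrors the scenario in `test_format_not_available`, where `best[height>360]` is expected to fail rather than silently return a video-only format.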

View File

@ -1,5 +1,5 @@
#!/usr/bin/env python #!/usr/bin/env python
# coding: utf-8 # -*- coding: utf-8 -*-
from __future__ import unicode_literals from __future__ import unicode_literals
@ -18,6 +18,7 @@ from .options import (
from .compat import ( from .compat import (
compat_expanduser, compat_expanduser,
compat_getpass, compat_getpass,
compat_print,
compat_shlex_split, compat_shlex_split,
workaround_optparse_bug9161, workaround_optparse_bug9161,
) )
@ -34,14 +35,12 @@ from .utils import (
setproctitle, setproctitle,
std_headers, std_headers,
write_string, write_string,
render_table,
) )
from .update import update_self from .update import update_self
from .downloader import ( from .downloader import (
FileDownloader, FileDownloader,
) )
from .extractor import gen_extractors, list_extractors from .extractor import gen_extractors, list_extractors
from .extractor.adobepass import MSO_INFO
from .YoutubeDL import YoutubeDL from .YoutubeDL import YoutubeDL
@ -77,7 +76,7 @@ def _real_main(argv=None):
# Dump user agent # Dump user agent
if opts.dump_user_agent: if opts.dump_user_agent:
write_string(std_headers['User-Agent'] + '\n', out=sys.stdout) compat_print(std_headers['User-Agent'])
sys.exit(0) sys.exit(0)
# Batch file verification # Batch file verification
@ -102,10 +101,10 @@ def _real_main(argv=None):
if opts.list_extractors: if opts.list_extractors:
for ie in list_extractors(opts.age_limit): for ie in list_extractors(opts.age_limit):
write_string(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else '') + '\n', out=sys.stdout) compat_print(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else ''))
matchedUrls = [url for url in all_urls if ie.suitable(url)] matchedUrls = [url for url in all_urls if ie.suitable(url)]
for mu in matchedUrls: for mu in matchedUrls:
write_string(' ' + mu + '\n', out=sys.stdout) compat_print(' ' + mu)
sys.exit(0) sys.exit(0)
if opts.list_extractor_descriptions: if opts.list_extractor_descriptions:
for ie in list_extractors(opts.age_limit): for ie in list_extractors(opts.age_limit):
@ -118,11 +117,7 @@ def _real_main(argv=None):
_SEARCHES = ('cute kittens', 'slithering pythons', 'falling cat', 'angry poodle', 'purple fish', 'running tortoise', 'sleeping bunny', 'burping cow') _SEARCHES = ('cute kittens', 'slithering pythons', 'falling cat', 'angry poodle', 'purple fish', 'running tortoise', 'sleeping bunny', 'burping cow')
_COUNTS = ('', '5', '10', 'all') _COUNTS = ('', '5', '10', 'all')
desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES)) desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
write_string(desc + '\n', out=sys.stdout) compat_print(desc)
sys.exit(0)
if opts.ap_list_mso:
table = [[mso_id, mso_info['name']] for mso_id, mso_info in MSO_INFO.items()]
write_string('Supported TV Providers:\n' + render_table(['mso', 'mso name'], table) + '\n', out=sys.stdout)
sys.exit(0) sys.exit(0)
# Conflicting, missing and erroneous options # Conflicting, missing and erroneous options
@ -130,16 +125,12 @@ def _real_main(argv=None):
parser.error('using .netrc conflicts with giving username/password') parser.error('using .netrc conflicts with giving username/password')
if opts.password is not None and opts.username is None: if opts.password is not None and opts.username is None:
parser.error('account username missing\n') parser.error('account username missing\n')
if opts.ap_password is not None and opts.ap_username is None:
parser.error('TV Provider account username missing\n')
if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid): if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid):
parser.error('using output template conflicts with using title, video ID or auto number') parser.error('using output template conflicts with using title, video ID or auto number')
if opts.usetitle and opts.useid: if opts.usetitle and opts.useid:
parser.error('using title conflicts with using video ID') parser.error('using title conflicts with using video ID')
if opts.username is not None and opts.password is None: if opts.username is not None and opts.password is None:
opts.password = compat_getpass('Type account password and press [Return]: ') opts.password = compat_getpass('Type account password and press [Return]: ')
if opts.ap_username is not None and opts.ap_password is None:
opts.ap_password = compat_getpass('Type TV provider account password and press [Return]: ')
if opts.ratelimit is not None: if opts.ratelimit is not None:
numeric_limit = FileDownloader.parse_bytes(opts.ratelimit) numeric_limit = FileDownloader.parse_bytes(opts.ratelimit)
if numeric_limit is None: if numeric_limit is None:
@ -155,18 +146,6 @@ def _real_main(argv=None):
if numeric_limit is None: if numeric_limit is None:
parser.error('invalid max_filesize specified') parser.error('invalid max_filesize specified')
opts.max_filesize = numeric_limit opts.max_filesize = numeric_limit
if opts.sleep_interval is not None:
if opts.sleep_interval < 0:
parser.error('sleep interval must be positive or 0')
if opts.max_sleep_interval is not None:
if opts.max_sleep_interval < 0:
parser.error('max sleep interval must be positive or 0')
if opts.max_sleep_interval < opts.sleep_interval:
parser.error('max sleep interval must be greater than or equal to min sleep interval')
else:
opts.max_sleep_interval = opts.sleep_interval
if opts.ap_mso and opts.ap_mso not in MSO_INFO:
parser.error('Unsupported TV Provider, use --ap-list-mso to get a list of supported TV Providers')
def parse_retries(retries): def parse_retries(retries):
if retries in ('inf', 'infinite'): if retries in ('inf', 'infinite'):
@ -266,6 +245,8 @@ def _real_main(argv=None):
postprocessors.append({ postprocessors.append({
'key': 'FFmpegEmbedSubtitle', 'key': 'FFmpegEmbedSubtitle',
}) })
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
if opts.embedthumbnail: if opts.embedthumbnail:
already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails
postprocessors.append({ postprocessors.append({
@ -274,10 +255,6 @@ def _real_main(argv=None):
}) })
if not already_have_thumbnail: if not already_have_thumbnail:
opts.writethumbnail = True opts.writethumbnail = True
# XAttrMetadataPP should be run after post-processors that may change file
# contents
if opts.xattrs:
postprocessors.append({'key': 'XAttrMetadata'})
# Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way. # Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
# So if the user is able to remove the file before your postprocessor runs it might cause a few problems. # So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
if opts.exec_cmd: if opts.exec_cmd:
@ -285,6 +262,12 @@ def _real_main(argv=None):
'key': 'ExecAfterDownload', 'key': 'ExecAfterDownload',
'exec_cmd': opts.exec_cmd, 'exec_cmd': opts.exec_cmd,
}) })
if opts.xattr_set_filesize:
try:
import xattr
xattr # Confuse flake8
except ImportError:
parser.error('setting filesize xattr requested but python-xattr is not available')
external_downloader_args = None external_downloader_args = None
if opts.external_downloader_args: if opts.external_downloader_args:
external_downloader_args = compat_shlex_split(opts.external_downloader_args) external_downloader_args = compat_shlex_split(opts.external_downloader_args)
@ -301,9 +284,6 @@ def _real_main(argv=None):
'password': opts.password, 'password': opts.password,
'twofactor': opts.twofactor, 'twofactor': opts.twofactor,
'videopassword': opts.videopassword, 'videopassword': opts.videopassword,
'ap_mso': opts.ap_mso,
'ap_username': opts.ap_username,
'ap_password': opts.ap_password,
'quiet': (opts.quiet or any_getting or any_printing), 'quiet': (opts.quiet or any_getting or any_printing),
'no_warnings': opts.no_warnings, 'no_warnings': opts.no_warnings,
'forceurl': opts.geturl, 'forceurl': opts.geturl,
@ -329,7 +309,6 @@ def _real_main(argv=None):
'nooverwrites': opts.nooverwrites, 'nooverwrites': opts.nooverwrites,
'retries': opts.retries, 'retries': opts.retries,
'fragment_retries': opts.fragment_retries, 'fragment_retries': opts.fragment_retries,
'skip_unavailable_fragments': opts.skip_unavailable_fragments,
'buffersize': opts.buffersize, 'buffersize': opts.buffersize,
'noresizebuffer': opts.noresizebuffer, 'noresizebuffer': opts.noresizebuffer,
'continuedl': opts.continue_dl, 'continuedl': opts.continue_dl,
@ -392,7 +371,6 @@ def _real_main(argv=None):
'source_address': opts.source_address, 'source_address': opts.source_address,
'call_home': opts.call_home, 'call_home': opts.call_home,
'sleep_interval': opts.sleep_interval, 'sleep_interval': opts.sleep_interval,
'max_sleep_interval': opts.max_sleep_interval,
'external_downloader': opts.external_downloader, 'external_downloader': opts.external_downloader,
'list_thumbnails': opts.list_thumbnails, 'list_thumbnails': opts.list_thumbnails,
'playlist_items': opts.playlist_items, 'playlist_items': opts.playlist_items,
@ -405,8 +383,6 @@ def _real_main(argv=None):
'external_downloader_args': external_downloader_args, 'external_downloader_args': external_downloader_args,
'postprocessor_args': postprocessor_args, 'postprocessor_args': postprocessor_args,
'cn_verification_proxy': opts.cn_verification_proxy, 'cn_verification_proxy': opts.cn_verification_proxy,
'geo_verification_proxy': opts.geo_verification_proxy,
} }
with YoutubeDL(ydl_opts) as ydl: with YoutubeDL(ydl_opts) as ydl:

File diff suppressed because it is too large

View File

@ -7,7 +7,6 @@ from .http import HttpFD
from .rtmp import RtmpFD from .rtmp import RtmpFD
from .dash import DashSegmentsFD from .dash import DashSegmentsFD
from .rtsp import RtspFD from .rtsp import RtspFD
from .ism import IsmFD
from .external import ( from .external import (
get_external_downloader, get_external_downloader,
FFmpegFD, FFmpegFD,
@ -25,7 +24,6 @@ PROTOCOL_MAP = {
'rtsp': RtspFD, 'rtsp': RtspFD,
'f4m': F4mFD, 'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD, 'http_dash_segments': DashSegmentsFD,
'ism': IsmFD,
} }

View File

@ -4,7 +4,6 @@ import os
import re import re
import sys import sys
import time import time
import random
from ..compat import compat_os_name from ..compat import compat_os_name
from ..utils import ( from ..utils import (
@ -343,10 +342,8 @@ class FileDownloader(object):
}) })
return True return True
min_sleep_interval = self.params.get('sleep_interval') sleep_interval = self.params.get('sleep_interval')
if min_sleep_interval: if sleep_interval:
max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
self.to_screen('[download] Sleeping %s seconds...' % sleep_interval) self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
time.sleep(sleep_interval) time.sleep(sleep_interval)
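
Reviewer note: the left-hand side of this hunk draws the delay from a range with `random.uniform`, while the right-hand side always sleeps for the fixed `sleep_interval`. A minimal sketch of the randomized variant follows. The function name `pick_sleep_interval` is hypothetical, and falling back to the minimum when `max_sleep_interval` is missing or `None` is an assumption of this sketch rather than a copy of the code above.

```python
import random


def pick_sleep_interval(params):
    """Hypothetical helper: return the delay in seconds, or None for no sleep."""
    min_interval = params.get('sleep_interval')
    if not min_interval:
        return None
    # Assumption: treat a missing or None maximum as "same as the minimum".
    max_interval = params.get('max_sleep_interval') or min_interval
    # random.uniform(a, b) returns a float N with a <= N <= b.
    return random.uniform(min_interval, max_interval)


print(pick_sleep_interval({'sleep_interval': 5, 'max_sleep_interval': 10}))
```

With only `sleep_interval` set, both bounds coincide and the delay is constant, which matches the right-hand behaviour.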

View File

@ -1,6 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import os import os
import re
from .fragment import FragmentFD from .fragment import FragmentFD
from ..compat import compat_urllib_error from ..compat import compat_urllib_error
@ -18,32 +19,32 @@ class DashSegmentsFD(FragmentFD):
FD_NAME = 'dashsegments' FD_NAME = 'dashsegments'
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
segments = info_dict['fragments'][:1] if self.params.get( base_url = info_dict['url']
'test', False) else info_dict['fragments'] segment_urls = [info_dict['segment_urls'][0]] if self.params.get('test', False) else info_dict['segment_urls']
initialization_url = info_dict.get('initialization_url')
ctx = { ctx = {
'filename': filename, 'filename': filename,
'total_frags': len(segments), 'total_frags': len(segment_urls) + (1 if initialization_url else 0),
} }
self._prepare_and_start_frag_download(ctx) self._prepare_and_start_frag_download(ctx)
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
segments_filenames = [] segments_filenames = []
fragment_retries = self.params.get('fragment_retries', 0) fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
def process_segment(segment, tmp_filename, num): def append_url_to_file(target_url, tmp_filename, segment_name):
segment_url = segment['url']
segment_name = 'Frag%d' % num
target_filename = '%s-%s' % (tmp_filename, segment_name) target_filename = '%s-%s' % (tmp_filename, segment_name)
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = num == 0 or not skip_unavailable_fragments
count = 0 count = 0
while count <= fragment_retries: while count <= fragment_retries:
try: try:
success = ctx['dl'].download(target_filename, {'url': segment_url}) success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
if not success: if not success:
return False return False
down, target_sanitized = sanitize_open(target_filename, 'rb') down, target_sanitized = sanitize_open(target_filename, 'rb')
@ -51,27 +52,26 @@ class DashSegmentsFD(FragmentFD):
down.close() down.close()
segments_filenames.append(target_sanitized) segments_filenames.append(target_sanitized)
break break
except compat_urllib_error.HTTPError as err: except (compat_urllib_error.HTTPError, ) as err:
# YouTube may often return 404 HTTP error for a fragment causing the # YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately # whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attempts # retried with the same request data this usually succeeds (1-2 attempts
# is usually enough) thus allowing to download the whole file successfully. # is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any # So, we will retry all fragments that fail with 404 HTTP error for now.
# HTTP error. if err.code != 404:
raise
# Retry fragment
count += 1 count += 1
if count <= fragment_retries: if count <= fragment_retries:
self.report_retry_fragment(err, segment_name, count, fragment_retries) self.report_retry_fragment(segment_name, count, fragment_retries)
if count > fragment_retries: if count > fragment_retries:
if not fatal:
self.report_skip_fragment(segment_name)
return True
self.report_error('giving up after %s fragment retries' % fragment_retries) self.report_error('giving up after %s fragment retries' % fragment_retries)
return False return False
return True
for i, segment in enumerate(segments): if initialization_url:
if not process_segment(segment, ctx['tmpfilename'], i): append_url_to_file(initialization_url, ctx['tmpfilename'], 'Init')
return False for i, segment_url in enumerate(segment_urls):
append_url_to_file(segment_url, ctx['tmpfilename'], 'Seg%d' % i)
self._finish_frag_download(ctx) self._finish_frag_download(ctx)
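
Reviewer note: the left-hand version retries a fragment on any HTTP error and, once retries run out, either skips the fragment or aborts depending on `fatal`. The right-hand version retries only on 404 and always aborts. The control flow of the left-hand loop can be sketched as below. `download_with_retries` and the `fetch` callable are hypothetical, and `IOError` stands in for `compat_urllib_error.HTTPError`.

```python
def download_with_retries(fetch, retries, fatal=True):
    """Hypothetical sketch of the retry-then-skip-or-abort loop.

    fetch   -- zero-argument callable that raises IOError on a failed attempt
    retries -- number of extra attempts allowed after the first one
    fatal   -- if False, an exhausted fragment is skipped instead of aborting
    Returns 'ok', 'skipped' or 'failed'.
    """
    count = 0
    while count <= retries:
        try:
            fetch()
            return 'ok'
        except IOError:
            # Count the failed attempt and go around the loop again.
            count += 1
    # All attempts used up: skip non-fatal fragments, abort on fatal ones.
    return 'failed' if fatal else 'skipped'
```

In the diff above the first DASH segment is always treated as fatal, since it carries the headers needed to produce a valid MP4 file.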

View File

@ -85,7 +85,7 @@ class ExternalFD(FileDownloader):
cmd, stderr=subprocess.PIPE) cmd, stderr=subprocess.PIPE)
_, stderr = p.communicate() _, stderr = p.communicate()
if p.returncode != 0: if p.returncode != 0:
self.to_stderr(stderr.decode('utf-8', 'replace')) self.to_stderr(stderr)
return p.returncode return p.returncode
@ -96,12 +96,6 @@ class CurlFD(ExternalFD):
cmd = [self.exe, '--location', '-o', tmpfilename] cmd = [self.exe, '--location', '-o', tmpfilename]
for key, val in info_dict['http_headers'].items(): for key, val in info_dict['http_headers'].items():
cmd += ['--header', '%s: %s' % (key, val)] cmd += ['--header', '%s: %s' % (key, val)]
cmd += self._bool_option('--continue-at', 'continuedl', '-', '0')
cmd += self._valueless_option('--silent', 'noprogress')
cmd += self._valueless_option('--verbose', 'verbose')
cmd += self._option('--limit-rate', 'ratelimit')
cmd += self._option('--retry', 'retries')
cmd += self._option('--max-filesize', 'max_filesize')
cmd += self._option('--interface', 'source_address') cmd += self._option('--interface', 'source_address')
cmd += self._option('--proxy', 'proxy') cmd += self._option('--proxy', 'proxy')
cmd += self._valueless_option('--insecure', 'nocheckcertificate') cmd += self._valueless_option('--insecure', 'nocheckcertificate')
@ -109,16 +103,6 @@ class CurlFD(ExternalFD):
cmd += ['--', info_dict['url']] cmd += ['--', info_dict['url']]
return cmd return cmd
def _call_downloader(self, tmpfilename, info_dict):
cmd = [encodeArgument(a) for a in self._make_cmd(tmpfilename, info_dict)]
self._debug_cmd(cmd)
# curl writes the progress to stderr so don't capture it.
p = subprocess.Popen(cmd)
p.communicate()
return p.returncode
class AxelFD(ExternalFD): class AxelFD(ExternalFD):
AVAILABLE_OPT = '-V' AVAILABLE_OPT = '-V'
@ -220,19 +204,12 @@ class FFmpegFD(ExternalFD):
if proxy: if proxy:
if not re.match(r'^[\da-zA-Z]+://', proxy): if not re.match(r'^[\da-zA-Z]+://', proxy):
proxy = 'http://%s' % proxy proxy = 'http://%s' % proxy
if proxy.startswith('socks'):
self.report_warning(
'%s does not support SOCKS proxies. Downloading is likely to fail. '
'Consider adding --hls-prefer-native to your command.' % self.get_basename())
# Since December 2015 ffmpeg supports -http_proxy option (see # Since December 2015 ffmpeg supports -http_proxy option (see
# http://git.videolan.org/?p=ffmpeg.git;a=commit;h=b4eb1f29ebddd60c41a2eb39f5af701e38e0d3fd) # http://git.videolan.org/?p=ffmpeg.git;a=commit;h=b4eb1f29ebddd60c41a2eb39f5af701e38e0d3fd)
# We could switch to the following code if we are able to detect version properly # We could switch to the following code if we are able to detect version properly
# args += ['-http_proxy', proxy] # args += ['-http_proxy', proxy]
env = os.environ.copy() env = os.environ.copy()
compat_setenv('HTTP_PROXY', proxy, env=env) compat_setenv('HTTP_PROXY', proxy, env=env)
compat_setenv('http_proxy', proxy, env=env)
protocol = info_dict.get('protocol') protocol = info_dict.get('protocol')

View File

@ -196,11 +196,6 @@ def build_fragments_list(boot_info):
first_frag_number = fragment_run_entry_table[0]['first'] first_frag_number = fragment_run_entry_table[0]['first']
fragments_counter = itertools.count(first_frag_number) fragments_counter = itertools.count(first_frag_number)
for segment, fragments_count in segment_run_table['segment_run']: for segment, fragments_count in segment_run_table['segment_run']:
# In some live HDS streams (for example Rai), `fragments_count` is
# abnormal and causing out-of-memory errors. It's OK to change the
# number of fragments for live streams as they are updated periodically
if fragments_count == 4294967295 and boot_info['live']:
fragments_count = 2
for _ in range(fragments_count): for _ in range(fragments_count):
res.append((segment, next(fragments_counter))) res.append((segment, next(fragments_counter)))
@ -314,8 +309,7 @@ class F4mFD(FragmentFD):
man_url = info_dict['url'] man_url = info_dict['url']
requested_bitrate = info_dict.get('tbr') requested_bitrate = info_dict.get('tbr')
self.to_screen('[%s] Downloading f4m manifest' % self.FD_NAME) self.to_screen('[%s] Downloading f4m manifest' % self.FD_NAME)
urlh = self.ydl.urlopen(man_url)
urlh = self.ydl.urlopen(self._prepare_url(info_dict, man_url))
man_url = urlh.geturl() man_url = urlh.geturl()
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests # Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/rg3/youtube-dl/issues/6215#issuecomment-121704244 # (see https://github.com/rg3/youtube-dl/issues/6215#issuecomment-121704244
@ -335,11 +329,7 @@ class F4mFD(FragmentFD):
base_url = compat_urlparse.urljoin(man_url, media.attrib['url']) base_url = compat_urlparse.urljoin(man_url, media.attrib['url'])
bootstrap_node = doc.find(_add_ns('bootstrapInfo')) bootstrap_node = doc.find(_add_ns('bootstrapInfo'))
# From Adobe F4M 3.0 spec: boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, base_url)
# The <baseURL> element SHALL be the base URL for all relative
# (HTTP-based) URLs in the manifest. If <baseURL> is not present, said
# URLs should be relative to the location of the containing document.
boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, man_url)
live = boot_info['live'] live = boot_info['live']
metadata_node = media.find(_add_ns('metadata')) metadata_node = media.find(_add_ns('metadata'))
if metadata_node is not None: if metadata_node is not None:
@ -388,10 +378,7 @@ class F4mFD(FragmentFD):
url_parsed = base_url_parsed._replace(path=base_url_parsed.path + name, query='&'.join(query)) url_parsed = base_url_parsed._replace(path=base_url_parsed.path + name, query='&'.join(query))
frag_filename = '%s-%s' % (ctx['tmpfilename'], name) frag_filename = '%s-%s' % (ctx['tmpfilename'], name)
try: try:
success = ctx['dl'].download(frag_filename, { success = ctx['dl'].download(frag_filename, {'url': url_parsed.geturl()})
'url': url_parsed.geturl(),
'http_headers': info_dict.get('http_headers'),
})
if not success: if not success:
return False return False
(down, frag_sanitized) = sanitize_open(frag_filename, 'rb') (down, frag_sanitized) = sanitize_open(frag_filename, 'rb')
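
Reviewer note: the guard in `build_fragments_list` on the left-hand side clamps an abnormal `fragments_count` of 4294967295 (`0xFFFFFFFF`) to 2 for live streams, so that `range()` does not try to enumerate billions of fragments. A self-contained sketch of that logic follows. The function name `build_fragments` and its flat arguments are hypothetical simplifications of the real `boot_info` structure.

```python
import itertools


def build_fragments(segment_runs, first_frag_number, live):
    """Hypothetical, simplified version of the fragment list builder.

    segment_runs -- iterable of (segment, fragments_count) pairs
    """
    res = []
    counter = itertools.count(first_frag_number)
    for segment, fragments_count in segment_runs:
        # 0xFFFFFFFF is treated as "unknown" on live streams; clamp it so the
        # loop below stays small. Live manifests are refreshed periodically.
        if fragments_count == 0xFFFFFFFF and live:
            fragments_count = 2
        for _ in range(fragments_count):
            res.append((segment, next(counter)))
    return res


print(build_fragments([(1, 0xFFFFFFFF)], first_frag_number=10, live=True))
```

Without the guard, as on the right-hand side, the same input would attempt to append roughly four billion tuples.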

View File

@ -6,10 +6,8 @@ import time
from .common import FileDownloader from .common import FileDownloader
from .http import HttpFD from .http import HttpFD
from ..utils import ( from ..utils import (
error_to_compat_str,
encodeFilename, encodeFilename,
sanitize_open, sanitize_open,
sanitized_Request,
) )
@ -24,23 +22,13 @@ class FragmentFD(FileDownloader):
Available options: Available options:
fragment_retries: Number of times to retry a fragment for HTTP error (DASH fragment_retries: Number of times to retry a fragment for HTTP error (DASH only)
and hlsnative only)
skip_unavailable_fragments:
Skip unavailable fragments (DASH and hlsnative only)
""" """
def report_retry_fragment(self, err, fragment_name, count, retries): def report_retry_fragment(self, fragment_name, count, retries):
self.to_screen( self.to_screen(
'[download] Got server HTTP error: %s. Retrying fragment %s (attempt %d of %s)...' '[download] Got server HTTP error. Retrying fragment %s (attempt %d of %s)...'
% (error_to_compat_str(err), fragment_name, count, self.format_retries(retries))) % (fragment_name, count, self.format_retries(retries)))
def report_skip_fragment(self, fragment_name):
self.to_screen('[download] Skipping fragment %s...' % fragment_name)
def _prepare_url(self, info_dict, url):
headers = info_dict.get('http_headers')
return sanitized_Request(url, None, headers) if headers else url
def _prepare_and_start_frag_download(self, ctx): def _prepare_and_start_frag_download(self, ctx):
self._prepare_frag_download(ctx) self._prepare_frag_download(ctx)

View File

@ -2,26 +2,14 @@ from __future__ import unicode_literals
import os.path import os.path
import re import re
import binascii
try:
from Crypto.Cipher import AES
can_decrypt_frag = True
except ImportError:
can_decrypt_frag = False
from .fragment import FragmentFD from .fragment import FragmentFD
from .external import FFmpegFD from .external import FFmpegFD
from ..compat import ( from ..compat import compat_urlparse
compat_urllib_error,
compat_urlparse,
compat_struct_pack,
)
from ..utils import ( from ..utils import (
encodeFilename, encodeFilename,
sanitize_open, sanitize_open,
parse_m3u8_attributes,
update_url_query,
) )
@ -31,40 +19,30 @@ class HlsFD(FragmentFD):
FD_NAME = 'hlsnative' FD_NAME = 'hlsnative'
@staticmethod @staticmethod
def can_download(manifest, info_dict): def can_download(manifest):
UNSUPPORTED_FEATURES = ( UNSUPPORTED_FEATURES = (
r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1] r'#EXT-X-KEY:METHOD=(?!NONE)', # encrypted streams [1]
r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2] r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2]
# Live streams heuristic does not always work (e.g. geo restricted to Germany # Live streams heuristic does not always work (e.g. geo restricted to Germany
# http://hls-geo.daserste.de/i/videoportal/Film/c_620000/622873/format,716451,716457,716450,716458,716459,.mp4.csmil/index_4_av.m3u8?null=0) # http://hls-geo.daserste.de/i/videoportal/Film/c_620000/622873/format,716451,716457,716450,716458,716459,.mp4.csmil/index_4_av.m3u8?null=0)
# r'#EXT-X-MEDIA-SEQUENCE:(?!0$)', # live streams [3] # r'#EXT-X-MEDIA-SEQUENCE:(?!0$)', # live streams [3]
r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of
# This heuristic also is not correct since segments may not be appended as well. # event media playlists [4]
# Twitch vods of finished streams have EXT-X-PLAYLIST-TYPE:EVENT despite
# no segments will definitely be appended to the end of the playlist.
# r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of
# # event media playlists [4]
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4 # 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4
# 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2 # 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
# 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2 # 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
# 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5 # 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
) )
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES] return all(not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES)
check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
check_results.append(not info_dict.get('is_live'))
return all(check_results)
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
man_url = info_dict['url'] man_url = info_dict['url']
self.to_screen('[%s] Downloading m3u8 manifest' % self.FD_NAME) self.to_screen('[%s] Downloading m3u8 manifest' % self.FD_NAME)
manifest = self.ydl.urlopen(man_url).read()
manifest = self.ydl.urlopen(self._prepare_url(info_dict, man_url)).read()
s = manifest.decode('utf-8', 'ignore') s = manifest.decode('utf-8', 'ignore')
if not self.can_download(s, info_dict): if not self.can_download(s):
self.report_warning( self.report_warning(
'hlsnative has detected features it does not support, ' 'hlsnative has detected features it does not support, '
'extraction will be delegated to ffmpeg') 'extraction will be delegated to ffmpeg')
@ -73,97 +51,36 @@ class HlsFD(FragmentFD):
fd.add_progress_hook(ph) fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict) return fd.real_download(filename, info_dict)
total_frags = 0 fragment_urls = []
for line in s.splitlines(): for line in s.splitlines():
line = line.strip() line = line.strip()
if line and not line.startswith('#'): if line and not line.startswith('#'):
total_frags += 1 segment_url = (
line
if re.match(r'^https?://', line)
else compat_urlparse.urljoin(man_url, line))
fragment_urls.append(segment_url)
# We only download the first fragment during the test
if self.params.get('test', False):
break
ctx = { ctx = {
'filename': filename, 'filename': filename,
'total_frags': total_frags, 'total_frags': len(fragment_urls),
} }
self._prepare_and_start_frag_download(ctx) self._prepare_and_start_frag_download(ctx)
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
test = self.params.get('test', False)
extra_query = None
extra_param_to_segment_url = info_dict.get('extra_param_to_segment_url')
if extra_param_to_segment_url:
extra_query = compat_urlparse.parse_qs(extra_param_to_segment_url)
i = 0
media_sequence = 0
decrypt_info = {'METHOD': 'NONE'}
frags_filenames = [] frags_filenames = []
for line in s.splitlines(): for i, frag_url in enumerate(fragment_urls):
line = line.strip() frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
if line: success = ctx['dl'].download(frag_filename, {'url': frag_url})
if not line.startswith('#'): if not success:
frag_url = ( return False
line down, frag_sanitized = sanitize_open(frag_filename, 'rb')
if re.match(r'^https?://', line) ctx['dest_stream'].write(down.read())
else compat_urlparse.urljoin(man_url, line)) down.close()
frag_name = 'Frag%d' % i frags_filenames.append(frag_sanitized)
frag_filename = '%s-%s' % (ctx['tmpfilename'], frag_name)
if extra_query:
frag_url = update_url_query(frag_url, extra_query)
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(frag_filename, {
'url': frag_url,
'http_headers': info_dict.get('http_headers'),
})
if not success:
return False
down, frag_sanitized = sanitize_open(frag_filename, 'rb')
frag_content = down.read()
down.close()
break
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we try to retry then either skip or abort.
# See https://github.com/rg3/youtube-dl/issues/10165,
# https://github.com/rg3/youtube-dl/issues/10448).
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_name, count, fragment_retries)
if count > fragment_retries:
if skip_unavailable_fragments:
i += 1
media_sequence += 1
self.report_skip_fragment(frag_name)
continue
self.report_error(
'giving up after %s fragment retries' % fragment_retries)
return False
if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
ctx['dest_stream'].write(frag_content)
frags_filenames.append(frag_sanitized)
# We only download the first fragment during the test
if test:
break
i += 1
media_sequence += 1
elif line.startswith('#EXT-X-KEY'):
decrypt_info = parse_m3u8_attributes(line[11:])
if decrypt_info['METHOD'] == 'AES-128':
if 'IV' in decrypt_info:
decrypt_info['IV'] = binascii.unhexlify(decrypt_info['IV'][2:].zfill(32))
if not re.match(r'^https?://', decrypt_info['URI']):
decrypt_info['URI'] = compat_urlparse.urljoin(
man_url, decrypt_info['URI'])
if extra_query:
decrypt_info['URI'] = update_url_query(decrypt_info['URI'], extra_query)
decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read()
elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
media_sequence = int(line[22:])
self._finish_frag_download(ctx) self._finish_frag_download(ctx)
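
Reviewer note: the AES-128 path on the left-hand side derives a 16-byte initialization vector either from the `IV` attribute of `#EXT-X-KEY` or, when that is absent, from the media sequence number. The sketch below isolates just those two conversions using the standard library. The names `default_iv` and `parse_iv` are hypothetical, and `struct.pack` is used directly where the diff calls `compat_struct_pack`.

```python
import binascii
import struct


def default_iv(media_sequence):
    """IV from the sequence number: 8 zero pad bytes, then a big-endian int64."""
    return struct.pack('>8xq', media_sequence)


def parse_iv(attr_value):
    """IV from a '0x...' attribute: drop the prefix, left-pad to 32 hex digits."""
    return binascii.unhexlify(attr_value[2:].zfill(32))


print(len(default_iv(5)), len(parse_iv('0x1')))
```

Both helpers always yield 16 bytes, which is the block size that AES in CBC mode requires for its IV.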

View File

@ -13,9 +13,6 @@ from ..utils import (
encodeFilename, encodeFilename,
sanitize_open, sanitize_open,
sanitized_Request, sanitized_Request,
write_xattr,
XAttrMetadataError,
XAttrUnavailableError,
) )
@ -182,8 +179,9 @@ class HttpFD(FileDownloader):
if self.params.get('xattr_set_filesize', False) and data_len is not None: if self.params.get('xattr_set_filesize', False) and data_len is not None:
try: try:
write_xattr(tmpfilename, 'user.ytdl.filesize', str(data_len).encode('utf-8')) import xattr
except (XAttrUnavailableError, XAttrMetadataError) as err: xattr.setxattr(tmpfilename, 'user.ytdl.filesize', str(data_len))
except(OSError, IOError, ImportError) as err:
self.report_error('unable to set filesize xattr: %s' % str(err)) self.report_error('unable to set filesize xattr: %s' % str(err))
try: try:

View File

@ -1,271 +0,0 @@
from __future__ import unicode_literals
import os
import time
import struct
import binascii
import io
from .fragment import FragmentFD
from ..compat import compat_urllib_error
from ..utils import (
sanitize_open,
encodeFilename,
)
u8 = struct.Struct(b'>B')
u88 = struct.Struct(b'>Bx')
u16 = struct.Struct(b'>H')
u1616 = struct.Struct(b'>Hxx')
u32 = struct.Struct(b'>I')
u64 = struct.Struct(b'>Q')
s88 = struct.Struct(b'>bx')
s16 = struct.Struct(b'>h')
s1616 = struct.Struct(b'>hxx')
s32 = struct.Struct(b'>i')
unity_matrix = (s32.pack(0x10000) + s32.pack(0) * 3) * 2 + s32.pack(0x40000000)
TRACK_ENABLED = 0x1
TRACK_IN_MOVIE = 0x2
TRACK_IN_PREVIEW = 0x4
SELF_CONTAINED = 0x1
def box(box_type, payload):
return u32.pack(8 + len(payload)) + box_type + payload
def full_box(box_type, version, flags, payload):
return box(box_type, u8.pack(version) + u32.pack(flags)[1:] + payload)
def write_piff_header(stream, params):
track_id = params['track_id']
fourcc = params['fourcc']
duration = params['duration']
timescale = params.get('timescale', 10000000)
language = params.get('language', 'und')
height = params.get('height', 0)
width = params.get('width', 0)
is_audio = width == 0 and height == 0
creation_time = modification_time = int(time.time())
ftyp_payload = b'isml' # major brand
ftyp_payload += u32.pack(1) # minor version
ftyp_payload += b'piff' + b'iso2' # compatible brands
stream.write(box(b'ftyp', ftyp_payload)) # File Type Box
mvhd_payload = u64.pack(creation_time)
mvhd_payload += u64.pack(modification_time)
mvhd_payload += u32.pack(timescale)
mvhd_payload += u64.pack(duration)
mvhd_payload += s1616.pack(1) # rate
mvhd_payload += s88.pack(1) # volume
mvhd_payload += u16.pack(0) # reserved
mvhd_payload += u32.pack(0) * 2 # reserved
mvhd_payload += unity_matrix
mvhd_payload += u32.pack(0) * 6 # pre defined
mvhd_payload += u32.pack(0xffffffff) # next track id
moov_payload = full_box(b'mvhd', 1, 0, mvhd_payload) # Movie Header Box
tkhd_payload = u64.pack(creation_time)
tkhd_payload += u64.pack(modification_time)
tkhd_payload += u32.pack(track_id) # track id
tkhd_payload += u32.pack(0) # reserved
tkhd_payload += u64.pack(duration)
tkhd_payload += u32.pack(0) * 2 # reserved
tkhd_payload += s16.pack(0) # layer
tkhd_payload += s16.pack(0) # alternate group
tkhd_payload += s88.pack(1 if is_audio else 0) # volume
tkhd_payload += u16.pack(0) # reserved
tkhd_payload += unity_matrix
tkhd_payload += u1616.pack(width)
tkhd_payload += u1616.pack(height)
trak_payload = full_box(b'tkhd', 1, TRACK_ENABLED | TRACK_IN_MOVIE | TRACK_IN_PREVIEW, tkhd_payload) # Track Header Box
mdhd_payload = u64.pack(creation_time)
mdhd_payload += u64.pack(modification_time)
mdhd_payload += u32.pack(timescale)
mdhd_payload += u64.pack(duration)
mdhd_payload += u16.pack(((ord(language[0]) - 0x60) << 10) | ((ord(language[1]) - 0x60) << 5) | (ord(language[2]) - 0x60))
mdhd_payload += u16.pack(0) # pre defined
mdia_payload = full_box(b'mdhd', 1, 0, mdhd_payload) # Media Header Box
hdlr_payload = u32.pack(0) # pre defined
hdlr_payload += b'soun' if is_audio else b'vide' # handler type
hdlr_payload += u32.pack(0) * 3 # reserved
hdlr_payload += (b'Sound' if is_audio else b'Video') + b'Handler\0' # name
mdia_payload += full_box(b'hdlr', 0, 0, hdlr_payload) # Handler Reference Box
if is_audio:
smhd_payload = s88.pack(0) # balance
smhd_payload += u16.pack(0) # reserved
media_header_box = full_box(b'smhd', 0, 0, smhd_payload) # Sound Media Header
else:
vmhd_payload = u16.pack(0) # graphics mode
vmhd_payload += u16.pack(0) * 3 # opcolor
media_header_box = full_box(b'vmhd', 0, 1, vmhd_payload) # Video Media Header
minf_payload = media_header_box
dref_payload = u32.pack(1) # entry count
dref_payload += full_box(b'url ', 0, SELF_CONTAINED, b'') # Data Entry URL Box
dinf_payload = full_box(b'dref', 0, 0, dref_payload) # Data Reference Box
minf_payload += box(b'dinf', dinf_payload) # Data Information Box
stsd_payload = u32.pack(1) # entry count
sample_entry_payload = u8.pack(0) * 6 # reserved
sample_entry_payload += u16.pack(1) # data reference index
if is_audio:
sample_entry_payload += u32.pack(0) * 2 # reserved
sample_entry_payload += u16.pack(params.get('channels', 2))
sample_entry_payload += u16.pack(params.get('bits_per_sample', 16))
sample_entry_payload += u16.pack(0) # pre defined
sample_entry_payload += u16.pack(0) # reserved
sample_entry_payload += u1616.pack(params['sampling_rate'])
if fourcc == 'AACL':
sample_entry_box = box(b'mp4a', sample_entry_payload)
else:
sample_entry_payload = sample_entry_payload
sample_entry_payload += u16.pack(0) # pre defined
sample_entry_payload += u16.pack(0) # reserved
sample_entry_payload += u32.pack(0) * 3 # pre defined
sample_entry_payload += u16.pack(width)
sample_entry_payload += u16.pack(height)
sample_entry_payload += u1616.pack(0x48) # horiz resolution 72 dpi
sample_entry_payload += u1616.pack(0x48) # vert resolution 72 dpi
sample_entry_payload += u32.pack(0) # reserved
sample_entry_payload += u16.pack(1) # frame count
sample_entry_payload += u8.pack(0) * 32 # compressor name
sample_entry_payload += u16.pack(0x18) # depth
sample_entry_payload += s16.pack(-1) # pre defined
codec_private_data = binascii.unhexlify(params['codec_private_data'])
if fourcc in ('H264', 'AVC1'):
sps, pps = codec_private_data.split(u32.pack(1))[1:]
avcc_payload = u8.pack(1) # configuration version
avcc_payload += sps[1:4] # avc profile indication + profile compatibility + avc level indication
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete representation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(1) # reserved (0) + number of sps (0000001)
avcc_payload += u16.pack(len(sps))
avcc_payload += sps
avcc_payload += u8.pack(1) # number of pps
avcc_payload += u16.pack(len(pps))
avcc_payload += pps
sample_entry_payload += box(b'avcC', avcc_payload) # AVC Decoder Configuration Record
sample_entry_box = box(b'avc1', sample_entry_payload) # AVC Simple Entry
stsd_payload += sample_entry_box
stbl_payload = full_box(b'stsd', 0, 0, stsd_payload) # Sample Description Box
stts_payload = u32.pack(0) # entry count
stbl_payload += full_box(b'stts', 0, 0, stts_payload) # Decoding Time to Sample Box
stsc_payload = u32.pack(0) # entry count
stbl_payload += full_box(b'stsc', 0, 0, stsc_payload) # Sample To Chunk Box
stco_payload = u32.pack(0) # entry count
stbl_payload += full_box(b'stco', 0, 0, stco_payload) # Chunk Offset Box
minf_payload += box(b'stbl', stbl_payload) # Sample Table Box
mdia_payload += box(b'minf', minf_payload) # Media Information Box
trak_payload += box(b'mdia', mdia_payload) # Media Box
moov_payload += box(b'trak', trak_payload) # Track Box
mehd_payload = u64.pack(duration)
mvex_payload = full_box(b'mehd', 1, 0, mehd_payload) # Movie Extends Header Box
trex_payload = u32.pack(track_id) # track id
trex_payload += u32.pack(1) # default sample description index
trex_payload += u32.pack(0) # default sample duration
trex_payload += u32.pack(0) # default sample size
trex_payload += u32.pack(0) # default sample flags
mvex_payload += full_box(b'trex', 0, 0, trex_payload) # Track Extends Box
moov_payload += box(b'mvex', mvex_payload) # Movie Extends Box
stream.write(box(b'moov', moov_payload)) # Movie Box
def extract_box_data(data, box_sequence):
data_reader = io.BytesIO(data)
while True:
box_size = u32.unpack(data_reader.read(4))[0]
box_type = data_reader.read(4)
if box_type == box_sequence[0]:
box_data = data_reader.read(box_size - 8)
if len(box_sequence) == 1:
return box_data
return extract_box_data(box_data, box_sequence[1:])
data_reader.seek(box_size - 8, 1)
class IsmFD(FragmentFD):
"""
Download segments in an ISM manifest
"""
FD_NAME = 'ism'
def real_download(self, filename, info_dict):
segments = info_dict['fragments'][:1] if self.params.get(
'test', False) else info_dict['fragments']
ctx = {
'filename': filename,
'total_frags': len(segments),
}
self._prepare_and_start_frag_download(ctx)
segments_filenames = []
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
track_written = False
for i, segment in enumerate(segments):
segment_url = segment['url']
segment_name = 'Frag%d' % i
target_filename = '%s-%s' % (ctx['tmpfilename'], segment_name)
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(target_filename, {'url': segment_url})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
down_data = down.read()
if not track_written:
tfhd_data = extract_box_data(down_data, [b'moof', b'traf', b'tfhd'])
info_dict['_download_params']['track_id'] = u32.unpack(tfhd_data[4:8])[0]
write_piff_header(ctx['dest_stream'], info_dict['_download_params'])
track_written = True
ctx['dest_stream'].write(down_data)
down.close()
segments_filenames.append(target_sanitized)
break
except compat_urllib_error.HTTPError as err:
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, segment_name, count, fragment_retries)
if count > fragment_retries:
if skip_unavailable_fragments:
self.report_skip_fragment(segment_name)
continue
self.report_error('giving up after %s fragment retries' % fragment_retries)
return False
self._finish_frag_download(ctx)
for segment_file in segments_filenames:
os.remove(encodeFilename(segment_file))
return True
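The removed module builds ISO BMFF boxes as a 32-bit size, a four-byte type, and a payload, and reads the track id back out of `moof/traf/tfhd`. The round trip can be sketched as below. `box` mirrors the helper in the file, while `find_box` and `pack_language` are hypothetical names. Unlike `extract_box_data`, which raises `struct.error` once the reader runs out of data, this sketch returns `None` when a box is missing.

```python
import io
import struct

u32 = struct.Struct('>I')


def box(box_type, payload):
    # Size field covers the 8-byte header plus the payload.
    return u32.pack(8 + len(payload)) + box_type + payload


def find_box(data, box_path):
    """Return the payload of the nested box named by box_path, or None."""
    reader = io.BytesIO(data)
    while True:
        header = reader.read(8)
        if len(header) < 8:
            return None  # ran out of data without finding the box
        box_size = u32.unpack(header[:4])[0]
        box_type = header[4:]
        if box_size < 8:
            return None  # sizes 0 and 1 (special cases) are not handled here
        if box_type == box_path[0]:
            payload = reader.read(box_size - 8)
            if len(box_path) == 1:
                return payload
            return find_box(payload, box_path[1:])
        reader.seek(box_size - 8, 1)  # skip over an unrelated box


def pack_language(code):
    # Three lowercase letters, 5 bits each, as in the mdhd payload above.
    a, b, c = (ord(ch) - 0x60 for ch in code)
    return (a << 10) | (b << 5) | c


# A fake fragment: tfhd carries version/flags (4 bytes) then the track id.
tfhd = box(b'tfhd', b'\x00\x00\x00\x00' + u32.pack(7))
moof = box(b'moof', box(b'mfhd', b'\x00' * 4) + box(b'traf', tfhd))
track_id = u32.unpack(find_box(moof, [b'moof', b'traf', b'tfhd'])[4:8])[0]
```

Here `track_id` is read from bytes 4 to 8 of the `tfhd` payload, the same slice `real_download` uses before writing the PIFF header.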

@ -7,13 +7,12 @@ from ..utils import (
ExtractorError, ExtractorError,
js_to_json, js_to_json,
int_or_none, int_or_none,
parse_iso8601,
) )
class ABCIE(InfoExtractor): class ABCIE(InfoExtractor):
IE_NAME = 'abc.net.au' IE_NAME = 'abc.net.au'
_VALID_URL = r'https?://(?:www\.)?abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)' _VALID_URL = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334', 'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',
@ -94,59 +93,3 @@ class ABCIE(InfoExtractor):
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
} }
class ABCIViewIE(InfoExtractor):
IE_NAME = 'abc.net.au:iview'
_VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'
# ABC iview programs are normally available for 14 days only.
_TESTS = [{
'url': 'http://iview.abc.net.au/programs/diaries-of-a-broken-mind/ZX9735A001S00',
'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
'info_dict': {
'id': 'ZX9735A001S00',
'ext': 'mp4',
'title': 'Diaries Of A Broken Mind',
'description': 'md5:7de3903874b7a1be279fe6b68718fc9e',
'upload_date': '20161010',
'uploader_id': 'abc2',
'timestamp': 1476064920,
},
'skip': 'Video gone',
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_params = self._parse_json(self._search_regex(
r'videoParams\s*=\s*({.+?});', webpage, 'video params'), video_id)
title = video_params.get('title') or video_params['seriesTitle']
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
formats = self._extract_akamai_formats(stream['hds-unmetered'], video_id)
self._sort_formats(formats)
subtitles = {}
src_vtt = stream.get('captions', {}).get('src-vtt')
if src_vtt:
subtitles['en'] = [{
'url': src_vtt,
'ext': 'vtt',
}]
return {
'id': video_id,
'title': title,
'description': self._html_search_meta(['og:description', 'twitter:description'], webpage),
'thumbnail': self._html_search_meta(['og:image', 'twitter:image:src'], webpage),
'duration': int_or_none(video_params.get('eventDuration')),
'timestamp': parse_iso8601(video_params.get('pubDate'), ' '),
'series': video_params.get('seriesTitle'),
'series_id': video_params.get('seriesHouseNumber') or video_id[:7],
'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage, default=None)),
'episode': self._html_search_meta('episode_title', webpage, default=None),
'uploader_id': video_params.get('channel'),
'formats': formats,
'subtitles': subtitles,
}
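The `_VALID_URL` change for `ABCIE` in this file comes down to whether the `www.` prefix is required or optional. A small sketch of the difference follows. The second URL, without `www.`, is a made-up variant of the test URL, and `match_id` is a hypothetical stand-in for the extractor's own matching helper.

```python
import re

WWW_REQUIRED = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
WWW_OPTIONAL = r'https?://(?:www\.)?abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'


def match_id(pattern, url):
    # Return the captured id, or None when the URL is not recognised.
    m = re.match(pattern, url)
    return m.group('id') if m else None


with_www = ('http://www.abc.net.au/news/2014-11-05/'
            'australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334')
without_www = with_www.replace('www.', '')
```

Both patterns accept `with_www`; only `WWW_OPTIONAL` accepts `without_www`.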

@ -1,19 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import parse_iso8601
int_or_none,
parse_iso8601,
)
class ABCOTVSIE(InfoExtractor): class Abc7NewsIE(InfoExtractor):
IE_NAME = 'abcotvs' _VALID_URL = r'https?://abc7news\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
IE_DESC = 'ABC Owned Television Stations'
_VALID_URL = r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
_TESTS = [ _TESTS = [
{ {
'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/', 'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
@ -21,7 +15,7 @@ class ABCOTVSIE(InfoExtractor):
'id': '472581', 'id': '472581',
'display_id': 'east-bay-museum-celebrates-vintage-synthesizers', 'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
'ext': 'mp4', 'ext': 'mp4',
'title': 'East Bay museum celebrates vintage synthesizers', 'title': 'East Bay museum celebrates history of synthesized music',
'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10', 'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1421123075, 'timestamp': 1421123075,
@ -47,7 +41,7 @@ class ABCOTVSIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
m3u8 = self._html_search_meta( m3u8 = self._html_search_meta(
'contentURL', webpage, 'm3u8 url', fatal=True).split('?')[0] 'contentURL', webpage, 'm3u8 url', fatal=True)
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4') formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
self._sort_formats(formats) self._sort_formats(formats)
@ -72,41 +66,3 @@ class ABCOTVSIE(InfoExtractor):
'uploader': uploader, 'uploader': uploader,
'formats': formats, 'formats': formats,
} }
class ABCOTVSClipsIE(InfoExtractor):
IE_NAME = 'abcotvs:clips'
_VALID_URL = r'https?://clips\.abcotvs\.com/(?:[^/]+/)*video/(?P<id>\d+)'
_TEST = {
'url': 'https://clips.abcotvs.com/kabc/video/214814',
'info_dict': {
'id': '214814',
'ext': 'mp4',
'title': 'SpaceX launch pad explosion destroys rocket, satellite',
'description': 'md5:9f186e5ad8f490f65409965ee9c7be1b',
'upload_date': '20160901',
'timestamp': 1472756695,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json('https://clips.abcotvs.com/vogo/video/getByIds?ids=' + video_id, video_id)['results'][0]
title = video_data['title']
formats = self._extract_m3u8_formats(
video_data['videoURL'].split('?')[0], video_id, 'mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnailURL'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('pubDate')),
'formats': formats,
}
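Two details in this file are easy to check in isolation: the multi-station `_VALID_URL` with its optional `display_id` group, and the `.split('?')[0]` that drops the query string from the m3u8 URL. The sketch below copies the longer pattern from the hunk. The short-form URL, the `6abc.com` URL, and the m3u8 URL are invented for illustration, and `parse_station_url` and `strip_query` are hypothetical names.

```python
import re

STATIONS = (r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com'
            r'(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)')


def parse_station_url(url):
    # Return (display_id, id); display_id is None for short-form URLs.
    m = re.match(STATIONS, url)
    if m is None:
        return None
    return m.group('display_id'), m.group('id')


def strip_query(url):
    # Keep everything before the first '?', as the hunk does for contentURL.
    return url.split('?')[0]
```

For a URL with no `?`, `split('?')[0]` returns the string unchanged, so the call is safe to apply unconditionally.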

@ -12,7 +12,7 @@ from ..compat import compat_urlparse
class AbcNewsVideoIE(AMPIE): class AbcNewsVideoIE(AMPIE):
IE_NAME = 'abcnews:video' IE_NAME = 'abcnews:video'
_VALID_URL = r'https?://abcnews\.go\.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)' _VALID_URL = 'http://abcnews.go.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932', 'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932',
@ -49,7 +49,7 @@ class AbcNewsVideoIE(AMPIE):
class AbcNewsIE(InfoExtractor): class AbcNewsIE(InfoExtractor):
IE_NAME = 'abcnews' IE_NAME = 'abcnews'
_VALID_URL = r'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)' _VALID_URL = 'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY', 'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',
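The two `_VALID_URL` variants for `AbcNewsVideoIE` differ in more than the `r` prefix: one escapes the dots and accepts `https`, the other does neither. An unescaped `.` matches any character, which the sketch below makes visible. Both patterns are written as raw strings here so the example itself is free of invalid-escape warnings. The odd-looking host and the `https` URL are made up, and `accepts` is a hypothetical helper.

```python
import re

ESCAPED = r'https?://abcnews\.go\.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
UNESCAPED = r'http://abcnews.go.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'


def accepts(pattern, url):
    return re.match(pattern, url) is not None


real = ('http://abcnews.go.com/ThisWeek/video/'
        'week-exclusive-irans-foreign-minister-zarif-20411932')
lookalike = 'http://abcnewsXgoYcom/ThisWeek/video/clip-1'
secure = 'https://abcnews.go.com/ThisWeek/video/clip-1'
```

The greedy `display_id` group backtracks to the last hyphen that is followed by digits, so the trailing number always ends up in `id`.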

File diff suppressed because it is too large

@ -156,10 +156,7 @@ class AdobeTVVideoIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) video_data = self._download_json(url + '?format=json', video_id)
video_data = self._parse_json(self._search_regex(
r'var\s+bridge\s*=\s*([^;]+);', webpage, 'bridged data'), video_id)
formats = [{ formats = [{
'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')), 'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')),
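One side of this hunk reads the video data out of a `var bridge = ...;` assignment embedded in the page instead of requesting `?format=json`. A minimal sketch of that extraction with the standard library follows. The regex is the one from the hunk, the sample page is invented, and `extract_bridge` is a hypothetical name. Because the pattern stops at the first semicolon, it assumes the JSON itself contains none.

```python
import json
import re


def extract_bridge(webpage):
    """Return the object assigned to `bridge`, or None if it is absent."""
    m = re.search(r'var\s+bridge\s*=\s*([^;]+);', webpage)
    if m is None:
        return None
    return json.loads(m.group(1))


page = ('<script>var bridge = {"title": "Demo", "sources": '
        '[{"src": "http://example.com/a.mp4", "height": 720}]};</script>')
data = extract_bridge(page)
```

A semicolon inside a JSON string (for example in a description) would truncate the capture and make `json.loads` raise, so a stricter pattern may be needed for real pages.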

@ -3,14 +3,16 @@ from __future__ import unicode_literals
import re import re
from .turner import TurnerBaseIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
int_or_none, float_or_none,
xpath_text,
) )
class AdultSwimIE(TurnerBaseIE): class AdultSwimIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?adultswim\.com/videos/(?P<is_playlist>playlists/)?(?P<show_path>[^/]+)/(?P<episode_path>[^/?#]+)/?' _VALID_URL = r'https?://(?:www\.)?adultswim\.com/videos/(?P<is_playlist>playlists/)?(?P<show_path>[^/]+)/(?P<episode_path>[^/?#]+)/?'
_TESTS = [{ _TESTS = [{
@ -81,42 +83,6 @@ class AdultSwimIE(TurnerBaseIE):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, {
# heroMetadata.trailer
'url': 'http://www.adultswim.com/videos/decker/inside-decker-a-new-hero/',
'info_dict': {
'id': 'I0LQFQkaSUaFp8PnAWHhoQ',
'ext': 'mp4',
'title': 'Decker - Inside Decker: A New Hero',
'description': 'md5:c916df071d425d62d70c86d4399d3ee0',
'duration': 249.008,
},
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
'url': 'http://www.adultswim.com/videos/toonami/friday-october-14th-2016/',
'info_dict': {
'id': 'eYiLsKVgQ6qTC6agD67Sig',
'title': 'Toonami - Friday, October 14th, 2016',
'description': 'md5:99892c96ffc85e159a428de85c30acde',
},
'playlist': [{
'md5': '',
'info_dict': {
'id': 'eYiLsKVgQ6qTC6agD67Sig',
'ext': 'mp4',
'title': 'Toonami - Friday, October 14th, 2016',
'description': 'md5:99892c96ffc85e159a428de85c30acde',
},
}],
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}] }]
@staticmethod @staticmethod
@ -167,58 +133,79 @@ class AdultSwimIE(TurnerBaseIE):
if video_info is None: if video_info is None:
if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path: if bootstrapped_data.get('slugged_video', {}).get('slug') == episode_path:
video_info = bootstrapped_data['slugged_video'] video_info = bootstrapped_data['slugged_video']
if not video_info: else:
video_info = bootstrapped_data.get( raise ExtractorError('Unable to find video info')
'heroMetadata', {}).get('trailer', {}).get('video')
if not video_info:
video_info = bootstrapped_data.get('onlineOriginals', [None])[0]
if not video_info:
raise ExtractorError('Unable to find video info')
show = bootstrapped_data['show'] show = bootstrapped_data['show']
show_title = show['title'] show_title = show['title']
stream = video_info.get('stream') stream = video_info.get('stream')
if stream and stream.get('videoPlaybackID'): clips = [stream] if stream else video_info.get('clips')
segment_ids = [stream['videoPlaybackID']] if not clips:
elif video_info.get('clips'): raise ExtractorError(
segment_ids = [clip['videoPlaybackID'] for clip in video_info['clips']] 'This video is only available via cable service provider subscription that'
elif video_info.get('videoPlaybackID'): ' is not currently supported. You may want to use --cookies.'
segment_ids = [video_info['videoPlaybackID']] if video_info.get('auth') is True else 'Unable to find stream or clips',
elif video_info.get('id'): expected=True)
segment_ids = [video_info['id']] segment_ids = [clip['videoPlaybackID'] for clip in clips]
else:
if video_info.get('auth') is True:
raise ExtractorError(
'This video is only available via cable service provider subscription that'
' is not currently supported. You may want to use --cookies.', expected=True)
else:
raise ExtractorError('Unable to find stream or clips')
episode_id = video_info['id'] episode_id = video_info['id']
episode_title = video_info['title'] episode_title = video_info['title']
episode_description = video_info.get('description') episode_description = video_info['description']
episode_duration = int_or_none(video_info.get('duration')) episode_duration = video_info.get('duration')
view_count = int_or_none(video_info.get('views'))
entries = [] entries = []
for part_num, segment_id in enumerate(segment_ids): for part_num, segment_id in enumerate(segment_ids):
segement_info = self._extract_cvp_info( segment_url = 'http://www.adultswim.com/videos/api/v0/assets?id=%s&platform=desktop' % segment_id
'http://www.adultswim.com/videos/api/v0/assets?id=%s&platform=desktop' % segment_id,
segment_id, {
'secure': {
'media_src': 'http://androidhls-secure.cdn.turner.com/adultswim/big',
'tokenizer_src': 'http://www.adultswim.com/astv/mvpd/processors/services/token_ipadAdobe.do',
},
})
segment_title = '%s - %s' % (show_title, episode_title) segment_title = '%s - %s' % (show_title, episode_title)
if len(segment_ids) > 1: if len(segment_ids) > 1:
segment_title += ' Part %d' % (part_num + 1) segment_title += ' Part %d' % (part_num + 1)
segement_info.update({
idoc = self._download_xml(
segment_url, segment_title,
'Downloading segment information', 'Unable to download segment information')
segment_duration = float_or_none(
xpath_text(idoc, './/trt', 'segment duration').strip())
formats = []
file_els = idoc.findall('.//files/file') or idoc.findall('./files/file')
unique_urls = []
unique_file_els = []
for file_el in file_els:
media_url = file_el.text
if not media_url or determine_ext(media_url) == 'f4m':
continue
if file_el.text not in unique_urls:
unique_urls.append(file_el.text)
unique_file_els.append(file_el)
for file_el in unique_file_els:
bitrate = file_el.attrib.get('bitrate')
ftype = file_el.attrib.get('type')
media_url = file_el.text
if determine_ext(media_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, segment_title, 'mp4', preference=0,
m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': '%s_%s' % (bitrate, ftype),
'url': file_el.text.strip(),
# The bitrate may not be a number (for example: 'iphone')
'tbr': int(bitrate) if bitrate.isdigit() else None,
})
self._sort_formats(formats)
entries.append({
'id': segment_id, 'id': segment_id,
'title': segment_title, 'title': segment_title,
'description': episode_description, 'formats': formats,
'duration': segment_duration,
'description': episode_description
}) })
entries.append(segement_info)
return { return {
'_type': 'playlist', '_type': 'playlist',
@ -227,6 +214,5 @@ class AdultSwimIE(TurnerBaseIE):
'entries': entries, 'entries': entries,
'title': '%s - %s' % (show_title, episode_title), 'title': '%s - %s' % (show_title, episode_title),
'description': episode_description, 'description': episode_description,
'duration': episode_duration, 'duration': episode_duration
'view_count': view_count,
} }
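The longer of the two segment-id selections in this file tries several places in turn: the `stream` entry, the `clips` list, a top-level `videoPlaybackID`, and finally the video's own `id`. Pulled out as a pure function it might look like the sketch below. `pick_segment_ids` is a hypothetical name, and returning an empty list (where the extractor raises `ExtractorError`) is a simplification made for this example.

```python
def pick_segment_ids(video_info):
    """Return the playback ids for a video, trying each known location."""
    stream = video_info.get('stream')
    if stream and stream.get('videoPlaybackID'):
        return [stream['videoPlaybackID']]
    if video_info.get('clips'):
        return [clip['videoPlaybackID'] for clip in video_info['clips']]
    if video_info.get('videoPlaybackID'):
        return [video_info['videoPlaybackID']]
    if video_info.get('id'):
        return [video_info['id']]
    # The extractor raises here; the sketch just signals "nothing found".
    return []
```

Keeping the order explicit matters: a video with both `stream` and `clips` yields a single segment, not one per clip.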

@ -2,140 +2,23 @@ from __future__ import unicode_literals
import re import re
from .theplatform import ThePlatformIE from .common import InfoExtractor
from ..utils import ( from ..utils import (
smuggle_url, smuggle_url,
update_url_query, update_url_query,
unescapeHTML, unescapeHTML,
extract_attributes,
get_element_by_attribute,
)
from ..compat import (
compat_urlparse,
) )
class AENetworksBaseIE(ThePlatformIE): class AENetworksIE(InfoExtractor):
_THEPLATFORM_KEY = 'crazyjava'
_THEPLATFORM_SECRET = 's3cr3t'
class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks' IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network' IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)' _VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?P<type>[^/]+)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
_TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
'info_dict': {
'id': '22253814',
'ext': 'mp4',
'title': 'Winter Is Coming',
'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
'timestamp': 1338306241,
'upload_date': '20120529',
'uploader': 'AENE-NEW',
},
'add_ie': ['ThePlatform'],
}, {
'url': 'http://www.history.com/shows/ancient-aliens/season-1',
'info_dict': {
'id': '71889446852',
},
'playlist_mincount': 5,
}, {
'url': 'http://www.mylifetime.com/shows/atlanta-plastic',
'info_dict': {
'id': 'SERIES4317',
'title': 'Atlanta Plastic',
},
'playlist_mincount': 2,
}, {
'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'only_matching': True
}, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
'only_matching': True
}, {
'url': 'http://www.mylifetime.com/shows/project-runway-junior/season-1/episode-6',
'only_matching': True
}, {
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
'only_matching': True
}]
_DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY',
'aetv.com': 'AETV',
'mylifetime.com': 'LIFETIME',
'fyi.tv': 'FYI',
}
def _real_extract(self, url):
domain, show_path, movie_display_id = re.match(self._VALID_URL, url).groups()
display_id = show_path or movie_display_id
webpage = self._download_webpage(url, display_id)
if show_path:
url_parts = show_path.split('/')
url_parts_len = len(url_parts)
if url_parts_len == 1:
entries = []
for season_url_path in re.findall(r'(?s)<li[^>]+data-href="(/shows/%s/season-\d+)"' % url_parts[0], webpage):
entries.append(self.url_result(
compat_urlparse.urljoin(url, season_url_path), 'AENetworks'))
return self.playlist_result(
entries, self._html_search_meta('aetn:SeriesId', webpage),
self._html_search_meta('aetn:SeriesTitle', webpage))
elif url_parts_len == 2:
entries = []
for episode_item in re.findall(r'(?s)<div[^>]+class="[^"]*episode-item[^"]*"[^>]*>', webpage):
episode_attributes = extract_attributes(episode_item)
episode_url = compat_urlparse.urljoin(
url, episode_attributes['data-canonical'])
entries.append(self.url_result(
episode_url, 'AENetworks',
episode_attributes['data-videoid']))
return self.playlist_result(
entries, self._html_search_meta('aetn:SeasonId', webpage))
query = {
'mbr': 'true',
'assetTypes': 'medium_video_s3'
}
video_id = self._html_search_meta('aetn:VideoID', webpage)
media_url = self._search_regex(
r"media_url\s*=\s*'([^']+)'", webpage, 'video url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
if theplatform_metadata.get('AETN$isBehindWall'):
requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
query['auth'] = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._search_json_ld(webpage, video_id, fatal=False))
media_url = update_url_query(media_url, query)
media_url = self._sign_url(media_url, self._THEPLATFORM_KEY, self._THEPLATFORM_SECRET)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
self._sort_formats(formats)
info.update({
'id': video_id,
'formats': formats,
'subtitles': subtitles,
})
return info
class HistoryTopicIE(AENetworksBaseIE):
IE_NAME = 'history:topic'
IE_DESC = 'History.com Topic'
_VALID_URL = r'https?://(?:www\.)?history\.com/topics/(?:[^/]+/)?(?P<topic_id>[^/]+)(?:/[^/]+(?:/(?P<video_display_id>[^/?#]+))?)?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false', 'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
'info_dict': { 'info_dict': {
'id': '40700995724', 'id': 'g12m5Gyt3fdR',
'ext': 'mp4', 'ext': 'mp4',
'title': "Bet You Didn't Know: Valentine's Day", 'title': "Bet You Didn't Know: Valentine's Day",
'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7', 'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
@ -148,61 +31,57 @@ class HistoryTopicIE(AENetworksBaseIE):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
'expected_warnings': ['JSON-LD'],
}, { }, {
'url': 'http://www.history.com/topics/world-war-i/world-war-i-history/videos', 'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'info_dict': 'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
{ 'info_dict': {
'id': 'world-war-i-history', 'id': 'eg47EERs_JsZ',
'title': 'World War I History', 'ext': 'mp4',
'title': 'Winter Is Coming',
'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
'timestamp': 1338306241,
'upload_date': '20120529',
'uploader': 'AENE-NEW',
}, },
'playlist_mincount': 24, 'add_ie': ['ThePlatform'],
}, { }, {
'url': 'http://www.history.com/topics/world-war-i-history/videos', 'url': 'http://www.aetv.com/shows/duck-dynasty/video/inlawful-entry',
'only_matching': True, 'only_matching': True
}, { }, {
'url': 'http://www.history.com/topics/world-war-i/world-war-i-history', 'url': 'http://www.fyi.tv/shows/tiny-house-nation/videos/207-sq-ft-minnesota-prairie-cottage',
'only_matching': True, 'only_matching': True
}, { }, {
'url': 'http://www.history.com/topics/world-war-i/world-war-i-history/speeches', 'url': 'http://www.mylifetime.com/shows/project-runway-junior/video/season-1/episode-6/superstar-clients',
'only_matching': True, 'only_matching': True
}] }]
def theplatform_url_result(self, theplatform_url, video_id, query): def _real_extract(self, url):
return { page_type, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
video_url_re = [
r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
r"media_url\s*=\s*'([^']+)'"
]
video_url = unescapeHTML(self._search_regex(video_url_re, webpage, 'video url'))
query = {'mbr': 'true'}
if page_type == 'shows':
query['assetTypes'] = 'medium_video_s3'
if 'switch=hds' in video_url:
query['switch'] = 'hls'
info = self._search_json_ld(webpage, video_id, fatal=False)
info.update({
'_type': 'url_transparent', '_type': 'url_transparent',
'id': video_id,
'url': smuggle_url( 'url': smuggle_url(
update_url_query(theplatform_url, query), update_url_query(video_url, query),
{ {
'sig': { 'sig': {
'key': self._THEPLATFORM_KEY, 'key': 'crazyjava',
'secret': self._THEPLATFORM_SECRET, 'secret': 's3cr3t'},
},
'force_smil_url': True 'force_smil_url': True
}), }),
'ie_key': 'ThePlatform', })
} return info
def _real_extract(self, url):
topic_id, video_display_id = re.match(self._VALID_URL, url).groups()
if video_display_id:
webpage = self._download_webpage(url, video_display_id)
release_url, video_id = re.search(r"_videoPlayer.play\('([^']+)'\s*,\s*'[^']+'\s*,\s*'(\d+)'\)", webpage).groups()
release_url = unescapeHTML(release_url)
return self.theplatform_url_result(
release_url, video_id, {
'mbr': 'true',
'switch': 'hls'
})
else:
webpage = self._download_webpage(url, topic_id)
entries = []
for episode_item in re.findall(r'<a.+?data-release-url="[^"]+"[^>]*>', webpage):
video_attributes = extract_attributes(episode_item)
entries.append(self.theplatform_url_result(
video_attributes['data-release-url'], video_attributes['data-id'], {
'mbr': 'true',
'switch': 'hls'
}))
return self.playlist_result(entries, topic_id, get_element_by_attribute('class', 'show-title', webpage))

View File

@ -1,145 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
update_url_query,
xpath_element,
xpath_text,
)
class AfreecaTVIE(InfoExtractor):
IE_DESC = 'afreecatv.com'
_VALID_URL = r'''(?x)
https?://
(?:
(?:(?:live|afbbs|www)\.)?afreeca(?:tv)?\.com(?::\d+)?
(?:
/app/(?:index|read_ucc_bbs)\.cgi|
/player/[Pp]layer\.(?:swf|html)
)\?.*?\bnTitleNo=|
vod\.afreecatv\.com/PLAYER/STATION/
)
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://live.afreecatv.com:8079/app/index.cgi?szType=read_ucc_bbs&szBjId=dailyapril&nStationNo=16711924&nBbsNo=18605867&nTitleNo=36164052&szSkin=',
'md5': 'f72c89fe7ecc14c1b5ce506c4996046e',
'info_dict': {
'id': '36164052',
'ext': 'mp4',
'title': '데일리 에이프릴 요정들의 시상식!',
'thumbnail': 're:^https?://(?:video|st)img.afreecatv.com/.*$',
'uploader': 'dailyapril',
'uploader_id': 'dailyapril',
'upload_date': '20160503',
}
}, {
'url': 'http://afbbs.afreecatv.com:8080/app/read_ucc_bbs.cgi?nStationNo=16711924&nTitleNo=36153164&szBjId=dailyapril&nBbsNo=18605867',
'info_dict': {
'id': '36153164',
'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
'thumbnail': 're:^https?://(?:video|st)img.afreecatv.com/.*$',
'uploader': 'dailyapril',
'uploader_id': 'dailyapril',
},
'playlist_count': 2,
'playlist': [{
'md5': 'd8b7c174568da61d774ef0203159bf97',
'info_dict': {
'id': '36153164_1',
'ext': 'mp4',
'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
'upload_date': '20160502',
},
}, {
'md5': '58f2ce7f6044e34439ab2d50612ab02b',
'info_dict': {
'id': '36153164_2',
'ext': 'mp4',
'title': "BJ유트루와 함께하는 '팅커벨 메이크업!'",
'upload_date': '20160502',
},
}],
}, {
'url': 'http://www.afreecatv.com/player/Player.swf?szType=szBjId=djleegoon&nStationNo=11273158&nBbsNo=13161095&nTitleNo=36327652',
'only_matching': True,
}, {
'url': 'http://vod.afreecatv.com/PLAYER/STATION/15055030',
'only_matching': True,
}]
@staticmethod
def parse_video_key(key):
video_key = {}
m = re.match(r'^(?P<upload_date>\d{8})_\w+_(?P<part>\d+)$', key)
if m:
video_key['upload_date'] = m.group('upload_date')
video_key['part'] = m.group('part')
return video_key
def _real_extract(self, url):
video_id = self._match_id(url)
parsed_url = compat_urllib_parse_urlparse(url)
info_url = compat_urlparse.urlunparse(parsed_url._replace(
netloc='afbbs.afreecatv.com:8080',
path='/api/video/get_video_info.php'))
video_xml = self._download_xml(
update_url_query(info_url, {'nTitleNo': video_id}), video_id)
if xpath_element(video_xml, './track/video/file') is None:
raise ExtractorError('Specified AfreecaTV video does not exist',
expected=True)
title = xpath_text(video_xml, './track/title', 'title')
uploader = xpath_text(video_xml, './track/nickname', 'uploader')
uploader_id = xpath_text(video_xml, './track/bj_id', 'uploader id')
duration = int_or_none(xpath_text(video_xml, './track/duration',
'duration'))
thumbnail = xpath_text(video_xml, './track/titleImage', 'thumbnail')
entries = []
for i, video_file in enumerate(video_xml.findall('./track/video/file')):
video_key = self.parse_video_key(video_file.get('key', ''))
if not video_key:
continue
entries.append({
'id': '%s_%s' % (video_id, video_key.get('part', i + 1)),
'title': title,
'upload_date': video_key.get('upload_date'),
'duration': int_or_none(video_file.get('duration')),
'url': video_file.text,
})
info = {
'id': video_id,
'title': title,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'thumbnail': thumbnail,
}
if len(entries) > 1:
info['_type'] = 'multi_video'
info['entries'] = entries
elif len(entries) == 1:
info['url'] = entries[0]['url']
info['upload_date'] = entries[0].get('upload_date')
else:
raise ExtractorError(
'No files found for the specified AfreecaTV video, either'
' the URL is incorrect or the video has been made private.',
expected=True)
return info
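
The key-parsing step of the removed AfreecaTV extractor can be exercised on its own. The sketch below copies the regular expression from `parse_video_key` above. The sample key strings are invented for illustration and are not taken from a real AfreecaTV response.

```python
import re


def parse_video_key(key):
    """Split a key shaped like <YYYYMMDD>_<token>_<part> into its fields.

    Returns an empty dict when the key does not match, which is what lets
    the calling loop skip such <file> elements.
    """
    video_key = {}
    m = re.match(r'^(?P<upload_date>\d{8})_\w+_(?P<part>\d+)$', key)
    if m:
        video_key['upload_date'] = m.group('upload_date')
        video_key['part'] = m.group('part')
    return video_key


# Hypothetical keys, made up for this example.
print(parse_video_key('20160502_ABCDEF12_3'))
print(parse_video_key('not-a-key'))
```

Because `\w` also matches digits and underscores, the middle token is matched greedily and the regex backtracks to the last `_<digits>` group, so the part number is always the trailing numeric field.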

View File

@ -0,0 +1,64 @@
# encoding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class AftonbladetIE(InfoExtractor):
_VALID_URL = r'https?://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
'info_dict': {
'id': '36015',
'ext': 'mp4',
'title': 'Vulkanutbrott i rymden - nu släpper NASA bilderna',
'description': 'Jupiters måne mest aktiv av alla himlakroppar',
'timestamp': 1394142732,
'upload_date': '20140306',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
# find internal video meta data
meta_url = 'http://aftonbladet-play.drlib.aptoma.no/video/%s.json'
player_config = self._parse_json(self._html_search_regex(
r'data-player-config="([^"]+)"', webpage, 'player config'), video_id)
internal_meta_id = player_config['videoId']
internal_meta_url = meta_url % internal_meta_id
internal_meta_json = self._download_json(
internal_meta_url, video_id, 'Downloading video meta data')
# find internal video formats
format_url = 'http://aftonbladet-play.videodata.drvideo.aptoma.no/actions/video/?id=%s'
internal_video_id = internal_meta_json['videoId']
internal_formats_url = format_url % internal_video_id
internal_formats_json = self._download_json(
internal_formats_url, video_id, 'Downloading video formats')
formats = []
for fmt in internal_formats_json['formats']['http']['pseudostreaming']['mp4']:
p = fmt['paths'][0]
formats.append({
'url': 'http://%s:%d/%s/%s' % (p['address'], p['port'], p['path'], p['filename']),
'ext': 'mp4',
'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')),
'tbr': int_or_none(fmt.get('bitrate')),
'protocol': 'http',
})
self._sort_formats(formats)
return {
'id': video_id,
'title': internal_meta_json['title'],
'formats': formats,
'thumbnail': internal_meta_json.get('imageUrl'),
'description': internal_meta_json.get('shortPreamble'),
'timestamp': int_or_none(internal_meta_json.get('timePublished')),
'duration': int_or_none(internal_meta_json.get('duration')),
'view_count': int_or_none(internal_meta_json.get('views')),
}
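
The format loop above builds each media URL from the first entry of `fmt['paths']`. A minimal sketch of that step, assuming the entry has the four fields used in the extractor (`address`, `port`, `path`, `filename`). The sample values are hypothetical and do not come from the Aftonbladet API.

```python
def build_format_url(path_info):
    """Assemble the HTTP URL the same way the extractor's format loop does."""
    # '%d' requires the port to be an integer; a string port would raise TypeError.
    return 'http://%s:%d/%s/%s' % (
        path_info['address'], path_info['port'],
        path_info['path'], path_info['filename'])


# Hypothetical path entry, invented for this example.
sample_path = {
    'address': 'media.example.com',
    'port': 80,
    'path': 'videos/2014',
    'filename': 'clip.mp4',
}
print(build_format_url(sample_path))
```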

View File

@ -4,7 +4,7 @@ from .common import InfoExtractor
class AlJazeeraIE(InfoExtractor): class AlJazeeraIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html' _VALID_URL = r'https?://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_TEST = { _TEST = {
'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html', 'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',

View File

@ -1,26 +1,29 @@
# coding: utf-8 # -*- coding: utf-8 -*-
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
remove_end,
qualities, qualities,
url_basename, unescapeHTML,
xpath_element,
) )
class AllocineIE(InfoExtractor): class AllocineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?:article|video|film)/(?:fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?' _VALID_URL = r'https?://(?:www\.)?allocine\.fr/(?P<typ>article|video|film)/(fichearticle_gen_carticle=|player_gen_cmedia=|fichefilm_gen_cfilm=|video-)(?P<id>[0-9]+)(?:\.html)?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html', 'url': 'http://www.allocine.fr/article/fichearticle_gen_carticle=18635087.html',
'md5': '0c9fcf59a841f65635fa300ac43d8269', 'md5': '0c9fcf59a841f65635fa300ac43d8269',
'info_dict': { 'info_dict': {
'id': '19546517', 'id': '19546517',
'display_id': '18635087',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Astérix - Le Domaine des Dieux Teaser VF', 'title': 'Astérix - Le Domaine des Dieux Teaser VF',
'description': 'md5:4a754271d9c6f16c72629a8a993ee884', 'description': 'md5:abcd09ce503c6560512c14ebfdb720d2',
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
}, { }, {
@ -28,82 +31,64 @@ class AllocineIE(InfoExtractor):
'md5': 'd0cdce5d2b9522ce279fdfec07ff16e0', 'md5': 'd0cdce5d2b9522ce279fdfec07ff16e0',
'info_dict': { 'info_dict': {
'id': '19540403', 'id': '19540403',
'display_id': '19540403',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Planes 2 Bande-annonce VF', 'title': 'Planes 2 Bande-annonce VF',
'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway', 'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
}, { }, {
'url': 'http://www.allocine.fr/video/player_gen_cmedia=19544709&cfilm=181290.html', 'url': 'http://www.allocine.fr/film/fichefilm_gen_cfilm=181290.html',
'md5': '101250fb127ef9ca3d73186ff22a47ce', 'md5': '101250fb127ef9ca3d73186ff22a47ce',
'info_dict': { 'info_dict': {
'id': '19544709', 'id': '19544709',
'display_id': '19544709',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Dragons 2 - Bande annonce finale VF', 'title': 'Dragons 2 - Bande annonce finale VF',
'description': 'md5:6cdd2d7c2687d4c6aafe80a35e17267a', 'description': 'md5:601d15393ac40f249648ef000720e7e3',
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
}, { }, {
'url': 'http://www.allocine.fr/video/video-19550147/', 'url': 'http://www.allocine.fr/video/video-19550147/',
'md5': '3566c0668c0235e2d224fd8edb389f67', 'only_matching': True,
'info_dict': {
'id': '19550147',
'ext': 'mp4',
'title': 'Faux Raccord N°123 - Les gaffes de Cliffhanger',
'description': 'md5:bc734b83ffa2d8a12188d9eb48bb6354',
'thumbnail': 're:http://.*\.jpg',
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
typ = mobj.group('typ')
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
formats = [] if typ == 'film':
video_id = self._search_regex(r'href="/video/player_gen_cmedia=([0-9]+).+"', webpage, 'video id')
else:
player = self._search_regex(r'data-player=\'([^\']+)\'>', webpage, 'data player', default=None)
if player:
player_data = json.loads(player)
video_id = compat_str(player_data['refMedia'])
else:
model = self._search_regex(r'data-model="([^"]+)">', webpage, 'data model')
model_data = self._parse_json(unescapeHTML(model), display_id)
video_id = compat_str(model_data['id'])
xml = self._download_xml('http://www.allocine.fr/ws/AcVisiondataV4.ashx?media=%s' % video_id, display_id)
video = xpath_element(xml, './/AcVisionVideo').attrib
quality = qualities(['ld', 'md', 'hd']) quality = qualities(['ld', 'md', 'hd'])
model = self._html_search_regex( formats = []
r'data-model="([^"]+)"', webpage, 'data model', default=None) for k, v in video.items():
if model: if re.match(r'.+_path', k):
model_data = self._parse_json(model, display_id) format_id = k.split('_')[0]
for video_url in model_data['sources'].values():
video_id, format_id = url_basename(video_url).split('_')[:2]
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
'quality': quality(format_id), 'quality': quality(format_id),
'url': video_url, 'url': v,
}) })
title = model_data['title']
else:
video_id = display_id
media_data = self._download_json(
'http://www.allocine.fr/ws/AcVisiondataV5.ashx?media=%s' % video_id, display_id)
for key, value in media_data['video'].items():
if not key.endswith('Path'):
continue
format_id = key[:-len('Path')]
formats.append({
'format_id': format_id,
'quality': quality(format_id),
'url': value,
})
title = remove_end(self._html_search_regex(
r'(?s)<title>(.+?)</title>', webpage, 'title'
).strip(), ' - AlloCiné')
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'title': video['videoTitle'],
'title': title,
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats, 'formats': formats,
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
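
Both sides of this hunk derive `format_id` from the key name and rank it with `qualities(['ld', 'md', 'hd'])`. The sketch below shows the `...Path` suffix variant in isolation. The local `qualities` function is a stand-in written for this example (rank equals the position in the list, -1 for unknown ids), the plain sort stands in for `_sort_formats`, and the sample dictionary is hypothetical.

```python
def qualities(quality_ids):
    """Stand-in ranking helper: index in quality_ids, or -1 if unknown."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


def collect_formats(video, suffix='Path'):
    """Turn keys such as 'hdPath' into format dicts, ordered worst to best."""
    quality = qualities(['ld', 'md', 'hd'])
    formats = []
    for key, value in video.items():
        if not key.endswith(suffix):
            continue  # skip metadata keys such as 'title'
        format_id = key[:-len(suffix)]
        formats.append({
            'format_id': format_id,
            'quality': quality(format_id),
            'url': value,
        })
    formats.sort(key=lambda f: f['quality'])
    return formats


# Hypothetical media description, invented for this example.
sample_video = {
    'hdPath': 'http://example.com/v_hd.mp4',
    'ldPath': 'http://example.com/v_ld.mp4',
    'mdPath': 'http://example.com/v_md.mp4',
    'title': 'Sample',
}
for f in collect_formats(sample_video):
    print(f['format_id'], f['quality'])
```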

View File

@ -1,92 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .theplatform import ThePlatformIE
from ..utils import (
update_url_query,
parse_age_limit,
int_or_none,
)
class AMCNetworksIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies/|shows/[^/]+/(?:full-episodes/)?season-\d+/episode-\d+(?:-(?:[^/]+/)?|/))(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
'md5': '',
'info_dict': {
'id': 's3MX01Nl4vPH',
'ext': 'mp4',
'title': 'Maron - Season 4 - Step 1',
'description': 'In denial about his current situation, Marc is reluctantly convinced by his friends to enter rehab. Starring Marc Maron and Constance Zimmer.',
'age_limit': 17,
'upload_date': '20160505',
'timestamp': 1462468831,
'uploader': 'AMCN',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Requires TV provider accounts',
}, {
'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
'only_matching': True,
}, {
'url': 'http://www.amc.com/shows/preacher/full-episodes/season-01/episode-00/pilot',
'only_matching': True,
}, {
'url': 'http://www.wetv.com/shows/million-dollar-matchmaker/season-01/episode-06-the-dumped-dj-and-shallow-hal',
'only_matching': True,
}, {
'url': 'http://www.ifc.com/movies/chaos',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
query = {
'mbr': 'true',
'manifest': 'm3u',
}
media_url = self._search_regex(r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)', webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), display_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating']
auth_required = self._search_regex(r'window\.authRequired\s*=\s*(true|false);', webpage, 'auth required')
if auth_required == 'true':
requestor_id = self._search_regex(r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)', webpage, 'requestor id')
resource = self._get_mvpd_resource(requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, requestor_id, resource)
media_url = update_url_query(media_url, query)
formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
self._sort_formats(formats)
info.update({
'id': video_id,
'subtitles': subtitles,
'formats': formats,
'age_limit': parse_age_limit(rating),
})
ns_keys = theplatform_metadata.get('$xmlns', {}).keys()
if ns_keys:
ns = list(ns_keys)[0]
series = theplatform_metadata.get(ns + '$show')
season_number = int_or_none(theplatform_metadata.get(ns + '$season'))
episode = theplatform_metadata.get(ns + '$episodeTitle')
episode_number = int_or_none(theplatform_metadata.get(ns + '$episode'))
if season_number:
title = 'Season %d - %s' % (season_number, title)
if series:
title = '%s - %s' % (series, title)
info.update({
'title': title,
'series': series,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
})
return info
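
The title composition at the end of the removed AMC Networks extractor can be checked without any network access. This sketch isolates that logic in a helper named `build_title`, which is a name introduced here and not part of the extractor.

```python
def build_title(title, series=None, season_number=None):
    """Prefix the episode title with season and series, when they are known."""
    if season_number:
        title = 'Season %d - %s' % (season_number, title)
    if series:
        title = '%s - %s' % (series, title)
    return title


# Values matching the first test case listed in the extractor.
print(build_title('Step 1', series='Maron', season_number=4))
```

With the series and season from the first test entry, the helper yields the title that test expects.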

View File

@ -5,8 +5,6 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
mimetype2ext,
determine_ext,
) )
@ -52,25 +50,21 @@ class AMPIE(InfoExtractor):
if isinstance(media_content, dict): if isinstance(media_content, dict):
media_content = [media_content] media_content = [media_content]
for media_data in media_content: for media_data in media_content:
media = media_data.get('@attributes', {}) media = media_data['@attributes']
media_url = media.get('url') media_type = media['type']
if not media_url: if media_type in ('video/f4m', 'application/f4m+xml'):
continue
ext = mimetype2ext(media.get('type')) or determine_ext(media_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
media_url + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124', media['url'] + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
video_id, f4m_id='hds', fatal=False)) video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8': elif media_type == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', m3u8_id='hls', fatal=False)) media['url'], video_id, 'mp4', m3u8_id='hls', fatal=False))
else: else:
formats.append({ formats.append({
'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'), 'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'),
'url': media['url'], 'url': media['url'],
'tbr': int_or_none(media.get('bitrate')), 'tbr': int_or_none(media.get('bitrate')),
'filesize': int_or_none(media.get('fileSize')), 'filesize': int_or_none(media.get('fileSize')),
'ext': ext,
}) })
self._sort_formats(formats) self._sort_formats(formats)

View File

@ -22,7 +22,6 @@ class AnimeOnDemandIE(InfoExtractor):
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply' _APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
_NETRC_MACHINE = 'animeondemand' _NETRC_MACHINE = 'animeondemand'
_TESTS = [{ _TESTS = [{
# jap, OmU
'url': 'https://www.anime-on-demand.de/anime/161', 'url': 'https://www.anime-on-demand.de/anime/161',
'info_dict': { 'info_dict': {
'id': '161', 'id': '161',
@ -31,21 +30,17 @@ class AnimeOnDemandIE(InfoExtractor):
}, },
'playlist_mincount': 4, 'playlist_mincount': 4,
}, { }, {
# Film wording is used instead of Episode, ger/jap, Dub/OmU # Film wording is used instead of Episode
'url': 'https://www.anime-on-demand.de/anime/39', 'url': 'https://www.anime-on-demand.de/anime/39',
'only_matching': True, 'only_matching': True,
}, { }, {
# Episodes without titles, jap, OmU # Episodes without titles
'url': 'https://www.anime-on-demand.de/anime/162', 'url': 'https://www.anime-on-demand.de/anime/162',
'only_matching': True, 'only_matching': True,
}, { }, {
# ger/jap, Dub/OmU, account required # ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/169', 'url': 'https://www.anime-on-demand.de/anime/169',
'only_matching': True, 'only_matching': True,
}, {
# Full length film, non-series, ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/185',
'only_matching': True,
}] }]
def _login(self): def _login(self):
@ -115,12 +110,35 @@ class AnimeOnDemandIE(InfoExtractor):
entries = [] entries = []
def extract_info(html, video_id, num=None): for num, episode_html in enumerate(re.findall(
title, description = [None] * 2 r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage), 1):
episodebox_title = self._search_regex(
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue
episode_number = int(self._search_regex(
r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
formats = [] formats = []
for input_ in re.findall( for input_ in re.findall(
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', html): r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html):
attributes = extract_attributes(input_) attributes = extract_attributes(input_)
playlist_urls = [] playlist_urls = []
for playlist_key in ('data-playlist', 'data-otherplaylist'): for playlist_key in ('data-playlist', 'data-otherplaylist'):
@ -143,7 +161,7 @@ class AnimeOnDemandIE(InfoExtractor):
format_id_list.append(lang) format_id_list.append(lang)
if kind: if kind:
format_id_list.append(kind) format_id_list.append(kind)
if not format_id_list and num is not None: if not format_id_list:
format_id_list.append(compat_str(num)) format_id_list.append(compat_str(num))
format_id = '-'.join(format_id_list) format_id = '-'.join(format_id_list)
format_note = ', '.join(filter(None, (kind, lang_note))) format_note = ', '.join(filter(None, (kind, lang_note)))
@ -197,74 +215,28 @@ class AnimeOnDemandIE(InfoExtractor):
}) })
formats.extend(file_formats) formats.extend(file_formats)
return { if formats:
'title': title, self._sort_formats(formats)
'description': description,
'formats': formats,
}
def extract_entries(html, video_id, common_info, num=None):
info = extract_info(html, video_id, num)
if info['formats']:
self._sort_formats(info['formats'])
f = common_info.copy() f = common_info.copy()
f.update(info) f.update({
'title': title,
'description': description,
'formats': formats,
})
entries.append(f) entries.append(f)
# Extract teaser/trailer only when full episode is not available # Extract teaser only when full episode is not available
if not info['formats']: if not formats:
m = re.search( m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>(?P<kind>Teaser|Trailer)<', r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
html) episode_html)
if m: if m:
f = common_info.copy() f = common_info.copy()
f.update({ f.update({
'id': '%s-%s' % (f['id'], m.group('kind').lower()), 'id': '%s-teaser' % f['id'],
'title': m.group('title'), 'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')), 'url': compat_urlparse.urljoin(url, m.group('href')),
}) })
entries.append(f) entries.append(f)
def extract_episodes(html):
for num, episode_html in enumerate(re.findall(
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', html), 1):
episodebox_title = self._search_regex(
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue
episode_number = int(self._search_regex(
r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number
common_info = {
'id': video_id,
'series': anime_title,
'episode': episode_title,
'episode_number': episode_number,
}
extract_entries(episode_html, video_id, common_info)
def extract_film(html, video_id):
common_info = {
'id': anime_id,
'title': anime_title,
'description': anime_description,
}
extract_entries(html, video_id, common_info)
extract_episodes(webpage)
if not entries:
extract_film(webpage, anime_id)
return self.playlist_result(entries, anime_id, anime_title, anime_description) return self.playlist_result(entries, anime_id, anime_title, anime_description)
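
The episode number and title are both parsed out of the episode box heading. The sketch below applies the same two patterns with plain `re.search` instead of `_search_regex`. The helper name `parse_episodebox_title` and the sample headings are assumptions made for this example.

```python
import re


def parse_episodebox_title(episodebox_title, fallback_number):
    """Return (episode_number, episode_title) from a heading string.

    fallback_number plays the role of the enumerate() counter that the
    extractor uses when the heading carries no number.
    """
    m = re.search(r'(?:Episode|Film)\s*(\d+)', episodebox_title)
    episode_number = int(m.group(1)) if m else fallback_number
    m = re.search(r'(?:Episode|Film)\s*\d+\s*-\s*(.+)', episodebox_title)
    episode_title = m.group(1) if m else None
    return episode_number, episode_title


# Hypothetical headings, invented for this example.
print(parse_episodebox_title('Episode 3 - Ein Titel', 1))
print(parse_episodebox_title('Film 1', 1))
print(parse_episodebox_title('Bonus', 7))
```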

View File

@ -157,16 +157,22 @@ class AnvatoIE(InfoExtractor):
video_data_url, video_id, transform_source=strip_jsonp, video_data_url, video_id, transform_source=strip_jsonp,
data=json.dumps(payload).encode('utf-8')) data=json.dumps(payload).encode('utf-8'))
def _get_anvato_videos(self, access_key, video_id): def _extract_anvato_videos(self, webpage, video_id):
anvplayer_data = self._parse_json(self._html_search_regex(
r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
'Anvato player data'), video_id)
video_id = anvplayer_data['video']
access_key = anvplayer_data['accessKey']
video_data = self._get_video_json(access_key, video_id) video_data = self._get_video_json(access_key, video_id)
formats = [] formats = []
for published_url in video_data['published_urls']: for published_url in video_data['published_urls']:
video_url = published_url['embed_url'] video_url = published_url['embed_url']
media_format = published_url.get('format')
ext = determine_ext(video_url) ext = determine_ext(video_url)
if ext == 'smil' or media_format == 'smil': if ext == 'smil':
formats.extend(self._extract_smil_formats(video_url, video_id)) formats.extend(self._extract_smil_formats(video_url, video_id))
continue continue
@ -177,7 +183,7 @@ class AnvatoIE(InfoExtractor):
'tbr': tbr if tbr != 0 else None, 'tbr': tbr if tbr != 0 else None,
} }
if ext == 'm3u8' or media_format in ('m3u8', 'm3u8-variant'): if ext == 'm3u8':
# Not using _extract_m3u8_formats here as individual media # Not using _extract_m3u8_formats here as individual media
# playlists are also included in published_urls. # playlists are also included in published_urls.
if tbr is None: if tbr is None:
@ -188,7 +194,7 @@ class AnvatoIE(InfoExtractor):
'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])), 'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])),
'ext': 'mp4', 'ext': 'mp4',
}) })
elif ext == 'mp3' or media_format == 'mp3': elif ext == 'mp3':
a_format['vcodec'] = 'none' a_format['vcodec'] = 'none'
else: else:
a_format.update({ a_format.update({
@ -212,19 +218,7 @@ class AnvatoIE(InfoExtractor):
'formats': formats, 'formats': formats,
'title': video_data.get('def_title'), 'title': video_data.get('def_title'),
'description': video_data.get('def_description'), 'description': video_data.get('def_description'),
'tags': video_data.get('def_tags', '').split(','),
'categories': video_data.get('categories'), 'categories': video_data.get('categories'),
'thumbnail': video_data.get('thumbnail'), 'thumbnail': video_data.get('thumbnail'),
'timestamp': int_or_none(video_data.get(
'ts_published') or video_data.get('ts_added')),
'uploader': video_data.get('mcp_id'),
'duration': int_or_none(video_data.get('duration')),
'subtitles': subtitles, 'subtitles': subtitles,
} }
def _extract_anvato_videos(self, webpage, video_id):
anvplayer_data = self._parse_json(self._html_search_regex(
r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
'Anvato player data'), video_id)
return self._get_anvato_videos(
anvplayer_data['accessKey'], anvplayer_data['video'])

View File

@ -123,10 +123,6 @@ class AolFeaturesIE(InfoExtractor):
'title': 'What To Watch - February 17, 2016', 'title': 'What To Watch - February 17, 2016',
}, },
'add_ie': ['FiveMin'], 'add_ie': ['FiveMin'],
'params': {
# encrypted m3u8 download
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -1,6 +1,8 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
@ -13,7 +15,7 @@ class AparatIE(InfoExtractor):
_TEST = { _TEST = {
'url': 'http://www.aparat.com/v/wP8On', 'url': 'http://www.aparat.com/v/wP8On',
'md5': '131aca2e14fe7c4dcb3c4877ba300c89', 'md5': '6714e0af7e0d875c5a39c4dc4ab46ad1',
'info_dict': { 'info_dict': {
'id': 'wP8On', 'id': 'wP8On',
'ext': 'mp4', 'ext': 'mp4',
@ -29,13 +31,13 @@ class AparatIE(InfoExtractor):
# Note: There is an easier-to-parse configuration at # Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id # http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work # but the URL in there does not work
embed_url = 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id embed_url = ('http://www.aparat.com/video/video/embed/videohash/' +
video_id + '/vt/frame')
webpage = self._download_webpage(embed_url, video_id) webpage = self._download_webpage(embed_url, video_id)
file_list = self._parse_json(self._search_regex( video_urls = [video_url.replace('\\/', '/') for video_url in re.findall(
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage, 'file list'), video_id) r'(?:fileList\[[0-9]+\]\s*=|"file"\s*:)\s*"([^"]+)"', webpage)]
for i, item in enumerate(file_list[0]): for i, video_url in enumerate(video_urls):
video_url = item['file']
req = HEADRequest(video_url) req = HEADRequest(video_url)
res = self._request_webpage( res = self._request_webpage(
req, video_id, note='Testing video URL %d' % i, errnote=False) req, video_id, note='Testing video URL %d' % i, errnote=False)
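
The older side of this hunk collects candidate URLs with a single `re.findall` and then undoes the JSON-style `\/` escaping. A minimal sketch of that step follows. The embedded page fragment is hypothetical and only mimics the two shapes the pattern accepts.

```python
import re


def find_video_urls(webpage):
    """Collect URLs from fileList[N] = "..." assignments and "file": "..." pairs."""
    return [
        video_url.replace('\\/', '/')
        for video_url in re.findall(
            r'(?:fileList\[[0-9]+\]\s*=|"file"\s*:)\s*"([^"]+)"', webpage)
    ]


# Hypothetical page fragment, invented for this example.
sample_page = (
    'fileList[0] = "http:\\/\\/example.com\\/a.mp4";\n'
    '{"file": "http:\\/\\/example.com\\/b.mp4"}'
)
print(find_video_urls(sample_page))
```

Each URL found this way is only a candidate. The extractor still probes it with a HEAD request before using it.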

View File

@ -7,8 +7,6 @@ from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_duration,
unified_strdate,
) )
@ -18,8 +16,7 @@ class AppleTrailersIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://trailers.apple.com/trailers/wb/manofsteel/', 'url': 'http://trailers.apple.com/trailers/wb/manofsteel/',
'info_dict': { 'info_dict': {
'id': '5111', 'id': 'manofsteel',
'title': 'Man of Steel',
}, },
'playlist': [ 'playlist': [
{ {
@ -73,15 +70,6 @@ class AppleTrailersIE(InfoExtractor):
'id': 'blackthorn', 'id': 'blackthorn',
}, },
'playlist_mincount': 2, 'playlist_mincount': 2,
'expected_warnings': ['Unable to download JSON metadata'],
}, {
# json data only available from http://trailers.apple.com/trailers/feeds/data/15881.json
'url': 'http://trailers.apple.com/trailers/fox/kungfupanda3/',
'info_dict': {
'id': '15881',
'title': 'Kung Fu Panda 3',
},
'playlist_mincount': 4,
}, { }, {
'url': 'http://trailers.apple.com/ca/metropole/autrui/', 'url': 'http://trailers.apple.com/ca/metropole/autrui/',
'only_matching': True, 'only_matching': True,
@ -97,45 +85,6 @@ class AppleTrailersIE(InfoExtractor):
movie = mobj.group('movie') movie = mobj.group('movie')
uploader_id = mobj.group('company') uploader_id = mobj.group('company')
webpage = self._download_webpage(url, movie)
film_id = self._search_regex(r"FilmId\s*=\s*'(\d+)'", webpage, 'film id')
film_data = self._download_json(
'http://trailers.apple.com/trailers/feeds/data/%s.json' % film_id,
film_id, fatal=False)
if film_data:
entries = []
for clip in film_data.get('clips', []):
clip_title = clip['title']
formats = []
for version, version_data in clip.get('versions', {}).items():
for size, size_data in version_data.get('sizes', {}).items():
src = size_data.get('src')
if not src:
continue
formats.append({
'format_id': '%s-%s' % (version, size),
'url': re.sub(r'_(\d+p.mov)', r'_h\1', src),
'width': int_or_none(size_data.get('width')),
'height': int_or_none(size_data.get('height')),
'language': version[:2],
})
self._sort_formats(formats)
entries.append({
'id': movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', clip_title).lower(),
'formats': formats,
'title': clip_title,
'thumbnail': clip.get('screen') or clip.get('thumb'),
'duration': parse_duration(clip.get('runtime') or clip.get('faded')),
'upload_date': unified_strdate(clip.get('posted')),
'uploader_id': uploader_id,
})
page_data = film_data.get('page', {})
return self.playlist_result(entries, film_id, page_data.get('movie_title'))
playlist_url = compat_urlparse.urljoin(url, 'includes/playlists/itunes.inc') playlist_url = compat_urlparse.urljoin(url, 'includes/playlists/itunes.inc')
def fix_html(s): def fix_html(s):

View File

@ -1,65 +1,67 @@
 from __future__ import unicode_literals
-from .jwplatform import JWPlatformBaseIE
-from ..utils import (
-unified_strdate,
-clean_html,
-)
+from .common import InfoExtractor
+from ..utils import unified_strdate
-class ArchiveOrgIE(JWPlatformBaseIE):
+class ArchiveOrgIE(InfoExtractor):
 IE_NAME = 'archive.org'
 IE_DESC = 'archive.org videos'
-_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'
+_VALID_URL = r'https?://(?:www\.)?archive\.org/details/(?P<id>[^?/]+)(?:[?].*)?$'
 _TESTS = [{
 'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
 'md5': '8af1d4cf447933ed3c7f4871162602db',
 'info_dict': {
 'id': 'XD300-23_68HighlightsAResearchCntAugHumanIntellect',
-'ext': 'ogg',
+'ext': 'ogv',
 'title': '1968 Demo - FJCC Conference Presentation Reel #1',
-'description': 'md5:da45c349df039f1cc8075268eb1b5c25',
+'description': 'md5:1780b464abaca9991d8968c877bb53ed',
 'upload_date': '19681210',
 'uploader': 'SRI International'
 }
 }, {
 'url': 'https://archive.org/details/Cops1922',
-'md5': 'bc73c8ab3838b5a8fc6c6651fa7b58ba',
+'md5': '18f2a19e6d89af8425671da1cf3d4e04',
 'info_dict': {
 'id': 'Cops1922',
-'ext': 'mp4',
+'ext': 'ogv',
 'title': 'Buster Keaton\'s "Cops" (1922)',
-'description': 'md5:b4544662605877edd99df22f9620d858',
+'description': 'md5:70f72ee70882f713d4578725461ffcc3',
 }
-}, {
-'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
-'only_matching': True,
 }]
 def _real_extract(self, url):
 video_id = self._match_id(url)
-webpage = self._download_webpage(
-'http://archive.org/embed/' + video_id, video_id)
-jwplayer_playlist = self._parse_json(self._search_regex(
-r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\);",
-webpage, 'jwplayer playlist'), video_id)
-info = self._parse_jwplayer_data(
-{'playlist': jwplayer_playlist}, video_id, base_url=url)
-def get_optional(metadata, field):
-return metadata.get(field, [None])[0]
-metadata = self._download_json(
-'http://archive.org/details/' + video_id, video_id, query={
-'output': 'json',
-})['metadata']
-info.update({
-'title': get_optional(metadata, 'title') or info.get('title'),
-'description': clean_html(get_optional(metadata, 'description')),
-})
-if info.get('_type') != 'playlist':
-info.update({
-'uploader': get_optional(metadata, 'creator'),
-'upload_date': unified_strdate(get_optional(metadata, 'date')),
-})
-return info
+json_url = url + ('&' if '?' in url else '?') + 'output=json'
+data = self._download_json(json_url, video_id)
+def get_optional(data_dict, field):
+return data_dict['metadata'].get(field, [None])[0]
+title = get_optional(data, 'title')
+description = get_optional(data, 'description')
+uploader = get_optional(data, 'creator')
+upload_date = unified_strdate(get_optional(data, 'date'))
+formats = [
+{
+'format': fdata['format'],
+'url': 'http://' + data['server'] + data['dir'] + fn,
+'file_size': int(fdata['size']),
+}
+for fn, fdata in data['files'].items()
+if 'Video' in fdata['format']]
+self._sort_formats(formats)
+return {
+'_type': 'video',
+'id': video_id,
+'title': title,
+'formats': formats,
+'description': description,
+'uploader': uploader,
+'upload_date': upload_date,
+'thumbnail': data.get('misc', {}).get('image'),
+}
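
Both sides of this hunk read list-valued fields from the archive.org metadata through a small `get_optional` helper. A minimal standalone sketch of that lookup follows, assuming each field maps to a non-empty list; the `sample` dictionary is hypothetical and only mimics the shape of the `metadata` object.

```python
def get_optional(metadata, field):
    """Return the first value of a list-valued metadata field, or None."""
    # Assumption: a present field always holds a non-empty list;
    # an empty list would raise IndexError here.
    return metadata.get(field, [None])[0]


# Hypothetical sample shaped like the 'metadata' object used in the diff.
sample = {
    'title': ['1968 Demo - FJCC Conference Presentation Reel #1'],
    'creator': ['SRI International'],
}

title = get_optional(sample, 'title')
uploader = get_optional(sample, 'creator')
description = get_optional(sample, 'description')  # field is absent
```

The `[None]` default is what lets the `[0]` index succeed for a missing field, so callers get `None` instead of a `KeyError`.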


@ -8,19 +8,19 @@ from .generic import GenericIE
 from ..utils import (
 determine_ext,
 ExtractorError,
+get_element_by_attribute,
 qualities,
 int_or_none,
 parse_duration,
 unified_strdate,
 xpath_text,
-update_url_query,
 )
 from ..compat import compat_etree_fromstring
 class ARDMediathekIE(InfoExtractor):
 IE_NAME = 'ARD:mediathek'
-_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
+_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
 _TESTS = [{
 'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114',
@ -35,7 +35,6 @@ class ARDMediathekIE(InfoExtractor):
 # m3u8 download
 'skip_download': True,
 },
-'skip': 'HTTP Error 404: Not Found',
 }, {
 'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
 'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
@ -46,7 +45,6 @@ class ARDMediathekIE(InfoExtractor):
 'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
 'duration': 5252,
 },
-'skip': 'HTTP Error 404: Not Found',
 }, {
 # audio
 'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
@ -58,22 +56,9 @@ class ARDMediathekIE(InfoExtractor):
 'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
 'duration': 3240,
 },
-'skip': 'HTTP Error 404: Not Found',
 }, {
 'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
 'only_matching': True,
-}, {
-# audio
-'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
-'md5': '4e8f00631aac0395fee17368ac0e9867',
-'info_dict': {
-'id': '30796318',
-'ext': 'mp3',
-'title': 'Vor dem Fest',
-'description': 'md5:c0c1c8048514deaed2a73b3a60eecacb',
-'duration': 3287,
-},
-'skip': 'Video is no longer available',
 }]
 def _extract_media_info(self, media_info_url, webpage, video_id):
@ -129,14 +114,11 @@ class ARDMediathekIE(InfoExtractor):
 continue
 if ext == 'f4m':
 formats.extend(self._extract_f4m_formats(
-update_url_query(stream_url, {
-'hdcore': '3.1.1',
-'plugin': 'aasp-3.1.1.69.124'
-}),
-video_id, f4m_id='hds', fatal=False))
+stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
+video_id, preference=-1, f4m_id='hds', fatal=False))
 elif ext == 'm3u8':
 formats.extend(self._extract_m3u8_formats(
-stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+stream_url, video_id, 'mp4', preference=1, m3u8_id='hls', fatal=False))
 else:
 if server and server.startswith('rtmp'):
 f = {
@ -174,15 +156,11 @@ class ARDMediathekIE(InfoExtractor):
 webpage = self._download_webpage(url, video_id)
-ERRORS = (
-('>Leider liegt eine Störung vor.', 'Video %s is unavailable'),
-('>Der gewünschte Beitrag ist nicht mehr verfügbar.<',
-'Video %s is no longer available'),
-)
-for pattern, message in ERRORS:
-if pattern in webpage:
-raise ExtractorError(message % video_id, expected=True)
+if '>Der gewünschte Beitrag ist nicht mehr verfügbar.<' in webpage:
+raise ExtractorError('Video %s is no longer available' % video_id, expected=True)
+if 'Diese Sendung ist für Jugendliche unter 12 Jahren nicht geeignet. Der Clip ist deshalb nur von 20 bis 6 Uhr verfügbar.' in webpage:
+raise ExtractorError('This program is only suitable for those aged 12 and older. Video %s is therefore only available between 20 pm and 6 am.' % video_id, expected=True)
 if re.search(r'[\?&]rss($|[=&])', url):
 doc = compat_etree_fromstring(webpage.encode('utf-8'))
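
One side of this hunk checks the page against a table of `(pattern, message)` pairs, the other uses separate inline `if` statements. A minimal sketch of the table-driven variant, outside the extractor class, could look like this. `ExtractorError` here is a simplified stand-in for the youtube-dl exception, and `find_error` / `check_webpage` are hypothetical helper names.

```python
class ExtractorError(Exception):
    """Simplified stand-in for youtube_dl.utils.ExtractorError (assumption)."""


# (substring to look for in the page, message template) pairs from the diff.
ERRORS = (
    ('>Leider liegt eine Störung vor.', 'Video %s is unavailable'),
    ('>Der gewünschte Beitrag ist nicht mehr verfügbar.<',
     'Video %s is no longer available'),
)


def find_error(webpage, video_id):
    """Return the first matching error message, or None if the page is fine."""
    for pattern, message in ERRORS:
        if pattern in webpage:
            return message % video_id
    return None


def check_webpage(webpage, video_id):
    """Raise ExtractorError when the page contains a known error marker."""
    message = find_error(webpage, video_id)
    if message:
        raise ExtractorError(message)
```

With the table, adding a new error page only needs one more tuple rather than another `if` / `raise` pair.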
@ -242,7 +220,7 @@ class ARDMediathekIE(InfoExtractor):
 class ARDIE(InfoExtractor):
-_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
+_VALID_URL = '(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
 _TEST = {
 'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
 'md5': 'd216c3a86493f9322545e045ddc3eb35',
@ -254,8 +232,7 @@ class ARDIE(InfoExtractor):
 'title': 'Die Story im Ersten: Mission unter falscher Flagge',
 'upload_date': '20140804',
 'thumbnail': 're:^https?://.*\.jpg$',
-},
-'skip': 'HTTP Error 404: Not Found',
+}
 }
 def _real_extract(self, url):
@ -297,3 +274,41 @@ class ARDIE(InfoExtractor):
 'upload_date': upload_date,
 'thumbnail': thumbnail,
 }
+class SportschauIE(ARDMediathekIE):
+IE_NAME = 'Sportschau'
+_VALID_URL = r'(?P<baseurl>https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video(?P<id>[^/#?]+))\.html'
+_TESTS = [{
+'url': 'http://www.sportschau.de/tourdefrance/videoseppeltkokainhatnichtsmitklassischemdopingzutun100.html',
+'info_dict': {
+'id': 'seppeltkokainhatnichtsmitklassischemdopingzutun100',
+'ext': 'mp4',
+'title': 'Seppelt: "Kokain hat nichts mit klassischem Doping zu tun"',
+'thumbnail': 're:^https?://.*\.jpg$',
+'description': 'Der ARD-Doping Experte Hajo Seppelt gibt seine Einschätzung zum ersten Dopingfall der diesjährigen Tour de France um den Italiener Luca Paolini ab.',
+},
+'params': {
+# m3u8 download
+'skip_download': True,
+},
+}]
+def _real_extract(self, url):
+mobj = re.match(self._VALID_URL, url)
+video_id = mobj.group('id')
+base_url = mobj.group('baseurl')
+webpage = self._download_webpage(url, video_id)
+title = get_element_by_attribute('class', 'headline', webpage)
+description = self._html_search_meta('description', webpage, 'description')
+info = self._extract_media_info(
+base_url + '-mc_defaultQuality-h.json', webpage, video_id)
+info.update({
+'title': title,
+'description': description,
+})
+return info
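
The Sportschau extractor derives both the video id and the media-info URL from the two named groups of its `_VALID_URL`. The sketch below replays only that step with the standard `re` module, using the pattern and test URL from the hunk; nothing is downloaded, and the variable names are illustrative.

```python
import re

# Pattern and test URL copied from the SportschauIE hunk.
VALID_URL = (r'(?P<baseurl>https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+'
             r'video(?P<id>[^/#?]+))\.html')
url = ('http://www.sportschau.de/tourdefrance/'
       'videoseppeltkokainhatnichtsmitklassischemdopingzutun100.html')

mobj = re.match(VALID_URL, url)
video_id = mobj.group('id')        # text after the literal 'video' prefix
base_url = mobj.group('baseurl')   # page URL without the '.html' suffix

# The extractor appends this suffix to reach the JSON media description.
media_info_url = base_url + '-mc_defaultQuality-h.json'
```

The outer `baseurl` group wraps the inner `id` group, so one match yields both the id and the prefix needed for the JSON request.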


@ -1,115 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
mimetype2ext,
parse_iso8601,
strip_jsonp,
)
class ArkenaIE(InfoExtractor):
_VALID_URL = r'https?://play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+)'
_TESTS = [{
'url': 'https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411',
'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
'info_dict': {
'id': 'b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe',
'ext': 'mp4',
'title': 'Big Buck Bunny',
'description': 'Royalty free test video',
'timestamp': 1432816365,
'upload_date': '20150528',
'is_live': False,
},
}, {
'url': 'https://play.arkena.com/config/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411/?callbackMethod=jQuery1111023664739129262213_1469227693893',
'only_matching': True,
}, {
'url': 'http://play.arkena.com/config/avp/v1/player/media/327336/darkmatter/131064/?callbackMethod=jQuery1111002221189684892677_1469227595972',
'only_matching': True,
}, {
'url': 'http://play.arkena.com/embed/avp/v1/player/media/327336/darkmatter/131064/',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
# See https://support.arkena.com/display/PLAY/Ways+to+embed+your+video
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//play\.arkena\.com/embed/avp/.+?)\1',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
account_id = mobj.group('account_id')
playlist = self._download_json(
'https://play.arkena.com/config/avp/v2/player/media/%s/0/%s/?callbackMethod=_'
% (video_id, account_id),
video_id, transform_source=strip_jsonp)['Playlist'][0]
media_info = playlist['MediaInfo']
title = media_info['Title']
media_files = playlist['MediaFiles']
is_live = False
formats = []
for kind_case, kind_formats in media_files.items():
kind = kind_case.lower()
for f in kind_formats:
f_url = f.get('Url')
if not f_url:
continue
is_live = f.get('Live') == 'true'
exts = (mimetype2ext(f.get('Type')), determine_ext(f_url, None))
if kind == 'm3u8' or 'm3u8' in exts:
formats.extend(self._extract_m3u8_formats(
f_url, video_id, 'mp4',
entry_protocol='m3u8' if is_live else 'm3u8_native',
m3u8_id=kind, fatal=False, live=is_live))
elif kind == 'flash' or 'f4m' in exts:
formats.extend(self._extract_f4m_formats(
f_url, video_id, f4m_id=kind, fatal=False))
elif kind == 'dash' or 'mpd' in exts:
formats.extend(self._extract_mpd_formats(
f_url, video_id, mpd_id=kind, fatal=False))
elif kind == 'silverlight':
# TODO: process when ism is supported (see
# https://github.com/rg3/youtube-dl/issues/8118)
continue
else:
tbr = float_or_none(f.get('Bitrate'), 1000)
formats.append({
'url': f_url,
'format_id': '%s-%d' % (kind, tbr) if tbr else kind,
'tbr': tbr,
})
self._sort_formats(formats)
description = media_info.get('Description')
video_id = media_info.get('VideoId') or video_id
timestamp = parse_iso8601(media_info.get('PublishDate'))
thumbnails = [{
'url': thumbnail['Url'],
'width': int_or_none(thumbnail.get('Size')),
} for thumbnail in (media_info.get('Poster') or []) if thumbnail.get('Url')]
return {
'id': video_id,
'title': title,
'description': description,
'timestamp': timestamp,
'is_live': is_live,
'thumbnails': thumbnails,
'formats': formats,
}
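
For media files that are neither HLS, HDS nor DASH, the removed Arkena extractor builds a plain format entry whose `format_id` carries the bitrate in kbit/s when one is known. A hedged sketch of that fallback branch follows. `float_or_none` is a simplified stand-in for the youtube-dl utility of the same name, `build_direct_format` is a hypothetical helper name, and the sample URL is invented.

```python
def float_or_none(v, scale=1):
    """Simplified stand-in for youtube_dl.utils.float_or_none (assumption)."""
    if v is None:
        return None
    try:
        return float(v) / scale
    except (TypeError, ValueError):
        return None


def build_direct_format(kind, f):
    """Mirror the final 'else' branch of the Arkena format loop."""
    tbr = float_or_none(f.get('Bitrate'), 1000)  # bit/s -> kbit/s
    return {
        'url': f['Url'],
        # Append the bitrate only when it is known and non-zero.
        'format_id': '%s-%d' % (kind, tbr) if tbr else kind,
        'tbr': tbr,
    }


with_bitrate = build_direct_format(
    'mp4', {'Url': 'http://example.invalid/a.mp4', 'Bitrate': 1500000})
without_bitrate = build_direct_format(
    'mp4', {'Url': 'http://example.invalid/b.mp4'})
```

The conditional expression binds more loosely than `%`, so the whole formatted string is skipped when `tbr` is missing.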


@ -1,4 +1,4 @@
-# coding: utf-8
+# encoding: utf-8
 from __future__ import unicode_literals
 import re
@ -180,14 +180,11 @@ class ArteTVBaseIE(InfoExtractor):
 class ArteTVPlus7IE(ArteTVBaseIE):
 IE_NAME = 'arte.tv:+7'
-_VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/[^/]+/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&]+)'
 _TESTS = [{
 'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
 'only_matching': True,
-}, {
-'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22',
-'only_matching': True,
 }]
 @classmethod
@ -243,10 +240,10 @@ class ArteTVPlus7IE(ArteTVBaseIE):
 return self._extract_from_json_url(json_url, video_id, lang, title=title)
 # Different kind of embed URL (e.g.
 # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
-entries = [
-self.url_result(url)
-for _, url in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]
-return self.playlist_result(entries)
+embed_url = self._search_regex(
+r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
+webpage, 'embed url', group='url')
+return self.url_result(embed_url)
 # It also uses the arte_vp_url url from the webpage to extract the information
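
The hunk above switches between collecting every `<iframe>` source on the page and taking only the first one. The difference can be sketched with the standard `re` module alone; the `webpage` snippet and its URLs are hypothetical, and only the regular expression is taken from the diff.

```python
import re

# Hypothetical page fragment with two embedded players.
webpage = '''
<iframe width="640" src="https://www.youtube.com/embed/abc123"></iframe>
<iframe src='//www.arte.tv/guide/fr/embed/055918-015/medium'></iframe>
'''

# Group 1 captures the opening quote; \1 requires the same closing quote.
IFRAME_RE = r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1'

# Playlist-style variant: re.findall returns (quote, url) tuples.
all_urls = [url for _, url in re.findall(IFRAME_RE, webpage)]

# Single-result variant: only the first match is used.
first_url = re.search(IFRAME_RE, webpage).group('url')
```

Because the pattern has two groups, `re.findall` yields tuples, which is why the list comprehension in the diff discards the first element with `_`.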
@ -255,17 +252,22 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
 _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
 _TESTS = [{
-'url': 'http://creative.arte.tv/fr/episode/osmosis-episode-1',
+'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
 'info_dict': {
-'id': '057405-001-A',
+'id': '72176',
 'ext': 'mp4',
-'title': 'OSMOSIS - N\'AYEZ PLUS PEUR D\'AIMER (1)',
-'upload_date': '20150716',
+'title': 'Folge 2 - Corporate Design',
+'upload_date': '20131004',
 },
 }, {
 'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
-'playlist_count': 11,
-'add_ie': ['Youtube'],
+'info_dict': {
+'id': '160676',
+'ext': 'mp4',
+'title': 'Monty Python live (mostly)',
+'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
+'upload_date': '20140805',
+}
 }, {
 'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
 'only_matching': True,
@ -347,13 +349,14 @@ class ArteTVCinemaIE(ArteTVPlus7IE):
 _VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
 _TESTS = [{
-'url': 'http://cinema.arte.tv/fr/article/les-ailes-du-desir-de-julia-reck',
-'md5': 'a5b9dd5575a11d93daf0e3f404f45438',
+'url': 'http://cinema.arte.tv/de/node/38291',
+'md5': '6b275511a5107c60bacbeeda368c3aa1',
 'info_dict': {
-'id': '062494-000-A',
+'id': '055876-000_PWA12025-D',
 'ext': 'mp4',
-'title': 'Film lauréat du concours web - "Les ailes du désir" de Julia Reck',
-'upload_date': '20150807',
+'title': 'Tod auf dem Nil',
+'upload_date': '20160122',
+'description': 'md5:7f749bbb77d800ef2be11d54529b96bc',
 },
 }]
@ -410,22 +413,6 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
 return self._extract_from_json_url(json_url, video_id, lang)
-class TheOperaPlatformIE(ArteTVPlus7IE):
-IE_NAME = 'theoperaplatform'
-_VALID_URL = r'https?://(?:www\.)?theoperaplatform\.eu/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
-_TESTS = [{
-'url': 'http://www.theoperaplatform.eu/de/opera/verdi-otello',
-'md5': '970655901fa2e82e04c00b955e9afe7b',
-'info_dict': {
-'id': '060338-009-A',
-'ext': 'mp4',
-'title': 'Verdi - OTELLO',
-'upload_date': '20160927',
-},
-}]
 class ArteTVPlaylistIE(ArteTVBaseIE):
 IE_NAME = 'arte.tv:playlist'
 _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'
@ -435,7 +422,6 @@ class ArteTVPlaylistIE(ArteTVBaseIE):
 'info_dict': {
 'id': 'PL-013263',
 'title': 'Areva & Uramin',
-'description': 'md5:a1dc0312ce357c262259139cfd48c9bf',
 },
 'playlist_mincount': 6,
 }, {


@ -6,8 +6,8 @@ from ..utils import float_or_none
 class AudioBoomIE(InfoExtractor):
-_VALID_URL = r'https?://(?:www\.)?audioboom\.com/(?:boos|posts)/(?P<id>[0-9]+)'
-_TESTS = [{
+_VALID_URL = r'https?://(?:www\.)?audioboom\.com/boos/(?P<id>[0-9]+)'
+_TEST = {
 'url': 'https://audioboom.com/boos/4279833-3-09-2016-czaban-hour-3?t=0',
 'md5': '63a8d73a055c6ed0f1e51921a10a5a76',
 'info_dict': {
@ -19,10 +19,7 @@ class AudioBoomIE(InfoExtractor):
 'uploader': 'Steve Czaban',
 'uploader_url': 're:https?://(?:www\.)?audioboom\.com/channel/steveczabanyahoosportsradio',
 }
-}, {
-'url': 'https://audioboom.com/posts/4279833-3-09-2016-czaban-hour-3?t=0',
-'only_matching': True,
-}]
+}
 def _real_extract(self, url):
 video_id = self._match_id(url)


@ -6,7 +6,6 @@ import time
 from .common import InfoExtractor
 from .soundcloud import SoundcloudIE
-from ..compat import compat_str
 from ..utils import (
 ExtractorError,
 url_basename,
@ -137,7 +136,7 @@ class AudiomackAlbumIE(InfoExtractor):
 result[resultkey] = api_response[apikey]
 song_id = url_basename(api_response['url']).rpartition('.')[0]
 result['entries'].append({
-'id': compat_str(api_response.get('id', song_id)),
+'id': api_response.get('id', song_id),
 'uploader': api_response.get('artist'),
 'title': api_response.get('title', song_id),
 'url': api_response['url'],


@ -1,185 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import base64
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_urlencode,
compat_str,
)
from ..utils import (
int_or_none,
parse_iso8601,
smuggle_url,
unsmuggle_url,
urlencode_postdata,
)
class AWAANIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?show/(?P<show_id>\d+)/[^/]+(?:/(?P<video_id>\d+)/(?P<season_id>\d+))?'
def _real_extract(self, url):
show_id, video_id, season_id = re.match(self._VALID_URL, url).groups()
if video_id and int(video_id) > 0:
return self.url_result(
'http://awaan.ae/media/%s' % video_id, 'AWAANVideo')
elif season_id and int(season_id) > 0:
return self.url_result(smuggle_url(
'http://awaan.ae/program/season/%s' % season_id,
{'show_id': show_id}), 'AWAANSeason')
else:
return self.url_result(
'http://awaan.ae/program/%s' % show_id, 'AWAANSeason')
class AWAANBaseIE(InfoExtractor):
def _parse_video_data(self, video_data, video_id, is_live):
title = video_data.get('title_en') or video_data['title_ar']
img = video_data.get('img')
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'description': video_data.get('description_en') or video_data.get('description_ar'),
'thumbnail': 'http://admin.mangomolo.com/analytics/%s' % img if img else None,
'duration': int_or_none(video_data.get('duration')),
'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
'is_live': is_live,
}
class AWAANVideoIE(AWAANBaseIE):
IE_NAME = 'awaan:video'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?(?:video(?:/[^/]+)?|media|catchup/[^/]+/[^/]+)/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.dcndigital.ae/#/video/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375',
'md5': '5f61c33bfc7794315c671a62d43116aa',
'info_dict':
{
'id': '17375',
'ext': 'mp4',
'title': 'رحلة العمر : الحلقة 1',
'description': 'md5:0156e935d870acb8ef0a66d24070c6d6',
'duration': 2041,
'timestamp': 1227504126,
'upload_date': '20081124',
'uploader_id': '71',
},
}, {
'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
video_id, headers={'Origin': 'http://awaan.ae'})
info = self._parse_video_data(video_data, video_id, False)
embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' + compat_urllib_parse_urlencode({
'id': video_data['id'],
'user_id': video_data['user_id'],
'signature': video_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
})
info.update({
'_type': 'url_transparent',
'url': embed_url,
'ie_key': 'MangomoloVideo',
})
return info
class AWAANLiveIE(AWAANBaseIE):
IE_NAME = 'awaan:live'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?live/(?P<id>\d+)'
_TEST = {
'url': 'http://awaan.ae/live/6/dubai-tv',
'info_dict': {
'id': '6',
'ext': 'mp4',
'title': 're:Dubai Al Oula [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'upload_date': '20150107',
'timestamp': 1420588800,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
channel_id = self._match_id(url)
channel_data = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/getchanneldetails?channel_id=%s' % channel_id,
channel_id, headers={'Origin': 'http://awaan.ae'})
info = self._parse_video_data(channel_data, channel_id, True)
embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' + compat_urllib_parse_urlencode({
'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
'signature': channel_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
})
info.update({
'_type': 'url_transparent',
'url': embed_url,
'ie_key': 'MangomoloLive',
})
return info
class AWAANSeasonIE(InfoExtractor):
IE_NAME = 'awaan:season'
_VALID_URL = r'https?://(?:www\.)?(?:awaan|dcndigital)\.ae/(?:#/)?program/(?:(?P<show_id>\d+)|season/(?P<season_id>\d+))'
_TEST = {
'url': 'http://dcndigital.ae/#/program/205024/%D9%85%D8%AD%D8%A7%D8%B6%D8%B1%D8%A7%D8%AA-%D8%A7%D9%84%D8%B4%D9%8A%D8%AE-%D8%A7%D9%84%D8%B4%D8%B9%D8%B1%D8%A7%D9%88%D9%8A',
'info_dict':
{
'id': '7910',
'title': 'محاضرات الشيخ الشعراوي',
},
'playlist_mincount': 27,
}
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
show_id, season_id = re.match(self._VALID_URL, url).groups()
data = {}
if season_id:
data['season'] = season_id
show_id = smuggled_data.get('show_id')
if show_id is None:
season = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/season_info?id=%s' % season_id,
season_id, headers={'Origin': 'http://awaan.ae'})
show_id = season['id']
data['show_id'] = show_id
show = self._download_json(
'http://admin.mangomolo.com/analytics/index.php/plus/show',
show_id, data=urlencode_postdata(data), headers={
'Origin': 'http://awaan.ae',
'Content-Type': 'application/x-www-form-urlencoded'
})
if not season_id:
season_id = show['default_season']
for season in show['seasons']:
if season['id'] == season_id:
title = season.get('title_en') or season['title_ar']
entries = []
for video in show['videos']:
video_id = compat_str(video['id'])
entries.append(self.url_result(
'http://awaan.ae/media/%s' % video_id, 'AWAANVideo', video_id))
return self.playlist_result(entries, season_id, title)


@ -46,7 +46,6 @@ class AzubuIE(InfoExtractor):
 'uploader_id': 272749,
 'view_count': int,
 },
-'skip': 'Channel offline',
 },
 ]
@ -57,26 +56,22 @@ class AzubuIE(InfoExtractor):
 'http://www.azubu.tv/api/video/%s' % video_id, video_id)['data']
 title = data['title'].strip()
-description = data.get('description')
-thumbnail = data.get('thumbnail')
-view_count = data.get('view_count')
-user = data.get('user', {})
-uploader = user.get('username')
-uploader_id = user.get('id')
+description = data['description']
+thumbnail = data['thumbnail']
+view_count = data['view_count']
+uploader = data['user']['username']
+uploader_id = data['user']['id']
 stream_params = json.loads(data['stream_params'])
-timestamp = float_or_none(stream_params.get('creationDate'), 1000)
-duration = float_or_none(stream_params.get('length'), 1000)
+timestamp = float_or_none(stream_params['creationDate'], 1000)
+duration = float_or_none(stream_params['length'], 1000)
 renditions = stream_params.get('renditions') or []
 video = stream_params.get('FLVFullLength') or stream_params.get('videoFullLength')
 if video:
 renditions.append(video)
-if not renditions and not user.get('channel', {}).get('is_live', True):
-raise ExtractorError('%s said: channel is offline.' % self.IE_NAME, expected=True)
 formats = [{
 'url': fmt['url'],
 'width': fmt['frameWidth'],
@ -103,7 +98,7 @@ class AzubuIE(InfoExtractor):
 class AzubuLiveIE(InfoExtractor):
-_VALID_URL = r'https?://(?:www\.)?azubu\.tv/(?P<id>[^/]+)$'
+_VALID_URL = r'https?://www.azubu.tv/(?P<id>[^/]+)$'
 _TEST = {
 'url': 'http://www.azubu.tv/MarsTVMDLen',


@ -162,15 +162,6 @@ class BandcampAlbumIE(InfoExtractor):
 'uploader_id': 'dotscale',
 },
 'playlist_mincount': 7,
-}, {
-# with escaped quote in title
-'url': 'https://jstrecords.bandcamp.com/album/entropy-ep',
-'info_dict': {
-'title': '"Entropy" EP',
-'uploader_id': 'jstrecords',
-'id': 'entropy-ep',
-},
-'playlist_mincount': 3,
 }]
 def _real_extract(self, url):
@ -185,11 +176,8 @@ class BandcampAlbumIE(InfoExtractor):
 entries = [
 self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key())
 for t_path in tracks_paths]
-title = self._html_search_regex(
-r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"',
-webpage, 'title', fatal=False)
-if title:
-title = title.replace(r'\"', '"')
+title = self._search_regex(
+r'album_title\s*:\s*"(.*?)"', webpage, 'title', fatal=False)
 return {
 '_type': 'playlist',
 'uploader_id': uploader_id,
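
The two title patterns in this hunk behave differently when the album title itself contains an escaped double quote, as in the `"Entropy" EP` test case. The sketch below compares them with the standard `re` module on a hypothetical page fragment; the constant names are illustrative, and the HTML unescaping done by `_html_search_regex` is not modelled.

```python
import re

# Hypothetical fragment of the embedded page data, with escaped quotes.
webpage = r'album_title: "\"Entropy\" EP",'

# Pattern that stops at the first double quote it sees.
SIMPLE_RE = r'album_title\s*:\s*"(.*?)"'
# Pattern that consumes backslash-escaped characters as single units.
ESCAPE_AWARE_RE = r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"'

simple_title = re.search(SIMPLE_RE, webpage).group(1)

escaped_title = re.search(ESCAPE_AWARE_RE, webpage).group(1)
# Undo the escaping, as the diff does with title.replace(r'\"', '"').
clean_title = escaped_title.replace(r'\"', '"')
```

The simple pattern captures only the leading backslash, while the escape-aware one keeps matching past `\"` until the real closing quote.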


@ -2,23 +2,19 @@
 from __future__ import unicode_literals
 import re
-import itertools
 from .common import InfoExtractor
 from ..utils import (
-dict_get,
 ExtractorError,
 float_or_none,
 int_or_none,
 parse_duration,
 parse_iso8601,
-try_get,
 unescapeHTML,
 )
 from ..compat import (
 compat_etree_fromstring,
 compat_HTTPError,
-compat_urlparse,
 )
@ -35,7 +31,7 @@ class BBCCoUkIE(InfoExtractor):
 music/clips[/#]|
 radio/player/
 )
-(?P<id>%s)(?!/(?:episodes|broadcasts|clips))
+(?P<id>%s)
 ''' % _ID_REGEX
 _MEDIASELECTOR_URLS = [
@ -196,7 +192,6 @@ class BBCCoUkIE(InfoExtractor):
 # rtmp download
 'skip_download': True,
 },
-'skip': 'Now it\'s really geo-restricted',
 }, {
 # compact player (https://github.com/rg3/youtube-dl/issues/8147)
 'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
@ -233,6 +228,51 @@ class BBCCoUkIE(InfoExtractor):
 asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
 return [ref.get('href') for ref in asx.findall('./Entry/ref')]
+def _extract_connection(self, connection, programme_id):
+formats = []
+kind = connection.get('kind')
+protocol = connection.get('protocol')
+supplier = connection.get('supplier')
+if protocol == 'http':
+href = connection.get('href')
+transfer_format = connection.get('transferFormat')
+# ASX playlist
+if supplier == 'asx':
+for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
+formats.append({
+'url': ref,
+'format_id': 'ref%s_%s' % (i, supplier),
+})
+# Skip DASH until supported
+elif transfer_format == 'dash':
+pass
+elif transfer_format == 'hls':
+formats.extend(self._extract_m3u8_formats(
+href, programme_id, ext='mp4', entry_protocol='m3u8_native',
+m3u8_id=supplier, fatal=False))
+# Direct link
+else:
+formats.append({
+'url': href,
+'format_id': supplier or kind or protocol,
+})
+elif protocol == 'rtmp':
+application = connection.get('application', 'ondemand')
+auth_string = connection.get('authString')
+identifier = connection.get('identifier')
+server = connection.get('server')
+formats.append({
+'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
+'play_path': identifier,
+'app': '%s?%s' % (application, auth_string),
+'page_url': 'http://www.bbc.co.uk',
+'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
+'rtmp_live': False,
+'ext': 'flv',
+'format_id': supplier,
+})
+return formats
def _extract_items(self, playlist): def _extract_items(self, playlist):
return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS) return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
@ -253,6 +293,46 @@ class BBCCoUkIE(InfoExtractor):
def _extract_connections(self, media): def _extract_connections(self, media):
return self._findall_ns(media, './{%s}connection') return self._findall_ns(media, './{%s}connection')
def _extract_video(self, media, programme_id):
formats = []
vbr = int_or_none(media.get('bitrate'))
vcodec = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
if service:
format['format_id'] = '%s_%s' % (service, format['format_id'])
formats.extend(conn_formats)
return formats
def _extract_audio(self, media, programme_id):
formats = []
abr = int_or_none(media.get('bitrate'))
acodec = media.get('encoding')
service = media.get('service')
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
'vcodec': 'none',
})
formats.extend(conn_formats)
return formats
def _get_subtitles(self, media, programme_id): def _get_subtitles(self, media, programme_id):
subtitles = {} subtitles = {}
for connection in self._extract_connections(media): for connection in self._extract_connections(media):
@ -298,87 +378,13 @@ class BBCCoUkIE(InfoExtractor):
def _process_media_selector(self, media_selection, programme_id): def _process_media_selector(self, media_selection, programme_id):
formats = [] formats = []
subtitles = None subtitles = None
urls = []
for media in self._extract_medias(media_selection): for media in self._extract_medias(media_selection):
kind = media.get('kind') kind = media.get('kind')
if kind in ('video', 'audio'): if kind == 'audio':
bitrate = int_or_none(media.get('bitrate')) formats.extend(self._extract_audio(media, programme_id))
encoding = media.get('encoding') elif kind == 'video':
service = media.get('service') formats.extend(self._extract_video(media, programme_id))
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
href = connection.get('href')
if href in urls:
continue
if href:
urls.append(href)
conn_kind = connection.get('kind')
protocol = connection.get('protocol')
supplier = connection.get('supplier')
transfer_format = connection.get('transferFormat')
format_id = supplier or conn_kind or protocol
if service:
format_id = '%s_%s' % (service, format_id)
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, format_id),
})
elif transfer_format == 'dash':
formats.extend(self._extract_mpd_formats(
href, programme_id, mpd_id=format_id, fatal=False))
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False))
else:
if not service and not supplier and bitrate:
format_id += '-%d' % bitrate
fmt = {
'format_id': format_id,
'filesize': file_size,
}
if kind == 'video':
fmt.update({
'width': width,
'height': height,
'vbr': bitrate,
'vcodec': encoding,
})
else:
fmt.update({
'abr': bitrate,
'acodec': encoding,
'vcodec': 'none',
})
if protocol == 'http':
# Direct link
fmt.update({
'url': href,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
fmt.update({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
})
formats.append(fmt)
elif kind == 'captions': elif kind == 'captions':
subtitles = self.extract_subtitles(media, programme_id) subtitles = self.extract_subtitles(media, programme_id)
return formats, subtitles return formats, subtitles
@ -583,7 +589,6 @@ class BBCIE(BBCCoUkIE):
'id': '150615_telabyad_kentin_cogu', 'id': '150615_telabyad_kentin_cogu',
'ext': 'mp4', 'ext': 'mp4',
'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde", 'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
'description': 'md5:33a4805a855c9baf7115fcbde57e7025',
'timestamp': 1434397334, 'timestamp': 1434397334,
'upload_date': '20150615', 'upload_date': '20150615',
}, },
@ -597,7 +602,6 @@ class BBCIE(BBCCoUkIE):
'id': '150619_video_honduras_militares_hospitales_corrupcion_aw', 'id': '150619_video_honduras_militares_hospitales_corrupcion_aw',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción', 'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción',
'description': 'md5:1525f17448c4ee262b64b8f0c9ce66c8',
'timestamp': 1434713142, 'timestamp': 1434713142,
'upload_date': '20150619', 'upload_date': '20150619',
}, },
@ -647,23 +651,6 @@ class BBCIE(BBCCoUkIE):
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
} }
}, {
# single video embedded with Morph
'url': 'http://www.bbc.co.uk/sport/live/olympics/36895975',
'info_dict': {
'id': 'p041vhd0',
'ext': 'mp4',
'title': "Nigeria v Japan - Men's First Round",
'description': 'Live coverage of the first round from Group B at the Amazonia Arena.',
'duration': 7980,
'uploader': 'BBC Sport',
'uploader_id': 'bbc_sport',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Georestricted to UK',
}, { }, {
# single video with playlist.sxml URL in playlist param # single video with playlist.sxml URL in playlist param
'url': 'http://www.bbc.com/sport/0/football/33653409', 'url': 'http://www.bbc.com/sport/0/football/33653409',
@ -711,9 +698,7 @@ class BBCIE(BBCCoUkIE):
@classmethod @classmethod
def suitable(cls, url): def suitable(cls, url):
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE) return False if BBCCoUkIE.suitable(url) or BBCCoUkArticleIE.suitable(url) else super(BBCIE, cls).suitable(url)
return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
else super(BBCIE, cls).suitable(url))
def _extract_from_media_meta(self, media_meta, video_id): def _extract_from_media_meta(self, media_meta, video_id):
# Direct links to media in media metadata (e.g. # Direct links to media in media metadata (e.g.
@ -761,7 +746,7 @@ class BBCIE(BBCCoUkIE):
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
json_ld_info = self._search_json_ld(webpage, playlist_id, default={}) json_ld_info = self._search_json_ld(webpage, playlist_id, default=None)
timestamp = json_ld_info.get('timestamp') timestamp = json_ld_info.get('timestamp')
playlist_title = json_ld_info.get('title') playlist_title = json_ld_info.get('title')
@ -830,29 +815,8 @@ class BBCIE(BBCCoUkIE):
# http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani) # http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani)
playlist = data_playable.get('otherSettings', {}).get('playlist', {}) playlist = data_playable.get('otherSettings', {}).get('playlist', {})
if playlist: if playlist:
entry = None entries.append(self._extract_from_playlist_sxml(
for key in ('streaming', 'progressiveDownload'): playlist.get('progressiveDownloadUrl'), playlist_id, timestamp))
playlist_url = playlist.get('%sUrl' % key)
if not playlist_url:
continue
try:
info = self._extract_from_playlist_sxml(
playlist_url, playlist_id, timestamp)
if not entry:
entry = info
else:
entry['title'] = info['title']
entry['formats'].extend(info['formats'])
except Exception as e:
# Some playlist URL may fail with 500, at the same time
# the other one may work fine (e.g.
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
continue
raise
if entry:
self._sort_formats(entry['formats'])
entries.append(entry)
if entries: if entries:
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description) return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
@ -885,50 +849,6 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles, 'subtitles': subtitles,
} }
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
# There are several setPayload calls may be present but the video
# seems to be always related to the first one
morph_payload = self._parse_json(
self._search_regex(
r'Morph\.setPayload\([^,]+,\s*({.+?})\);',
webpage, 'morph payload', default='{}'),
playlist_id, fatal=False)
if morph_payload:
components = try_get(morph_payload, lambda x: x['body']['components'], list) or []
for component in components:
if not isinstance(component, dict):
continue
lead_media = try_get(component, lambda x: x['props']['leadMedia'], dict)
if not lead_media:
continue
identifiers = lead_media.get('identifiers')
if not identifiers or not isinstance(identifiers, dict):
continue
programme_id = identifiers.get('vpid') or identifiers.get('playablePid')
if not programme_id:
continue
title = lead_media.get('title') or self._og_search_title(webpage)
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
description = lead_media.get('summary')
uploader = lead_media.get('masterBrand')
uploader_id = lead_media.get('mid')
duration = None
duration_d = lead_media.get('duration')
if isinstance(duration_d, dict):
duration = parse_duration(dict_get(
duration_d, ('rawDuration', 'formattedDuration', 'spokenDuration')))
return {
'id': programme_id,
'title': title,
'description': description,
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats,
'subtitles': subtitles,
}
def extract_all(pattern): def extract_all(pattern):
return list(filter(None, map( return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False), lambda s: self._parse_json(s, playlist_id, fatal=False),
@ -946,7 +866,7 @@ class BBCIE(BBCCoUkIE):
r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage)) r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage))
if entries: if entries:
return self.playlist_result( return self.playlist_result(
[self.url_result(entry_, 'BBCCoUk') for entry_ in entries], [self.url_result(entry, 'BBCCoUk') for entry in entries],
playlist_id, playlist_title, playlist_description) playlist_id, playlist_title, playlist_description)
# Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511) # Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511)
@ -1028,7 +948,7 @@ class BBCIE(BBCCoUkIE):
class BBCCoUkArticleIE(InfoExtractor): class BBCCoUkArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)' _VALID_URL = r'https?://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
IE_NAME = 'bbc.co.uk:article' IE_NAME = 'bbc.co.uk:article'
IE_DESC = 'BBC articles' IE_DESC = 'BBC articles'
@ -1055,116 +975,3 @@ class BBCCoUkArticleIE(InfoExtractor):
r'<div[^>]+typeof="Clip"[^>]+resource="([^"]+)"', webpage)] r'<div[^>]+typeof="Clip"[^>]+resource="([^"]+)"', webpage)]
return self.playlist_result(entries, playlist_id, title, description) return self.playlist_result(entries, playlist_id, title, description)
class BBCCoUkPlaylistBaseIE(InfoExtractor):
def _entries(self, webpage, url, playlist_id):
single_page = 'page' in compat_urlparse.parse_qs(
compat_urlparse.urlparse(url).query)
for page_num in itertools.count(2):
for video_id in re.findall(
self._VIDEO_ID_TEMPLATE % BBCCoUkIE._ID_REGEX, webpage):
yield self.url_result(
self._URL_TEMPLATE % video_id, BBCCoUkIE.ie_key())
if single_page:
return
next_page = self._search_regex(
r'<li[^>]+class=(["\'])pagination_+next\1[^>]*><a[^>]+href=(["\'])(?P<url>(?:(?!\2).)+)\2',
webpage, 'next page url', default=None, group='url')
if not next_page:
break
webpage = self._download_webpage(
compat_urlparse.urljoin(url, next_page), playlist_id,
'Downloading page %d' % page_num, page_num)
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
title, description = self._extract_title_and_description(webpage)
return self.playlist_result(
self._entries(webpage, url, playlist_id),
playlist_id, title, description)
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
_TESTS = [{
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance',
'description': 'French thriller serial about a missing teenager.',
},
'playlist_mincount': 6,
'skip': 'This programme is not currently available on BBC iPlayer',
}, {
# Available for over a year unlike 30 days for most other programmes
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
'info_dict': {
'id': 'p02tcc32',
'title': 'Bohemian Icons',
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
},
'playlist_mincount': 10,
}]
def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
description = self._search_regex(
r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
webpage, 'description', fatal=False, group='value')
return title, description
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/(?P<id>%s)/(?:episodes|broadcasts|clips)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/programmes/%s'
_VIDEO_ID_TEMPLATE = r'data-pid=["\'](%s)'
_TESTS = [{
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/clips',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance - Clips - BBC Four',
'description': 'French thriller serial about a missing teenager.',
},
'playlist_mincount': 7,
}, {
# multipage playlist, explicit page
'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips?page=1',
'info_dict': {
'id': 'b00mfl7n',
'title': 'Frozen Planet - Clips - BBC One',
'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
},
'playlist_mincount': 24,
}, {
# multipage playlist, all pages
'url': 'http://www.bbc.co.uk/programmes/b00mfl7n/clips',
'info_dict': {
'id': 'b00mfl7n',
'title': 'Frozen Planet - Clips - BBC One',
'description': 'md5:65dcbf591ae628dafe32aa6c4a4a0d8c',
},
'playlist_mincount': 142,
}, {
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/broadcasts/2016/06',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/programmes/b05rcz9v/clips',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/programmes/b055jkys/episodes/player',
'only_matching': True,
}]
def _extract_title_and_description(self, webpage):
title = self._og_search_title(webpage, fatal=False)
description = self._og_search_description(webpage)
return title, description
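
Note on the `(?!/(?:episodes|broadcasts|clips))` lookahead in the left-hand column: it appears intended to keep the single-programme extractor from claiming URLs that the playlist extractor handles. The sketch below shows that interaction with two heavily simplified patterns. `ID_REGEX` is an assumed stand-in, because the real `_ID_REGEX` value is not visible in this diff, and both patterns omit most of the real `_VALID_URL` alternatives.

```python
import re

# Assumption: simplified stand-in for BBCCoUkIE._ID_REGEX (not shown in the diff).
ID_REGEX = r'[pb][\da-z]{7}'

# Simplified single-programme pattern with the negative lookahead from the diff.
EPISODE_RE = (
    r'https?://(?:www\.)?bbc\.co\.uk/programmes/(?P<id>%s)'
    r'(?!/(?:episodes|broadcasts|clips))' % ID_REGEX)

# Playlist pattern as it appears in BBCCoUkPlaylistIE.
PLAYLIST_RE = (
    r'https?://(?:www\.)?bbc\.co\.uk/programmes/(?P<id>%s)'
    r'/(?:episodes|broadcasts|clips)' % ID_REGEX)


def classify(url):
    """Return 'playlist', 'episode' or None for a programmes URL."""
    if re.match(PLAYLIST_RE, url):
        return 'playlist'
    if re.match(EPISODE_RE, url):
        return 'episode'
    return None


for url in (
        'http://www.bbc.co.uk/programmes/p028bfkf/player',
        'http://www.bbc.co.uk/programmes/b05rcz9v/clips',
        'http://www.bbc.co.uk/programmes/b055jkys/episodes/player'):
    print(url, '->', classify(url))
```

Because the identifier has a fixed length in this stand-in, the lookahead cannot be sidestepped by backtracking into a shorter ID.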


@ -8,10 +8,10 @@ from ..compat import compat_str
from ..utils import int_or_none from ..utils import int_or_none
class BeatportIE(InfoExtractor): class BeatportProIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|pro\.)?beatport\.com/track/(?P<display_id>[^/]+)/(?P<id>[0-9]+)' _VALID_URL = r'https?://pro\.beatport\.com/track/(?P<display_id>[^/]+)/(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://beatport.com/track/synesthesia-original-mix/5379371', 'url': 'https://pro.beatport.com/track/synesthesia-original-mix/5379371',
'md5': 'b3c34d8639a2f6a7f734382358478887', 'md5': 'b3c34d8639a2f6a7f734382358478887',
'info_dict': { 'info_dict': {
'id': '5379371', 'id': '5379371',
@ -20,7 +20,7 @@ class BeatportIE(InfoExtractor):
'title': 'Froxic - Synesthesia (Original Mix)', 'title': 'Froxic - Synesthesia (Original Mix)',
}, },
}, { }, {
'url': 'https://beatport.com/track/love-and-war-original-mix/3756896', 'url': 'https://pro.beatport.com/track/love-and-war-original-mix/3756896',
'md5': 'e44c3025dfa38c6577fbaeb43da43514', 'md5': 'e44c3025dfa38c6577fbaeb43da43514',
'info_dict': { 'info_dict': {
'id': '3756896', 'id': '3756896',
@ -29,7 +29,7 @@ class BeatportIE(InfoExtractor):
'title': 'Wolfgang Gartner - Love & War (Original Mix)', 'title': 'Wolfgang Gartner - Love & War (Original Mix)',
}, },
}, { }, {
'url': 'https://beatport.com/track/birds-original-mix/4991738', 'url': 'https://pro.beatport.com/track/birds-original-mix/4991738',
'md5': 'a1fd8e8046de3950fd039304c186c05f', 'md5': 'a1fd8e8046de3950fd039304c186c05f',
'info_dict': { 'info_dict': {
'id': '4991738', 'id': '4991738',

View File

@ -46,19 +46,19 @@ class BeegIE(InfoExtractor):
self._proto_relative_url(cpl_url), video_id, self._proto_relative_url(cpl_url), video_id,
'Downloading cpl JS', fatal=False) 'Downloading cpl JS', fatal=False)
if cpl: if cpl:
beeg_version = int_or_none(self._search_regex( beeg_version = self._search_regex(
r'beeg_version\s*=\s*([^\b]+)', cpl, r'beeg_version\s*=\s*(\d+)', cpl,
'beeg version', default=None)) or self._search_regex( 'beeg version', default=None) or self._search_regex(
r'/(\d+)\.js', cpl_url, 'beeg version', default=None) r'/(\d+)\.js', cpl_url, 'beeg version', default=None)
beeg_salt = self._search_regex( beeg_salt = self._search_regex(
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg salt', r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg beeg_salt',
default=None, group='beeg_salt') default=None, group='beeg_salt')
beeg_version = beeg_version or '2000' beeg_version = beeg_version or '1750'
beeg_salt = beeg_salt or 'pmweAkq8lAYKdfWcFCUj0yoVgoPlinamH5UE1CB3H' beeg_salt = beeg_salt or 'MIDtGaw96f0N1kMMAM1DE46EC9pmFr'
video = self._download_json( video = self._download_json(
'https://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id), 'http://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id),
video_id) video_id)
def split(o, e): def split(o, e):

View File

@ -1,75 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class BellMediaIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?
(?P<domain>
(?:
ctv|
tsn|
bnn|
thecomedynetwork|
discovery|
discoveryvelocity|
sciencechannel|
investigationdiscovery|
animalplanet|
bravo|
mtv|
space
)\.ca|
much\.com
)/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})'''
_TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
'info_dict': {
'id': '706966',
'ext': 'mp4',
'title': 'Larry Day and Richard Jutras on the TIFF red carpet of \'Stonewall\'',
'description': 'etalk catches up with Larry Day and Richard Jutras on the TIFF red carpet of "Stonewall”.',
'upload_date': '20150919',
'timestamp': 1442624700,
},
'expected_warnings': ['HTTP Error 404'],
}, {
'url': 'http://www.thecomedynetwork.ca/video/player?vid=923582',
'only_matching': True,
}, {
'url': 'http://www.tsn.ca/video/expectations-high-for-milos-raonic-at-us-open~939549',
'only_matching': True,
}, {
'url': 'http://www.bnn.ca/video/berman-s-call-part-two-viewer-questions~939654',
'only_matching': True,
}, {
'url': 'http://www.ctv.ca/YourMorning/Video/S1E6-Monday-August-29-2016-vid938009',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/atmidnight/episode948007/tuesday-september-13-2016',
'only_matching': True,
}, {
'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',
'discoveryvelocity': 'discvel',
'sciencechannel': 'discsci',
'investigationdiscovery': 'invdisc',
'animalplanet': 'aniplan',
}
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
domain = domain.split('.')[0]
return {
'_type': 'url_transparent',
'id': video_id,
'url': '9c9media:%s_web:%s' % (self._DOMAINS.get(domain, domain), video_id),
'ie_key': 'NineCNineMedia',
}
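
Note on `BellMediaIE._real_extract`: it does no downloading itself and only rewrites the page URL into a `9c9media:` pseudo-URL for another extractor. The sketch below isolates that rewrite using the pattern and domain table from the diff; the function name is hypothetical and the surrounding `url_transparent` result dict is left out.

```python
import re

# Pattern copied from BellMediaIE._VALID_URL in the diff.
VALID_URL = r'''(?x)https?://(?:www\.)?
    (?P<domain>
        (?:
            ctv|
            tsn|
            bnn|
            thecomedynetwork|
            discovery|
            discoveryvelocity|
            sciencechannel|
            investigationdiscovery|
            animalplanet|
            bravo|
            mtv|
            space
        )\.ca|
        much\.com
    )/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})'''

# Table copied from BellMediaIE._DOMAINS: site name -> short code.
DOMAINS = {
    'thecomedynetwork': 'comedy',
    'discoveryvelocity': 'discvel',
    'sciencechannel': 'discsci',
    'investigationdiscovery': 'invdisc',
    'animalplanet': 'aniplan',
}


def to_9c9media_url(url):
    """Build the 9c9media pseudo-URL, or return None if the URL does not match.

    Hypothetical helper name; the logic mirrors _real_extract in the diff.
    """
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    domain, video_id = mobj.groups()
    domain = domain.split('.')[0]
    # Sites without a table entry use their own name as the code.
    return '9c9media:%s_web:%s' % (DOMAINS.get(domain, domain), video_id)


print(to_9c9media_url('http://www.thecomedynetwork.ca/video/player?vid=923582'))
print(to_9c9media_url('http://www.much.com/shows/atmidnight/episode948007/tuesday-september-13-2016'))
```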


@ -1,26 +1,31 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor from .common import InfoExtractor
from ..utils import unified_strdate from ..compat import compat_urllib_parse_unquote
from ..utils import (
xpath_text,
xpath_with_ns,
int_or_none,
parse_iso8601,
)
class BetIE(MTVServicesInfoExtractor): class BetIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bet\.com/(?:[^/]+/)+(?P<id>.+?)\.html' _VALID_URL = r'https?://(?:www\.)?bet\.com/(?:[^/]+/)+(?P<id>.+?)\.html'
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.bet.com/news/politics/2014/12/08/in-bet-exclusive-obama-talks-race-and-racism.html', 'url': 'http://www.bet.com/news/politics/2014/12/08/in-bet-exclusive-obama-talks-race-and-racism.html',
'info_dict': { 'info_dict': {
'id': '07e96bd3-8850-3051-b856-271b457f0ab8', 'id': 'news/national/2014/a-conversation-with-president-obama',
'display_id': 'in-bet-exclusive-obama-talks-race-and-racism', 'display_id': 'in-bet-exclusive-obama-talks-race-and-racism',
'ext': 'flv', 'ext': 'flv',
'title': 'A Conversation With President Obama', 'title': 'A Conversation With President Obama',
'description': 'President Obama urges persistence in confronting racism and bias.', 'description': 'md5:699d0652a350cf3e491cd15cc745b5da',
'duration': 1534, 'duration': 1534,
'timestamp': 1418075340,
'upload_date': '20141208', 'upload_date': '20141208',
'uploader': 'admin',
'thumbnail': 're:(?i)^https?://.*\.jpg$', 'thumbnail': 're:(?i)^https?://.*\.jpg$',
'subtitles': {
'en': 'mincount:2',
}
}, },
'params': { 'params': {
# rtmp download # rtmp download
@ -30,17 +35,16 @@ class BetIE(MTVServicesInfoExtractor):
{ {
'url': 'http://www.bet.com/video/news/national/2014/justice-for-ferguson-a-community-reacts.html', 'url': 'http://www.bet.com/video/news/national/2014/justice-for-ferguson-a-community-reacts.html',
'info_dict': { 'info_dict': {
'id': '9f516bf1-7543-39c4-8076-dd441b459ba9', 'id': 'news/national/2014/justice-for-ferguson-a-community-reacts',
'display_id': 'justice-for-ferguson-a-community-reacts', 'display_id': 'justice-for-ferguson-a-community-reacts',
'ext': 'flv', 'ext': 'flv',
'title': 'Justice for Ferguson: A Community Reacts', 'title': 'Justice for Ferguson: A Community Reacts',
'description': 'A BET News special.', 'description': 'A BET News special.',
'duration': 1696, 'duration': 1696,
'timestamp': 1416942360,
'upload_date': '20141125', 'upload_date': '20141125',
'uploader': 'admin',
'thumbnail': 're:(?i)^https?://.*\.jpg$', 'thumbnail': 're:(?i)^https?://.*\.jpg$',
'subtitles': {
'en': 'mincount:2',
}
}, },
'params': { 'params': {
# rtmp download # rtmp download
@ -49,32 +53,57 @@ class BetIE(MTVServicesInfoExtractor):
} }
] ]
_FEED_URL = "http://feeds.mtvnservices.com/od/feed/bet-mrss-player"
def _get_feed_query(self, uri):
return {
'uuid': uri,
}
def _extract_mgid(self, webpage):
return self._search_regex(r'data-uri="([^"]+)', webpage, 'mgid')
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
mgid = self._extract_mgid(webpage)
videos_info = self._get_videos_info(mgid)
info_dict = videos_info['entries'][0] media_url = compat_urllib_parse_unquote(self._search_regex(
[r'mediaURL\s*:\s*"([^"]+)"', r"var\s+mrssMediaUrl\s*=\s*'([^']+)'"],
webpage, 'media URL'))
upload_date = unified_strdate(self._html_search_meta('date', webpage)) video_id = self._search_regex(
description = self._html_search_meta('description', webpage) r'/video/(.*)/_jcr_content/', media_url, 'video id')
info_dict.update({ mrss = self._download_xml(media_url, display_id)
item = mrss.find('./channel/item')
NS_MAP = {
'dc': 'http://purl.org/dc/elements/1.1/',
'media': 'http://search.yahoo.com/mrss/',
'ka': 'http://kickapps.com/karss',
}
title = xpath_text(item, './title', 'title')
description = xpath_text(
item, './description', 'description', fatal=False)
timestamp = parse_iso8601(xpath_text(
item, xpath_with_ns('./dc:date', NS_MAP),
'upload date', fatal=False))
uploader = xpath_text(
item, xpath_with_ns('./dc:creator', NS_MAP),
'uploader', fatal=False)
media_content = item.find(
xpath_with_ns('./media:content', NS_MAP))
duration = int_or_none(media_content.get('duration'))
smil_url = media_content.get('url')
thumbnail = media_content.find(
xpath_with_ns('./media:thumbnail', NS_MAP)).get('url')
formats = self._extract_smil_formats(smil_url, display_id)
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': title,
'description': description, 'description': description,
'upload_date': upload_date, 'thumbnail': thumbnail,
}) 'timestamp': timestamp,
'uploader': uploader,
return info_dict 'duration': duration,
'formats': formats,
}
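
Note on the MRSS parsing in the right-hand column of `BetIE._real_extract`: fields are read from namespaced elements through `xpath_with_ns` and `NS_MAP`. The sketch below shows the same idea using only `xml.etree.ElementTree`. `with_ns` is a simplified stand-in written for this example rather than the youtube-dl helper itself, and the feed is a made-up fragment, not real BET data.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes as declared in the diff's NS_MAP (subset).
NS_MAP = {
    'dc': 'http://purl.org/dc/elements/1.1/',
    'media': 'http://search.yahoo.com/mrss/',
}


def with_ns(path, ns_map):
    """Expand 'prefix:tag' steps into ElementTree's '{uri}tag' form.

    Simplified stand-in for youtube-dl's xpath_with_ns (an assumption).
    """
    parts = []
    for step in path.split('/'):
        prefix, sep, tag = step.partition(':')
        parts.append('{%s}%s' % (ns_map[prefix], tag) if sep else step)
    return '/'.join(parts)


# Made-up MRSS fragment, for illustration only.
SAMPLE = '''<rss xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:media="http://search.yahoo.com/mrss/">
  <channel><item>
    <title>Example clip</title>
    <dc:creator>admin</dc:creator>
    <media:content duration="1534" url="http://example.com/clip.smil">
      <media:thumbnail url="http://example.com/clip.jpg"/>
    </media:content>
  </item></channel>
</rss>'''

item = ET.fromstring(SAMPLE).find('./channel/item')
content = item.find(with_ns('./media:content', NS_MAP))
info = {
    'title': item.findtext('./title'),
    'uploader': item.findtext(with_ns('./dc:creator', NS_MAP)),
    'duration': int(content.get('duration')),
    'smil_url': content.get('url'),
    'thumbnail': content.find(with_ns('./media:thumbnail', NS_MAP)).get('url'),
}
print(info)
```

Unlike the extractor code, this sketch does not guard against a missing `media:content` or `media:thumbnail` element; real feeds may need those checks.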

View File

@ -11,13 +11,22 @@ from ..compat import compat_urllib_parse_unquote
class BigflixIE(InfoExtractor): class BigflixIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bigflix\.com/.+/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?bigflix\.com/.+/(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.bigflix.com/Hindi-movies/Action-movies/Singham-Returns/16537',
'md5': 'ec76aa9b1129e2e5b301a474e54fab74',
'info_dict': {
'id': '16537',
'ext': 'mp4',
'title': 'Singham Returns',
'description': 'md5:3d2ba5815f14911d5cc6a501ae0cf65d',
}
}, {
# 2 formats # 2 formats
'url': 'http://www.bigflix.com/Tamil-movies/Drama-movies/Madarasapatinam/16070', 'url': 'http://www.bigflix.com/Tamil-movies/Drama-movies/Madarasapatinam/16070',
'info_dict': { 'info_dict': {
'id': '16070', 'id': '16070',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Madarasapatinam', 'title': 'Madarasapatinam',
'description': 'md5:9f0470b26a4ba8e824c823b5d95c2f6b', 'description': 'md5:63b9b8ed79189c6f0418c26d9a3452ca',
'formats': 'mincount:2', 'formats': 'mincount:2',
}, },
'params': { 'params': {

View File

@ -1,101 +1,188 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import hashlib import calendar
import datetime
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_parse_qs from ..compat import (
compat_etree_fromstring,
compat_str,
compat_parse_qs,
compat_xml_parse_error,
)
from ..utils import ( from ..utils import (
ExtractorError,
int_or_none, int_or_none,
float_or_none, float_or_none,
unified_timestamp, xpath_text,
urlencode_postdata,
) )
class BiliBiliIE(InfoExtractor): class BiliBiliIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/v/)(?P<id>\d+)' _VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/', 'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e', 'md5': '5f7d29e1a2872f3df0cf76b1f87d3788',
'info_dict': { 'info_dict': {
'id': '1074402', 'id': '1554319',
'ext': 'mp4', 'ext': 'flv',
'title': '【金坷垃】金泡沫', 'title': '【金坷垃】金泡沫',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923', 'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'duration': 308.315, 'duration': 308.067,
'timestamp': 1398012660, 'timestamp': 1398012660,
'upload_date': '20140420', 'upload_date': '20140420',
'thumbnail': 're:^https?://.+\.jpg', 'thumbnail': 're:^https?://.+\.jpg',
'uploader': '菊子桑', 'uploader': '菊子桑',
'uploader_id': '156160', 'uploader_id': '156160',
}, },
} }, {
'url': 'http://www.bilibili.com/video/av1041170/',
'info_dict': {
'id': '1041170',
'title': '【BD1080P】刀语【诸神&异域】',
'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
},
'playlist_count': 9,
}, {
'url': 'http://www.bilibili.com/video/av4808130/',
'info_dict': {
'id': '4808130',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
},
'playlist': [{
'md5': '55cdadedf3254caaa0d5d27cf20a8f9c',
'info_dict': {
'id': '4808130_part1',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '926f9f67d0c482091872fbd8eca7ea3d',
'info_dict': {
'id': '4808130_part2',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '4b7b225b968402d7c32348c646f1fd83',
'info_dict': {
'id': '4808130_part3',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '7b795e214166501e9141139eea236e91',
'info_dict': {
'id': '4808130_part4',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}],
}]
_APP_KEY = '6f90a59ac58a4123' # BiliBili blocks keys from time to time. The current key is extracted from
_BILIBILI_KEY = '0bfd84cc3940035173f35e6777508326' # the Android client
# TODO: find the sign algorithm used in the flash player
_APP_KEY = '86385cdc024c0f6c'
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
if 'anime/v' not in url: params = compat_parse_qs(self._search_regex(
cid = compat_parse_qs(self._search_regex( [r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)', r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'], webpage, 'player parameters'))
webpage, 'player parameters'))['cid'][0] cid = params['cid'][0]
info_xml_str = self._download_webpage(
'http://interface.bilibili.com/v_cdn_play',
cid, query={'appkey': self._APP_KEY, 'cid': cid},
note='Downloading video info page')
err_msg = None
durls = None
info_xml = None
try:
info_xml = compat_etree_fromstring(info_xml_str.encode('utf-8'))
except compat_xml_parse_error:
info_json = self._parse_json(info_xml_str, video_id, fatal=False)
err_msg = (info_json or {}).get('error_text')
else: else:
js = self._download_json( err_msg = xpath_text(info_xml, './message')
'http://bangumi.bilibili.com/web_api/get_source', video_id,
data=urlencode_postdata({'episode_id': video_id}),
headers={'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
cid = js['result']['cid']
payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid) if info_xml is not None:
sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest() durls = info_xml.findall('./durl')
if not durls:
video_info = self._download_json( if err_msg:
'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign), raise ExtractorError('%s said: %s' % (self.IE_NAME, err_msg), expected=True)
video_id, note='Downloading video info page') else:
raise ExtractorError('No videos found!')
entries = [] entries = []
for idx, durl in enumerate(video_info['durl']): for durl in durls:
size = xpath_text(durl, ['./filesize', './size'])
formats = [{ formats = [{
'url': durl['url'], 'url': durl.find('./url').text,
'filesize': int_or_none(durl['size']), 'filesize': int_or_none(size),
}] }]
for backup_url in durl.get('backup_url', []): for backup_url in durl.findall('./backup_url/url'):
formats.append({ formats.append({
'url': backup_url, 'url': backup_url.text,
# backup URLs have lower priorities # backup URLs have lower priorities
'preference': -2 if 'hd.mp4' in backup_url else -3, 'preference': -2 if 'hd.mp4' in backup_url.text else -3,
}) })
self._sort_formats(formats) self._sort_formats(formats)
entries.append({ entries.append({
'id': '%s_part%s' % (video_id, idx), 'id': '%s_part%s' % (cid, xpath_text(durl, './order')),
'duration': float_or_none(durl.get('length'), 1000), 'duration': int_or_none(xpath_text(durl, './length'), 1000),
'formats': formats, 'formats': formats,
}) })
title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title') title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title')
description = self._html_search_meta('description', webpage) description = self._html_search_meta('description', webpage)
timestamp = unified_timestamp(self._html_search_regex( datetime_str = self._html_search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False)) r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False)
thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage) if datetime_str:
timestamp = calendar.timegm(datetime.datetime.strptime(datetime_str, '%Y-%m-%dT%H:%M').timetuple())
# TODO 'view_count' requires deobfuscating Javascript # TODO 'view_count' requires deobfuscating Javascript
info = { info = {
'id': video_id, 'id': compat_str(cid),
'title': title, 'title': title,
'description': description, 'description': description,
'timestamp': timestamp, 'timestamp': timestamp,
'thumbnail': thumbnail, 'thumbnail': self._html_search_meta('thumbnailUrl', webpage),
'duration': float_or_none(video_info.get('timelength'), scale=1000), 'duration': float_or_none(xpath_text(info_xml, './timelength'), scale=1000),
} }
uploader_mobj = re.search( uploader_mobj = re.search(

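A note on the request signing visible in the hunk above: one side of the diff builds a fixed-order query string, appends a secret key, and sends the MD5 hex digest along as `sign`. The following is a minimal standalone sketch of that step only, assuming the parameter order shown in the diff; `build_signed_playurl` and the demo credentials are hypothetical names for illustration, not part of the extractor.

```python
import hashlib
from urllib.parse import parse_qs, urlparse


def build_signed_playurl(app_key, secret, cid):
    """Return a playurl request URL signed the way the hunk above does it."""
    # The digest is computed over this exact string, so parameter order matters.
    payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (app_key, cid)
    # Signature: MD5 hex digest of the payload with the secret appended.
    sign = hashlib.md5((payload + secret).encode('utf-8')).hexdigest()
    return 'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign)


# Placeholder credentials and cid, invented for illustration.
url = build_signed_playurl('demo-app-key', 'demo-secret', '12345')
query = parse_qs(urlparse(url).query)
```

Because the signature covers the literal payload string, reordering or re-encoding the parameters after signing would presumably invalidate it.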
View File

@ -2,15 +2,11 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import remove_end
ExtractorError,
remove_end,
)
from .rudo import RudoIE
class BioBioChileTVIE(InfoExtractor): class BioBioChileTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:tv|www)\.biobiochile\.cl/(?:notas|noticias)/(?:[^/]+/)+(?P<id>[^/]+)\.shtml' _VALID_URL = r'https?://tv\.biobiochile\.cl/notas/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
_TESTS = [{ _TESTS = [{
'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml', 'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml',
@ -22,7 +18,6 @@ class BioBioChileTVIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Fernando Atria', 'uploader': 'Fernando Atria',
}, },
'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
}, { }, {
# different uploader layout # different uploader layout
'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml', 'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml',
@ -37,16 +32,6 @@ class BioBioChileTVIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
}, {
'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
'info_dict': {
'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
'ext': 'mp4',
'uploader': '(none)',
'upload_date': '20160708',
'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
},
}, { }, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml', 'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
'only_matching': True, 'only_matching': True,
@ -60,22 +45,42 @@ class BioBioChileTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
rudo_url = RudoIE._extract_url(webpage)
if not rudo_url:
raise ExtractorError('No videos found')
title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV') title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')
file_url = self._search_regex(
r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1',
webpage, 'file url', group='url')
base_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage,
'base url', default='http://unlimited2-cl.digitalproserver.com/bbtv/',
group='url')
formats = self._extract_m3u8_formats(
'%s%s/playlist.m3u8' % (base_url, file_url), video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
f = {
'url': '%s%s' % (base_url, file_url),
'format_id': 'http',
'protocol': 'http',
'preference': 1,
}
if formats:
f_copy = formats[-1].copy()
f_copy.update(f)
f = f_copy
formats.append(f)
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>', r'<a[^>]+href=["\']https?://busca\.biobiochile\.cl/author[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False) webpage, 'uploader', fatal=False)
return { return {
'_type': 'url_transparent',
'url': rudo_url,
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'uploader': uploader, 'uploader': uploader,
'formats': formats,
} }
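
The URL construction on the side of this diff that does not delegate to Rudo can be sketched in isolation. This is a rough illustration using the two regular expressions and the default base URL from the hunk above; `candidate_urls` and the sample page fragment are hypothetical, and the real extractor goes on to probe the playlist through `_extract_m3u8_formats`.

```python
import re

DEFAULT_BASE_URL = 'http://unlimited2-cl.digitalproserver.com/bbtv/'


def candidate_urls(webpage):
    """Return (hls_playlist_url, direct_http_url) for a page, or None."""
    file_match = re.search(
        r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1', webpage)
    if not file_match:
        return None
    # The base URL is optional on the page; fall back to the default.
    base_match = re.search(
        r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage)
    base_url = base_match.group('url') if base_match else DEFAULT_BASE_URL
    file_url = file_match.group('url')
    return ('%s%s/playlist.m3u8' % (base_url, file_url),
            '%s%s' % (base_url, file_url))


# Hypothetical page fragment, invented for illustration.
sample = "loadFWPlayerVideo('player', 'notas/2015/clip.mp4');"
hls_url, http_url = candidate_urls(sample)
```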

View File

@ -24,8 +24,7 @@ class BIQLEIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ребенок в шоке от автоматической мойки', 'title': 'Ребенок в шоке от автоматической мойки',
'uploader': 'Dmitry Kotov', 'uploader': 'Dmitry Kotov',
}, }
'skip': ' This video was marked as adult. Embedding adult videos on external sites is prohibited.',
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -1,4 +1,3 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -21,18 +20,6 @@ class BloombergIE(InfoExtractor):
'params': { 'params': {
'format': 'best[format_id^=hds]', 'format': 'best[format_id^=hds]',
}, },
}, {
# video ID in BPlayer(...)
'url': 'http://www.bloomberg.com/features/2016-hello-world-new-zealand/',
'info_dict': {
'id': '938c7e72-3f25-4ddb-8b85-a9be731baa74',
'ext': 'flv',
'title': 'Meet the Real-Life Tech Wizards of Middle Earth',
'description': 'Hello World, Episode 1: New Zealands freaky AI babies, robot exoskeletons, and a virtual you.',
},
'params': {
'format': 'best[format_id^=hds]',
},
}, { }, {
'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets', 'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
'only_matching': True, 'only_matching': True,
@ -46,11 +33,7 @@ class BloombergIE(InfoExtractor):
webpage = self._download_webpage(url, name) webpage = self._download_webpage(url, name)
video_id = self._search_regex( video_id = self._search_regex(
r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1', r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1',
webpage, 'id', group='url', default=None) webpage, 'id', group='url')
if not video_id:
bplayer_data = self._parse_json(self._search_regex(
r'BPlayer\(null,\s*({[^;]+})\);', webpage, 'id'), name)
video_id = bplayer_data['id']
title = re.sub(': Video$', '', self._og_search_title(webpage)) title = re.sub(': Video$', '', self._og_search_title(webpage))
embed_info = self._download_json( embed_info = self._download_json(

View File

@ -12,7 +12,7 @@ from ..utils import (
class BpbIE(InfoExtractor): class BpbIE(InfoExtractor):
IE_DESC = 'Bundeszentrale für politische Bildung' IE_DESC = 'Bundeszentrale für politische Bildung'
_VALID_URL = r'https?://(?:www\.)?bpb\.de/mediathek/(?P<id>[0-9]+)/' _VALID_URL = r'https?://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
_TEST = { _TEST = {
'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr', 'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',

View File

@ -29,8 +29,7 @@ class BRIE(InfoExtractor):
'duration': 180, 'duration': 180,
'uploader': 'Reinhard Weber', 'uploader': 'Reinhard Weber',
'upload_date': '20150422', 'upload_date': '20150422',
}, }
'skip': '404 not found',
}, },
{ {
'url': 'http://www.br.de/nachrichten/oberbayern/inhalt/muenchner-polizeipraesident-schreiber-gestorben-100.html', 'url': 'http://www.br.de/nachrichten/oberbayern/inhalt/muenchner-polizeipraesident-schreiber-gestorben-100.html',
@ -41,8 +40,7 @@ class BRIE(InfoExtractor):
'title': 'Manfred Schreiber ist tot', 'title': 'Manfred Schreiber ist tot',
'description': 'md5:b454d867f2a9fc524ebe88c3f5092d97', 'description': 'md5:b454d867f2a9fc524ebe88c3f5092d97',
'duration': 26, 'duration': 26,
}, }
'skip': '404 not found',
}, },
{ {
'url': 'https://www.br-klassik.de/audio/peeping-tom-premierenkritik-dance-festival-muenchen-100.html', 'url': 'https://www.br-klassik.de/audio/peeping-tom-premierenkritik-dance-festival-muenchen-100.html',
@ -53,8 +51,7 @@ class BRIE(InfoExtractor):
'title': 'Kurzweilig und sehr bewegend', 'title': 'Kurzweilig und sehr bewegend',
'description': 'md5:0351996e3283d64adeb38ede91fac54e', 'description': 'md5:0351996e3283d64adeb38ede91fac54e',
'duration': 296, 'duration': 296,
}, }
'skip': '404 not found',
}, },
{ {
'url': 'http://www.br.de/radio/bayern1/service/team/videos/team-video-erdelt100.html', 'url': 'http://www.br.de/radio/bayern1/service/team/videos/team-video-erdelt100.html',

View File

@ -1,74 +1,31 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .adobepass import AdobePassIE from .common import InfoExtractor
from ..utils import ( from ..utils import smuggle_url
smuggle_url,
update_url_query,
int_or_none,
)
class BravoTVIE(AdobePassIE): class BravoTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+videos/(?P<id>[^/?]+)'
_TESTS = [{ _TEST = {
'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale', 'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale',
'md5': '9086d0b7ef0ea2aabc4781d75f4e5863', 'md5': 'd60cdf68904e854fac669bd26cccf801',
'info_dict': { 'info_dict': {
'id': 'zHyk1_HU_mPy', 'id': 'LitrBdX64qLn',
'ext': 'mp4', 'ext': 'mp4',
'title': 'LCK Ep 12: Fishy Finale', 'title': 'Last Chance Kitchen Returns',
'description': 'S13/E12: Two eliminated chefs have just 12 minutes to cook up a delicious fish dish.', 'description': 'S13: Last Chance Kitchen Returns for Top Chef Season 13',
'timestamp': 1448926740,
'upload_date': '20151130',
'uploader': 'NBCU-BRAV', 'uploader': 'NBCU-BRAV',
'upload_date': '20160302',
'timestamp': 1456945320,
} }
}, { }
'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, video_id)
settings = self._parse_json(self._search_regex( account_pid = self._search_regex(r'"account_pid"\s*:\s*"([^"]+)"', webpage, 'account pid')
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);', webpage, 'drupal settings'), release_pid = self._search_regex(r'"release_pid"\s*:\s*"([^"]+)"', webpage, 'release pid')
display_id) return self.url_result(smuggle_url(
info = {} 'http://link.theplatform.com/s/%s/%s?mbr=true&switch=progressive' % (account_pid, release_pid),
query = { {'force_smil_url': True}), 'ThePlatform', release_pid)
'mbr': 'true',
}
account_pid, release_pid = [None] * 2
tve = settings.get('sharedTVE')
if tve:
query['manifest'] = 'm3u'
account_pid = 'HNK2IC'
release_pid = tve['release_pid']
if tve.get('entitlement') == 'auth':
adobe_pass = settings.get('adobePass', {})
resource = self._get_mvpd_resource(
adobe_pass.get('adobePassResourceId', 'bravo'),
tve['title'], release_pid, tve.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, release_pid, adobe_pass.get('adobePassRequestorId', 'bravo'), resource)
else:
shared_playlist = settings['shared_playlist']
account_pid = shared_playlist['account_pid']
metadata = shared_playlist['video_metadata'][shared_playlist['default_clip']]
release_pid = metadata['release_pid']
info.update({
'title': metadata['title'],
'description': metadata.get('description'),
'season_number': int_or_none(metadata.get('season_num')),
'episode_number': int_or_none(metadata.get('episode_num')),
})
query['switch'] = 'progressive'
info.update({
'_type': 'url_transparent',
'id': release_pid,
'url': smuggle_url(update_url_query(
'http://link.theplatform.com/s/%s/%s' % (account_pid, release_pid),
query), {'force_smil_url': True}),
'ie_key': 'ThePlatform',
})
return info

View File

@ -1,4 +1,4 @@
# coding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -26,8 +26,6 @@ from ..utils import (
unescapeHTML, unescapeHTML,
unsmuggle_url, unsmuggle_url,
update_url_query, update_url_query,
clean_html,
mimetype2ext,
) )
@ -92,7 +90,6 @@ class BrightcoveLegacyIE(InfoExtractor):
'description': 'md5:363109c02998fee92ec02211bd8000df', 'description': 'md5:363109c02998fee92ec02211bd8000df',
'uploader': 'National Ballet of Canada', 'uploader': 'National Ballet of Canada',
}, },
'skip': 'Video gone',
}, },
{ {
# test flv videos served by akamaihd.net # test flv videos served by akamaihd.net
@ -111,7 +108,7 @@ class BrightcoveLegacyIE(InfoExtractor):
}, },
}, },
{ {
# playlist with 'videoList' # playlist test
# from http://support.brightcove.com/en/video-cloud/docs/playlist-support-single-video-players # from http://support.brightcove.com/en/video-cloud/docs/playlist-support-single-video-players
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=3550052898001&playerKey=AQ%7E%7E%2CAAABmA9XpXk%7E%2C-Kp7jNgisre1fG5OdqpAFUTcs0lP_ZoL', 'url': 'http://c.brightcove.com/services/viewer/htmlFederated?playerID=3550052898001&playerKey=AQ%7E%7E%2CAAABmA9XpXk%7E%2C-Kp7jNgisre1fG5OdqpAFUTcs0lP_ZoL',
'info_dict': { 'info_dict': {
@ -120,15 +117,6 @@ class BrightcoveLegacyIE(InfoExtractor):
}, },
'playlist_mincount': 7, 'playlist_mincount': 7,
}, },
{
# playlist with 'playlistTab' (https://github.com/rg3/youtube-dl/issues/9965)
'url': 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=AQ%7E%7E,AAABXlLMdok%7E,NJ4EoMlZ4rZdx9eU1rkMVd8EaYPBBUlg',
'info_dict': {
'id': '1522758701001',
'title': 'Lesson 08',
},
'playlist_mincount': 10,
},
] ]
FLV_VCODECS = { FLV_VCODECS = {
1: 'SORENSON', 1: 'SORENSON',
@ -310,19 +298,13 @@ class BrightcoveLegacyIE(InfoExtractor):
info_url, player_key, 'Downloading playlist information') info_url, player_key, 'Downloading playlist information')
json_data = json.loads(playlist_info) json_data = json.loads(playlist_info)
if 'videoList' in json_data: if 'videoList' not in json_data:
playlist_info = json_data['videoList']
playlist_dto = playlist_info['mediaCollectionDTO']
elif 'playlistTabs' in json_data:
playlist_info = json_data['playlistTabs']
playlist_dto = playlist_info['lineupListDTO']['playlistDTOs'][0]
else:
raise ExtractorError('Empty playlist') raise ExtractorError('Empty playlist')
playlist_info = json_data['videoList']
videos = [self._extract_video_info(video_info) for video_info in playlist_dto['videoDTOs']] videos = [self._extract_video_info(video_info) for video_info in playlist_info['mediaCollectionDTO']['videoDTOs']]
return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'], return self.playlist_result(videos, playlist_id='%s' % playlist_info['id'],
playlist_title=playlist_dto['displayName']) playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info): def _extract_video_info(self, video_info):
video_id = compat_str(video_info['id']) video_id = compat_str(video_info['id'])
@ -546,16 +528,14 @@ class BrightcoveNewIE(InfoExtractor):
formats = [] formats = []
for source in json_data.get('sources', []): for source in json_data.get('sources', []):
container = source.get('container') container = source.get('container')
ext = mimetype2ext(source.get('type')) source_type = source.get('type')
src = source.get('src') src = source.get('src')
if ext == 'ism': if source_type == 'application/x-mpegURL' or container == 'M2TS':
continue
elif ext == 'm3u8' or container == 'M2TS':
if not src: if not src:
continue continue
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)) src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
elif ext == 'mpd': elif source_type == 'application/dash+xml':
if not src: if not src:
continue continue
formats.extend(self._extract_mpd_formats(src, video_id, 'dash', fatal=False)) formats.extend(self._extract_mpd_formats(src, video_id, 'dash', fatal=False))
@ -571,7 +551,7 @@ class BrightcoveNewIE(InfoExtractor):
'tbr': tbr, 'tbr': tbr,
'filesize': int_or_none(source.get('size')), 'filesize': int_or_none(source.get('size')),
'container': container, 'container': container,
'ext': ext or container.lower(), 'ext': container.lower(),
} }
if width == 0 and height == 0: if width == 0 and height == 0:
f.update({ f.update({
@ -605,13 +585,6 @@ class BrightcoveNewIE(InfoExtractor):
'format_id': build_format_id('rtmp'), 'format_id': build_format_id('rtmp'),
}) })
formats.append(f) formats.append(f)
errors = json_data.get('errors')
if not formats and errors:
error = errors[0]
raise ExtractorError(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}
@ -621,21 +594,15 @@ class BrightcoveNewIE(InfoExtractor):
'url': text_track['src'], 'url': text_track['src'],
}) })
is_live = False
duration = float_or_none(json_data.get('duration'), 1000)
if duration and duration < 0:
is_live = True
return { return {
'id': video_id, 'id': video_id,
'title': self._live_title(title) if is_live else title, 'title': title,
'description': clean_html(json_data.get('description')), 'description': json_data.get('description'),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'), 'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': duration, 'duration': float_or_none(json_data.get('duration'), 1000),
'timestamp': parse_iso8601(json_data.get('published_at')), 'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id, 'uploader_id': account_id,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'tags': json_data.get('tags', []), 'tags': json_data.get('tags', []),
'is_live': is_live,
} }
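
The source dispatch in `BrightcoveNewIE` on the side of the diff that compares MIME types directly can be summarised as a small classifier. This is a simplified sketch: `classify_source` is a hypothetical helper, and it skips every source without `src`, whereas the extractor has further handling for its plain and RTMP sources that is only partly visible in this hunk.

```python
def classify_source(source):
    """Map a Brightcove-style source dict to 'hls', 'dash', 'http' or None."""
    source_type = source.get('type')
    container = source.get('container')
    if not source.get('src'):
        # Simplification: ignore sources that carry no direct URL.
        return None
    if source_type == 'application/x-mpegURL' or container == 'M2TS':
        return 'hls'
    if source_type == 'application/dash+xml':
        return 'dash'
    return 'http'


# Hypothetical source entries, invented for illustration.
kinds = [classify_source(s) for s in (
    {'type': 'application/x-mpegURL', 'src': 'http://example.com/a.m3u8'},
    {'container': 'M2TS', 'src': 'http://example.com/b.m3u8'},
    {'type': 'application/dash+xml', 'src': 'http://example.com/c.mpd'},
    {'container': 'MP4', 'src': 'http://example.com/d.mp4'},
    {'type': 'application/dash+xml'},
)]
```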

View File

@ -5,7 +5,6 @@ import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from .facebook import FacebookIE
class BuzzFeedIE(InfoExtractor): class BuzzFeedIE(InfoExtractor):
@ -21,11 +20,11 @@ class BuzzFeedIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'aVCR29aE_OQ', 'id': 'aVCR29aE_OQ',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Angry Ram destroys a punching bag..',
'description': 'md5:c59533190ef23fd4458a5e8c8c872345',
'upload_date': '20141024', 'upload_date': '20141024',
'uploader_id': 'Buddhanz1', 'uploader_id': 'Buddhanz1',
'uploader': 'Angry Ram', 'description': 'He likes to stay in shape with his heavy bag, he wont stop until its on the ground\n\nFollow Angry Ram on Facebook for regular updates -\nhttps://www.facebook.com/pages/Angry-Ram/1436897249899558?ref=hl',
'uploader': 'Buddhanz',
'title': 'Angry Ram destroys a punching bag',
} }
}] }]
}, { }, {
@ -42,30 +41,13 @@ class BuzzFeedIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'mVmBL8B-In0', 'id': 'mVmBL8B-In0',
'ext': 'mp4', 'ext': 'mp4',
'title': 're:Munchkin the Teddy Bear gets her exercise',
'description': 'md5:28faab95cda6e361bcff06ec12fc21d8',
'upload_date': '20141124', 'upload_date': '20141124',
'uploader_id': 'CindysMunchkin', 'uploader_id': 'CindysMunchkin',
'description': 're:© 2014 Munchkin the',
'uploader': 're:^Munchkin the', 'uploader': 're:^Munchkin the',
'title': 're:Munchkin the Teddy Bear gets her exercise',
}, },
}] }]
}, {
'url': 'http://www.buzzfeed.com/craigsilverman/the-most-adorable-crash-landing-ever#.eq7pX0BAmK',
'info_dict': {
'id': 'the-most-adorable-crash-landing-ever',
'title': 'Watch This Baby Goose Make The Most Adorable Crash Landing',
'description': 'This gosling knows how to stick a landing.',
},
'playlist': [{
'md5': '763ca415512f91ca62e4621086900a23',
'info_dict': {
'id': '971793786185728',
'ext': 'mp4',
'title': 'We set up crash pads so that the goslings on our roof would have a safe landi...',
'uploader': 'Calgary Outdoor Centre-University of Calgary',
},
}],
'add_ie': ['Facebook'],
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -84,10 +66,6 @@ class BuzzFeedIE(InfoExtractor):
continue continue
entries.append(self.url_result(video['url'])) entries.append(self.url_result(video['url']))
facebook_url = FacebookIE._extract_url(webpage)
if facebook_url:
entries.append(self.url_result(facebook_url))
return { return {
'_type': 'playlist', '_type': 'playlist',
'id': playlist_id, 'id': playlist_id,

View File

@ -1,5 +1,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@ -7,15 +8,15 @@ from ..utils import ExtractorError
class BYUtvIE(InfoExtractor): class BYUtvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/(?!event/)(?P<id>[0-9a-f-]+)(?:/(?P<display_id>[^/?#&]+))?' _VALID_URL = r'^https?://(?:www\.)?byutv.org/watch/[0-9a-f-]+/(?P<video_id>[^/?#]+)'
_TESTS = [{ _TEST = {
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5', 'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5',
'md5': '05850eb8c749e2ee05ad5a1c34668493',
'info_dict': { 'info_dict': {
'id': '6587b9a3-89d2-42a6-a7f7-fd2f81840a7d', 'id': 'studio-c-season-5-episode-5',
'display_id': 'studio-c-season-5-episode-5',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Season 5 Episode 5',
'description': 'md5:e07269172baff037f8e8bf9956bc9747', 'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'title': 'Season 5 Episode 5',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'duration': 1486.486, 'duration': 1486.486,
}, },
@ -23,71 +24,28 @@ class BYUtvIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
'add_ie': ['Ooyala'], 'add_ie': ['Ooyala'],
}, {
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
ep = self._parse_json(
episode_code, display_id, transform_source=lambda s:
re.sub(r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', s))
if ep['providerType'] != 'Ooyala':
raise ExtractorError('Unsupported provider %s' % ep['provider'])
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'],
'id': video_id,
'display_id': display_id,
'title': ep['title'],
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
}
class BYUtvEventIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/event/(?P<id>[0-9a-f-]+)'
_TEST = {
'url': 'http://www.byutv.org/watch/event/29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'info_dict': {
'id': '29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'ext': 'mp4',
'title': 'Toledo vs. BYU (9/30/16)',
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
episode_json = re.sub(
r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', episode_code)
ep = json.loads(episode_json)
ooyala_id = self._search_regex( if ep['providerType'] == 'Ooyala':
r'providerId\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1', return {
webpage, 'ooyala id', group='id') '_type': 'url_transparent',
'ie_key': 'Ooyala',
title = self._search_regex( 'url': 'ooyala:%s' % ep['providerId'],
r'class=["\']description["\'][^>]*>\s*<h1>([^<]+)</h1>', webpage, 'id': video_id,
'title').strip() 'title': ep['title'],
'description': ep.get('description'),
return { 'thumbnail': ep.get('imageThumbnail'),
'_type': 'url_transparent', }
'ie_key': 'Ooyala', else:
'url': 'ooyala:%s' % ooyala_id, raise ExtractorError('Unsupported provider %s' % ep['provider'])
'id': video_id,
'title': title,
}

View File

@ -1,6 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import datetime
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@ -9,10 +10,8 @@ from ..compat import (
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
clean_html, parse_iso8601,
parse_duration,
str_to_int, str_to_int,
unified_strdate,
) )
@ -27,14 +26,14 @@ class CamdemyIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ch1-1 Introduction, Signals (02-23-2012)', 'title': 'Ch1-1 Introduction, Signals (02-23-2012)',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'description': '',
'creator': 'ss11spring', 'creator': 'ss11spring',
'duration': 1591,
'upload_date': '20130114', 'upload_date': '20130114',
'timestamp': 1358154556,
'view_count': int, 'view_count': int,
} }
}, { }, {
# With non-empty description # With non-empty description
# webpage returns "No permission or not login"
'url': 'http://www.camdemy.com/media/13885', 'url': 'http://www.camdemy.com/media/13885',
'md5': '4576a3bb2581f86c61044822adbd1249', 'md5': '4576a3bb2581f86c61044822adbd1249',
'info_dict': { 'info_dict': {
@ -42,77 +41,70 @@ class CamdemyIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'EverCam + Camdemy QuickStart', 'title': 'EverCam + Camdemy QuickStart',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'description': 'md5:2a9f989c2b153a2342acee579c6e7db6', 'description': 'md5:050b62f71ed62928f8a35f1a41e186c9',
'creator': 'evercam', 'creator': 'evercam',
'duration': 318, 'upload_date': '20140620',
'timestamp': 1403271569,
} }
}, { }, {
# External source (YouTube) # External source
'url': 'http://www.camdemy.com/media/14842', 'url': 'http://www.camdemy.com/media/14842',
'md5': '50e1c3c3aa233d3d7b7daa2fa10b1cf7',
'info_dict': { 'info_dict': {
'id': '2vsYQzNIsJo', 'id': '2vsYQzNIsJo',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Excel 2013 Tutorial - How to add Password Protection',
'description': 'Excel 2013 Tutorial for Beginners - How to add Password Protection',
'upload_date': '20130211', 'upload_date': '20130211',
'uploader': 'Hun Kim', 'uploader': 'Hun Kim',
'description': 'Excel 2013 Tutorial for Beginners - How to add Password Protection',
'uploader_id': 'hunkimtutorials', 'uploader_id': 'hunkimtutorials',
}, 'title': 'Excel 2013 Tutorial - How to add Password Protection',
'params': { }
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
page = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, video_id)
src_from = self._html_search_regex( src_from = self._html_search_regex(
r"class=['\"]srcFrom['\"][^>]*>Sources?(?:\s+from)?\s*:\s*<a[^>]+(?:href|title)=(['\"])(?P<url>(?:(?!\1).)+)\1", r"<div class='srcFrom'>Source: <a title='([^']+)'", page,
webpage, 'external source', default=None, group='url') 'external source', default=None)
if src_from: if src_from:
return self.url_result(src_from) return self.url_result(src_from)
oembed_obj = self._download_json( oembed_obj = self._download_json(
'http://www.camdemy.com/oembed/?format=json&url=' + url, video_id) 'http://www.camdemy.com/oembed/?format=json&url=' + url, video_id)
title = oembed_obj['title']
thumb_url = oembed_obj['thumbnail_url'] thumb_url = oembed_obj['thumbnail_url']
video_folder = compat_urlparse.urljoin(thumb_url, 'video/') video_folder = compat_urlparse.urljoin(thumb_url, 'video/')
file_list_doc = self._download_xml( file_list_doc = self._download_xml(
compat_urlparse.urljoin(video_folder, 'fileList.xml'), compat_urlparse.urljoin(video_folder, 'fileList.xml'),
video_id, 'Downloading filelist XML') video_id, 'Filelist XML')
file_name = file_list_doc.find('./video/item/fileName').text file_name = file_list_doc.find('./video/item/fileName').text
video_url = compat_urlparse.urljoin(video_folder, file_name) video_url = compat_urlparse.urljoin(video_folder, file_name)
# Some URLs return "No permission or not login" in a webpage despite being timestamp = parse_iso8601(self._html_search_regex(
# freely available via oembed JSON URL (e.g. http://www.camdemy.com/media/13885) r"<div class='title'>Posted\s*:</div>\s*<div class='value'>([^<>]+)<",
upload_date = unified_strdate(self._search_regex( page, 'creation time', fatal=False),
r'>published on ([^<]+)<', webpage, delimiter=' ', timezone=datetime.timedelta(hours=8))
'upload date', default=None)) view_count = str_to_int(self._html_search_regex(
view_count = str_to_int(self._search_regex( r"<div class='title'>Views\s*:</div>\s*<div class='value'>([^<>]+)<",
r'role=["\']viewCnt["\'][^>]*>([\d,.]+) views', page, 'view count', fatal=False))
webpage, 'view count', default=None))
description = self._html_search_meta(
'description', webpage, default=None) or clean_html(
oembed_obj.get('description'))
return { return {
'id': video_id, 'id': video_id,
'url': video_url, 'url': video_url,
'title': title, 'title': oembed_obj['title'],
'thumbnail': thumb_url, 'thumbnail': thumb_url,
'description': description, 'description': self._html_search_meta('description', page),
'creator': oembed_obj.get('author_name'), 'creator': oembed_obj['author_name'],
'duration': parse_duration(oembed_obj.get('duration')), 'duration': oembed_obj['duration'],
'upload_date': upload_date, 'timestamp': timestamp,
'view_count': view_count, 'view_count': view_count,
} }
class CamdemyFolderIE(InfoExtractor): class CamdemyFolderIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?camdemy\.com/folder/(?P<id>\d+)' _VALID_URL = r'https?://www.camdemy.com/folder/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
# links with trailing slash # links with trailing slash
'url': 'http://www.camdemy.com/folder/450', 'url': 'http://www.camdemy.com/folder/450',
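
Note on the `srcFrom` pattern in the left-hand column of the hunk above: it captures the opening quote character and uses a negative lookahead on `\1`, so the URL may be wrapped in either single or double quotes. The sketch below is only an illustration. It uses plain `re.search` instead of the extractor's `_html_search_regex` (which also cleans up HTML), and the helper name `find_external_source` and the sample markup are made up for the example.

```python
import re

# Pattern copied from the left-hand column of the diff.
SRC_FROM_RE = (
    r"class=['\"]srcFrom['\"][^>]*>Sources?(?:\s+from)?\s*:\s*"
    r"<a[^>]+(?:href|title)=(['\"])(?P<url>(?:(?!\1).)+)\1"
)


def find_external_source(html):
    """Return the external source URL, or None if the page has none.

    Hypothetical helper: mirrors default=None / group='url' from the diff.
    """
    m = re.search(SRC_FROM_RE, html)
    return m.group('url') if m else None


# Made-up sample markup, one per quoting style.
sample_single = "<div class='srcFrom'>Source: <a title='http://example.com/v/1'>x</a></div>"
sample_double = '<div class="srcFrom">Sources from : <a href="http://example.com/it\'s">x</a></div>'

print(find_external_source(sample_single))
print(find_external_source(sample_double))
```

The right-hand pattern only accepts single-quoted `title` attributes, which is the main behavioural difference the sketch is meant to show.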

View File

@ -1,112 +1,86 @@
# coding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse
from ..utils import ( from ..utils import (
dict_get,
ExtractorError, ExtractorError,
HEADRequest, HEADRequest,
int_or_none,
qualities,
remove_end,
unified_strdate, unified_strdate,
url_basename,
qualities,
int_or_none,
) )
class CanalplusIE(InfoExtractor): class CanalplusIE(InfoExtractor):
IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv' IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
_VALID_URL = r'''(?x) _VALID_URL = r'https?://(?:www\.(?P<site>canalplus\.fr|piwiplus\.fr|d8\.tv|itele\.fr)/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))'
https?://
(?:
(?:
(?:(?:www|m)\.)?canalplus\.fr|
(?:www\.)?piwiplus\.fr|
(?:www\.)?d8\.tv|
(?:www\.)?c8\.fr|
(?:www\.)?d17\.tv|
(?:www\.)?itele\.fr
)/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
player\.canalplus\.fr/#/(?P<id>\d+)
)
'''
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json' _VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
_SITE_ID_MAP = { _SITE_ID_MAP = {
'canalplus': 'cplus', 'canalplus.fr': 'cplus',
'piwiplus': 'teletoon', 'piwiplus.fr': 'teletoon',
'd8': 'd8', 'd8.tv': 'd8',
'c8': 'd8', 'itele.fr': 'itele',
'd17': 'd17',
'itele': 'itele',
} }
_TESTS = [{ _TESTS = [{
'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814', 'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1263092',
'md5': '12164a6f14ff6df8bd628e8ba9b10b78',
'info_dict': { 'info_dict': {
'id': '1405510', 'id': '1263092',
'display_id': 'pid1830-c-zapping',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Zapping - 02/07/2016', 'title': 'Le Zapping - 13/05/15',
'description': 'Le meilleur de toutes les chaînes, tous les jours', 'description': 'md5:09738c0d06be4b5d06a0940edb0da73f',
'upload_date': '20160702', 'upload_date': '20150513',
}, },
}, { }, {
'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190', 'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190',
'info_dict': { 'info_dict': {
'id': '1108190', 'id': '1108190',
'display_id': 'pid1405-le-labyrinthe-boing-super-ranger', 'ext': 'flv',
'ext': 'mp4', 'title': 'Le labyrinthe - Boing super ranger',
'title': 'BOING SUPER RANGER - Ep : Le labyrinthe',
'description': 'md5:4cea7a37153be42c1ba2c1d3064376ff', 'description': 'md5:4cea7a37153be42c1ba2c1d3064376ff',
'upload_date': '20140724', 'upload_date': '20140724',
}, },
'skip': 'Only works from France', 'skip': 'Only works from France',
}, { }, {
'url': 'http://www.c8.fr/c8-divertissement/ms-touche-pas-a-mon-poste/pid6318-videos-integrales.html', 'url': 'http://www.d8.tv/d8-docs-mags/pid6589-d8-campagne-intime.html',
'md5': '4b47b12b4ee43002626b97fad8fb1de5',
'info_dict': { 'info_dict': {
'id': '1420213', 'id': '966289',
'display_id': 'pid6318-videos-integrales', 'ext': 'flv',
'ext': 'mp4', 'title': 'Campagne intime - Documentaire exceptionnel',
'title': 'TPMP ! Même le matin - Les 35H de Baba - 14/10/2016', 'description': 'md5:d2643b799fb190846ae09c61e59a859f',
'description': 'md5:f96736c1b0ffaa96fd5b9e60ad871799', 'upload_date': '20131108',
'upload_date': '20161014',
}, },
'skip': 'Only works from France', 'skip': 'videos get deleted after a while',
}, { }, {
'url': 'http://www.itele.fr/chroniques/invite-michael-darmon/rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510', 'url': 'http://www.itele.fr/france/video/aubervilliers-un-lycee-en-colere-111559',
'md5': '38b8f7934def74f0d6f3ba6c036a5f82',
'info_dict': { 'info_dict': {
'id': '1420176', 'id': '1213714',
'display_id': 'rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
'ext': 'mp4', 'ext': 'mp4',
'title': 'L\'invité de Michaël Darmon du 14/10/2016 - ', 'title': 'Aubervilliers : un lycée en colère - Le 11/02/2015 à 06h45',
'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.', 'description': 'md5:8216206ec53426ea6321321f3b3c16db',
'upload_date': '20161014', 'upload_date': '20150211',
}, },
}, {
'url': 'http://m.canalplus.fr/?vid=1398231',
'only_matching': True,
}, {
'url': 'http://www.d17.tv/emissions/pid8303-lolywood.html?vid=1397061',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.groupdict().get('id')
site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]] site_id = self._SITE_ID_MAP[mobj.group('site') or 'canal']
# Beware, some subclasses do not define an id group # Beware, some subclasses do not define an id group
display_id = remove_end(dict_get(mobj.groupdict(), ('display_id', 'id', 'vid')), '.html') display_id = url_basename(mobj.group('path'))
webpage = self._download_webpage(url, display_id) if video_id is None:
video_id = self._search_regex( webpage = self._download_webpage(url, display_id)
[r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)', video_id = self._search_regex(
r'id=["\']canal_video_player(?P<id>\d+)'], [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)', r'id=["\']canal_video_player(?P<id>\d+)'],
webpage, 'video id', group='id') webpage, 'video id', group='id')
info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id) info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
video_data = self._download_json(info_url, video_id, 'Downloading video JSON') video_data = self._download_json(info_url, video_id, 'Downloading video JSON')
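
Note on the site lookup in the left-hand column of the hunk above: the site id is derived from the second-to-last label of the host name rather than from a regex group. A minimal sketch of that derivation follows, assuming Python 3's `urllib.parse.urlparse` in place of `compat_urllib_parse_urlparse`; the function name `site_id_for` is hypothetical.

```python
from urllib.parse import urlparse

# Mapping copied from the left-hand column of the diff.
SITE_ID_MAP = {
    'canalplus': 'cplus',
    'piwiplus': 'teletoon',
    'd8': 'd8',
    'c8': 'd8',
    'd17': 'd17',
    'itele': 'itele',
}


def site_id_for(url):
    """Map a URL to a service site id via its second-to-last host label."""
    host = urlparse(url).netloc
    # 'www.canalplus.fr' -> ['www', 'canalplus', 'fr'] -> 'canalplus'
    domain_label = host.rsplit('.', 2)[-2]
    return SITE_ID_MAP[domain_label]


print(site_id_for('http://m.canalplus.fr/?vid=1398231'))
print(site_id_for('http://www.c8.fr/c8-divertissement/pid6318-videos-integrales.html'))
```

Because only the label before the top-level domain is used, `www.`, `m.` and `player.` hosts resolve to the same entry without separate map keys.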

View File

@ -1,13 +1,11 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import float_or_none from ..utils import float_or_none
class CanvasIE(InfoExtractor): class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<site_id>canvas|een)\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week', 'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
'md5': 'ea838375a547ac787d4064d8c7860a6c', 'md5': 'ea838375a547ac787d4064d8c7860a6c',
@ -40,42 +38,22 @@ class CanvasIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} }
}, {
'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
'info_dict': {
'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
'display_id': 'herbekijk-sorry-voor-alles',
'ext': 'mp4',
'title': 'Herbekijk Sorry voor alles',
'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 3788.06,
},
'params': {
'skip_download': True,
}
}, {
'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) display_id = self._match_id(url)
site_id, display_id = mobj.group('site_id'), mobj.group('id')
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
title = (self._search_regex( title = self._search_regex(
r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>', r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
webpage, 'title', default=None) or self._og_search_title( webpage, 'title', default=None) or self._og_search_title(webpage)
webpage)).strip()
video_id = self._html_search_regex( video_id = self._html_search_regex(
r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id', group='id') r'data-video=(["\'])(?P<id>.+?)\1', webpage, 'video id', group='id')
data = self._download_json( data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s' 'https://mediazone.vrt.be/api/v1/canvas/assets/%s' % video_id, display_id)
% (site_id, video_id), display_id)
formats = [] formats = []
for target in data['targetUrls']: for target in data['targetUrls']:

View File

@ -1,102 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
float_or_none,
int_or_none,
try_get,
)
from .videomore import VideomoreIE
class CarambaTVIE(InfoExtractor):
_VALID_URL = r'(?:carambatv:|https?://video1\.carambatv\.ru/v/)(?P<id>\d+)'
_TESTS = [{
'url': 'http://video1.carambatv.ru/v/191910501',
'md5': '2f4a81b7cfd5ab866ee2d7270cb34a2a',
'info_dict': {
'id': '191910501',
'ext': 'mp4',
'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 2678.31,
},
}, {
'url': 'carambatv:191910501',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
'http://video1.carambatv.ru/v/%s/videoinfo.js' % video_id,
video_id)
title = video['title']
base_url = video.get('video') or 'http://video1.carambatv.ru/v/%s/' % video_id
formats = [{
'url': base_url + f['fn'],
'height': int_or_none(f.get('height')),
'format_id': '%sp' % f['height'] if f.get('height') else None,
} for f in video['qualities'] if f.get('fn')]
self._sort_formats(formats)
thumbnail = video.get('splash')
duration = float_or_none(try_get(
video, lambda x: x['annotations'][0]['end_time'], compat_str))
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}
class CarambaTVPageIE(InfoExtractor):
_VALID_URL = r'https?://carambatv\.ru/(?:[^/]+/)+(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://carambatv.ru/movie/bad-comedian/razborka-v-manile/',
'md5': 'a49fb0ec2ad66503eeb46aac237d3c86',
'info_dict': {
'id': '475222',
'ext': 'flv',
'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
'thumbnail': 're:^https?://.*\.jpg',
# duration reported by videomore is incorrect
'duration': int,
},
'add_ie': [VideomoreIE.ie_key()],
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
videomore_url = VideomoreIE._extract_url(webpage)
if videomore_url:
title = self._og_search_title(webpage)
return {
'_type': 'url_transparent',
'url': videomore_url,
'ie_key': VideomoreIE.ie_key(),
'title': title,
}
video_url = self._og_search_property('video:iframe', webpage, default=None)
if not video_url:
video_id = self._search_regex(
r'(?:video_id|crmb_vuid)\s*[:=]\s*["\']?(\d+)',
webpage, 'video id')
video_url = 'carambatv:%s' % video_id
return self.url_result(video_url, CarambaTVIE.ie_key())

View File

@ -1,42 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .turner import TurnerBaseIE
class CartoonNetworkIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?cartoonnetwork\.com/video/(?:[^/]+/)+(?P<id>[^/?#]+)-(?:clip|episode)\.html'
_TEST = {
'url': 'http://www.cartoonnetwork.com/video/teen-titans-go/starfire-the-cat-lady-clip.html',
'info_dict': {
'id': '8a250ab04ed07e6c014ef3f1e2f9016c',
'ext': 'mp4',
'title': 'Starfire the Cat Lady',
'description': 'Robin decides to become a cat so that Starfire will finally love him.',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
id_type, video_id = re.search(r"_cnglobal\.cvp(Video|Title)Id\s*=\s*'([^']+)';", webpage).groups()
query = ('id' if id_type == 'Video' else 'titleId') + '=' + video_id
return self._extract_cvp_info(
'http://www.cartoonnetwork.com/video-seo-svc/episodeservices/getCvpPlaylist?networkName=CN2&' + query, video_id, {
'secure': {
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big',
'tokenizer_src': 'http://www.cartoonnetwork.com/cntv/mvpd/processors/services/token_ipadAdobe.do',
},
}, {
'url': url,
'site_name': 'CartoonNetwork',
'auth_required': self._search_regex(
r'_cnglobal\.cvpFullOrPreviewAuth\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true',
})

View File

@ -4,24 +4,13 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
js_to_json, js_to_json,
smuggle_url, smuggle_url,
try_get,
xpath_text,
xpath_element,
xpath_with_ns,
find_xpath_attr,
parse_iso8601,
parse_age_limit,
int_or_none,
ExtractorError,
) )
class CBCIE(InfoExtractor): class CBCIE(InfoExtractor):
IE_NAME = 'cbc.ca'
_VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?!player/)(?:[^/]+/)+(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?!player/)(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
# with mediaId # with mediaId
@ -36,22 +25,8 @@ class CBCIE(InfoExtractor):
'upload_date': '20160203', 'upload_date': '20160203',
'uploader': 'CBCC-NEW', 'uploader': 'CBCC-NEW',
}, },
'skip': 'Geo-restricted to Canada',
}, { }, {
# with clipId, feed available via tpfeed.cbc.ca and feed.theplatform.com # with clipId
'url': 'http://www.cbc.ca/22minutes/videos/22-minutes-update/22-minutes-update-episode-4',
'md5': '162adfa070274b144f4fdc3c3b8207db',
'info_dict': {
'id': '2414435309',
'ext': 'mp4',
'title': '22 Minutes Update: What Not To Wear Quebec',
'description': "This week's latest Canadian top political story is What Not To Wear Quebec.",
'upload_date': '20131025',
'uploader': 'CBCC-NEW',
'timestamp': 1382717907,
},
}, {
# with clipId, feed only available via tpfeed.cbc.ca
'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live', 'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live',
'md5': '0274a90b51a9b4971fe005c63f592f12', 'md5': '0274a90b51a9b4971fe005c63f592f12',
'info_dict': { 'info_dict': {
@ -89,7 +64,6 @@ class CBCIE(InfoExtractor):
'uploader': 'CBCC-NEW', 'uploader': 'CBCC-NEW',
}, },
}], }],
'skip': 'Geo-restricted to Canada',
}] }]
@classmethod @classmethod
@ -107,15 +81,9 @@ class CBCIE(InfoExtractor):
media_id = player_info.get('mediaId') media_id = player_info.get('mediaId')
if not media_id: if not media_id:
clip_id = player_info['clipId'] clip_id = player_info['clipId']
feed = self._download_json( media_id = self._download_json(
'http://tpfeed.cbc.ca/f/ExhSPC/vms_5akSXx4Ng_Zn?byCustomValue={:mpsReleases}{%s}' % clip_id, 'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
clip_id, fatal=False) clip_id)['entries'][0]['id'].split('/')[-1]
if feed:
media_id = try_get(feed, lambda x: x['entries'][0]['guid'], compat_str)
if not media_id:
media_id = self._download_json(
'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
clip_id)['entries'][0]['id'].split('/')[-1]
return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
else: else:
entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)] entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
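
Note on the `clipId` branch in the left-hand column of the hunk above: it tries the `tpfeed.cbc.ca` feed first (non-fatal) and only falls back to the `feed.theplatform.com` lookup when no usable `guid` comes back. The sketch below shows that control flow only. `try_get` here is a simplified stand-in for the helper in youtube-dl's `utils`, the two fetcher callables are hypothetical replacements for `_download_json`, and the shape of the fallback `id` value is an assumption.

```python
def try_get(src, getter, expected_type=None):
    """Simplified stand-in for youtube-dl's try_get: swallow lookup errors."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def resolve_media_id(clip_id, fetch_primary, fetch_fallback):
    """Resolve a media id, preferring the primary feed.

    fetch_primary may return None (mirrors fatal=False in the diff);
    fetch_fallback is expected to raise on failure.
    """
    media_id = None
    feed = fetch_primary(clip_id)
    if feed:
        media_id = try_get(feed, lambda x: x['entries'][0]['guid'], str)
    if not media_id:
        # Assumed id shape: a URL whose last path segment is the media id.
        media_id = fetch_fallback(clip_id)['entries'][0]['id'].split('/')[-1]
    return media_id


# Made-up feeds for illustration.
def primary_ok(clip_id):
    return {'entries': [{'guid': '2414435309'}]}


def primary_empty(clip_id):
    return {'entries': []}


def fallback(clip_id):
    return {'entries': [{'id': 'http://example.invalid/media/2487345465'}]}


print(resolve_media_id('abc', primary_ok, fallback))
print(resolve_media_id('abc', primary_empty, fallback))
```

The right-hand column performs only the fallback request, so a clip that is listed solely on the primary feed would not resolve there.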
@ -123,7 +91,6 @@ class CBCIE(InfoExtractor):
class CBCPlayerIE(InfoExtractor): class CBCPlayerIE(InfoExtractor):
IE_NAME = 'cbc.ca:player'
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)' _VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cbc.ca/player/play/2683190193', 'url': 'http://www.cbc.ca/player/play/2683190193',
@ -137,7 +104,6 @@ class CBCPlayerIE(InfoExtractor):
'upload_date': '20160210', 'upload_date': '20160210',
'uploader': 'CBCC-NEW', 'uploader': 'CBCC-NEW',
}, },
'skip': 'Geo-restricted to Canada',
}, { }, {
# Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/ # Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/
'url': 'http://www.cbc.ca/player/play/2657631896', 'url': 'http://www.cbc.ca/player/play/2657631896',
@ -177,165 +143,3 @@ class CBCPlayerIE(InfoExtractor):
}), }),
'id': video_id, 'id': video_id,
} }
class CBCWatchBaseIE(InfoExtractor):
_device_id = None
_device_token = None
_API_BASE_URL = 'https://api-cbc.cloud.clearleap.com/cloffice/client/'
_NS_MAP = {
'media': 'http://search.yahoo.com/mrss/',
'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
}
def _call_api(self, path, video_id):
url = path if path.startswith('http') else self._API_BASE_URL + path
result = self._download_xml(url, video_id, headers={
'X-Clearleap-DeviceId': self._device_id,
'X-Clearleap-DeviceToken': self._device_token,
})
error_message = xpath_text(result, 'userMessage') or xpath_text(result, 'systemMessage')
if error_message:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message))
return result
def _real_initialize(self):
if not self._device_id or not self._device_token:
device = self._downloader.cache.load('cbcwatch', 'device') or {}
self._device_id, self._device_token = device.get('id'), device.get('token')
if not self._device_id or not self._device_token:
result = self._download_xml(
self._API_BASE_URL + 'device/register',
None, data=b'<device><type>web</type></device>')
self._device_id = xpath_text(result, 'deviceId', fatal=True)
self._device_token = xpath_text(result, 'deviceToken', fatal=True)
self._downloader.cache.store(
'cbcwatch', 'device', {
'id': self._device_id,
'token': self._device_token,
})
def _parse_rss_feed(self, rss):
channel = xpath_element(rss, 'channel', fatal=True)
def _add_ns(path):
return xpath_with_ns(path, self._NS_MAP)
entries = []
for item in channel.findall('item'):
guid = xpath_text(item, 'guid', fatal=True)
title = xpath_text(item, 'title', fatal=True)
media_group = xpath_element(item, _add_ns('media:group'), fatal=True)
content = xpath_element(media_group, _add_ns('media:content'), fatal=True)
content_url = content.attrib['url']
thumbnails = []
for thumbnail in media_group.findall(_add_ns('media:thumbnail')):
thumbnail_url = thumbnail.get('url')
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail.get('profile'),
'url': thumbnail_url,
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
timestamp = None
release_date = find_xpath_attr(
item, _add_ns('media:credit'), 'role', 'releaseDate')
if release_date is not None:
timestamp = parse_iso8601(release_date.text)
entries.append({
'_type': 'url_transparent',
'url': content_url,
'id': guid,
'title': title,
'description': xpath_text(item, 'description'),
'timestamp': timestamp,
'duration': int_or_none(content.get('duration')),
'age_limit': parse_age_limit(xpath_text(item, _add_ns('media:rating'))),
'episode': xpath_text(item, _add_ns('clearleap:episode')),
'episode_number': int_or_none(xpath_text(item, _add_ns('clearleap:episodeInSeason'))),
'series': xpath_text(item, _add_ns('clearleap:series')),
'season_number': int_or_none(xpath_text(item, _add_ns('clearleap:season'))),
'thumbnails': thumbnails,
'ie_key': 'CBCWatchVideo',
})
return self.playlist_result(
entries, xpath_text(channel, 'guid'),
xpath_text(channel, 'title'),
xpath_text(channel, 'description'))
class CBCWatchVideoIE(CBCWatchBaseIE):
IE_NAME = 'cbc.ca:watch:video'
_VALID_URL = r'https?://api-cbc\.cloud\.clearleap\.com/cloffice/client/web/play/?\?.*?\bcontentId=(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
def _real_extract(self, url):
video_id = self._match_id(url)
result = self._call_api(url, video_id)
m3u8_url = xpath_text(result, 'url', fatal=True)
formats = self._extract_m3u8_formats(re.sub(r'/([^/]+)/[^/?]+\.m3u8', r'/\1/\1.m3u8', m3u8_url), video_id, 'mp4', fatal=False)
if len(formats) < 2:
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
# Despite metadata in m3u8 all video+audio formats are
# actually video-only (no audio)
for f in formats:
if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
f['acodec'] = 'none'
self._sort_formats(formats)
info = {
'id': video_id,
'title': video_id,
'formats': formats,
}
rss = xpath_element(result, 'rss')
if rss:
info.update(self._parse_rss_feed(rss)['entries'][0])
del info['url']
del info['_type']
del info['ie_key']
return info
class CBCWatchIE(CBCWatchBaseIE):
IE_NAME = 'cbc.ca:watch'
_VALID_URL = r'https?://watch\.cbc\.ca/(?:[^/]+/)+(?P<id>[0-9a-f-]+)'
_TESTS = [{
'url': 'http://watch.cbc.ca/doc-zone/season-6/customer-disservice/38e815a-009e3ab12e4',
'info_dict': {
'id': '38e815a-009e3ab12e4',
'ext': 'mp4',
'title': 'Customer (Dis)Service',
'description': 'md5:8bdd6913a0fe03d4b2a17ebe169c7c87',
'upload_date': '20160219',
'timestamp': 1455840000,
},
'params': {
# m3u8 download
'skip_download': True,
'format': 'bestvideo',
},
'skip': 'Geo-restricted to Canada',
}, {
'url': 'http://watch.cbc.ca/arthur/all/1ed4b385-cd84-49cf-95f0-80f004680057',
'info_dict': {
'id': '1ed4b385-cd84-49cf-95f0-80f004680057',
'title': 'Arthur',
'description': 'Arthur, the sweetest 8-year-old aardvark, and his pals solve all kinds of problems with humour, kindness and teamwork.',
},
'playlist_mincount': 30,
'skip': 'Geo-restricted to Canada',
}]
def _real_extract(self, url):
video_id = self._match_id(url)
rss = self._call_api('web/browse/' + video_id, video_id)
return self._parse_rss_feed(rss)
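
Note on `CBCWatchVideoIE` in the left-hand column of the hunk above: the extractor first rewrites the playlist URL so the file name repeats its parent directory name, and later forces every combined audio+video format to be treated as video-only. A rough sketch of those two steps, using the same regular expression as the diff; the function names and the sample URL are invented for the example.

```python
import re


def master_playlist_url(m3u8_url):
    """Rewrite '/<dir>/<name>.m3u8' to '/<dir>/<dir>.m3u8', as in the diff."""
    return re.sub(r'/([^/]+)/[^/?]+\.m3u8', r'/\1/\1.m3u8', m3u8_url)


def mark_video_only(formats):
    """Set acodec to 'none' on formats that claim both audio and video."""
    for f in formats:
        if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
            f['acodec'] = 'none'
    return formats


# Made-up URL and format list for illustration.
url = 'https://example.invalid/path/abc123/desktop_master.m3u8?token=x'
print(master_playlist_url(url))

formats = [
    {'format_id': 'v1', 'acodec': 'mp4a.40.2', 'vcodec': 'avc1.4d401f'},
    {'format_id': 'a1', 'acodec': 'mp4a.40.2', 'vcodec': 'none'},
]
print(mark_video_only(formats))
```

In the diff the rewritten URL is tried first with `fatal=False`, and the original URL is used only when fewer than two formats come back.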

View File

@ -1,16 +1,17 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .theplatform import ThePlatformFeedIE import re
from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
xpath_text,
xpath_element,
int_or_none, int_or_none,
find_xpath_attr, find_xpath_attr,
xpath_element,
xpath_text,
update_url_query,
) )
class CBSBaseIE(ThePlatformFeedIE): class CBSBaseIE(ThePlatformIE):
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'): def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL') closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return { return {
@ -22,12 +23,13 @@ class CBSBaseIE(ThePlatformFeedIE):
class CBSIE(CBSBaseIE): class CBSIE(CBSBaseIE):
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)' _VALID_URL = r'(?:cbs:(?P<content_id>\w+)|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<display_id>[^/]+))'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/', 'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': { 'info_dict': {
'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_', 'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
'display_id': 'connect-chat-feat-garth-brooks',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Connect Chat feat. Garth Brooks', 'title': 'Connect Chat feat. Garth Brooks',
'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!', 'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
@ -37,7 +39,22 @@ class CBSIE(CBSBaseIE):
'uploader': 'CBSI-NEW', 'uploader': 'CBSI-NEW',
}, },
'params': { 'params': {
# m3u8 download # rtmp download
'skip_download': True,
},
'_skip': 'Blocked outside the US',
}, {
'url': 'http://www.cbs.com/shows/liveonletterman/artist/221752/st-vincent/',
'info_dict': {
'id': 'WWF_5KqY3PK1',
'display_id': 'st-vincent',
'ext': 'flv',
'title': 'Live on Letterman - St. Vincent',
'description': 'Live On Letterman: St. Vincent in concert from New York\'s Ed Sullivan Theater on Tuesday, July 16, 2014.',
'duration': 3221,
},
'params': {
# rtmp download
'skip_download': True, 'skip_download': True,
}, },
'_skip': 'Blocked outside the US', '_skip': 'Blocked outside the US',
@ -48,42 +65,40 @@ class CBSIE(CBSBaseIE):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/', 'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True, 'only_matching': True,
}] }]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
def _extract_video_info(self, content_id): def _real_extract(self, url):
content_id, display_id = re.match(self._VALID_URL, url).groups()
if not content_id:
webpage = self._download_webpage(url, display_id)
content_id = self._search_regex(
[r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
webpage, 'content id')
items_data = self._download_xml( items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php', 'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id}) content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item') video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True) title = xpath_text(video_data, 'videoTitle', 'title', True)
tp_path = 'dJ5BDC/media/guid/2198311517/%s' % content_id
tp_release_url = 'http://link.theplatform.com/s/' + tp_path
asset_types = []
subtitles = {} subtitles = {}
formats = [] formats = []
for item in items_data.findall('.//item'): for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType') pid = xpath_text(item, 'pid')
if not asset_type or asset_type in asset_types: if not pid:
continue continue
asset_types.append(asset_type) tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
query = { if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
'mbr': 'true', tp_release_url += '&manifest=m3u'
'assetTypes': asset_type,
}
if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
query['formats'] = 'MPEG4,M3U'
elif asset_type in ('RTMP', 'WIFI', '3G'):
query['formats'] = 'MPEG4,FLV'
tp_formats, tp_subtitles = self._extract_theplatform_smil( tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(tp_release_url, query), content_id, tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
'Downloading %s SMIL data' % asset_type)
formats.extend(tp_formats) formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles) subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats) self._sort_formats(formats)
info = self._extract_theplatform_metadata(tp_path, content_id) info = self.get_metadata('dJ5BDC/media/guid/2198311517/%s' % content_id, content_id)
info.update({ info.update({
'id': content_id, 'id': content_id,
'display_id': display_id,
'title': title, 'title': title,
'series': xpath_text(video_data, 'seriesTitle'), 'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')), 'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
@ -94,7 +109,3 @@ class CBSIE(CBSBaseIE):
'subtitles': subtitles, 'subtitles': subtitles,
}) })
return info return info
def _real_extract(self, url):
content_id = self._match_id(url)
return self._extract_video_info(content_id)
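
Note on `_extract_video_info` in the left-hand column of the hunk above: it issues one SMIL request per distinct `assetType` and picks the `formats` query parameter from the asset type. The sketch below reproduces only that selection and de-duplication logic. It assumes Python 3's `urllib.parse.urlencode` rather than youtube-dl's `update_url_query`, and the function names and base URL are placeholders.

```python
from urllib.parse import urlencode


def smil_query_for(asset_type):
    """Build the query dict for one asset type, following the diff."""
    query = {
        'mbr': 'true',
        'assetTypes': asset_type,
    }
    if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
        query['formats'] = 'MPEG4,M3U'
    elif asset_type in ('RTMP', 'WIFI', '3G'):
        query['formats'] = 'MPEG4,FLV'
    return query


def release_urls(base_url, asset_types):
    """Return one request URL per distinct, non-empty asset type."""
    seen = []
    urls = []
    for asset_type in asset_types:
        if not asset_type or asset_type in seen:
            continue
        seen.append(asset_type)
        urls.append(base_url + '?' + urlencode(smil_query_for(asset_type)))
    return urls


# Placeholder base URL and asset types for illustration.
for u in release_urls('http://example.invalid/s/account/media/guid/1/ID',
                      ['RTMP', None, 'RTMP', 'HLS_AES']):
    print(u)
```

The right-hand column instead builds one release URL per `pid` and appends `&manifest=m3u` when the item's `contentUrl` contains `.m3u8`.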

View File

@ -63,7 +63,7 @@ class CBSInteractiveIE(ThePlatformIE):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex( data_json = self._html_search_regex(
r"data-(?:cnet|zdnet)-video(?:-uvp(?:js)?)?-options='([^']+)'", r"data-(?:cnet|zdnet)-video(?:-uvp)?-options='([^']+)'",
webpage, 'data json') webpage, 'data json')
data = self._parse_json(data_json, display_id) data = self._parse_json(data_json, display_id)
vdata = data.get('video') or data['videos'][0] vdata = data.get('video') or data['videos'][0]
@ -80,6 +80,9 @@ class CBSInteractiveIE(ThePlatformIE):
media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId']) media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId'])
formats, subtitles = [], {} formats, subtitles = [], {}
if site == 'cnet':
formats, subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % media_guid_path, video_id)
for (fkey, vid) in vdata['files'].items(): for (fkey, vid) in vdata['files'].items():
if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']: if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
continue continue
@ -91,7 +94,7 @@ class CBSInteractiveIE(ThePlatformIE):
subtitles = self._merge_subtitles(subtitles, tp_subtitles) subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats) self._sort_formats(formats)
info = self._extract_theplatform_metadata('kYEXFC/%s' % media_guid_path, video_id) info = self.get_metadata('kYEXFC/%s' % media_guid_path, video_id)
info.update({ info.update({
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,

View File

@ -1,10 +1,12 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import calendar
import datetime
from .anvato import AnvatoIE from .anvato import AnvatoIE
from .sendtonews import SendtoNewsIE from .sendtonews import SendtoNewsIE
from ..compat import compat_urlparse from ..compat import compat_urlparse
from ..utils import unified_timestamp
class CBSLocalIE(AnvatoIE): class CBSLocalIE(AnvatoIE):
@ -22,7 +24,6 @@ class CBSLocalIE(AnvatoIE):
'thumbnail': 're:^https?://.*', 'thumbnail': 're:^https?://.*',
'timestamp': 1463440500, 'timestamp': 1463440500,
'upload_date': '20160516', 'upload_date': '20160516',
'uploader': 'CBS',
'subtitles': { 'subtitles': {
'en': 'mincount:5', 'en': 'mincount:5',
}, },
@ -36,15 +37,19 @@ class CBSLocalIE(AnvatoIE):
'Syndication\\Curb.tv', 'Syndication\\Curb.tv',
'Content\\News' 'Content\\News'
], ],
'tags': ['CBS 2 News Evening'],
}, },
}, { }, {
# SendtoNews embed # SendtoNews embed
'url': 'http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/', 'url': 'http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/',
'info_dict': { 'info_dict': {
'id': 'GxfCe0Zo7D-175909-5588', 'id': 'GxfCe0Zo7D-175909-5588',
'ext': 'mp4',
'title': 'Recap: CLE 15, CIN 6',
'description': '5/16/16: Indians\' bats explode for 15 runs in a win',
'upload_date': '20160516',
'timestamp': 1463433840,
'duration': 49,
}, },
'playlist_count': 9,
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
@ -57,15 +62,19 @@ class CBSLocalIE(AnvatoIE):
sendtonews_url = SendtoNewsIE._extract_url(webpage) sendtonews_url = SendtoNewsIE._extract_url(webpage)
if sendtonews_url: if sendtonews_url:
return self.url_result( info_dict = {
compat_urlparse.urljoin(url, sendtonews_url), '_type': 'url_transparent',
ie=SendtoNewsIE.ie_key()) 'url': compat_urlparse.urljoin(url, sendtonews_url),
}
info_dict = self._extract_anvato_videos(webpage, display_id) else:
info_dict = self._extract_anvato_videos(webpage, display_id)
time_str = self._html_search_regex( time_str = self._html_search_regex(
r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False) r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False)
timestamp = unified_timestamp(time_str) timestamp = None
if time_str:
timestamp = calendar.timegm(datetime.datetime.strptime(
time_str, '%b %d, %Y %I:%M %p').timetuple())
info_dict.update({ info_dict.update({
'display_id': display_id, 'display_id': display_id,
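
The two columns of this hunk compute the post timestamp differently: one passes the scraped `entry-date` string to `unified_timestamp`, the other parses it with `strptime` and converts the result with `calendar.timegm`. The following is a minimal standalone sketch of the second approach. The helper name `parse_entry_date` and the sample date string are assumptions made for illustration (the string was chosen to agree with the `1463440500` timestamp in the test above); neither appears in the extractor.

```python
import calendar
import datetime


def parse_entry_date(time_str):
    """Hypothetical helper: parse e.g. 'May 16, 2016 11:15 PM', read as UTC."""
    if not time_str:
        return None
    parsed = datetime.datetime.strptime(time_str, '%b %d, %Y %I:%M %p')
    # timegm interprets the time tuple as UTC, unlike time.mktime (local time).
    return calendar.timegm(parsed.timetuple())


print(parse_entry_date('May 16, 2016 11:15 PM'))
print(parse_entry_date(None))
```

Note that `strptime` raises `ValueError` when the string does not match the format, so a caller that wants non-fatal behaviour would need to catch it.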


@ -1,15 +1,14 @@
# coding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .cbs import CBSIE from .cbs import CBSBaseIE
from ..utils import ( from ..utils import (
parse_duration, parse_duration,
) )
class CBSNewsIE(CBSIE): class CBSNewsIE(CBSBaseIE):
IE_NAME = 'cbsnews'
IE_DESC = 'CBS News' IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)' _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
@ -27,18 +26,13 @@ class CBSNewsIE(CBSIE):
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Subscribers only',
}, },
{ {
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/', 'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'info_dict': { 'info_dict': {
'id': 'SNJBOYzXiWBOvaLsdzwH8fmtP1SCd91Y', 'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack', 'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'description': 'md5:4a6983e480542d8b333a947bfc64ddc7',
'upload_date': '20140404',
'timestamp': 1396650660,
'uploader': 'CBSI-NEW',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'duration': 205, 'duration': 205,
'subtitles': { 'subtitles': {
@ -64,44 +58,66 @@ class CBSNewsIE(CBSIE):
webpage, 'video JSON info'), video_id) webpage, 'video JSON info'), video_id)
item = video_info['item'] if 'item' in video_info else video_info item = video_info['item'] if 'item' in video_info else video_info
guid = item['mpxRefId'] title = item.get('articleTitle') or item.get('hed')
return self._extract_video_info(guid) duration = item.get('duration')
thumbnail = item.get('mediaImage') or item.get('thumbnail')
subtitles = {}
formats = []
for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
pid = item.get('media' + format_id)
if not pid:
continue
release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' % pid
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
class CBSNewsLiveVideoIE(InfoExtractor): class CBSNewsLiveVideoIE(InfoExtractor):
IE_NAME = 'cbsnews:livevideo'
IE_DESC = 'CBS News Live Videos' IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
# Live videos get deleted soon. See http://www.cbsnews.com/live/ for the latest examples
_TEST = { _TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/', 'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': { 'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh', 'id': 'clinton-sanders-prepare-to-face-off-in-nh',
'ext': 'mp4', 'ext': 'flv',
'title': 'Clinton, Sanders Prepare To Face Off In NH', 'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334, 'duration': 334,
}, },
'skip': 'Video gone',
} }
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) video_id = self._match_id(url)
video_info = self._download_json( webpage = self._download_webpage(url, video_id)
'http://feeds.cbsn.cbsnews.com/rundown/story', display_id, query={
'device': 'desktop',
'dvr_slug': display_id,
})
formats = self._extract_akamai_formats(video_info['url'], display_id) video_info = self._parse_json(self._html_search_regex(
self._sort_formats(formats) r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story']
hdcore_sign = 'hdcore=3.3.1'
f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
if f4m_formats:
for entry in f4m_formats:
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
self._sort_formats(f4m_formats)
return { return {
'id': display_id, 'id': video_id,
'display_id': display_id,
'title': video_info['headline'], 'title': video_info['headline'],
'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'), 'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
'duration': parse_duration(video_info.get('segmentDur')), 'duration': parse_duration(video_info.get('segmentDur')),
'formats': formats, 'formats': f4m_formats,
} }


@ -1,31 +1,30 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .cbs import CBSBaseIE import re
from .common import InfoExtractor
class CBSSportsIE(CBSBaseIE): class CBSSportsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/video/player/[^/]+/(?P<id>\d+)' _VALID_URL = r'https?://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
_TESTS = [{ _TEST = {
'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast', 'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',
'info_dict': { 'info_dict': {
'id': '708337219968', 'id': '_d5_GbO8p1sT',
'ext': 'mp4', 'ext': 'flv',
'title': 'Ben Simmons the next LeBron? Not so fast', 'title': 'US Open flashbacks: 1990s',
'description': 'md5:854294f627921baba1f4b9a990d87197', 'description': 'Bill Macatee relives the best moments in US Open history from the 1990s.',
'timestamp': 1466293740,
'upload_date': '20160618',
'uploader': 'CBSI-NEW',
}, },
'params': { }
# m3u8 download
'skip_download': True,
}
}]
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
return self._extract_video_info('byId=%s' % video_id, video_id) section = mobj.group('section')
video_id = mobj.group('id')
all_videos = self._download_json(
'http://www.cbssports.com/data/video/player/getVideos/%s?as=json' % section,
video_id)
# The json file contains the info of all the videos in the section
video_info = next(v for v in all_videos if v['pcid'] == video_id)
return self.url_result('theplatform:%s' % video_info['pid'], 'ThePlatform')
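
The right-hand `_real_extract` picks one entry out of a section-wide JSON list with `next(...)`. A bare `next` over a generator raises `StopIteration` when nothing matches; the sketch below shows the same lookup with a default value so that a miss can be handled explicitly. The `find_video` helper and the sample records are hypothetical; only the `pcid` and `pid` keys come from the diff.

```python
def find_video(all_videos, video_id):
    """Return the first record whose 'pcid' equals video_id, or None."""
    return next((v for v in all_videos if v.get('pcid') == video_id), None)


# Made-up sample data for illustration only.
videos = [
    {'pcid': '111', 'pid': 'aaa'},
    {'pcid': '222', 'pid': 'bbb'},
]

match = find_video(videos, '222')
print('theplatform:%s' % match['pid'] if match else 'not found')
```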


@ -1,53 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import float_or_none
class CCTVIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:.+?\.)?
(?:
cctv\.(?:com|cn)|
cntv\.cn
)/
(?:
video/[^/]+/(?P<id>[0-9a-f]{32})|
\d{4}/\d{2}/\d{2}/(?P<display_id>VID[0-9A-Za-z]+)
)'''
_TESTS = [{
'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml',
'md5': '819c7b49fc3927d529fb4cd555621823',
'info_dict': {
'id': '454368eb19ad44a1925bf1eb96140a61',
'ext': 'mp4',
'title': 'Portrait of Real Current Life 09/03/2016 Modern Inventors Part 1',
}
}, {
'url': 'http://tv.cctv.com/2016/09/07/VIDE5C1FnlX5bUywlrjhxXOV160907.shtml',
'only_matching': True,
}, {
'url': 'http://tv.cntv.cn/video/C39296/95cfac44cabd3ddc4a9438780a4e5c44',
'only_matching': True
}]
def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups()
if not video_id:
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'(?:fo\.addVariable\("videoCenterId",\s*|guid\s*=\s*)"([0-9a-f]{32})',
webpage, 'video_id')
api_data = self._download_json(
'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + video_id, video_id)
m3u8_url = re.sub(r'maxbr=\d+&?', '', api_data['hls_url'])
return {
'id': video_id,
'title': api_data['title'],
'formats': self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', fatal=False),
'duration': float_or_none(api_data.get('video', {}).get('totalLength')),
}


@ -5,16 +5,14 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
decode_packed_codes,
ExtractorError, ExtractorError,
float_or_none, parse_duration
int_or_none,
parse_duration,
) )
class CDAIE(InfoExtractor): class CDAIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?cda\.pl/video|ebd\.cda\.pl/[0-9]+x[0-9]+)/(?P<id>[0-9a-z]+)' _VALID_URL = r'https?://(?:(?:www\.)?cda\.pl/video|ebd\.cda\.pl/[0-9]+x[0-9]+)/(?P<id>[0-9a-z]+)'
_BASE_URL = 'http://www.cda.pl/'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cda.pl/video/5749950c', 'url': 'http://www.cda.pl/video/5749950c',
'md5': '6f844bf51b15f31fae165365707ae970', 'md5': '6f844bf51b15f31fae165365707ae970',
@ -23,9 +21,6 @@ class CDAIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'height': 720, 'height': 720,
'title': 'Oto dlaczego przed zakrętem należy zwolnić.', 'title': 'Oto dlaczego przed zakrętem należy zwolnić.',
'description': 'md5:269ccd135d550da90d1662651fcb9772',
'thumbnail': 're:^https?://.*\.jpg$',
'average_rating': float,
'duration': 39 'duration': 39
} }
}, { }, {
@ -35,11 +30,6 @@ class CDAIE(InfoExtractor):
'id': '57413289', 'id': '57413289',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Lądowanie na lotnisku na Maderze', 'title': 'Lądowanie na lotnisku na Maderze',
'description': 'md5:60d76b71186dcce4e0ba6d4bbdb13e1a',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'crash404',
'view_count': int,
'average_rating': float,
'duration': 137 'duration': 137
} }
}, { }, {
@ -49,55 +39,30 @@ class CDAIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
self._set_cookie('cda.pl', 'cda.player', 'html5') webpage = self._download_webpage('http://ebd.cda.pl/0x0/' + video_id, video_id)
webpage = self._download_webpage(
self._BASE_URL + '/video/' + video_id, video_id)
if 'Ten film jest dostępny dla użytkowników premium' in webpage: if 'Ten film jest dostępny dla użytkowników premium' in webpage:
raise ExtractorError('This video is only available for premium users.', expected=True) raise ExtractorError('This video is only available for premium users.', expected=True)
formats = [] title = self._html_search_regex(r'<title>(.+?)</title>', webpage, 'title')
uploader = self._search_regex(r'''(?x) formats = []
<(span|meta)[^>]+itemprop=(["\'])author\2[^>]*>
(?:<\1[^>]*>[^<]*</\1>|(?!</\1>)(?:.|\n))*?
<(span|meta)[^>]+itemprop=(["\'])name\4[^>]*>(?P<uploader>[^<]+)</\3>
''', webpage, 'uploader', default=None, group='uploader')
view_count = self._search_regex(
r'Odsłony:(?:\s|&nbsp;)*([0-9]+)', webpage,
'view_count', default=None)
average_rating = self._search_regex(
r'<(?:span|meta)[^>]+itemprop=(["\'])ratingValue\1[^>]*>(?P<rating_value>[0-9.]+)',
webpage, 'rating', fatal=False, group='rating_value')
info_dict = { info_dict = {
'id': video_id, 'id': video_id,
'title': self._og_search_title(webpage), 'title': title,
'description': self._og_search_description(webpage),
'uploader': uploader,
'view_count': int_or_none(view_count),
'average_rating': float_or_none(average_rating),
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats, 'formats': formats,
'duration': None, 'duration': None,
} }
def extract_format(page, version): def extract_format(page, version):
json_str = self._search_regex( unpacked = decode_packed_codes(page)
r'player_data=(\\?["\'])(?P<player_data>.+?)\1', page, format_url = self._search_regex(
'%s player_json' % version, fatal=False, group='player_data') r"url:\\'(.+?)\\'", unpacked, '%s url' % version, fatal=False)
if not json_str: if not format_url:
return
player_data = self._parse_json(
json_str, '%s player_data' % version, fatal=False)
if not player_data:
return
video = player_data.get('video')
if not video or 'file' not in video:
self.report_warning('Unable to extract %s version information' % version)
return return
f = { f = {
'url': video['file'], 'url': format_url,
} }
m = re.search( m = re.search(
r'<a[^>]+data-quality="(?P<format_id>[^"]+)"[^>]+href="[^"]+"[^>]+class="[^"]*quality-btn-active[^"]*">(?P<height>[0-9]+)p', r'<a[^>]+data-quality="(?P<format_id>[^"]+)"[^>]+href="[^"]+"[^>]+class="[^"]*quality-btn-active[^"]*">(?P<height>[0-9]+)p',
@ -109,7 +74,8 @@ class CDAIE(InfoExtractor):
}) })
info_dict['formats'].append(f) info_dict['formats'].append(f)
if not info_dict['duration']: if not info_dict['duration']:
info_dict['duration'] = parse_duration(video.get('duration')) info_dict['duration'] = parse_duration(self._search_regex(
r"duration:\\'(.+?)\\'", unpacked, 'duration', fatal=False))
extract_format(webpage, 'default') extract_format(webpage, 'default')
@ -117,8 +83,7 @@ class CDAIE(InfoExtractor):
r'<a[^>]+data-quality="[^"]+"[^>]+href="([^"]+)"[^>]+class="quality-btn"[^>]*>([0-9]+p)', r'<a[^>]+data-quality="[^"]+"[^>]+href="([^"]+)"[^>]+class="quality-btn"[^>]*>([0-9]+p)',
webpage): webpage):
webpage = self._download_webpage( webpage = self._download_webpage(
self._BASE_URL + href, video_id, href, video_id, 'Downloading %s version information' % resolution, fatal=False)
'Downloading %s version information' % resolution, fatal=False)
if not webpage: if not webpage:
# Manually report warning because empty page is returned when # Manually report warning because empty page is returned when
# invalid version is requested. # invalid version is requested.
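
One column of `extract_format` reads the stream URL from a `player_data` attribute holding JSON, the other from unpacked JavaScript. Below is a minimal sketch of the `player_data` variant, using only `re` and `json` in place of the extractor's `_search_regex` and `_parse_json` helpers. The regular expression is copied from the diff; the sample markup and the `extract_player_video` name are assumptions, and real pages may escape the attribute differently.

```python
import json
import re

PLAYER_DATA_RE = r'player_data=(\\?["\'])(?P<player_data>.+?)\1'


def extract_player_video(page):
    """Hypothetical helper: return the 'video' dict from player_data, or None."""
    m = re.search(PLAYER_DATA_RE, page)
    if not m:
        return None
    try:
        player_data = json.loads(m.group('player_data'))
    except ValueError:
        return None
    if not isinstance(player_data, dict):
        return None
    video = player_data.get('video')
    if not video or 'file' not in video:
        return None
    return video


# Made-up markup for illustration only.
sample = '<div player_data=\'{"video": {"file": "http://example.com/v.mp4", "duration": "39"}}\'>'
print(extract_player_video(sample))
```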


@ -1,4 +1,4 @@
# coding: utf-8 # -*- coding: utf-8 -*-
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
@ -17,7 +17,7 @@ from ..utils import (
class CeskaTelevizeIE(InfoExtractor): class CeskaTelevizeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$' _VALID_URL = r'https?://www\.ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
_TESTS = [{ _TESTS = [{
'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220', 'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
'info_dict': { 'info_dict': {


@ -20,64 +20,54 @@ class Channel9IE(InfoExtractor):
''' '''
IE_DESC = 'Channel 9' IE_DESC = 'Channel 9'
IE_NAME = 'channel9' IE_NAME = 'channel9'
_VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)' _VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?'
_TESTS = [{ _TESTS = [
'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002', {
'md5': 'bbd75296ba47916b754e73c3a4bbdf10', 'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
'info_dict': { 'md5': 'bbd75296ba47916b754e73c3a4bbdf10',
'id': 'Events/TechEd/Australia/2013/KOS002', 'info_dict': {
'ext': 'mp4', 'id': 'Events/TechEd/Australia/2013/KOS002',
'title': 'Developer Kick-Off Session: Stuff We Love', 'ext': 'mp4',
'description': 'md5:c08d72240b7c87fcecafe2692f80e35f', 'title': 'Developer Kick-Off Session: Stuff We Love',
'duration': 4576, 'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
'thumbnail': 're:http://.*\.jpg', 'duration': 4576,
'session_code': 'KOS002', 'thumbnail': 're:http://.*\.jpg',
'session_day': 'Day 1', 'session_code': 'KOS002',
'session_room': 'Arena 1A', 'session_day': 'Day 1',
'session_speakers': ['Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug', 'session_room': 'Arena 1A',
'Mads Kristensen'], 'session_speakers': ['Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug', 'Mads Kristensen'],
},
}, },
}, { {
'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing', 'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
'md5': 'b43ee4529d111bc37ba7ee4f34813e68', 'md5': 'b43ee4529d111bc37ba7ee4f34813e68',
'info_dict': { 'info_dict': {
'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing', 'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Self-service BI with Power BI - nuclear testing', 'title': 'Self-service BI with Power BI - nuclear testing',
'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b', 'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
'duration': 1540, 'duration': 1540,
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
'authors': ['Mike Wilmot'], 'authors': ['Mike Wilmot'],
},
}, },
}, { {
# low quality mp4 is best # low quality mp4 is best
'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library', 'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'info_dict': { 'info_dict': {
'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library', 'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ranges for the Standard Library', 'title': 'Ranges for the Standard Library',
'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d', 'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
'duration': 5646, 'duration': 5646,
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, { }
'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS', ]
'info_dict': {
'id': 'Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b',
'title': 'Channel 9',
},
'playlist_count': 2,
}, {
'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
'only_matching': True,
}, {
'url': 'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman',
'only_matching': True,
}]
_RSS_URL = 'http://channel9.msdn.com/%s/RSS' _RSS_URL = 'http://channel9.msdn.com/%s/RSS'
@ -264,30 +254,22 @@ class Channel9IE(InfoExtractor):
return self.playlist_result(contents) return self.playlist_result(contents)
def _extract_list(self, video_id, rss_url=None): def _extract_list(self, content_path):
if not rss_url: rss = self._download_xml(self._RSS_URL % content_path, content_path, 'Downloading RSS')
rss_url = self._RSS_URL % video_id
rss = self._download_xml(rss_url, video_id, 'Downloading RSS')
entries = [self.url_result(session_url.text, 'Channel9') entries = [self.url_result(session_url.text, 'Channel9')
for session_url in rss.findall('./channel/item/link')] for session_url in rss.findall('./channel/item/link')]
title_text = rss.find('./channel/title').text title_text = rss.find('./channel/title').text
return self.playlist_result(entries, video_id, title_text) return self.playlist_result(entries, content_path, title_text)
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
content_path = mobj.group('contentpath') content_path = mobj.group('contentpath')
rss = mobj.group('rss')
if rss: webpage = self._download_webpage(url, content_path, 'Downloading web page')
return self._extract_list(content_path, url)
webpage = self._download_webpage( page_type_m = re.search(r'<meta name="WT.entryid" content="(?P<pagetype>[^:]+)[^"]+"/>', webpage)
url, content_path, 'Downloading web page') if page_type_m is not None:
page_type = page_type_m.group('pagetype')
page_type = self._search_regex(
r'<meta[^>]+name=(["\'])WT\.entryid\1[^>]+content=(["\'])(?P<pagetype>[^:]+).+?\2',
webpage, 'page type', default=None, group='pagetype')
if page_type:
if page_type == 'Entry': # Any 'item'-like page, may contain downloadable content if page_type == 'Entry': # Any 'item'-like page, may contain downloadable content
return self._extract_entry_item(webpage, content_path) return self._extract_entry_item(webpage, content_path)
elif page_type == 'Session': # Event session page, may contain downloadable content elif page_type == 'Session': # Event session page, may contain downloadable content
@ -296,5 +278,6 @@ class Channel9IE(InfoExtractor):
return self._extract_list(content_path) return self._extract_list(content_path)
else: else:
raise ExtractorError('Unexpected WT.entryid %s' % page_type, expected=True) raise ExtractorError('Unexpected WT.entryid %s' % page_type, expected=True)
else: # Assuming list else: # Assuming list
return self._extract_list(content_path) return self._extract_list(content_path)
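
The two `_VALID_URL` patterns in this file differ in how they treat a trailing `/RSS`: one captures it in a separate `rss` group, the other folds it into `contentpath`. The sketch below runs both patterns, copied from the diff, against URLs taken from the tests to show the difference. The constant names `WITH_RSS_GROUP` and `GREEDY` are labels chosen here, not identifiers from the extractor.

```python
import re

WITH_RSS_GROUP = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)'
GREEDY = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?'

urls = [
    'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
    'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman',
]

for url in urls:
    a = re.match(WITH_RSS_GROUP, url)
    b = re.match(GREEDY, url)
    # Print the lazy capture, the optional rss group, then the greedy capture.
    print(a.group('contentpath'), a.group('rss'), '|', b.group('contentpath'))
```

With the greedy pattern the query string ends up inside `contentpath`, which is then interpolated into `_RSS_URL`; the pattern with the `rss` group keeps the content path clean and lets `_real_extract` branch on the group.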


@ -1,51 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_end
class CharlieRoseIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?charlierose\.com/video(?:s|/player)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://charlierose.com/videos/27996',
'md5': 'fda41d49e67d4ce7c2411fd2c4702e09',
'info_dict': {
'id': '27996',
'ext': 'mp4',
'title': 'Remembering Zaha Hadid',
'thumbnail': 're:^https?://.*\.jpg\?\d+',
'description': 'We revisit past conversations with Zaha Hadid, in memory of the world renowned Iraqi architect.',
'subtitles': {
'en': [{
'ext': 'vtt',
}],
},
},
}, {
'url': 'https://charlierose.com/videos/27996',
'only_matching': True,
}]
_PLAYER_BASE = 'https://charlierose.com/video/player/%s'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(self._PLAYER_BASE % video_id, video_id)
title = remove_end(self._og_search_title(webpage), ' - Charlie Rose')
info_dict = self._parse_html5_media_entries(
self._PLAYER_BASE % video_id, webpage, video_id,
m3u8_entry_protocol='m3u8_native')[0]
self._sort_formats(info_dict['formats'])
self._remove_duplicate_formats(info_dict['formats'])
info_dict.update({
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage),
})
return info_dict

Some files were not shown because too many files have changed in this diff