Compare commits

119 Commits

Author SHA1 Message Date
Sergey M․
6c152ce20f release 2016.10.02 2016-10-02 15:58:00 +07:00
Sergey M․
26406d33c7 [ChangeLog] Actualize 2016-10-02 15:56:33 +07:00
Yen Chi Hsuan
703b3afa93 [amcnetworks] Skip a restricted _TEST 2016-10-02 14:25:06 +08:00
Yen Chi Hsuan
99ed78c79e [jwplatform] Support DASH streams 2016-10-02 14:07:49 +08:00
Yen Chi Hsuan
fd15264172 [jwplatform] Support old-style jwplayer playlists 2016-10-02 13:47:06 +08:00
Yen Chi Hsuan
bd26441205 [utils] Fix xattr error handling 2016-10-02 03:03:41 +08:00
Yen Chi Hsuan
b19e275d99 [__init__] Fix lost xattr if --embed-thumbnail used
Reported at
https://github.com/rg3/youtube-dl/issues/9054#issuecomment-250451823
2016-10-02 02:12:14 +08:00
Sergey M․
f6ba581f89 [byutv:event] Add extractor 2016-10-02 00:50:07 +07:00
Sergey M․
6d2549fb4f [byutv] Fix id and display id 2016-10-02 00:44:54 +07:00
Déstin Reed
4da4516973 [byutv] Rely on _match_id and _parse_json 2016-10-02 00:41:18 +07:00
Sergey M․
e1e97c2446 [periscope:user] Fix extraction (Closes #10820) 2016-10-01 22:50:47 +07:00
Yen Chi Hsuan
53a7e3d287 [utils] Support xattr as well as pyxattr
Closes #9054

There are two xattr packages in Python, pyxattr [1] and xattr [2]. They
have different APIs.

In the old days, pyxattr supported Linux only, while xattr supported Linux,
Mac, FreeBSD and Solaris. Recently pyxattr added support for Mac OS X. [3]

An old version of [2] is shipped with Mac OS X. However, some Linux
distributions have pyxattr only, for example PLD-Linux [4] and old Arch
Linux. [5] As a result, supporting both is the way to go.

[1] https://github.com/iustin/pyxattr
[2] https://github.com/xattr/xattr
[3] https://github.com/iustin/pyxattr/pull/9
[4] https://github.com/rg3/youtube-dl/issues/5498
[5] https://git.archlinux.org/svntogit/community.git/commit/?id=427c4c76401e386d865ccddea4fbfdc74df80492
    https://git.archlinux.org/svntogit/community.git/commit/?id=59b40da7b69622a6761d364a8b07909e9cccaa56
    python-xattr was added on 2016/06/29, while pyxattr has been there for
    more than 6 years
2016-10-01 20:13:04 +08:00
Yen Chi Hsuan
d54739a2e6 [downloader/http] xattr values should be bytes 2016-10-01 19:58:13 +08:00
Yen Chi Hsuan
63e0fd5bcc Merge pull request #10818 from TRox1972/criterion_match_id
[criterion] Rely on _match_id, improve regex and add thumbnail to test
2016-10-01 19:49:18 +08:00
Déstin Reed
9c51a24642 [criterion] Rely on _match_id, improve regex and add thumbnail to test 2016-10-01 13:46:48 +02:00
Yen Chi Hsuan
9bd7bd0b80 [twitch] Skip a 404 test 2016-10-01 16:38:47 +08:00
Yen Chi Hsuan
4a76b73c6c Merge pull request #10817 from TRox1972/clubic_match_id
[clubic] Rely on _match_id and _parse_json
2016-10-01 16:20:12 +08:00
Yen Chi Hsuan
e295618f9e [dctp] Fix extraction (closes #10734) 2016-10-01 15:22:48 +08:00
Yen Chi Hsuan
d7753d1948 [downloader/http] Use write_xattr function for --xattr-set-filesize 2016-10-01 14:47:20 +08:00
Déstin Reed
eaf9b22f94 [clubic] Rely on _match_id and _parse_json 2016-09-30 20:03:25 +02:00
Sergey M․
a1001f47fc [instagram] PEP 8 2016-10-01 00:16:08 +07:00
Déstin Reed
1609782258 [Instagram] Extract video dimensions 2016-10-01 00:13:34 +07:00
Sergey M․
de6babf922 [tvland] Extend _VALID_URL (Closes #10812) 2016-09-30 22:30:34 +07:00
Sergey M․
b0582fc806 [vgtv] Add support for tv.aftonbladet.se (Closes #10800) 2016-09-30 00:15:09 +07:00
Sergey M․
af33dd8ee7 [aftonbladet] Remove extractor 2016-09-30 00:13:03 +07:00
Sergey M․
70d7b323b6 [vk] Improve view count extraction 2016-09-29 23:52:29 +07:00
Sergey M․
a7ee8a00f4 [vk] Extract timestamp (Closes #10760) 2016-09-29 23:52:29 +07:00
Sergey M․
c6eed6b8c0 [utils] Lower priority for rare date formats and add tests 2016-09-29 23:52:29 +07:00
Kacper Michajłow
3aa3953d28 [vk] Fix date and view count extraction. 2016-09-29 23:52:29 +07:00
Yen Chi Hsuan
efa97bdcf1 Move write_xattr to utils.py
There are some other places that use xattr functions. It's better to
move it to a common place so that others can use it.
2016-09-30 00:28:32 +08:00
Sergey M․
475f8a4580 [vk] Add support for running live streams (Closes #10799) 2016-09-29 23:21:39 +07:00
Sergey M․
93aa0b6318 [vk] Add support for finished live streams (#10799) 2016-09-29 23:04:10 +07:00
Yen Chi Hsuan
0ce26ef228 Merge pull request #10788 from TRox1972/instagram_comments
[Instagram] Extract comments
2016-09-29 21:54:39 +08:00
Yen Chi Hsuan
0d72ff9c51 [leeco] Recognize more Le Sports URLs (#10794) 2016-09-29 21:39:35 +08:00
Déstin Reed
a56e74e271 [Instagram] Extract comments 2016-09-28 19:32:40 +02:00
Sergey M․
f533490bb7 [ketnet] Extract mzsource formats (#10770) 2016-09-28 22:58:25 +07:00
Remita Amine
8bfda726c2 [limelight:media] improve http formats extraction 2016-09-28 16:34:27 +01:00
Sergey M․
8f0cf20ab9 release 2016.09.27 2016-09-27 23:09:46 +07:00
Sergey M․
c8f45f763c [ChangeLog] Remove duplicate 2016-09-27 23:03:00 +07:00
Sergey M․
dd2cffeeec [ChangeLog] Actualize 2016-09-27 22:43:35 +07:00
Sergey M․
cdfcc4ce95 [mtv] Improve _VALID_URL 2016-09-27 22:27:10 +07:00
Kacper Michajłow
e384552590 [vk] Add support for dailymotion embeds
Fixes #10661
2016-09-27 21:58:14 +07:00
Sergey M․
1a2fbe322e [periscope] Treat timed_out state as finished stream 2016-09-27 21:55:51 +07:00
Sergey M․
f9dd86a112 [npo] Clarify IE_NAMEs (Closes #10775) 2016-09-27 21:37:33 +07:00
Remita Amine
2342733f85 fix tests related to 1978540a5122c53012e17a78841f3da0df77fd34(closes #10774) 2016-09-27 15:31:25 +01:00
Remita Amine
93933c9819 [awaan:video] fix test(closes #10773) 2016-09-27 15:31:25 +01:00
Yen Chi Hsuan
d75d9e343e [einthusan] Fix extraction (closes #10714) 2016-09-27 14:38:41 +08:00
Sergey M․
72c3d02d29 [promptfile] Improve and modernize 2016-09-26 23:39:54 +07:00
Ondřej Bárta
d3dbb46330 [promptfile] Fix extraction (Closes #10634) 2016-09-26 23:20:58 +07:00
Sergey M․
fffb9cff94 [kaltura] Speed up embed regexes (#10764) 2016-09-26 22:15:58 +07:00
Yen Chi Hsuan
d3c97bad61 Ignore and cleanup 3gp files 2016-09-26 14:14:37 +08:00
Sergey M․
2d5b4af007 [extractors] Add import for anderetijden extractor 2016-09-25 23:30:57 +07:00
Sergey M․
f1ee462c82 [PULL_REQUEST_TEMPLATE.md] Fix typo 2016-09-25 22:38:36 +07:00
Sergey M․
5742c18bc1 [npo] Add support for anderetijden.nl (Closes #10754) 2016-09-25 22:26:14 +07:00
Sergey M․
ddb19772d5 [vpro] Fix playlist title extraction and update tests 2016-09-25 22:26:06 +07:00
Sergey M․
a3d8b38168 [npo] Generalize playlist extractors 2016-09-25 22:26:00 +07:00
Sergey M․
e590b7ff9e [PULL_REQUEST_TEMPLATE.md] Add checkable Improvement options PR's purpose 2016-09-25 18:09:46 +07:00
Sergey M․
f3625cc4ca [PULL_REQUEST_TEMPLATE.md] Add Unlicense notice 2016-09-25 18:08:35 +07:00
stepshal
2d3d29976b [youtube] Change test URLs from http to https 2016-09-25 17:45:24 +07:00
Sergey M․
493353c7fd [prosiebensat1] Add support for advopedia 2016-09-25 06:25:57 +07:00
Sergey M․
0a078550b9 [prosiebensat1] Improve _VALID_URL 2016-09-25 06:19:17 +07:00
Sergey M․
f92bb612c6 [mwave] Relax _VALID_URLs (Closes #10735, closes #10748) 2016-09-25 06:14:32 +07:00
Sergey M․
ddde91952f [prosiebensat1] Fix playlist support (Closes #10745) 2016-09-25 05:36:18 +07:00
Sergey M․
63c583eb2c [prosiebensat1] Add support for sat1gold (#10745) 2016-09-25 04:43:10 +07:00
Remita Amine
7fd57de6fb [cbsnews:livevideo] fix extraction and extract m3u8 formats 2016-09-24 22:01:33 +01:00
Remita Amine
e71a450956 [common] add hdcore sign to akamai f4m formats 2016-09-24 21:55:53 +01:00
Remita Amine
27e99078d3 [brightcove:new] add support for live streams 2016-09-24 15:39:48 +01:00
Remita Amine
6f126d903f [download/hls] Delegate downloading to ffmpeg for live streams 2016-09-24 15:39:47 +01:00
Sergey M․
7518a61d41 [soundcloud] Fix typo in playlist base class name 2016-09-24 19:29:49 +07:00
Sergey M․
8e45e1cc4d [soundcloud] Generalize playlist entries extraction (#10733) 2016-09-24 19:18:01 +07:00
Yen Chi Hsuan
f0bc5a8609 [twitter] Support Periscope embeds (closes #10737)
Also update _TESTS
2016-09-24 20:00:29 +08:00
Remita Amine
a54ffb8aa7 [mtv] add common IE_NAME prefix for MTVIE and MTVVideoIE 2016-09-24 10:50:14 +01:00
Remita Amine
8add4bfecb [mtv] add support for new website urls(closes #8169)(closes #9808) 2016-09-24 10:42:20 +01:00
Yen Chi Hsuan
0711995bca [openload] Support subtitles (closes #10625) 2016-09-24 14:27:08 +08:00
Yen Chi Hsuan
5968d7d2fe [extractor/common] Improved support for HTML5 subtitles
Ref: #10625

In a strict sense, <track>s with kind=captions are not subtitles. [1]
openload misuses this attribute, and I expect more sites to do the same,
so I added handling for it to common.py.

Also allow extracting information from subtitles-only <video> or <audio>
tags, which is the case for openload.

[1] https://www.w3.org/TR/html5/embedded-content-0.html#attr-track-kind
2016-09-24 14:20:42 +08:00
Sergey M․
e6332059ac release 2016.09.24 2016-09-24 02:16:47 +07:00
Sergey M․
8eec691e8a [ChangeLog] Actualize 2016-09-24 02:12:49 +07:00
Sergey M․
24628cf7db [soundcloud:playlist] Provide video id for playlist entries (Closes #10733) 2016-09-24 02:01:01 +07:00
Sergey M․
71ad00c09f [prosiebensat1] Add support for kabeleinsdoku (Closes #10732) 2016-09-23 21:08:16 +07:00
Remita Amine
45cae3b021 [cbs] extract info from thunder videoPlayerService(closes #10728) 2016-09-22 19:28:22 +01:00
Yen Chi Hsuan
4ddcb5999d [openload] Fix extraction (closes #10408, closes #10727)
Thanks to @daniel100097 for providing a working version
2016-09-23 01:47:51 +08:00
Yen Chi Hsuan
628406db96 [Makefile] Cleanup files from fragment-based downloaders 2016-09-23 01:13:56 +08:00
Yen Chi Hsuan
e3d6bdc8fc [ustream] Support HLS streams (closes #10698) 2016-09-23 01:11:13 +08:00
Sergey M․
0a439c5c4c [udemy] Stringify video id 2016-09-22 21:48:53 +07:00
Remita Amine
1978540a51 [ooyala] extract all hls formats 2016-09-21 21:49:52 +01:00
Sergey M․
12f211d0cb [videomore] Fix embed regex 2016-09-21 22:51:36 +07:00
Remita Amine
3a5a18705f [adobepass] add support MSO that depend on watchTVeverywhere(closes #10709) 2016-09-21 15:57:27 +01:00
Remita Amine
1ae0ae5db0 [cartoonnetwork] add support Adobe Pass auth 2016-09-20 18:52:00 +01:00
Sergey M․
f62a77b99a [soundcloud] Modernize 2016-09-20 21:56:57 +07:00
coolsa
4bfd294e2f [soundcloud] Extract license metadata 2016-09-20 21:56:57 +07:00
Remita Amine
e33a7253b2 [fox] add support for Adobe Pass auth(closes #8584) 2016-09-20 15:52:23 +01:00
Remita Amine
c38f06818d add support for Adobe Pass auth in tbs,tnt and trutv extractors(fixes #10642)(closes #10222)(closes #10519) 2016-09-20 11:55:30 +01:00
Sergey M․
cb57386873 release 2016.09.19 2016-09-19 02:58:32 +07:00
Sergey M․
59fd8f931d [ChangeLog] Actualize 2016-09-19 02:57:14 +07:00
Sergey M․
70b4cf9b1b [crunchyroll] Check if already logged in (Closes #10700) 2016-09-19 02:50:06 +07:00
Sergey M․
cc764a6da8 [twitch:stream] Remove fallback to profile extraction when stream is offline
Main page does not contain profile videos anymore
2016-09-18 19:10:18 +07:00
Yen Chi Hsuan
d8dbf8707d [thisav] Improve title extraction (closes #10682)
I didn't add a test case as the one in #10682 looks like a copyrighted
product.
2016-09-18 18:35:38 +08:00
Sergey M․
a1da888d0c [vyborymos] Improve station info extraction 2016-09-18 17:30:55 +07:00
Sergey M․
3acff9423d release 2016.09.18 2016-09-18 17:16:55 +07:00
Sergey M․
9ca93b99d1 [ChangeLog] Actualize 2016-09-18 17:15:22 +07:00
Sergey M․
14ae11efab [vyborymos] Add extractor (Closes #10692) 2016-09-18 16:56:40 +07:00
Sergey M․
190d2027d0 [xfileshare] Add title regex for streamin.to and fallback to video id (Closes #10646) 2016-09-18 07:22:06 +07:00
Sergey M․
26394d021d [globo:article] Add support for multiple videos (Closes #10653) 2016-09-17 23:34:10 +07:00
Sergey M․
30d0b549be [extractor/common] Add manifest_url for hls and hds formats 2016-09-17 21:33:38 +07:00
Sergey M․
86f4d14f81 Refactor fragments interface and dash segments downloader
- Eliminate segment_urls and initialization_url
+ Introduce manifest_url (the manifest may contain unfragmented data; in this case url is used for the direct media URL and manifest_url for the manifest itself)
* Rewrite dashsegments downloader to use fragments data
* Improve generic mpd extraction
2016-09-17 20:35:22 +07:00
Sergey M․
21d21b0c72 [svt] Fix DASH formats extraction 2016-09-17 19:25:31 +07:00
Sergey M․
b4c1d6e800 [extractor/common] Expose fragments interface for dashsegments formats 2016-09-17 18:31:18 +07:00
Sergey M․
a0d5077c8d [extractor/common] Introduce fragments interface 2016-09-17 18:31:09 +07:00
Yen Chi Hsuan
584d6f3457 [thisav] Recognize jwplayers (closes #10447) 2016-09-17 18:46:43 +08:00
Yen Chi Hsuan
e14c82bd6b [jwplatform] Use js_to_json to detect more JWPlayers 2016-09-17 18:45:08 +08:00
Sergey M․
c51a7f0b2f [franceinter] Fix upload date extraction 2016-09-17 15:44:37 +07:00
Remita Amine
d05ef09d9d [mangomolo] fix domain regex 2016-09-17 08:11:01 +01:00
Remita Amine
30d9e20938 [postprocessor/ffmpeg] apply FFmpegFixupM3u8PP only for videos with aac codec(#5591) 2016-09-16 22:06:55 +01:00
Remita Amine
fc86d4eed0 [mangomolo] fix typo 2016-09-16 20:10:47 +01:00
Remita Amine
7d273a387a [mangomolo] add support for Mangomolo embeds 2016-09-16 19:31:39 +01:00
Remita Amine
6ad0219556 [common] add helper method for Wowza Streaming Engine format extraction 2016-09-16 19:30:38 +01:00
Remita Amine
98b7506e96 [toutv] add support for authentication(closes #10669) 2016-09-16 17:40:15 +01:00
Sergey M․
52dc8a9b3f [franceinter] Fix upload date extraction 2016-09-16 22:02:59 +07:00
Sergey M․
9d8985a165 [tv4] Fix hls and hds formats (Closes #10659) 2016-09-16 00:54:34 +07:00
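Commit 5968d7d2fe treats `<track>` elements with `kind=captions` as subtitles because openload misuses that attribute, and it also accepts `<video>`/`<audio>` tags that carry only tracks. The following is a minimal sketch of that idea using only Python's standard `html.parser`; it is not youtube-dl's extractor/common.py code, and the names `TrackCollector`, `extract_track_subtitles`, the `'und'` fallback language and the returned dict shape are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Track kinds treated as subtitles. Strictly only "subtitles" qualifies,
# but "captions" is accepted too, as the commit message describes.
SUBTITLE_KINDS = ('subtitles', 'captions')


class TrackCollector(HTMLParser):
    """Collects subtitle <track> elements nested in <video> or <audio>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.media_depth = 0   # how many <video>/<audio> tags are open
        self.subtitles = {}    # language -> list of {'url': absolute URL}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ('video', 'audio'):
            self.media_depth += 1
        elif tag == 'track' and self.media_depth > 0:
            # A missing kind attribute defaults to "subtitles" in HTML5.
            kind = (attrs.get('kind') or 'subtitles').lower()
            src = attrs.get('src')
            if kind in SUBTITLE_KINDS and src:
                # 'und' (undetermined) is an arbitrary fallback for this sketch.
                lang = attrs.get('srclang') or 'und'
                self.subtitles.setdefault(lang, []).append(
                    {'url': urljoin(self.base_url, src)})

    def handle_endtag(self, tag):
        if tag in ('video', 'audio') and self.media_depth > 0:
            self.media_depth -= 1


def extract_track_subtitles(html, base_url):
    """Return subtitle URLs grouped by language. A media tag whose only
    children are <track>s (no <source>) is handled the same way."""
    parser = TrackCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.subtitles


page = '''<video>
  <track kind="captions" src="/subs/en.vtt" srclang="en">
  <track kind="chapters" src="/chapters.vtt" srclang="en">
</video>'''
print(extract_track_subtitles(page, 'https://example.com/v/1'))
# {'en': [{'url': 'https://example.com/subs/en.vtt'}]}
```

Self-closing `<track ... />` tags also work, because `HTMLParser`'s default `handle_startendtag` forwards to `handle_starttag`.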
76 changed files with 2948 additions and 810 deletions

@@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.15*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.15**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.10.02*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.10.02**
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.09.15
+[debug] youtube-dl version 2016.10.02
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

@@ -10,8 +10,13 @@
 - [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
 - [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
+### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:
+- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)
+- [ ] I am not the original author of this code but it is in public domain or released under [Unlicense](http://unlicense.org/) (provide reliable evidence)
 ### What is the purpose of your *pull request*?
 - [ ] Bug fix
+- [ ] Improvement
 - [ ] New extractor
 - [ ] New feature

.gitignore

@@ -29,6 +29,7 @@ updates_key.pem
 *.m4a
 *.m4v
 *.mp3
+*.3gp
 *.part
 *.swp
 test/testdata

ChangeLog

@@ -1,3 +1,105 @@
version 2016.10.02
Core
* Fix possibly lost extended attributes during post-processing
+ Support pyxattr as well as python-xattr for --xattrs and
--xattr-set-filesize (#9054)
Extractors
+ [jwplatform] Support DASH streams in JWPlayer
+ [jwplatform] Support old-style JWPlayer playlists
+ [byutv:event] Add extractor
* [periscope:user] Fix extraction (#10820)
* [dctp] Fix extraction (#10734)
+ [instagram] Extract video dimensions (#10790)
+ [tvland] Extend URL regular expression (#10812)
+ [vgtv] Add support for tv.aftonbladet.se (#10800)
- [aftonbladet] Remove extractor
* [vk] Fix timestamp and view count extraction (#10760)
+ [vk] Add support for running and finished live streams (#10799)
+ [leeco] Recognize more Le Sports URLs (#10794)
+ [instagram] Extract comments (#10788)
+ [ketnet] Extract mzsource formats (#10770)
* [limelight:media] Improve HTTP formats extraction
version 2016.09.27
Core
+ Add hdcore query parameter to akamai f4m formats
+ Delegate HLS live streams downloading to ffmpeg
+ Improved support for HTML5 subtitles
Extractors
+ [vk] Add support for dailymotion embeds (#10661)
* [promptfile] Fix extraction (#10634)
* [kaltura] Speed up embed regular expressions (#10764)
+ [npo] Add support for anderetijden.nl (#10754)
+ [prosiebensat1] Add support for advopedia sites
* [mwave] Relax URL regular expression (#10735, #10748)
* [prosiebensat1] Fix playlist support (#10745)
+ [prosiebensat1] Add support for sat1gold sites (#10745)
+ [cbsnews:livevideo] Fix extraction and extract m3u8 formats
+ [brightcove:new] Add support for live streams
* [soundcloud] Generalize playlist entries extraction (#10733)
+ [mtv] Add support for new URL schema (#8169, #9808)
* [einthusan] Fix extraction (#10714)
+ [twitter] Support Periscope embeds (#10737)
+ [openload] Support subtitles (#10625)
version 2016.09.24
Core
+ Add support for watchTVeverywhere.com authentication provider based MSOs for
Adobe Pass authentication (#10709)
Extractors
+ [soundcloud:playlist] Provide video id for early playlist entries (#10733)
+ [prosiebensat1] Add support for kabeleinsdoku (#10732)
* [cbs] Extract info from thunder videoPlayerService (#10728)
* [openload] Fix extraction (#10408)
+ [ustream] Support the new HLS streams (#10698)
+ [ooyala] Extract all HLS formats
+ [cartoonnetwork] Add support for Adobe Pass authentication
+ [soundcloud] Extract license metadata
+ [fox] Add support for Adobe Pass authentication (#8584)
+ [tbs] Add support for Adobe Pass authentication (#10642, #10222)
+ [trutv] Add support for Adobe Pass authentication (#10519)
+ [turner] Add support for Adobe Pass authentication
version 2016.09.19
Extractors
+ [crunchyroll] Check if already authenticated (#10700)
- [twitch:stream] Remove fallback to profile extraction when stream is offline
* [thisav] Improve title extraction (#10682)
* [vyborymos] Improve station info extraction
version 2016.09.18
Core
+ Introduce manifest_url and fragments fields in formats dictionary for
fragmented media
+ Provide manifest_url field for DASH segments, HLS and HDS
+ Provide fragments field for DASH segments
* Rework DASH segments downloader to use fragments field
+ Add helper method for Wowza Streaming Engine formats extraction
Extractors
+ [vyborymos] Add extractor for vybory.mos.ru (#10692)
+ [xfileshare] Add title regular expression for streamin.to (#10646)
+ [globo:article] Add support for multiple videos (#10653)
+ [thisav] Recognize HTML5 videos (#10447)
* [jwplatform] Improve JWPlayer detection
+ [mangomolo] Add support for Mangomolo embeds
+ [toutv] Add support for authentication (#10669)
* [franceinter] Fix upload date extraction
* [tv4] Fix HLS and HDS formats extraction (#10659)
version 2016.09.15
Core

@@ -1,7 +1,7 @@
 all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
 clean:
-	rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
+	rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
 	find . -name "*.pyc" -delete
 	find . -name "*.class" -delete

@@ -34,12 +34,12 @@
 - **AdultSwim**
 - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
 - **AfreecaTV**: afreecatv.com
-- **Aftonbladet**
 - **AirMozilla**
 - **AlJazeera**
 - **Allocine**
 - **AlphaPorno**
 - **AMCNetworks**
+- **anderetijden**: npo.nl and ntr.nl
 - **AnimeOnDemand**
 - **anitube.se**
 - **AnySex**
@@ -111,6 +111,7 @@
 - **bt:vestlendingen**: Bergens Tidende - Vestlendingen
 - **BuzzFeed**
 - **BYUtv**
+- **BYUtvEvent**
 - **Camdemy**
 - **CamdemyFolder**
 - **CamWithHer**
@@ -127,8 +128,8 @@
 - **CBS**
 - **CBSInteractive**
 - **CBSLocal**
-- **CBSNews**: CBS News
-- **CBSNewsLiveVideo**: CBS News Live Videos
+- **cbsnews**: CBS News
+- **cbsnews:livevideo**: CBS News Live Videos
 - **CBSSports**
 - **CCTV**
 - **CDA**
@@ -388,6 +389,8 @@
 - **mailru**: Видео@Mail.Ru
 - **MakersChannel**
 - **MakerTV**
+- **mangomolo:live**
+- **mangomolo:video**
 - **MatchTV**
 - **MDR**: MDR.DE and KiKA
 - **media.ccc.de**
@@ -422,8 +425,9 @@
 - **MPORA**
 - **MSN**
 - **mtg**: MTG services
-- **MTV**
+- **mtv**
 - **mtv.de**
+- **mtv:video**
 - **mtvservices:embedded**
 - **MuenchenTV**: münchen.tv
 - **MusicPlayOn**
@@ -849,6 +853,7 @@
 - **VRT**
 - **vube**: Vube.com
 - **VuClip**
+- **VyboryMos**
 - **Walla**
 - **washingtonpost**
 - **washingtonpost:article**
@@ -862,7 +867,7 @@
 - **wholecloud**: WholeCloud
 - **Wimp**
 - **Wistia**
-- **WNL**
+- **wnl**: npo.nl and ntr.nl
 - **WorldStarHipHop**
 - **wrzuta.pl**
 - **wrzuta.pl:playlist**

@@ -292,6 +292,7 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(unified_strdate('25-09-2014'), '20140925')
         self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227')
         self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
+        self.assertEqual(unified_strdate('Feb 7, 2016 at 6:35 pm'), '20160207')
     def test_unified_timestamps(self):
         self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
@@ -312,6 +313,7 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(unified_timestamp('27.02.2016 17:30'), 1456594200)
         self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
         self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
+        self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
     def test_determine_ext(self):
         self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
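The two assertions added here cover the `'Feb 7, 2016 at 6:35 pm'` layout. As a rough illustration of matching date strings against an ordered list of formats (rarer layouts last), here is a sketch that tries `strptime` patterns in priority order and interprets the result as UTC. It is not youtube-dl's `unified_strdate`/`unified_timestamp`, which use a longer format list and extra normalization; `DATE_FORMATS`, `parse_date`, `to_strdate` and `to_timestamp` are hypothetical names, and an English (C) locale is assumed because `%b`, `%B` and `%p` are locale-dependent.

```python
import calendar
from datetime import datetime

# Hypothetical, abbreviated format list; earlier entries are tried first,
# so rarer layouts sit near the end.
DATE_FORMATS = (
    '%B %d, %Y',
    '%d-%m-%Y',
    '%d.%m.%Y %H:%M',
    '%B %d, %Y %I:%M %p',
    '%b %d, %Y at %I:%M %p',
)


def parse_date(date_str):
    """Return a naive datetime for the first matching format, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    return None


def to_strdate(date_str):
    """YYYYMMDD string, like the values asserted in the date tests."""
    dt = parse_date(date_str)
    return dt.strftime('%Y%m%d') if dt else None


def to_timestamp(date_str):
    """Seconds since the epoch, treating the parsed time as UTC."""
    dt = parse_date(date_str)
    return calendar.timegm(dt.timetuple()) if dt else None


print(to_strdate('Feb 7, 2016 at 6:35 pm'), to_timestamp('Feb 7, 2016 at 6:35 pm'))
# 20160207 1454870100
```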

@@ -266,8 +266,6 @@ def _real_main(argv=None):
         postprocessors.append({
             'key': 'FFmpegEmbedSubtitle',
         })
-    if opts.xattrs:
-        postprocessors.append({'key': 'XAttrMetadata'})
     if opts.embedthumbnail:
         already_have_thumbnail = opts.writethumbnail or opts.write_all_thumbnails
         postprocessors.append({
@@ -276,6 +274,10 @@ def _real_main(argv=None):
         })
         if not already_have_thumbnail:
             opts.writethumbnail = True
+    # XAttrMetadataPP should be run after post-processors that may change file
+    # contents
+    if opts.xattrs:
+        postprocessors.append({'key': 'XAttrMetadata'})
     # Please keep ExecAfterDownload towards the bottom as it allows the user to modify the final file in any way.
     # So if the user is able to remove the file before your postprocessor runs it might cause a few problems.
     if opts.exec_cmd:
@@ -283,12 +285,6 @@ def _real_main(argv=None):
             'key': 'ExecAfterDownload',
             'exec_cmd': opts.exec_cmd,
         })
-    if opts.xattr_set_filesize:
-        try:
-            import xattr
-            xattr  # Confuse flake8
-        except ImportError:
-            parser.error('setting filesize xattr requested but python-xattr is not available')
     external_downloader_args = None
     if opts.external_downloader_args:
         external_downloader_args = compat_shlex_split(opts.external_downloader_args)
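The reordering above appends `XAttrMetadata` only after the thumbnail embedder, so extended attributes end up on the final file instead of on one that a later post-processor rewrites, while `ExecAfterDownload` stays last. A simplified sketch of that ordering rule follows; the `opts` field names mirror those in the hunk, but `build_postprocessors`, the `SimpleNamespace` stand-in for parsed options, and the reduced descriptor dicts (the real thumbnail entry carries more keys) are illustrative assumptions.

```python
from types import SimpleNamespace


def build_postprocessors(opts):
    """Return post-processor descriptors in the order they should run."""
    postprocessors = []
    if opts.embedsubtitles:
        postprocessors.append({'key': 'FFmpegEmbedSubtitle'})
    if opts.embedthumbnail:
        # Embedding a thumbnail rewrites the media file, so anything that
        # annotates the file on disk must come after this step.
        postprocessors.append({'key': 'EmbedThumbnail'})
    # Extended attributes live on the file itself; adding them after the
    # content-changing steps keeps them from being lost.
    if opts.xattrs:
        postprocessors.append({'key': 'XAttrMetadata'})
    # User commands may move or delete the file, so they run at the very end.
    if opts.exec_cmd:
        postprocessors.append({'key': 'ExecAfterDownload', 'exec_cmd': opts.exec_cmd})
    return postprocessors


opts = SimpleNamespace(embedsubtitles=False, embedthumbnail=True,
                       xattrs=True, exec_cmd='echo {}')
print([pp['key'] for pp in build_postprocessors(opts)])
# ['EmbedThumbnail', 'XAttrMetadata', 'ExecAfterDownload']
```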

@@ -1,7 +1,6 @@
 from __future__ import unicode_literals
 import os
-import re
 from .fragment import FragmentFD
 from ..compat import compat_urllib_error
@@ -19,34 +18,32 @@ class DashSegmentsFD(FragmentFD):
     FD_NAME = 'dashsegments'
     def real_download(self, filename, info_dict):
-        base_url = info_dict['url']
-        segment_urls = [info_dict['segment_urls'][0]] if self.params.get('test', False) else info_dict['segment_urls']
-        initialization_url = info_dict.get('initialization_url')
+        segments = info_dict['fragments'][:1] if self.params.get(
+            'test', False) else info_dict['fragments']
         ctx = {
             'filename': filename,
-            'total_frags': len(segment_urls) + (1 if initialization_url else 0),
+            'total_frags': len(segments),
         }
         self._prepare_and_start_frag_download(ctx)
-        def combine_url(base_url, target_url):
-            if re.match(r'^https?://', target_url):
-                return target_url
-            return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
         segments_filenames = []
         fragment_retries = self.params.get('fragment_retries', 0)
         skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
-        def process_segment(segment, tmp_filename, fatal):
-            target_url, segment_name = segment
+        def process_segment(segment, tmp_filename, num):
+            segment_url = segment['url']
+            segment_name = 'Frag%d' % num
             target_filename = '%s-%s' % (tmp_filename, segment_name)
+            # In DASH, the first segment contains necessary headers to
+            # generate a valid MP4 file, so always abort for the first segment
+            fatal = num == 0 or not skip_unavailable_fragments
             count = 0
             while count <= fragment_retries:
                 try:
-                    success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
+                    success = ctx['dl'].download(target_filename, {'url': segment_url})
                     if not success:
                         return False
                     down, target_sanitized = sanitize_open(target_filename, 'rb')
@@ -72,16 +69,8 @@ class DashSegmentsFD(FragmentFD):
                 return False
             return True
-        segments_to_download = [(initialization_url, 'Init')] if initialization_url else []
-        segments_to_download.extend([
-            (segment_url, 'Seg%d' % i)
-            for i, segment_url in enumerate(segment_urls)])
-        for i, segment in enumerate(segments_to_download):
-            # In DASH, the first segment contains necessary headers to
-            # generate a valid MP4 file, so always abort for the first segment
-            fatal = i == 0 or not skip_unavailable_fragments
-            if not process_segment(segment, ctx['tmpfilename'], fatal):
+        for i, segment in enumerate(segments):
+            if not process_segment(segment, ctx['tmpfilename'], i):
                 return False
         self._finish_frag_download(ctx)
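After this refactor the downloader no longer joins `base_url` with relative segment paths: it uses each entry of `info_dict['fragments']` with its own `url` as is, and only the first fragment (which holds the initialization data) is always fatal. A minimal, network-free sketch of that loop, assuming a caller-supplied `fetch(url)` that returns bytes or raises `OSError`; `download_fragments`, `fetch`, `make_fetch` and the bytes-or-`None` return convention are hypothetical, and real downloads go through temporary files rather than an in-memory list.

```python
def download_fragments(info_dict, fetch, fragment_retries=0,
                       skip_unavailable_fragments=True, test=False):
    """Return the concatenated fragment data, or None on a fatal failure."""
    fragments = info_dict['fragments']
    if test:
        # As in the hunk: in test mode only the first fragment is fetched.
        fragments = fragments[:1]
    chunks = []
    for num, fragment in enumerate(fragments):
        # The first DASH fragment holds the headers needed for a valid MP4,
        # so its failure is always fatal.
        fatal = num == 0 or not skip_unavailable_fragments
        for _attempt in range(fragment_retries + 1):
            try:
                chunks.append(fetch(fragment['url']))
                break
            except OSError:
                continue
        else:
            # Every attempt failed: abort, or skip this fragment if allowed.
            if fatal:
                return None
    return b''.join(chunks)


def make_fetch(payloads, failing=()):
    """Illustrative fake transport: URLs in `failing` raise like a 404."""
    def fetch(url):
        if url in failing:
            raise OSError('HTTP Error 404: Not Found')
        return payloads[url]
    return fetch


payloads = {
    'https://cdn.example.com/init.mp4': b'INIT',
    'https://cdn.example.com/seg1.m4s': b'S1',
    'https://cdn.example.com/seg2.m4s': b'S2',
}
info = {'fragments': [{'url': url} for url in payloads]}
print(download_fragments(info, make_fetch(payloads, {'https://cdn.example.com/seg1.m4s'})))
# b'INITS2'
```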

@@ -31,7 +31,7 @@ class HlsFD(FragmentFD):
     FD_NAME = 'hlsnative'
     @staticmethod
-    def can_download(manifest):
+    def can_download(manifest, info_dict):
         UNSUPPORTED_FEATURES = (
             r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)',  # encrypted streams [1]
             r'#EXT-X-BYTERANGE',  # playlists composed of byte ranges of media files [2]
@@ -53,6 +53,7 @@ class HlsFD(FragmentFD):
         )
         check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
         check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
+        check_results.append(not info_dict.get('is_live'))
         return all(check_results)
     def real_download(self, filename, info_dict):
@@ -62,7 +63,7 @@ class HlsFD(FragmentFD):
         s = manifest.decode('utf-8', 'ignore')
-        if not self.can_download(s):
+        if not self.can_download(s, info_dict):
             self.report_warning(
                 'hlsnative has detected features it does not support, '
                 'extraction will be delegated to ffmpeg')
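The new `info_dict` parameter lets `can_download` reject live streams, so the native HLS downloader hands them to ffmpeg (commit 6f126d903f). A reduced sketch of the check, assuming only the two `UNSUPPORTED_FEATURES` patterns visible in this hunk (the real tuple is longer) and taking `can_decrypt_frag` as a plain parameter rather than however youtube-dl determines it:

```python
import re

# Only the two patterns visible in the diff; the real tuple lists more.
UNSUPPORTED_FEATURES = (
    r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)',  # encrypted streams
    r'#EXT-X-BYTERANGE',  # playlists composed of byte ranges of media files
)


def can_download(manifest, info_dict, can_decrypt_frag=False):
    """True if a simple native HLS downloader could handle this playlist."""
    check_results = [not re.search(feature, manifest)
                     for feature in UNSUPPORTED_FEATURES]
    # AES-128 playlists are fine only when decryption is available.
    check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
    # Live playlists keep growing, so they are delegated to ffmpeg instead.
    check_results.append(not info_dict.get('is_live'))
    return all(check_results)


vod = '#EXTM3U\n#EXTINF:10,\nseg0.ts\n#EXT-X-ENDLIST\n'
print(can_download(vod, {}), can_download(vod, {'is_live': True}))
# True False
```

Collecting every condition into `check_results` and reducing with `all()` keeps each rule on its own line, which is why adding the live-stream rule is a one-line change in the diff.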

@@ -13,6 +13,9 @@ from ..utils import (
     encodeFilename,
     sanitize_open,
     sanitized_Request,
+    write_xattr,
+    XAttrMetadataError,
+    XAttrUnavailableError,
 )
@@ -179,9 +182,8 @@ class HttpFD(FileDownloader):
                 if self.params.get('xattr_set_filesize', False) and data_len is not None:
                     try:
-                        import xattr
-                        xattr.setxattr(tmpfilename, 'user.ytdl.filesize', str(data_len))
-                    except(OSError, IOError, ImportError) as err:
+                        write_xattr(tmpfilename, 'user.ytdl.filesize', str(data_len).encode('utf-8'))
+                    except (XAttrUnavailableError, XAttrMetadataError) as err:
                         self.report_error('unable to set filesize xattr: %s' % str(err))
             try:
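`write_xattr` (moved to utils.py in efa97bdcf1 and extended in 53a7e3d287) hides the API difference between pyxattr and python-xattr, and after d54739a2e6 callers pass bytes. A hedged sketch of that dispatch, assuming pyxattr provides `xattr.set(item, name, value)` (which python-xattr lacks) and python-xattr provides `xattr.setxattr(f, attr, value)`, both under the import name `xattr`. The module is passed in explicitly so the sketch runs without either package installed; `write_xattr_with` and `XAttrError` are illustrative names, not the utils.py code.

```python
from types import SimpleNamespace


class XAttrError(Exception):
    """Illustrative error type; utils.py defines its own exception classes."""


def write_xattr_with(module, path, key, value):
    """Set extended attribute `key` on `path` using whichever xattr API
    `module` (the result of `import xattr`) exposes. `value` must be bytes."""
    if not isinstance(value, bytes):
        raise TypeError('xattr values should be bytes')
    if hasattr(module, 'set'):          # pyxattr-style API
        setter = module.set
    elif hasattr(module, 'setxattr'):   # python-xattr-style API
        setter = module.setxattr
    else:
        raise XAttrError('neither pyxattr nor python-xattr API found')
    try:
        setter(path, key, value)
    except EnvironmentError as e:
        # e.g. the filesystem does not support user extended attributes
        raise XAttrError('unable to set %s: %s' % (key, e))


# Stand-in for the pyxattr module so the sketch runs without it installed.
calls = []
fake_pyxattr = SimpleNamespace(set=lambda p, k, v: calls.append((p, k, v)))
write_xattr_with(fake_pyxattr, 'video.mp4.part', 'user.ytdl.filesize',
                 str(1048576).encode('utf-8'))
print(calls)  # [('video.mp4.part', 'user.ytdl.filesize', b'1048576')]
```

Checking for the pyxattr-style `set` first mirrors the order described in the commit message: prefer one API, fall back to the other, and fail with a single error type either way.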

File diff suppressed because it is too large

@@ -1,64 +0,0 @@
-# encoding: utf-8
-from __future__ import unicode_literals
-from .common import InfoExtractor
-from ..utils import int_or_none
-class AftonbladetIE(InfoExtractor):
-    _VALID_URL = r'https?://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
-    _TEST = {
-        'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
-        'info_dict': {
-            'id': '36015',
-            'ext': 'mp4',
-            'title': 'Vulkanutbrott i rymden - nu släpper NASA bilderna',
-            'description': 'Jupiters måne mest aktiv av alla himlakroppar',
-            'timestamp': 1394142732,
-            'upload_date': '20140306',
-        },
-    }
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        # find internal video meta data
-        meta_url = 'http://aftonbladet-play-metadata.cdn.drvideo.aptoma.no/video/%s.json'
-        player_config = self._parse_json(self._html_search_regex(
-            r'data-player-config="([^"]+)"', webpage, 'player config'), video_id)
-        internal_meta_id = player_config['aptomaVideoId']
-        internal_meta_url = meta_url % internal_meta_id
-        internal_meta_json = self._download_json(
-            internal_meta_url, video_id, 'Downloading video meta data')
-        # find internal video formats
-        format_url = 'http://aftonbladet-play.videodata.drvideo.aptoma.no/actions/video/?id=%s'
-        internal_video_id = internal_meta_json['videoId']
-        internal_formats_url = format_url % internal_video_id
-        internal_formats_json = self._download_json(
-            internal_formats_url, video_id, 'Downloading video formats')
-        formats = []
-        for fmt in internal_formats_json['formats']['http']['pseudostreaming']['mp4']:
-            p = fmt['paths'][0]
-            formats.append({
-                'url': 'http://%s:%d/%s/%s' % (p['address'], p['port'], p['path'], p['filename']),
-                'ext': 'mp4',
-                'width': int_or_none(fmt.get('width')),
-                'height': int_or_none(fmt.get('height')),
-                'tbr': int_or_none(fmt.get('bitrate')),
-                'protocol': 'http',
-            })
-        self._sort_formats(formats)
-        return {
-            'id': video_id,
-            'title': internal_meta_json['title'],
-            'formats': formats,
-            'thumbnail': internal_meta_json.get('imageUrl'),
-            'description': internal_meta_json.get('shortPreamble'),
-            'timestamp': int_or_none(internal_meta_json.get('timePublished')),
-            'duration': int_or_none(internal_meta_json.get('duration')),
-            'view_count': int_or_none(internal_meta_json.get('views')),
-        }

@@ -28,6 +28,7 @@ class AMCNetworksIE(ThePlatformIE):
             # m3u8 download
             'skip_download': True,
         },
+        'skip': 'Requires TV provider accounts',
     }, {
         'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
         'only_matching': True,

@@ -50,25 +50,6 @@ class AWAANBaseIE(InfoExtractor):
             'is_live': is_live,
         }
-    def _extract_video_formats(self, webpage, video_id, m3u8_entry_protocol):
-        formats = []
-        format_url_base = 'http' + self._html_search_regex(
-            [
-                r'file\s*:\s*"https?(://[^"]+)/playlist.m3u8',
-                r'<a[^>]+href="rtsp(://[^"]+)"'
-            ], webpage, 'format url')
-        formats.extend(self._extract_mpd_formats(
-            format_url_base + '/manifest.mpd',
-            video_id, mpd_id='dash', fatal=False))
-        formats.extend(self._extract_m3u8_formats(
-            format_url_base + '/playlist.m3u8', video_id, 'mp4',
-            m3u8_entry_protocol, m3u8_id='hls', fatal=False))
-        formats.extend(self._extract_f4m_formats(
-            format_url_base + '/manifest.f4m',
-            video_id, f4m_id='hds', fatal=False))
-        self._sort_formats(formats)
-        return formats
 class AWAANVideoIE(AWAANBaseIE):
     IE_NAME = 'awaan:video'
@@ -85,6 +66,7 @@ class AWAANVideoIE(AWAANBaseIE):
             'duration': 2041,
             'timestamp': 1227504126,
             'upload_date': '20081124',
+            'uploader_id': '71',
         },
     }, {
         'url': 'http://awaan.ae/video/26723981/%D8%AF%D8%A7%D8%B1-%D8%A7%D9%84%D8%B3%D9%84%D8%A7%D9%85:-%D8%AE%D9%8A%D8%B1-%D8%AF%D9%88%D8%B1-%D8%A7%D9%84%D8%A3%D9%86%D8%B5%D8%A7%D8%B1',
@@ -99,16 +81,18 @@ class AWAANVideoIE(AWAANBaseIE):
             video_id, headers={'Origin': 'http://awaan.ae'})
         info = self._parse_video_data(video_data, video_id, False)
-        webpage = self._download_webpage(
-            'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
-            compat_urllib_parse_urlencode({
-                'id': video_data['id'],
-                'user_id': video_data['user_id'],
-                'signature': video_data['signature'],
-                'countries': 'Q0M=',
-                'filter': 'DENY',
-            }), video_id)
-        info['formats'] = self._extract_video_formats(webpage, video_id, 'm3u8_native')
+        embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' + compat_urllib_parse_urlencode({
+            'id': video_data['id'],
+            'user_id': video_data['user_id'],
+            'signature': video_data['signature'],
+            'countries': 'Q0M=',
+            'filter': 'DENY',
+        })
+        info.update({
+            '_type': 'url_transparent',
+            'url': embed_url,
+            'ie_key': 'MangomoloVideo',
+        })
         return info
@@ -138,16 +122,18 @@ class AWAANLiveIE(AWAANBaseIE):
             channel_id, headers={'Origin': 'http://awaan.ae'})
         info = self._parse_video_data(channel_data, channel_id, True)
-        webpage = self._download_webpage(
-            'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
-            compat_urllib_parse_urlencode({
-                'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
-                'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
-                'signature': channel_data['signature'],
-                'countries': 'Q0M=',
-                'filter': 'DENY',
-            }), channel_id)
-        info['formats'] = self._extract_video_formats(webpage, channel_id, 'm3u8')
+        embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' + compat_urllib_parse_urlencode({
+            'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
+            'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
'signature': channel_data['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
})
info.update({
'_type': 'url_transparent',
'url': embed_url,
'ie_key': 'MangomoloLive',
})
return info
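
The AWAAN hunks stop scraping the Mangomolo embed page themselves. Instead, they build the embed URL and hand it to the new `MangomoloVideo`/`MangomoloLive` extractors through a `url_transparent` result. The sketch below shows the live-channel URL construction on its own. It assumes the same query parameters shown in the diff, and `build_live_embed_url` and `as_url_transparent` are hypothetical helper names, not youtube-dl API:

```python
import base64
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint copied from the diff; the helpers below are illustrative only.
EMBED_BASE = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?'


def b64(text):
    """Base64-encode a text id: str -> bytes -> base64 bytes -> str."""
    return base64.b64encode(text.encode()).decode()


def build_live_embed_url(channel_data):
    """Build the Mangomolo live embed URL from AWAAN channel data."""
    return EMBED_BASE + urlencode({
        'id': b64(channel_data['user_id']),
        'channelid': b64(channel_data['id']),
        'signature': channel_data['signature'],
        'countries': 'Q0M=',
        'filter': 'DENY',
    })


def as_url_transparent(info, embed_url, ie_key='MangomoloLive'):
    """Return a copy of info that defers format extraction to ie_key."""
    result = dict(info)
    result.update({
        '_type': 'url_transparent',
        'url': embed_url,
        'ie_key': ie_key,
    })
    return result


if __name__ == '__main__':
    url = build_live_embed_url({'user_id': '71', 'id': '12', 'signature': 'abc'})
    print(parse_qs(urlparse(url).query))
```

With `url_transparent`, the delegated extractor supplies the formats. The metadata set here, such as the title from the AWAAN API, is merged over its result.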

@@ -621,15 +621,21 @@ class BrightcoveNewIE(InfoExtractor):
'url': text_track['src'],
})
is_live = False
duration = float_or_none(json_data.get('duration'), 1000)
if duration and duration < 0:
is_live = True
return {
'id': video_id,
'title': title,
'title': self._live_title(title) if is_live else title,
'description': clean_html(json_data.get('description')),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000),
'duration': duration,
'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id,
'formats': formats,
'subtitles': subtitles,
'tags': json_data.get('tags', []),
'is_live': is_live,
}
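
The Brightcove hunk uses the sign of the duration field to decide `is_live`. The sketch below assumes Brightcove's `duration` is given in milliseconds, as the `1000` scale in the diff implies. It uses simplified re-implementations of `float_or_none` and `_live_title` (an approximation that appends the current date and time), not the real youtube-dl helpers:

```python
import datetime


def float_or_none(v, scale=1, invscale=1, default=None):
    """Simplified stand-in for youtube_dl.utils.float_or_none."""
    if v is None:
        return default
    try:
        return float(v) * invscale / scale
    except (ValueError, TypeError):
        return default


def live_title(name, now=None):
    """Approximation of InfoExtractor._live_title: append date and time."""
    now = now or datetime.datetime.now()
    return name + ' ' + now.strftime('%Y-%m-%d %H:%M')


def brightcove_timing(json_data, title, now=None):
    # Milliseconds -> seconds via scale=1000.
    duration = float_or_none(json_data.get('duration'), 1000)
    # A negative duration is taken as the marker of a live stream.
    is_live = duration is not None and duration < 0
    return {
        'title': live_title(title, now) if is_live else title,
        'duration': duration,
        'is_live': is_live,
    }
```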

@@ -1,6 +1,5 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@@ -8,15 +7,15 @@ from ..utils import ExtractorError
class BYUtvIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?byutv.org/watch/[0-9a-f-]+/(?P<video_id>[^/?#]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/(?!event/)(?P<id>[0-9a-f-]+)(?:/(?P<display_id>[^/?#&]+))?'
_TESTS = [{
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5',
'md5': '05850eb8c749e2ee05ad5a1c34668493',
'info_dict': {
'id': 'studio-c-season-5-episode-5',
'id': '6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
'display_id': 'studio-c-season-5-episode-5',
'ext': 'mp4',
'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'title': 'Season 5 Episode 5',
'description': 'md5:e07269172baff037f8e8bf9956bc9747',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 1486.486,
},
@@ -24,28 +23,71 @@ class BYUtvIE(InfoExtractor):
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
ep = self._parse_json(
episode_code, display_id, transform_source=lambda s:
re.sub(r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', s))
if ep['providerType'] != 'Ooyala':
raise ExtractorError('Unsupported provider %s' % ep['provider'])
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'],
'id': video_id,
'display_id': display_id,
'title': ep['title'],
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
}
class BYUtvEventIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?byutv\.org/watch/event/(?P<id>[0-9a-f-]+)'
_TEST = {
'url': 'http://www.byutv.org/watch/event/29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'info_dict': {
'id': '29941b9b-8bf6-48d2-aebf-7a87add9e34b',
'ext': 'mp4',
'title': 'Toledo vs. BYU (9/30/16)',
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
episode_json = re.sub(
r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', episode_code)
ep = json.loads(episode_json)
if ep['providerType'] == 'Ooyala':
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'],
'id': video_id,
'title': ep['title'],
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
}
else:
raise ExtractorError('Unsupported provider %s' % ep['provider'])
ooyala_id = self._search_regex(
r'providerId\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
webpage, 'ooyala id', group='id')
title = self._search_regex(
r'class=["\']description["\'][^>]*>\s*<h1>([^<]+)</h1>', webpage,
'title').strip()
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ooyala_id,
'id': video_id,
'title': title,
}
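
The new `BYUtvIE` passes a `transform_source` callback to `_parse_json`. This callback turns the page's JavaScript object literal into JSON before it is decoded. The standalone sketch below reuses both regexes from the diff. The sample page is made up, and the helper names are hypothetical:

```python
import json
import re

# Both patterns are copied from the BYUtv diff.
EPISODE_RE = r'(?s)episode:(.*?\}),\s*\n'
JS_PAIR_RE = r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\''


def js_object_to_json(js_code):
    """Quote bare keys and single-quoted values: key: 'v' -> "key": "v"."""
    return re.sub(JS_PAIR_RE, r'\1"\2": "\3"', js_code)


def parse_episode(webpage):
    mobj = re.search(EPISODE_RE, webpage)
    if mobj is None:
        raise ValueError('episode information not found')
    # Transform first, then decode: the same order _parse_json uses.
    return json.loads(js_object_to_json(mobj.group(1)))


# Hypothetical page fragment shaped like the one the extractor targets.
SAMPLE_PAGE = (
    "var player = {\n"
    "  episode: {\n"
    "    providerType: 'Ooyala',\n"
    "    providerId: 'abc123',\n"
    "    title: 'Season 5 Episode 5'\n"
    "  },\n"
    "  autoplay: true\n"
    "};\n"
)
```

The transform only handles single-quoted string values that start on their own line. A value containing a double quote or an escaped single quote would still yield invalid JSON.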

@@ -33,4 +33,10 @@ class CartoonNetworkIE(TurnerBaseIE):
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big',
'tokenizer_src': 'http://www.cartoonnetwork.com/cntv/mvpd/processors/services/token_ipadAdobe.do',
},
}, {
'url': url,
'site_name': 'CartoonNetwork',
'auth_required': self._search_regex(
r'_cnglobal\.cvpFullOrPreviewAuth\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true',
})

@@ -4,7 +4,9 @@ from .theplatform import ThePlatformFeedIE
from ..utils import (
int_or_none,
find_xpath_attr,
ExtractorError,
xpath_element,
xpath_text,
update_url_query,
)
@@ -47,27 +49,49 @@ class CBSIE(CBSBaseIE):
'only_matching': True,
}]
def _extract_video_info(self, guid):
path = 'dJ5BDC/media/guid/2198311517/' + guid
smil_url = 'http://link.theplatform.com/s/%s?mbr=true' % path
formats, subtitles = self._extract_theplatform_smil(smil_url + '&manifest=m3u', guid)
for r in ('OnceURL&formats=M3U', 'HLS&formats=M3U', 'RTMP', 'WIFI', '3G'):
try:
tp_formats, _ = self._extract_theplatform_smil(smil_url + '&assetTypes=' + r, guid, 'Downloading %s SMIL data' % r.split('&')[0])
formats.extend(tp_formats)
except ExtractorError:
def _extract_video_info(self, content_id):
items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True)
tp_path = 'dJ5BDC/media/guid/2198311517/%s' % content_id
tp_release_url = 'http://link.theplatform.com/s/' + tp_path
asset_types = []
subtitles = {}
formats = []
for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType')
if not asset_type or asset_type in asset_types:
continue
asset_types.append(asset_type)
query = {
'mbr': 'true',
'assetTypes': asset_type,
}
if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
query['formats'] = 'MPEG4,M3U'
elif asset_type in ('RTMP', 'WIFI', '3G'):
query['formats'] = 'MPEG4,FLV'
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(tp_release_url, query), content_id,
'Downloading %s SMIL data' % asset_type)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
metadata = self._download_theplatform_metadata(path, guid)
info = self._parse_theplatform_metadata(metadata)
info = self._extract_theplatform_metadata(tp_path, content_id)
info.update({
'id': guid,
'id': content_id,
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
'series': metadata.get('cbs$SeriesTitle'),
'season_number': int_or_none(metadata.get('cbs$SeasonNumber')),
'episode': metadata.get('cbs$EpisodeTitle'),
'episode_number': int_or_none(metadata.get('cbs$EpisodeNumber')),
})
return info
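
The rewritten CBS `_extract_video_info` no longer loops over a hard-coded list of asset types. It reads them from the player XML, skips duplicates, and chooses the `formats` query value per asset type. The sketch below isolates that selection logic. The sample XML and the `HLS_AES` asset name are hypothetical, and plain string concatenation stands in for youtube-dl's `update_url_query` because the release URL has no existing query:

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode, urlparse

# thePlatform release path copied from the CBS diff.
TP_RELEASE_BASE = 'http://link.theplatform.com/s/dJ5BDC/media/guid/2198311517/'


def smil_query(asset_type):
    query = {'mbr': 'true', 'assetTypes': asset_type}
    if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
        query['formats'] = 'MPEG4,M3U'
    elif asset_type in ('RTMP', 'WIFI', '3G'):
        query['formats'] = 'MPEG4,FLV'
    return query


def smil_urls(items_xml, content_id):
    """One SMIL URL per distinct, non-empty assetType in the player XML."""
    root = ET.fromstring(items_xml)
    seen = []
    urls = []
    for item in root.findall('.//item'):
        asset_type = item.findtext('assetType')
        if not asset_type or asset_type in seen:
            continue
        seen.append(asset_type)
        urls.append(TP_RELEASE_BASE + content_id + '?' + urlencode(smil_query(asset_type)))
    return urls


SAMPLE_XML = (
    '<result><items>'
    '<item><assetType>HLS_AES</assetType></item>'
    '<item><assetType>RTMP</assetType></item>'
    '<item><assetType>RTMP</assetType></item>'
    '<item></item>'
    '</items></result>'
)

if __name__ == '__main__':
    for u in smil_urls(SAMPLE_XML, 'abc'):
        print(parse_qs(urlparse(u).query))
```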

@@ -9,6 +9,7 @@ from ..utils import (
class CBSNewsIE(CBSIE):
IE_NAME = 'cbsnews'
IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
@@ -68,15 +69,16 @@ class CBSNewsIE(CBSIE):
class CBSNewsLiveVideoIE(InfoExtractor):
IE_NAME = 'cbsnews:livevideo'
IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[^/?#]+)'
# Live videos get deleted soon. See http://www.cbsnews.com/live/ for the latest examples
_TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh',
'ext': 'flv',
'ext': 'mp4',
'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334,
},
@@ -84,25 +86,22 @@ class CBSNewsLiveVideoIE(InfoExtractor):
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = self._download_json(
'http://feeds.cbsn.cbsnews.com/rundown/story', display_id, query={
'device': 'desktop',
'dvr_slug': display_id,
})
video_info = self._parse_json(self._html_search_regex(
r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story']
hdcore_sign = 'hdcore=3.3.1'
f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
if f4m_formats:
for entry in f4m_formats:
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
self._sort_formats(f4m_formats)
formats = self._extract_akamai_formats(video_info['url'], display_id)
self._sort_formats(formats)
return {
'id': video_id,
'id': display_id,
'display_id': display_id,
'title': video_info['headline'],
'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
'duration': parse_duration(video_info.get('segmentDur')),
'formats': f4m_formats,
'formats': formats,
}
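
`CBSNewsLiveVideoIE` now delegates to `_extract_akamai_formats`, which is added in the `InfoExtractor` hunk. That helper derives both the HDS and HLS manifests from a single Akamai URL. The sketch below lifts only its URL rewriting. The host is a placeholder. The need for the `hdcore` parameter is taken from the removed comment above ("URLs without the extra param induce an 404 error"), not from Akamai documentation:

```python
import re

HDCORE_SIGN = 'hdcore=3.7.0'


def akamai_manifest_urls(manifest_url):
    """Derive the HDS (f4m) and HLS (m3u8) manifest URLs from either one."""
    # HDS lives under /z/.../manifest.f4m, HLS under /i/.../master.m3u8.
    f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace(
        '/master.m3u8', '/manifest.f4m')
    # Append hdcore unless the URL already carries it.
    if 'hdcore=' not in f4m_url:
        f4m_url += ('&' if '?' in f4m_url else '?') + HDCORE_SIGN
    m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace(
        '/manifest.f4m', '/master.m3u8')
    return f4m_url, m3u8_url
```

In the real helper, the same `hdcore` string is also attached to every HDS format as `extra_param_to_segment_url`, so each fragment request carries it too.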

@@ -1,9 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
@@ -30,16 +27,14 @@ class ClubicIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
player_url = 'http://player.m6web.fr/v1/player/clubic/%s.html' % video_id
player_page = self._download_webpage(player_url, video_id)
config_json = self._search_regex(
config = self._parse_json(self._search_regex(
r'(?m)M6\.Player\.config\s*=\s*(\{.+?\});$', player_page,
'configuration')
config = json.loads(config_json)
'configuration'), video_id)
video_info = config['videoInfo']
sources = config['sources']

@@ -87,6 +87,9 @@ class InfoExtractor(object):
Potential fields:
* url Mandatory. The URL of the video file
* manifest_url
The URL of the manifest file in case of
fragmented media (DASH, hls, hds)
* ext Will be calculated from URL if missing
* format A human-readable description of the format
("mp4 container with h264/opus").
@@ -115,6 +118,11 @@ class InfoExtractor(object):
download, lower-case.
"http", "https", "rtsp", "rtmp", "rtmpe",
"m3u8", "m3u8_native" or "http_dash_segments".
* fragments A list of fragments of the fragmented media,
with the following entries:
* "url" (mandatory) - fragment's URL
* "duration" (optional, int or float)
* "filesize" (optional, int)
* preference Order number of this format. If this field is
present and not None, the formats get sorted
by this field, regardless of all other values.
@@ -1142,6 +1150,7 @@ class InfoExtractor(object):
formats.append({
'format_id': format_id,
'url': manifest_url,
'manifest_url': manifest_url,
'ext': 'flv' if bootstrap_info is not None else None,
'tbr': tbr,
'width': width,
@@ -1247,9 +1256,11 @@ class InfoExtractor(object):
# format_id intact.
if not live:
format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats)))
manifest_url = format_url(line.strip())
f = {
'format_id': '-'.join(format_id),
'url': format_url(line.strip()),
'url': manifest_url,
'manifest_url': manifest_url,
'tbr': tbr,
'ext': ext,
'fps': float_or_none(last_info.get('FRAME-RATE')),
@@ -1521,9 +1532,10 @@ class InfoExtractor(object):
mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group()
return self._parse_mpd_formats(
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict)
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url,
formats_dict=formats_dict, mpd_url=mpd_url)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}):
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
"""
Parse formats from MPD manifest.
References:
@@ -1544,42 +1556,52 @@ class InfoExtractor(object):
def extract_multisegment_info(element, ms_parent_info):
ms_info = ms_parent_info.copy()
# As per [1, 5.3.9.2.2] SegmentList and SegmentTemplate share some
# common attributes and elements. We will only extract relevant
# for us.
def extract_common(source):
segment_timeline = source.find(_add_ns('SegmentTimeline'))
if segment_timeline is not None:
s_e = segment_timeline.findall(_add_ns('S'))
if s_e:
ms_info['total_number'] = 0
ms_info['s'] = []
for s in s_e:
r = int(s.get('r', 0))
ms_info['total_number'] += 1 + r
ms_info['s'].append({
't': int(s.get('t', 0)),
# @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
'd': int(s.attrib['d']),
'r': r,
})
start_number = source.get('startNumber')
if start_number:
ms_info['start_number'] = int(start_number)
timescale = source.get('timescale')
if timescale:
ms_info['timescale'] = int(timescale)
segment_duration = source.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
def extract_Initialization(source):
initialization = source.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
segment_list = element.find(_add_ns('SegmentList'))
if segment_list is not None:
extract_common(segment_list)
extract_Initialization(segment_list)
segment_urls_e = segment_list.findall(_add_ns('SegmentURL'))
if segment_urls_e:
ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e]
initialization = segment_list.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
else:
segment_template = element.find(_add_ns('SegmentTemplate'))
if segment_template is not None:
start_number = segment_template.get('startNumber')
if start_number:
ms_info['start_number'] = int(start_number)
segment_timeline = segment_template.find(_add_ns('SegmentTimeline'))
if segment_timeline is not None:
s_e = segment_timeline.findall(_add_ns('S'))
if s_e:
ms_info['total_number'] = 0
ms_info['s'] = []
for s in s_e:
r = int(s.get('r', 0))
ms_info['total_number'] += 1 + r
ms_info['s'].append({
't': int(s.get('t', 0)),
# @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
'd': int(s.attrib['d']),
'r': r,
})
else:
timescale = segment_template.get('timescale')
if timescale:
ms_info['timescale'] = int(timescale)
segment_duration = segment_template.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
extract_common(segment_template)
media_template = segment_template.get('media')
if media_template:
ms_info['media_template'] = media_template
@@ -1587,11 +1609,14 @@ class InfoExtractor(object):
if initialization:
ms_info['initialization_url'] = initialization
else:
initialization = segment_template.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
extract_Initialization(segment_template)
return ms_info
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
formats = []
for period in mpd_doc.findall(_add_ns('Period')):
@@ -1634,6 +1659,7 @@ class InfoExtractor(object):
f = {
'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
'url': base_url,
'manifest_url': mpd_url,
'ext': mimetype2ext(mime_type),
'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')),
@@ -1648,9 +1674,7 @@ class InfoExtractor(object):
}
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
if 'total_number' not in representation_ms_info and 'segment_duration':
segment_duration = float(representation_ms_info['segment_duration']) / float(representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template)
@@ -1659,46 +1683,79 @@ class InfoExtractor(object):
# As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
# can't be used at the same time
if '%(Number' in media_template:
representation_ms_info['segment_urls'] = [
media_template % {
if '%(Number' in media_template and 's' not in representation_ms_info:
segment_duration = None
if 'total_number' not in representation_ms_info and 'segment_duration':
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
representation_ms_info['fragments'] = [{
'url': media_template % {
'Number': segment_number,
'Bandwidth': representation_attrib.get('bandwidth'),
}
for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
},
'duration': segment_duration,
} for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
else:
representation_ms_info['segment_urls'] = []
# $Number*$ or $Time$ in media template with S list available
# Example $Number*$: http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg
# Example $Time$: https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411
representation_ms_info['fragments'] = []
segment_time = 0
segment_d = None
segment_number = representation_ms_info['start_number']
def add_segment_url():
representation_ms_info['segment_urls'].append(
media_template % {
'Time': segment_time,
'Bandwidth': representation_attrib.get('bandwidth'),
}
)
segment_url = media_template % {
'Time': segment_time,
'Bandwidth': representation_attrib.get('bandwidth'),
'Number': segment_number,
}
representation_ms_info['fragments'].append({
'url': segment_url,
'duration': float_or_none(segment_d, representation_ms_info['timescale']),
})
for num, s in enumerate(representation_ms_info['s']):
segment_time = s.get('t') or segment_time
segment_d = s['d']
add_segment_url()
segment_number += 1
for r in range(s.get('r', 0)):
segment_time += s['d']
segment_time += segment_d
add_segment_url()
segment_time += s['d']
if 'segment_urls' in representation_ms_info:
segment_number += 1
segment_time += segment_d
elif 'segment_urls' in representation_ms_info and 's' in representation_ms_info:
# No media template
# Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
# or any YouTube dashsegments video
fragments = []
s_num = 0
for segment_url in representation_ms_info['segment_urls']:
s = representation_ms_info['s'][s_num]
for r in range(s.get('r', 0) + 1):
fragments.append({
'url': segment_url,
'duration': float_or_none(s['d'], representation_ms_info['timescale']),
})
representation_ms_info['fragments'] = fragments
# NB: MPD manifest may contain direct URLs to unfragmented media.
# No fragments key is present in this case.
if 'fragments' in representation_ms_info:
f.update({
'segment_urls': representation_ms_info['segment_urls'],
'fragments': [],
'protocol': 'http_dash_segments',
})
if 'initialization_url' in representation_ms_info:
initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
f.update({
'initialization_url': initialization_url,
})
if not f.get('url'):
f['url'] = initialization_url
f['fragments'].append({'url': initialization_url})
f['fragments'].extend(representation_ms_info['fragments'])
for fragment in f['fragments']:
fragment['url'] = combine_url(base_url, fragment['url'])
try:
existing_format = next(
fo for fo in formats
@@ -1771,7 +1828,7 @@ class InfoExtractor(object):
for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
if not kind or kind == 'subtitles':
if not kind or kind in ('subtitles', 'captions'):
src = track_attributes.get('src')
if not src:
continue
@@ -1779,22 +1836,70 @@ class InfoExtractor(object):
media_info['subtitles'].setdefault(lang, []).append({
'url': absolute_url(src),
})
if media_info['formats']:
if media_info['formats'] or media_info['subtitles']:
entries.append(media_info)
return entries
def _extract_akamai_formats(self, manifest_url, video_id):
formats = []
hdcore_sign = 'hdcore=3.7.0'
f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
formats.extend(self._extract_f4m_formats(
update_url_query(f4m_url, {'hdcore': '3.7.0'}),
video_id, f4m_id='hds', fatal=False))
if 'hdcore=' not in f4m_url:
f4m_url += ('&' if '?' in f4m_url else '?') + hdcore_sign
f4m_formats = self._extract_f4m_formats(
f4m_url, video_id, f4m_id='hds', fatal=False)
for entry in f4m_formats:
entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.extend(f4m_formats)
m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
return formats
def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
url = re.sub(r'/(?:manifest|playlist|jwplayer)\.(?:m3u8|f4m|mpd|smil)', '', url)
url_base = self._search_regex(r'(?:https?|rtmp|rtsp)(://[^?]+)', url, 'format url')
http_base_url = 'http' + url_base
formats = []
if 'm3u8' not in skip_protocols:
formats.extend(self._extract_m3u8_formats(
http_base_url + '/playlist.m3u8', video_id, 'mp4',
m3u8_entry_protocol, m3u8_id='hls', fatal=False))
if 'f4m' not in skip_protocols:
formats.extend(self._extract_f4m_formats(
http_base_url + '/manifest.f4m',
video_id, f4m_id='hds', fatal=False))
if re.search(r'(?:/smil:|\.smil)', url_base):
if 'dash' not in skip_protocols:
formats.extend(self._extract_mpd_formats(
http_base_url + '/manifest.mpd',
video_id, mpd_id='dash', fatal=False))
if 'smil' not in skip_protocols:
rtmp_formats = self._extract_smil_formats(
http_base_url + '/jwplayer.smil',
video_id, fatal=False)
for rtmp_format in rtmp_formats:
rtsp_format = rtmp_format.copy()
rtsp_format['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
del rtsp_format['play_path']
del rtsp_format['ext']
rtsp_format.update({
'url': rtsp_format['url'].replace('rtmp://', 'rtsp://'),
'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
'protocol': 'rtsp',
})
formats.extend([rtmp_format, rtsp_format])
else:
for protocol in ('rtmp', 'rtsp'):
if protocol not in skip_protocols:
formats.append({
'url': protocol + url_base,
'format_id': protocol,
'protocol': protocol,
})
return formats
def _live_title(self, name):
""" Generate the title for a live video """
now = datetime.datetime.now()
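
The reworked `_parse_mpd_formats` expands a `SegmentTimeline` into an explicit `fragments` list of URLs with durations, replacing the bare `segment_urls` list. The standalone sketch below mirrors that expansion for a media template using `$Time$` or `$Number$`. `expand_segment_timeline` is a hypothetical name. The `s_list` dicts follow the `{'t', 'd', 'r'}` shape built by `extract_common`. Width specifiers such as `$Number%05d$` are not handled here:

```python
import re


def expand_segment_timeline(media_template, representation_id, s_list,
                            timescale, start_number=1, bandwidth=None):
    """Expand a DASH SegmentTimeline into [{'url': ..., 'duration': ...}].

    s_list holds dicts like {'t': start, 'd': duration, 'r': repeats} in
    `timescale` units; each S entry yields r + 1 segments. `bandwidth` must
    be an int if the template uses $Bandwidth$.
    """
    template = media_template.replace('$RepresentationID$', representation_id)
    # $Number$ / $Time$ / $Bandwidth$ -> %-style named placeholders.
    template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', template)
    fragments = []
    segment_time = 0
    segment_number = start_number
    for s in s_list:
        # A non-zero @t sets the start; otherwise continue from the last end.
        segment_time = s.get('t') or segment_time
        for _ in range(s.get('r', 0) + 1):
            fragments.append({
                'url': template % {
                    'Time': segment_time,
                    'Number': segment_number,
                    'Bandwidth': bandwidth,
                },
                'duration': s['d'] / float(timescale),
            })
            segment_number += 1
            segment_time += s['d']
    return fragments
```

Storing a per-fragment duration is what lets the `fragments` field, documented in the `InfoExtractor` docstring hunk above, carry an optional `"duration"` entry.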

@@ -1,8 +1,6 @@
# -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@@ -16,20 +14,20 @@ class CriterionIE(InfoExtractor):
'ext': 'mp4',
'title': 'Le Samouraï',
'description': 'md5:a2b4b116326558149bef81f76dcbb93f',
'thumbnail': 're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
final_url = self._search_regex(
r'so.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
r'so\.addVariable\("videoURL", "(.+?)"\)\;', webpage, 'video url')
title = self._og_search_title(webpage)
description = self._html_search_meta('description', webpage)
thumbnail = self._search_regex(
r'so.addVariable\("thumbnailURL", "(.+?)"\)\;',
r'so\.addVariable\("thumbnailURL", "(.+?)"\)\;',
webpage, 'thumbnail url')
return {
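
The Criterion hunk escapes the dot in `so.addVariable`. Unescaped, `.` matches any character, so the old pattern could also match unrelated text. The demonstration below uses made-up page snippets; `DECOY` is contrived and not taken from criterion.com:

```python
import re

# Old pattern: '.' is unescaped and matches any character.
LOOSE_RE = r'so.addVariable\("videoURL", "(.+?)"\)\;'
# New pattern: '\.' matches only a literal dot.
STRICT_RE = r'so\.addVariable\("videoURL", "(.+?)"\)\;'

GOOD = 'so.addVariable("videoURL", "http://example.com/movie.mp4");'
DECOY = 'alsoXaddVariable("videoURL", "http://example.com/decoy.mp4");'


def find_video_url(pattern, webpage):
    """Return the first captured URL, or None if the pattern does not match."""
    mobj = re.search(pattern, webpage)
    return mobj.group(1) if mobj else None
```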

@@ -46,6 +46,13 @@ class CrunchyrollBaseIE(InfoExtractor):
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
def is_logged(webpage):
return '<title>Redirecting' in webpage
# Already logged in
if is_logged(login_page):
return
login_form_str = self._search_regex(
r'(?P<form><form[^>]+?id=(["\'])%s\2[^>]*>)' % self._LOGIN_FORM,
login_page, 'login form', group='form')
@@ -69,7 +76,7 @@ class CrunchyrollBaseIE(InfoExtractor):
headers={'Content-Type': 'application/x-www-form-urlencoded'})
# Successful login
if '<title>Redirecting' in response:
if is_logged(response):
return
error = self._html_search_regex(

@@ -1,61 +1,54 @@
# encoding: utf-8
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import unified_strdate
class DctpTvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dctp\.tv/(#/)?filme/(?P<id>.+?)/$'
_TEST = {
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'md5': '174dd4a8a6225cf5655952f969cfbe24',
'info_dict': {
'id': '1324',
'id': '95eaa4f33dad413aa17b4ee613cccc6c',
'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
'ext': 'flv',
'title': 'Videoinstallation für eine Kaufhausfassade'
'ext': 'mp4',
'title': 'Videoinstallation für eine Kaufhausfassade',
'description': 'Kurzfilm',
'upload_date': '20110407',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
# rtmp download
'skip_download': True,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
base_url = 'http://dctp-ivms2-restapi.s3.amazonaws.com/'
version_json = self._download_json(
base_url + 'version.json',
video_id, note='Determining file version')
version = version_json['version_name']
info_json = self._download_json(
'{0}{1}/restapi/slugs/{2}.json'.format(base_url, version, video_id),
video_id, note='Fetching object ID')
object_id = compat_str(info_json['object_id'])
meta_json = self._download_json(
'{0}{1}/restapi/media/{2}.json'.format(base_url, version, object_id),
video_id, note='Downloading metadata')
uuid = meta_json['uuid']
title = meta_json['title']
wide = meta_json['is_wide']
if wide:
ratio = '16x9'
else:
ratio = '4x3'
play_path = 'mp4:{0}_dctp_0500_{1}.m4v'.format(uuid, ratio)
webpage = self._download_webpage(url, video_id)
object_id = self._html_search_meta('DC.identifier', webpage)
servers_json = self._download_json(
'http://www.dctp.tv/streaming_servers/',
'http://www.dctp.tv/elastic_streaming_client/get_streaming_server/',
video_id, note='Downloading server list')
url = servers_json[0]['endpoint']
server = servers_json[0]['server']
m3u8_path = self._search_regex(
r'\'([^\'"]+/playlist\.m3u8)"', webpage, 'm3u8 path')
formats = self._extract_m3u8_formats(
'http://%s%s' % (server, m3u8_path), video_id, ext='mp4',
entry_protocol='m3u8_native')
title = self._og_search_title(webpage)
description = self._html_search_meta('DC.description', webpage)
upload_date = unified_strdate(
self._html_search_meta('DC.date.created', webpage))
thumbnail = self._og_search_thumbnail(webpage)
return {
'id': object_id,
'title': title,
'format': 'rtmp',
'url': url,
'play_path': play_path,
'rtmp_real_time': True,
'ext': 'flv',
'display_id': video_id
'formats': formats,
'display_id': video_id,
'description': description,
'upload_date': upload_date,
'thumbnail': thumbnail,
}

@@ -14,7 +14,7 @@ class EinthusanIE(InfoExtractor):
_TESTS = [
{
'url': 'http://www.einthusan.com/movies/watch.php?id=2447',
'md5': 'af244f4458cd667205e513d75da5b8b1',
'md5': 'd71379996ff5b7f217eca034c34e3461',
'info_dict': {
'id': '2447',
'ext': 'mp4',
@@ -25,13 +25,13 @@ class EinthusanIE(InfoExtractor):
},
{
'url': 'http://www.einthusan.com/movies/watch.php?id=1671',
'md5': 'ef63c7a803e22315880ed182c10d1c5c',
'md5': 'b16a6fd3c67c06eb7c79c8a8615f4213',
'info_dict': {
'id': '1671',
'ext': 'mp4',
'title': 'Soodhu Kavvuum',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'md5:05d8a0c0281a4240d86d76e14f2f4d51',
'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
}
},
]
@@ -50,9 +50,11 @@ class EinthusanIE(InfoExtractor):
video_id = self._search_regex(
r'data-movieid=["\'](\d+)', webpage, 'video id', default=video_id)
video_url = self._download_webpage(
m3u8_url = self._download_webpage(
'http://cdn.einthusan.com/geturl/%s/hd/London,Washington,Toronto,Dallas,San,Sydney/'
% video_id, video_id)
% video_id, video_id, headers={'Referer': url})
formats = self._extract_m3u8_formats(
m3u8_url, video_id, ext='mp4', entry_protocol='m3u8_native')
description = self._html_search_meta('description', webpage)
thumbnail = self._html_search_regex(
@@ -64,7 +66,7 @@ class EinthusanIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'url': video_url,
'formats': formats,
'thumbnail': thumbnail,
'description': description,
}

@@ -31,7 +31,6 @@ from .aenetworks import (
HistoryTopicIE,
)
from .afreecatv import AfreecaTVIE
from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
@@ -117,7 +116,10 @@ from .brightcove import (
BrightcoveNewIE,
)
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .byutv import (
BYUtvIE,
BYUtvEventIE,
)
from .c56 import C56IE
from .camdemy import (
CamdemyIE,
@@ -472,6 +474,10 @@ from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .mangomolo import (
MangomoloVideoIE,
MangomoloLiveIE,
)
from .matchtv import MatchTVIE
from .mdr import MDRIE
from .meta import METAIE
@@ -512,6 +518,7 @@ from .movingimage import MovingImageIE
from .msn import MSNIE
from .mtv import (
MTVIE,
MTVVideoIE,
MTVServicesEmbeddedIE,
MTVDEIE,
)
@@ -607,13 +614,14 @@ from .nowtv import (
)
from .noz import NozIE
from .npo import (
AndereTijdenIE,
NPOIE,
NPOLiveIE,
NPORadioIE,
NPORadioFragmentIE,
SchoolTVIE,
VPROIE,
WNLIE
WNLIE,
)
from .npr import NprIE
from .nrk import (
@@ -1065,6 +1073,7 @@ from .vporn import VpornIE
from .vrt import VRTIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vyborymos import VyboryMosIE
from .walla import WallaIE
from .washingtonpost import (
WashingtonPostIE,

@@ -11,9 +11,13 @@ class Formula1IE(InfoExtractor):
'md5': '8c79e54be72078b26b89e0e111c0502b',
'info_dict': {
'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV',
'ext': 'flv',
'ext': 'mp4',
'title': 'Race highlights - Spain 2016',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://www.formula1.com/en/video/2016/5/Race_highlights_-_Spain_2016.html',

@@ -1,14 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .adobepass import AdobePassIE
from ..utils import (
smuggle_url,
update_url_query,
)
class FOXIE(InfoExtractor):
class FOXIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.fox.com/watch/255180355939/7684182528',
@@ -30,14 +30,26 @@ class FOXIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
release_url = self._parse_json(self._search_regex(
r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
video_id)['release_url']
settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), video_id)
fox_pdk_player = settings['fox_pdk_player']
release_url = fox_pdk_player['release_url']
query = {
'mbr': 'true',
'switch': 'http'
}
if fox_pdk_player.get('access') == 'locked':
ap_p = settings['foxAdobePassProvider']
rating = ap_p.get('videoRating')
if rating == 'n/a':
rating = None
resource = self._get_mvpd_resource('fbc-fox', None, ap_p['videoGUID'], rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, 'fbc-fox', resource)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(update_url_query(
release_url, {'switch': 'http'}), {'force_smil_url': True}),
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
'id': video_id,
}

@@ -2,7 +2,6 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import month_by_name
@@ -10,14 +9,14 @@ class FranceInterIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?franceinter\.fr/emissions/(?P<id>[^?#]+)'
_TEST = {
'url': 'https://www.franceinter.fr/emissions/la-marche-de-l-histoire/la-marche-de-l-histoire-18-decembre-2013',
'md5': '4764932e466e6f6c79c317d2e74f6884',
'url': 'https://www.franceinter.fr/emissions/affaires-sensibles/affaires-sensibles-07-septembre-2016',
'md5': '9e54d7bdb6fdc02a841007f8a975c094',
'info_dict': {
'id': 'la-marche-de-l-histoire/la-marche-de-l-histoire-18-decembre-2013',
'id': 'affaires-sensibles/affaires-sensibles-07-septembre-2016',
'ext': 'mp3',
'title': 'LHistoire dans les jeux vidéo du 18 décembre 2013 - France Inter',
'description': 'md5:7f2ce449894d1e585932273080fb410d',
'upload_date': '20131218',
'title': 'Affaire Cahuzac : le contentieux du compte en Suisse',
'description': 'md5:401969c5d318c061f86bda1fa359292b',
'upload_date': '20160907',
},
}
@@ -39,7 +38,8 @@ class FranceInterIE(InfoExtractor):
if upload_date_str:
upload_date_list = upload_date_str.split()
upload_date_list.reverse()
upload_date_list[1] = compat_str(month_by_name(upload_date_list[1], lang='fr'))
upload_date_list[1] = '%02d' % (month_by_name(upload_date_list[1], lang='fr') or 0)
upload_date_list[2] = '%02d' % int(upload_date_list[2])
upload_date = ''.join(upload_date_list)
else:
upload_date = None
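Why the zero padding matters: with the old `compat_str(...)` line, `7 septembre 2016` reverses to `['2016', 'septembre', '7']` and comes out as `201697`, which is not a valid `YYYYMMDD` value. The sketch below reproduces the new logic outside youtube-dl. The `FRENCH_MONTHS` table and `french_month_number` helper are hypothetical stand-ins for `month_by_name(..., lang='fr')`, and the sketch assumes the scraped string is always `day month year`.

```python
# Hypothetical stand-in for youtube_dl.utils.month_by_name(name, lang='fr'),
# written out so the sketch runs without youtube-dl installed.
FRENCH_MONTHS = [
    'janvier', 'février', 'mars', 'avril', 'mai', 'juin', 'juillet',
    'août', 'septembre', 'octobre', 'novembre', 'décembre']


def french_month_number(name):
    """Return 1-12 for a French month name, or None if it is unknown."""
    try:
        return FRENCH_MONTHS.index(name.lower()) + 1
    except ValueError:
        return None


def build_upload_date(upload_date_str):
    """Turn a 'day month year' string into YYYYMMDD, as the new hunk does."""
    if not upload_date_str:
        return None
    parts = upload_date_str.split()
    parts.reverse()  # -> [year, month, day]
    # Zero-pad month and day so single-digit values keep the 8-char format;
    # an unknown month becomes '00' instead of raising.
    parts[1] = '%02d' % (french_month_number(parts[1]) or 0)
    parts[2] = '%02d' % int(parts[2])
    return ''.join(parts)


print(build_upload_date('7 septembre 2016'))  # 20160907
```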

@@ -1657,7 +1657,9 @@ class GenericIE(InfoExtractor):
return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats(
doc, video_id, mpd_base_url=url.rpartition('/')[0])
doc, video_id,
mpd_base_url=full_response.geturl().rpartition('/')[0],
mpd_url=url)
self._sort_formats(info_dict['formats'])
return info_dict
elif re.match(r'^{http://ns\.adobe\.com/f4m/[12]\.0}manifest$', doc.tag):
@@ -2254,6 +2256,35 @@ class GenericIE(InfoExtractor):
return self.url_result(
self._proto_relative_url(unescapeHTML(mobj.group('url'))), 'VODPlatform')
# Look for Mangomolo embeds
mobj = re.search(
r'''(?x)<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?admin\.mangomolo\.com/analytics/index\.php/customers/embed/
(?:
video\?.*?\bid=(?P<video_id>\d+)|
index\?.*?\bchannelid=(?P<channel_id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)
).+?)\1''', webpage)
if mobj is not None:
info = {
'_type': 'url_transparent',
'url': self._proto_relative_url(unescapeHTML(mobj.group('url'))),
'title': video_title,
'description': video_description,
'thumbnail': video_thumbnail,
'uploader': video_uploader,
}
video_id = mobj.group('video_id')
if video_id:
info.update({
'ie_key': 'MangomoloVideo',
'id': video_id,
})
else:
info.update({
'ie_key': 'MangomoloLive',
'id': mobj.group('channel_id'),
})
return info
# Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None:
@@ -2301,12 +2332,23 @@ class GenericIE(InfoExtractor):
info_dict.update(json_ld)
return info_dict
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries:
for entry in entries:
entry.update({
'id': video_id,
'title': video_title,
})
self._sort_formats(entry['formats'])
return self.playlist_result(entries)
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True
vpath = compat_urlparse.urlparse(vurl).path
vext = determine_ext(vpath)
return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml')
return '.' in vpath and vext not in ('swf', 'png', 'jpg', 'srt', 'sbv', 'sub', 'vtt', 'ttml', 'js')
def filter_video(urls):
return list(filter(check_video, urls))
@@ -2356,9 +2398,6 @@ class GenericIE(InfoExtractor):
# We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
if m_video_type is not None:
found = filter_video(re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage))
if not found:
# HTML5 video
found = re.findall(r'(?s)<(?:video|audio)[^<]*(?:>.*?<source[^>]*)?\s+src=["\'](.*?)["\']', webpage)
if not found:
REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
found = re.search(

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import random
import re
import math
from .common import InfoExtractor
@@ -14,6 +15,7 @@ from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
orderedSet,
str_or_none,
)
@@ -63,6 +65,9 @@ class GloboIE(InfoExtractor):
}, {
'url': 'http://canaloff.globo.com/programas/desejar-profundo/videos/4518560.html',
'only_matching': True,
}, {
'url': 'globo:3607726',
'only_matching': True,
}]
class MD5(object):
@@ -396,7 +401,7 @@ class GloboIE(InfoExtractor):
class GloboArticleIE(InfoExtractor):
_VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)(?:\.html)?'
_VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/.]+)(?:\.html)?'
_VIDEOID_REGEXES = [
r'\bdata-video-id=["\'](\d{7,})',
@@ -408,15 +413,20 @@ class GloboArticleIE(InfoExtractor):
_TESTS = [{
'url': 'http://g1.globo.com/jornal-nacional/noticia/2014/09/novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes.html',
'md5': '307fdeae4390ccfe6ba1aa198cf6e72b',
'info_dict': {
'id': '3652183',
'ext': 'mp4',
'title': 'Receita Federal explica como vai fiscalizar bagagens de quem retorna ao Brasil de avião',
'duration': 110.711,
'uploader': 'Rede Globo',
'uploader_id': '196',
}
'id': 'novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes',
'title': 'Novidade na fiscalização de bagagem pela Receita provoca discussões',
'description': 'md5:c3c4b4d4c30c32fce460040b1ac46b12',
},
'playlist_count': 1,
}, {
'url': 'http://g1.globo.com/pr/parana/noticia/2016/09/mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato.html',
'info_dict': {
'id': 'mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato',
'title': "Lula era o 'comandante máximo' do esquema da Lava Jato, diz MPF",
'description': 'md5:8aa7cc8beda4dc71cc8553e00b77c54c',
},
'playlist_count': 6,
}, {
'url': 'http://gq.globo.com/Prazeres/Poder/noticia/2015/10/all-o-desafio-assista-ao-segundo-capitulo-da-serie.html',
'only_matching': True,
@@ -435,5 +445,12 @@ class GloboArticleIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(self._VIDEOID_REGEXES, webpage, 'video id')
return self.url_result('globo:%s' % video_id, 'Globo')
video_ids = []
for video_regex in self._VIDEOID_REGEXES:
video_ids.extend(re.findall(video_regex, webpage))
entries = [
self.url_result('globo:%s' % video_id, GloboIE.ie_key())
for video_id in orderedSet(video_ids)]
title = self._og_search_title(webpage, fatal=False)
description = self._html_search_meta('description', webpage)
return self.playlist_result(entries, display_id, title, description)
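`GloboArticleIE` previously returned only the first ID found by `_search_regex`; it now gathers every match and turns the article into a playlist. A minimal sketch of that collection step, assuming only the first of `_VIDEOID_REGEXES` (the rest of the list is elided from this hunk). `ordered_set` is a local re-implementation of `utils.orderedSet`, and the sample markup and second video ID are invented:

```python
import re

# First pattern of GloboArticleIE._VIDEOID_REGEXES, copied from the hunk above;
# the remaining patterns are not shown in this diff.
VIDEOID_REGEXES = [r'\bdata-video-id=["\'](\d{7,})']


def ordered_set(items):
    """Drop duplicates while keeping first-seen order."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def article_entry_urls(webpage, regexes=VIDEOID_REGEXES):
    video_ids = []
    for regex in regexes:
        video_ids.extend(re.findall(regex, webpage))
    # One 'globo:<id>' URL per distinct video, in page order.
    return ['globo:%s' % video_id for video_id in ordered_set(video_ids)]


page = ('<div data-video-id="3652183"></div>'
        '<div data-video-id="5381721"></div>'
        '<div data-video-id="3652183"></div>')
print(article_entry_urls(page))  # ['globo:3652183', 'globo:5381721']
```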

@@ -29,6 +29,7 @@ class InstagramIE(InfoExtractor):
'uploader': 'Naomi Leonor Phan-Quang',
'like_count': int,
'comment_count': int,
'comments': list,
},
}, {
# missing description
@@ -44,6 +45,7 @@ class InstagramIE(InfoExtractor):
'uploader': 'Britney Spears',
'like_count': int,
'comment_count': int,
'comments': list,
},
'params': {
'skip_download': True,
@@ -82,7 +84,7 @@ class InstagramIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
(video_url, description, thumbnail, timestamp, uploader,
uploader_id, like_count, comment_count) = [None] * 8
uploader_id, like_count, comment_count, height, width) = [None] * 10
shared_data = self._parse_json(
self._search_regex(
@@ -94,6 +96,8 @@ class InstagramIE(InfoExtractor):
shared_data, lambda x: x['entry_data']['PostPage'][0]['media'], dict)
if media:
video_url = media.get('video_url')
height = int_or_none(media.get('dimensions', {}).get('height'))
width = int_or_none(media.get('dimensions', {}).get('width'))
description = media.get('caption')
thumbnail = media.get('display_src')
timestamp = int_or_none(media.get('date'))
@@ -101,10 +105,24 @@ class InstagramIE(InfoExtractor):
uploader_id = media.get('owner', {}).get('username')
like_count = int_or_none(media.get('likes', {}).get('count'))
comment_count = int_or_none(media.get('comments', {}).get('count'))
comments = [{
'author': comment.get('user', {}).get('username'),
'author_id': comment.get('user', {}).get('id'),
'id': comment.get('id'),
'text': comment.get('text'),
'timestamp': int_or_none(comment.get('created_at')),
} for comment in media.get(
'comments', {}).get('nodes', []) if comment.get('text')]
if not video_url:
video_url = self._og_search_video_url(webpage, secure=False)
formats = [{
'url': video_url,
'width': width,
'height': height,
}]
if not uploader_id:
uploader_id = self._search_regex(
r'"owner"\s*:\s*{\s*"username"\s*:\s*"(.+?)"',
@@ -121,7 +139,7 @@ class InstagramIE(InfoExtractor):
return {
'id': video_id,
'url': video_url,
'formats': formats,
'ext': 'mp4',
'title': 'Video by %s' % uploader_id,
'description': description,
@@ -131,6 +149,7 @@ class InstagramIE(InfoExtractor):
'uploader': uploader,
'like_count': like_count,
'comment_count': comment_count,
'comments': comments,
}

@@ -9,6 +9,7 @@ from ..utils import (
determine_ext,
float_or_none,
int_or_none,
js_to_json,
mimetype2ext,
)
@@ -19,24 +20,32 @@ class JWPlatformBaseIE(InfoExtractor):
# TODO: Merge this with JWPlayer-related codes in generic.py
mobj = re.search(
'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\((?P<options>[^)]+)\)',
r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
webpage)
if mobj:
return mobj.group('options')
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
jwplayer_data = self._parse_json(
self._find_jwplayer_data(webpage), video_id)
self._find_jwplayer_data(webpage), video_id,
transform_source=js_to_json)
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True, m3u8_id=None, rtmp_params=None, base_url=None):
def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
m3u8_id=None, mpd_id=None, rtmp_params=None, base_url=None):
# JWPlayer backward compatibility: flattened playlists
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/api/config.js#L81-L96
if 'playlist' not in jwplayer_data:
jwplayer_data = {'playlist': [jwplayer_data]}
entries = []
# JWPlayer backward compatibility: single playlist item
# https://github.com/jwplayer/jwplayer/blob/v7.7.0/src/js/playlist/playlist.js#L10
if not isinstance(jwplayer_data['playlist'], list):
jwplayer_data['playlist'] = [jwplayer_data['playlist']]
for video_data in jwplayer_data['playlist']:
# JWPlayer backward compatibility: flattened sources
# https://github.com/jwplayer/jwplayer/blob/v7.4.3/src/js/playlist/item.js#L29-L35
@@ -55,6 +64,9 @@ class JWPlatformBaseIE(InfoExtractor):
if source_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, this_video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, this_video_id, mpd_id=mpd_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in ('oga', 'aac', 'mp3', 'mpeg', 'vorbis'):
formats.append({
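The two backward-compatibility branches in `_parse_jwplayer_data` normalize every config to `{'playlist': [item, ...]}` before iterating. A standalone sketch of just that normalization; unlike the extractor, which rewrites `jwplayer_data` in place, it works on a deep copy, and `normalize_jwplayer_data` and the sample configs are hypothetical:

```python
import copy


def normalize_jwplayer_data(jwplayer_data):
    """Return a copy of a jwplayer setup() config whose 'playlist' is a list."""
    data = copy.deepcopy(jwplayer_data)
    # Flattened playlist: the config itself describes a single item.
    if 'playlist' not in data:
        data = {'playlist': [data]}
    # Single playlist item: 'playlist' holds one item rather than a list.
    if not isinstance(data['playlist'], list):
        data['playlist'] = [data['playlist']]
    return data


print(normalize_jwplayer_data({'playlist': {'file': 'video.mp4'}}))
# {'playlist': [{'file': 'video.mp4'}]}
```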

@@ -105,20 +105,20 @@ class KalturaIE(InfoExtractor):
kWidget\.(?:thumb)?[Ee]mbed\(
\{.*?
(?P<q1>['\"])wid(?P=q1)\s*:\s*
(?P<q2>['\"])_?(?P<partner_id>[^'\"]+)(?P=q2),.*?
(?P<q2>['\"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
(?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*
(?P<q4>['\"])(?P<id>[^'\"]+)(?P=q4),
(?P<q4>['\"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4),
""", webpage) or
re.search(
r'''(?xs)
(?P<q1>["\'])
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/(?:(?!(?P=q1)).)*(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
(?P=q1).*?
(?:
entry_?[Ii]d|
(?P<q2>["\'])entry_?[Ii]d(?P=q2)
)\s*:\s*
(?P<q3>["\'])(?P<id>.+?)(?P=q3)
(?P<q3>["\'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
''', webpage))
if mobj:
embed_info = mobj.groupdict()
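The Kaltura patterns replace `[^'\"]+` and `.+?` with a tempered token, `(?:(?!(?P=q)).)+`, which stops only at the quote that opened the value (and, in the `cdnapi` branch, keeps the match from running past the closing quote of the URL). A reduced illustration with an invented sample value: a double-quoted value containing an apostrophe is rejected by the old character class but captured by the new form.

```python
import re

# Old form: the value may contain neither quote character.
OLD_VALUE = r'''(?P<q>['"])(?P<value>[^'"]+)(?P=q)'''
# New form (tempered token): any character except the quote that opened it.
NEW_VALUE = r'''(?P<q>['"])(?P<value>(?:(?!(?P=q)).)+)(?P=q)'''

sample = '''"1_ab'cd"'''  # double-quoted value containing an apostrophe

print(re.match(OLD_VALUE, sample))                 # None
print(re.match(NEW_VALUE, sample).group('value'))  # 1_ab'cd
```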

@@ -21,6 +21,10 @@ class KetnetIE(InfoExtractor):
}, {
'url': 'https://www.ketnet.be/achter-de-schermen/sien-repeteert-voor-stars-for-life',
'only_matching': True,
}, {
# mzsource, geo restricted to Belgium
'url': 'https://www.ketnet.be/kijken/nachtwacht/de-bermadoe',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -36,9 +40,25 @@ class KetnetIE(InfoExtractor):
title = config['title']
formats = self._extract_m3u8_formats(
config['source']['hls'], video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
formats = []
for source_key in ('', 'mz'):
source = config.get('%ssource' % source_key)
if not isinstance(source, dict):
continue
for format_id, format_url in source.items():
if format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id=format_id,
fatal=False))
elif format_id == 'hds':
formats.extend(self._extract_f4m_formats(
format_url, video_id, f4m_id=format_id, fatal=False))
else:
formats.append({
'url': format_url,
'format_id': format_id,
})
self._sort_formats(formats)
return {

@@ -29,7 +29,7 @@ from ..utils import (
class LeIE(InfoExtractor):
IE_DESC = '乐视网'
_VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|sports\.le\.com/video)/(?P<id>\d+)\.html'
_VALID_URL = r'https?://(?:www\.le\.com/ptv/vplay|(?:sports\.le|(?:www\.)?lesports)\.com/(?:match|video))/(?P<id>\d+)\.html'
_URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
@@ -73,6 +73,12 @@ class LeIE(InfoExtractor):
}, {
'url': 'http://sports.le.com/video/25737697.html',
'only_matching': True,
}, {
'url': 'http://www.lesports.com/match/1023203003.html',
'only_matching': True,
}, {
'url': 'http://sports.le.com/match/1023203003.html',
'only_matching': True,
}]
# ror() and calc_time_key() are reversed from a embedded swf file in KLetvPlayer.swf
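A quick way to check which URLs the widened `_VALID_URL` accepts, using the pattern as written after this change and the URLs from the new `only_matching` tests; `le_video_id` is a hypothetical helper, not part of `LeIE`:

```python
import re

# LeIE._VALID_URL after this change, copied from the hunk above.
VALID_URL = (r'https?://(?:www\.le\.com/ptv/vplay|(?:sports\.le|(?:www\.)?lesports)\.com'
             r'/(?:match|video))/(?P<id>\d+)\.html')


def le_video_id(url):
    """Return the numeric id if the URL matches, else None."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


for url in ('http://sports.le.com/video/25737697.html',
            'http://www.lesports.com/match/1023203003.html',
            'http://sports.le.com/match/1023203003.html'):
    print(url, '->', le_video_id(url))
```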

@@ -59,7 +59,7 @@ class LimelightBaseIE(InfoExtractor):
format_id = 'rtmp'
if stream.get('videoBitRate'):
format_id += '-%d' % int_or_none(stream['videoBitRate'])
http_url = 'http://%s/%s' % (rtmp.group('host').replace('csl.', 'cpl.'), rtmp.group('playpath')[4:])
http_url = 'http://cpl.delvenetworks.com/' + rtmp.group('playpath')[4:]
urls.append(http_url)
http_fmt = fmt.copy()
http_fmt.update({

@@ -0,0 +1,54 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
)
class MangomoloBaseIE(InfoExtractor):
def _get_real_id(self, page_id):
return page_id
def _real_extract(self, url):
page_id = self._get_real_id(self._match_id(url))
webpage = self._download_webpage(url, page_id)
hidden_inputs = self._hidden_inputs(webpage)
m3u8_entry_protocol = 'm3u8' if self._IS_LIVE else 'm3u8_native'
format_url = self._html_search_regex(
[
r'file\s*:\s*"(https?://[^"]+?/playlist.m3u8)',
r'<a[^>]+href="(rtsp://[^"]+)"'
], webpage, 'format url')
formats = self._extract_wowza_formats(
format_url, page_id, m3u8_entry_protocol, ['smil'])
self._sort_formats(formats)
return {
'id': page_id,
'title': self._live_title(page_id) if self._IS_LIVE else page_id,
'uploader_id': hidden_inputs.get('userid'),
'duration': int_or_none(hidden_inputs.get('duration')),
'is_live': self._IS_LIVE,
'formats': formats,
}
class MangomoloVideoIE(MangomoloBaseIE):
IE_NAME = 'mangomolo:video'
_VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/video\?.*?\bid=(?P<id>\d+)'
_IS_LIVE = False
class MangomoloLiveIE(MangomoloBaseIE):
IE_NAME = 'mangomolo:live'
_VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/index\?.*?\bchannelid=(?P<id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)'
_IS_LIVE = True
def _get_real_id(self, page_id):
return base64.b64decode(compat_urllib_parse_unquote(page_id).encode()).decode()
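For `MangomoloLiveIE` the `channelid` query value appears to be a URL-quoted, base64-encoded channel identifier: `_VALID_URL` admits `%2B`, `%2F` and `%3D`, and `_get_real_id` unquotes before decoding. A Python 3 sketch using `urllib.parse.unquote` in place of `compat_urllib_parse_unquote`; the embed URL and its `customerid`/`autoplay` parameters are invented for illustration:

```python
import base64
import re
from urllib.parse import unquote

# MangomoloLiveIE._VALID_URL, copied from the new file above.
LIVE_VALID_URL = (r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/'
                  r'index\?.*?\bchannelid=(?P<id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)')


def real_channel_id(page_id):
    """URL-unquote, then base64-decode, as _get_real_id does."""
    return base64.b64decode(unquote(page_id).encode()).decode()


url = ('http://admin.mangomolo.com/analytics/index.php/customers/embed/'
       'index?customerid=1&channelid=MTU%3D&autoplay=true')
page_id = re.match(LIVE_VALID_URL, url).group('id')
print(page_id)                   # MTU%3D
print(real_channel_id(page_id))  # 15
```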

@@ -270,6 +270,29 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
class MTVIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv'
_VALID_URL = r'https?://(?:www\.)?mtv\.com/(?:video-clips|full-episodes)/(?P<id>[^/?#.]+)'
_FEED_URL = 'http://www.mtv.com/feeds/mrss/'
_TESTS = [{
'url': 'http://www.mtv.com/video-clips/vl8qof/unlocking-the-truth-trailer',
'md5': '1edbcdf1e7628e414a8c5dcebca3d32b',
'info_dict': {
'id': '5e14040d-18a4-47c4-a582-43ff602de88e',
'ext': 'mp4',
'title': 'Unlocking The Truth|July 18, 2016|1|101|Trailer',
'description': '"Unlocking the Truth" premieres August 17th at 11/10c.',
'timestamp': 1468846800,
'upload_date': '20160718',
},
}, {
'url': 'http://www.mtv.com/full-episodes/94tujl/unlocking-the-truth-gates-of-hell-season-1-ep-101',
'only_matching': True,
}]
class MTVVideoIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv:video'
_VALID_URL = r'''(?x)^https?://
(?:(?:www\.)?mtv\.com/videos/.+?/(?P<videoid>[0-9]+)/[^/]+$|
m\.mtv\.com/videos/video\.rbml\?.*?id=(?P<mgid>[^&]+))'''

@@ -9,9 +9,9 @@ from ..utils import (
class MwaveIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
_VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
_URL_TEMPLATE = 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=%s'
_TEST = {
_TESTS = [{
'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
# md5 is unstable
'info_dict': {
@@ -23,7 +23,10 @@ class MwaveIE(InfoExtractor):
'duration': 206,
'view_count': int,
}
}
}, {
'url': 'http://mwave.interest.me/en/mnettv/videodetail.m?searchVideoDetailVO.clip_id=176199',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -60,8 +63,8 @@ class MwaveIE(InfoExtractor):
class MwaveMeetGreetIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/meetgreet/view/(?P<id>\d+)'
_TEST = {
_VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?meetgreet/view/(?P<id>\d+)'
_TESTS = [{
'url': 'http://mwave.interest.me/meetgreet/view/256',
'info_dict': {
'id': '173294',
@@ -72,7 +75,10 @@ class MwaveMeetGreetIE(InfoExtractor):
'duration': 3634,
'view_count': int,
}
}
}, {
'url': 'http://mwave.interest.me/en/meetgreet/view/256',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)

@@ -5,6 +5,7 @@ import re
from .common import InfoExtractor
from ..utils import (
fix_xml_ampersands,
orderedSet,
parse_duration,
qualities,
strip_jsonp,
@@ -438,9 +439,29 @@ class SchoolTVIE(InfoExtractor):
}
class VPROIE(NPOIE):
class NPOPlaylistBaseIE(NPOIE):
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result('npo:%s' % video_id if not video_id.startswith('http') else video_id)
for video_id in orderedSet(re.findall(self._PLAYLIST_ENTRY_RE, webpage))
]
playlist_title = self._html_search_regex(
self._PLAYLIST_TITLE_RE, webpage, 'playlist title',
default=None) or self._og_search_title(webpage)
return self.playlist_result(entries, playlist_id, playlist_title)
class VPROIE(NPOPlaylistBaseIE):
IE_NAME = 'vpro'
_VALID_URL = r'https?://(?:www\.)?(?:tegenlicht\.)?vpro\.nl/(?:[^/]+/){2,}(?P<id>[^/]+)\.html'
_PLAYLIST_TITLE_RE = r'<h1[^>]+class=["\'].*?\bmedia-platform-title\b.*?["\'][^>]*>([^<]+)'
_PLAYLIST_ENTRY_RE = r'data-media-id="([^"]+)"'
_TESTS = [
{
@@ -453,12 +474,13 @@ class VPROIE(NPOIE):
'description': 'md5:52cf4eefbc96fffcbdc06d024147abea',
'upload_date': '20130225',
},
'skip': 'Video gone',
},
{
'url': 'http://www.vpro.nl/programmas/2doc/2015/sergio-herman.html',
'info_dict': {
'id': 'sergio-herman',
'title': 'Sergio Herman: Fucking perfect',
'title': 'sergio herman: fucking perfect',
},
'playlist_count': 2,
},
@@ -467,54 +489,40 @@ class VPROIE(NPOIE):
'url': 'http://www.vpro.nl/programmas/2doc/2015/education-education.html',
'info_dict': {
'id': 'education-education',
'title': '2Doc',
'title': 'education education',
},
'playlist_count': 2,
}
]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result('npo:%s' % video_id if not video_id.startswith('http') else video_id)
for video_id in re.findall(r'data-media-id="([^"]+)"', webpage)
]
playlist_title = self._search_regex(
r'<title>\s*([^>]+?)\s*-\s*Teledoc\s*-\s*VPRO\s*</title>',
webpage, 'playlist title', default=None) or self._og_search_title(webpage)
return self.playlist_result(entries, playlist_id, playlist_title)
class WNLIE(InfoExtractor):
class WNLIE(NPOPlaylistBaseIE):
IE_NAME = 'wnl'
_VALID_URL = r'https?://(?:www\.)?omroepwnl\.nl/video/detail/(?P<id>[^/]+)__\d+'
_PLAYLIST_TITLE_RE = r'(?s)<h1[^>]+class="subject"[^>]*>(.+?)</h1>'
_PLAYLIST_ENTRY_RE = r'<a[^>]+href="([^"]+)"[^>]+class="js-mid"[^>]*>Deel \d+'
_TEST = {
_TESTS = [{
'url': 'http://www.omroepwnl.nl/video/detail/vandaag-de-dag-6-mei__060515',
'info_dict': {
'id': 'vandaag-de-dag-6-mei',
'title': 'Vandaag de Dag 6 mei',
},
'playlist_count': 4,
}
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
class AndereTijdenIE(NPOPlaylistBaseIE):
IE_NAME = 'anderetijden'
_VALID_URL = r'https?://(?:www\.)?anderetijden\.nl/programma/(?:[^/]+/)+(?P<id>[^/?#&]+)'
_PLAYLIST_TITLE_RE = r'(?s)<h1[^>]+class=["\'].*?\bpage-title\b.*?["\'][^>]*>(.+?)</h1>'
_PLAYLIST_ENTRY_RE = r'<figure[^>]+class=["\']episode-container episode-page["\'][^>]+data-prid=["\'](.+?)["\']'
entries = [
self.url_result('npo:%s' % video_id, 'NPO')
for video_id, part in re.findall(
r'<a[^>]+href="([^"]+)"[^>]+class="js-mid"[^>]*>(Deel \d+)', webpage)
]
playlist_title = self._html_search_regex(
r'(?s)<h1[^>]+class="subject"[^>]*>(.+?)</h1>',
webpage, 'playlist title')
return self.playlist_result(entries, playlist_id, playlist_title)
_TESTS = [{
'url': 'http://anderetijden.nl/programma/1/Andere-Tijden/aflevering/676/Duitse-soldaten-over-de-Slag-bij-Arnhem',
'info_dict': {
'id': 'Duitse-soldaten-over-de-Slag-bij-Arnhem',
'title': 'Duitse soldaten over de Slag bij Arnhem',
},
'playlist_count': 3,
}]
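After this refactor `VPROIE`, `WNLIE` and `AndereTijdenIE` share one `_real_extract` in `NPOPlaylistBaseIE` and differ only in `_PLAYLIST_ENTRY_RE` and `_PLAYLIST_TITLE_RE`. A reduced, network-free sketch of that template-method layout; the class names are hypothetical, the two entry patterns are copied from the hunk, and the sample markup is invented:

```python
import re


class PlaylistPageBase(object):
    """Shared entry scraping; subclasses only set _PLAYLIST_ENTRY_RE."""
    _PLAYLIST_ENTRY_RE = None

    def entry_urls(self, webpage):
        urls = []
        for video_id in re.findall(self._PLAYLIST_ENTRY_RE, webpage):
            # Bare media IDs go through the npo: scheme; full URLs pass through.
            url = video_id if video_id.startswith('http') else 'npo:%s' % video_id
            if url not in urls:  # drop repeats but keep page order
                urls.append(url)
        return urls


class VproPage(PlaylistPageBase):
    _PLAYLIST_ENTRY_RE = r'data-media-id="([^"]+)"'


class AndereTijdenPage(PlaylistPageBase):
    _PLAYLIST_ENTRY_RE = (r'<figure[^>]+class=["\']episode-container episode-page["\']'
                          r'[^>]+data-prid=["\'](.+?)["\']')


vpro_html = ('<div data-media-id="VPWON_1"></div><div data-media-id="VPWON_1"></div>'
             '<div data-media-id="https://example.com/clip"></div>')
print(VproPage().entry_urls(vpro_html))  # ['npo:VPWON_1', 'https://example.com/clip']
```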

@@ -47,7 +47,7 @@ class OoyalaBaseIE(InfoExtractor):
delivery_type = stream['delivery_type']
if delivery_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
s_url, embed_code, 'mp4', 'm3u8_native',
re.sub(r'/ip(?:ad|hone)/', '/all/', s_url), embed_code, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif delivery_type == 'hds' or ext == 'f4m':
formats.extend(self._extract_f4m_formats(

@@ -24,6 +24,22 @@ class OpenloadIE(InfoExtractor):
'title': 'skyrim_no-audio_1080.mp4',
'thumbnail': 're:^https?://.*\.jpg$',
},
}, {
'url': 'https://openload.co/embed/rjC09fkPLYs',
'info_dict': {
'id': 'rjC09fkPLYs',
'ext': 'mp4',
'title': 'movie.mp4',
'thumbnail': 're:^https?://.*\.jpg$',
'subtitles': {
'en': [{
'ext': 'vtt',
}],
},
},
'params': {
'skip_download': True, # test subtitles only
},
}, {
'url': 'https://openload.co/embed/kUEfGclsU9o/skyrim_no-audio_1080.mp4',
'only_matching': True,
@@ -51,7 +67,8 @@ class OpenloadIE(InfoExtractor):
# declared to be freely used in youtube-dl
# See https://github.com/rg3/youtube-dl/issues/10408
enc_data = self._html_search_regex(
r'<span[^>]+id="hiddenurl"[^>]*>([^<]+)</span>', webpage, 'encrypted data')
r'<span[^>]*>([^<]+)</span>\s*<span[^>]*>[^<]+</span>\s*<span[^>]+id="streamurl"',
webpage, 'encrypted data')
video_url_chars = []
@@ -60,7 +77,7 @@ class OpenloadIE(InfoExtractor):
if j >= 33 and j <= 126:
j = ((j + 14) % 94) + 33
if idx == len(enc_data) - 1:
j += 3
j += 2
video_url_chars += compat_chr(j)
video_url = 'https://openload.co/stream/%s?mime=true' % ''.join(video_url_chars)
@@ -70,11 +87,17 @@ class OpenloadIE(InfoExtractor):
'title', default=None) or self._html_search_meta(
'description', webpage, 'title', fatal=True)
return {
entries = self._parse_html5_media_entries(url, webpage, video_id)
subtitles = entries[0]['subtitles'] if entries else None
info_dict = {
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'url': video_url,
# Seems all videos have extensions in their titles
'ext': determine_ext(title),
'subtitles': subtitles,
}
return info_dict
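The visible lines of the Openload decoder amount to ROT47 over printable ASCII (`33 + ((j - 33 + 47) % 94)` simplifies to `((j + 14) % 94) + 33`), followed by a +2 shift of the last character, which was +3 before this change. The loop header that reads each character is elided between the hunks, so the sketch assumes it walks `enc_data` by index; `decode_openload` and `stream_url` are hypothetical names:

```python
def decode_openload(enc_data):
    """Decode the hidden stream token the way the hunk above appears to."""
    chars = []
    for idx, ch in enumerate(enc_data):
        j = ord(ch)
        # ROT47 over printable ASCII (33..126).
        if 33 <= j <= 126:
            j = ((j + 14) % 94) + 33
        # The final character is additionally shifted by 2.
        if idx == len(enc_data) - 1:
            j += 2
        chars.append(chr(j))
    return ''.join(chars)


def stream_url(enc_data):
    return 'https://openload.co/stream/%s?mime=true' % decode_openload(enc_data)


print(decode_openload('abc'))  # 236
print(stream_url('abc'))       # https://openload.co/stream/236?mime=true
```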

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_iso8601,
@@ -41,6 +43,13 @@ class PeriscopeIE(PeriscopeBaseIE):
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=([\'"])(?P<url>(?:https?:)?//(?:www\.)?periscope\.tv/(?:(?!\1).)+)\1', webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
token = self._match_id(url)
@@ -78,7 +87,7 @@ class PeriscopeIE(PeriscopeBaseIE):
'ext': 'flv' if format_id == 'rtmp' else 'mp4',
}
if format_id != 'rtmp':
f['protocol'] = 'm3u8_native' if state == 'ended' else 'm3u8'
f['protocol'] = 'm3u8_native' if state in ('ended', 'timed_out') else 'm3u8'
formats.append(f)
self._sort_formats(formats)
@@ -123,7 +132,7 @@ class PeriscopeUserIE(PeriscopeBaseIE):
user = list(data_store['UserCache']['users'].values())[0]['user']
user_id = user['id']
session_id = data_store['SessionToken']['broadcastHistory']['token']['session_id']
session_id = data_store['SessionToken']['public']['broadcastHistory']['token']['session_id']
broadcasts = self._call_api(
'getUserBroadcastsPublic',

@@ -7,7 +7,6 @@ from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
sanitized_Request,
urlencode_postdata,
)
@@ -15,12 +14,12 @@ from ..utils import (
class PromptFileIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?promptfile\.com/l/(?P<id>[0-9A-Z\-]+)'
_TEST = {
'url': 'http://www.promptfile.com/l/D21B4746E9-F01462F0FF',
'md5': 'd1451b6302da7215485837aaea882c4c',
'url': 'http://www.promptfile.com/l/86D1CE8462-576CAAE416',
'md5': '5a7e285a26e0d66d9a263fae91bc92ce',
'info_dict': {
'id': 'D21B4746E9-F01462F0FF',
'id': '86D1CE8462-576CAAE416',
'ext': 'mp4',
'title': 'Birds.mp4',
'title': 'oceans.mp4',
'thumbnail': 're:^https?://.*\.jpg$',
}
}
@@ -33,14 +32,23 @@ class PromptFileIE(InfoExtractor):
raise ExtractorError('Video %s does not exist' % video_id,
expected=True)
chash = self._search_regex(
r'val\("([^"]*)"\s*\+\s*\$\("#chash"\)', webpage, 'chash')
fields = self._hidden_inputs(webpage)
post = urlencode_postdata(fields)
req = sanitized_Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
webpage = self._download_webpage(
req, video_id, 'Downloading video page')
keys = list(fields.keys())
chash_key = keys[0] if len(keys) == 1 else next(
key for key in keys if key.startswith('cha'))
fields[chash_key] = chash + fields[chash_key]
url = self._html_search_regex(r'url:\s*\'([^\']+)\'', webpage, 'URL')
webpage = self._download_webpage(
url, video_id, 'Downloading video page',
data=urlencode_postdata(fields),
headers={'Content-type': 'application/x-www-form-urlencoded'})
video_url = self._search_regex(
(r'<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1[^>]*>\s*Download File',
r'<a[^>]+href=(["\'])(?P<url>https?://(?:www\.)?promptfile\.com/file/(?:(?!\1).)+)\1'),
webpage, 'video url', group='url')
title = self._html_search_regex(
r'<span.+title="([^"]+)">', webpage, 'title')
thumbnail = self._html_search_regex(
@@ -49,7 +57,7 @@ class PromptFileIE(InfoExtractor):
formats = [{
'format_id': 'sd',
'url': url,
'url': video_url,
'ext': determine_ext(title),
}]
self._sort_formats(formats)
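The PromptFile page now appears to split its form token: a JavaScript prefix, captured by the `val\("([^"]*)"\s*\+\s*\$\("#chash"\)` regex, must be prepended to a hidden input before the form is POSTed back. A sketch of that field update; `apply_chash` is a hypothetical helper and the field names and values are invented:

```python
from urllib.parse import urlencode


def apply_chash(fields, chash):
    """Prefix chash onto the matching hidden field, as in the hunk above.

    With a single hidden input that input is used; otherwise the first
    field whose name starts with 'cha'.
    """
    fields = dict(fields)  # leave the caller's dict untouched
    keys = list(fields.keys())
    chash_key = keys[0] if len(keys) == 1 else next(
        key for key in keys if key.startswith('cha'))
    fields[chash_key] = chash + fields[chash_key]
    return fields


post_body = urlencode(apply_chash({'chash': 'tail', 'op': 'download'}, 'head'))
print(post_body)  # chash=headtail&op=download
```

As in the hunk, `next()` raises `StopIteration` when several hidden inputs exist and none starts with `cha`.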

@@ -122,7 +122,17 @@ class ProSiebenSat1BaseIE(InfoExtractor):
class ProSiebenSat1IE(ProSiebenSat1BaseIE):
IE_NAME = 'prosiebensat1'
IE_DESC = 'ProSiebenSat.1 Digital'
_VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany|7tv)\.(?:de|at|ch)|ran\.de|fem\.com)/(?P<id>.+)'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
(?:
prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
)\.(?:de|at|ch)|
ran\.de|fem\.com|advopedia\.de
)
/(?P<id>.+)
'''
_TESTS = [
{
@@ -290,6 +300,24 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
'skip_download': True,
},
},
{
# geo restricted to Germany
'url': 'http://www.kabeleinsdoku.de/tv/mayday-alarm-im-cockpit/video/102-notlandung-im-hudson-river-ganze-folge',
'only_matching': True,
},
{
# geo restricted to Germany
'url': 'http://www.sat1gold.de/tv/edel-starck/video/11-staffel-1-episode-1-partner-wider-willen-ganze-folge',
'only_matching': True,
},
{
'url': 'http://www.sat1gold.de/tv/edel-starck/playlist/die-gesamte-1-staffel',
'only_matching': True,
},
{
'url': 'http://www.advopedia.de/videos/lenssen-klaert-auf/lenssen-klaert-auf-folge-8-staffel-3-feiertage-und-freie-tage',
'only_matching': True,
},
]
_TOKEN = 'prosieben'
@@ -361,19 +389,28 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
def _extract_playlist(self, url, webpage):
playlist_id = self._html_search_regex(
self._PLAYLIST_ID_REGEXES, webpage, 'playlist id')
for regex in self._PLAYLIST_CLIP_REGEXES:
playlist_clips = re.findall(regex, webpage)
if playlist_clips:
title = self._html_search_regex(
self._TITLE_REGEXES, webpage, 'title')
description = self._html_search_regex(
self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
entries = [
self.url_result(
re.match('(.+?//.+?)/', url).group(1) + clip_path,
'ProSiebenSat1')
for clip_path in playlist_clips]
return self.playlist_result(entries, playlist_id, title, description)
playlist = self._parse_json(
self._search_regex(
'var\s+contentResources\s*=\s*(\[.+?\]);\s*</script',
webpage, 'playlist'),
playlist_id)
entries = []
for item in playlist:
clip_id = item.get('id') or item.get('upc')
if not clip_id:
continue
info = self._extract_video_info(url, clip_id)
info.update({
'id': clip_id,
'title': item.get('title') or item.get('teaser', {}).get('headline'),
'description': item.get('teaser', {}).get('description'),
'thumbnail': item.get('poster'),
'duration': float_or_none(item.get('duration')),
'series': item.get('tvShowTitle'),
'uploader': item.get('broadcastPublisher'),
})
entries.append(info)
return self.playlist_result(entries, playlist_id)
def _real_extract(self, url):
video_id = self._match_id(url)

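The rewritten `_extract_playlist` maps each entry of the page's `contentResources` JSON array onto an info dict. It takes the id from `id` and falls back to `upc`, and it takes the title from `title` and falls back to `teaser.headline`. Below is a minimal sketch of that mapping in plain Python. It assumes a `contentResources`-style list of dicts. `map_playlist_item` is a hypothetical helper, and `float_or_none` is a simplified local stand-in rather than youtube-dl's own function.

```python
def float_or_none(v):
    # Simplified stand-in for youtube-dl's utils.float_or_none (no scaling)
    try:
        return float(v)
    except (TypeError, ValueError):
        return None


def map_playlist_item(item):
    """Turn one contentResources entry into partial info-dict fields.

    Returns None when the entry has neither 'id' nor 'upc', mirroring
    the `continue` in the extractor loop.
    """
    clip_id = item.get('id') or item.get('upc')
    if not clip_id:
        return None
    teaser = item.get('teaser', {})
    return {
        'id': clip_id,
        'title': item.get('title') or teaser.get('headline'),
        'description': teaser.get('description'),
        'thumbnail': item.get('poster'),
        'duration': float_or_none(item.get('duration')),
        'series': item.get('tvShowTitle'),
        'uploader': item.get('broadcastPublisher'),
    }


# Made-up sample data, not taken from a real ProSiebenSat.1 page
items = [
    {'upc': '4321', 'teaser': {'headline': 'Folge 1'}, 'duration': '1520.5'},
    {'title': 'no id here'},
]
entries = [e for e in map(map_playlist_item, items) if e]
```

In the extractor, each mapped dict is merged into the result of `_extract_video_info`. The sketch only covers the metadata side.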
@@ -13,6 +13,7 @@ from ..utils import (
xpath_element,
ExtractorError,
determine_protocol,
unsmuggle_url,
)
@@ -35,28 +36,51 @@ class RadioCanadaIE(InfoExtractor):
}
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
app_code, video_id = re.match(self._VALID_URL, url).groups()
device_types = ['ipad', 'android']
metadata = self._download_xml(
'http://api.radio-canada.ca/metaMedia/v1/index.ashx',
video_id, note='Downloading metadata XML', query={
'appCode': app_code,
'idMedia': video_id,
})
def get_meta(name):
el = find_xpath_attr(metadata, './/Meta', 'name', name)
return el.text if el is not None else None
if get_meta('protectionType'):
raise ExtractorError('This video is DRM protected.', expected=True)
device_types = ['ipad']
if app_code != 'toutv':
device_types.append('flash')
if not smuggled_data:
device_types.append('android')
formats = []
# TODO: extract f4m formats
# f4m formats can be extracted using flashhd device_type but they produce unplayable files
for device_type in device_types:
v_data = self._download_xml(
'http://api.radio-canada.ca/validationMedia/v1/Validation.ashx',
video_id, note='Downloading %s XML' % device_type, query={
'appCode': app_code,
'idMedia': video_id,
'connectionType': 'broadband',
'multibitrate': 'true',
'deviceType': device_type,
validation_url = 'http://api.radio-canada.ca/validationMedia/v1/Validation.ashx'
query = {
'appCode': app_code,
'idMedia': video_id,
'connectionType': 'broadband',
'multibitrate': 'true',
'deviceType': device_type,
}
if smuggled_data:
validation_url = 'https://services.radio-canada.ca/media/validation/v2/'
query.update(smuggled_data)
else:
query.update({
# paysJ391wsHjbOJwvCs26toz and bypasslock are used to bypass geo-restriction
'paysJ391wsHjbOJwvCs26toz': 'CA',
'bypasslock': 'NZt5K62gRqfc',
}, fatal=False)
})
v_data = self._download_xml(validation_url, video_id, note='Downloading %s XML' % device_type, query=query, fatal=False)
v_url = xpath_text(v_data, 'url')
if not v_url:
continue
@@ -101,17 +125,6 @@ class RadioCanadaIE(InfoExtractor):
f4m_id='hds', fatal=False))
self._sort_formats(formats)
metadata = self._download_xml(
'http://api.radio-canada.ca/metaMedia/v1/index.ashx',
video_id, note='Downloading metadata XML', query={
'appCode': app_code,
'idMedia': video_id,
})
def get_meta(name):
el = find_xpath_attr(metadata, './/Meta', 'name', name)
return el.text if el is not None else None
return {
'id': video_id,
'title': get_meta('Title'),

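The Radio-Canada validation request now depends on whether tou.tv credentials were smuggled into the URL. If they were, the v2 `services.radio-canada.ca` endpoint receives the smuggled access token and claims. If they were not, the geo-bypass parameters are added instead. The device list also shrinks for logged-in requests. The following sketch reproduces that branching. It assumes only the parameter names and URLs visible in the diff, and `build_validation_request` and `device_types_for` are hypothetical helpers.

```python
V1_URL = 'http://api.radio-canada.ca/validationMedia/v1/Validation.ashx'
V2_URL = 'https://services.radio-canada.ca/media/validation/v2/'


def device_types_for(app_code, smuggled_data=None):
    # 'flash' only outside tou.tv; 'android' only for anonymous requests
    types = ['ipad']
    if app_code != 'toutv':
        types.append('flash')
    if not smuggled_data:
        types.append('android')
    return types


def build_validation_request(app_code, video_id, device_type, smuggled_data=None):
    """Return (url, query) for one device type, mirroring the diff's branching."""
    url = V1_URL
    query = {
        'appCode': app_code,
        'idMedia': video_id,
        'connectionType': 'broadband',
        'multibitrate': 'true',
        'deviceType': device_type,
    }
    if smuggled_data:
        # Logged-in tou.tv session: v2 endpoint plus access_token/claims
        url = V2_URL
        query.update(smuggled_data)
    else:
        # Anonymous request: geo-restriction bypass parameters from the diff
        query.update({
            'paysJ391wsHjbOJwvCs26toz': 'CA',
            'bypasslock': 'NZt5K62gRqfc',
        })
    return url, query
```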
@@ -43,7 +43,7 @@ class RudoIE(JWPlatformBaseIE):
transform_source=lambda s: js_to_json(re.sub(r'encodeURI\([^)]+\)', '""', s)))
info_dict = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, m3u8_id='hls')
jwplayer_data, video_id, require_title=False, m3u8_id='hls', mpd_id='dash')
info_dict.update({
'title': self._og_search_title(webpage),

@@ -53,6 +53,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'E.T. ExTerrestrial Music',
'title': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1',
'duration': 143,
'license': 'all-rights-reserved',
}
},
# not streamable song
@@ -66,6 +67,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'The Royal Concept',
'upload_date': '20120521',
'duration': 227,
'license': 'all-rights-reserved',
},
'params': {
# rtmp
@@ -84,6 +86,7 @@ class SoundcloudIE(InfoExtractor):
'description': 'test chars: \"\'/\\ä↭',
'upload_date': '20131209',
'duration': 9,
'license': 'all-rights-reserved',
},
},
# private link (alt format)
@@ -98,6 +101,7 @@ class SoundcloudIE(InfoExtractor):
'description': 'test chars: \"\'/\\ä↭',
'upload_date': '20131209',
'duration': 9,
'license': 'all-rights-reserved',
},
},
# downloadable song
@@ -112,6 +116,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'oddsamples',
'upload_date': '20140109',
'duration': 17,
'license': 'cc-by-sa',
},
},
]
@@ -138,20 +143,20 @@ class SoundcloudIE(InfoExtractor):
name = full_title or track_id
if quiet:
self.report_extraction(name)
thumbnail = info['artwork_url']
if thumbnail is not None:
thumbnail = info.get('artwork_url')
if isinstance(thumbnail, compat_str):
thumbnail = thumbnail.replace('-large', '-t500x500')
ext = 'mp3'
result = {
'id': track_id,
'uploader': info['user']['username'],
'upload_date': unified_strdate(info['created_at']),
'uploader': info.get('user', {}).get('username'),
'upload_date': unified_strdate(info.get('created_at')),
'title': info['title'],
'description': info['description'],
'description': info.get('description'),
'thumbnail': thumbnail,
'duration': int_or_none(info.get('duration'), 1000),
'webpage_url': info.get('permalink_url'),
'license': info.get('license'),
}
formats = []
if info.get('downloadable', False):
@@ -221,7 +226,7 @@ class SoundcloudIE(InfoExtractor):
raise ExtractorError('Invalid URL: %s' % url)
track_id = mobj.group('track_id')
token = None
if track_id is not None:
info_json_url = 'http://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
full_title = track_id
@@ -255,7 +260,20 @@ class SoundcloudIE(InfoExtractor):
return self._extract_info_dict(info, full_title, secret_token=token)
class SoundcloudSetIE(SoundcloudIE):
class SoundcloudPlaylistBaseIE(SoundcloudIE):
@staticmethod
def _extract_id(e):
return compat_str(e['id']) if e.get('id') else None
def _extract_track_entries(self, tracks):
return [
self.url_result(
track['permalink_url'], SoundcloudIE.ie_key(),
video_id=self._extract_id(track))
for track in tracks if track.get('permalink_url')]
class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
_VALID_URL = r'https?://(?:(?:www|m)\.)?soundcloud\.com/(?P<uploader>[\w\d-]+)/sets/(?P<slug_title>[\w\d-]+)(?:/(?P<token>[^?/]+))?'
IE_NAME = 'soundcloud:set'
_TESTS = [{
@@ -294,7 +312,7 @@ class SoundcloudSetIE(SoundcloudIE):
msgs = (compat_str(err['error_message']) for err in info['errors'])
raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))
entries = [self.url_result(track['permalink_url'], 'Soundcloud') for track in info['tracks']]
entries = self._extract_track_entries(info['tracks'])
return {
'_type': 'playlist',
@@ -304,7 +322,7 @@ class SoundcloudSetIE(SoundcloudIE):
}
class SoundcloudUserIE(SoundcloudIE):
class SoundcloudUserIE(SoundcloudPlaylistBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:(?:www|m)\.)?soundcloud\.com/
@@ -321,21 +339,21 @@ class SoundcloudUserIE(SoundcloudIE):
'id': '114582580',
'title': 'The Akashic Chronicler (All)',
},
'playlist_mincount': 111,
'playlist_mincount': 74,
}, {
'url': 'https://soundcloud.com/the-akashic-chronicler/tracks',
'info_dict': {
'id': '114582580',
'title': 'The Akashic Chronicler (Tracks)',
},
'playlist_mincount': 50,
'playlist_mincount': 37,
}, {
'url': 'https://soundcloud.com/the-akashic-chronicler/sets',
'info_dict': {
'id': '114582580',
'title': 'The Akashic Chronicler (Playlists)',
},
'playlist_mincount': 3,
'playlist_mincount': 2,
}, {
'url': 'https://soundcloud.com/the-akashic-chronicler/reposts',
'info_dict': {
@@ -354,7 +372,7 @@ class SoundcloudUserIE(SoundcloudIE):
'url': 'https://soundcloud.com/grynpyret/spotlight',
'info_dict': {
'id': '7098329',
'title': 'Grynpyret (Spotlight)',
'title': 'GRYNPYRET (Spotlight)',
},
'playlist_mincount': 1,
}]
@@ -416,13 +434,14 @@ class SoundcloudUserIE(SoundcloudIE):
for cand in candidates:
if isinstance(cand, dict):
permalink_url = cand.get('permalink_url')
entry_id = self._extract_id(cand)
if permalink_url and permalink_url.startswith('http'):
return permalink_url
return permalink_url, entry_id
for e in collection:
permalink_url = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
permalink_url, entry_id = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
if permalink_url:
entries.append(self.url_result(permalink_url))
entries.append(self.url_result(permalink_url, video_id=entry_id))
next_href = response.get('next_href')
if not next_href:
@@ -442,7 +461,7 @@ class SoundcloudUserIE(SoundcloudIE):
}
class SoundcloudPlaylistIE(SoundcloudIE):
class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
_VALID_URL = r'https?://api\.soundcloud\.com/playlists/(?P<id>[0-9]+)(?:/?\?secret_token=(?P<token>[^&]+?))?$'
IE_NAME = 'soundcloud:playlist'
_TESTS = [{
@@ -472,7 +491,7 @@ class SoundcloudPlaylistIE(SoundcloudIE):
data = self._download_json(
base_url + data, playlist_id, 'Downloading playlist')
entries = [self.url_result(track['permalink_url'], 'Soundcloud') for track in data['tracks']]
entries = self._extract_track_entries(data['tracks'])
return {
'_type': 'playlist',

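The new `SoundcloudPlaylistBaseIE._extract_track_entries` skips tracks that lack a `permalink_url` and passes each track's numeric id along as a string. The thumbnail rewrite now applies only to string artwork URLs. The sketch below shows both behaviours. It uses plain dicts loosely shaped like `url_result` output, and those exact keys are an assumption here. `extract_track_entries` and `large_to_t500` are hypothetical local helpers.

```python
def extract_id(e):
    # Mirrors _extract_id: stringify a truthy 'id', else None
    return str(e['id']) if e.get('id') else None


def extract_track_entries(tracks):
    entries = []
    for track in tracks:
        if not track.get('permalink_url'):
            continue  # tracks without a permalink cannot be resolved
        entry = {'_type': 'url', 'url': track['permalink_url'], 'ie_key': 'Soundcloud'}
        track_id = extract_id(track)
        if track_id is not None:
            entry['id'] = track_id
        entries.append(entry)
    return entries


def large_to_t500(thumbnail):
    # Only string artwork URLs are rewritten; None passes through untouched
    if isinstance(thumbnail, str):
        return thumbnail.replace('-large', '-t500x500')
    return thumbnail
```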
@@ -16,7 +16,7 @@ class SVTBaseIE(InfoExtractor):
def _extract_video(self, video_info, video_id):
formats = []
for vr in video_info['videoReferences']:
player_type = vr.get('playerType')
player_type = vr.get('playerType') or vr.get('format')
vurl = vr['url']
ext = determine_ext(vurl)
if ext == 'm3u8':

@@ -4,10 +4,7 @@ from __future__ import unicode_literals
import re
from .turner import TurnerBaseIE
from ..utils import (
extract_attributes,
ExtractorError,
)
from ..utils import extract_attributes
class TBSIE(TurnerBaseIE):
@@ -37,10 +34,6 @@ class TBSIE(TurnerBaseIE):
site = domain[:3]
webpage = self._download_webpage(url, display_id)
video_params = extract_attributes(self._search_regex(r'(<[^>]+id="page-video"[^>]*>)', webpage, 'video params'))
if video_params.get('isAuthRequired') == 'true':
raise ExtractorError(
'This video is only available via cable service provider subscription that'
' is not currently supported.', expected=True)
query = None
clip_id = video_params.get('clipid')
if clip_id:
@@ -56,4 +49,8 @@ class TBSIE(TurnerBaseIE):
'media_src': 'http://androidhls-secure.cdn.turner.com/%s/big' % site,
'tokenizer_src': 'http://www.%s.com/video/processors/services/token_ipadAdobe.do' % domain,
},
}, {
'url': url,
'site_name': site.upper(),
'auth_required': video_params.get('isAuthRequired') != 'false',
})

@@ -3,13 +3,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import determine_ext
from .jwplatform import JWPlatformBaseIE
from ..utils import remove_end
class ThisAVIE(InfoExtractor):
class ThisAVIE(JWPlatformBaseIE):
_VALID_URL = r'https?://(?:www\.)?thisav\.com/video/(?P<id>[0-9]+)/.*'
_TEST = {
_TESTS = [{
'url': 'http://www.thisav.com/video/47734/%98%26sup1%3B%83%9E%83%82---just-fit.html',
'md5': '0480f1ef3932d901f0e0e719f188f19b',
'info_dict': {
@@ -19,29 +19,49 @@ class ThisAVIE(InfoExtractor):
'uploader': 'dj7970',
'uploader_id': 'dj7970'
}
}
}, {
'url': 'http://www.thisav.com/video/242352/nerdy-18yo-big-ass-tattoos-and-glasses.html',
'md5': 'ba90c076bd0f80203679e5b60bf523ee',
'info_dict': {
'id': '242352',
'ext': 'mp4',
'title': 'Nerdy 18yo Big Ass Tattoos and Glasses',
'uploader': 'cybersluts',
'uploader_id': 'cybersluts',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<h1>([^<]*)</h1>', webpage, 'title')
title = remove_end(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'),
' - 視頻 - ThisAV.com-世界第一中文成人娛樂網站')
video_url = self._html_search_regex(
r"addVariable\('file','([^']+)'\);", webpage, 'video url')
r"addVariable\('file','([^']+)'\);", webpage, 'video url', default=None)
if video_url:
info_dict = {
'formats': [{
'url': video_url,
}],
}
else:
info_dict = self._extract_jwplayer_data(
webpage, video_id, require_title=False)
uploader = self._html_search_regex(
r': <a href="http://www.thisav.com/user/[0-9]+/(?:[^"]+)">([^<]+)</a>',
webpage, 'uploader name', fatal=False)
uploader_id = self._html_search_regex(
r': <a href="http://www.thisav.com/user/[0-9]+/([^"]+)">(?:[^<]+)</a>',
webpage, 'uploader id', fatal=False)
ext = determine_ext(video_url)
return {
info_dict.update({
'id': video_id,
'url': video_url,
'uploader': uploader,
'uploader_id': uploader_id,
'title': title,
'ext': ext,
}
})
return info_dict

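ThisAV now takes the title from `<title>` and strips the site suffix with `remove_end`. That helper trims a suffix only when it is actually present. Here is a standalone sketch of the step. `remove_end` is written locally to match that described behaviour rather than imported from youtube-dl, and `extract_title` is a hypothetical helper.

```python
import re

SUFFIX = ' - 視頻 - ThisAV.com-世界第一中文成人娛樂網站'


def remove_end(s, end):
    # Strip `end` only when `s` actually ends with it; None passes through
    if s is not None and s.endswith(end):
        return s[:-len(end)]
    return s


def extract_title(webpage):
    m = re.search(r'<title>([^<]+)</title>', webpage)
    return remove_end(m.group(1), SUFFIX) if m else None
```

If the page has no `addVariable('file', ...)` call, the extractor now falls back to `_extract_jwplayer_data` instead of failing.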
@@ -2,12 +2,22 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
int_or_none,
js_to_json,
ExtractorError,
urlencode_postdata,
extract_attributes,
smuggle_url,
)
class TouTvIE(InfoExtractor):
_NETRC_MACHINE = 'toutv'
IE_NAME = 'tou.tv'
_VALID_URL = r'https?://ici\.tou\.tv/(?P<id>[a-zA-Z0-9_-]+/S[0-9]+E[0-9]+)'
_access_token = None
_claims = None
_TEST = {
'url': 'http://ici.tou.tv/garfield-tout-court/S2015E17',
@@ -22,18 +32,64 @@ class TouTvIE(InfoExtractor):
# m3u8 download
'skip_download': True,
},
'skip': '404 Not Found',
}
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
return
state = 'http://ici.tou.tv//'
webpage = self._download_webpage(state, None, 'Downloading homepage')
toutvlogin = self._parse_json(self._search_regex(
r'(?s)toutvlogin\s*=\s*({.+?});', webpage, 'toutvlogin'), None, js_to_json)
authorize_url = toutvlogin['host'] + '/auth/oauth/v2/authorize'
login_webpage = self._download_webpage(
authorize_url, None, 'Downloading login page', query={
'client_id': toutvlogin['clientId'],
'redirect_uri': 'https://ici.tou.tv/login/loginCallback',
'response_type': 'token',
'scope': 'media-drmt openid profile email id.write media-validation.read.privileged',
'state': state,
})
login_form = self._search_regex(
r'(?s)(<form[^>]+id="Form-login".+?</form>)', login_webpage, 'login form')
form_data = self._hidden_inputs(login_form)
form_data.update({
'login-email': email,
'login-password': password,
})
post_url = extract_attributes(login_form).get('action') or authorize_url
_, urlh = self._download_webpage_handle(
post_url, None, 'Logging in', data=urlencode_postdata(form_data))
self._access_token = self._search_regex(
r'access_token=([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
urlh.geturl(), 'access token')
self._claims = self._download_json(
'https://services.radio-canada.ca/media/validation/v2/getClaims',
None, 'Extracting Claims', query={
'token': self._access_token,
'access_token': self._access_token,
})['claims']
def _real_extract(self, url):
path = self._match_id(url)
metadata = self._download_json('http://ici.tou.tv/presentation/%s' % path, path)
if metadata.get('IsDrm'):
raise ExtractorError('This video is DRM protected.', expected=True)
video_id = metadata['IdMedia']
details = metadata['Details']
title = details['OriginalTitle']
video_url = 'radiocanada:%s:%s' % (metadata.get('AppCode', 'toutv'), video_id)
if self._access_token and self._claims:
video_url = smuggle_url(video_url, {
'access_token': self._access_token,
'claims': self._claims,
})
return {
'_type': 'url_transparent',
'url': 'radiocanada:%s:%s' % (metadata.get('AppCode', 'toutv'), video_id),
'url': video_url,
'id': video_id,
'title': title,
'thumbnail': details.get('ImageUrl'),

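After the tou.tv login form is posted, `_real_initialize` reads the OAuth access token from the final redirect URL. It matches a UUID-shaped `access_token=` parameter. The sketch below applies the regex from the diff to a made-up redirect URL. The URL and token value are illustrative only, and `extract_access_token` is a hypothetical helper.

```python
import re

TOKEN_RE = re.compile(
    r'access_token=([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})')


def extract_access_token(redirect_url):
    """Return the UUID-style access token from a redirect URL, or None."""
    m = TOKEN_RE.search(redirect_url)
    return m.group(1) if m else None


# Illustrative redirect; real tokens come from the tou.tv OAuth flow
example = ('https://ici.tou.tv/login/loginCallback#access_token='
           '0123abcd-4567-89ab-cdef-0123456789ab&token_type=Bearer')
```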
@@ -22,9 +22,17 @@ class TruTVIE(TurnerBaseIE):
def _real_extract(self, url):
path, video_id = re.match(self._VALID_URL, url).groups()
auth_required = False
if path:
data_src = 'http://www.trutv.com/video/cvp/v2/xml/content.xml?id=%s.xml' % path
else:
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r"TTV\.TVE\.episodeId\s*=\s*'([^']+)';",
webpage, 'video id', default=video_id)
auth_required = self._search_regex(
r'TTV\.TVE\.authRequired\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true'
data_src = 'http://www.trutv.com/tveverywhere/services/cvpXML.do?titleId=' + video_id
return self._extract_cvp_info(
data_src, path, {
@@ -32,4 +40,8 @@ class TruTVIE(TurnerBaseIE):
'media_src': 'http://androidhls-secure.cdn.turner.com/trutv/big',
'tokenizer_src': 'http://www.trutv.com/tveverywhere/processors/services/token_ipadAdobe.do',
},
}, {
'url': url,
'site_name': 'truTV',
'auth_required': auth_required,
})

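For truTV URLs without a path, the extractor now scrapes `TTV.TVE.episodeId` and `TTV.TVE.authRequired` from inline JavaScript. Anything other than the literal `true` counts as not requiring authentication. The sketch below reuses the two regexes from the diff on a made-up page snippet, and `parse_tve_flags` is a hypothetical helper.

```python
import re


def parse_tve_flags(webpage, default_video_id):
    """Return (video_id, auth_required) the way the updated extractor does."""
    m = re.search(r"TTV\.TVE\.episodeId\s*=\s*'([^']+)';", webpage)
    video_id = m.group(1) if m else default_video_id
    m = re.search(r'TTV\.TVE\.authRequired\s*=\s*(true|false);', webpage)
    auth_required = (m.group(1) if m else 'false') == 'true'
    return video_id, auth_required


page = "<script>TTV.TVE.episodeId = 'abc123'; TTV.TVE.authRequired = true;</script>"
```

The resulting flag is passed to `_extract_cvp_info` as `ap_data['auth_required']`. That is what triggers the Adobe Pass `accessToken` lookup in `TurnerBaseIE`.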
@@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .adobepass import AdobePassIE
from ..compat import compat_str
from ..utils import (
xpath_text,
@@ -16,11 +16,11 @@ from ..utils import (
)
class TurnerBaseIE(InfoExtractor):
class TurnerBaseIE(AdobePassIE):
def _extract_timestamp(self, video_data):
return int_or_none(xpath_attr(video_data, 'dateCreated', 'uts'))
def _extract_cvp_info(self, data_src, video_id, path_data={}):
def _extract_cvp_info(self, data_src, video_id, path_data={}, ap_data={}):
video_data = self._download_xml(data_src, video_id)
video_id = video_data.attrib['id']
title = xpath_text(video_data, 'headline', fatal=True)
@@ -70,11 +70,14 @@ class TurnerBaseIE(InfoExtractor):
secure_path = self._search_regex(r'https?://[^/]+(.+/)', video_url, 'secure path') + '*'
token = tokens.get(secure_path)
if not token:
query = {
'path': secure_path,
'videoId': content_id,
}
if ap_data.get('auth_required'):
query['accessToken'] = self._extract_mvpd_auth(ap_data['url'], video_id, ap_data['site_name'], ap_data['site_name'])
auth = self._download_xml(
secure_path_data['tokenizer_src'], video_id, query={
'path': secure_path,
'videoId': content_id,
})
secure_path_data['tokenizer_src'], video_id, query=query)
error_msg = xpath_text(auth, 'error/msg')
if error_msg:
raise ExtractorError(error_msg, expected=True)

@@ -2,9 +2,13 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
parse_iso8601,
try_get,
update_url_query,
)
@@ -65,36 +69,47 @@ class TV4IE(InfoExtractor):
video_id = self._match_id(url)
info = self._download_json(
'http://www.tv4play.se/player/assets/%s.json' % video_id, video_id, 'Downloading video info JSON')
'http://www.tv4play.se/player/assets/%s.json' % video_id,
video_id, 'Downloading video info JSON')
# If is_geo_restricted is true, it doesn't necessarily mean we can't download it
if info['is_geo_restricted']:
if info.get('is_geo_restricted'):
self.report_warning('This content might not be available in your country due to licensing restrictions.')
if info['requires_subscription']:
if info.get('requires_subscription'):
raise ExtractorError('This content requires subscription.', expected=True)
sources_data = self._download_json(
'https://prima.tv4play.se/api/web/asset/%s/play.json?protocol=http&videoFormat=MP4' % video_id, video_id, 'Downloading sources JSON')
sources = sources_data['playback']
title = info['title']
formats = []
for item in sources.get('items', {}).get('item', []):
ext, bitrate = item['mediaFormat'], item['bitrate']
formats.append({
'format_id': '%s_%s' % (ext, bitrate),
'tbr': bitrate,
'ext': ext,
'url': item['url'],
})
# http formats are linked with unresolvable host
for kind in ('hls', ''):
data = self._download_json(
'https://prima.tv4play.se/api/web/asset/%s/play.json' % video_id,
video_id, 'Downloading sources JSON', query={
'protocol': kind,
'videoFormat': 'MP4+WEBVTTS+WEBVTT',
})
item = try_get(data, lambda x: x['playback']['items']['item'], dict)
manifest_url = item.get('url')
if not isinstance(manifest_url, compat_str):
continue
if kind == 'hls':
formats.extend(self._extract_m3u8_formats(
manifest_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=kind, fatal=False))
else:
formats.extend(self._extract_f4m_formats(
update_url_query(manifest_url, {'hdcore': '3.8.0'}),
video_id, f4m_id='hds', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'title': info['title'],
'title': title,
'formats': formats,
'description': info.get('description'),
'timestamp': parse_iso8601(info.get('broadcast_date_time')),
'duration': info.get('duration'),
'duration': int_or_none(info.get('duration')),
'thumbnail': info.get('image'),
'is_live': sources.get('live'),
'is_live': info.get('is_live') is True,
}

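TV4 now reads the nested `playback.items.item` structure through `try_get`. That helper returns `None` when the lookup raises or when the result has the wrong type. The sketch below is a simplified local stand-in for `try_get` that accepts a single getter only, plus the manifest-URL check. It differs from the diff in one deliberate way: it adds an `or {}` guard. Without that guard, a missing `item` would make `None.get` raise.

```python
def try_get(src, getter, expected_type=None):
    # Simplified stand-in: return getter(src) unless it raises a lookup or
    # type error, or the result is not an instance of expected_type
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(v, expected_type):
        return v
    return None


def manifest_url(play_json):
    item = try_get(play_json, lambda x: x['playback']['items']['item'], dict) or {}
    url = item.get('url')
    return url if isinstance(url, str) else None
```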
@@ -6,7 +6,7 @@ from .mtv import MTVServicesInfoExtractor
class TVLandIE(MTVServicesInfoExtractor):
IE_NAME = 'tvland.com'
_VALID_URL = r'https?://(?:www\.)?tvland\.com/(?:video-clips|episodes)/(?P<id>[^/?#.]+)'
_VALID_URL = r'https?://(?:www\.)?tvland\.com/(?:video-clips|(?:full-)?episodes)/(?P<id>[^/?#.]+)'
_FEED_URL = 'http://www.tvland.com/feeds/mrss/'
_TESTS = [{
# Geo-restricted. Without a proxy metadata are still there. With a
@@ -28,4 +28,7 @@ class TVLandIE(MTVServicesInfoExtractor):
'upload_date': '20151228',
'timestamp': 1451289600,
},
}, {
'url': 'http://www.tvland.com/full-episodes/iu0hz6/younger-a-kiss-is-just-a-kiss-season-3-ep-301',
'only_matching': True,
}]

@@ -247,6 +247,7 @@ class TwitchVodIE(TwitchItemBaseIE):
# m3u8 download
'skip_download': True,
},
'skip': 'HTTP Error 404: Not Found',
}]
def _real_extract(self, url):
@@ -400,11 +401,8 @@ class TwitchStreamIE(TwitchBaseIE):
'kraken/streams/%s' % channel_id, channel_id,
'Downloading stream JSON').get('stream')
# Fallback on profile extraction if stream is offline
if not stream:
return self.url_result(
'http://www.twitch.tv/%s/profile' % channel_id,
'TwitchProfile', channel_id)
raise ExtractorError('%s is offline' % channel_id, expected=True)
# Channel name may be typed if different case than the original channel name
# (e.g. http://www.twitch.tv/TWITCHPLAYSPOKEMON) that will lead to constructing

@@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
float_or_none,
@@ -13,6 +14,8 @@ from ..utils import (
ExtractorError,
)
from .periscope import PeriscopeIE
class TwitterBaseIE(InfoExtractor):
def _get_vmap_video_url(self, vmap_url, video_id):
@@ -48,12 +51,12 @@ class TwitterCardIE(TwitterBaseIE):
},
{
'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977',
'md5': 'd4724ffe6d2437886d004fa5de1043b3',
'md5': 'b6d9683dd3f48e340ded81c0e917ad46',
'info_dict': {
'id': 'dq4Oj5quskI',
'ext': 'mp4',
'title': 'Ubuntu 11.10 Overview',
'description': 'Take a quick peek at what\'s new and improved in Ubuntu 11.10.\n\nOnce installed take a look at 10 Things to Do After Installing: http://www.omgubuntu.co.uk/2011/10/10...',
'description': 'md5:a831e97fa384863d6e26ce48d1c43376',
'upload_date': '20111013',
'uploader': 'OMG! Ubuntu!',
'uploader_id': 'omgubuntu',
@@ -100,12 +103,17 @@ class TwitterCardIE(TwitterBaseIE):
return self.url_result(iframe_url)
config = self._parse_json(self._html_search_regex(
r'data-(?:player-)?config="([^"]+)"', webpage, 'data player config'),
r'data-(?:player-)?config="([^"]+)"', webpage,
'data player config', default='{}'),
video_id)
if config.get('source_type') == 'vine':
return self.url_result(config['player_url'], 'Vine')
periscope_url = PeriscopeIE._extract_url(webpage)
if periscope_url:
return self.url_result(periscope_url, PeriscopeIE.ie_key())
def _search_dimensions_in_video_url(a_format, video_url):
m = re.search(r'/(?P<width>\d+)x(?P<height>\d+)/', video_url)
if m:
@@ -244,10 +252,10 @@ class TwitterIE(InfoExtractor):
'info_dict': {
'id': '700207533655363584',
'ext': 'mp4',
'title': 'Donte The Dumbass - BEAT PROD: @suhmeduh #Damndaniel',
'description': 'Donte The Dumbass on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
'title': 'JG - BEAT PROD: @suhmeduh #Damndaniel',
'description': 'JG on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
'thumbnail': 're:^https?://.*\.jpg',
'uploader': 'Donte The Dumbass',
'uploader': 'JG',
'uploader_id': 'jaydingeer',
},
'params': {
@@ -278,6 +286,18 @@ class TwitterIE(InfoExtractor):
'params': {
'skip_download': True, # requires ffmpeg
},
}, {
'url': 'https://twitter.com/OPP_HSD/status/779210622571536384',
'info_dict': {
'id': '1zqKVVlkqLaKB',
'ext': 'mp4',
'title': 'Sgt Kerry Schmidt - Ontario Provincial Police - Road rage, mischief, assault, rollover and fire in one occurrence',
'upload_date': '20160923',
'uploader_id': 'OPP_HSD',
'uploader': 'Sgt Kerry Schmidt - Ontario Provincial Police',
'timestamp': 1474613214,
},
'add_ie': ['Periscope'],
}]
def _real_extract(self, url):
@@ -328,13 +348,22 @@ class TwitterIE(InfoExtractor):
})
return info
twitter_card_url = None
if 'class="PlayableMedia' in webpage:
twitter_card_url = '%s//twitter.com/i/videos/tweet/%s' % (self.http_scheme(), twid)
else:
twitter_card_iframe_url = self._search_regex(
r'data-full-card-iframe-url=([\'"])(?P<url>(?:(?!\1).)+)\1',
webpage, 'Twitter card iframe URL', default=None, group='url')
if twitter_card_iframe_url:
twitter_card_url = compat_urlparse.urljoin(url, twitter_card_iframe_url)
if twitter_card_url:
info.update({
'_type': 'url_transparent',
'ie_key': 'TwitterCard',
'url': '%s//twitter.com/i/videos/tweet/%s' % (self.http_scheme(), twid),
'url': twitter_card_url,
})
return info
raise ExtractorError('There\'s no video in this tweet.')

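When a tweet page lacks `class="PlayableMedia"`, `TwitterIE` now looks for a `data-full-card-iframe-url` attribute and resolves it against the tweet URL with `urljoin`. The sketch below uses Python 3's `urllib.parse.urljoin` in place of `compat_urlparse.urljoin`. The HTML snippets and tweet id are made up, `find_card_url` is a hypothetical helper, and HTML entity unescaping is not handled.

```python
import re
from urllib.parse import urljoin

CARD_RE = r'data-full-card-iframe-url=([\'"])(?P<url>(?:(?!\1).)+)\1'


def find_card_url(tweet_url, webpage, twid, scheme='https:'):
    if 'class="PlayableMedia' in webpage:
        return '%s//twitter.com/i/videos/tweet/%s' % (scheme, twid)
    m = re.search(CARD_RE, webpage)
    if m:
        # The attribute may hold a relative path; resolve it against the tweet URL
        return urljoin(tweet_url, m.group('url'))
    return None
```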
@@ -5,6 +5,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urllib_request,
compat_urlparse,
)
@@ -207,7 +208,7 @@ class UdemyIE(InfoExtractor):
if youtube_url:
return self.url_result(youtube_url, 'Youtube')
video_id = asset['id']
video_id = compat_str(asset['id'])
thumbnail = asset.get('thumbnail_url') or asset.get('thumbnailUrl')
duration = float_or_none(asset.get('data', {}).get('duration'))

@@ -1,15 +1,20 @@
from __future__ import unicode_literals
import random
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
encode_data_uri,
ExtractorError,
int_or_none,
float_or_none,
mimetype2ext,
str_or_none,
)
@@ -47,8 +52,108 @@ class UstreamIE(InfoExtractor):
'id': '10299409',
},
'playlist_count': 3,
}, {
'url': 'http://www.ustream.tv/recorded/91343263',
'info_dict': {
'id': '91343263',
'ext': 'mp4',
'title': 'GitHub Universe - General Session - Day 1',
'upload_date': '20160914',
'description': 'GitHub Universe - General Session - Day 1',
'timestamp': 1473872730,
'uploader': 'wa0dnskeqkr',
'uploader_id': '38977840',
},
'params': {
'skip_download': True, # m3u8 download
},
}]
def _get_stream_info(self, url, video_id, app_id_ver, extra_note=None):
def num_to_hex(n):
return hex(n)[2:]
rnd = random.randrange
if not extra_note:
extra_note = ''
conn_info = self._download_json(
'http://r%d-1-%s-recorded-lp-live.ums.ustream.tv/1/ustream' % (rnd(1e8), video_id),
video_id, note='Downloading connection info' + extra_note,
query={
'type': 'viewer',
'appId': app_id_ver[0],
'appVersion': app_id_ver[1],
'rsid': '%s:%s' % (num_to_hex(rnd(1e8)), num_to_hex(rnd(1e8))),
'rpin': '_rpin.%d' % rnd(1e15),
'referrer': url,
'media': video_id,
'application': 'recorded',
})
host = conn_info[0]['args'][0]['host']
connection_id = conn_info[0]['args'][0]['connectionId']
return self._download_json(
'http://%s/1/ustream?connectionId=%s' % (host, connection_id),
video_id, note='Downloading stream info' + extra_note)
def _get_streams(self, url, video_id, app_id_ver):
# Sometimes the returned dict does not have 'stream'
for trial_count in range(3):
stream_info = self._get_stream_info(
url, video_id, app_id_ver,
extra_note=' (try %d)' % (trial_count + 1) if trial_count > 0 else '')
if 'stream' in stream_info[0]['args'][0]:
return stream_info[0]['args'][0]['stream']
return []
def _parse_segmented_mp4(self, dash_stream_info):
def resolve_dash_template(template, idx, chunk_hash):
return template.replace('%', compat_str(idx), 1).replace('%', chunk_hash)
formats = []
for stream in dash_stream_info['streams']:
# Use only one provider to avoid too many formats
provider = dash_stream_info['providers'][0]
fragments = [{
'url': resolve_dash_template(
provider['url'] + stream['initUrl'], 0, dash_stream_info['hashes']['0'])
}]
for idx in range(dash_stream_info['videoLength'] // dash_stream_info['chunkTime']):
fragments.append({
'url': resolve_dash_template(
provider['url'] + stream['segmentUrl'], idx,
dash_stream_info['hashes'][compat_str(idx // 10 * 10)])
})
content_type = stream['contentType']
kind = content_type.split('/')[0]
f = {
'format_id': '-'.join(filter(None, [
'dash', kind, str_or_none(stream.get('bitrate'))])),
'protocol': 'http_dash_segments',
# TODO: generate a MPD doc for external players?
'url': encode_data_uri(b'<MPD/>', 'text/xml'),
'ext': mimetype2ext(content_type),
'height': stream.get('height'),
'width': stream.get('width'),
'fragments': fragments,
}
if kind == 'video':
f.update({
'vcodec': stream.get('codec'),
'acodec': 'none',
'vbr': stream.get('bitrate'),
})
else:
f.update({
'vcodec': 'none',
'acodec': stream.get('codec'),
'abr': stream.get('bitrate'),
})
formats.append(f)
return formats
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
@@ -86,7 +191,22 @@ class UstreamIE(InfoExtractor):
'url': video_url,
'ext': format_id,
'filesize': filesize,
} for format_id, video_url in video['media_urls'].items()]
} for format_id, video_url in video['media_urls'].items() if video_url]
if not formats:
hls_streams = self._get_streams(url, video_id, app_id_ver=(11, 2))
if hls_streams:
# m3u8_native leads to intermittent ContentTooShortError
formats.extend(self._extract_m3u8_formats(
hls_streams[0]['url'], video_id, ext='mp4', m3u8_id='hls'))
'''
# DASH streams handling is incomplete as 'url' is missing
dash_streams = self._get_streams(url, video_id, app_id_ver=(3, 1))
if dash_streams:
formats.extend(self._parse_segmented_mp4(dash_streams))
'''
self._sort_formats(formats)
description = video.get('description')

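`_parse_segmented_mp4`, whose call site is still commented out in this diff, builds each fragment URL in two steps. It replaces the first `%` in the provider template with the chunk index and every remaining `%` with a hash. Hash keys step in tens, following the diff's `idx // 10 * 10`. The sketch below reproduces that logic. The field names follow the diff, but the `info` values are invented, and `fragment_urls` is a hypothetical helper.

```python
def resolve_dash_template(template, idx, chunk_hash):
    # First '%' -> chunk index, every remaining '%' -> chunk hash
    return template.replace('%', str(idx), 1).replace('%', chunk_hash)


def fragment_urls(info, stream):
    provider = info['providers'][0]  # one provider only, as in the diff
    urls = [resolve_dash_template(
        provider['url'] + stream['initUrl'], 0, info['hashes']['0'])]
    for idx in range(info['videoLength'] // info['chunkTime']):
        # Hash keys are multiples of ten: 0, 10, 20, ...
        chunk_hash = info['hashes'][str(idx // 10 * 10)]
        urls.append(resolve_dash_template(
            provider['url'] + stream['segmentUrl'], idx, chunk_hash))
    return urls


info = {
    'providers': [{'url': 'http://cdn.example/'}],
    'hashes': {'0': 'h0', '10': 'h10'},
    'videoLength': 12,
    'chunkTime': 1,
}
stream = {'initUrl': 'init_%_%.mp4', 'segmentUrl': 'seg_%_%.mp4'}
urls = fragment_urls(info, stream)
```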
@@ -22,6 +22,7 @@ class VGTVIE(XstreamIE):
'fvn.no/fvntv': 'fvntv',
'aftenposten.no/webtv': 'aptv',
'ap.vgtv.no/webtv': 'aptv',
'tv.aftonbladet.se/abtv': 'abtv',
}
_APP_NAME_TO_VENDOR = {
@@ -30,6 +31,7 @@ class VGTVIE(XstreamIE):
'satv': 'sa',
'fvntv': 'fvn',
'aptv': 'ap',
'abtv': 'ab',
}
_VALID_URL = r'''(?x)
@@ -40,7 +42,8 @@ class VGTVIE(XstreamIE):
/?
(?:
\#!/(?:video|live)/|
embed?.*id=
embed?.*id=|
articles/
)|
(?P<appname>
%s
@@ -135,6 +138,14 @@ class VGTVIE(XstreamIE):
'url': 'http://www.vgtv.no/#!/video/127205/inside-the-mind-of-favela-funk',
'only_matching': True,
},
{
'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
'only_matching': True,
},
{
'url': 'abtv:140026',
'only_matching': True,
}
]
def _real_extract(self, url):

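Aftonbladet support in `VGTVIE` comes from extending two lookup tables. The first maps a host/path prefix to an app name, and the second maps the app name to a vendor. The sketch below shows that two-step lookup using only the entries visible in the diff, although the real tables contain more. `vendor_for` is a hypothetical helper.

```python
HOST_TO_APPNAME = {
    'fvn.no/fvntv': 'fvntv',
    'aftenposten.no/webtv': 'aptv',
    'ap.vgtv.no/webtv': 'aptv',
    'tv.aftonbladet.se/abtv': 'abtv',
}

APP_NAME_TO_VENDOR = {
    'satv': 'sa',
    'fvntv': 'fvn',
    'aptv': 'ap',
    'abtv': 'ab',
}


def vendor_for(host_path):
    # host/path prefix -> app name -> vendor; None if either lookup misses
    appname = HOST_TO_APPNAME.get(host_path)
    return APP_NAME_TO_VENDOR.get(appname) if appname else None
```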
@@ -84,7 +84,7 @@ class VideomoreIE(InfoExtractor):
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<object[^>]+data=(["\'])https?://videomore.ru/player\.swf\?.*config=(?P<url>https?://videomore\.ru/(?:[^/]+/)+\d+\.xml).*\1',
r'<object[^>]+data=(["\'])https?://videomore\.ru/player\.swf\?.*config=(?P<url>https?://videomore\.ru/(?:[^/]+/)+\d+\.xml).*\1',
webpage)
if mobj:
return mobj.group('url')

@@ -48,8 +48,8 @@ class VierIE(InfoExtractor):
[r'data-filename="([^"]+)"', r'"filename"\s*:\s*"([^"]+)"'],
webpage, 'filename')
playlist_url = 'http://vod.streamcloud.be/%s/mp4:_definst_/%s.mp4/playlist.m3u8' % (application, filename)
formats = self._extract_m3u8_formats(playlist_url, display_id, 'mp4')
playlist_url = 'http://vod.streamcloud.be/%s/_definst_/mp4:%s.mp4/playlist.m3u8' % (application, filename)
formats = self._extract_wowza_formats(playlist_url, display_id)
self._sort_formats(formats)
title = self._og_search_title(webpage, default=display_id)

@@ -20,11 +20,12 @@ from ..utils import (
remove_start,
str_to_int,
unescapeHTML,
unified_strdate,
unified_timestamp,
urlencode_postdata,
)
from .vimeo import VimeoIE
from .dailymotion import DailymotionIE
from .pladform import PladformIE
from .vimeo import VimeoIE
class VKBaseIE(InfoExtractor):
@@ -105,6 +106,7 @@ class VKIE(VKBaseIE):
'title': 'ProtivoGunz - Хуёвая песня',
'uploader': 're:(?:Noize MC|Alexander Ilyashenko).*',
'duration': 195,
'timestamp': 1329060660,
'upload_date': '20120212',
'view_count': int,
},
@@ -118,6 +120,7 @@ class VKIE(VKBaseIE):
'uploader': 'Tom Cruise',
'title': 'No name',
'duration': 9,
'timestamp': 1374374880,
'upload_date': '20130721',
'view_count': int,
}
@@ -194,6 +197,7 @@ class VKIE(VKBaseIE):
'upload_date': '20150709',
'view_count': int,
},
'skip': 'Removed',
},
{
# youtube embed
@@ -210,6 +214,23 @@ class VKIE(VKBaseIE):
'view_count': int,
},
},
{
# dailymotion embed
'url': 'https://vk.com/video-37468416_456239855',
'info_dict': {
'id': 'k3lz2cmXyRuJQSjGHUv',
'ext': 'mp4',
'title': 'md5:d52606645c20b0ddbb21655adaa4f56f',
'description': 'md5:c651358f03c56f1150b555c26d90a0fd',
'uploader': 'AniLibria.Tv',
'upload_date': '20160914',
'uploader_id': 'x1p5vl5',
'timestamp': 1473877246,
},
'params': {
'skip_download': True,
},
},
{
# video key is extra_data not url\d+
'url': 'http://vk.com/video-110305615_171782105',
@@ -219,10 +240,30 @@ class VKIE(VKBaseIE):
'ext': 'mp4',
'title': 'S-Dance, репетиции к The way show',
'uploader': 'THE WAY SHOW | 17 апреля',
'timestamp': 1454870100,
'upload_date': '20160207',
'view_count': int,
},
},
{
# finished live stream, live_mp4
'url': 'https://vk.com/videos-387766?z=video-387766_456242764%2Fpl_-387766_-2',
'md5': '90d22d051fccbbe9becfccc615be6791',
'info_dict': {
'id': '456242764',
'ext': 'mp4',
'title': 'ИгроМир 2016 — день 1',
'uploader': 'Игромания',
'duration': 5239,
'view_count': int,
},
},
{
# live stream, hls and rtmp links, most likely already finished live
# stream by the time you are reading this comment
'url': 'https://vk.com/video-140332_456239111',
'only_matching': True,
},
{
# removed video, just testing that we match the pattern
'url': 'http://vk.com/feed?z=video-43215063_166094326%2Fbb50cacd3177146d7a',
@@ -315,6 +356,10 @@ class VKIE(VKBaseIE):
m_rutube.group(1).replace('\\', ''))
return self.url_result(rutube_url)
dailymotion_urls = DailymotionIE._extract_urls(info_page)
if dailymotion_urls:
return self.url_result(dailymotion_urls[0], DailymotionIE.ie_key())
m_opts = re.search(r'(?s)var\s+opts\s*=\s*({.+?});', info_page)
if m_opts:
m_opts_url = re.search(r"url\s*:\s*'((?!/\b)[^']+)", m_opts.group(1))
@@ -327,42 +372,51 @@ class VKIE(VKBaseIE):
data_json = self._search_regex(r'var\s+vars\s*=\s*({.+?});', info_page, 'vars')
data = json.loads(data_json)
# Extract upload date
upload_date = None
mobj = re.search(r'id="mv_date(?:_views)?_wrap"[^>]*>([a-zA-Z]+ [0-9]+), ([0-9]+) at', info_page)
if mobj is not None:
mobj.group(1) + ' ' + mobj.group(2)
upload_date = unified_strdate(mobj.group(1) + ' ' + mobj.group(2))
title = unescapeHTML(data['md_title'])
view_count = None
views = self._html_search_regex(
r'"mv_views_count_number"[^>]*>(.+?\bviews?)<',
info_page, 'view count', default=None)
if views:
view_count = str_to_int(self._search_regex(
r'([\d,.]+)', views, 'view count', fatal=False))
if data.get('live') == 2:
title = self._live_title(title)
timestamp = unified_timestamp(self._html_search_regex(
r'class=["\']mv_info_date[^>]+>([^<]+)(?:<|from)', info_page,
'upload date', fatal=False))
view_count = str_to_int(self._search_regex(
r'class=["\']mv_views_count[^>]+>\s*([\d,.]+)',
info_page, 'view count', fatal=False))
formats = []
for k, v in data.items():
if not k.startswith('url') and not k.startswith('cache') and k != 'extra_data' or not v:
for format_id, format_url in data.items():
if not isinstance(format_url, compat_str) or not format_url.startswith(('http', '//', 'rtmp')):
continue
height = int_or_none(self._search_regex(
r'^(?:url|cache)(\d+)', k, 'height', default=None))
formats.append({
'format_id': k,
'url': v,
'height': height,
})
if format_id.startswith(('url', 'cache')) or format_id in ('extra_data', 'live_mp4'):
height = int_or_none(self._search_regex(
r'^(?:url|cache)(\d+)', format_id, 'height', default=None))
formats.append({
'format_id': format_id,
'url': format_url,
'height': height,
})
elif format_id == 'hls':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', m3u8_id=format_id,
fatal=False, live=True))
elif format_id == 'rtmp':
formats.append({
'format_id': format_id,
'url': format_url,
'ext': 'flv',
})
self._sort_formats(formats)
return {
'id': compat_str(data['vid']),
'id': compat_str(data.get('vid') or video_id),
'formats': formats,
'title': unescapeHTML(data['md_title']),
'title': title,
'thumbnail': data.get('jpg'),
'uploader': data.get('md_author'),
'duration': data.get('duration'),
'upload_date': upload_date,
'timestamp': timestamp,
'view_count': view_count,
}

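The rewritten VKIE loop first discards values that are not URL strings, then sorts the remaining `vars` keys into three buckets: progressive MP4 (`url*`, `cache*`, `extra_data`, `live_mp4`), HLS (`hls`) and RTMP (`rtmp`). The sketch below reproduces that classification outside the extractor, assuming Python 3 (`str` stands in for `compat_str`). `classify_vk_formats` and the sample data are hypothetical; HLS is only recorded as a placeholder entry rather than being expanded with `_extract_m3u8_formats(..., live=True)`, and keys are sorted only to make the output order deterministic.

```python
import re


def classify_vk_formats(data):
    """Hypothetical standalone version of the new VKIE format loop."""
    formats = []
    for format_id, format_url in sorted(data.items()):
        # Skip non-string values and anything that is not a playable URL
        if not isinstance(format_url, str) or not format_url.startswith(('http', '//', 'rtmp')):
            continue
        if format_id.startswith(('url', 'cache')) or format_id in ('extra_data', 'live_mp4'):
            # Progressive MP4; keys such as url720 / cache480 carry the height
            m = re.match(r'^(?:url|cache)(\d+)', format_id)
            formats.append({
                'format_id': format_id,
                'url': format_url,
                'height': int(m.group(1)) if m else None,
            })
        elif format_id == 'hls':
            # Placeholder: the real extractor expands the m3u8 manifest here
            formats.append({'format_id': format_id, 'url': format_url, 'protocol': 'm3u8'})
        elif format_id == 'rtmp':
            formats.append({'format_id': format_id, 'url': format_url, 'ext': 'flv'})
        # Other URL-valued keys (e.g. the 'jpg' thumbnail) fall through and are ignored
    return formats


# Hypothetical 'vars' payload for a finished live stream
sample = {
    'url720': 'https://example.com/v.720.mp4',
    'live_mp4': 'https://example.com/live.mp4',
    'hls': 'https://example.com/live.m3u8',
    'rtmp': 'rtmp://example.com/live',
    'jpg': 'https://example.com/thumb.jpg',
    'duration': 5239,
}
formats = classify_vk_formats(sample)
```

Note that `jpg` passes the URL check but matches no bucket, which is why the branch conditions, not the URL filter alone, decide what becomes a format.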
@@ -25,29 +25,8 @@ class VODPlatformIE(InfoExtractor):
title = unescapeHTML(self._og_search_title(webpage))
hidden_inputs = self._hidden_inputs(webpage)
base_url = self._search_regex(
'(.*/)(?:playlist.m3u8|manifest.mpd)',
hidden_inputs.get('HiddenmyhHlsLink') or hidden_inputs['HiddenmyDashLink'],
'base url')
formats = self._extract_m3u8_formats(
base_url + 'playlist.m3u8', video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
formats.extend(self._extract_mpd_formats(
base_url + 'manifest.mpd', video_id,
mpd_id='dash', fatal=False))
rtmp_formats = self._extract_smil_formats(
base_url + 'jwplayer.smil', video_id, fatal=False)
for rtmp_format in rtmp_formats:
rtsp_format = rtmp_format.copy()
rtsp_format['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
del rtsp_format['play_path']
del rtsp_format['ext']
rtsp_format.update({
'url': rtsp_format['url'].replace('rtmp://', 'rtsp://'),
'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
'protocol': 'rtsp',
})
formats.extend([rtmp_format, rtsp_format])
formats = self._extract_wowza_formats(
hidden_inputs.get('HiddenmyhHlsLink') or hidden_inputs['HiddenmyDashLink'], video_id, skip_protocols=['f4m', 'smil'])
self._sort_formats(formats)
return {

@@ -9,13 +9,16 @@ class VoxMediaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:theverge|vox|sbnation|eater|polygon|curbed|racked)\.com/(?:[^/]+/)*(?P<id>[^/?]+)'
_TESTS = [{
'url': 'http://www.theverge.com/2014/6/27/5849272/material-world-how-google-discovered-what-software-is-made-of',
'md5': '73856edf3e89a711e70d5cf7cb280b37',
'info_dict': {
'id': '11eXZobjrG8DCSTgrNjVinU-YmmdYjhe',
'ext': 'mp4',
'title': 'Google\'s new material design direction',
'description': 'md5:2f44f74c4d14a1f800ea73e1c6832ad2',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
# data-ooyala-id
@@ -31,13 +34,16 @@ class VoxMediaIE(InfoExtractor):
}, {
# volume embed
'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
'md5': '375c483c5080ab8cd85c9c84cfc2d1e4',
'info_dict': {
'id': 'wydzk3dDpmRz7PQoXRsTIX6XTkPjYL0b',
'ext': 'mp4',
'title': 'The new frontier of LGBTQ civil rights, explained',
'description': 'md5:0dc58e94a465cbe91d02950f770eb93f',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
# youtube embed

@@ -5,7 +5,6 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
)
@@ -75,7 +74,6 @@ class VRTIE(InfoExtractor):
},
{
'url': 'http://cobra.canvas.be/cm/cobra/videozone/rubriek/film-videozone/1.2377055',
'md5': '',
'info_dict': {
'id': '2377055',
'ext': 'mp4',
@@ -119,39 +117,17 @@ class VRTIE(InfoExtractor):
video_id, 'mp4', m3u8_id='hls', fatal=False))
if src:
if determine_ext(src) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
formats.extend(self._extract_f4m_formats(
src.replace('playlist.m3u8', 'manifest.f4m'),
video_id, f4m_id='hds', fatal=False))
if 'data-video-geoblocking="true"' not in webpage:
rtmp_formats = self._extract_smil_formats(
src.replace('playlist.m3u8', 'jwplayer.smil'),
video_id, fatal=False)
formats.extend(rtmp_formats)
for rtmp_format in rtmp_formats:
rtmp_format_c = rtmp_format.copy()
rtmp_format_c['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
del rtmp_format_c['play_path']
del rtmp_format_c['ext']
http_format = rtmp_format_c.copy()
formats = self._extract_wowza_formats(src, video_id)
if 'data-video-geoblocking="true"' not in webpage:
for f in formats:
if f['url'].startswith('rtsp://'):
http_format = f.copy()
http_format.update({
'url': rtmp_format_c['url'].replace('rtmp://', 'http://').replace('vod.', 'download.').replace('/_definst_/', '/').replace('mp4:', ''),
'format_id': rtmp_format['format_id'].replace('rtmp', 'http'),
'url': f['url'].replace('rtsp://', 'http://').replace('vod.', 'download.').replace('/_definst_/', '/').replace('mp4:', ''),
'format_id': f['format_id'].replace('rtsp', 'http'),
'protocol': 'http',
})
rtsp_format = rtmp_format_c.copy()
rtsp_format.update({
'url': rtsp_format['url'].replace('rtmp://', 'rtsp://'),
'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
'protocol': 'rtsp',
})
formats.extend([http_format, rtsp_format])
else:
formats.extend(self._extract_f4m_formats(
'%s/manifest.f4m' % src, video_id, f4m_id='hds', fatal=False))
formats.append(http_format)
if not formats and 'data-video-geoblocking="true"' in webpage:
self.raise_geo_restricted('This video is only available in Belgium')

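With `_extract_wowza_formats` in place, VRTIE now derives its progressive HTTP download from each RTSP format (URLs starting with `rtsp://`) by rewriting the URL. A sketch of that string transformation as a standalone function; the helper name and the `vod.example.be` host are hypothetical, while the rewrite rules are the ones in the diff:

```python
def rtsp_to_http(rtsp_format):
    """Derive an HTTP download format from a Wowza RTSP format dict (hypothetical helper)."""
    url = rtsp_format['url']
    http_url = (url.replace('rtsp://', 'http://')   # switch protocol
                   .replace('vod.', 'download.')    # streaming host -> download host
                   .replace('/_definst_/', '/')     # drop the Wowza default instance
                   .replace('mp4:', ''))            # drop the Wowza 'mp4:' stream prefix
    http_format = dict(rtsp_format)  # copy so the RTSP format is left untouched
    http_format.update({
        'url': http_url,
        'format_id': rtsp_format['format_id'].replace('rtsp', 'http'),
        'protocol': 'http',
    })
    return http_format


rtsp = {
    'url': 'rtsp://vod.example.be/app/_definst_/mp4:clip.mp4',
    'format_id': 'rtsp-1200',
    'protocol': 'rtsp',
}
http = rtsp_to_http(rtsp)
print(http['url'])  # http://download.example.be/app/clip.mp4
```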
@@ -0,0 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
class VyboryMosIE(InfoExtractor):
_VALID_URL = r'https?://vybory\.mos\.ru/(?:#precinct/|account/channels\?.*?\bstation_id=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://vybory.mos.ru/#precinct/13636',
'info_dict': {
'id': '13636',
'ext': 'mp4',
'title': 're:^Участковая избирательная комиссия №2231 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'Россия, Москва, улица Введенского, 32А',
'is_live': True,
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://vybory.mos.ru/account/channels?station_id=13636',
'only_matching': True,
}]
def _real_extract(self, url):
station_id = self._match_id(url)
channels = self._download_json(
'http://vybory.mos.ru/account/channels?station_id=%s' % station_id,
station_id, 'Downloading channels JSON')
formats = []
for cam_num, (sid, hosts, name, _) in enumerate(channels, 1):
for num, host in enumerate(hosts, 1):
formats.append({
'url': 'http://%s/master.m3u8?sid=%s' % (host, sid),
'ext': 'mp4',
'format_id': 'camera%d-host%d' % (cam_num, num),
'format_note': '%s, %s' % (name, host),
})
info = self._download_json(
'http://vybory.mos.ru/json/voting_stations/%s/%s.json'
% (compat_str(station_id)[:3], station_id),
station_id, 'Downloading station JSON', fatal=False)
return {
'id': station_id,
'title': self._live_title(info['name'] if info else station_id),
'description': info.get('address') if info else None,
'is_live': True,
'formats': formats,
}

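VyboryMosIE builds one HLS format per (camera, host) pair from the channels JSON and fetches station metadata from a path keyed by the first three characters of the station ID. A sketch of both steps without the network calls; the function names and the sample channel data are hypothetical, and each channel entry is assumed to be the 4-item sequence `(sid, hosts, name, <unused>)` that the extractor unpacks:

```python
def build_formats(channels):
    """One HLS format per camera/host pair, numbered from 1 as in the extractor."""
    formats = []
    for cam_num, (sid, hosts, name, _) in enumerate(channels, 1):
        for num, host in enumerate(hosts, 1):
            formats.append({
                'url': 'http://%s/master.m3u8?sid=%s' % (host, sid),
                'ext': 'mp4',
                'format_id': 'camera%d-host%d' % (cam_num, num),
                'format_note': '%s, %s' % (name, host),
            })
    return formats


def station_json_url(station_id):
    """Station metadata lives under a directory named after the ID's first 3 characters."""
    return 'http://vybory.mos.ru/json/voting_stations/%s/%s.json' % (
        str(station_id)[:3], station_id)


channels = [
    ('sid-a', ['cam1.example.net', 'cam2.example.net'], 'Camera 1', None),
    ('sid-b', ['cam3.example.net'], 'Camera 2', None),
]
formats = build_formats(channels)
```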
@@ -124,12 +124,14 @@ class XFileShareIE(InfoExtractor):
webpage = self._download_webpage(req, video_id, 'Downloading video page')
title = (self._search_regex(
[r'style="z-index: [0-9]+;">([^<]+)</span>',
(r'style="z-index: [0-9]+;">([^<]+)</span>',
r'<td nowrap>([^<]+)</td>',
r'h4-fine[^>]*>([^<]+)<',
r'>Watch (.+) ',
r'<h2 class="video-page-head">([^<]+)</h2>'],
webpage, 'title', default=None) or self._og_search_title(webpage)).strip()
r'<h2 class="video-page-head">([^<]+)</h2>',
r'<h2 style="[^"]*color:#403f3d[^"]*"[^>]*>([^<]+)<'), # streamin.to
webpage, 'title', default=None) or self._og_search_title(
webpage, default=None) or video_id).strip()
def extract_video_url(default=NO_DEFAULT):
return self._search_regex(

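The new XFileShareIE title lookup tries each regex in order, then the `og:title` meta tag, and finally falls back to the video ID instead of failing. A sketch of that fallback chain with plain `re`; `extract_title` is hypothetical and receives the `og:title` value as an argument rather than parsing it from the page:

```python
import re

# Patterns copied from the diff, tried in order; the first match wins
TITLE_PATTERNS = (
    r'style="z-index: [0-9]+;">([^<]+)</span>',
    r'<td nowrap>([^<]+)</td>',
    r'h4-fine[^>]*>([^<]+)<',
    r'>Watch (.+) ',
    r'<h2 class="video-page-head">([^<]+)</h2>',
    r'<h2 style="[^"]*color:#403f3d[^"]*"[^>]*>([^<]+)<',  # streamin.to
)


def extract_title(webpage, og_title, video_id):
    """Regexes first, then og:title, then the video ID as a last resort."""
    for pattern in TITLE_PATTERNS:
        m = re.search(pattern, webpage)
        if m:
            return m.group(1).strip()
    return (og_title or video_id).strip()


page = '<h2 style="font-size:20px;color:#403f3d">  My Clip </h2>'
print(extract_title(page, None, 'abc123'))  # My Clip
```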
@@ -369,7 +369,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube'
_TESTS = [
{
'url': 'http://www.youtube.com/watch?v=BaW_jenozKc&t=1s&end=9',
'url': 'https://www.youtube.com/watch?v=BaW_jenozKc&t=1s&end=9',
'info_dict': {
'id': 'BaW_jenozKc',
'ext': 'mp4',
@@ -389,7 +389,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}
},
{
'url': 'http://www.youtube.com/watch?v=UxxajLWwzqY',
'url': 'https://www.youtube.com/watch?v=UxxajLWwzqY',
'note': 'Test generic use_cipher_signature video (#897)',
'info_dict': {
'id': 'UxxajLWwzqY',
@@ -443,7 +443,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}
},
{
'url': 'http://www.youtube.com/watch?v=BaW_jenozKc&v=UxxajLWwzqY',
'url': 'https://www.youtube.com/watch?v=BaW_jenozKc&v=UxxajLWwzqY',
'note': 'Use the first video ID in the URL',
'info_dict': {
'id': 'BaW_jenozKc',
@@ -465,7 +465,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
},
},
{
'url': 'http://www.youtube.com/watch?v=a9LDPn-MO4I',
'url': 'https://www.youtube.com/watch?v=a9LDPn-MO4I',
'note': '256k DASH audio (format 141) via DASH manifest',
'info_dict': {
'id': 'a9LDPn-MO4I',
@@ -539,7 +539,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
},
# Normal age-gate video (No vevo, embed allowed)
{
'url': 'http://youtube.com/watch?v=HtVdAasjOgU',
'url': 'https://youtube.com/watch?v=HtVdAasjOgU',
'info_dict': {
'id': 'HtVdAasjOgU',
'ext': 'mp4',
@@ -555,7 +555,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
},
# Age-gate video with encrypted signature
{
'url': 'http://www.youtube.com/watch?v=6kLq3WMV1nU',
'url': 'https://www.youtube.com/watch?v=6kLq3WMV1nU',
'info_dict': {
'id': '6kLq3WMV1nU',
'ext': 'mp4',
@@ -748,11 +748,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'skip': 'Not multifeed anymore',
},
{
'url': 'http://vid.plus/FlRa-iH7PGw',
'url': 'https://vid.plus/FlRa-iH7PGw',
'only_matching': True,
},
{
'url': 'http://zwearz.com/watch/9lWxNJF-ufM/electra-woman-dyna-girl-official-trailer-grace-helbig.html',
'url': 'https://zwearz.com/watch/9lWxNJF-ufM/electra-woman-dyna-girl-official-trailer-grace-helbig.html',
'only_matching': True,
},
{
@@ -1846,7 +1846,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'playlist_count': 2,
}, {
'note': 'embedded',
'url': 'http://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
'url': 'https://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
'playlist_count': 4,
'info_dict': {
'title': 'JODA15',
@@ -1854,7 +1854,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
}
}, {
'note': 'Embedded SWF player',
'url': 'http://www.youtube.com/p/YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ?hl=en_US&fs=1&rel=0',
'url': 'https://www.youtube.com/p/YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ?hl=en_US&fs=1&rel=0',
'playlist_count': 4,
'info_dict': {
'title': 'JODA7',
@@ -2156,7 +2156,7 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
IE_NAME = 'youtube:live'
_TESTS = [{
'url': 'http://www.youtube.com/user/TheYoungTurks/live',
'url': 'https://www.youtube.com/user/TheYoungTurks/live',
'info_dict': {
'id': 'a48o2S1cPoo',
'ext': 'mp4',
@@ -2176,7 +2176,7 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
'skip_download': True,
},
}, {
'url': 'http://www.youtube.com/channel/UC1yBKRuGpC1tSM73A0ZjYjQ/live',
'url': 'https://www.youtube.com/channel/UC1yBKRuGpC1tSM73A0ZjYjQ/live',
'only_matching': True,
}]
@@ -2201,7 +2201,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
IE_NAME = 'youtube:playlists'
_TESTS = [{
'url': 'http://www.youtube.com/user/ThirstForScience/playlists',
'url': 'https://www.youtube.com/user/ThirstForScience/playlists',
'playlist_mincount': 4,
'info_dict': {
'id': 'ThirstForScience',
@@ -2209,7 +2209,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
},
}, {
# with "Load more" button
'url': 'http://www.youtube.com/user/igorkle1/playlists?view=1&sort=dd',
'url': 'https://www.youtube.com/user/igorkle1/playlists?view=1&sort=dd',
'playlist_mincount': 70,
'info_dict': {
'id': 'igorkle1',
@@ -2442,10 +2442,10 @@ class YoutubeTruncatedURLIE(InfoExtractor):
'''
_TESTS = [{
'url': 'http://www.youtube.com/watch?annotation_id=annotation_3951667041',
'url': 'https://www.youtube.com/watch?annotation_id=annotation_3951667041',
'only_matching': True,
}, {
'url': 'http://www.youtube.com/watch?',
'url': 'https://www.youtube.com/watch?',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/watch?x-yt-cl=84503534',
@@ -2466,7 +2466,7 @@ class YoutubeTruncatedURLIE(InfoExtractor):
'Did you forget to quote the URL? Remember that & is a meta '
'character in most shells, so you want to put the URL in quotes, '
'like youtube-dl '
'"http://www.youtube.com/watch?feature=foo&v=BaW_jenozKc" '
'"https://www.youtube.com/watch?feature=foo&v=BaW_jenozKc" '
' or simply youtube-dl BaW_jenozKc .',
expected=True)

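YoutubeTruncatedURLIE's hint exists because, in most shells, an unquoted `&` sends the preceding command to the background and starts a new one, so everything after it (including `v=...`) never reaches youtube-dl. A sketch of building a safely quoted command line with Python 3's `shlex.quote` (youtube-dl imports the equivalent as `compat_shlex_quote`); `quoted_command` is a hypothetical helper:

```python
import shlex


def quoted_command(url):
    """Return a shell-safe youtube-dl command line for url (hypothetical helper)."""
    return 'youtube-dl ' + shlex.quote(url)


url = 'https://www.youtube.com/watch?feature=foo&v=BaW_jenozKc'
cmd = quoted_command(url)
print(cmd)  # youtube-dl 'https://www.youtube.com/watch?feature=foo&v=BaW_jenozKc'
```

Splitting `cmd` with `shlex.split` gives back exactly two arguments, the program name and the untouched URL.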
@@ -139,6 +139,30 @@ class FFmpegPostProcessor(PostProcessor):
def probe_executable(self):
return self._paths[self.probe_basename]
def get_audio_codec(self, path):
if not self.probe_available:
raise PostProcessingError('ffprobe or avprobe not found. Please install one.')
try:
cmd = [
encodeFilename(self.probe_executable, True),
encodeArgument('-show_streams'),
encodeFilename(self._ffmpeg_filename_argument(path), True)]
if self._downloader.params.get('verbose', False):
self._downloader.to_screen('[debug] %s command line: %s' % (self.basename, shell_quote(cmd)))
handle = subprocess.Popen(cmd, stderr=compat_subprocess_get_DEVNULL(), stdout=subprocess.PIPE, stdin=subprocess.PIPE)
output = handle.communicate()[0]
if handle.wait() != 0:
return None
except (IOError, OSError):
return None
audio_codec = None
for line in output.decode('ascii', 'ignore').split('\n'):
if line.startswith('codec_name='):
audio_codec = line.split('=')[1].strip()
elif line.strip() == 'codec_type=audio' and audio_codec is not None:
return audio_codec
return None
def run_ffmpeg_multiple_files(self, input_paths, out_path, opts):
self.check_version()
@@ -188,31 +212,6 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
self._preferredquality = preferredquality
self._nopostoverwrites = nopostoverwrites
def get_audio_codec(self, path):
if not self.probe_available:
raise PostProcessingError('ffprobe or avprobe not found. Please install one.')
try:
cmd = [
encodeFilename(self.probe_executable, True),
encodeArgument('-show_streams'),
encodeFilename(self._ffmpeg_filename_argument(path), True)]
if self._downloader.params.get('verbose', False):
self._downloader.to_screen('[debug] %s command line: %s' % (self.basename, shell_quote(cmd)))
handle = subprocess.Popen(cmd, stderr=compat_subprocess_get_DEVNULL(), stdout=subprocess.PIPE, stdin=subprocess.PIPE)
output = handle.communicate()[0]
if handle.wait() != 0:
return None
except (IOError, OSError):
return None
audio_codec = None
for line in output.decode('ascii', 'ignore').split('\n'):
if line.startswith('codec_name='):
audio_codec = line.split('=')[1].strip()
elif line.strip() == 'codec_type=audio' and audio_codec is not None:
return audio_codec
return None
def run_ffmpeg(self, path, out_path, codec, more_opts):
if codec is None:
acodec_opts = []
@@ -504,15 +503,15 @@ class FFmpegFixupM4aPP(FFmpegPostProcessor):
class FFmpegFixupM3u8PP(FFmpegPostProcessor):
def run(self, info):
filename = info['filepath']
temp_filename = prepend_extension(filename, 'temp')
if self.get_audio_codec(filename) == 'aac':
temp_filename = prepend_extension(filename, 'temp')
options = ['-c', 'copy', '-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
self._downloader.to_screen('[ffmpeg] Fixing malformated aac bitstream in "%s"' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
options = ['-c', 'copy', '-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
self._downloader.to_screen('[ffmpeg] Fixing malformed aac bitstream in "%s"' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return [], info

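`get_audio_codec`, now moved up to `FFmpegPostProcessor` so that `FFmpegFixupM3u8PP` can call it, scans `ffprobe -show_streams` output line by line: it remembers the last `codec_name=` it saw and returns it once a `codec_type=audio` line follows. A sketch of just the parsing step on a hand-written sample (the sample text and function name are assumptions; the logic relies on `codec_name=` preceding `codec_type=` inside each `[STREAM]` block, as the original parser does):

```python
def parse_audio_codec(probe_output):
    """Return codec_name of the first audio stream in ffprobe -show_streams text, else None."""
    audio_codec = None
    for line in probe_output.split('\n'):
        if line.startswith('codec_name='):
            audio_codec = line.split('=')[1].strip()
        elif line.strip() == 'codec_type=audio' and audio_codec is not None:
            return audio_codec
    return None


SAMPLE_OUTPUT = '\n'.join([
    '[STREAM]', 'index=0', 'codec_name=h264', 'codec_type=video', '[/STREAM]',
    '[STREAM]', 'index=1', 'codec_name=aac', 'codec_type=audio', '[/STREAM]',
])
print(parse_audio_codec(SAMPLE_OUTPUT))  # aac -> FFmpegFixupM3u8PP would apply aac_adtstoasc
```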
@@ -1,37 +1,15 @@
from __future__ import unicode_literals
import os
import subprocess
import sys
import errno
from .common import PostProcessor
from ..compat import compat_os_name
from ..utils import (
check_executable,
hyphenate_date,
version_tuple,
PostProcessingError,
encodeArgument,
encodeFilename,
write_xattr,
XAttrMetadataError,
XAttrUnavailableError,
)
class XAttrMetadataError(PostProcessingError):
def __init__(self, code=None, msg='Unknown error'):
super(XAttrMetadataError, self).__init__(msg)
self.code = code
# Parsing code and msg
if (self.code in (errno.ENOSPC, errno.EDQUOT) or
'No space left' in self.msg or 'Disk quota excedded' in self.msg):
self.reason = 'NO_SPACE'
elif self.code == errno.E2BIG or 'Argument list too long' in self.msg:
self.reason = 'VALUE_TOO_LONG'
else:
self.reason = 'NOT_SUPPORTED'
class XAttrMetadataPP(PostProcessor):
#
@@ -48,88 +26,6 @@ class XAttrMetadataPP(PostProcessor):
def run(self, info):
""" Set extended attributes on downloaded file (if xattr support is found). """
# This mess below finds the best xattr tool for the job and creates a
# "write_xattr" function.
try:
# try the pyxattr module...
import xattr
# Unicode arguments are not supported in python-pyxattr until
# version 0.5.0
# See https://github.com/rg3/youtube-dl/issues/5498
pyxattr_required_version = '0.5.0'
if version_tuple(xattr.__version__) < version_tuple(pyxattr_required_version):
self._downloader.report_warning(
'python-pyxattr is detected but is too old. '
'youtube-dl requires %s or above while your version is %s. '
'Falling back to other xattr implementations' % (
pyxattr_required_version, xattr.__version__))
raise ImportError
def write_xattr(path, key, value):
try:
xattr.set(path, key, value)
except EnvironmentError as e:
raise XAttrMetadataError(e.errno, e.strerror)
except ImportError:
if compat_os_name == 'nt':
# Write xattrs to NTFS Alternate Data Streams:
# http://en.wikipedia.org/wiki/NTFS#Alternate_data_streams_.28ADS.29
def write_xattr(path, key, value):
assert ':' not in key
assert os.path.exists(path)
ads_fn = path + ':' + key
try:
with open(ads_fn, 'wb') as f:
f.write(value)
except EnvironmentError as e:
raise XAttrMetadataError(e.errno, e.strerror)
else:
user_has_setfattr = check_executable('setfattr', ['--version'])
user_has_xattr = check_executable('xattr', ['-h'])
if user_has_setfattr or user_has_xattr:
def write_xattr(path, key, value):
value = value.decode('utf-8')
if user_has_setfattr:
executable = 'setfattr'
opts = ['-n', key, '-v', value]
elif user_has_xattr:
executable = 'xattr'
opts = ['-w', key, value]
cmd = ([encodeFilename(executable, True)] +
[encodeArgument(o) for o in opts] +
[encodeFilename(path, True)])
try:
p = subprocess.Popen(
cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
except EnvironmentError as e:
raise XAttrMetadataError(e.errno, e.strerror)
stdout, stderr = p.communicate()
stderr = stderr.decode('utf-8', 'replace')
if p.returncode != 0:
raise XAttrMetadataError(p.returncode, stderr)
else:
# On Unix, and can't find pyxattr, setfattr, or xattr.
if sys.platform.startswith('linux'):
self._downloader.report_error(
"Couldn't find a tool to set the xattrs. "
"Install either the python 'pyxattr' or 'xattr' "
"modules, or the GNU 'attr' package "
"(which contains the 'setfattr' tool).")
else:
self._downloader.report_error(
"Couldn't find a tool to set the xattrs. "
"Install either the python 'xattr' module, "
"or the 'xattr' binary.")
# Write the metadata to the file's xattrs
self._downloader.to_screen('[metadata] Writing metadata to file\'s xattrs')
@@ -159,6 +55,10 @@ class XAttrMetadataPP(PostProcessor):
return [], info
except XAttrUnavailableError as e:
self._downloader.report_error(str(e))
return [], info
except XAttrMetadataError as e:
if e.reason == 'NO_SPACE':
self._downloader.report_warning(

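`XAttrMetadataError` (moved from this post-processor into `utils`) turns an errno/message pair into one of three `reason` strings that `XAttrMetadataPP` then reports. A sketch of that classification as a plain function; `classify_xattr_error` is hypothetical, and it matches the real strerror text "Disk quota exceeded" rather than the misspelled "excedded" in the original:

```python
import errno


def classify_xattr_error(code=None, msg='Unknown error'):
    """Map an errno code and/or message to NO_SPACE, VALUE_TOO_LONG or NOT_SUPPORTED."""
    if (code in (errno.ENOSPC, errno.EDQUOT) or
            'No space left' in msg or 'Disk quota exceeded' in msg):
        return 'NO_SPACE'
    elif code == errno.E2BIG or 'Argument list too long' in msg:
        return 'VALUE_TOO_LONG'
    return 'NOT_SUPPORTED'


print(classify_xattr_error(errno.ENOSPC))  # NO_SPACE
```

The message checks matter for the CLI fallback, where only a return code and stderr text (not an errno) are available.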
@@ -42,6 +42,7 @@ from .compat import (
compat_html_entities_html5,
compat_http_client,
compat_kwargs,
compat_os_name,
compat_parse_qs,
compat_shlex_quote,
compat_socket_create_connection,
@@ -141,6 +142,8 @@ DATE_FORMATS = (
'%Y-%m-%dT%H:%M:%S',
'%Y-%m-%dT%H:%M:%S.%f',
'%Y-%m-%dT%H:%M',
'%b %d %Y at %H:%M',
'%b %d %Y at %H:%M:%S',
)
DATE_FORMATS_DAY_FIRST = list(DATE_FORMATS)
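The two patterns appended to `DATE_FORMATS` let the date helpers accept strings such as `Feb 12 2012 at 17:31`, with or without seconds. Roughly, those helpers try each format with `datetime.strptime` until one parses. A sketch limited to the two new patterns; `parse_at_style_date` is hypothetical, and `%b` assumes an English (C) locale:

```python
from datetime import datetime

# The two formats added to DATE_FORMATS in this change, tried in order
NEW_DATE_FORMATS = ('%b %d %Y at %H:%M', '%b %d %Y at %H:%M:%S')


def parse_at_style_date(date_str):
    """Return a datetime for the first matching format, or None."""
    for fmt in NEW_DATE_FORMATS:
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    return None


print(parse_at_style_date('Feb 12 2012 at 17:31'))     # 2012-02-12 17:31:00
print(parse_at_style_date('Jul 21 2013 at 02:48:00'))  # 2013-07-21 02:48:00
```

The seconds variant is needed because `strptime` rejects trailing unconverted text, so `%H:%M` alone fails on `02:48:00`.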
@@ -775,6 +778,26 @@ class ContentTooShortError(Exception):
self.expected = expected
class XAttrMetadataError(Exception):
def __init__(self, code=None, msg='Unknown error'):
super(XAttrMetadataError, self).__init__(msg)
self.code = code
self.msg = msg
# Parsing code and msg
if (self.code in (errno.ENOSPC, errno.EDQUOT) or
'No space left' in self.msg or 'Disk quota exceeded' in self.msg):
self.reason = 'NO_SPACE'
elif self.code == errno.E2BIG or 'Argument list too long' in self.msg:
self.reason = 'VALUE_TOO_LONG'
else:
self.reason = 'NOT_SUPPORTED'
class XAttrUnavailableError(Exception):
pass
def _create_http_connection(ydl_handler, http_class, is_https, *args, **kwargs):
# Working around python 2 bug (see http://bugs.python.org/issue17849) by limiting
# expected HTTP responses to meet HTTP/1.0 or later (see also
@@ -3131,3 +3154,87 @@ def decode_png(png_data):
current_row.append(color)
return width, height, pixels
def write_xattr(path, key, value):
# This mess below finds the best xattr tool for the job
try:
# try the pyxattr module...
import xattr
if hasattr(xattr, 'set'): # pyxattr
# Unicode arguments are not supported in python-pyxattr until
# version 0.5.0
# See https://github.com/rg3/youtube-dl/issues/5498
pyxattr_required_version = '0.5.0'
if version_tuple(xattr.__version__) < version_tuple(pyxattr_required_version):
# TODO: fallback to CLI tools
raise XAttrUnavailableError(
'python-pyxattr is detected but is too old. '
'youtube-dl requires %s or above while your version is %s. '
'Please upgrade python-pyxattr.' % (
pyxattr_required_version, xattr.__version__))
setxattr = xattr.set
else: # xattr
setxattr = xattr.setxattr
try:
setxattr(path, key, value)
except EnvironmentError as e:
raise XAttrMetadataError(e.errno, e.strerror)
except ImportError:
if compat_os_name == 'nt':
# Write xattrs to NTFS Alternate Data Streams:
# http://en.wikipedia.org/wiki/NTFS#Alternate_data_streams_.28ADS.29
assert ':' not in key
assert os.path.exists(path)
ads_fn = path + ':' + key
try:
with open(ads_fn, 'wb') as f:
f.write(value)
except EnvironmentError as e:
raise XAttrMetadataError(e.errno, e.strerror)
else:
user_has_setfattr = check_executable('setfattr', ['--version'])
user_has_xattr = check_executable('xattr', ['-h'])
if user_has_setfattr or user_has_xattr:
value = value.decode('utf-8')
if user_has_setfattr:
executable = 'setfattr'
opts = ['-n', key, '-v', value]
elif user_has_xattr:
executable = 'xattr'
opts = ['-w', key, value]
cmd = ([encodeFilename(executable, True)] +
[encodeArgument(o) for o in opts] +
[encodeFilename(path, True)])
try:
p = subprocess.Popen(
cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
except EnvironmentError as e:
raise XAttrMetadataError(e.errno, e.strerror)
stdout, stderr = p.communicate()
stderr = stderr.decode('utf-8', 'replace')
if p.returncode != 0:
raise XAttrMetadataError(p.returncode, stderr)
else:
# On Unix, and can't find pyxattr, setfattr, or xattr.
if sys.platform.startswith('linux'):
raise XAttrUnavailableError(
"Couldn't find a tool to set the xattrs. "
"Install either the python 'pyxattr' or 'xattr' "
"modules, or the GNU 'attr' package "
"(which contains the 'setfattr' tool).")
else:
raise XAttrUnavailableError(
"Couldn't find a tool to set the xattrs. "
"Install either the python 'xattr' module, "
"or the 'xattr' binary.")

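The new `write_xattr` supports both Python packages that install a module named `xattr`: pyxattr exposes `xattr.set(path, key, value)`, while the `xattr` package exposes `xattr.setxattr(path, key, value)`, and the code tells them apart with `hasattr(xattr, 'set')`. A sketch of that dispatch using stand-in modules, so it runs without either package installed; `pick_setxattr`, the fake modules and the `user.example.key` attribute name are all hypothetical:

```python
import types


def pick_setxattr(xattr_module):
    """Return the attribute setter for whichever xattr package was imported."""
    if hasattr(xattr_module, 'set'):  # pyxattr
        return xattr_module.set
    return xattr_module.setxattr      # xattr


# Stand-ins that record calls instead of touching the filesystem
calls = []
fake_pyxattr = types.SimpleNamespace(set=lambda p, k, v: calls.append(('pyxattr', p, k, v)))
fake_xattr = types.SimpleNamespace(setxattr=lambda p, k, v: calls.append(('xattr', p, k, v)))

pick_setxattr(fake_pyxattr)('video.mp4', 'user.example.key', b'value')
pick_setxattr(fake_xattr)('video.mp4', 'user.example.key', b'value')
```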
@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.09.15'
__version__ = '2016.10.02'