Compare commits

...

105 Commits

Author SHA1 Message Date
Sergey M․
762d44c956 [channel9] Add support for rss links (Closes #9673) 2016-06-04 04:57:16 +07:00
Sergey M․
4d8856d511 [loc] Extract direct download links 2016-06-04 00:26:03 +07:00
Sergey M․
c917106be4 [loc] Extract subtitles 2016-06-03 23:55:22 +07:00
Sergey M․
76e9cd7f24 [loc] Add support for another URL schema and simplify 2016-06-03 23:43:34 +07:00
Sergey M․
bf4c6a38e1 release 2016.06.03 2016-06-03 23:25:24 +07:00
Sergey M․
7f3c3dfa52 [loc] Improve (Closes #9521) 2016-06-03 23:19:11 +07:00
TRox1972
9c3c447eb3 [loc] Add extractor (Closes #3188)
Added an extractor for loc.gov, which closes #3188. I am not an experienced programmer, so I am sure I made a bunch of mistakes, but the extractor works (for me at least).

[LibraryOfCongress] don't use video_id for _search_regex()

[LibraryOfCongress] Improvements
2016-06-03 22:17:35 +07:00
Yen Chi Hsuan
ad73083ff0 [bilibili] Add _part%d suffixes back (closes #9660) 2016-06-02 19:29:27 +08:00
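A quick illustration of the restored behaviour (variable names assumed; compare the bilibili test diff at the bottom of this page): each page of a multi-part video gets an `_part%d` suffix appended to the base video ID.

```
# Sketch: per-page entry IDs for a multi-part bilibili video
video_id = '4808130'
entry_ids = ['%s_part%d' % (video_id, n) for n in range(1, 5)]
print(entry_ids)  # ['4808130_part1', ..., '4808130_part4'], as in the test diff below
```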
Yen Chi Hsuan
1e8b59243f Merge pull request #9669 from bzc6p/master
Added sanitization support for Hungarian letters Ő and Ű
2016-06-02 18:23:54 +08:00
bzc6p
c88270271e Added sanitization support for Hungarian letters Ő and Ű 2016-06-02 11:51:48 +02:00
bzc6p
b96f007eeb Added sanitization support for Hungarian letters Ő and Ű 2016-06-02 11:39:32 +02:00
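A hedged sketch of the approach visible in the test/test_utils.py diff below: Ő and Ű join an accent-transliteration table so that restricted filenames map them to O and U instead of dropping them (table truncated to the uppercase range; the names here are illustrative, not utils.py verbatim).

```
ACCENT_CHARS = dict(zip('ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖŐØŒÙÚÛÜŰÝÞß',
                        ['A'] * 6 + ['AE', 'C'] + ['E'] * 4 + ['I'] * 4 +
                        ['D', 'N'] + ['O'] * 7 + ['OE'] + ['U'] * 5 +
                        ['Y', 'P', 'ss']))

print(''.join(ACCENT_CHARS.get(c, c) for c in 'ŐŰ'))  # -> OU
```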
Yen Chi Hsuan
9a4aec8b7e [utils] Use bytes-like objects as header values on Python 2 2016-06-02 15:00:49 +08:00
Yen Chi Hsuan
54fb199681 [test/test_http] Fix getsockname() on Jython 2016-06-02 15:00:49 +08:00
Yen Chi Hsuan
8c32e5dc32 [test/test_utils] Add test for #9588 2016-06-02 15:00:49 +08:00
Yen Chi Hsuan
0ea590076f [utils] Always decode Location header
escape_url is broken for bytes-like objects
2016-06-02 15:00:49 +08:00
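The gist of the fix as a minimal sketch (the real change lives in youtube-dl's redirect handling; the helper name here is an assumption): normalize the Location value to text before it reaches escape_url, which breaks on bytes.

```
def ensure_text_location(location):
    # On Python 2 the Location header may arrive as bytes; decode it
    # before any escaping/normalisation step that expects text.
    if isinstance(location, bytes):
        location = location.decode('utf-8')
    return location
```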
Remita Amine
4a684895c0 [seeker] Add new extractor (closes #9619) 2016-06-01 21:20:25 +01:00
Remita Amine
f4e4aa9b6b [revision3:embed] Add new extractor 2016-06-01 21:20:25 +01:00
Sergey M․
5e3856a2c5 release 2016.06.02 2016-06-02 01:19:57 +07:00
Sergey M․
6e6b9f600f [arte] Add support for playlists and rework tests (Closes #9632) 2016-06-02 01:10:23 +07:00
Sergey M․
6a1df4fb5f [spankwire] Add support for new URL format (Closes #9657) 2016-06-01 21:23:58 +07:00
Yen Chi Hsuan
dde1ce7c06 [tf1] Fix a regular expression (closes #9656)
This is a Python bug fixed in 2.7.6 [1]

[1] https://github.com/rg3/youtube-dl/issues/9656#issuecomment-222968594
2016-06-01 20:04:43 +08:00
Yen Chi Hsuan
811586ebcf [generic] Update the UDNEmbed test case 2016-06-01 19:23:44 +08:00
Yen Chi Hsuan
0ff3749bfe [udn] Fix m3u8 and f4m extraction as well as improve 2016-06-01 19:23:09 +08:00
Yen Chi Hsuan
28bab13348 [generic,viewlift] Move a test case to the specialized extractor 2016-06-01 19:18:01 +08:00
Yen Chi Hsuan
877032314f [generic] Improve Kaltura detection
Closes #4004
2016-06-01 18:37:34 +08:00
Sergey M․
8ec2b2c41c [options] Add --limit-rate alias for rate limiting option
Closes #9644
In order to follow the regular --verb-noun pattern and to better conform with wget and curl
2016-05-30 21:48:35 +07:00
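How such an alias can be declared with optparse, which youtube-dl's option parser is built on; a minimal sketch under that assumption, not the exact options.py change — both spellings feed the same destination.

```
import optparse

parser = optparse.OptionParser()
parser.add_option(
    '-r', '--limit-rate', '--rate-limit',
    dest='ratelimit', metavar='RATE',
    help='Maximum download rate in bytes per second (e.g. 50K or 4.2M)')

opts, _ = parser.parse_args(['--rate-limit', '50K'])
assert opts.ratelimit == '50K'  # the old spelling keeps working
```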
Sergey M․
197a5da1d0 [yandexmusic] Improve captcha detection 2016-05-30 03:26:26 +07:00
Sergey M․
abbb2938fa release 2016.05.30.2 2016-05-30 03:12:12 +07:00
Sergey M․
f657b1a5f2 release 2016.05.30.1 2016-05-30 03:03:06 +07:00
Philipp Hagemeister
86a52881c6 [travis] unsubscribe @phihag 2016-05-29 21:29:38 +02:00
Sergey M․
8267423652 release 2016.05.30 2016-05-30 01:18:23 +07:00
Sergey M
917a3196f8 [README.md] Update C runtime dependency FAQ entry 2016-05-30 01:03:40 +07:00
Sergey M․
56bd028a0f [devscripts/buildserver] Listen on all interfaces 2016-05-30 00:21:18 +07:00
Sergey M․
681b923b5c [devscripts/release.sh] Allow passing buildserver address as cli option 2016-05-29 23:36:42 +07:00
Yen Chi Hsuan
9ed6d8c6c5 [youku] Extract resolution 2016-05-29 13:54:05 +08:00
Sergey M․
f3fb420b82 [devscripts/release.sh] Check for wheel 2016-05-29 11:49:14 +06:00
Sergey M․
165e3561e9 [devscripts/buildserver] Check Wow6432Node first when searching for python
This allows building releases from a 64-bit OS
2016-05-29 10:02:00 +06:00
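The probing order, distilled from the devscripts/buildserver.py diff below (error handling simplified; `find_python` is an illustrative wrapper, not the script's actual API): a 32-bit Python on 64-bit Windows registers under Wow6432Node, so that subtree is checked first.

```
try:
    import winreg as compat_winreg  # Python 3
except ImportError:
    import _winreg as compat_winreg  # Python 2

def find_python(version='3.4'):
    for node in ('Wow6432Node\\', ''):
        try:
            key = compat_winreg.OpenKey(
                compat_winreg.HKEY_LOCAL_MACHINE,
                r'SOFTWARE\%sPython\PythonCore\%s\InstallPath' % (node, version))
            try:
                path, _ = compat_winreg.QueryValueEx(key, '')
            finally:
                compat_winreg.CloseKey(key)
            return path
        except OSError:
            continue
    return None
```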
Sergey M․
27f17c0eab [Makefile] Fix youtube-dl.1 target
Now it accepts the output filename as an argument
2016-05-29 09:11:16 +06:00
Sergey M․
44c8892369 [devscripts/prepare_manpage] Fix manpage generation on Windows 2016-05-29 09:06:10 +06:00
Sergey M․
f574103d7c [buildserver] Fix buildserver and make python2 compatible 2016-05-29 09:03:17 +06:00
Yen Chi Hsuan
6d138e98e3 Merge pull request #9621 from venth/feature/ignored_intellij
Ignore IntelliJ-related files
2016-05-29 03:10:29 +08:00
venth
2a329110b9 Ignore IntelliJ-related files 2016-05-28 20:27:18 +02:00
Yen Chi Hsuan
2bee7b25f3 [Makefile] Cleanup m4a files
[ci skip]
2016-05-29 01:59:09 +08:00
Yen Chi Hsuan
92cf872a48 [.gitignore] Ignore mp3 files
[ci skip]
2016-05-29 01:59:01 +08:00
Yen Chi Hsuan
6461f2b7ec [bilibili] Fix extraction, improve and cleanup 2016-05-29 01:26:00 +08:00
Sergey M․
807cf7b07f [udemy] Fix authentication for localized layout (Closes #9594) 2016-05-28 21:18:24 +06:00
Sergey M․
de7d76af52 [coub] Add another test 2016-05-27 23:38:17 +06:00
Sergey M․
11c70deba7 [coub] Add extractor (Closes #9609) 2016-05-27 23:34:58 +06:00
Sergey M․
f36532404d [vk] Remove superfluous code 2016-05-27 22:19:10 +06:00
Sergey M․
77b8b4e696 [extractor/common] Borrow quality metadata from parent set-level manifest for f4m 2016-05-27 01:47:44 +06:00
Sergey M․
2615fa7584 [downloader/f4m] Simply select format when it's the only one 2016-05-27 01:46:12 +06:00
Yen Chi Hsuan
fac2af3c51 [common] Fix m3u8 extraction in f4m manifests 2016-05-27 01:41:27 +08:00
Sergey M․
6f8cb24219 [tvp] Expand _VALID_URL and improve naming (Closes #9602) 2016-05-26 22:21:55 +06:00
Yen Chi Hsuan
448bb5f333 [common] Fix non-bootstrapped support in f4m 2016-05-27 00:03:48 +08:00
Yen Chi Hsuan
293c255688 [utils] Remove debugging codes 2016-05-26 22:54:16 +08:00
Yen Chi Hsuan
ac88d2316e [dw] Support documentaries (closes #9475) 2016-05-26 22:48:47 +08:00
Yen Chi Hsuan
5950cb1d6d [utils] Support a new form of date
Found in dw.com (#9475)
2016-05-26 22:44:00 +08:00
Yen Chi Hsuan
761052db92 [playwire] Add the test (closed #9531) 2016-05-26 21:57:06 +08:00
Yen Chi Hsuan
240b60453e [common] Support m3u8 in f4m manifests
Related: #9531
2016-05-26 21:55:43 +08:00
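A sketch of the idea (helper names are assumptions, not the exact `_extract_f4m_formats` code): media entries inside an f4m manifest whose URL is itself an m3u8 playlist are routed to the HLS extractor instead of being treated as fragmented F4M streams.

```
import posixpath

def route_media(media_urls, extract_m3u8_formats, make_f4m_format):
    formats = []
    for url in media_urls:
        path = url.split('?')[0]
        if posixpath.splitext(path)[1] == '.m3u8':
            formats.extend(extract_m3u8_formats(url))  # delegate to HLS
        else:
            formats.append(make_f4m_format(url))
    return formats
```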
Yen Chi Hsuan
85b0fe7d64 [playwire] Use _extract_f4m_formats
Related: #9531
2016-05-26 21:43:35 +08:00
Yen Chi Hsuan
0a5685b26f [common] Support non-bootstrapped streams in f4m manifests
Related: #9531
2016-05-26 21:41:47 +08:00
Sergey M․
6f748df43f [eporner] Make test only_matching 2016-05-25 20:51:17 +06:00
Yen Chi Hsuan
b410cb83d4 Merge pull request #9595 from Kagami/vlive-site-update
[vlive] Address site update
2016-05-25 19:24:15 +08:00
Yen Chi Hsuan
da9d82840a Merge pull request #9600 from wankerer/master
[eporner] fix for the new URL layout
2016-05-25 18:52:55 +08:00
wankerer
4ee0b8afdb [eporner] fix for the new URL layout
Recently eporner slightly changed the URL layout: the IDs that used to be
digits only are now digits and letters, so youtube-dl falls back to
the generic extractor, which doesn't work.

Fix the matching regex to allow letters in ID.

[v2: added a test case]
2016-05-24 15:57:36 -07:00
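The described change, illustrated (the real `_VALID_URL` has more alternatives; the sample ID is hypothetical): widening the ID group from digits to word characters keeps the new mixed IDs out of the generic-extractor fallback.

```
import re

OLD = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\d+)/'
NEW = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\w+)/'

url = 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/'
print(re.match(OLD, url))              # None -> would fall back to generic
print(re.match(NEW, url).group('id'))  # 3YRUtzMcWn0
```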
remitamine
1de32771e1 [eyedotv] Add new extractor (closes #9582) 2016-05-24 20:10:12 +01:00
remitamine
688c634b7d skip some tests to reduce test time 2016-05-24 16:44:11 +01:00
Sergey M․
0d6ee97508 Credit @TRox1972 for tosh.cc (#9566) and localnews8 (#9539) 2016-05-24 21:42:47 +06:00
Sergey M․
6b43132ce9 [xhamster] Update tests 2016-05-24 21:38:27 +06:00
mexican porn commits
a4690b3244 [xhamster] URL regex fix for videos with an empty title 2016-05-24 21:35:43 +06:00
remitamine
444417edb5 [radiocanada] Add new extractor (#4020) 2016-05-24 15:58:27 +01:00
remitamine
277c7465f5 [ooyala] check manifest ext with determine_ext and update tests for related extractors 2016-05-24 11:24:29 +01:00
Kagami Hiiragi
25bcd3550e [vlive] Address site update
Changes:
* Fix video params extraction
* Don't make status request since status info now available on the page
* Remove unneeded code
* Fix test
2016-05-24 12:54:28 +03:00
remitamine
a4760d204f [ooyala] use api v2 to reduce requests for format extraction 2016-05-24 00:22:29 +01:00
remitamine
e8593f346a [ooyala] extract subtitles 2016-05-23 23:58:16 +01:00
remitamine
05b651e3a5 [washingtonpost] reduce requests for m3u8 manifests 2016-05-23 13:04:50 +01:00
remitamine
42a7439717 [cbs] allow passing a content id to the extractor (closes #9589) 2016-05-23 09:31:37 +01:00
remitamine
b1e9ebd080 [washingtonpost] remove unnecessary code 2016-05-23 02:30:12 +01:00
remitamine
0c50eeb987 [reuters] Add new extractor 2016-05-23 02:27:31 +01:00
remitamine
4b464a6a78 [washingtonpost] improve format extraction and add support for video pages extraction 2016-05-23 00:48:11 +01:00
Sergey M․
5db9df622f [life:embed] Use native hls 2016-05-23 04:22:09 +06:00
Sergey M․
5181759c0d [life] Update _VALID_URL 2016-05-23 04:00:08 +06:00
Sergey M․
e54373204a [lifenews] Fix metadata extraction 2016-05-23 03:44:04 +06:00
remitamine
102810ef04 [voxmedia] fix volume embed extraction 2016-05-22 20:37:35 +01:00
Yen Chi Hsuan
78d3b3e213 [generic] Improve Livestream detection (closes #2234) 2016-05-23 01:40:11 +08:00
Yen Chi Hsuan
7a46542f97 [livestream] Video IDs should always be strings (#2234) 2016-05-23 01:40:11 +08:00
Yen Chi Hsuan
eb7941e3e6 [compat] Fix for XML with <!DOCTYPE> in Python 2.7 and 3.2
Such XML documents cause a DeprecationWarning if Python is run
with `-W error`
2016-05-23 01:40:11 +08:00
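The mechanism, as it appears in the youtube_dl/compat.py diff further down this page: a TreeBuilder whose doctype() hook is a silent no-op, so ElementTree never reaches the deprecated default handler.

```
import xml.etree.ElementTree as etree

class _TreeBuilder(etree.TreeBuilder):
    def doctype(self, name, pubid, system):
        pass  # swallow <!DOCTYPE ...> instead of triggering the warning

def compat_etree_fromstring(text):
    return etree.XML(text, parser=etree.XMLParser(target=_TreeBuilder()))

compat_etree_fromstring(
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" '
    '"http://www.w3.org/2001/SMIL20/SMIL20.dtd">\n'
    '<smil xmlns="http://www.w3.org/2001/SMIL20/Language"></smil>')
```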
remitamine
db3b8b2103 [tf1] add support for more related web sites 2016-05-22 17:03:17 +01:00
remitamine
c5f5155100 [wat] extract all formats 2016-05-22 17:03:17 +01:00
Yen Chi Hsuan
4a12077855 [generic] Eliminate duplicated video URLs (closes #6562) 2016-05-22 22:23:20 +08:00
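One standard way to drop duplicate candidate URLs while preserving their order (a hedged sketch; this compare view does not show the actual generic.py hunk):

```
def ordered_set(iterable):
    seen = set()
    result = []
    for item in iterable:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

assert ordered_set(['a.mp4', 'b.mp4', 'a.mp4']) == ['a.mp4', 'b.mp4']
```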
Sergey M
a4a7c44bd3 [README.md] Document solution for extremely slow start on Windows 2016-05-22 15:04:51 +06:00
Thor77
70346165fe [bandcamp] raise ExtractorError when track not streamable (#9465)
* [bandcamp] raise ExtractorError when track not streamable

* [bandcamp] update md5 for second test

* don't rely on json-data, but just check for 'file'

* don't rely on presence of 'file'
2016-05-22 14:15:39 +08:00
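The guard described above, wrapped as a self-contained sketch around the lines the bandcamp.py diff near the end of this page adds (`data` is one parsed track entry):

```
from youtube_dl.compat import compat_str
from youtube_dl.utils import ExtractorError

def check_streamable(data):
    # Refuse early when the track JSON carries no 'file' map.
    track_id = compat_str(data['id'])
    if not data.get('file'):
        raise ExtractorError('Not streamable', video_id=track_id, expected=True)
    return track_id
```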
Sergey M
c776b99691 [README.md] Remove Windows updating trickery
Windows updating fixed in e9297256d4.
2016-05-22 10:14:02 +06:00
Sergey M․
e9297256d4 [update] Fix youtube-dl.exe updating from arbitrary directory (Closes #2718) 2016-05-22 10:06:45 +06:00
Sergey M
e5871c672b [README.md] Clarify location for youtube-dl.exe even more
%USERPROFILE% not in %PATH% by default.
2016-05-22 09:36:07 +06:00
Sergey M
9b06b0fb92 [README.md] Clarify updating on Windows 2016-05-22 09:26:06 +06:00
Sergey M
4f3a25c2b4 [README.md] Fix typo 2016-05-22 09:00:08 +06:00
Sergey M
21a19aa94d [README.md] Clarify location for youtube-dl.exe 2016-05-22 08:59:28 +06:00
Sergey M․
c6b9cf05e1 [utils] Do not fail on unknown date formats in unified_strdate 2016-05-22 08:28:41 +06:00
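The defensive shape of the change, sketched with an abbreviated format list (the real unified_strdate tries many more formats): unknown inputs now yield None instead of raising.

```
import datetime

def unified_strdate(date_str, formats=('%Y-%m-%d', '%d.%m.%Y', '%B %d %Y')):
    for fmt in formats:
        try:
            return datetime.datetime.strptime(date_str, fmt).strftime('%Y%m%d')
        except ValueError:
            continue
    return None  # unknown format: caller gets None rather than a crash

assert unified_strdate('2016-05-22') == '20160522'
assert unified_strdate('someday soon') is None
```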
Sergey M․
4d8819d249 [extractor/generic] Add support for theplatform embeds (Closes #8636, closes #9476) 2016-05-22 06:52:39 +06:00
Sergey M․
898f4b49cc [theplatform] Add _extract_urls 2016-05-22 06:47:22 +06:00
Sergey M․
0150a00f33 [cc] Add test for tosh.cc (Closes #9566) 2016-05-22 02:58:41 +06:00
TRox1972
c8831015f4 [ComedyCentral] Add support for tosh.cc.com and cc.com/video-clips 2016-05-22 02:55:10 +06:00
Sergey M․
92d221ad48 [periscope] Update uploader_id (Closes #9565) 2016-05-22 02:39:15 +06:00
Sergey M․
0db9a05f88 [periscope:user] Adapt to layout changes (Closes #9563) 2016-05-22 02:15:56 +06:00
65 changed files with 1902 additions and 753 deletions

.github/ISSUE_TEMPLATE.md

@@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.05.21.2*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.03*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.05.21.2**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.03**
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.05.21.2
+[debug] youtube-dl version 2016.06.03
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

.gitignore

@@ -28,12 +28,16 @@ updates_key.pem
 *.mp4
 *.m4a
 *.m4v
+*.mp3
 *.part
 *.swp
 test/testdata
 test/local_parameters.json
 .tox
 youtube-dl.zsh
+# IntelliJ related files
 .idea
-.idea/*
+*.iml
 tmp/

.travis.yml

@@ -14,7 +14,6 @@ script: nosetests test --verbose
 notifications:
   email:
     - filippo.valsorda@gmail.com
-    - phihag@phihag.de
     - yasoob.khld@gmail.com
 # irc:
 #   channels:

AUTHORS

@@ -172,3 +172,4 @@ blahgeek
 Kevin Deldycke
 inondle
 Tomáš Čech
+Déstin Reed

Makefile

@@ -1,7 +1,7 @@
 all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites

 clean:
-	rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
+	rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
 	find . -name "*.pyc" -delete
 	find . -name "*.class" -delete
@@ -69,7 +69,7 @@ README.txt: README.md
 	pandoc -f markdown -t plain README.md -o README.txt

 youtube-dl.1: README.md
-	$(PYTHON) devscripts/prepare_manpage.py >youtube-dl.1.temp.md
+	$(PYTHON) devscripts/prepare_manpage.py youtube-dl.1.temp.md
 	pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
 	rm -f youtube-dl.1.temp.md

README.md

@@ -25,7 +25,7 @@ If you do not have curl, you can alternatively use a recent wget:
     sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
     sudo chmod a+rx /usr/local/bin/youtube-dl

-Windows users can [download a .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in their home directory or any other location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29).
+Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).

 OS X users can install **youtube-dl** with [Homebrew](http://brew.sh/).
@@ -73,8 +73,8 @@ which means you can modify it, redistribute it or use it however you like.
                                      repairs broken URLs, but emits an error if
                                      this is not possible instead of searching.
     --ignore-config                  Do not read configuration files. When given
-                                     in the global configuration file /etc
-                                     /youtube-dl.conf: Do not read the user
+                                     in the global configuration file
+                                     /etc/youtube-dl.conf: Do not read the user
                                      configuration in ~/.config/youtube-
                                      dl/config (%APPDATA%/youtube-dl/config.txt
                                      on Windows)
@@ -162,7 +162,7 @@ which means you can modify it, redistribute it or use it however you like.
                                      (experimental)

 ## Download Options:
-    -r, --rate-limit LIMIT           Maximum download rate in bytes per second
+    -r, --limit-rate RATE            Maximum download rate in bytes per second
                                      (e.g. 50K or 4.2M)
     -R, --retries RETRIES            Number of retries (default is 10), or
                                      "infinite".
@@ -256,11 +256,12 @@ which means you can modify it, redistribute it or use it however you like.
                                      jar in
     --cache-dir DIR                  Location in the filesystem where youtube-dl
                                      can store some downloaded information
-                                     permanently. By default $XDG_CACHE_HOME
-                                     /youtube-dl or ~/.cache/youtube-dl . At the
-                                     moment, only YouTube player files (for
-                                     videos with obfuscated signatures) are
-                                     cached, but that may change.
+                                     permanently. By default
+                                     $XDG_CACHE_HOME/youtube-dl or
+                                     ~/.cache/youtube-dl . At the moment, only
+                                     YouTube player files (for videos with
+                                     obfuscated signatures) are cached, but that
+                                     may change.
     --no-cache-dir                   Disable filesystem caching
     --rm-cache-dir                   Delete all filesystem cache files
@@ -433,7 +434,7 @@ You can use `--ignore-config` if you want to disable the configuration file for
 ### Authentication with `.netrc` file

-You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a`.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
+You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
 ```
 touch $HOME/.netrc
 chmod a-rwx,u+rw $HOME/.netrc
@@ -693,6 +694,10 @@ hash -r
 Again, from then on you'll be able to update with `sudo youtube-dl -U`.

+### youtube-dl is extremely slow to start on Windows
+
+Add a file exclusion for `youtube-dl.exe` in Windows Defender settings.
+
 ### I'm getting an error `Unable to extract OpenGraph title` on YouTube playlists

 YouTube changed their playlist format in March 2014 and later on, so you'll need at least youtube-dl 2014.07.25 to download all YouTube videos.
@@ -780,9 +785,9 @@ means you're using an outdated version of Python. Please update to Python 2.6 or
 Since June 2012 ([#342](https://github.com/rg3/youtube-dl/issues/342)) youtube-dl is packed as an executable zipfile, simply unzip it (might need renaming to `youtube-dl.zip` first on some systems) or clone the git repository, as laid out above. If you modify the code, you can run it by executing the `__main__.py` file. To recompile the executable, run `make youtube-dl`.

-### The exe throws a *Runtime error from Visual C++*
+### The exe throws an error due to missing `MSVCR100.dll`

-To run the exe you need to install first the [Microsoft Visual C++ 2008 Redistributable Package](http://www.microsoft.com/en-us/download/details.aspx?id=29).
+To run the exe you need to install first the [Microsoft Visual C++ 2010 Redistributable Package (x86)](https://www.microsoft.com/en-US/download/details.aspx?id=5555).

 ### On Windows, how should I set up ffmpeg and youtube-dl? Where should I put the exe files?

devscripts/buildserver.py

@@ -1,17 +1,42 @@
 #!/usr/bin/python3

-from http.server import HTTPServer, BaseHTTPRequestHandler
-from socketserver import ThreadingMixIn
 import argparse
 import ctypes
 import functools
+import shutil
+import subprocess
 import sys
+import tempfile
 import threading
 import traceback
 import os.path

+sys.path.insert(0, os.path.dirname(os.path.dirname((os.path.abspath(__file__)))))
+from youtube_dl.compat import (
+    compat_http_server,
+    compat_str,
+    compat_urlparse,
+)
+
+# These are not used outside of buildserver.py thus not in compat.py
+try:
+    import winreg as compat_winreg
+except ImportError:  # Python 2
+    import _winreg as compat_winreg
+try:
+    import socketserver as compat_socketserver
+except ImportError:  # Python 2
+    import SocketServer as compat_socketserver
+try:
+    compat_input = raw_input
+except NameError:  # Python 3
+    compat_input = input

-class BuildHTTPServer(ThreadingMixIn, HTTPServer):
+class BuildHTTPServer(compat_socketserver.ThreadingMixIn, compat_http_server.HTTPServer):
     allow_reuse_address = True
@@ -191,7 +216,7 @@ def main(args=None):
                         action='store_const', dest='action', const='service',
                         help='Run as a Windows service')
     parser.add_argument('-b', '--bind', metavar='<host:port>',
-                        action='store', default='localhost:8142',
+                        action='store', default='0.0.0.0:8142',
                         help='Bind to host:port (default %default)')

     options = parser.parse_args(args=args)
@@ -216,7 +241,7 @@ def main(args=None):
     srv = BuildHTTPServer((host, port), BuildHTTPRequestHandler)
     thr = threading.Thread(target=srv.serve_forever)
     thr.start()
-    input('Press ENTER to shut down')
+    compat_input('Press ENTER to shut down')
     srv.shutdown()
     thr.join()
@@ -231,8 +256,6 @@ def rmtree(path):
             os.remove(fname)
     os.rmdir(path)

-#==============================================================================

 class BuildError(Exception):
     def __init__(self, output, code=500):
@@ -249,15 +272,25 @@ class HTTPError(BuildError):
 class PythonBuilder(object):
     def __init__(self, **kwargs):
-        pythonVersion = kwargs.pop('python', '2.7')
-        try:
-            key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, r'SOFTWARE\Python\PythonCore\%s\InstallPath' % pythonVersion)
-            try:
-                self.pythonPath, _ = _winreg.QueryValueEx(key, '')
-            finally:
-                _winreg.CloseKey(key)
-        except Exception:
-            raise BuildError('No such Python version: %s' % pythonVersion)
+        python_version = kwargs.pop('python', '3.4')
+        python_path = None
+        for node in ('Wow6432Node\\', ''):
+            try:
+                key = compat_winreg.OpenKey(
+                    compat_winreg.HKEY_LOCAL_MACHINE,
+                    r'SOFTWARE\%sPython\PythonCore\%s\InstallPath' % (node, python_version))
+                try:
+                    python_path, _ = compat_winreg.QueryValueEx(key, '')
+                finally:
+                    compat_winreg.CloseKey(key)
+                break
+            except Exception:
+                pass
+
+        if not python_path:
+            raise BuildError('No such Python version: %s' % python_version)
+
+        self.pythonPath = python_path

         super(PythonBuilder, self).__init__(**kwargs)
@@ -305,8 +338,10 @@ class YoutubeDLBuilder(object):
     def build(self):
         try:
-            subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
-                                    cwd=self.buildPath)
+            proc = subprocess.Popen([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'], stdin=subprocess.PIPE, cwd=self.buildPath)
+            proc.wait()
+            #subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
+            #                        cwd=self.buildPath)
         except subprocess.CalledProcessError as e:
             raise BuildError(e.output)
@@ -369,12 +404,12 @@ class Builder(PythonBuilder, GITBuilder, YoutubeDLBuilder, DownloadBuilder, Clea
     pass

-class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
+class BuildHTTPRequestHandler(compat_http_server.BaseHTTPRequestHandler):
     actionDict = {'build': Builder, 'download': Builder}  # They're the same, no more caching.

     def do_GET(self):
-        path = urlparse.urlparse(self.path)
-        paramDict = dict([(key, value[0]) for key, value in urlparse.parse_qs(path.query).items()])
+        path = compat_urlparse.urlparse(self.path)
+        paramDict = dict([(key, value[0]) for key, value in compat_urlparse.parse_qs(path.query).items()])
         action, _, path = path.path.strip('/').partition('/')
         if path:
             path = path.split('/')
@@ -388,7 +423,7 @@ class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
                 builder.close()
             except BuildError as e:
                 self.send_response(e.code)
-                msg = unicode(e).encode('UTF-8')
+                msg = compat_str(e).encode('UTF-8')
                 self.send_header('Content-Type', 'text/plain; charset=UTF-8')
                 self.send_header('Content-Length', len(msg))
                 self.end_headers()
@@ -400,7 +435,5 @@ class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
         else:
             self.send_response(500, 'Malformed URL')

-#==============================================================================

 if __name__ == '__main__':
     main()

devscripts/prepare_manpage.py

@@ -1,13 +1,46 @@
 from __future__ import unicode_literals

 import io
+import optparse
 import os.path
-import sys
 import re

 ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
 README_FILE = os.path.join(ROOT_DIR, 'README.md')

+PREFIX = '''%YOUTUBE-DL(1)
+
+# NAME
+
+youtube\-dl \- download videos from youtube.com or other video platforms
+
+# SYNOPSIS
+
+**youtube-dl** \[OPTIONS\] URL [URL...]
+
+'''
+
+
+def main():
+    parser = optparse.OptionParser(usage='%prog OUTFILE.md')
+    options, args = parser.parse_args()
+    if len(args) != 1:
+        parser.error('Expected an output filename')
+
+    outfile, = args
+
+    with io.open(README_FILE, encoding='utf-8') as f:
+        readme = f.read()
+
+    readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
+    readme = re.sub(r'\s+youtube-dl \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
+    readme = PREFIX + readme
+
+    readme = filter_options(readme)
+
+    with io.open(outfile, 'w', encoding='utf-8') as outf:
+        outf.write(readme)
+

 def filter_options(readme):
     ret = ''
@@ -37,27 +70,5 @@ def filter_options(readme):
     return ret

-with io.open(README_FILE, encoding='utf-8') as f:
-    readme = f.read()
-
-PREFIX = '''%YOUTUBE-DL(1)
-
-# NAME
-
-youtube\-dl \- download videos from youtube.com or other video platforms
-
-# SYNOPSIS
-
-**youtube-dl** \[OPTIONS\] URL [URL...]
-
-'''
-
-readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
-readme = re.sub(r'\s+youtube-dl \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
-readme = PREFIX + readme
-readme = filter_options(readme)
-
-if sys.version_info < (3, 0):
-    print(readme.encode('utf-8'))
-else:
-    print(readme)
+if __name__ == '__main__':
+    main()

devscripts/release.sh

@@ -6,7 +6,7 @@
 # * the git config user.signingkey is properly set

 # You will need
-# pip install coverage nose rsa
+# pip install coverage nose rsa wheel

 # TODO
 # release notes
@@ -15,10 +15,28 @@
 set -e

 skip_tests=true
-if [ "$1" = '--run-tests' ]; then
-    skip_tests=false
-    shift
-fi
+buildserver='localhost:8142'
+
+while true
+do
+case "$1" in
+    --run-tests)
+        skip_tests=false
+        shift
+    ;;
+    --buildserver)
+        buildserver="$2"
+        shift 2
+    ;;
+    --*)
+        echo "ERROR: unknown option $1"
+        exit 1
+    ;;
+    *)
+        break
+    ;;
+esac
+done

 if [ -z "$1" ]; then echo "ERROR: specify version number like this: $0 1994.09.06"; exit 1; fi
 version="$1"
@@ -35,6 +53,7 @@ if [ ! -z "$useless_files" ]; then echo "ERROR: Non-.py files in youtube_dl: $us
 if [ ! -f "updates_key.pem" ]; then echo 'ERROR: updates_key.pem missing'; exit 1; fi
 if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; exit 1; fi
 if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
+if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi

 /bin/echo -e "\n### First of all, testing..."
 make clean
@@ -66,7 +85,7 @@ git push origin "$version"
 REV=$(git rev-parse HEAD)
 make youtube-dl youtube-dl.tar.gz
 read -p "VM running? (y/n) " -n 1
-wget "http://localhost:8142/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
+wget "http://$buildserver/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
 mkdir -p "build/$version"
 mv youtube-dl youtube-dl.exe "build/$version"
 mv youtube-dl.tar.gz "build/$version/youtube-dl-$version.tar.gz"

docs/supportedsites.md

@@ -55,6 +55,7 @@
 - **arte.tv:future**
 - **arte.tv:info**
 - **arte.tv:magazine**
+- **arte.tv:playlist**
 - **AtresPlayer**
 - **ATTTechChannel**
 - **AudiMedia**
@@ -136,6 +137,7 @@
 - **ComedyCentral**
 - **ComedyCentralShows**: The Daily Show / The Colbert Report
 - **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
+- **Coub**
 - **Cracked**
 - **Crackle**
 - **Criterion**
@@ -205,6 +207,7 @@
 - **exfm**: ex.fm
 - **ExpoTV**
 - **ExtremeTube**
+- **EyedoTV**
 - **facebook**
 - **faz.net**
 - **fc2**
@@ -326,8 +329,8 @@
 - **LePlaylist**
 - **LetvCloud**: 乐视云
 - **Libsyn**
+- **life**: Life.ru
 - **life:embed**
-- **lifenews**: LIFE | NEWS
 - **limelight**
 - **limelight:channel**
 - **limelight:channel_list**
@@ -336,6 +339,7 @@
 - **livestream**
 - **livestream:original**
 - **LnkGo**
+- **loc**: Library of Congress
 - **LocalNews8**
 - **LoveHomePorn**
 - **lrt.lt**
@@ -512,6 +516,8 @@
 - **R7**
 - **radio.de**
 - **radiobremen**
+- **radiocanada**
+- **RadioCanadaAudioVideo**
 - **radiofrance**
 - **RadioJavan**
 - **Rai**
@@ -521,8 +527,10 @@
 - **RedTube**
 - **RegioTV**
 - **Restudy**
+- **Reuters**
 - **ReverbNation**
-- **Revision3**
+- **revision**
+- **revision3:embed**
 - **RICE**
 - **RingTV**
 - **RottenTomatoes**
@@ -561,6 +569,7 @@
 - **ScreencastOMatic**
 - **ScreenJunkies**
 - **ScreenwaveMedia**
+- **Seeker**
 - **SenateISVP**
 - **SendtoNews**
 - **ServingSys**
@@ -682,8 +691,8 @@
 - **TVCArticle**
 - **tvigle**: Интернет-телевидение Tvigle.ru
 - **tvland.com**
-- **tvp.pl**
-- **tvp.pl:Series**
+- **tvp**: Telewizja Polska
+- **tvp:series**
 - **TVPlay**: TV3Play and related services
 - **Tweakers**
 - **twitch:chapter**
@@ -766,7 +775,8 @@
 - **VuClip**
 - **vulture.com**
 - **Walla**
-- **WashingtonPost**
+- **washingtonpost**
+- **washingtonpost:article**
 - **wat.tv**
 - **WatchIndianPorn**: Watch Indian Porn
 - **WDR**

test/test_compat.py

@@ -103,6 +103,12 @@ class TestCompat(unittest.TestCase):
         self.assertTrue(isinstance(doc.find('chinese').text, compat_str))
         self.assertTrue(isinstance(doc.find('foo/bar').text, compat_str))

+    def test_compat_etree_fromstring_doctype(self):
+        xml = '''<?xml version="1.0"?>
+<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN" "http://www.w3.org/2001/SMIL20/SMIL20.dtd">
+<smil xmlns="http://www.w3.org/2001/SMIL20/Language"></smil>'''
+        compat_etree_fromstring(xml)
+
     def test_struct_unpack(self):
         self.assertEqual(compat_struct_unpack('!B', b'\x00'), (0,))

test/test_http.py

@@ -16,6 +16,15 @@ import threading
 TEST_DIR = os.path.dirname(os.path.abspath(__file__))

+def http_server_port(httpd):
+    if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
+        # In Jython SSLSocket is not a subclass of socket.socket
+        sock = httpd.socket.sock
+    else:
+        sock = httpd.socket
+    return sock.getsockname()[1]
+

 class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
     def log_message(self, format, *args):
         pass
@@ -31,6 +40,22 @@ class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
             self.send_header('Content-Type', 'video/mp4')
             self.end_headers()
             self.wfile.write(b'\x00\x00\x00\x00\x20\x66\x74[video]')
+        elif self.path == '/302':
+            if sys.version_info[0] == 3:
+                # XXX: Python 3 http server does not allow non-ASCII header values
+                self.send_response(404)
+                self.end_headers()
+                return
+
+            new_url = 'http://localhost:%d/中文.html' % http_server_port(self.server)
+            self.send_response(302)
+            self.send_header(b'Location', new_url.encode('utf-8'))
+            self.end_headers()
+        elif self.path == '/%E4%B8%AD%E6%96%87.html':
+            self.send_response(200)
+            self.send_header('Content-Type', 'text/html; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(b'<html><video src="/vid.mp4" /></html>')
         else:
             assert False
@@ -47,18 +72,32 @@ class FakeLogger(object):

 class TestHTTP(unittest.TestCase):
+    def setUp(self):
+        self.httpd = compat_http_server.HTTPServer(
+            ('localhost', 0), HTTPTestRequestHandler)
+        self.port = http_server_port(self.httpd)
+        self.server_thread = threading.Thread(target=self.httpd.serve_forever)
+        self.server_thread.daemon = True
+        self.server_thread.start()
+
+    def test_unicode_path_redirection(self):
+        # XXX: Python 3 http server does not allow non-ASCII header values
+        if sys.version_info[0] == 3:
+            return
+
+        ydl = YoutubeDL({'logger': FakeLogger()})
+        r = ydl.extract_info('http://localhost:%d/302' % self.port)
+        self.assertEqual(r['url'], 'http://localhost:%d/vid.mp4' % self.port)
+
+
+class TestHTTPS(unittest.TestCase):
     def setUp(self):
         certfn = os.path.join(TEST_DIR, 'testcert.pem')
         self.httpd = compat_http_server.HTTPServer(
             ('localhost', 0), HTTPTestRequestHandler)
         self.httpd.socket = ssl.wrap_socket(
             self.httpd.socket, certfile=certfn, server_side=True)
-        if os.name == 'java':
-            # In Jython SSLSocket is not a subclass of socket.socket
-            sock = self.httpd.socket.sock
-        else:
-            sock = self.httpd.socket
-        self.port = sock.getsockname()[1]
+        self.port = http_server_port(self.httpd)
         self.server_thread = threading.Thread(target=self.httpd.serve_forever)
         self.server_thread.daemon = True
         self.server_thread.start()
@@ -94,14 +133,14 @@ class TestProxy(unittest.TestCase):
     def setUp(self):
         self.proxy = compat_http_server.HTTPServer(
             ('localhost', 0), _build_proxy_handler('normal'))
-        self.port = self.proxy.socket.getsockname()[1]
+        self.port = http_server_port(self.proxy)
         self.proxy_thread = threading.Thread(target=self.proxy.serve_forever)
         self.proxy_thread.daemon = True
         self.proxy_thread.start()

         self.cn_proxy = compat_http_server.HTTPServer(
             ('localhost', 0), _build_proxy_handler('cn'))
-        self.cn_port = self.cn_proxy.socket.getsockname()[1]
+        self.cn_port = http_server_port(self.cn_proxy)
         self.cn_proxy_thread = threading.Thread(target=self.cn_proxy.serve_forever)
         self.cn_proxy_thread.daemon = True
         self.cn_proxy_thread.start()

test/test_utils.py

@@ -157,8 +157,8 @@ class TestUtil(unittest.TestCase):
         self.assertTrue(sanitize_filename(':', restricted=True) != '')

         self.assertEqual(sanitize_filename(
-            'ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØŒÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøœùúûüýþÿ', restricted=True),
-            'AAAAAAAECEEEEIIIIDNOOOOOOOEUUUUYPssaaaaaaaeceeeeiiiionoooooooeuuuuypy')
+            'ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖŐØŒÙÚÛÜŰÝÞßàáâãäåæçèéêëìíîïðñòóôõöőøœùúûüűýþÿ', restricted=True),
+            'AAAAAAAECEEEEIIIIDNOOOOOOOOEUUUUUYPssaaaaaaaeceeeeiiiionooooooooeuuuuuypy')

     def test_sanitize_ids(self):
         self.assertEqual(sanitize_filename('_n_cd26wFpw', is_id=True), '_n_cd26wFpw')

youtube_dl/compat.py

@@ -245,13 +245,20 @@ try:
 except ImportError:  # Python 2.6
     from xml.parsers.expat import ExpatError as compat_xml_parse_error

+etree = xml.etree.ElementTree
+
+
+class _TreeBuilder(etree.TreeBuilder):
+    def doctype(self, name, pubid, system):
+        pass
+
 if sys.version_info[0] >= 3:
-    compat_etree_fromstring = xml.etree.ElementTree.fromstring
+    def compat_etree_fromstring(text):
+        return etree.XML(text, parser=etree.XMLParser(target=_TreeBuilder()))
 else:
     # python 2.x tries to encode unicode strings with ascii (see the
     # XMLParser._fixtext method)
-    etree = xml.etree.ElementTree
     try:
         _etree_iter = etree.Element.iter
     except AttributeError:  # Python <=2.6
@@ -265,7 +272,7 @@ else:
     # 2.7 source
     def _XML(text, parser=None):
         if not parser:
-            parser = etree.XMLParser(target=etree.TreeBuilder())
+            parser = etree.XMLParser(target=_TreeBuilder())
         parser.feed(text)
         return parser.close()

@@ -277,7 +284,7 @@ else:
         return el

     def compat_etree_fromstring(text):
-        doc = _XML(text, parser=etree.XMLParser(target=etree.TreeBuilder(element_factory=_element_factory)))
+        doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
         for el in _etree_iter(doc):
             if el.text is not None and isinstance(el.text, bytes):
                 el.text = el.text.decode('utf-8')

youtube_dl/downloader/f4m.py

@@ -319,7 +319,7 @@ class F4mFD(FragmentFD):
         doc = compat_etree_fromstring(manifest)
         formats = [(int(f.attrib.get('bitrate', -1)), f)
                    for f in self._get_unencrypted_media(doc)]
-        if requested_bitrate is None:
+        if requested_bitrate is None or len(formats) == 1:
             # get the best format
             formats = sorted(formats, key=lambda f: f[0])
             rate, media = formats[-1]

youtube_dl/extractor/arte.py

@@ -61,10 +61,7 @@
         }

-class ArteTVPlus7IE(InfoExtractor):
-    IE_NAME = 'arte.tv:+7'
-    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&]+)'
+class ArteTVBaseIE(InfoExtractor):
     @classmethod
     def _extract_url_info(cls, url):
         mobj = re.match(cls._VALID_URL, url)
@@ -78,60 +75,6 @@ class ArteTVPlus7IE(InfoExtractor):
         video_id = mobj.group('id')
         return video_id, lang

-    def _real_extract(self, url):
-        video_id, lang = self._extract_url_info(url)
-        webpage = self._download_webpage(url, video_id)
-        return self._extract_from_webpage(webpage, video_id, lang)
-
-    def _extract_from_webpage(self, webpage, video_id, lang):
-        patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
-        ids = (video_id, '')
-        # some pages contain multiple videos (like
-        # http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
-        # so we first try to look for json URLs that contain the video id from
-        # the 'vid' parameter.
-        patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
-        json_url = self._html_search_regex(
-            patterns, webpage, 'json vp url', default=None)
-        if not json_url:
-            def find_iframe_url(webpage, default=NO_DEFAULT):
-                return self._html_search_regex(
-                    r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
-                    webpage, 'iframe url', group='url', default=default)
-            iframe_url = find_iframe_url(webpage, None)
-            if not iframe_url:
-                embed_url = self._html_search_regex(
-                    r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
-                if embed_url:
-                    player = self._download_json(
-                        embed_url, video_id, 'Downloading player page')
-                    iframe_url = find_iframe_url(player['html'])
-            # en and es URLs produce react-based pages with different layout (e.g.
-            # http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
-            if not iframe_url:
-                program = self._search_regex(
-                    r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
-                    webpage, 'program', default=None)
-                if program:
-                    embed_html = self._parse_json(program, video_id)
-                    if embed_html:
-                        iframe_url = find_iframe_url(embed_html['embed_html'])
-            if iframe_url:
-                json_url = compat_parse_qs(
-                    compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
-        if json_url:
-            title = self._search_regex(
-                r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
-                webpage, 'title', default=None, group='title')
-            return self._extract_from_json_url(json_url, video_id, lang, title=title)
-        # Different kind of embed URL (e.g.
-        # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
-        embed_url = self._search_regex(
-            r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
-            webpage, 'embed url', group='url')
-        return self.url_result(embed_url)
-
     def _extract_from_json_url(self, json_url, video_id, lang, title=None):
         info = self._download_json(json_url, video_id)
         player_info = info['videoJsonPlayer']
@@ -235,6 +178,74 @@
         return info_dict

+
+class ArteTVPlus7IE(ArteTVBaseIE):
+    IE_NAME = 'arte.tv:+7'
+    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        video_id, lang = self._extract_url_info(url)
+        webpage = self._download_webpage(url, video_id)
+        return self._extract_from_webpage(webpage, video_id, lang)
+
+    def _extract_from_webpage(self, webpage, video_id, lang):
+        patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
+        ids = (video_id, '')
+        # some pages contain multiple videos (like
+        # http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
+        # so we first try to look for json URLs that contain the video id from
+        # the 'vid' parameter.
+        patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
+        json_url = self._html_search_regex(
+            patterns, webpage, 'json vp url', default=None)
+        if not json_url:
+            def find_iframe_url(webpage, default=NO_DEFAULT):
+                return self._html_search_regex(
+                    r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
+                    webpage, 'iframe url', group='url', default=default)
+            iframe_url = find_iframe_url(webpage, None)
+            if not iframe_url:
+                embed_url = self._html_search_regex(
+                    r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
+                if embed_url:
+                    player = self._download_json(
+                        embed_url, video_id, 'Downloading player page')
+                    iframe_url = find_iframe_url(player['html'])
+            # en and es URLs produce react-based pages with different layout (e.g.
+            # http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
+            if not iframe_url:
+                program = self._search_regex(
+                    r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
+                    webpage, 'program', default=None)
+                if program:
+                    embed_html = self._parse_json(program, video_id)
+                    if embed_html:
+                        iframe_url = find_iframe_url(embed_html['embed_html'])
+            if iframe_url:
+                json_url = compat_parse_qs(
+                    compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
+        if json_url:
+            title = self._search_regex(
+                r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
+                webpage, 'title', default=None, group='title')
+            return self._extract_from_json_url(json_url, video_id, lang, title=title)
+        # Different kind of embed URL (e.g.
+        # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
+        embed_url = self._search_regex(
+            r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
+            webpage, 'embed url', group='url')
+        return self.url_result(embed_url)
+

 # It also uses the arte_vp_url url from the webpage to extract the information
 class ArteTVCreativeIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:creative'
@@ -267,7 +278,7 @@ class ArteTVInfoIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:info'
     _VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'

-    _TEST = {
+    _TESTS = [{
         'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
         'info_dict': {
             'id': '067528-000-A',
@@ -275,7 +286,7 @@ class ArteTVInfoIE(ArteTVPlus7IE):
             'title': 'Service civique, un cache misère ?',
             'upload_date': '20160403',
         },
-    }
+    }]

 class ArteTVFutureIE(ArteTVPlus7IE):
@@ -300,6 +311,8 @@ class ArteTVDDCIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:ddc'
     _VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'

+    _TESTS = []
+
     def _real_extract(self, url):
         video_id, lang = self._extract_url_info(url)
         if lang == 'folge':
@@ -318,7 +331,7 @@ class ArteTVConcertIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:concert'
     _VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'

-    _TEST = {
+    _TESTS = [{
         'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
         'md5': '9ea035b7bd69696b67aa2ccaaa218161',
         'info_dict': {
@@ -328,14 +341,14 @@ class ArteTVConcertIE(ArteTVPlus7IE):
             'upload_date': '20140128',
             'description': 'md5:486eb08f991552ade77439fe6d82c305',
         },
-    }
+    }]

 class ArteTVCinemaIE(ArteTVPlus7IE):
     IE_NAME = 'arte.tv:cinema'
     _VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'

-    _TEST = {
+    _TESTS = [{
         'url': 'http://cinema.arte.tv/de/node/38291',
         'md5': '6b275511a5107c60bacbeeda368c3aa1',
         'info_dict': {
@@ -345,7 +358,7 @@ class ArteTVCinemaIE(ArteTVPlus7IE):
             'upload_date': '20160122',
             'description': 'md5:7f749bbb77d800ef2be11d54529b96bc',
         },
-    }
+    }]

 class ArteTVMagazineIE(ArteTVPlus7IE):
@@ -390,9 +403,41 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
     )
     '''

+    _TESTS = []
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
         lang = mobj.group('lang')
         json_url = mobj.group('json_url')
         return self._extract_from_json_url(json_url, video_id, lang)
+
+
+class ArteTVPlaylistIE(ArteTVBaseIE):
+    IE_NAME = 'arte.tv:playlist'
+    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'
+
+    _TESTS = [{
+        'url': 'http://www.arte.tv/guide/de/plus7/?country=DE#collection/PL-013263/ARTETV',
+        'info_dict': {
+            'id': 'PL-013263',
+            'title': 'Areva & Uramin',
+        },
+        'playlist_mincount': 6,
+    }, {
+        'url': 'http://www.arte.tv/guide/de/playlists?country=DE#collection/PL-013190/ARTETV',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        playlist_id, lang = self._extract_url_info(url)
+        collection = self._download_json(
+            'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'
+            % (lang, playlist_id), playlist_id)
+        title = collection.get('title')
+        description = collection.get('shortDescription') or collection.get('teaserText')
+        entries = [
+            self._extract_from_json_url(
+                video['jsonUrl'], video.get('programId') or playlist_id, lang)
+            for video in collection['videos'] if video.get('jsonUrl')]
+        return self.playlist_result(entries, playlist_id, title, description)

youtube_dl/extractor/bandcamp.py

@@ -29,7 +29,7 @@ class BandcampIE(InfoExtractor):
         '_skip': 'There is a limit of 200 free downloads / month for the test song'
     }, {
         'url': 'http://benprunty.bandcamp.com/track/lanius-battle',
-        'md5': '2b68e5851514c20efdff2afc5603b8b4',
+        'md5': '73d0b3171568232574e45652f8720b5c',
         'info_dict': {
             'id': '2650410135',
             'ext': 'mp3',
@@ -48,6 +48,10 @@ class BandcampIE(InfoExtractor):
         if m_trackinfo:
             json_code = m_trackinfo.group(1)
             data = json.loads(json_code)[0]
+            track_id = compat_str(data['id'])
+
+            if not data.get('file'):
+                raise ExtractorError('Not streamable', video_id=track_id, expected=True)

             formats = []
             for format_id, format_url in data['file'].items():
@@ -64,7 +68,7 @@ class BandcampIE(InfoExtractor):
             self._sort_formats(formats)

             return {
-                'id': compat_str(data['id']),
+                'id': track_id,
                 'title': data['title'],
                 'formats': formats,
                 'duration': float_or_none(data.get('duration')),

youtube_dl/extractor/bilibili.py

@@ -1,34 +1,42 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import calendar
+import datetime
 import re

 from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import (
+    compat_etree_fromstring,
+    compat_str,
+    compat_parse_qs,
+    compat_xml_parse_error,
+)
 from ..utils import (
-    int_or_none,
-    unescapeHTML,
     ExtractorError,
+    int_or_none,
+    float_or_none,
     xpath_text,
 )


 class BiliBiliIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+).html)?'
+    _VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)'

     _TESTS = [{
         'url': 'http://www.bilibili.tv/video/av1074402/',
-        'md5': '2c301e4dab317596e837c3e7633e7d86',
+        'md5': '5f7d29e1a2872f3df0cf76b1f87d3788',
         'info_dict': {
             'id': '1554319',
             'ext': 'flv',
             'title': '【金坷垃】金泡沫',
-            'duration': 308313,
+            'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
+            'duration': 308.067,
+            'timestamp': 1398012660,
             'upload_date': '20140420',
             'thumbnail': 're:^https?://.+\.jpg',
-            'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
-            'timestamp': 1397983878,
             'uploader': '菊子桑',
+            'uploader_id': '156160',
         },
     }, {
         'url': 'http://www.bilibili.com/video/av1041170/',
@@ -36,75 +44,169 @@ class BiliBiliIE(InfoExtractor):
         'info_dict': {
             'id': '1041170',
             'title': '【BD1080P】刀语【诸神&异域】',
             'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
+            'uploader': '枫叶逝去',
+            'timestamp': 1396501299,
         },
         'playlist_count': 9,
+    }, {
}, {
'url': 'http://www.bilibili.com/video/av4808130/',
'info_dict': {
'id': '4808130',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
},
'playlist': [{
'md5': '55cdadedf3254caaa0d5d27cf20a8f9c',
'info_dict': {
'id': '4808130_part1',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '926f9f67d0c482091872fbd8eca7ea3d',
'info_dict': {
'id': '4808130_part2',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '4b7b225b968402d7c32348c646f1fd83',
'info_dict': {
'id': '4808130_part3',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}, {
'md5': '7b795e214166501e9141139eea236e91',
'info_dict': {
'id': '4808130_part4',
'ext': 'flv',
'title': '【长篇】哆啦A梦443【钉铛】',
'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
'timestamp': 1464564180,
'upload_date': '20160529',
'uploader': '喜欢拉面',
'uploader_id': '151066',
},
}],
     }]

+    # BiliBili blocks keys from time to time. The current key is extracted from
+    # the Android client
+    # TODO: find the sign algorithm used in the flash player
+    _APP_KEY = '86385cdc024c0f6c'
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
-        page_num = mobj.group('page_num') or '1'

-        view_data = self._download_json(
-            'http://api.bilibili.com/view?type=json&appkey=8e9fc618fbd41e28&id=%s&page=%s' % (video_id, page_num),
-            video_id)
-        if 'error' in view_data:
-            raise ExtractorError('%s said: %s' % (self.IE_NAME, view_data['error']), expected=True)
+        webpage = self._download_webpage(url, video_id)

-        cid = view_data['cid']
-        title = unescapeHTML(view_data['title'])
+        params = compat_parse_qs(self._search_regex(
+            [r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
+             r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
+            webpage, 'player parameters'))
+        cid = params['cid'][0]

-        doc = self._download_xml(
-            'http://interface.bilibili.com/v_cdn_play?appkey=8e9fc618fbd41e28&cid=%s' % cid,
-            cid,
-            'Downloading page %s/%s' % (page_num, view_data['pages'])
-        )
+        info_xml_str = self._download_webpage(
+            'http://interface.bilibili.com/v_cdn_play',
+            cid, query={'appkey': self._APP_KEY, 'cid': cid},
+            note='Downloading video info page')

-        if xpath_text(doc, './result') == 'error':
-            raise ExtractorError('%s said: %s' % (self.IE_NAME, xpath_text(doc, './message')), expected=True)
+        err_msg = None
+        durls = None
+        info_xml = None
+        try:
+            info_xml = compat_etree_fromstring(info_xml_str.encode('utf-8'))
+        except compat_xml_parse_error:
+            info_json = self._parse_json(info_xml_str, video_id, fatal=False)
+            err_msg = (info_json or {}).get('error_text')
+        else:
+            err_msg = xpath_text(info_xml, './message')
+
+        if info_xml is not None:
+            durls = info_xml.findall('./durl')
+        if not durls:
+            if err_msg:
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, err_msg), expected=True)
+            else:
+                raise ExtractorError('No videos found!')

         entries = []

-        for durl in doc.findall('./durl'):
+        for durl in durls:
             size = xpath_text(durl, ['./filesize', './size'])
             formats = [{
                 'url': durl.find('./url').text,
                 'filesize': int_or_none(size),
+                'ext': 'flv',
             }]
-            backup_urls = durl.find('./backup_url')
-            if backup_urls is not None:
-                for backup_url in backup_urls.findall('./url'):
-                    formats.append({'url': backup_url.text})
-            formats.reverse()
+            for backup_url in durl.findall('./backup_url/url'):
+                formats.append({
+                    'url': backup_url.text,
+                    # backup URLs have lower priorities
+                    'preference': -2 if 'hd.mp4' in backup_url.text else -3,
+                })
+
+            self._sort_formats(formats)

             entries.append({
                 'id': '%s_part%s' % (cid, xpath_text(durl, './order')),
-                'title': title,
                 'duration': int_or_none(xpath_text(durl, './length'), 1000),
                 'formats': formats,
             })

+        title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title')
+        description = self._html_search_meta('description', webpage)
+        datetime_str = self._html_search_regex(
+            r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False)
+        if datetime_str:
+            timestamp = calendar.timegm(datetime.datetime.strptime(datetime_str, '%Y-%m-%dT%H:%M').timetuple())
+
+        # TODO 'view_count' requires deobfuscating Javascript
         info = {
             'id': compat_str(cid),
             'title': title,
-            'description': view_data.get('description'),
-            'thumbnail': view_data.get('pic'),
-            'uploader': view_data.get('author'),
-            'timestamp': int_or_none(view_data.get('created')),
-            'view_count': int_or_none(view_data.get('play')),
-            'duration': int_or_none(xpath_text(doc, './timelength')),
+            'description': description,
+            'timestamp': timestamp,
+            'thumbnail': self._html_search_meta('thumbnailUrl', webpage),
+            'duration': float_or_none(xpath_text(info_xml, './timelength'), scale=1000),
         }

+        uploader_mobj = re.search(
+            r'<a[^>]+href="https?://space\.bilibili\.com/(?P<id>\d+)"[^>]+title="(?P<name>[^"]+)"',
+            webpage)
+        if uploader_mobj:
+            info.update({
+                'uploader': uploader_mobj.group('name'),
+                'uploader_id': uploader_mobj.group('id'),
+            })
+
+        for entry in entries:
+            entry.update(info)
+
         if len(entries) == 1:
-            entries[0].update(info)
             return entries[0]
         else:
-            info.update({
+            for idx, entry in enumerate(entries):
+                entry['id'] = '%s_part%d' % (video_id, (idx + 1))
+
+            return {
                 '_type': 'multi_video',
                 'id': video_id,
+                'title': title,
+                'description': description,
                 'entries': entries,
-            })
-            return info
+            }
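
The try/except above exists because interface.bilibili.com answers with XML on success but with a JSON object on failure. A standalone sketch of that fallback using only the standard library (the error_text field name comes from the extractor; the rest is illustrative):

import json
import xml.etree.ElementTree as etree

def parse_play_info(body):
    try:
        doc = etree.fromstring(body)
    except etree.ParseError:
        # Not XML, so assume the JSON error shape instead
        err = json.loads(body).get('error_text', 'unknown error')
        raise ValueError('server said: %s' % err)
    durls = doc.findall('./durl')
    if not durls:
        raise ValueError(doc.findtext('./message') or 'no videos found')
    return durls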

youtube_dl/extractor/byutv.py

@@ -11,6 +11,7 @@ class BYUtvIE(InfoExtractor):
     _VALID_URL = r'^https?://(?:www\.)?byutv.org/watch/[0-9a-f-]+/(?P<video_id>[^/?#]+)'
     _TEST = {
         'url': 'http://www.byutv.org/watch/6587b9a3-89d2-42a6-a7f7-fd2f81840a7d/studio-c-season-5-episode-5',
+        'md5': '05850eb8c749e2ee05ad5a1c34668493',
         'info_dict': {
             'id': 'studio-c-season-5-episode-5',
             'ext': 'mp4',
@@ -21,7 +22,8 @@ class BYUtvIE(InfoExtractor):
         },
         'params': {
             'skip_download': True,
-        }
+        },
+        'add_ie': ['Ooyala'],
     }

     def _real_extract(self, url):

youtube_dl/extractor/cbs.py

@@ -1,5 +1,7 @@
 from __future__ import unicode_literals

+import re
+
 from .theplatform import ThePlatformIE
 from ..utils import (
     xpath_text,
@@ -21,7 +23,7 @@ class CBSBaseIE(ThePlatformIE):

 class CBSIE(CBSBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<id>[^/]+)'
+    _VALID_URL = r'(?:cbs:(?P<content_id>\w+)|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<display_id>[^/]+))'

     _TESTS = [{
         'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
@@ -66,11 +68,12 @@ class CBSIE(CBSBaseIE):
     TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'

     def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-        content_id = self._search_regex(
-            [r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
-            webpage, 'content id')
+        content_id, display_id = re.match(self._VALID_URL, url).groups()
+        if not content_id:
+            webpage = self._download_webpage(url, display_id)
+            content_id = self._search_regex(
+                [r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
+                webpage, 'content id')
         items_data = self._download_xml(
             'http://can.cbs.com/thunder/player/videoPlayerService.php',
             content_id, query={'partner': 'cbs', 'contentId': content_id})
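
The reworked _VALID_URL accepts either an internal cbs:<content_id> reference or a public page URL, which is why _real_extract only downloads the webpage when content_id is still unknown. A quick check of both branches (the sample ids are made up):

import re

CBS_RE = r'(?:cbs:(?P<content_id>\w+)|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<display_id>[^/]+))'

m = re.match(CBS_RE, 'cbs:12345')
assert m and m.group('content_id') == '12345'
m = re.match(CBS_RE, 'http://www.cbs.com/shows/garth-brooks/video/XYZ/connect-chat-feat-garth-brooks/')
assert m and m.group('display_id') == 'connect-chat-feat-garth-brooks'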

youtube_dl/extractor/channel9.py

@@ -20,54 +20,64 @@ class Channel9IE(InfoExtractor):
    '''
     IE_DESC = 'Channel 9'
     IE_NAME = 'channel9'
-    _VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?'
+    _VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)'

     _TESTS = [{
         'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
         'md5': 'bbd75296ba47916b754e73c3a4bbdf10',
         'info_dict': {
             'id': 'Events/TechEd/Australia/2013/KOS002',
             'ext': 'mp4',
             'title': 'Developer Kick-Off Session: Stuff We Love',
             'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
             'duration': 4576,
             'thumbnail': 're:http://.*\.jpg',
             'session_code': 'KOS002',
             'session_day': 'Day 1',
             'session_room': 'Arena 1A',
             'session_speakers': ['Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug',
                                  'Mads Kristensen'],
         },
     }, {
         'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
         'md5': 'b43ee4529d111bc37ba7ee4f34813e68',
         'info_dict': {
             'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing',
             'ext': 'mp4',
             'title': 'Self-service BI with Power BI - nuclear testing',
             'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
             'duration': 1540,
             'thumbnail': 're:http://.*\.jpg',
             'authors': ['Mike Wilmot'],
         },
     }, {
         # low quality mp4 is best
         'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
         'info_dict': {
             'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
             'ext': 'mp4',
             'title': 'Ranges for the Standard Library',
             'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
             'duration': 5646,
             'thumbnail': 're:http://.*\.jpg',
         },
         'params': {
             'skip_download': True,
         },
+    }, {
+        'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
+        'info_dict': {
+            'id': 'Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b',
+            'title': 'Channel 9',
+        },
+        'playlist_count': 2,
+    }, {
+        'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
+        'only_matching': True,
+    }, {
+        'url': 'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman',
+        'only_matching': True,
     }]

     _RSS_URL = 'http://channel9.msdn.com/%s/RSS'
@@ -254,22 +264,30 @@ class Channel9IE(InfoExtractor):
         return self.playlist_result(contents)

-    def _extract_list(self, content_path):
-        rss = self._download_xml(self._RSS_URL % content_path, content_path, 'Downloading RSS')
+    def _extract_list(self, video_id, rss_url=None):
+        if not rss_url:
+            rss_url = self._RSS_URL % video_id
+        rss = self._download_xml(rss_url, video_id, 'Downloading RSS')
         entries = [self.url_result(session_url.text, 'Channel9')
                    for session_url in rss.findall('./channel/item/link')]
         title_text = rss.find('./channel/title').text
-        return self.playlist_result(entries, content_path, title_text)
+        return self.playlist_result(entries, video_id, title_text)

     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         content_path = mobj.group('contentpath')
+        rss = mobj.group('rss')

-        webpage = self._download_webpage(url, content_path, 'Downloading web page')
+        if rss:
+            return self._extract_list(content_path, url)

-        page_type_m = re.search(r'<meta name="WT.entryid" content="(?P<pagetype>[^:]+)[^"]+"/>', webpage)
-        if page_type_m is not None:
-            page_type = page_type_m.group('pagetype')
+        webpage = self._download_webpage(
+            url, content_path, 'Downloading web page')

+        page_type = self._search_regex(
+            r'<meta[^>]+name=(["\'])WT\.entryid\1[^>]+content=(["\'])(?P<pagetype>[^:]+).+?\2',
+            webpage, 'page type', default=None, group='pagetype')
+
+        if page_type:
             if page_type == 'Entry':  # Any 'item'-like page, may contain downloadable content
                 return self._extract_entry_item(webpage, content_path)
             elif page_type == 'Session':  # Event session page, may contain downloadable content
@@ -278,6 +296,5 @@ class Channel9IE(InfoExtractor):
                 return self._extract_list(content_path)
             else:
                 raise ExtractorError('Unexpected WT.entryid %s' % page_type, expected=True)
-
         else:  # Assuming list
             return self._extract_list(content_path)
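
The non-greedy contentpath group plus the optional (?P<rss>/RSS) suffix is what routes RSS links straight to the playlist path while every other URL falls through to page-type detection. A minimal routing sketch (route() is illustrative, not a Channel9IE method):

import re

VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)'

def route(url):
    mobj = re.match(VALID_URL, url)
    content_path, rss = mobj.group('contentpath'), mobj.group('rss')
    # RSS links become playlists directly; other pages need a webpage fetch
    return ('rss', content_path) if rss else ('page', content_path)

assert route('https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS') == ('rss', 'Events/DEVintersection/DEVintersection-2016')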

youtube_dl/extractor/comedycentral.py

@@ -44,10 +44,10 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
     # or: http://www.colbertnation.com/the-colbert-report-collections/422008/festival-of-lights/79524
     _VALID_URL = r'''(?x)^(:(?P<shortname>tds|thedailyshow)
                       |https?://(:www\.)?
-                      (?P<showname>thedailyshow|thecolbertreport)\.(?:cc\.)?com/
+                      (?P<showname>thedailyshow|thecolbertreport|tosh)\.(?:cc\.)?com/
                      ((?:full-)?episodes/(?:[0-9a-z]{6}/)?(?P<episode>.*)|
                       (?P<clip>
-                          (?:(?:guests/[^/]+|videos|video-playlists|special-editions|news-team/[^/]+)/[^/]+/(?P<videotitle>[^/?#]+))
+                          (?:(?:guests/[^/]+|videos|video-(?:clips|playlists)|special-editions|news-team/[^/]+)/[^/]+/(?P<videotitle>[^/?#]+))
                           |(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
                           |(watch/(?P<date>[^/]*)/(?P<tdstitle>.*))
                      )|
@@ -129,6 +129,9 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
     }, {
         'url': 'http://thedailyshow.cc.com/news-team/michael-che/7wnfel/we-need-to-talk-about-israel',
         'only_matching': True,
+    }, {
+        'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans',
+        'only_matching': True,
     }]

     _available_formats = ['3500', '2200', '1700', '1200', '750', '400']

youtube_dl/extractor/common.py

@@ -987,7 +987,7 @@ class InfoExtractor(object):
     def _extract_f4m_formats(self, manifest_url, video_id, preference=None, f4m_id=None,
                              transform_source=lambda s: fix_xml_ampersands(s).strip(),
-                             fatal=True):
+                             fatal=True, m3u8_id=None):
         manifest = self._download_xml(
             manifest_url, video_id, 'Downloading f4m manifest',
             'Unable to download f4m manifest',
@@ -1001,11 +1001,11 @@ class InfoExtractor(object):
         return self._parse_f4m_formats(
             manifest, manifest_url, video_id, preference=preference, f4m_id=f4m_id,
-            transform_source=transform_source, fatal=fatal)
+            transform_source=transform_source, fatal=fatal, m3u8_id=m3u8_id)

     def _parse_f4m_formats(self, manifest, manifest_url, video_id, preference=None, f4m_id=None,
                            transform_source=lambda s: fix_xml_ampersands(s).strip(),
-                           fatal=True):
+                           fatal=True, m3u8_id=None):
         # currently youtube-dl cannot decode the playerVerificationChallenge as Akamai uses Adobe Alchemy
         akamai_pv = manifest.find('{http://ns.adobe.com/f4m/1.0}pv-2.0')
         if akamai_pv is not None and ';' in akamai_pv.text:
@@ -1029,9 +1029,26 @@ class InfoExtractor(object):
                        'base URL', default=None)
         if base_url:
             base_url = base_url.strip()
+
+        bootstrap_info = xpath_text(
+            manifest, ['{http://ns.adobe.com/f4m/1.0}bootstrapInfo', '{http://ns.adobe.com/f4m/2.0}bootstrapInfo'],
+            'bootstrap info', default=None)
+
         for i, media_el in enumerate(media_nodes):
-            if manifest_version == '2.0':
-                media_url = media_el.attrib.get('href') or media_el.attrib.get('url')
+            tbr = int_or_none(media_el.attrib.get('bitrate'))
+            width = int_or_none(media_el.attrib.get('width'))
+            height = int_or_none(media_el.attrib.get('height'))
+            format_id = '-'.join(filter(None, [f4m_id, compat_str(i if tbr is None else tbr)]))
+            # If <bootstrapInfo> is present, the specified f4m is a
+            # stream-level manifest, and only set-level manifests may refer to
+            # external resources. See section 11.4 and section 4 of F4M spec
+            if bootstrap_info is None:
+                media_url = None
+                # @href is introduced in 2.0, see section 11.6 of F4M spec
+                if manifest_version == '2.0':
+                    media_url = media_el.attrib.get('href')
+                if media_url is None:
+                    media_url = media_el.attrib.get('url')
                 if not media_url:
                     continue
                 manifest_url = (
@@ -1041,19 +1058,37 @@ class InfoExtractor(object):
                 # since bitrates in parent manifest (this one) and media_url manifest
                 # may differ leading to inability to resolve the format by requested
                 # bitrate in f4m downloader
-                if determine_ext(manifest_url) == 'f4m':
-                    formats.extend(self._extract_f4m_formats(
+                ext = determine_ext(manifest_url)
+                if ext == 'f4m':
+                    f4m_formats = self._extract_f4m_formats(
                         manifest_url, video_id, preference=preference, f4m_id=f4m_id,
-                        transform_source=transform_source, fatal=fatal))
+                        transform_source=transform_source, fatal=fatal)
+                    # Sometimes stream-level manifest contains single media entry that
+                    # does not contain any quality metadata (e.g. http://matchtv.ru/#live-player).
+                    # At the same time parent's media entry in set-level manifest may
+                    # contain it. We will copy it from parent in such cases.
+                    if len(f4m_formats) == 1:
+                        f = f4m_formats[0]
+                        f.update({
+                            'tbr': f.get('tbr') or tbr,
+                            'width': f.get('width') or width,
+                            'height': f.get('height') or height,
+                            'format_id': f.get('format_id') if not tbr else format_id,
+                        })
+                    formats.extend(f4m_formats)
+                    continue
+                elif ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        manifest_url, video_id, 'mp4', preference=preference,
+                        m3u8_id=m3u8_id, fatal=fatal))
                     continue
-            tbr = int_or_none(media_el.attrib.get('bitrate'))
             formats.append({
-                'format_id': '-'.join(filter(None, [f4m_id, compat_str(i if tbr is None else tbr)])),
+                'format_id': format_id,
                 'url': manifest_url,
-                'ext': 'flv',
+                'ext': 'flv' if bootstrap_info else None,
                 'tbr': tbr,
-                'width': int_or_none(media_el.attrib.get('width')),
-                'height': int_or_none(media_el.attrib.get('height')),
+                'width': width,
+                'height': height,
                 'preference': preference,
             })
         return formats
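
The new bootstrapInfo lookup is the crux of this change: per sections 4 and 11.4 of the F4M spec, a manifest carrying <bootstrapInfo> is a stream-level manifest whose media entries are the streams themselves, while only a set-level manifest may refer to external media documents. A minimal classifier sketch with the standard library (namespaces copied from the code above):

import xml.etree.ElementTree as etree

F4M_NAMESPACES = ('{http://ns.adobe.com/f4m/1.0}', '{http://ns.adobe.com/f4m/2.0}')

def is_set_level(manifest_xml):
    # No <bootstrapInfo> anywhere means the media entries may point at
    # child manifests that must be fetched recursively.
    doc = etree.fromstring(manifest_xml)
    return all(doc.find(ns + 'bootstrapInfo') is None for ns in F4M_NAMESPACES)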

youtube_dl/extractor/coub.py

@@ -0,0 +1,143 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_iso8601,
qualities,
)
class CoubIE(InfoExtractor):
_VALID_URL = r'(?:coub:|https?://(?:coub\.com/(?:view|embed|coubs)/|c-cdn\.coub\.com/fb-player\.swf\?.*\bcoub(?:ID|id)=))(?P<id>[\da-z]+)'
_TESTS = [{
'url': 'http://coub.com/view/5u5n1',
'info_dict': {
'id': '5u5n1',
'ext': 'mp4',
'title': 'The Matrix Moonwalk',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 4.6,
'timestamp': 1428527772,
'upload_date': '20150408',
'uploader': 'Артём Лоскутников',
'uploader_id': 'artyom.loskutnikov',
'view_count': int,
'like_count': int,
'repost_count': int,
'comment_count': int,
'age_limit': 0,
},
}, {
'url': 'http://c-cdn.coub.com/fb-player.swf?bot_type=vk&coubID=7w5a4',
'only_matching': True,
}, {
'url': 'coub:5u5n1',
'only_matching': True,
}, {
# longer video id
'url': 'http://coub.com/view/237d5l5h',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
coub = self._download_json(
'http://coub.com/api/v2/coubs/%s.json' % video_id, video_id)
if coub.get('error'):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, coub['error']), expected=True)
title = coub['title']
file_versions = coub['file_versions']
QUALITIES = ('low', 'med', 'high')
MOBILE = 'mobile'
IPHONE = 'iphone'
HTML5 = 'html5'
SOURCE_PREFERENCE = (MOBILE, IPHONE, HTML5)
quality_key = qualities(QUALITIES)
preference_key = qualities(SOURCE_PREFERENCE)
formats = []
for kind, items in file_versions.get(HTML5, {}).items():
if kind not in ('video', 'audio'):
continue
if not isinstance(items, dict):
continue
for quality, item in items.items():
if not isinstance(item, dict):
continue
item_url = item.get('url')
if not item_url:
continue
formats.append({
'url': item_url,
'format_id': '%s-%s-%s' % (HTML5, kind, quality),
'filesize': int_or_none(item.get('size')),
'vcodec': 'none' if kind == 'audio' else None,
'quality': quality_key(quality),
'preference': preference_key(HTML5),
})
iphone_url = file_versions.get(IPHONE, {}).get('url')
if iphone_url:
formats.append({
'url': iphone_url,
'format_id': IPHONE,
'preference': preference_key(IPHONE),
})
mobile_url = file_versions.get(MOBILE, {}).get('audio_url')
if mobile_url:
formats.append({
'url': mobile_url,
'format_id': '%s-audio' % MOBILE,
'preference': preference_key(MOBILE),
})
self._sort_formats(formats)
thumbnail = coub.get('picture')
duration = float_or_none(coub.get('duration'))
timestamp = parse_iso8601(coub.get('published_at') or coub.get('created_at'))
uploader = coub.get('channel', {}).get('title')
uploader_id = coub.get('channel', {}).get('permalink')
view_count = int_or_none(coub.get('views_count') or coub.get('views_increase_count'))
like_count = int_or_none(coub.get('likes_count'))
repost_count = int_or_none(coub.get('recoubs_count'))
comment_count = int_or_none(coub.get('comments_count'))
age_restricted = coub.get('age_restricted', coub.get('age_restricted_by_admin'))
if age_restricted is not None:
age_limit = 18 if age_restricted is True else 0
else:
age_limit = None
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'view_count': view_count,
'like_count': like_count,
'repost_count': repost_count,
'comment_count': comment_count,
'age_limit': age_limit,
'formats': formats,
}
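
The qualities() helper that orders these Coub formats turns an ordered preference tuple into a sort key: a name's rank is its index in the tuple, and unknown names rank lowest. The sketch below mirrors the utils helper's documented behaviour:

def qualities(quality_ids):
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q

quality_key = qualities(('low', 'med', 'high'))
assert quality_key('high') > quality_key('low') > quality_key('bogus')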

youtube_dl/extractor/dw.py

@@ -2,13 +2,16 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..utils import int_or_none
+from ..utils import (
+    int_or_none,
+    unified_strdate,
+)
 from ..compat import compat_urlparse


 class DWIE(InfoExtractor):
     IE_NAME = 'dw'
-    _VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+av-(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+(?:av|e)-(?P<id>\d+)'

     _TESTS = [{
         # video
         'url': 'http://www.dw.com/en/intelligent-light/av-19112290',
@@ -31,6 +34,16 @@ class DWIE(InfoExtractor):
             'description': 'md5:bc9ca6e4e063361e21c920c53af12405',
             'upload_date': '20160311',
         }
+    }, {
+        'url': 'http://www.dw.com/en/documentaries-welcome-to-the-90s-2016-05-21/e-19220158-9798',
+        'md5': '56b6214ef463bfb9a3b71aeb886f3cf1',
+        'info_dict': {
+            'id': '19274438',
+            'ext': 'mp4',
+            'title': 'Welcome to the 90s Hip Hop',
+            'description': 'Welcome to the 90s - The Golden Decade of Hip Hop',
+            'upload_date': '20160521',
+        },
     }]

     def _real_extract(self, url):
@@ -38,6 +51,7 @@ class DWIE(InfoExtractor):
         webpage = self._download_webpage(url, media_id)
         hidden_inputs = self._hidden_inputs(webpage)
         title = hidden_inputs['media_title']
+        media_id = hidden_inputs.get('media_id') or media_id

         if hidden_inputs.get('player_type') == 'video' and hidden_inputs.get('stream_file') == '1':
             formats = self._extract_smil_formats(
@@ -49,13 +63,20 @@ class DWIE(InfoExtractor):
         else:
             formats = [{'url': hidden_inputs['file_name']}]

+        upload_date = hidden_inputs.get('display_date')
+        if not upload_date:
+            upload_date = self._html_search_regex(
+                r'<span[^>]+class="date">([0-9.]+)\s*\|', webpage,
+                'upload date', default=None)
+        upload_date = unified_strdate(upload_date)
+
         return {
             'id': media_id,
             'title': title,
             'description': self._og_search_description(webpage),
             'thumbnail': hidden_inputs.get('preview_image'),
             'duration': int_or_none(hidden_inputs.get('file_duration')),
-            'upload_date': hidden_inputs.get('display_date'),
+            'upload_date': upload_date,
             'formats': formats,
         }
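
The upload date now prefers the display_date hidden input and only then scrapes the visible date next to the player, before normalizing to YYYYMMDD via unified_strdate. A simplified re-implementation of that chain (handling only the dotted dd.mm.yyyy dates the page regex above captures):

import re

def guess_upload_date(hidden_inputs, webpage):
    date_str = hidden_inputs.get('display_date')
    if not date_str:
        m = re.search(r'<span[^>]+class="date">([0-9.]+)\s*\|', webpage)
        date_str = m.group(1) if m else None
    m = re.match(r'(\d{2})\.(\d{2})\.(\d{4})', date_str or '')
    return '%s%s%s' % (m.group(3), m.group(2), m.group(1)) if m else None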

youtube_dl/extractor/eporner.py

@@ -11,8 +11,8 @@ from ..utils import (

 class EpornerIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\d+)/(?P<display_id>[\w-]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?eporner\.com/hd-porn/(?P<id>\w+)/(?P<display_id>[\w-]+)'
+    _TESTS = [{
         'url': 'http://www.eporner.com/hd-porn/95008/Infamous-Tiffany-Teen-Strip-Tease-Video/',
         'md5': '39d486f046212d8e1b911c52ab4691f8',
         'info_dict': {
@@ -23,8 +23,12 @@ class EpornerIE(InfoExtractor):
             'duration': 1838,
             'view_count': int,
             'age_limit': 18,
-        }
-    }
+        },
+    }, {
+        # New (May 2016) URL layout
+        'url': 'http://www.eporner.com/hd-porn/3YRUtzMcWn0/Star-Wars-XXX-Parody/',
+        'only_matching': True,
+    }]

     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)

youtube_dl/extractor/espn.py

@@ -8,6 +8,7 @@ class ESPNIE(InfoExtractor):
     _VALID_URL = r'https?://espn\.go\.com/(?:[^/]+/)*(?P<id>[^/]+)'
     _TESTS = [{
         'url': 'http://espn.go.com/video/clip?id=10365079',
+        'md5': '60e5d097a523e767d06479335d1bdc58',
         'info_dict': {
             'id': 'FkYWtmazr6Ed8xmvILvKLWjd4QvYZpzG',
             'ext': 'mp4',
@@ -15,21 +16,22 @@ class ESPNIE(InfoExtractor):
             'description': None,
         },
         'params': {
-            # m3u8 download
             'skip_download': True,
         },
+        'add_ie': ['OoyalaExternal'],
     }, {
         # intl video, from http://www.espnfc.us/video/mls-highlights/150/video/2743663/must-see-moments-best-of-the-mls-season
         'url': 'http://espn.go.com/video/clip?id=2743663',
+        'md5': 'f4ac89b59afc7e2d7dbb049523df6768',
         'info_dict': {
             'id': '50NDFkeTqRHB0nXBOK-RGdSG5YQPuxHg',
             'ext': 'mp4',
             'title': 'Must-See Moments: Best of the MLS season',
         },
         'params': {
-            # m3u8 download
             'skip_download': True,
         },
+        'add_ie': ['OoyalaExternal'],
     }, {
         'url': 'https://espn.go.com/video/iframe/twitter/?cms=espn&id=10365079',
         'only_matching': True,

youtube_dl/extractor/extractors.py

@@ -56,6 +56,7 @@ from .arte import (
     ArteTVDDCIE,
     ArteTVMagazineIE,
     ArteTVEmbedIE,
+    ArteTVPlaylistIE,
 )
 from .atresplayer import AtresPlayerIE
 from .atttechchannel import ATTTechChannelIE
@@ -143,6 +144,7 @@ from .cnn import (
     CNNBlogsIE,
     CNNArticleIE,
 )
+from .coub import CoubIE
 from .collegerama import CollegeRamaIE
 from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
 from .comcarcoff import ComCarCoffIE
@@ -231,6 +233,7 @@ from .everyonesmixtape import EveryonesMixtapeIE
 from .exfm import ExfmIE
 from .expotv import ExpoTVIE
 from .extremetube import ExtremeTubeIE
+from .eyedotv import EyedoTVIE
 from .facebook import FacebookIE
 from .faz import FazIE
 from .fc2 import FC2IE
@@ -379,6 +382,7 @@ from .leeco import (
     LePlaylistIE,
     LetvCloudIE,
 )
+from .libraryofcongress import LibraryOfCongressIE
 from .libsyn import LibsynIE
 from .lifenews import (
     LifeNewsIE,
@@ -617,6 +621,10 @@ from .qqmusic import (
     QQMusicPlaylistIE,
 )
 from .r7 import R7IE
+from .radiocanada import (
+    RadioCanadaIE,
+    RadioCanadaAudioVideoIE,
+)
 from .radiode import RadioDeIE
 from .radiojavan import RadioJavanIE
 from .radiobremen import RadioBremenIE
@@ -630,8 +638,12 @@ from .rds import RDSIE
 from .redtube import RedTubeIE
 from .regiotv import RegioTVIE
 from .restudy import RestudyIE
+from .reuters import ReutersIE
 from .reverbnation import ReverbNationIE
-from .revision3 import Revision3IE
+from .revision3 import (
+    Revision3EmbedIE,
+    Revision3IE,
+)
 from .rice import RICEIE
 from .ringtv import RingTVIE
 from .ro220 import Ro220IE
@@ -670,6 +682,7 @@ from .screencast import ScreencastIE
 from .screencastomatic import ScreencastOMaticIE
 from .screenjunkies import ScreenJunkiesIE
 from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
+from .seeker import SeekerIE
 from .senateisvp import SenateISVPIE
 from .sendtonews import SendtoNewsIE
 from .servingsys import ServingSysIE
@@ -827,7 +840,10 @@ from .tvc import (
 )
 from .tvigle import TvigleIE
 from .tvland import TVLandIE
-from .tvp import TvpIE, TvpSeriesIE
+from .tvp import (
+    TVPIE,
+    TVPSeriesIE,
+)
 from .tvplay import TVPlayIE
 from .tweakers import TweakersIE
 from .twentyfourvideo import TwentyFourVideoIE
@@ -941,7 +957,10 @@ from .vube import VubeIE
 from .vuclip import VuClipIE
 from .vulture import VultureIE
 from .walla import WallaIE
-from .washingtonpost import WashingtonPostIE
+from .washingtonpost import (
+    WashingtonPostIE,
+    WashingtonPostArticleIE,
+)
 from .wat import WatIE
 from .watchindianporn import WatchIndianPornIE
 from .wdr import (

youtube_dl/extractor/eyedotv.py

@@ -0,0 +1,64 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
xpath_text,
parse_duration,
ExtractorError,
)
class EyedoTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?eyedo\.tv/[^/]+/(?:#!/)?Live/Detail/(?P<id>[0-9]+)'
_TEST = {
'url': 'https://www.eyedo.tv/en-US/#!/Live/Detail/16301',
'md5': 'ba14f17995cdfc20c36ba40e21bf73f7',
'info_dict': {
'id': '16301',
'ext': 'mp4',
'title': 'Journée du conseil scientifique de l\'Afnic 2015',
'description': 'md5:4abe07293b2f73efc6e1c37028d58c98',
'uploader': 'Afnic Live',
'uploader_id': '8023',
}
}
_ROOT_URL = 'http://live.eyedo.net:1935/'
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_xml('http://eyedo.tv/api/live/GetLive/%s' % video_id, video_id)
def _add_ns(path):
return self._xpath_ns(path, 'http://schemas.datacontract.org/2004/07/EyeDo.Core.Implementation.Web.ViewModels.Api')
title = xpath_text(video_data, _add_ns('Titre'), 'title', True)
state_live_code = xpath_text(video_data, _add_ns('StateLiveCode'), 'title', True)
if state_live_code == 'avenir':
raise ExtractorError(
'%s said: We\'re sorry, but this video is not yet available.' % self.IE_NAME,
expected=True)
is_live = state_live_code == 'live'
m3u8_url = None
# http://eyedo.tv/Content/Html5/Scripts/html5view.js
if is_live:
if xpath_text(video_data, 'Cdn') == 'true':
m3u8_url = 'http://rrr.sz.xlcdn.com/?account=eyedo&file=A%s&type=live&service=wowza&protocol=http&output=playlist.m3u8' % video_id
else:
m3u8_url = self._ROOT_URL + 'w/%s/eyedo_720p/playlist.m3u8' % video_id
else:
m3u8_url = self._ROOT_URL + 'replay-w/%s/mp4:%s.mp4/playlist.m3u8' % (video_id, video_id)
return {
'id': video_id,
'title': title,
'formats': self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8' if is_live else 'm3u8_native'),
'description': xpath_text(video_data, _add_ns('Description')),
'duration': parse_duration(xpath_text(video_data, _add_ns('Duration'))),
'uploader': xpath_text(video_data, _add_ns('Createur')),
'uploader_id': xpath_text(video_data, _add_ns('CreateurId')),
'chapter': xpath_text(video_data, _add_ns('ChapitreTitre')),
'chapter_id': xpath_text(video_data, _add_ns('ChapitreId')),
}

View File

@@ -13,7 +13,8 @@ class Formula1IE(InfoExtractor):
'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV', 'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV',
'ext': 'flv', 'ext': 'flv',
'title': 'Race highlights - Spain 2016', 'title': 'Race highlights - Spain 2016',
} },
'add_ie': ['Ooyala'],
} }
def _real_extract(self, url): def _real_extract(self, url):

youtube_dl/extractor/generic.py

@@ -62,6 +62,7 @@ from .digiteka import DigitekaIE
 from .instagram import InstagramIE
 from .liveleak import LiveLeakIE
 from .threeqsdn import ThreeQSDNIE
+from .theplatform import ThePlatformIE


 class GenericIE(InfoExtractor):
@@ -783,6 +784,19 @@ class GenericIE(InfoExtractor):
                 'title': 'Rosetta #CometLanding webcast HL 10',
             }
         },
+        # Another Livestream embed, without 'new.' in URL
+        {
+            'url': 'https://www.freespeech.org/',
+            'info_dict': {
+                'id': '123537347',
+                'ext': 'mp4',
+                'title': 're:^FSTV [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+            },
+            'params': {
+                # Live stream
+                'skip_download': True,
+            },
+        },
         # LazyYT
         {
             'url': 'http://discourse.ubuntu.com/t/unity-8-desktop-mode-windows-on-mir/1986',
@@ -867,18 +881,6 @@ class GenericIE(InfoExtractor):
                 'title': 'EP3S5 - Bon Appétit - Baqueira Mi Corazon !',
             }
         },
-        # Kaltura embed
-        {
-            'url': 'http://www.monumentalnetwork.com/videos/john-carlson-postgame-2-25-15',
-            'info_dict': {
-                'id': '1_eergr3h1',
-                'ext': 'mp4',
-                'upload_date': '20150226',
-                'uploader_id': 'MonumentalSports-Kaltura@perfectsensedigital.com',
-                'timestamp': int,
-                'title': 'John Carlson Postgame 2/25/15',
-            },
-        },
         # Kaltura embed (different embed code)
         {
             'url': 'http://www.premierchristianradio.com/Shows/Saturday/Unbelievable/Conference-Videos/Os-Guinness-Is-It-Fools-Talk-Unbelievable-Conference-2014',
@@ -904,6 +906,19 @@ class GenericIE(InfoExtractor):
                 'uploader_id': 'echojecka',
             },
         },
+        # Kaltura embed with single quotes
+        {
+            'url': 'http://fod.infobase.com/p_ViewPlaylist.aspx?AssignmentID=NUN8ZY',
+            'info_dict': {
+                'id': '0_izeg5utt',
+                'ext': 'mp4',
+                'title': '35871',
+                'timestamp': 1355743100,
+                'upload_date': '20121217',
+                'uploader_id': 'batchUser',
+            },
+            'add_ie': ['Kaltura'],
+        },
         # Eagle.Platform embed (generic URL)
         {
             'url': 'http://lenta.ru/news/2015/03/06/navalny/',
@@ -1018,14 +1033,18 @@ class GenericIE(InfoExtractor):
         },
         # UDN embed
         {
-            'url': 'http://www.udn.com/news/story/7314/822787',
+            'url': 'https://video.udn.com/news/300346',
             'md5': 'fd2060e988c326991037b9aff9df21a6',
             'info_dict': {
                 'id': '300346',
                 'ext': 'mp4',
                 'title': '中一中男師變性 全校師生力挺',
                 'thumbnail': 're:^https?://.*\.jpg$',
-            }
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
         },
         # Ooyala embed
         {
@@ -1193,6 +1212,16 @@ class GenericIE(InfoExtractor):
                 'uploader': 'Lake8737',
             }
         },
+        # Duplicated embedded video URLs
+        {
+            'url': 'http://www.hudl.com/athlete/2538180/highlights/149298443',
+            'info_dict': {
+                'id': '149298443_480_16c25b74_2',
+                'ext': 'mp4',
+                'title': 'vs. Blue Orange Spring Game',
+                'uploader': 'www.hudl.com',
+            },
+        },
     ]

     def report_following_redirect(self, new_url):
@@ -1499,6 +1528,11 @@ class GenericIE(InfoExtractor):
         if bc_urls:
             return _playlist_from_matches(bc_urls, ie='BrightcoveNew')

+        # Look for ThePlatform embeds
+        tp_urls = ThePlatformIE._extract_urls(webpage)
+        if tp_urls:
+            return _playlist_from_matches(tp_urls, ie='ThePlatform')
+
         # Look for embedded rtl.nl player
         matches = re.findall(
             r'<iframe[^>]+?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+(?:video_)?embed[^"]+)"',
@@ -1862,7 +1896,7 @@ class GenericIE(InfoExtractor):
             return self.url_result(self._proto_relative_url(mobj.group('url'), scheme='http:'), 'CondeNast')

         mobj = re.search(
-            r'<iframe[^>]+src="(?P<url>https?://new\.livestream\.com/[^"]+/player[^"]+)"',
+            r'<iframe[^>]+src="(?P<url>https?://(?:new\.)?livestream\.com/[^"]+/player[^"]+)"',
             webpage)
         if mobj is not None:
             return self.url_result(mobj.group('url'), 'Livestream')
@@ -1874,7 +1908,7 @@ class GenericIE(InfoExtractor):
             return self.url_result(mobj.group('url'), 'Zapiks')

         # Look for Kaltura embeds
-        mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_?[Ii]d'\s*:\s*'(?P<id>[^']+)',", webpage) or
+        mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?(?P<q1>['\"])wid(?P=q1)\s*:\s*(?P<q2>['\"])_?(?P<partner_id>[^'\"]+)(?P=q2),.*?(?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*(?P<q4>['\"])(?P<id>[^'\"]+)(?P=q4),", webpage) or
                 re.search(r'(?s)(?P<q1>["\'])(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?(?P=q1).*?entry_?[Ii]d\s*:\s*(?P<q2>["\'])(?P<id>.+?)(?P=q2)', webpage))
         if mobj is not None:
             return self.url_result(smuggle_url(
@@ -2105,7 +2139,7 @@ class GenericIE(InfoExtractor):
             raise UnsupportedError(url)

         entries = []
-        for video_url in found:
+        for video_url in orderedSet(found):
             video_url = unescapeHTML(video_url)
             video_url = video_url.replace('\\/', '/')
             video_url = compat_urlparse.urljoin(url, video_url)
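
orderedSet() matters in that loop because several embed regexes can yield the same URL: it drops duplicates while preserving first-seen order, which a plain set() would not. Behaviour sketch:

def ordered_set(iterable):
    seen = []
    for item in iterable:
        if item not in seen:
            seen.append(item)
    return seen

assert ordered_set(['a', 'b', 'a', 'c', 'b']) == ['a', 'b', 'c']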

youtube_dl/extractor/groupon.py

@@ -14,6 +14,7 @@ class GrouponIE(InfoExtractor):
             'description': 'Studio kept at 105 degrees and 40% humidity with anti-microbial and anti-slip Flotex flooring; certified instructors',
         },
         'playlist': [{
+            'md5': '42428ce8a00585f9bc36e49226eae7a1',
             'info_dict': {
                 'id': 'fk6OhWpXgIQ',
                 'ext': 'mp4',
@@ -24,10 +25,11 @@ class GrouponIE(InfoExtractor):
                 'uploader_id': 'groupon',
                 'uploader': 'Groupon',
             },
+            'add_ie': ['Youtube'],
         }],
         'params': {
             'skip_download': True,
-        }
+        },
     }

     _PROVIDERS = {

youtube_dl/extractor/howcast.py

@@ -8,7 +8,7 @@ class HowcastIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?howcast\.com/videos/(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.howcast.com/videos/390161-How-to-Tie-a-Square-Knot-Properly',
-        'md5': '8b743df908c42f60cf6496586c7f12c3',
+        'md5': '7d45932269a288149483144f01b99789',
         'info_dict': {
             'id': '390161',
             'ext': 'mp4',
@@ -19,9 +19,9 @@ class HowcastIE(InfoExtractor):
             'duration': 56.823,
         },
         'params': {
-            # m3u8 download
             'skip_download': True,
         },
+        'add_ie': ['Ooyala'],
     }

     def _real_extract(self, url):

youtube_dl/extractor/libraryofcongress.py

@@ -0,0 +1,143 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
parse_filesize,
)
class LibraryOfCongressIE(InfoExtractor):
IE_NAME = 'loc'
IE_DESC = 'Library of Congress'
_VALID_URL = r'https?://(?:www\.)?loc\.gov/(?:item/|today/cyberlc/feature_wdesc\.php\?.*\brec=)(?P<id>[0-9]+)'
_TESTS = [{
# embedded via <div class="media-player"
'url': 'http://loc.gov/item/90716351/',
'md5': '353917ff7f0255aa6d4b80a034833de8',
'info_dict': {
'id': '90716351',
'ext': 'mp4',
'title': "Pa's trip to Mars",
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 0,
'view_count': int,
},
}, {
# webcast embedded via mediaObjectId
'url': 'https://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=5578',
'info_dict': {
'id': '5578',
'ext': 'mp4',
'title': 'Help! Preservation Training Needs Here, There & Everywhere',
'duration': 3765,
'view_count': int,
'subtitles': 'mincount:1',
},
'params': {
'skip_download': True,
},
}, {
# with direct download links
'url': 'https://www.loc.gov/item/78710669/',
'info_dict': {
'id': '78710669',
'ext': 'mp4',
'title': 'La vie et la passion de Jesus-Christ',
'duration': 0,
'view_count': int,
'formats': 'mincount:4',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
media_id = self._search_regex(
(r'id=(["\'])media-player-(?P<id>.+?)\1',
r'<video[^>]+id=(["\'])uuid-(?P<id>.+?)\1',
r'<video[^>]+data-uuid=(["\'])(?P<id>.+?)\1',
r'mediaObjectId\s*:\s*(["\'])(?P<id>.+?)\1'),
webpage, 'media id', group='id')
data = self._download_json(
'https://media.loc.gov/services/v1/media?id=%s&context=json' % media_id,
video_id)['mediaObject']
derivative = data['derivatives'][0]
media_url = derivative['derivativeUrl']
title = derivative.get('shortName') or data.get('shortName') or self._og_search_title(
webpage)
# Following algorithm was extracted from setAVSource js function
# found in webpage
media_url = media_url.replace('rtmp', 'https')
is_video = data.get('mediaType', 'v').lower() == 'v'
ext = determine_ext(media_url)
if ext not in ('mp4', 'mp3'):
media_url += '.mp4' if is_video else '.mp3'
if 'vod/mp4:' in media_url:
formats = [{
'url': media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8',
'format_id': 'hls',
'ext': 'mp4',
'protocol': 'm3u8_native',
'quality': 1,
}]
elif 'vod/mp3:' in media_url:
formats = [{
'url': media_url.replace('vod/mp3:', ''),
'vcodec': 'none',
}]
download_urls = set()
for m in re.finditer(
r'<option[^>]+value=(["\'])(?P<url>.+?)\1[^>]+data-file-download=[^>]+>\s*(?P<id>.+?)(?:(?:&nbsp;|\s+)\((?P<size>.+?)\))?\s*<', webpage):
format_id = m.group('id').lower()
if format_id == 'gif':
continue
download_url = m.group('url')
if download_url in download_urls:
continue
download_urls.add(download_url)
formats.append({
'url': download_url,
'format_id': format_id,
'filesize_approx': parse_filesize(m.group('size')),
})
self._sort_formats(formats)
duration = float_or_none(data.get('duration'))
view_count = int_or_none(data.get('viewCount'))
subtitles = {}
cc_url = data.get('ccUrl')
if cc_url:
subtitles.setdefault('en', []).append({
'url': cc_url,
'ext': 'ttml',
})
return {
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'duration': duration,
'view_count': view_count,
'formats': formats,
'subtitles': subtitles,
}
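
The setAVSource-derived rewriting above maps one rtmp-style derivative URL onto either an HLS playlist or a direct progressive URL. A condensed sketch of that mapping (endswith() approximates the determine_ext() check in the real code):

def rewrite_media_url(media_url, is_video=True):
    media_url = media_url.replace('rtmp', 'https')
    if not media_url.endswith(('.mp4', '.mp3')):
        media_url += '.mp4' if is_video else '.mp3'
    if 'vod/mp4:' in media_url:
        # HLS variant of the same derivative
        return media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8'
    if 'vod/mp3:' in media_url:
        # Audio is served progressively from the stripped path
        return media_url.replace('vod/mp3:', '')
    return media_url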

youtube_dl/extractor/lifenews.py

@@ -7,48 +7,53 @@ from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
int_or_none,
remove_end,
unified_strdate,
ExtractorError, ExtractorError,
int_or_none,
parse_iso8601,
remove_end,
) )
class LifeNewsIE(InfoExtractor): class LifeNewsIE(InfoExtractor):
IE_NAME = 'lifenews' IE_NAME = 'life'
IE_DESC = 'LIFE | NEWS' IE_DESC = 'Life.ru'
_VALID_URL = r'https?://lifenews\.ru/(?:mobile/)?(?P<section>news|video)/(?P<id>\d+)' _VALID_URL = r'https?://life\.ru/t/[^/]+/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
# single video embedded via video/source # single video embedded via video/source
'url': 'http://lifenews.ru/news/98736', 'url': 'https://life.ru/t/новости/98736',
'md5': '77c95eaefaca216e32a76a343ad89d23', 'md5': '77c95eaefaca216e32a76a343ad89d23',
'info_dict': { 'info_dict': {
'id': '98736', 'id': '98736',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Мужчина нашел дома архив оборонного завода', 'title': 'Мужчина нашел дома архив оборонного завода',
'description': 'md5:3b06b1b39b5e2bea548e403d99b8bf26', 'description': 'md5:3b06b1b39b5e2bea548e403d99b8bf26',
'timestamp': 1344154740,
'upload_date': '20120805', 'upload_date': '20120805',
'view_count': int,
} }
}, { }, {
# single video embedded via iframe # single video embedded via iframe
'url': 'http://lifenews.ru/news/152125', 'url': 'https://life.ru/t/новости/152125',
'md5': '77d19a6f0886cd76bdbf44b4d971a273', 'md5': '77d19a6f0886cd76bdbf44b4d971a273',
'info_dict': { 'info_dict': {
'id': '152125', 'id': '152125',
'ext': 'mp4', 'ext': 'mp4',
'title': 'В Сети появилось видео захвата «Правым сектором» колхозных полей ', 'title': 'В Сети появилось видео захвата «Правым сектором» колхозных полей ',
         'description': 'Жители двух поселков Днепропетровской области не простили радикалам угрозу лишения плодородных земель и пошли в лобовую. ',
+        'timestamp': 1427961840,
         'upload_date': '20150402',
+        'view_count': int,
         }
     }, {
         # two videos embedded via iframe
-        'url': 'http://lifenews.ru/news/153461',
+        'url': 'https://life.ru/t/новости/153461',
         'info_dict': {
             'id': '153461',
             'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве',
             'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
-            'upload_date': '20150505',
+            'timestamp': 1430825520,
+            'view_count': int,
         },
         'playlist': [{
             'md5': '9b6ef8bc0ffa25aebc8bdb40d89ab795',
@@ -57,6 +62,7 @@ class LifeNewsIE(InfoExtractor):
                 'ext': 'mp4',
                 'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве (Видео 1)',
                 'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
+                'timestamp': 1430825520,
                 'upload_date': '20150505',
             },
         }, {
@@ -66,22 +72,25 @@ class LifeNewsIE(InfoExtractor):
                 'ext': 'mp4',
                 'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве (Видео 2)',
                 'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
+                'timestamp': 1430825520,
                 'upload_date': '20150505',
             },
         }],
     }, {
-        'url': 'http://lifenews.ru/video/13035',
+        'url': 'https://life.ru/t/новости/213035',
+        'only_matching': True,
+    }, {
+        'url': 'https://life.ru/t/%D0%BD%D0%BE%D0%B2%D0%BE%D1%81%D1%82%D0%B8/153461',
+        'only_matching': True,
+    }, {
+        'url': 'https://life.ru/t/новости/411489/manuel_vals_nazval_frantsiiu_tsieliu_nomier_odin_dlia_ighil',
        'only_matching': True,
     }]

     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        section = mobj.group('section')
+        video_id = self._match_id(url)

-        webpage = self._download_webpage(
-            'http://lifenews.ru/%s/%s' % (section, video_id),
-            video_id, 'Downloading page')
+        webpage = self._download_webpage(url, video_id)

         video_urls = re.findall(
             r'<video[^>]+><source[^>]+src=["\'](.+?)["\']', webpage)
@@ -95,26 +104,22 @@ class LifeNewsIE(InfoExtractor):
         title = remove_end(
             self._og_search_title(webpage),
-            ' - Первый по срочным новостям — LIFE | NEWS')
+            ' - Life.ru')

         description = self._og_search_description(webpage)

         view_count = self._html_search_regex(
-            r'<div class=\'views\'>\s*(\d+)\s*</div>', webpage, 'view count', fatal=False)
-        comment_count = self._html_search_regex(
-            r'=\'commentCount\'[^>]*>\s*(\d+)\s*<',
-            webpage, 'comment count', fatal=False)
+            r'<div[^>]+class=(["\']).*?\bhits-count\b.*?\1[^>]*>\s*(?P<value>\d+)\s*</div>',
+            webpage, 'view count', fatal=False, group='value')

-        upload_date = self._html_search_regex(
-            r'<time[^>]*datetime=\'([^\']+)\'', webpage, 'upload date', fatal=False)
-        if upload_date is not None:
-            upload_date = unified_strdate(upload_date)
+        timestamp = parse_iso8601(self._search_regex(
+            r'<time[^>]+datetime=(["\'])(?P<value>.+?)\1',
+            webpage, 'upload date', fatal=False, group='value'))

         common_info = {
             'description': description,
             'view_count': int_or_none(view_count),
-            'comment_count': int_or_none(comment_count),
-            'upload_date': upload_date,
+            'timestamp': timestamp,
         }

         def make_entry(video_id, video_url, index=None):
@@ -183,7 +188,8 @@ class LifeEmbedIE(InfoExtractor):
             ext = determine_ext(video_url)
             if ext == 'm3u8':
                 formats.extend(self._extract_m3u8_formats(
-                    video_url, video_id, 'mp4', m3u8_id='m3u8'))
+                    video_url, video_id, 'mp4',
+                    entry_protocol='m3u8_native', m3u8_id='m3u8'))
             else:
                 formats.append({
                     'url': video_url,
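The switch from unified_strdate to parse_iso8601 above trades a date string for a Unix timestamp (the tests keep both fields because upload_date can be derived from timestamp). A minimal standalone sketch of that conversion, using a simplified stand-in helper and a hypothetical attribute value:

import calendar
import datetime
import re

def parse_iso8601(date_str):
    # Simplified stand-in for youtube_dl.utils.parse_iso8601; the timezone
    # offset is ignored here for brevity, the real helper applies it.
    m = re.match(r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})', date_str)
    if not m:
        return None
    dt = datetime.datetime(*map(int, m.groups()))
    return calendar.timegm(dt.timetuple())

ts = parse_iso8601('2015-05-05T14:32:00+03:00')  # hypothetical datetime attribute
print(ts, datetime.datetime.utcfromtimestamp(ts).strftime('%Y%m%d'))  # timestamp and derived upload_date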

youtube_dl/extractor/livestream.py

@@ -150,7 +150,7 @@ class LivestreamIE(InfoExtractor):
         }

     def _extract_stream_info(self, stream_info):
-        broadcast_id = stream_info['broadcast_id']
+        broadcast_id = compat_str(stream_info['broadcast_id'])
         is_live = stream_info.get('is_live')

         formats = []

youtube_dl/extractor/ooyala.py

@@ -8,6 +8,7 @@ from ..utils import (
     float_or_none,
     ExtractorError,
     unsmuggle_url,
+    determine_ext,
 )

 from ..compat import compat_urllib_parse_urlencode
@@ -15,71 +16,80 @@ from ..compat import compat_urllib_parse_urlencode
 class OoyalaBaseIE(InfoExtractor):
     _PLAYER_BASE = 'http://player.ooyala.com/'
     _CONTENT_TREE_BASE = _PLAYER_BASE + 'player_api/v1/content_tree/'
-    _AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v1/authorization/embed_code/%s/%s?'
+    _AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v2/authorization/embed_code/%s/%s?'

     def _extract(self, content_tree_url, video_id, domain='example.org'):
         content_tree = self._download_json(content_tree_url, video_id)['content_tree']
         metadata = content_tree[list(content_tree)[0]]
         embed_code = metadata['embed_code']
         pcode = metadata.get('asset_pcode') or embed_code
-        video_info = {
-            'id': embed_code,
-            'title': metadata['title'],
-            'description': metadata.get('description'),
-            'thumbnail': metadata.get('thumbnail_image') or metadata.get('promo_image'),
-            'duration': float_or_none(metadata.get('duration'), 1000),
-        }
+        title = metadata['title']
+
+        auth_data = self._download_json(
+            self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code) +
+            compat_urllib_parse_urlencode({
+                'domain': domain,
+                'supportedFormats': 'mp4,rtmp,m3u8,hds',
+            }), video_id)
+
+        cur_auth_data = auth_data['authorization_data'][embed_code]

         urls = []
         formats = []
-        for supported_format in ('mp4', 'm3u8', 'hds', 'rtmp'):
-            auth_data = self._download_json(
-                self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code) +
-                compat_urllib_parse_urlencode({
-                    'domain': domain,
-                    'supportedFormats': supported_format,
-                }),
-                video_id, 'Downloading %s JSON' % supported_format)
-
-            cur_auth_data = auth_data['authorization_data'][embed_code]
-
-            if cur_auth_data['authorized']:
-                for stream in cur_auth_data['streams']:
-                    url = base64.b64decode(
-                        stream['url']['data'].encode('ascii')).decode('utf-8')
-                    if url in urls:
-                        continue
-                    urls.append(url)
-                    delivery_type = stream['delivery_type']
-                    if delivery_type == 'hls' or '.m3u8' in url:
-                        formats.extend(self._extract_m3u8_formats(
-                            url, embed_code, 'mp4', 'm3u8_native',
-                            m3u8_id='hls', fatal=False))
-                    elif delivery_type == 'hds' or '.f4m' in url:
-                        formats.extend(self._extract_f4m_formats(
-                            url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False))
-                    elif '.smil' in url:
-                        formats.extend(self._extract_smil_formats(
-                            url, embed_code, fatal=False))
-                    else:
-                        formats.append({
-                            'url': url,
-                            'ext': stream.get('delivery_type'),
-                            'vcodec': stream.get('video_codec'),
-                            'format_id': delivery_type,
-                            'width': int_or_none(stream.get('width')),
-                            'height': int_or_none(stream.get('height')),
-                            'abr': int_or_none(stream.get('audio_bitrate')),
-                            'vbr': int_or_none(stream.get('video_bitrate')),
-                            'fps': float_or_none(stream.get('framerate')),
-                        })
-            else:
-                raise ExtractorError('%s said: %s' % (
-                    self.IE_NAME, cur_auth_data['message']), expected=True)
+        if cur_auth_data['authorized']:
+            for stream in cur_auth_data['streams']:
+                s_url = base64.b64decode(
+                    stream['url']['data'].encode('ascii')).decode('utf-8')
+                if s_url in urls:
+                    continue
+                urls.append(s_url)
+                ext = determine_ext(s_url, None)
+                delivery_type = stream['delivery_type']
+                if delivery_type == 'hls' or ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        s_url, embed_code, 'mp4', 'm3u8_native',
+                        m3u8_id='hls', fatal=False))
+                elif delivery_type == 'hds' or ext == 'f4m':
+                    formats.extend(self._extract_f4m_formats(
+                        s_url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False))
+                elif ext == 'smil':
+                    formats.extend(self._extract_smil_formats(
+                        s_url, embed_code, fatal=False))
+                else:
+                    formats.append({
+                        'url': s_url,
+                        'ext': ext or stream.get('delivery_type'),
+                        'vcodec': stream.get('video_codec'),
+                        'format_id': delivery_type,
+                        'width': int_or_none(stream.get('width')),
+                        'height': int_or_none(stream.get('height')),
+                        'abr': int_or_none(stream.get('audio_bitrate')),
+                        'vbr': int_or_none(stream.get('video_bitrate')),
+                        'fps': float_or_none(stream.get('framerate')),
+                    })
+        else:
+            raise ExtractorError('%s said: %s' % (
+                self.IE_NAME, cur_auth_data['message']), expected=True)
         self._sort_formats(formats)

-        video_info['formats'] = formats
-        return video_info
+        subtitles = {}
+        for lang, sub in metadata.get('closed_captions_vtt', {}).get('captions', {}).items():
+            sub_url = sub.get('url')
+            if not sub_url:
+                continue
+            subtitles[lang] = [{
+                'url': sub_url,
+            }]
+
+        return {
+            'id': embed_code,
+            'title': title,
+            'description': metadata.get('description'),
+            'thumbnail': metadata.get('thumbnail_image') or metadata.get('promo_image'),
+            'duration': float_or_none(metadata.get('duration'), 1000),
+            'subtitles': subtitles,
+            'formats': formats,
+        }


 class OoyalaIE(OoyalaBaseIE):
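A self-contained sketch of the stream-URL decoding step used above: the Ooyala authorization response carries each stream URL base64-encoded (the payload below is fabricated for illustration):

import base64

# Shape of one entry from auth_data['authorization_data'][embed_code]['streams'],
# with a made-up URL.
stream = {
    'delivery_type': 'hls',
    'url': {'data': base64.b64encode(b'http://example.com/master.m3u8').decode('ascii')},
}

s_url = base64.b64decode(stream['url']['data'].encode('ascii')).decode('utf-8')
print(s_url)  # -> http://example.com/master.m3u8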

youtube_dl/extractor/periscope.py

@@ -2,7 +2,10 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..utils import parse_iso8601
+from ..utils import (
+    parse_iso8601,
+    unescapeHTML,
+)


 class PeriscopeIE(InfoExtractor):
@@ -42,8 +45,11 @@ class PeriscopeIE(InfoExtractor):
         broadcast = broadcast_data['broadcast']
         status = broadcast['status']

-        uploader = broadcast.get('user_display_name') or broadcast_data.get('user', {}).get('display_name')
-        uploader_id = broadcast.get('user_id') or broadcast_data.get('user', {}).get('id')
+        user = broadcast_data.get('user', {})
+
+        uploader = broadcast.get('user_display_name') or user.get('display_name')
+        uploader_id = (broadcast.get('username') or user.get('username') or
+                       broadcast.get('user_id') or user.get('id'))

         title = '%s - %s' % (uploader, status) if uploader else status
         state = broadcast.get('state').lower()
@@ -92,6 +98,7 @@ class PeriscopeUserIE(InfoExtractor):
         'info_dict': {
             'id': 'LularoeHusbandMike',
             'title': 'LULAROE HUSBAND MIKE',
+            'description': 'md5:6cf4ec8047768098da58e446e82c82f0',
         },
         # Periscope only shows videos in the last 24 hours, so it's possible to
         # get 0 videos
@@ -103,16 +110,19 @@ class PeriscopeUserIE(InfoExtractor):
         webpage = self._download_webpage(url, user_id)

-        broadcast_data = self._parse_json(self._html_search_meta(
-            'broadcast-data', webpage, default='{}'), user_id)
-        username = broadcast_data.get('user', {}).get('display_name')
-        user_broadcasts = self._parse_json(
-            self._html_search_meta('user-broadcasts', webpage, default='{}'),
-            user_id)
+        data_store = self._parse_json(
+            unescapeHTML(self._search_regex(
+                r'data-store=(["\'])(?P<data>.+?)\1',
+                webpage, 'data store', default='{}', group='data')),
+            user_id)
+
+        user = data_store.get('User', {}).get('user', {})
+        title = user.get('display_name') or user.get('username')
+        description = user.get('description')

         entries = [
             self.url_result(
                 'https://www.periscope.tv/%s/%s' % (user_id, broadcast['id']))
-            for broadcast in user_broadcasts.get('broadcasts', [])]
+            for broadcast in data_store.get('UserBroadcastHistory', {}).get('broadcasts', [])]

-        return self.playlist_result(entries, user_id, username)
+        return self.playlist_result(entries, user_id, title, description)
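The new PeriscopeUserIE code reads page state from an HTML-escaped data-store attribute instead of <meta> tags; roughly, the parsing step works like this (markup invented for the sketch):

import json
import re
try:
    from html import unescape  # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 fallback
    unescape = HTMLParser().unescape

webpage = '<div id="page-container" data-store="{&quot;User&quot;:{&quot;user&quot;:{&quot;display_name&quot;:&quot;Mike&quot;}}}"></div>'

# Same capture pattern as the extractor, applied to the fake markup above.
data_store = json.loads(unescape(re.search(
    r'data-store=(["\'])(?P<data>.+?)\1', webpage).group('data')))
print(data_store['User']['user']['display_name'])  # -> Mike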

youtube_dl/extractor/playwire.py

@@ -4,9 +4,8 @@ import re

 from .common import InfoExtractor
 from ..utils import (
-    xpath_text,
+    dict_get,
     float_or_none,
-    int_or_none,
 )
@@ -23,6 +22,19 @@ class PlaywireIE(InfoExtractor):
             'duration': 145.94,
         },
     }, {
+        # m3u8 in f4m
+        'url': 'http://config.playwire.com/21772/videos/v2/4840492/zeus.json',
+        'info_dict': {
+            'id': '4840492',
+            'ext': 'mp4',
+            'title': 'ITV EL SHOW FULL',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+    }, {
+        # Multiple resolutions while bitrates missing
         'url': 'http://cdn.playwire.com/11625/embed/85228.html',
         'only_matching': True,
     }, {
@@ -48,25 +60,10 @@ class PlaywireIE(InfoExtractor):
         thumbnail = content.get('poster')
         src = content['media']['f4m']

-        f4m = self._download_xml(src, video_id)
-        base_url = xpath_text(f4m, './{http://ns.adobe.com/f4m/1.0}baseURL', 'base url', fatal=True)
-        formats = []
-        for media in f4m.findall('./{http://ns.adobe.com/f4m/1.0}media'):
-            media_url = media.get('url')
-            if not media_url:
-                continue
-            tbr = int_or_none(media.get('bitrate'))
-            width = int_or_none(media.get('width'))
-            height = int_or_none(media.get('height'))
-            f = {
-                'url': '%s/%s' % (base_url, media.attrib['url']),
-                'tbr': tbr,
-                'width': width,
-                'height': height,
-            }
-            if not (tbr or width or height):
-                f['quality'] = 1 if '-hd.' in media_url else 0
-            formats.append(f)
+        formats = self._extract_f4m_formats(src, video_id, m3u8_id='hls')
+        for a_format in formats:
+            if not dict_get(a_format, ['tbr', 'width', 'height']):
+                a_format['quality'] = 1 if '-hd.' in a_format['url'] else 0
         self._sort_formats(formats)

         return {
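dict_get here returns the first listed key whose value is not None, so the quality fallback only kicks in when the f4m manifest carried no bitrate or resolution hints. A minimal equivalent of that helper, simplified from youtube_dl.utils.dict_get:

def dict_get(d, keys, default=None):
    # First non-None value wins.
    for key in keys:
        value = d.get(key)
        if value is not None:
            return value
    return default

a_format = {'url': 'http://example.com/video-hd.mp4', 'tbr': None}  # made-up format
if not dict_get(a_format, ['tbr', 'width', 'height']):
    a_format['quality'] = 1 if '-hd.' in a_format['url'] else 0
print(a_format['quality'])  # -> 1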

youtube_dl/extractor/radiocanada.py

@@ -0,0 +1,130 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    xpath_text,
+    find_xpath_attr,
+    determine_ext,
+    int_or_none,
+    unified_strdate,
+    xpath_element,
+    ExtractorError,
+)
+
+
+class RadioCanadaIE(InfoExtractor):
+    IE_NAME = 'radiocanada'
+    _VALID_URL = r'(?:radiocanada:|https?://ici\.radio-canada\.ca/widgets/mediaconsole/)(?P<app_code>[^:/]+)[:/](?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://ici.radio-canada.ca/widgets/mediaconsole/medianet/7184272',
+        'info_dict': {
+            'id': '7184272',
+            'ext': 'flv',
+            'title': 'Le parcours du tireur capté sur vidéo',
+            'description': 'Images des caméras de surveillance fournies par la GRC montrant le parcours du tireur d\'Ottawa',
+            'upload_date': '20141023',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        app_code, video_id = re.match(self._VALID_URL, url).groups()
+
+        formats = []
+        # TODO: extract m3u8 and f4m formats
+        # m3u8 formats can be extracted with the ipad device_type, but the
+        # server returns a 403 error code when ffmpeg tries to download segments
+        # f4m formats can be extracted with the flashhd device_type, but they
+        # produce an unplayable file
+        for device_type in ('flash',):
+            v_data = self._download_xml(
+                'http://api.radio-canada.ca/validationMedia/v1/Validation.ashx',
+                video_id, note='Downloading %s XML' % device_type, query={
+                    'appCode': app_code,
+                    'idMedia': video_id,
+                    'connectionType': 'broadband',
+                    'multibitrate': 'true',
+                    'deviceType': device_type,
+                    # paysJ391wsHjbOJwvCs26toz and bypasslock are used to bypass geo-restriction
+                    'paysJ391wsHjbOJwvCs26toz': 'CA',
+                    'bypasslock': 'NZt5K62gRqfc',
+                })
+            v_url = xpath_text(v_data, 'url')
+            if not v_url:
+                continue
+            if v_url == 'null':
+                raise ExtractorError('%s said: %s' % (
+                    self.IE_NAME, xpath_text(v_data, 'message')), expected=True)
+            ext = determine_ext(v_url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    v_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(v_url, video_id, f4m_id='hds', fatal=False))
+            else:
+                bitrates = xpath_element(v_data, 'bitrates')
+                for url_e in bitrates.findall('url'):
+                    tbr = int_or_none(url_e.get('bitrate'))
+                    if not tbr:
+                        continue
+                    formats.append({
+                        'format_id': 'rtmp-%d' % tbr,
+                        'url': re.sub(r'\d+\.%s' % ext, '%d.%s' % (tbr, ext), v_url),
+                        'ext': 'flv',
+                        'protocol': 'rtmp',
+                        'width': int_or_none(url_e.get('width')),
+                        'height': int_or_none(url_e.get('height')),
+                        'tbr': tbr,
+                    })
+        self._sort_formats(formats)
+
+        metadata = self._download_xml(
+            'http://api.radio-canada.ca/metaMedia/v1/index.ashx',
+            video_id, note='Downloading metadata XML', query={
+                'appCode': app_code,
+                'idMedia': video_id,
+            })
+
+        def get_meta(name):
+            el = find_xpath_attr(metadata, './/Meta', 'name', name)
+            return el.text if el is not None else None
+
+        return {
+            'id': video_id,
+            'title': get_meta('Title'),
+            'description': get_meta('Description') or get_meta('ShortDescription'),
+            'thumbnail': get_meta('imageHR') or get_meta('imageMR') or get_meta('imageBR'),
+            'duration': int_or_none(get_meta('length')),
+            'series': get_meta('Emission'),
+            'season_number': int_or_none(get_meta('SrcSaison')),
+            'episode_number': int_or_none(get_meta('SrcEpisode')),
+            'upload_date': unified_strdate(get_meta('Date')),
+            'formats': formats,
+        }
+
+
+class RadioCanadaAudioVideoIE(InfoExtractor):
+    'radiocanada:audiovideo'
+    _VALID_URL = r'https?://ici\.radio-canada\.ca/audio-video/media-(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://ici.radio-canada.ca/audio-video/media-7527184/barack-obama-au-vietnam',
+        'info_dict': {
+            'id': '7527184',
+            'ext': 'flv',
+            'title': 'Barack Obama au Vietnam',
+            'description': 'Les États-Unis lèvent l\'embargo sur la vente d\'armes qui datait de la guerre du Vietnam',
+            'upload_date': '20160523',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        return self.url_result('radiocanada:medianet:%s' % self._match_id(url))
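The RTMP branch above derives one format per advertised bitrate by rewriting the bitrate segment of the validated URL; in isolation the substitution looks like this (URL and bitrate values invented):

import re

v_url = 'rtmp://example.com/medianet/mp4:video_1200.mp4'  # hypothetical validated URL
ext = 'mp4'
for tbr in (400, 800, 1200):  # values that would come from the <bitrates> element
    print(re.sub(r'\d+\.%s' % ext, '%d.%s' % (tbr, ext), v_url))
# -> rtmp://example.com/medianet/mp4:video_400.mp4  (and so on)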

youtube_dl/extractor/reuters.py

@@ -0,0 +1,69 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    js_to_json,
+    int_or_none,
+    unescapeHTML,
+)
+
+
+class ReutersIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?reuters\.com/.*?\?.*?videoId=(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.reuters.com/video/2016/05/20/san-francisco-police-chief-resigns?videoId=368575562',
+        'md5': '8015113643a0b12838f160b0b81cc2ee',
+        'info_dict': {
+            'id': '368575562',
+            'ext': 'mp4',
+            'title': 'San Francisco police chief resigns',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(
+            'http://www.reuters.com/assets/iframe/yovideo?videoId=%s' % video_id, video_id)
+        video_data = js_to_json(self._search_regex(
+            r'(?s)Reuters\.yovideo\.drawPlayer\(({.*?})\);',
+            webpage, 'video data'))
+
+        def get_json_value(key, fatal=False):
+            return self._search_regex('"%s"\s*:\s*"([^"]+)"' % key, video_data, key, fatal=fatal)
+
+        title = unescapeHTML(get_json_value('title', fatal=True))
+        mmid, fid = re.search(r',/(\d+)\?f=(\d+)', get_json_value('flv', fatal=True)).groups()
+
+        mas_data = self._download_json(
+            'http://mas-e.cds1.yospace.com/mas/%s/%s?trans=json' % (mmid, fid),
+            video_id, transform_source=js_to_json)
+        formats = []
+        for f in mas_data:
+            f_url = f.get('url')
+            if not f_url:
+                continue
+            method = f.get('method')
+            if method == 'hls':
+                formats.extend(self._extract_m3u8_formats(
+                    f_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+            else:
+                container = f.get('container')
+                ext = '3gp' if method == 'mobile' else container
+                formats.append({
+                    'format_id': ext,
+                    'url': f_url,
+                    'ext': ext,
+                    'container': container if method != 'mobile' else None,
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': get_json_value('thumb'),
+            'duration': int_or_none(get_json_value('seconds')),
+            'formats': formats,
+        }

youtube_dl/extractor/revision3.py

@@ -13,8 +13,64 @@ from ..utils import (
 )


+class Revision3EmbedIE(InfoExtractor):
+    IE_NAME = 'revision3:embed'
+    _VALID_URL = r'(?:revision3:(?:(?P<playlist_type>[^:]+):)?|https?://(?:(?:(?:www|embed)\.)?(?:revision3|animalist)|(?:(?:api|embed)\.)?seekernetwork)\.com/player/embed\?videoId=)(?P<playlist_id>\d+)'
+    _TEST = {
+        'url': 'http://api.seekernetwork.com/player/embed?videoId=67558',
+        'md5': '83bcd157cab89ad7318dd7b8c9cf1306',
+        'info_dict': {
+            'id': '67558',
+            'ext': 'mp4',
+            'title': 'The Pros & Cons Of Zoos',
+            'description': 'Zoos are often depicted as a terrible place for animals to live, but is there any truth to this?',
+            'uploader_id': 'dnews',
+            'uploader': 'DNews',
+        }
+    }
+    _API_KEY = 'ba9c741bce1b9d8e3defcc22193f3651b8867e62'
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        playlist_id = mobj.group('playlist_id')
+        playlist_type = mobj.group('playlist_type') or 'video_id'
+        video_data = self._download_json(
+            'http://revision3.com/api/getPlaylist.json', playlist_id, query={
+                'api_key': self._API_KEY,
+                'codecs': 'h264,vp8,theora',
+                playlist_type: playlist_id,
+            })['items'][0]
+
+        formats = []
+        for vcodec, media in video_data['media'].items():
+            for quality_id, quality in media.items():
+                if quality_id == 'hls':
+                    formats.extend(self._extract_m3u8_formats(
+                        quality['url'], playlist_id, 'mp4',
+                        'm3u8_native', m3u8_id='hls', fatal=False))
+                else:
+                    formats.append({
+                        'url': quality['url'],
+                        'format_id': '%s-%s' % (vcodec, quality_id),
+                        'tbr': int_or_none(quality.get('bitrate')),
+                        'vcodec': vcodec,
+                    })
+        self._sort_formats(formats)
+
+        return {
+            'id': playlist_id,
+            'title': unescapeHTML(video_data['title']),
+            'description': unescapeHTML(video_data.get('summary')),
+            'uploader': video_data.get('show', {}).get('name'),
+            'uploader_id': video_data.get('show', {}).get('slug'),
+            'duration': int_or_none(video_data.get('duration')),
+            'formats': formats,
+        }
+
+
 class Revision3IE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:revision3|testtube|animalist)\.com)/(?P<id>[^/]+(?:/[^/?#]+)?)'
+    IE_NAME = 'revision'
+    _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:revision3|animalist)\.com)/(?P<id>[^/]+(?:/[^/?#]+)?)'
     _TESTS = [{
         'url': 'http://www.revision3.com/technobuffalo/5-google-predictions-for-2016',
         'md5': 'd94a72d85d0a829766de4deb8daaf7df',
@@ -32,52 +88,14 @@ class Revision3IE(InfoExtractor):
         }
     }, {
         # Show
-        'url': 'http://testtube.com/brainstuff',
-        'info_dict': {
-            'id': '251',
-            'title': 'BrainStuff',
-            'description': 'Whether the topic is popcorn or particle physics, you can count on the HowStuffWorks team to explore-and explain-the everyday science in the world around us on BrainStuff.',
-        },
-        'playlist_mincount': 93,
-    }, {
-        'url': 'https://testtube.com/dnews/5-weird-ways-plants-can-eat-animals?utm_source=FB&utm_medium=DNews&utm_campaign=DNewsSocial',
-        'info_dict': {
-            'id': '58227',
-            'display_id': 'dnews/5-weird-ways-plants-can-eat-animals',
-            'duration': 275,
-            'ext': 'webm',
-            'title': '5 Weird Ways Plants Can Eat Animals',
-            'description': 'Why have some plants evolved to eat meat?',
-            'upload_date': '20150120',
-            'timestamp': 1421763300,
-            'uploader': 'DNews',
-            'uploader_id': 'dnews',
-        },
-    }, {
-        'url': 'http://testtube.com/tt-editors-picks/the-israel-palestine-conflict-explained-in-ten-min',
-        'info_dict': {
-            'id': '71618',
-            'ext': 'mp4',
-            'display_id': 'tt-editors-picks/the-israel-palestine-conflict-explained-in-ten-min',
-            'title': 'The Israel-Palestine Conflict Explained in Ten Minutes',
-            'description': 'If you\'d like to learn about the struggle between Israelis and Palestinians, this video is a great place to start',
-            'uploader': 'Editors\' Picks',
-            'uploader_id': 'tt-editors-picks',
-            'timestamp': 1453309200,
-            'upload_date': '20160120',
-        },
-        'add_ie': ['Youtube'],
+        'url': 'http://revision3.com/variant',
+        'only_matching': True,
     }, {
         # Tag
-        'url': 'http://testtube.com/tech-news',
-        'info_dict': {
-            'id': '21018',
-            'title': 'tech news',
-        },
-        'playlist_mincount': 9,
+        'url': 'http://revision3.com/vr',
+        'only_matching': True,
     }]
     _PAGE_DATA_TEMPLATE = 'http://www.%s/apiProxy/ddn/%s?domain=%s'
-    _API_KEY = 'ba9c741bce1b9d8e3defcc22193f3651b8867e62'

     def _real_extract(self, url):
         domain, display_id = re.match(self._VALID_URL, url).groups()
@@ -119,33 +137,9 @@ class Revision3IE(InfoExtractor):
             })
             return info

-            video_data = self._download_json(
-                'http://revision3.com/api/getPlaylist.json?api_key=%s&codecs=h264,vp8,theora&video_id=%s' % (self._API_KEY, video_id),
-                video_id)['items'][0]
-
-            formats = []
-            for vcodec, media in video_data['media'].items():
-                for quality_id, quality in media.items():
-                    if quality_id == 'hls':
-                        formats.extend(self._extract_m3u8_formats(
-                            quality['url'], video_id, 'mp4',
-                            'm3u8_native', m3u8_id='hls', fatal=False))
-                    else:
-                        formats.append({
-                            'url': quality['url'],
-                            'format_id': '%s-%s' % (vcodec, quality_id),
-                            'tbr': int_or_none(quality.get('bitrate')),
-                            'vcodec': vcodec,
-                        })
-            self._sort_formats(formats)
-
             info.update({
-                'title': unescapeHTML(video_data['title']),
-                'description': unescapeHTML(video_data.get('summary')),
-                'uploader': video_data.get('show', {}).get('name'),
-                'uploader_id': video_data.get('show', {}).get('slug'),
-                'duration': int_or_none(video_data.get('duration')),
-                'formats': formats,
+                '_type': 'url_transparent',
+                'url': 'revision3:%s' % video_id,
             })
             return info
         else:
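Revision3IE now defers format extraction to Revision3EmbedIE through a url_transparent result: the metadata gathered here is kept, while the URL is resolved by the embed extractor and its formats merged in. A sketch of the shape of that result, with illustrative values:

# youtube-dl sees '_type': 'url_transparent', resolves the 'url' with
# Revision3EmbedIE, and merges the embed's formats into the fields below.
info = {
    'id': '58227',  # illustrative values
    'display_id': 'dnews/5-weird-ways-plants-can-eat-animals',
    'title': '5 Weird Ways Plants Can Eat Animals',
}
info.update({
    '_type': 'url_transparent',
    'url': 'revision3:%s' % info['id'],
})
print(info['url'])  # -> revision3:58227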

youtube_dl/extractor/seeker.py

@@ -0,0 +1,57 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class SeekerIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?seeker\.com/(?P<display_id>.*)-(?P<article_id>\d+)\.html'
+    _TESTS = [{
+        # player.loadRevision3Item
+        'url': 'http://www.seeker.com/should-trump-be-required-to-release-his-tax-returns-1833805621.html',
+        'md5': '30c1dc4030cc715cf05b423d0947ac18',
+        'info_dict': {
+            'id': '76243',
+            'ext': 'webm',
+            'title': 'Should Trump Be Required To Release His Tax Returns?',
+            'description': 'Donald Trump has been secretive about his "big," "beautiful" tax returns. So what can we learn if he decides to release them?',
+            'uploader': 'Seeker Daily',
+            'uploader_id': 'seekerdaily',
+        }
+    }, {
+        'url': 'http://www.seeker.com/changes-expected-at-zoos-following-recent-gorilla-lion-shootings-1834116536.html',
+        'playlist': [
+            {
+                'md5': '83bcd157cab89ad7318dd7b8c9cf1306',
+                'info_dict': {
+                    'id': '67558',
+                    'ext': 'mp4',
+                    'title': 'The Pros & Cons Of Zoos',
+                    'description': 'Zoos are often depicted as a terrible place for animals to live, but is there any truth to this?',
+                    'uploader': 'DNews',
+                    'uploader_id': 'dnews',
+                },
+            }
+        ],
+        'info_dict': {
+            'id': '1834116536',
+            'title': 'After Gorilla Killing, Changes Ahead for Zoos',
+            'description': 'The largest association of zoos and others are hoping to learn from recent incidents that led to the shooting deaths of a gorilla and two lions.',
+        },
+    }]
+
+    def _real_extract(self, url):
+        display_id, article_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(url, display_id)
+        mobj = re.search(r"player\.loadRevision3Item\('([^']+)'\s*,\s*(\d+)\);", webpage)
+        if mobj:
+            playlist_type, playlist_id = mobj.groups()
+            return self.url_result(
+                'revision3:%s:%s' % (playlist_type, playlist_id), 'Revision3Embed', playlist_id)
+        else:
+            entries = [self.url_result('revision3:video_id:%s' % video_id, 'Revision3Embed', video_id) for video_id in re.findall(
+                r'<iframe[^>]+src=[\'"](?:https?:)?//api\.seekernetwork\.com/player/embed\?videoId=(\d+)', webpage)]
+            return self.playlist_result(
+                entries, article_id, self._og_search_title(webpage), self._og_search_description(webpage))

youtube_dl/extractor/spankwire.py

@@ -96,20 +96,18 @@ class SpankwireIE(InfoExtractor):
         formats = []
         for height, video_url in zip(heights, video_urls):
             path = compat_urllib_parse_urlparse(video_url).path
-            _, quality = path.split('/')[4].split('_')[:2]
-            f = {
-                'url': video_url,
-                'height': height,
-            }
-            tbr = self._search_regex(r'^(\d+)[Kk]$', quality, 'tbr', default=None)
-            if tbr:
-                f.update({
-                    'tbr': int(tbr),
-                    'format_id': '%dp' % height,
-                })
+            m = re.search(r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', path)
+            if m:
+                tbr = int(m.group('tbr'))
+                height = int(m.group('height'))
             else:
-                f['format_id'] = quality
-            formats.append(f)
+                tbr = None
+            formats.append({
+                'url': video_url,
+                'format_id': '%dp' % height,
+                'height': height,
+                'tbr': tbr,
+            })
         self._sort_formats(formats)

         age_limit = self._rta_search(webpage)

youtube_dl/extractor/teachingchannel.py

@@ -11,6 +11,7 @@ class TeachingChannelIE(InfoExtractor):

     _TEST = {
         'url': 'https://www.teachingchannel.org/videos/teacher-teaming-evolution',
+        'md5': '3d6361864d7cac20b57c8784da17166f',
         'info_dict': {
             'id': 'F3bnlzbToeI6pLEfRyrlfooIILUjz4nM',
             'ext': 'mp4',
@@ -19,9 +20,9 @@ class TeachingChannelIE(InfoExtractor):
             'duration': 422.255,
         },
         'params': {
-            # m3u8 download
             'skip_download': True,
         },
+        'add_ie': ['Ooyala'],
     }

     def _real_extract(self, url):

youtube_dl/extractor/tf1.py

@@ -6,7 +6,7 @@ from .common import InfoExtractor

 class TF1IE(InfoExtractor):
     """TF1 uses the wat.tv player."""
-    _VALID_URL = r'https?://(?:(?:videos|www|lci)\.tf1|www\.tfou)\.fr/(?:[^/]+/)*(?P<id>.+?)\.html'
+    _VALID_URL = r'https?://(?:(?:videos|www|lci)\.tf1|(?:www\.)?(?:tfou|ushuaiatv|histoire|tvbreizh))\.fr/(?:[^/]+/)*(?P<id>[^/?#.]+)'
     _TESTS = [{
         'url': 'http://videos.tf1.fr/auto-moto/citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html',
         'info_dict': {
@@ -48,6 +48,6 @@ class TF1IE(InfoExtractor):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
         wat_id = self._html_search_regex(
-            r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})(?:#.*?)?\1',
+            r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8}).*?\1',
             webpage, 'wat id', group='id')
         return self.url_result('wat:%s' % wat_id, 'Wat')

youtube_dl/extractor/theplatform.py

@@ -151,6 +151,22 @@ class ThePlatformIE(ThePlatformBaseIE):
         'only_matching': True,
     }]

+    @classmethod
+    def _extract_urls(cls, webpage):
+        m = re.search(
+            r'''(?x)
+                <meta\s+
+                    property=(["'])(?:og:video(?::(?:secure_)?url)?|twitter:player)\1\s+
+                    content=(["'])(?P<url>https?://player\.theplatform\.com/p/.+?)\2
+            ''', webpage)
+        if m:
+            return [m.group('url')]
+
+        matches = re.findall(
+            r'<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage)
+        if matches:
+            return list(zip(*matches))[1]
+
     @staticmethod
     def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False):
         flags = '10' if include_qs else '00'
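_extract_urls gives other extractors (typically the generic one) a single entry point for spotting thePlatform embeds. A standalone sketch of the iframe fallback path, run against fabricated markup:

import re

webpage = '<iframe src="//player.theplatform.com/p/PROVIDER/EMBED?byGuid=123"></iframe>'  # made-up embed

# Same fallback regex as in _extract_urls above.
matches = re.findall(
    r'<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1',
    webpage)
if matches:
    print(list(zip(*matches))[1])  # -> ('//player.theplatform.com/p/PROVIDER/EMBED?byGuid=123',)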

youtube_dl/extractor/tvp.py

@@ -1,4 +1,4 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
 from __future__ import unicode_literals

 import re
@@ -6,20 +6,13 @@ import re
 from .common import InfoExtractor


-class TvpIE(InfoExtractor):
-    IE_NAME = 'tvp.pl'
-    _VALID_URL = r'https?://(?:vod|www)\.tvp\.pl/.*/(?P<id>\d+)$'
+class TVPIE(InfoExtractor):
+    IE_NAME = 'tvp'
+    IE_DESC = 'Telewizja Polska'
+    _VALID_URL = r'https?://[^/]+\.tvp\.(?:pl|info)/(?:(?!\d+/)[^/]+/)*(?P<id>\d+)'

     _TESTS = [{
-        'url': 'http://vod.tvp.pl/filmy-fabularne/filmy-za-darmo/ogniem-i-mieczem/wideo/odc-2/4278035',
-        'md5': 'cdd98303338b8a7f7abab5cd14092bf2',
-        'info_dict': {
-            'id': '4278035',
-            'ext': 'wmv',
-            'title': 'Ogniem i mieczem, odc. 2',
-        },
-    }, {
-        'url': 'http://vod.tvp.pl/seriale/obyczajowe/czas-honoru/sezon-1-1-13/i-seria-odc-13/194536',
+        'url': 'http://vod.tvp.pl/194536/i-seria-odc-13',
         'md5': '8aa518c15e5cc32dfe8db400dc921fbb',
         'info_dict': {
             'id': '194536',
@@ -36,12 +29,22 @@ class TvpIE(InfoExtractor):
         },
     }, {
         'url': 'http://vod.tvp.pl/seriale/obyczajowe/na-sygnale/sezon-2-27-/odc-39/17834272',
-        'md5': 'c3b15ed1af288131115ff17a17c19dda',
-        'info_dict': {
-            'id': '17834272',
-            'ext': 'mp4',
-            'title': 'Na sygnale, odc. 39',
-        },
+        'only_matching': True,
+    }, {
+        'url': 'http://wiadomosci.tvp.pl/25169746/24052016-1200',
+        'only_matching': True,
+    }, {
+        'url': 'http://krakow.tvp.pl/25511623/25lecie-mck-wyjatkowe-miejsce-na-mapie-krakowa',
+        'only_matching': True,
+    }, {
+        'url': 'http://teleexpress.tvp.pl/25522307/wierni-wzieli-udzial-w-procesjach',
+        'only_matching': True,
+    }, {
+        'url': 'http://sport.tvp.pl/25522165/krychowiak-uspokaja-w-sprawie-kontuzji-dwa-tygodnie-to-maksimum',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.tvp.info/25511919/trwa-rewolucja-wladza-zdecydowala-sie-na-pogwalcenie-konstytucji',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -92,8 +95,8 @@ class TvpIE(InfoExtractor):
     }


-class TvpSeriesIE(InfoExtractor):
-    IE_NAME = 'tvp.pl:Series'
+class TVPSeriesIE(InfoExtractor):
+    IE_NAME = 'tvp:series'
     _VALID_URL = r'https?://vod\.tvp\.pl/(?:[^/]+/){2}(?P<id>[^/]+)/?$'

     _TESTS = [{
@@ -127,7 +130,7 @@ class TvpSeriesIE(InfoExtractor):
         videos_paths = re.findall(
             '(?s)class="shortTitle">.*?href="(/[^"]+)', playlist)
         entries = [
-            self.url_result('http://vod.tvp.pl%s' % v_path, ie=TvpIE.ie_key())
+            self.url_result('http://vod.tvp.pl%s' % v_path, ie=TVPIE.ie_key())
             for v_path in videos_paths]

         return {

youtube_dl/extractor/udemy.py

@@ -142,7 +142,9 @@ class UdemyIE(InfoExtractor):
             self._LOGIN_URL, None, 'Downloading login popup')

         def is_logged(webpage):
-            return any(p in webpage for p in ['href="https://www.udemy.com/user/logout/', '>Logout<'])
+            return any(re.search(p, webpage) for p in (
+                r'href=["\'](?:https://www\.udemy\.com)?/user/logout/',
+                r'>Logout<'))

         # already logged in
         if is_logged(login_popup):

youtube_dl/extractor/udn.py

@@ -2,10 +2,13 @@
 from __future__ import unicode_literals

 import json
+import re

 from .common import InfoExtractor
 from ..utils import (
+    determine_ext,
+    int_or_none,
     js_to_json,
-    ExtractorError,
 )
 from ..compat import compat_urlparse
@@ -16,13 +19,16 @@ class UDNEmbedIE(InfoExtractor):
     _VALID_URL = r'https?:' + _PROTOCOL_RELATIVE_VALID_URL
     _TESTS = [{
         'url': 'http://video.udn.com/embed/news/300040',
-        'md5': 'de06b4c90b042c128395a88f0384817e',
         'info_dict': {
             'id': '300040',
             'ext': 'mp4',
             'title': '生物老師男變女 全校挺"做自己"',
             'thumbnail': 're:^https?://.*\.jpg$',
-        }
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
     }, {
         'url': 'https://video.udn.com/embed/news/300040',
         'only_matching': True,
@@ -38,39 +44,53 @@ class UDNEmbedIE(InfoExtractor):
         page = self._download_webpage(url, video_id)

         options = json.loads(js_to_json(self._html_search_regex(
-            r'var options\s*=\s*([^;]+);', page, 'video urls dictionary')))
+            r'var\s+options\s*=\s*([^;]+);', page, 'video urls dictionary')))

         video_urls = options['video']

         if video_urls.get('youtube'):
             return self.url_result(video_urls.get('youtube'), 'Youtube')

-        try:
-            del video_urls['youtube']
-        except KeyError:
-            pass
+        formats = []
+        for video_type, api_url in video_urls.items():
+            if not api_url:
+                continue

-        formats = [{
-            'url': self._download_webpage(
+            video_url = self._download_webpage(
                 compat_urlparse.urljoin(url, api_url), video_id,
-                'retrieve url for %s video' % video_type),
-            'format_id': video_type,
-            'preference': 0 if video_type == 'mp4' else -1,
-        } for video_type, api_url in video_urls.items() if api_url]
+                note='retrieve url for %s video' % video_type)

-        if not formats:
-            raise ExtractorError('No videos found', expected=True)
+            ext = determine_ext(video_url)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, ext='mp4', m3u8_id='hls'))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    video_url, video_id, f4m_id='hds'))
+            else:
+                mobj = re.search(r'_(?P<height>\d+)p_(?P<tbr>\d+).mp4', video_url)
+                a_format = {
+                    'url': video_url,
+                    # video_type may be 'mp4', which confuses YoutubeDL
+                    'format_id': 'http-' + video_type,
+                }
+                if mobj:
+                    a_format.update({
+                        'height': int_or_none(mobj.group('height')),
+                        'tbr': int_or_none(mobj.group('tbr')),
+                    })
+                formats.append(a_format)

         self._sort_formats(formats)

-        thumbnail = None
-
-        if options.get('gallery') and len(options['gallery']):
-            thumbnail = options['gallery'][0].get('original')
+        thumbnails = [{
+            'url': img_url,
+            'id': img_type,
+        } for img_type, img_url in options.get('gallery', [{}])[0].items() if img_url]

         return {
             'id': video_id,
             'formats': formats,
             'title': options['title'],
-            'thumbnail': thumbnail
+            'thumbnails': thumbnails,
         }
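Both this extractor and the Spankwire change earlier recover height and bitrate from the media filename; the same idea in isolation (URL invented, dot escaped for strictness):

import re

video_url = 'http://video.example.com/clip_720p_1500.mp4'  # made-up URL
mobj = re.search(r'_(?P<height>\d+)p_(?P<tbr>\d+)\.mp4', video_url)
if mobj:
    print(int(mobj.group('height')), int(mobj.group('tbr')))  # -> 720 1500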

youtube_dl/extractor/veoh.py

@@ -37,6 +37,7 @@ class VeohIE(InfoExtractor):
             'uploader': 'afp-news',
             'duration': 123,
         },
+        'skip': 'This video has been deleted.',
     },
     {
         'url': 'http://www.veoh.com/watch/v69525809F6Nc4frX',

youtube_dl/extractor/vice.py

@@ -11,12 +11,14 @@ class ViceIE(InfoExtractor):
     _TESTS = [{
         'url': 'http://www.vice.com/video/cowboy-capitalists-part-1',
+        'md5': 'e9d77741f9e42ba583e683cd170660f7',
         'info_dict': {
             'id': '43cW1mYzpia9IlestBjVpd23Yu3afAfp',
             'ext': 'flv',
             'title': 'VICE_COWBOYCAPITALISTS_PART01_v1_VICE_WM_1080p.mov',
             'duration': 725.983,
         },
+        'add_ie': ['Ooyala'],
     }, {
         'url': 'http://www.vice.com/video/how-to-hack-a-car',
         'md5': '6fb2989a3fed069fb8eab3401fc2d3c9',
@@ -29,6 +31,7 @@ class ViceIE(InfoExtractor):
             'uploader': 'Motherboard',
             'upload_date': '20140529',
         },
+        'add_ie': ['Youtube'],
     }, {
         'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab',
         'only_matching': True,

youtube_dl/extractor/viewlift.py

@@ -141,6 +141,10 @@ class ViewLiftIE(ViewLiftBaseIE):
     }, {
         'url': 'http://www.kesari.tv/news/video/1461919076414',
         'only_matching': True,
+    }, {
+        # Was once Kaltura embed
+        'url': 'https://www.monumentalsportsnetwork.com/videos/john-carlson-postgame-2-25-15',
+        'only_matching': True,
     }]

     def _real_extract(self, url):

youtube_dl/extractor/vk.py

@@ -217,7 +217,6 @@ class VKIE(InfoExtractor):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('videoid')

-        info_url = url
         if video_id:
             info_url = 'https://vk.com/al_video.php?act=show&al=1&module=video&video=%s' % video_id
             # Some videos (removed?) can only be downloaded with list id specified

youtube_dl/extractor/vlive.py

@@ -1,8 +1,7 @@
# coding: utf-8 # coding: utf-8
from __future__ import division, unicode_literals from __future__ import unicode_literals
import re import re
import time
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
@@ -23,7 +22,7 @@ class VLiveIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '1326', 'id': '1326',
'ext': 'mp4', 'ext': 'mp4',
'title': "[V] Girl's Day's Broadcast", 'title': "[V LIVE] Girl's Day's Broadcast",
'creator': "Girl's Day", 'creator': "Girl's Day",
'view_count': int, 'view_count': int,
}, },
@@ -35,24 +34,11 @@ class VLiveIE(InfoExtractor):
webpage = self._download_webpage( webpage = self._download_webpage(
'http://www.vlive.tv/video/%s' % video_id, video_id) 'http://www.vlive.tv/video/%s' % video_id, video_id)
# UTC+x - UTC+9 (KST)
tz = time.altzone if time.localtime().tm_isdst == 1 else time.timezone
tz_offset = -tz // 60 - 9 * 60
self._set_cookie('vlive.tv', 'timezoneOffset', '%d' % tz_offset)
status_params = self._download_json(
'http://www.vlive.tv/video/status?videoSeq=%s' % video_id,
video_id, 'Downloading JSON status',
headers={'Referer': url.encode('utf-8')})
status = status_params.get('status')
air_start = status_params.get('onAirStartAt', '')
is_live = status_params.get('isLive')
video_params = self._search_regex( video_params = self._search_regex(
r'vlive\.tv\.video\.ajax\.request\.handler\.init\((.+)\)', r'\bvlive\.video\.init\(([^)]+)\)',
webpage, 'video params') webpage, 'video params')
live_params, long_video_id, key = re.split( status, _, _, live_params, long_video_id, key = re.split(
r'"\s*,\s*"', video_params)[1:4] r'"\s*,\s*"', video_params)[2:8]
if status == 'LIVE_ON_AIR' or status == 'BIG_EVENT_ON_AIR': if status == 'LIVE_ON_AIR' or status == 'BIG_EVENT_ON_AIR':
live_params = self._parse_json('"%s"' % live_params, video_id) live_params = self._parse_json('"%s"' % live_params, video_id)
@@ -61,8 +47,6 @@ class VLiveIE(InfoExtractor):
elif status == 'VOD_ON_AIR' or status == 'BIG_EVENT_INTRO': elif status == 'VOD_ON_AIR' or status == 'BIG_EVENT_INTRO':
if long_video_id and key: if long_video_id and key:
return self._replay(video_id, webpage, long_video_id, key) return self._replay(video_id, webpage, long_video_id, key)
elif is_live:
status = 'LIVE_END'
else: else:
status = 'COMING_SOON' status = 'COMING_SOON'
@@ -70,7 +54,7 @@ class VLiveIE(InfoExtractor):
raise ExtractorError('Uploading for replay. Please wait...', raise ExtractorError('Uploading for replay. Please wait...',
expected=True) expected=True)
elif status == 'COMING_SOON': elif status == 'COMING_SOON':
raise ExtractorError('Coming soon! %s' % air_start, expected=True) raise ExtractorError('Coming soon!', expected=True)
elif status == 'CANCELED': elif status == 'CANCELED':
raise ExtractorError('We are sorry, ' raise ExtractorError('We are sorry, '
'but the live broadcast has been canceled.', 'but the live broadcast has been canceled.',
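The new status parsing leans on the argument layout of the inlined vlive.video.init(...) call; under that assumption, the split behaves like this on a fabricated argument string that follows the same quoting pattern:

import re

# Hypothetical capture from vlive.video.init("...", "...", ...); positions
# 2, 5, 6 and 7 are the ones the extractor cares about.
video_params = '"app", "1326", "VOD_ON_AIR", "x", "y", "{}", "longvideoid", "V123abc", "tail"'

status, _, _, live_params, long_video_id, key = re.split(
    r'"\s*,\s*"', video_params)[2:8]
print(status, long_video_id, key)  # -> VOD_ON_AIR longvideoid V123abc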

youtube_dl/extractor/voxmedia.py

@@ -15,7 +15,8 @@ class VoxMediaIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Google\'s new material design direction',
             'description': 'md5:2f44f74c4d14a1f800ea73e1c6832ad2',
-        }
+        },
+        'add_ie': ['Ooyala'],
     }, {
         # data-ooyala-id
         'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
@@ -25,7 +26,8 @@ class VoxMediaIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'The Nexus 6: hands-on with Google\'s phablet',
             'description': 'md5:87a51fe95ff8cea8b5bdb9ac7ae6a6af',
-        }
+        },
+        'add_ie': ['Ooyala'],
     }, {
         # volume embed
         'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
@@ -35,7 +37,8 @@ class VoxMediaIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'The new frontier of LGBTQ civil rights, explained',
             'description': 'md5:0dc58e94a465cbe91d02950f770eb93f',
-        }
+        },
+        'add_ie': ['Ooyala'],
     }, {
         # youtube embed
         'url': 'http://www.vox.com/2016/3/24/11291692/robot-dance',
@@ -48,7 +51,8 @@ class VoxMediaIE(InfoExtractor):
             'upload_date': '20160324',
             'uploader_id': 'voxdotcom',
             'uploader': 'Vox',
-        }
+        },
+        'add_ie': ['Youtube'],
     }, {
         # SBN.VideoLinkset.entryGroup multiple ooyala embeds
         'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
@@ -117,7 +121,7 @@ class VoxMediaIE(InfoExtractor):
         volume_webpage = self._download_webpage(
             'http://volume.vox-cdn.com/embed/%s' % volume_uuid, volume_uuid)
         video_data = self._parse_json(self._search_regex(
-            r'Volume\.createVideo\(({.+})\s*,\s*{.*}\);', volume_webpage, 'video data'), volume_uuid)
+            r'Volume\.createVideo\(({.+})\s*,\s*{.*}\s*,\s*\[.*\]\s*,\s*{.*}\);', volume_webpage, 'video data'), volume_uuid)
         for provider_video_type in ('ooyala', 'youtube'):
             provider_video_id = video_data.get('%s_id' % provider_video_type)
             if provider_video_id:

youtube_dl/extractor/washingtonpost.py

@@ -11,7 +11,96 @@ from ..utils import (

 class WashingtonPostIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?washingtonpost\.com/.*?/(?P<id>[^/]+)/(?:$|[?#])'
+    IE_NAME = 'washingtonpost'
+    _VALID_URL = r'(?:washingtonpost:|https?://(?:www\.)?washingtonpost\.com/video/(?:[^/]+/)*)(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
+    _TEST = {
+        'url': 'https://www.washingtonpost.com/video/c/video/480ba4ee-1ec7-11e6-82c2-a7dcb313287d',
+        'md5': '6f537e1334b714eb15f9563bd4b9cdfa',
+        'info_dict': {
+            'id': '480ba4ee-1ec7-11e6-82c2-a7dcb313287d',
+            'ext': 'mp4',
+            'title': 'Egypt finds belongings, debris from plane crash',
+            'description': 'md5:a17ceee432f215a5371388c1f680bd86',
+            'upload_date': '20160520',
+            'uploader': 'Reuters',
+            'timestamp': 1463778452,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_json(
+            'http://www.washingtonpost.com/posttv/c/videojson/%s?resType=jsonp' % video_id,
+            video_id, transform_source=strip_jsonp)[0]['contentConfig']
+        title = video_data['title']
+
+        urls = []
+        formats = []
+        for s in video_data.get('streams', []):
+            s_url = s.get('url')
+            if not s_url or s_url in urls:
+                continue
+            urls.append(s_url)
+            video_type = s.get('type')
+            if video_type == 'smil':
+                continue
+            elif video_type in ('ts', 'hls') and ('_master.m3u8' in s_url or '_mobile.m3u8' in s_url):
+                m3u8_formats = self._extract_m3u8_formats(
+                    s_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
+                for m3u8_format in m3u8_formats:
+                    width = m3u8_format.get('width')
+                    if not width:
+                        continue
+                    vbr = self._search_regex(
+                        r'%d_%d_(\d+)' % (width, m3u8_format['height']), m3u8_format['url'], 'vbr', default=None)
+                    if vbr:
+                        m3u8_format.update({
+                            'vbr': int_or_none(vbr),
+                        })
+                formats.extend(m3u8_formats)
+            else:
+                width = int_or_none(s.get('width'))
+                vbr = int_or_none(s.get('bitrate'))
+                has_width = width != 0
+                formats.append({
+                    'format_id': (
+                        '%s-%d-%d' % (video_type, width, vbr)
+                        if width
+                        else video_type),
+                    'vbr': vbr if has_width else None,
+                    'width': width,
+                    'height': int_or_none(s.get('height')),
+                    'acodec': s.get('audioCodec'),
+                    'vcodec': s.get('videoCodec') if has_width else 'none',
+                    'filesize': int_or_none(s.get('fileSize')),
+                    'url': s_url,
+                    'ext': 'mp4',
+                    'protocol': 'm3u8_native' if video_type in ('ts', 'hls') else None,
+                })
+        source_media_url = video_data.get('sourceMediaURL')
+        if source_media_url:
+            formats.append({
+                'format_id': 'source_media',
+                'url': source_media_url,
+            })
+        self._sort_formats(
+            formats, ('width', 'height', 'vbr', 'filesize', 'tbr', 'format_id'))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('blurb'),
+            'uploader': video_data.get('credits', {}).get('source'),
+            'formats': formats,
+            'duration': int_or_none(video_data.get('videoDuration'), 100),
+            'timestamp': int_or_none(
+                video_data.get('dateConfig', {}).get('dateFirstPublished'), 1000),
+        }
+
+
+class WashingtonPostArticleIE(InfoExtractor):
+    IE_NAME = 'washingtonpost:article'
+    _VALID_URL = r'https?://(?:www\.)?washingtonpost\.com/(?:[^/]+/)*(?P<id>[^/?#]+)'
     _TESTS = [{
         'url': 'http://www.washingtonpost.com/sf/national/2014/03/22/sinkhole-of-bureaucracy/',
         'info_dict': {
@@ -63,6 +152,10 @@ class WashingtonPostIE(InfoExtractor):
         }]
     }]

+    @classmethod
+    def suitable(cls, url):
+        return False if WashingtonPostIE.suitable(url) else super(WashingtonPostArticleIE, cls).suitable(url)
+
     def _real_extract(self, url):
         page_id = self._match_id(url)
         webpage = self._download_webpage(url, page_id)
@@ -74,54 +167,7 @@ class WashingtonPostIE(InfoExtractor):
                 <div\s+class="posttv-video-embed[^>]*?data-uuid=|
                 data-video-uuid=
             )"([^"]+)"''', webpage)
-        entries = []
-        for i, uuid in enumerate(uuids, start=1):
-            vinfo_all = self._download_json(
-                'http://www.washingtonpost.com/posttv/c/videojson/%s?resType=jsonp' % uuid,
-                page_id,
-                transform_source=strip_jsonp,
-                note='Downloading information of video %d/%d' % (i, len(uuids))
-            )
-            vinfo = vinfo_all[0]['contentConfig']
-            uploader = vinfo.get('credits', {}).get('source')
-            timestamp = int_or_none(
-                vinfo.get('dateConfig', {}).get('dateFirstPublished'), 1000)
-
-            formats = [{
-                'format_id': (
-                    '%s-%s-%s' % (s.get('type'), s.get('width'), s.get('bitrate'))
-                    if s.get('width')
-                    else s.get('type')),
-                'vbr': s.get('bitrate') if s.get('width') != 0 else None,
-                'width': s.get('width'),
-                'height': s.get('height'),
-                'acodec': s.get('audioCodec'),
-                'vcodec': s.get('videoCodec') if s.get('width') != 0 else 'none',
-                'filesize': s.get('fileSize'),
-                'url': s.get('url'),
-                'ext': 'mp4',
-                'preference': -100 if s.get('type') == 'smil' else None,
-                'protocol': {
-                    'MP4': 'http',
-                    'F4F': 'f4m',
-                }.get(s.get('type')),
-            } for s in vinfo.get('streams', [])]
-            source_media_url = vinfo.get('sourceMediaURL')
-            if source_media_url:
-                formats.append({
-                    'format_id': 'source_media',
-                    'url': source_media_url,
-                })
-            self._sort_formats(formats)
-            entries.append({
-                'id': uuid,
-                'title': vinfo['title'],
-                'description': vinfo.get('blurb'),
-                'uploader': uploader,
-                'formats': formats,
-                'duration': int_or_none(vinfo.get('videoDuration'), 100),
-                'timestamp': timestamp,
-            })
+        entries = [self.url_result('washingtonpost:%s' % uuid, 'WashingtonPost', uuid) for uuid in uuids]

         return {
             '_type': 'playlist',

youtube_dl/extractor/wat.py

from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import hashlib
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
unified_strdate, unified_strdate,
HEADRequest,
float_or_none,
) )
class WatIE(InfoExtractor): class WatIE(InfoExtractor):
_VALID_URL = r'(?:wat:(?P<real_id>\d{8})|https?://www\.wat\.tv/video/(?P<display_id>.*)-(?P<short_id>.*?)_.*?\.html)' _VALID_URL = r'(?:wat:|https?://(?:www\.)?wat\.tv/video/.*-)(?P<id>[0-9a-z]+)'
IE_NAME = 'wat.tv' IE_NAME = 'wat.tv'
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.wat.tv/video/soupe-figues-l-orange-aux-epices-6z1uz_2hvf7_.html', 'url': 'http://www.wat.tv/video/soupe-figues-l-orange-aux-epices-6z1uz_2hvf7_.html',
'md5': 'ce70e9223945ed26a8056d413ca55dc9', 'md5': '83d882d9de5c9d97f0bb2c6273cde56a',
'info_dict': { 'info_dict': {
'id': '11713067', 'id': '11713067',
'display_id': 'soupe-figues-l-orange-aux-epices',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Soupe de figues à l\'orange et aux épices', 'title': 'Soupe de figues à l\'orange et aux épices',
'description': 'Retrouvez l\'émission "Petits plats en équilibre", diffusée le 18 août 2014.', 'description': 'Retrouvez l\'émission "Petits plats en équilibre", diffusée le 18 août 2014.',
@@ -33,7 +34,6 @@ class WatIE(InfoExtractor):
'md5': 'fbc84e4378165278e743956d9c1bf16b', 'md5': 'fbc84e4378165278e743956d9c1bf16b',
'info_dict': { 'info_dict': {
'id': '11713075', 'id': '11713075',
'display_id': 'gregory-lemarchal-voix-ange',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Grégory Lemarchal, une voix d\'ange depuis 10 ans (1/3)', 'title': 'Grégory Lemarchal, une voix d\'ange depuis 10 ans (1/3)',
'description': 'md5:b7a849cf16a2b733d9cd10c52906dee3', 'description': 'md5:b7a849cf16a2b733d9cd10c52906dee3',
@@ -44,96 +44,85 @@ class WatIE(InfoExtractor):
}, },
] ]
def download_video_info(self, real_id): def _real_extract(self, url):
video_id = self._match_id(url)
video_id = video_id if video_id.isdigit() and len(video_id) > 6 else compat_str(int(video_id, 36))
# 'contentv4' is used in the website, but it also returns the related # 'contentv4' is used in the website, but it also returns the related
# videos, we don't need them # videos, we don't need them
info = self._download_json('http://www.wat.tv/interface/contentv3/' + real_id, real_id) video_info = self._download_json(
return info['media'] 'http://www.wat.tv/interface/contentv3/' + video_id, video_id)['media']
def _real_extract(self, url):
def real_id_for_chapter(chapter):
return chapter['tc_start'].split('-')[0]
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
real_id = mobj.group('real_id')
if not real_id:
short_id = mobj.group('short_id')
webpage = self._download_webpage(url, display_id or short_id)
real_id = self._search_regex(r'xtpage = ".*-(.*?)";', webpage, 'real id')
video_info = self.download_video_info(real_id)
error_desc = video_info.get('error_desc') error_desc = video_info.get('error_desc')
if error_desc: if error_desc:
raise ExtractorError( raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error_desc), expected=True) '%s returned error: %s' % (self.IE_NAME, error_desc), expected=True)
geo_list = video_info.get('geoList')
country = geo_list[0] if geo_list else ''
         chapters = video_info['chapters']
         first_chapter = chapters[0]
 
-        files = video_info['files']
-        first_file = files[0]
-
-        if real_id_for_chapter(first_chapter) != real_id:
+        def video_id_for_chapter(chapter):
+            return chapter['tc_start'].split('-')[0]
+
+        if video_id_for_chapter(first_chapter) != video_id:
             self.to_screen('Multipart video detected')
-            chapter_urls = []
-            for chapter in chapters:
-                chapter_id = real_id_for_chapter(chapter)
-                # Yes, when we this chapter is processed by WatIE,
-                # it will download the info again
-                chapter_info = self.download_video_info(chapter_id)
-                chapter_urls.append(chapter_info['url'])
-            entries = [self.url_result(chapter_url) for chapter_url in chapter_urls]
-            return self.playlist_result(entries, real_id, video_info['title'])
-
-        upload_date = None
-        if 'date_diffusion' in first_chapter:
-            upload_date = unified_strdate(first_chapter['date_diffusion'])
+            entries = [self.url_result('wat:%s' % video_id_for_chapter(chapter)) for chapter in chapters]
+            return self.playlist_result(entries, video_id, video_info['title'])
         # Otherwise we can continue and extract just one part, we have to use
-        # the short id for getting the video url
-        formats = [{
-            'url': 'http://wat.tv/get/android5/%s.mp4' % real_id,
-            'format_id': 'Mobile',
-        }]
-
-        fmts = [('SD', 'web')]
-        if first_file.get('hasHD'):
-            fmts.append(('HD', 'webhd'))
-
-        def compute_token(param):
-            timestamp = '%08x' % int(self._download_webpage(
-                'http://www.wat.tv/servertime', real_id,
-                'Downloading server time').split('|')[0])
-            magic = '9b673b13fa4682ed14c3cfa5af5310274b514c4133e9b3a81e6e3aba009l2564'
-            return '%s/%s' % (hashlib.md5((magic + param + timestamp).encode('ascii')).hexdigest(), timestamp)
-
-        for fmt in fmts:
-            webid = '/%s/%s' % (fmt[1], real_id)
-            video_url = self._download_webpage(
-                'http://www.wat.tv/get%s?token=%s&getURL=1&country=%s' % (webid, compute_token(webid), country),
-                real_id,
-                'Downloading %s video URL' % fmt[0],
-                'Failed to download %s video URL' % fmt[0],
-                False)
-            if not video_url:
-                continue
-            formats.append({
-                'url': video_url,
-                'ext': 'mp4',
-                'format_id': fmt[0],
-            })
+        # the video id for getting the video url
+
+        date_diffusion = first_chapter.get('date_diffusion')
+        upload_date = unified_strdate(date_diffusion) if date_diffusion else None
+
+        def extract_url(path_template, url_type):
+            req_url = 'http://www.wat.tv/get/%s' % (path_template % video_id)
+            head = self._request_webpage(HEADRequest(req_url), video_id, 'Extracting %s url' % url_type)
+            red_url = head.geturl()
+            if req_url == red_url:
+                raise ExtractorError(
+                    '%s said: Sorry, this video is not available from your country.' % self.IE_NAME,
+                    expected=True)
+            return red_url
+
+        m3u8_url = extract_url('ipad/%s.m3u8', 'm3u8')
+        http_url = extract_url('android5/%s.mp4', 'http')
+
+        formats = []
+        m3u8_formats = self._extract_m3u8_formats(
+            m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
+        formats.extend(m3u8_formats)
+        formats.extend(self._extract_f4m_formats(
+            m3u8_url.replace('ios.', 'web.').replace('.m3u8', '.f4m'),
+            video_id, f4m_id='hds', fatal=False))
+        for m3u8_format in m3u8_formats:
+            mobj = re.search(
+                r'audio.*?%3D(\d+)(?:-video.*?%3D(\d+))?', m3u8_format['url'])
+            if not mobj:
+                continue
+            abr, vbr = mobj.groups()
+            abr, vbr = float_or_none(abr, 1000), float_or_none(vbr, 1000)
+            m3u8_format.update({
+                'vbr': vbr,
+                'abr': abr,
+            })
+            if not vbr or not abr:
+                continue
+            f = m3u8_format.copy()
+            f.update({
+                'url': re.sub(r'%s-\d+00-\d+' % video_id, '%s-%d00-%d' % (video_id, round(vbr / 100), round(abr)), http_url),
+                'format_id': f['format_id'].replace('hls', 'http'),
+                'protocol': 'http',
+            })
+            formats.append(f)
+        self._sort_formats(formats)
 
         return {
-            'id': real_id,
-            'display_id': display_id,
+            'id': video_id,
             'title': first_chapter['title'],
             'thumbnail': first_chapter['preview'],
             'description': first_chapter['description'],
             'view_count': video_info['views'],
             'upload_date': upload_date,
-            'duration': first_file['duration'],
+            'duration': video_info['files'][0]['duration'],
             'formats': formats,
         }
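The interesting part of the rework is how the progressive HTTP formats are synthesized: each HLS variant URL carries its audio/video bitrates URL-encoded (audio%3D…, video%3D…), and the loop above parses them back out. A standalone sketch of just that parsing step, against a made-up variant URL of the expected shape:

    import re

    # hypothetical HLS variant URL; real ones come from the m3u8 playlist
    url = 'http://example.wat.tv/12345678-audio%3D64000-video%3D1404000.m3u8'
    mobj = re.search(r'audio.*?%3D(\d+)(?:-video.*?%3D(\d+))?', url)
    if mobj:
        abr, vbr = mobj.groups()
        # bitrates appear in bit/s in the URL; youtube-dl stores kbit/s
        print(int(abr) / 1000.0, int(vbr) / 1000.0)  # 64.0 1404.0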

youtube_dl/extractor/xhamster.py

@@ -12,37 +12,52 @@ from ..utils import (
 class XHamsterIE(InfoExtractor):
-    _VALID_URL = r'(?P<proto>https?)://(?:.+?\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.+?)\.html(?:\?.*)?'
-    _TESTS = [
-        {
-            'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
-            'info_dict': {
-                'id': '1509445',
-                'ext': 'mp4',
-                'title': 'FemaleAgent Shy beauty takes the bait',
-                'upload_date': '20121014',
-                'uploader': 'Ruseful2011',
-                'duration': 893.52,
-                'age_limit': 18,
-            }
-        },
-        {
-            'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
-            'info_dict': {
-                'id': '2221348',
-                'ext': 'mp4',
-                'title': 'Britney Spears Sexy Booty',
-                'upload_date': '20130914',
-                'uploader': 'jojo747400',
-                'duration': 200.48,
-                'age_limit': 18,
-            }
-        },
-        {
-            'url': 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html',
-            'only_matching': True,
-        },
-    ]
+    _VALID_URL = r'(?P<proto>https?)://(?:.+?\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.*?)\.html(?:\?.*)?'
+    _TESTS = [{
+        'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
+        'md5': '8281348b8d3c53d39fffb377d24eac4e',
+        'info_dict': {
+            'id': '1509445',
+            'ext': 'mp4',
+            'title': 'FemaleAgent Shy beauty takes the bait',
+            'upload_date': '20121014',
+            'uploader': 'Ruseful2011',
+            'duration': 893.52,
+            'age_limit': 18,
+        },
+    }, {
+        'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
+        'info_dict': {
+            'id': '2221348',
+            'ext': 'mp4',
+            'title': 'Britney Spears Sexy Booty',
+            'upload_date': '20130914',
+            'uploader': 'jojo747400',
+            'duration': 200.48,
+            'age_limit': 18,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # empty seo
+        'url': 'http://xhamster.com/movies/5667973/.html',
+        'info_dict': {
+            'id': '5667973',
+            'ext': 'mp4',
+            'title': '....',
+            'upload_date': '20160208',
+            'uploader': 'parejafree',
+            'duration': 72.0,
+            'age_limit': 18,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         def extract_video_url(webpage, name):
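The only behavioural change to _VALID_URL is (?P<seo>.+?) becoming (?P<seo>.*?), which is what lets the new "empty seo" test URL match at all. A quick standalone check of the loosened pattern:

    import re

    VALID_URL = r'(?P<proto>https?)://(?:.+?\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.*?)\.html(?:\?.*)?'
    for url in ('http://xhamster.com/movies/5667973/.html',  # empty seo part
                'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html'):
        mobj = re.match(VALID_URL, url)
        print(mobj.group('id'), repr(mobj.group('seo')))
    # 5667973 ''
    # 1509445 'femaleagent_shy_beauty_takes_the_bait'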
@@ -170,7 +185,7 @@ class XHamsterEmbedIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         video_url = self._search_regex(
-            r'href="(https?://xhamster\.com/movies/%s/[^"]+\.html[^"]*)"' % video_id,
+            r'href="(https?://xhamster\.com/movies/%s/[^"]*\.html[^"]*)"' % video_id,
             webpage, 'xhamster url', default=None)
 
         if not video_url:

youtube_dl/extractor/yandexmusic.py

@@ -20,18 +20,24 @@ class YandexMusicBaseIE(InfoExtractor):
         error = response.get('error')
         if error:
             raise ExtractorError(error, expected=True)
+        if response.get('type') == 'captcha' or 'captcha' in response:
+            YandexMusicBaseIE._raise_captcha()
+
+    @staticmethod
+    def _raise_captcha():
+        raise ExtractorError(
+            'YandexMusic has considered youtube-dl requests automated and '
+            'asks you to solve a CAPTCHA. You can either wait for some '
+            'time until unblocked and optionally use --sleep-interval '
+            'in future or alternatively you can go to https://music.yandex.ru/ '
+            'solve CAPTCHA, then export cookies and pass cookie file to '
+            'youtube-dl with --cookies',
+            expected=True)
 
     def _download_webpage(self, *args, **kwargs):
         webpage = super(YandexMusicBaseIE, self)._download_webpage(*args, **kwargs)
         if 'Нам очень жаль, но&nbsp;запросы, поступившие с&nbsp;вашего IP-адреса, похожи на&nbsp;автоматические.' in webpage:
-            raise ExtractorError(
-                'YandexMusic has considered youtube-dl requests automated and '
-                'asks you to solve a CAPTCHA. You can either wait for some '
-                'time until unblocked and optionally use --sleep-interval '
-                'in future or alternatively you can go to https://music.yandex.ru/ '
-                'solve CAPTCHA, then export cookies and pass cookie file to '
-                'youtube-dl with --cookies',
-                expected=True)
+            self._raise_captcha()
         return webpage
 
     def _download_json(self, *args, **kwargs):

youtube_dl/extractor/youku.py

@@ -275,6 +275,8 @@ class YoukuIE(InfoExtractor):
                     'format_id': self.get_format_name(fm),
                     'ext': self.parse_ext_l(fm),
                     'filesize': int(seg['size']),
+                    'width': stream.get('width'),
+                    'height': stream.get('height'),
                 })
 
         return {

youtube_dl/options.py

@@ -395,8 +395,8 @@ def parseOpts(overrideArguments=None):
     downloader = optparse.OptionGroup(parser, 'Download Options')
     downloader.add_option(
-        '-r', '--rate-limit',
-        dest='ratelimit', metavar='LIMIT',
+        '-r', '--limit-rate', '--rate-limit',
+        dest='ratelimit', metavar='RATE',
         help='Maximum download rate in bytes per second (e.g. 50K or 4.2M)')
     downloader.add_option(
         '-R', '--retries',
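optparse allows one option to own several option strings, so --limit-rate and --rate-limit stay interchangeable and only the documented spelling changes. A minimal illustration of that aliasing (a toy parser, not the full youtube-dl one):

    import optparse

    parser = optparse.OptionParser()
    # both spellings feed the same destination
    parser.add_option(
        '-r', '--limit-rate', '--rate-limit',
        dest='ratelimit', metavar='RATE')

    for argv in (['--limit-rate', '50K'], ['--rate-limit', '50K']):
        opts, _ = parser.parse_args(argv)
        print(opts.ratelimit)  # 50K both times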

youtube_dl/update.py

@@ -83,11 +83,8 @@ def update_self(to_screen, verbose, opener):
     print_notes(to_screen, versions_info['versions'])
 
-    filename = sys.argv[0]
-    # Py2EXE: Filename could be different
-    if hasattr(sys, 'frozen') and not os.path.isfile(filename):
-        if os.path.isfile(filename + '.exe'):
-            filename += '.exe'
+    # sys.executable is set to the full pathname of the exe-file for py2exe
+    filename = sys.executable if hasattr(sys, 'frozen') else sys.argv[0]
 
     if not os.access(filename, os.W_OK):
         to_screen('ERROR: no write permissions on %s' % filename)
@@ -95,7 +92,7 @@ def update_self(to_screen, verbose, opener):
 
     # Py2EXE
     if hasattr(sys, 'frozen'):
-        exe = os.path.abspath(filename)
+        exe = filename
         directory = os.path.dirname(exe)
         if not os.access(directory, os.W_OK):
             to_screen('ERROR: no write permissions on %s' % directory)
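The simplification relies on a py2exe guarantee: in a frozen build, sys.frozen is set and sys.executable already points at the generated .exe, so the old filename-guessing (appending '.exe') is unnecessary. The idiom in isolation:

    import os
    import sys

    # py2exe sets sys.frozen and makes sys.executable the bundled .exe;
    # in a plain interpreter we fall back to the script path
    filename = sys.executable if hasattr(sys, 'frozen') else sys.argv[0]
    print(os.path.abspath(filename))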

youtube_dl/utils.py

@@ -105,9 +105,9 @@ KNOWN_EXTENSIONS = (
     'f4f', 'f4m', 'm3u8', 'smil')
 
 # needed for sanitizing filenames in restricted mode
-ACCENT_CHARS = dict(zip('ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØŒÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøœùúûüýþÿ',
-                        itertools.chain('AAAAAA', ['AE'], 'CEEEEIIIIDNOOOOOO', ['OE'], 'UUUUYP', ['ss'],
-                                        'aaaaaa', ['ae'], 'ceeeeiiiionoooooo', ['oe'], 'uuuuypy')))
+ACCENT_CHARS = dict(zip('ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖŐØŒÙÚÛÜŰÝÞßàáâãäåæçèéêëìíîïðñòóôõöőøœùúûüűýþÿ',
+                        itertools.chain('AAAAAA', ['AE'], 'CEEEEIIIIDNOOOOOOO', ['OE'], 'UUUUUYP', ['ss'],
+                                        'aaaaaa', ['ae'], 'ceeeeiiiionooooooo', ['oe'], 'uuuuuypy')))
 
 def preferredencoding():
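With Ő/ő and Ű/ű added to the table, restricted-mode sanitization transliterates the Hungarian double-acute vowels instead of replacing them with underscores. For example (a quick sketch assuming this revision of youtube-dl is importable; the expected output follows from the table above):

    from youtube_dl.utils import sanitize_filename

    print(sanitize_filename('Őrült Űrhajó', restricted=True))
    # should print: Orult_Urhajo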
@@ -861,9 +861,13 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
                 # As of RFC 2616 default charset is iso-8859-1 that is respected by python 3
                 if sys.version_info >= (3, 0):
                     location = location.encode('iso-8859-1').decode('utf-8')
+                else:
+                    location = location.decode('utf-8')
                 location_escaped = escape_url(location)
                 if location != location_escaped:
                     del resp.headers['Location']
+                    if sys.version_info < (3, 0):
+                        location_escaped = location_escaped.encode('utf-8')
                     resp.headers['Location'] = location_escaped
         return resp
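The essence of the fix is a charset round-trip: Python 3's http machinery decodes raw header bytes as iso-8859-1 (per RFC 2616), so a UTF-8 Location has to be re-encoded and re-decoded before escaping, while on Python 2 the header arrives as plain bytes and is decoded directly. A standalone sketch of that step, with a made-up redirect target:

    # -*- coding: utf-8 -*-
    # what the http layer hands back on Python 3: UTF-8 bytes seen as iso-8859-1
    raw = u'http://example.com/caf\u00e9'.encode('utf-8')
    location = raw.decode('iso-8859-1')  # mojibake: 'http://example.com/cafÃ©'
    fixed = location.encode('iso-8859-1').decode('utf-8')
    print(fixed)  # http://example.com/café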
@@ -1035,6 +1039,7 @@ def unified_strdate(date_str, day_first=True):
         format_expressions.extend([
             '%d-%m-%Y',
             '%d.%m.%Y',
+            '%d.%m.%y',
             '%d/%m/%Y',
             '%d/%m/%y',
             '%d/%m/%Y %H:%M:%S',
@@ -1055,7 +1060,10 @@ def unified_strdate(date_str, day_first=True):
     if upload_date is None:
         timetuple = email.utils.parsedate_tz(date_str)
         if timetuple:
-            upload_date = datetime.datetime(*timetuple[:6]).strftime('%Y%m%d')
+            try:
+                upload_date = datetime.datetime(*timetuple[:6]).strftime('%Y%m%d')
+            except ValueError:
+                pass
     if upload_date is not None:
         return compat_str(upload_date)
@@ -1907,7 +1915,7 @@ def parse_age_limit(s):
 def strip_jsonp(code):
     return re.sub(
-        r'(?s)^[a-zA-Z0-9_.]+\s*\(\s*(.*)\);?\s*?(?://[^\n]*)*$', r'\1', code)
+        r'(?s)^[a-zA-Z0-9_.$]+\s*\(\s*(.*)\);?\s*?(?://[^\n]*)*$', r'\1', code)
 
 def js_to_json(code):
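Adding $ to the character class covers JSONP callbacks whose names contain dollar signs, which are legal in JavaScript identifiers. A standalone sketch with the new pattern inlined so it runs on its own:

    import re

    def strip_jsonp(code):
        # same regex as above; '$' is now allowed in the callback name
        return re.sub(
            r'(?s)^[a-zA-Z0-9_.$]+\s*\(\s*(.*)\);?\s*?(?://[^\n]*)*$', r'\1', code)

    print(strip_jsonp('window.jsonp$1({"status": "ok"});'))  # {"status": "ok"}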

youtube_dl/version.py

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals
 
-__version__ = '2016.05.21.2'
+__version__ = '2016.06.03'