Compare commits


257 Commits

Author SHA1 Message Date
Philipp Hagemeister
317239b097 release 2015.01.23.1 2015-01-23 00:33:14 +01:00
Philipp Hagemeister
c2a30b250c [testtube] Add new extractor (Fixes #4759) 2015-01-23 00:32:35 +01:00
Philipp Hagemeister
c994e6bd63 release 2015.01.23 2015-01-23 00:06:11 +01:00
Philipp Hagemeister
3ee2aa7a16 Merge remote-tracking branch 'origin/master' 2015-01-23 00:06:02 +01:00
Philipp Hagemeister
083c9df93b [YoutubeDL] Allow filtering by properties (Fixes #4584) 2015-01-23 00:04:05 +01:00
Philipp Hagemeister
50789175ed [pornhub] Detect private videos and emit an error message (Closes #4764) 2015-01-22 23:48:58 +01:00
Philipp Hagemeister
dc1b027cd4 [twitch] PEP8 2015-01-22 23:06:03 +01:00
Sergey M․
f353cbdb2f [twitch:stream] Randomize query 2015-01-22 23:34:40 +06:00
Philipp Hagemeister
73e449b226 Merge branch 'master' of github.com:rg3/youtube-dl 2015-01-22 18:21:27 +01:00
Philipp Hagemeister
b4a64c592b [README] Add an FAQ entry about destination folder 2015-01-22 18:21:17 +01:00
Philipp Hagemeister
78111136db [twitch] Move URL matching tests into extractor 2015-01-22 18:18:21 +01:00
Philipp Hagemeister
650ab5beeb [comedycentral:shows] Remove references to colbert report 2015-01-22 18:15:58 +01:00
Philipp Hagemeister
7932de6352 [hearthisat] Correct error message 2015-01-22 18:15:04 +01:00
Sergey M․
240b9b7a5c [twitch] Add support for streams (Closes #893, closes #3693, closes #1884) 2015-01-22 23:11:22 +06:00
Naglis Jonaitis
bb6e38787d [videomega] Fix extraction (Closes #4763) 2015-01-22 18:36:49 +02:00
Philipp Hagemeister
898c23c03f release 2015.01.22 2015-01-22 12:04:26 +01:00
Philipp Hagemeister
b55ee18ff3 [hearthisat] Add support for more high-quality download links 2015-01-22 12:04:13 +01:00
Naglis Jonaitis
e5763a7a7e [hearthisat] Add new extractor (Closes #4743) 2015-01-21 21:47:55 +02:00
Sergey M․
8bb1bdfae9 [twitch:past_broadcasts] Fix IE_NAME 2015-01-21 23:06:16 +06:00
Sergey M․
c62b449765 Credit @yan12125 for streetvoice (#4758) 2015-01-21 22:56:28 +06:00
Sergey M․
bb0aa4cb3c [streetvoice] Improve 2015-01-21 22:53:51 +06:00
Sergey M.
d63528c8c7 Merge pull request #4758 from yan12125/IE_streetvoice
[StreetVoice] Add new extractor
2015-01-21 22:36:50 +06:00
Sergey M․
c5db6bb32b [twitch] Refactor and add support for past broadcasts 2015-01-21 22:27:21 +06:00
Yen Chi Hsuan
c8dc41a6e7 [StreetVoice] Add new extractor 2015-01-21 23:05:47 +08:00
Jaime Marquínez Ferrándiz
47e0e1e0e2 [nbc] Fix pep8 issue 2015-01-21 10:36:15 +01:00
Jaime Marquínez Ferrándiz
efcddaebe9 [cnn] Use edition.cnn.com for getting the information (fixes #4757)
Some videos (like http://edition.cnn.com/videos/us/2015/01/20/orig-yellowstone-oil-spill.cnn) will fail if we use cnn.com.
2015-01-21 10:31:57 +01:00
Jaime Marquínez Ferrándiz
5fe5112589 [CNNArticle] Update test 2015-01-21 10:27:18 +01:00
Sergey M․
564bb5e964 [tinypic] Tweak VALID_URL regex (Closes #4754) 2015-01-21 02:15:28 +06:00
Sergey M․
2df54b4ba8 [nbcnews] Ignore HTTP errors while coping with playlists (Closes #4749) 2015-01-20 21:23:51 +06:00
Sergey M․
030aa5d9e7 [tvp] Fix extraction 2015-01-19 23:00:22 +06:00
Philipp Hagemeister
c511f13f22 [ndtv] Modernize 2015-01-19 10:10:05 +01:00
Sergey M․
fdb2ed7455 [abc7news] Add extractor (Closes #4734) 2015-01-18 08:09:18 +06:00
Philipp Hagemeister
ba319696a9 [options] Clarify that --password can be left out (#4723) 2015-01-17 23:56:34 +01:00
Philipp Hagemeister
910c552052 release 2015.01.16 2015-01-16 14:20:38 +01:00
Philipp Hagemeister
cce81f192c [bandcamp:album] Fix title extraction (Fixes #4721) 2015-01-16 14:20:25 +01:00
Philipp Hagemeister
9d22a7dfb0 [fourtube] Fix extraction 2015-01-16 13:44:44 +01:00
Philipp Hagemeister
4f4f642822 [npo] Remove unused import 2015-01-16 13:44:36 +01:00
Jaime Marquínez Ferrándiz
2875cf01bb FFmpegEmbedSubtitlePP: simplify command 2015-01-16 13:37:37 +01:00
Jaime Marquínez Ferrándiz
e205db3bcd FFmpegEmbedSubtitlePP: don't fail if the video doesn't have an audio stream (fixes #4718)
Instead of specifying which streams ffmpeg must copy, we tell it to copy all.
2015-01-16 13:29:01 +01:00
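The idea behind this fix, as a rough sketch (the function name and exact argument order here are illustrative, not the actual FFmpegEmbedSubtitlePP code): with `-c copy` ffmpeg copies every stream that exists, so a video without an audio stream no longer breaks the command.

import subprocess

def embed_subtitles(video_path, sub_paths, out_path):
    # One input for the video plus one per subtitle file
    cmd = ['ffmpeg', '-y', '-i', video_path]
    for sub in sub_paths:
        cmd += ['-i', sub]
    # Copy all streams instead of enumerating them; a missing audio
    # stream is then simply not copied rather than causing an error.
    cmd += ['-map', '0', '-c', 'copy']
    for i in range(len(sub_paths)):
        cmd += ['-map', '%d' % (i + 1)]
    cmd += ['-c:s', 'mov_text', out_path]
    subprocess.check_call(cmd)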
Philipp Hagemeister
31d4a6e212 release 2015.01.15.1 2015-01-15 22:38:11 +01:00
Sergey M․
aaeb86f682 [youtube] Add test for #4706 2015-01-16 01:25:03 +06:00
Sergey M.
9fa6ea2680 Merge pull request #4706 from pkulak/master
Fix Youtube encrypted sigs.
2015-01-16 01:12:50 +06:00
Phil Kulak
a9b6b5cd15 Looks like Google switched to a new JS compiler that includes dollar signs in function names. 2015-01-15 10:23:05 -08:00
Naglis Jonaitis
a45c0a5d67 [videomega] Fix extraction (Closes #4703) 2015-01-15 19:57:36 +02:00
Sergey M․
c8dfe360eb [atresplayer] Add authentication support (Closes #4700) 2015-01-15 21:43:35 +06:00
Philipp Hagemeister
4cfaf85c65 release 2015.01.15 2015-01-15 12:42:11 +01:00
Philipp Hagemeister
be5f2c192c [ssl] Correct connect creation
We want to authenticate the server, see https://docs.python.org/dev/library/ssl.html#ssl.Purpose.SERVER_AUTH .
2015-01-15 02:06:50 +01:00
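For context, a minimal sketch of what the referenced standard-library API does (this is not the youtube-dl code itself):

import ssl

# Purpose.SERVER_AUTH builds a *client-side* context that verifies the
# server's certificate; Purpose.CLIENT_AUTH would be for a TLS server.
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
assert ctx.verify_mode == ssl.CERT_REQUIRED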
Sergey M․
c9ef44ce29 [smotri] Improve extraction (Closes #4698) 2015-01-14 21:50:36 +06:00
Sergey M․
e92d4a11f5 [spiegel] Test format video URLs for 404 (Closes #4579) 2015-01-14 20:27:14 +06:00
Naglis Jonaitis
f2cbc96c3e [lnkgo] Make more robust 2015-01-14 00:51:48 +02:00
Naglis Jonaitis
a69801e2c6 [utils] Add additional format to unified_strdate 2015-01-14 00:16:34 +02:00
Naglis Jonaitis
034206cec1 [lnkgo] Add new extractor 2015-01-14 00:14:59 +02:00
Sergey M․
04e0bac233 [npo:live] Add extractor (Closes #4691) 2015-01-13 20:54:03 +06:00
Philipp Hagemeister
fbef83f399 [README] Add FAQ for playing from another machine (Fixes #4693) 2015-01-13 08:10:17 +01:00
Sergey M․
a5fb718c50 [test_utils] Add more tests for parse_duration 2015-01-12 21:39:58 +06:00
Sergey M․
227d4822ff [utils] Disallow non string types in parse_duration (Closes #4679) 2015-01-12 21:06:26 +06:00
Philipp Hagemeister
5c4a81d934 [npo] Remove unused import 2015-01-11 23:43:09 +01:00
Philipp Hagemeister
263255eb8d Credit @Josso for drbonanza (#4581) 2015-01-11 23:42:24 +01:00
Philipp Hagemeister
8e2ec95575 [drbonanza] Simplify and fix duration (#4687) 2015-01-11 23:41:55 +01:00
Johan K. Jensen
8e7a9016d5 [DRBonanza] Add new extractor (fixing #4581) 2015-01-11 23:23:10 +01:00
Sergey M․
c85f368370 [npo] Make extension check less strict and add test (#4680) 2015-01-11 23:45:52 +06:00
Sergey M․
a0977064ce [npo] Fix non asf streams (Closes #4680) 2015-01-11 23:18:45 +06:00
Philipp Hagemeister
15aecd8711 release 2015.01.11 2015-01-11 17:47:04 +01:00
Philipp Hagemeister
20dd0b2d20 Merge branch 'master' of github.com:rg3/youtube-dl 2015-01-11 17:46:22 +01:00
Sergey M.
f934860a07 Merge pull request #4684 from Josso/patch-1
[drtv] Updated with support for https
2015-01-11 22:41:15 +06:00
Philipp Hagemeister
2aeb06d6dc [utils] Improve colon handling (Fixes #4683) 2015-01-11 17:40:45 +01:00
Johan
6ccbb335d2 [drtv] Updated with support for https 2015-01-11 17:39:16 +01:00
Pierre
4340decad2 check for overwriting files in the downloader (fixes #3916, closes #3829) 2015-01-11 12:02:27 +01:00
Jaime Marquínez Ferrándiz
f3ff1a3696 YoutubeDL: Make the decision about removing the original file after each postprocessor is run (fixes #2261)
If one of the processors said the file should be kept, it wouldn't pay
attention to the response from the following processors. This was wrong if the
'keep_video' option was False, if the first extractor modifies the original file
and then we extract its audio we don't want to keep the original video file.
2015-01-11 11:35:18 +01:00
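A rough sketch of the corrected decision (names are illustrative; historically a postprocessor's run() returned a keep_video flag along with the info dict):

def run_all_postprocessors(pps, info, keepvideo_option):
    keep_video = None
    for pp in pps:
        keep_video_wish, info = pp.run(info)
        if keep_video_wish is True:
            keep_video = True        # any postprocessor may veto deletion
        elif keep_video_wish is False and keep_video is None:
            keep_video = False       # first definite request to delete
    # Only delete when the option says so and nobody asked to keep it
    delete_original = not keep_video and not keepvideo_option
    return delete_original, info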
Sergey M․
aa24de39aa [veehd] Update test 2015-01-11 16:20:39 +06:00
Sergey M․
a798e64c15 [veehd] Improve extraction 2015-01-11 16:20:16 +06:00
Philipp Hagemeister
6a5fa75490 [karaoketv] Remove unused import 2015-01-11 10:48:20 +01:00
Philipp Hagemeister
8ad6b5ed9f [compat] Correct socket error class reference 2015-01-11 10:47:39 +01:00
Sergey M․
d5bb814d34 [veehd] Capture removed video message 2015-01-11 15:42:53 +06:00
Sergey M․
d156a1d981 [xboxclips] Fix extraction 2015-01-11 15:25:29 +06:00
Sergey M․
987493aef3 [test_compat] Fix alphabetic order to make test_all_present pass 2015-01-11 15:13:03 +06:00
Philipp Hagemeister
8bfa75451b [options] Add --no-call-home
While we're at it, also drop "experimental" moniker for --call-home - should work fine.
2015-01-10 21:09:18 +01:00
Philipp Hagemeister
c071733fd4 [README] Highlight that bug reports should include the -v output 2015-01-10 21:07:44 +01:00
Philipp Hagemeister
cd3063f3fa release 2015.01.10.2 2015-01-10 21:03:00 +01:00
Philipp Hagemeister
58b1f00d19 [YoutubeDL] Add new --call-home option for debugging 2015-01-10 21:02:27 +01:00
Philipp Hagemeister
149f05c7b6 release 2015.01.10.1 2015-01-10 20:06:13 +01:00
Philipp Hagemeister
8a1b9b068e Merge remote-tracking branch 'origin/master' 2015-01-10 20:06:01 +01:00
Philipp Hagemeister
c5a59d9391 [utils] Fix call to _create_http_connection
Avoid confusion over args/kwargs.
2015-01-10 20:05:30 +01:00
Philipp Hagemeister
500b8b41c1 [options] Add -4 and -6 options
Fixes #520, fixes #3626.
2015-01-10 20:02:02 +01:00
Philipp Hagemeister
be4a824d74 Add new option --source-address
Closes #3618, fixes #721, fixes #2481, fixes #4551, closes #1020.
2015-01-10 19:56:51 +01:00
Sergey M․
ed3958d714 [collegerama] Add extractor (#4540) 2015-01-11 00:40:46 +06:00
Philipp Hagemeister
6ce08764a1 Credit @dinesh for rte.ie (#4015) 2015-01-10 18:58:03 +01:00
Philipp Hagemeister
c80ede5f13 [karaoketv] Simplify (#3853) 2015-01-10 18:03:36 +01:00
Philipp Hagemeister
bc694039e4 Merge remote-tracking branch 'lenaten/karaoketv' 2015-01-10 17:59:35 +01:00
Philipp Hagemeister
3462af03e6 [rte] PEP8 2015-01-10 17:59:07 +01:00
Philipp Hagemeister
ea1d5bdcdd [rte] Make more robust and add a new testcase (#4015) 2015-01-10 17:57:21 +01:00
Philipp Hagemeister
121c09c7be Merge remote-tracking branch 'Dineshs91/f4m-2.0' 2015-01-10 17:51:52 +01:00
Philipp Hagemeister
76bfaf6daf [nrk] Improve subtitle support (#3092) 2015-01-10 17:46:01 +01:00
Sergey M․
d89c6e336a [atttechchannel] Add extractor (Closes #3938) 2015-01-10 19:44:29 +06:00
Sergey M․
776dc3992a [utils] Clarify more day-month-first ambiguous formats 2015-01-10 19:43:52 +06:00
Philipp Hagemeister
27ca82ebc6 [orf:oe1] Add konsole URL schema (Fixes #4675) 2015-01-10 14:27:27 +01:00
Philipp Hagemeister
385f8ae468 [eighttracks] PEP8 2015-01-10 14:25:11 +01:00
Philipp Hagemeister
b9f030cc26 [orf] Fix typo 2015-01-10 14:23:54 +01:00
Philipp Hagemeister
52afb2ac1b [ffmpeg] Call encodeFilename on filenames 2015-01-10 06:13:18 +01:00
Philipp Hagemeister
43bc88903d Merge remote-tracking branch 'ivan/muxed-mtime' 2015-01-10 06:10:18 +01:00
Philipp Hagemeister
6ef9f88299 release 2015.01.10 2015-01-10 05:51:22 +01:00
Philipp Hagemeister
f71fdb0acc [eighttracks] Improve waiting (#3954) 2015-01-10 05:51:07 +01:00
Philipp Hagemeister
c24dfef63c Merge remote-tracking branch 'lenaten/8tracks' 2015-01-10 05:47:05 +01:00
Philipp Hagemeister
6271f1cad9 [youtube|ffmpeg] Automatically correct video with non-square pixels (Fixes #4674) 2015-01-10 05:45:51 +01:00
Philipp Hagemeister
fb4b030aaf [tvp] Update tests and improve output 2015-01-10 02:38:35 +01:00
Philipp Hagemeister
ff21a8e0ee Merge remote-tracking branch 'Tithen-Firion/master' 2015-01-10 02:26:21 +01:00
Philipp Hagemeister
904fffffeb [audiomack] Better titles, simplify code 2015-01-10 02:24:46 +01:00
Philipp Hagemeister
51897bb77c Merge remote-tracking branch 'xavierbeynon/master' 2015-01-10 02:03:46 +01:00
Philipp Hagemeister
bd1a281ede [options] PEP8 and simpler --merge-output-format handling (#4673) 2015-01-10 02:03:00 +01:00
Philipp Hagemeister
45598f1578 Merge remote-tracking branch 'aft90/merge-output-format'
Conflicts:
	youtube_dl/YoutubeDL.py
2015-01-10 01:59:14 +01:00
Andrei Troie
d02115f837 Use the option in preparing the merge output filename 2015-01-10 00:29:06 +00:00
Andrei Troie
34c781a24d Passing the option into the main program's arguments 2015-01-10 00:03:11 +00:00
Philipp Hagemeister
1302394603 release 2015.01.09.2 2015-01-09 23:59:29 +01:00
Philipp Hagemeister
dd622d7c4e [netzkino] Add new extractor (Fixes #4669) 2015-01-09 23:59:18 +01:00
Andrei Troie
d120e9013f Added an option to specify an output format for merges when downloading separate video & audio 2015-01-09 22:03:56 +00:00
Philipp Hagemeister
b8da6b9fc6 [elpais] Modernize 2015-01-09 22:43:49 +01:00
Philipp Hagemeister
4baea47c42 release 2015.01.09.1 2015-01-09 21:33:16 +01:00
Philipp Hagemeister
176cf9e0c3 [wdr] Support overviews (Fixes #4651) 2015-01-09 21:33:07 +01:00
Philipp Hagemeister
7b6faddfc8 [wdr] Modernize 2015-01-09 20:52:49 +01:00
Philipp Hagemeister
f90ad27375 [YoutubeDL] Copy over format metadata when merging (Fixes #4671) 2015-01-09 20:50:23 +01:00
Philipp Hagemeister
230b2287dd [youtube] Add acodec information
The codec seems to be consistently aac, so state that in our metadata.
2015-01-09 20:44:21 +01:00
Philipp Hagemeister
754c838903 release 2015.01.09 2015-01-09 20:20:55 +01:00
Philipp Hagemeister
aa2fd59857 [update] Use utils HTTPS handler (Fixes #4666)
On FreeBSD, the default HTTPS handler is missing certificates, so use our own.
2015-01-09 20:20:48 +01:00
Jaime Marquínez Ferrándiz
9932a65370 [vk] Remove debug assert statement (fixes #4672, fixes #4514) 2015-01-09 20:13:53 +01:00
Philipp Hagemeister
5e4166478d [README] Add an FAQ entry for how to install on Windows 2015-01-09 19:17:15 +01:00
Philipp Hagemeister
b0e87c3110 [ffmpeg] Correctly encode paths on Windows
On Python 2.x on Windows, if there are any unicode arguments in the command argument list, the whole list is converted to unicode internally.
Therefore, we need to call encodeArgument on every argument.

Fixes #4337 and #4668.
2015-01-09 19:02:07 +01:00
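A hedged sketch of the workaround (encode_argument here is illustrative, modeled on the encodeArgument helper the commit mentions):

import sys

def encode_argument(arg, encoding='utf-8'):
    # On Python 2 on Windows, a single unicode element makes the whole
    # argument list unicode, so every argument is converted to bytes
    # up front. Python 3 handles unicode arguments natively.
    if sys.version_info < (3,) and isinstance(arg, unicode):  # noqa: F821
        return arg.encode(encoding)
    return arg

# cmd = [encode_argument(a) for a in ffmpeg_args]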
Your Name
ff0813313a Minor style changes 2015-01-08 18:35:33 -06:00
Philipp Hagemeister
c0bdf32a3c Add --print-json (Closes #2845) 2015-01-08 18:03:29 +01:00
Philipp Hagemeister
92b065dc53 [tudou] Fix extraction 2015-01-08 18:03:29 +01:00
Philipp Hagemeister
9298d4e3df [discovery] Fix extractor 2015-01-08 18:03:29 +01:00
Philipp Hagemeister
740a7fcbc8 [gdcvault] Skip test that is now restricted 2015-01-08 18:03:29 +01:00
Philipp Hagemeister
5fbf25a681 [test_age_restriction] remove misbehaving test
We now test for the age_limit being set right in test_download, so we don't need more than two tests for the actual age limit handling.
2015-01-08 18:03:29 +01:00
Philipp Hagemeister
db6e625005 [buzzfeed] Fix test 2015-01-08 18:03:29 +01:00
Philipp Hagemeister
811cacdc2c [bet] Correct test IDs 2015-01-08 18:03:29 +01:00
Philipp Hagemeister
ce08a86462 Merge pull request #4647 from aajanki/hds_metadata
[downloader/f4m] Improved metadata handling
2015-01-08 16:37:49 +01:00
Philipp Hagemeister
11497d5bba release 2015.01.08 2015-01-08 16:15:08 +01:00
Philipp Hagemeister
0217c78377 [YoutubeDL] Allow selection by more extensions 2015-01-08 16:14:50 +01:00
Philipp Hagemeister
bd6b25ce0e [fktv] Fix download URL 2015-01-08 16:14:50 +01:00
Philipp Hagemeister
d51a853d5c [zdf] Fix test case 2015-01-08 16:14:50 +01:00
Philipp Hagemeister
9ed99402f5 [youtube] Fix test case 2015-01-08 16:14:50 +01:00
Philipp Hagemeister
ec3a6a3137 [tunein] Ignore reliability if it's >90% (#4097) 2015-01-08 16:14:50 +01:00
Philipp Hagemeister
796858a53f [sexykarma] Add age_limit designation 2015-01-08 16:14:50 +01:00
Philipp Hagemeister
5b78caca94 [mit] Amend test definitions 2015-01-08 16:14:50 +01:00
Philipp Hagemeister
bec2248141 [InfoExtractor/common] Correct and test meta tag matching 2015-01-08 16:14:50 +01:00
Philipp Hagemeister
211503c39f [teachertube] Modernize 2015-01-08 16:14:50 +01:00
Philipp Hagemeister
adb1307b9a [imdb] Remove test md5
They seem to reencode quite frequently, so simply remove the md5 sum.
2015-01-08 16:14:50 +01:00
Philipp Hagemeister
99673f04bc [washingtonpost] Modernize and correct test case 2015-01-08 16:14:49 +01:00
Philipp Hagemeister
e9a537774d Merge pull request #4665 from Li4ick/patch-1
Change path name to MSDN standard.
2015-01-08 14:53:20 +01:00
Sergey M.
367f539769 Merge pull request #4664 from kieranoreilly/patch-1
Spelling
2015-01-08 19:09:46 +06:00
Sergey M․
398133cf55 [huffpost] Make extraction more robust (Closes #4663) 2015-01-08 19:07:28 +06:00
Li4ick
52fc3ba405 Change path name to MSDN standard.
<Yourname> changed to <user name>, which is more general.
2015-01-08 14:17:26 +02:00
Kieran O'Reilly
fdd6e18b75 Spelling
Corrected the spelling of incapacitated
2015-01-07 20:23:36 -08:00
Sergey M․
58a84b8cb6 [bilibili] Fix extraction (Closes #4660) 2015-01-08 01:33:22 +06:00
Jaime Marquínez Ferrándiz
c5d666d374 Fix build with python 2.6
* Packages cannot be executed
* '.format' needs the index of the argument

(Reported in https://github.com/Homebrew/homebrew/issues/35616)
2015-01-07 16:09:43 +01:00
Jaime Marquínez Ferrándiz
5d8993b06a [extractor/__init__] Remove unused import 2015-01-07 11:59:15 +01:00
Jaime Marquínez Ferrándiz
c758bf9fd7 [nrktv] Remove 'proxy' parameter from tests 2015-01-07 11:56:22 +01:00
Philipp Hagemeister
900813a328 release 2015.01.07.2 2015-01-07 07:41:48 +01:00
Philipp Hagemeister
2bad0e5d20 [/__init__] Define public API 2015-01-07 07:41:05 +01:00
Philipp Hagemeister
ccc5842bc9 [gameone] Modernize 2015-01-07 07:37:21 +01:00
Philipp Hagemeister
fd86c2026d release 2015.01.07.1 2015-01-07 07:31:38 +01:00
Philipp Hagemeister
e4a8eae701 Merge commit '8ee3415' 2015-01-07 07:30:57 +01:00
Philipp Hagemeister
75e51819d0 release 2015.01.07 2015-01-07 07:22:28 +01:00
Philipp Hagemeister
8ee341500d [viki] Modernize 2015-01-07 07:21:24 +01:00
Philipp Hagemeister
0590062925 Respect age_limit when listing extractors (Fixes #4653) 2015-01-07 07:20:20 +01:00
Sergey M․
799d88d3d8 [nrktv] Add support for playlists (Closes #4656) 2015-01-07 06:46:56 +06:00
Sergey M․
760aea9a96 Merge branch 'oskar456-ceskatelevizesrt' 2015-01-07 05:05:30 +06:00
Sergey M․
d6a31b1766 Credit @oskar456 for ceskatelevize subtitles support (#4622) 2015-01-07 05:05:18 +06:00
Sergey M․
0b54a5b10a [ceskatelevize] Add subtitles tests 2015-01-07 05:04:15 +06:00
Sergey M․
6309cb9b41 [ceskatelevize] Fix python 2.6 format issue 2015-01-07 05:03:34 +06:00
Sergey M․
27a82a1b93 [ceskatelevize] Simplify 2015-01-07 05:03:14 +06:00
Sergey M․
ecd1936695 Merge branch 'ceskatelevizesrt' of https://github.com/oskar456/youtube-dl into oskar456-ceskatelevizesrt 2015-01-07 05:02:27 +06:00
Jaime Marquínez Ferrándiz
76b3c61012 [youtube] Add formats 308 and 315 (closes #4650) 2015-01-06 11:59:41 +01:00
Sergey M․
0df2dea73b [giga] Add extractor (Closes #4090) 2015-01-06 06:54:31 +06:00
Philipp Hagemeister
f8bb576c4f release 2015.01.05.1 2015-01-05 22:42:38 +01:00
Philipp Hagemeister
ee61f6f3e2 [youtube] Handle cases where format comes without a preference (Fixes #4648) 2015-01-05 22:42:17 +01:00
Antti Ajanki
f14f2a6d79 [downloader/f4m] Minor cleanup 2015-01-05 21:12:33 +02:00
Antti Ajanki
2c322cc5d6 [downloader/f4m] The last value in a tag is the tag length 2015-01-05 21:07:15 +02:00
Antti Ajanki
3b8f3a1504 [downloader/f4m] <metadata> is optional according to the F4M specs 2015-01-05 21:07:13 +02:00
Jaime Marquínez Ferrándiz
8f9529cd05 [motorsport] Fix extraction and make trailing '/' optional
They directly embed a youtube video now.
2015-01-05 19:19:01 +01:00
Philipp Hagemeister
f4bca0b348 release 2015.01.05 2015-01-05 18:44:29 +01:00
Philipp Hagemeister
6291438073 [auengine] Simplify (#4643) 2015-01-05 18:21:32 +01:00
Philipp Hagemeister
18c3c15391 Merge remote-tracking branch 'Oteng/master' 2015-01-05 18:18:15 +01:00
Philipp Hagemeister
dda620e88c [radiobremen] Make code more readable and more resilient to failures 2015-01-05 18:17:03 +01:00
Philipp Hagemeister
d7cc31b63e [generic] PEP8 2015-01-05 18:16:47 +01:00
Philipp Hagemeister
5e3e1c82d8 Credit @ckrooss for radiobremen (#4632) 2015-01-05 18:14:39 +01:00
Philipp Hagemeister
aa80652f47 [radiobremen] Add test for thumbnail 2015-01-05 18:14:09 +01:00
Philipp Hagemeister
9d247bbd2d [radiobremen] Fix under Python 2.6 and fix duration 2015-01-05 18:13:19 +01:00
Philipp Hagemeister
93e40a7b2f Merge remote-tracking branch 'ckrooss/master' 2015-01-05 18:07:16 +01:00
oteng
03ff2cc1c4 [Auengine] corrected extractions logic
The way the video download url was been extracted was
not working well so i change it for it to extract the
correct url
2015-01-05 16:28:24 +00:00
Jaime Marquínez Ferrándiz
a285b6377b [normalboots] Skip download in test, it uses rtmp 2015-01-05 13:59:49 +01:00
Jaime Marquínez Ferrándiz
cd791a5ea0 [ted] Add support for embed-ssl.ted.com embedded videos 2015-01-05 13:11:13 +01:00
Jaime Marquínez Ferrándiz
87830900a9 [generic] Update some tests 2015-01-05 13:07:24 +01:00
Jaime Marquínez Ferrándiz
dfc9d9f50a Merge pull request #4639 from bartkappenburg/patch-1
Update rtlnl.py
2015-01-05 12:31:07 +01:00
Jaime Marquínez Ferrándiz
75311a7e16 .travis.yml: Remove my email from the list 2015-01-05 12:29:32 +01:00
Jaime Marquínez Ferrándiz
628bc4d1e7 [khanacademy] Update test 2015-01-05 12:28:35 +01:00
Jaime Marquínez Ferrándiz
a4c3f48639 [vimple] Replace tests
The first one seems to be no longer available and the second was an episode from a tv show.
2015-01-05 11:54:14 +01:00
Bart Kappenburg
bdf80aa542 Update rtlnl.py
Added support for the non-www version of rtlxl.nl by making "www." optional.
2015-01-05 11:51:24 +01:00
Naglis Jonaitis
adf3c58ad3 [lrt] Fix missing provider key
Also, modernize a bit.
2015-01-05 02:55:12 +02:00
Naglis Jonaitis
caf90bfaa5 [webofstories] Add new extractor (Closes #4585) 2015-01-05 02:22:01 +02:00
Jaime Marquínez Ferrándiz
2f985f4bb4 [youtube:toplist] Remove extractor
They use now normal playlists (their id is PL*).
2015-01-05 00:18:43 +01:00
Philipp Hagemeister
67c2bcdf4c Remove extractors which infringe copyright (#4554) 2015-01-04 19:19:18 +01:00
Jaime Marquínez Ferrándiz
1d2d0e3ff2 utils: Remove blank line at the end of file 2015-01-04 14:07:06 +01:00
Jaime Marquínez Ferrándiz
9fda6ee39f [tf1] Remove unused import 2015-01-04 14:06:23 +01:00
Jaime Marquínez Ferrándiz
bc3e582fe4 Don't use '-shortest' option for merging formats (closes #4220, closes #4580)
With avconv and older versions of ffmpeg the video is partially copied.
The duration difference between the audio and the video seem to be really small, so it's probably not noticeable.
2015-01-04 14:02:17 +01:00
Christopher Krooss
bc1fc5ddbc Don't check for height as it's not provided 2015-01-04 14:02:07 +01:00
Jaime Marquínez Ferrándiz
63948fc62c [downloader/hls] Respect the 'prefer_ffmpeg' option 2015-01-04 13:41:49 +01:00
Christopher Krooss
f4858a7103 Add support for Radio Bremen 2015-01-04 13:33:26 +01:00
Philipp Hagemeister
26886e6140 release 2015.01.04 2015-01-04 03:15:48 +01:00
Philipp Hagemeister
7a1818c99b [vk] Add support for rutube embeds (Fixes #4514) 2015-01-04 03:15:27 +01:00
Philipp Hagemeister
2ccd1b10e5 [soulanime] Fix under Python 3 2015-01-04 02:20:45 +01:00
Philipp Hagemeister
788fa208c8 Merge branch 'master' of github.com:rg3/youtube-dl 2015-01-04 02:08:38 +01:00
Philipp Hagemeister
8848314c08 [Makefile] Make offline tests actually work offline 2015-01-04 02:08:18 +01:00
Philipp Hagemeister
c11125f9ed [tests] Remove format 138 from tests (#4559) 2015-01-04 02:06:53 +01:00
Philipp Hagemeister
95ceeec722 Remove unused import 2015-01-04 02:05:35 +01:00
Philipp Hagemeister
b68ff25917 Add various anime sites (Closes #4554) 2015-01-04 02:05:26 +01:00
Sergey M.
3e3327ea17 Merge pull request #4629 from t0mm0/tf1-tfou
[tf1] add support for TFOU
2015-01-04 06:51:28 +06:00
t0mm0
b158bb8693 [tf1] simplify regex 2015-01-04 00:45:23 +00:00
t0mm0
2bf098eda4 [tf1] fix test 2015-01-04 00:43:55 +00:00
t0mm0
382e05fa56 [tf1] add support for TFOU 2015-01-04 00:05:31 +00:00
Philipp Hagemeister
19b05d886e release 2015.01.03 2015-01-03 18:35:30 +01:00
Philipp Hagemeister
e65566a9cc [youtube] Correct handling when DASH manifest is not necessary to find all formats 2015-01-03 18:33:38 +01:00
Sergey M․
baa3c3f0f6 [ellentv] Improve extraction 2015-01-03 21:54:18 +06:00
Sergey M․
f4f339529c [ellentv] Clean up and simplify 2015-01-03 21:44:47 +06:00
Sergey M.
7d02fae85b Merge pull request #4626 from gauravb7090/ellentube
Added support for EllenTube along with EllenTV
2015-01-03 21:40:39 +06:00
Gaurav
6e46c3f1fd Added support for EllenTube along with EllenTV 2015-01-03 20:30:28 +05:30
Sergey M․
c7e675940c [bbccouk] Add support for music clips (Closes #4143) 2015-01-03 20:43:40 +06:00
Jaime Marquínez Ferrándiz
d26b1317ed [downloader/mplayer] Use check_executable 2015-01-03 00:33:36 +01:00
Jaime Marquínez Ferrándiz
a221f22969 [crunchyroll] Fix format extraction
Reported in https://github.com/rg3/youtube-dl/issues/2782#issuecomment-68556780
2015-01-02 21:17:10 +01:00
Jaime Marquínez Ferrándiz
817f786fbb [canalplus] Raise an error if the video is georestricted (closes #4472) 2015-01-02 21:02:34 +01:00
Sergey M․
62420c73cb [played] Skip test 2015-01-02 22:31:55 +06:00
Sergey M․
2522a0b7da [kontrtube] Extract display_id
Trailing slash in URL is mandatory now
2015-01-02 22:28:48 +06:00
Sergey M․
46d32a12c9 [bet] Update test 2015-01-02 22:23:00 +06:00
Sergey M․
c491418526 [bbccouk] Update test 2015-01-02 22:13:26 +06:00
Ondřej Caletka
c067545c17 ceskatelevize: Closed captions support 2015-01-02 17:12:20 +01:00
Sergey M․
823a155293 [vier:videos] Tune _VALID_URL not to match single videos 2015-01-02 22:09:00 +06:00
Sergey M․
324b2c78fa [xtube] Fix uploader regex 2015-01-02 21:46:57 +06:00
Sergey M․
d34f98289b [xhamster] Remove identical tests 2015-01-02 21:12:25 +06:00
Sergey M.
644096b15c Merge pull request #4615 from dwemthy/https_xhamster
[xhamster] Add HTTPS support
2015-01-02 21:09:28 +06:00
Sergey M․
15cebcc363 Merge branch 'master' of github.com:rg3/youtube-dl 2015-01-02 20:57:12 +06:00
Sergey M․
faa4ea68c0 [generic] Add BBC iPlayer playlist test 2015-01-02 20:56:42 +06:00
Sergey M․
476eae0c2a [generic] Generalize BBC iPlayer playlist extraction 2015-01-02 20:55:09 +06:00
Sergey M․
8399267671 [generic] Make getter None by default 2015-01-02 20:54:30 +06:00
dwemthy
5b9aefef77 [xhamster] Add HTTPS support 2015-01-02 11:54:38 +00:00
Your Name
defaf19f5d Push api updates to simplify audiomack, add support for albums 2015-01-02 02:20:04 -06:00
netanel
754f0008ec fix increment operator 2014-12-06 09:20:35 +02:00
Tithen-Firion
2415951ead [tvp] Modernize 2014-12-04 14:16:09 +01:00
Tithen-Firion
995ad69c54 [common] Add new parameters for _download_webpage 2014-12-04 14:16:09 +01:00
Tithen-Firion
225e4b9633 [tvp] Remove unnecessary code 2014-12-04 14:16:09 +01:00
Tithen-Firion
6ce2c6783b [tvp] Add extractor 2014-12-04 05:14:09 +01:00
Tithen-Firion
29f400b97d [tvp] Update extractor 2014-12-04 02:54:25 +01:00
Ivan Kozik
0cd64bd077 Copy the mtime from the oldest source file to the file created by ffmpeg
Fixes #4245
2014-11-20 06:39:07 +00:00
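A minimal sketch of the approach (the function name is illustrative):

import os

def copy_oldest_mtime(source_files, merged_file):
    # Use the oldest source mtime so the merged output keeps the
    # original timestamp instead of the time ffmpeg finished.
    oldest = min(os.stat(f).st_mtime for f in source_files)
    os.utime(merged_file, (oldest, oldest))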
dinesh
0551a02b82 [Rte] Improve extractor 2014-10-27 01:08:51 +05:30
dinesh
25fadd06d0 [Rte] New extractor added 2014-10-24 09:49:01 +05:30
dinesh
7a47d07c6d [extractor/common] href attribute added 2014-10-24 09:47:39 +05:30
dinesh
34e48bed3b [extractor/common] Added support for f4m manifest Version 2.0 2014-10-24 02:41:10 +05:30
net
7b61ac3ddf Fix #2310. Play by the 8tracks rules 2014-10-15 06:46:47 +03:00
net
c816336cbd [karaoketv] Add new extractor 2014-09-29 21:58:42 +03:00
109 changed files with 3553 additions and 871 deletions

.gitignore

@@ -31,3 +31,5 @@ updates_key.pem
 test/testdata
 .tox
 youtube-dl.zsh
+.idea
+.idea/*

.travis.yml

@@ -9,7 +9,6 @@ notifications:
   email:
     - filippo.valsorda@gmail.com
     - phihag@phihag.de
-    - jaime.marquinez.ferrandiz+travis@gmail.com
     - yasoob.khld@gmail.com
 # irc:
 #   channels:

AUTHORS

@@ -98,3 +98,9 @@ Will Glynn
 Max Reimann
 Cédric Luthi
 Thijs Vermeir
+Joel Leclerc
+Christopher Krooss
+Ondřej Caletka
+Dinesh S
+Johan K. Jensen
+Yen Chi Hsuan

CONTRIBUTING.md

@@ -44,7 +44,7 @@ In particular, every site support request issue should only pertain to services
 ### Is anyone going to need the feature?
-Only post features that you (or an incapicated friend you can personally talk to) require. Do not post features because they seem like a good idea. If they are really useful, they will be requested by someone who requires them.
+Only post features that you (or an incapacitated friend you can personally talk to) require. Do not post features because they seem like a good idea. If they are really useful, they will be requested by someone who requires them.
 ### Is your question about youtube-dl?

Makefile

@@ -46,7 +46,7 @@ test:
 ot: offlinetest
 offlinetest: codetest
-	nosetests --verbose test --exclude test_download --exclude test_age_restriction --exclude test_subtitles --exclude test_write_annotations
+	nosetests --verbose test --exclude test_download --exclude test_age_restriction --exclude test_subtitles --exclude test_write_annotations --exclude test_youtube_lists
 tar: youtube-dl.tar.gz
@@ -63,7 +63,7 @@ youtube-dl: youtube_dl/*.py youtube_dl/*/*.py
 	chmod a+x youtube-dl
 README.md: youtube_dl/*.py youtube_dl/*/*.py
-	COLUMNS=80 python -m youtube_dl --help | python devscripts/make_readme.py
+	COLUMNS=80 python youtube_dl/__main__.py --help | python devscripts/make_readme.py
 CONTRIBUTING.md: README.md
 	python devscripts/make_contributing.py README.md CONTRIBUTING.md

README.md

@@ -60,10 +60,6 @@ which means you can modify it, redistribute it or use it however you like.
 they would handle
 --extractor-descriptions Output descriptions of all supported
 extractors
---proxy URL Use the specified HTTP/HTTPS proxy. Pass in
-an empty string (--proxy "") for direct
-connection
---socket-timeout None Time to wait before giving up, in seconds
 --default-search PREFIX Use this prefix for unqualified URLs. For
 example "gvsearch2:" downloads two videos
 from google videos for youtube-dl "large
@@ -82,6 +78,18 @@ which means you can modify it, redistribute it or use it however you like.
 --flat-playlist Do not extract the videos of a playlist,
 only list them.
+## Network Options:
+--proxy URL Use the specified HTTP/HTTPS proxy. Pass in
+an empty string (--proxy "") for direct
+connection
+--socket-timeout SECONDS Time to wait before giving up, in seconds
+--source-address IP Client-side IP address to bind to
+(experimental)
+-4, --force-ipv4 Make all connections via IPv4
+(experimental)
+-6, --force-ipv6 Make all connections via IPv6
+(experimental)
 ## Video Selection:
 --playlist-start NUMBER playlist video to start at (default is 1)
 --playlist-end NUMBER playlist video to end at (default is last)
@@ -219,6 +227,8 @@ which means you can modify it, redistribute it or use it however you like.
 for each command-line argument. If the URL
 refers to a playlist, dump the whole
 playlist information in a single line.
+--print-json Be quiet and print the video information as
+JSON (video is still being downloaded).
 --newline output progress bar as new lines
 --no-progress do not print progress bar
 --console-title display progress in console titlebar
@@ -229,6 +239,10 @@ which means you can modify it, redistribute it or use it however you like.
 files in the current directory to debug
 problems
 --print-traffic Display sent and read HTTP traffic
+-C, --call-home Contact the youtube-dl server for
+debugging.
+--no-call-home Do NOT contact the youtube-dl server for
+debugging.
 ## Workarounds:
 --encoding ENCODING Force the specified encoding (experimental)
@@ -248,11 +262,24 @@ which means you can modify it, redistribute it or use it however you like.
 ## Video Format Options:
 -f, --format FORMAT video format code, specify the order of
-preference using slashes: -f 22/17/18 . -f
-mp4 , -f m4a and -f flv are also
-supported. You can also use the special
-names "best", "bestvideo", "bestaudio",
-"worst", "worstvideo" and "worstaudio". By
+preference using slashes, as in -f 22/17/18
+. Instead of format codes, you can select
+by extension for the extensions aac, m4a,
+mp3, mp4, ogg, wav, webm. You can also use
+the special names "best", "bestvideo",
+"bestaudio", "worst". You can filter the
+video results by putting a condition in
+brackets, as in -f "best[height=720]" (or
+-f "[filesize>10M]"). This works for
+filesize, height, width, tbr, abr, and vbr
+and the comparisons <, <=, >, >=, =, != .
+Formats for which the value is not known
+are excluded unless you put a question mark
+(?) after the operator. You can combine
+format filters, so -f "[height <=?
+720][tbr>500]" selects up to 720p videos
+(or videos where the height is not known)
+with a bitrate of at least 500 KBit/s. By
 default, youtube-dl will pick the best
 quality. Use commas to download multiple
 audio formats, such as -f
@@ -269,6 +296,10 @@ which means you can modify it, redistribute it or use it however you like.
 -F, --list-formats list all available formats
 --youtube-skip-dash-manifest Do not download the DASH manifest on
 YouTube videos
+--merge-output-format FORMAT If a merge is required (e.g.
+bestvideo+bestaudio), output to given
+container format. One of mkv, mp4, ogg,
+webm, flv.Ignored if no merge is required
 ## Subtitle Options:
 --write-sub write subtitle file
@@ -285,7 +316,8 @@ which means you can modify it, redistribute it or use it however you like.
 ## Authentication Options:
 -u, --username USERNAME login with this account ID
--p, --password PASSWORD account password
+-p, --password PASSWORD account password. If this option is left
+out, youtube-dl will ask interactively.
 -2, --twofactor TWOFACTOR two-factor auth code
 -n, --netrc use .netrc authentication data
 --video-password PASSWORD video password (vimeo, smotri)
@@ -315,6 +347,11 @@ which means you can modify it, redistribute it or use it however you like.
 --add-metadata write metadata to the video file
 --xattrs write metadata to the video file's xattrs
 (using dublin core and xdg standards)
+--fixup POLICY (experimental) Automatically correct known
+faults of the file. One of never (do
+nothing), warn (only emit a warning),
+detect_or_warn(check whether we can do
+anything about it, warn otherwise
 --prefer-avconv Prefer avconv over ffmpeg for running the
 postprocessors (default)
 --prefer-ffmpeg Prefer ffmpeg over avconv for running the
@@ -326,7 +363,7 @@ which means you can modify it, redistribute it or use it however you like.
 # CONFIGURATION
-You can configure youtube-dl by placing default arguments (such as `--extract-audio --no-mtime` to always extract the audio and not copy the mtime) into `/etc/youtube-dl.conf` and/or `~/.config/youtube-dl/config`. On Windows, the configuration file locations are `%APPDATA%\youtube-dl\config.txt` and `C:\Users\<Yourname>\youtube-dl.conf`.
+You can configure youtube-dl by placing default arguments (such as `--extract-audio --no-mtime` to always extract the audio and not copy the mtime) into `/etc/youtube-dl.conf` and/or `~/.config/youtube-dl/config`. On Windows, the configuration file locations are `%APPDATA%\youtube-dl\config.txt` and `C:\Users\<user name>\youtube-dl.conf`.
 # OUTPUT TEMPLATE
@@ -420,9 +457,15 @@ Apparently YouTube requires you to pass a CAPTCHA test if you download too much.
 Once the video is fully downloaded, use any video player, such as [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
-### The links provided by youtube-dl -g are not working anymore
+### I extracted a video URL with -g, but it does not play on another machine / in my webbrowser.
-The URLs youtube-dl outputs require the downloader to have the correct cookies. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used, use `--dump-user-agent` to see the one in use by youtube-dl.
+It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used, use `--dump-user-agent` to see the one in use by youtube-dl.
+It may be beneficial to use IPv6; in some cases, the restrictions are only applied to IPv4. Some services (sometimes only for a subset of videos) do not restrict the video URL by IP address, cookie, or user-agent, but these are the exception rather than the rule.
+Please bear in mind that some URL protocols are **not** supported by browsers out of the box, including RTMP. If you are using -g, your own downloader must support these as well.
+If you want to play the video on a machine that is not running youtube-dl, you can relay the video content from the machine that runs youtube-dl. You can use `-o -` to let youtube-dl stream a video to stdout, or simply allow the player to download the files written by youtube-dl in turn.
 ### ERROR: no fmt_url_map or conn information found in video info
@@ -449,6 +492,18 @@ Since June 2012 (#342) youtube-dl is packed as an executable zipfile, simply unz
 To run the exe you need to install first the [Microsoft Visual C++ 2008 Redistributable Package](http://www.microsoft.com/en-us/download/details.aspx?id=29).
+### On Windows, how should I set up ffmpeg and youtube-dl? Where should I put the exe files?
+If you put youtube-dl and ffmpeg in the same directory that you're running the command from, it will work, but that's rather cumbersome.
+To make a different directory work - either for ffmpeg, or for youtube-dl, or for both - simply create the directory (say, `C:\bin`, or `C:\Users\<User name>\bin`), put all the executables directly in there, and then [set your PATH environment variable](https://www.java.com/en/download/help/path.xml) to include that directory.
+From then on, after restarting your shell, you will be able to access both youtube-dl and ffmpeg (and youtube-dl will be able to find ffmpeg) by simply typing `youtube-dl` or `ffmpeg`, no matter what directory you're in.
+### How do I put downloads into a specific folder?
+Use the `-o` to specify an [output template](#output-template), for example `-o "/home/user/videos/%(title)s-%(id)s.%(ext)s"`. If you want this for all of your downloads, put the option into your [configuration file](#configuration).
 ### How can I detect whether a given URL is supported by youtube-dl?
 For one, have a look at the [list of supported sites](docs/supportedsites). Note that it can sometimes happen that the site changes its URL scheme (say, from http://example.com/v/1234567 to http://example.com/v/1234567 ) and youtube-dl reports an URL of a service in that list as unsupported. In that case, simply report a bug.
@@ -597,7 +652,9 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl:
 Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues> . Unless you were prompted so or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the irc channel #youtube-dl on freenode.
-Please include the full output of the command when run with `--verbose`. The output (including the first lines) contain important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
+**Please include the full output of youtube-dl when run with `-v`**.
+The output (including the first lines) contain important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
 Please re-read your issue once again to avoid a couple of common mistakes (you can and should use this as a checklist):
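To tie the new README text together, a hedged example of the two options it documents, using the embedding API the README shows elsewhere (the URL is the placeholder from the FAQ above):

import youtube_dl

ydl_opts = {
    # Up to 720p (or unknown height) with a bitrate of at least 500 KBit/s
    'format': 'best[height<=?720][tbr>500]',
    # Container to use if bestvideo+bestaudio ever need merging
    'merge_output_format': 'mkv',
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['http://example.com/v/1234567'])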

devscripts/gh-pages/update-sites.py

@@ -16,7 +16,7 @@ def main():
     template = tmplf.read()
     ie_htmls = []
-    for ie in sorted(youtube_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower()):
+    for ie in youtube_dl.list_extractors(age_limit=None):
         ie_html = '<b>{}</b>'.format(ie.IE_NAME)
         ie_desc = getattr(ie, 'IE_DESC', None)
         if ie_desc is False:

devscripts/make_supportedsites.py

@@ -23,12 +23,12 @@ def main():
 def gen_ies_md(ies):
     for ie in ies:
-        ie_md = '**{}**'.format(ie.IE_NAME)
+        ie_md = '**{0}**'.format(ie.IE_NAME)
         ie_desc = getattr(ie, 'IE_DESC', None)
         if ie_desc is False:
             continue
         if ie_desc is not None:
-            ie_md += ': {}'.format(ie.IE_DESC)
+            ie_md += ': {0}'.format(ie.IE_DESC)
         if not ie.working():
             ie_md += ' (Currently broken)'
         yield ie_md

test/helper.py

@@ -82,18 +82,8 @@ class FakeYDL(YoutubeDL):
 def gettestcases(include_onlymatching=False):
     for ie in youtube_dl.extractor.gen_extractors():
-        t = getattr(ie, '_TEST', None)
-        if t:
-            assert not hasattr(ie, '_TESTS'), \
-                '%s has _TEST and _TESTS' % type(ie).__name__
-            tests = [t]
-        else:
-            tests = getattr(ie, '_TESTS', [])
-        for t in tests:
-            if not include_onlymatching and t.get('only_matching', False):
-                continue
-            t['name'] = type(ie).__name__[:-len('IE')]
-            yield t
+        for tc in ie.get_testcases(include_onlymatching):
+            yield tc
 md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
@@ -120,6 +110,20 @@ def expect_info_dict(self, got_dict, expected_dict):
     else:
         if isinstance(expected, compat_str) and expected.startswith('md5:'):
             got = 'md5:' + md5(got_dict.get(info_field))
+        elif isinstance(expected, compat_str) and expected.startswith('mincount:'):
+            got = got_dict.get(info_field)
+            self.assertTrue(
+                isinstance(got, list),
+                'Expected field %s to be a list, but it is of type %s' % (
+                    info_field, type(got).__name__))
+            expected_num = int(expected.partition(':')[2])
+            assertGreaterEqual(
+                self, len(got), expected_num,
+                'Expected %d items in field %s, but only got %d' % (
+                    expected_num, info_field, len(got)
+                )
+            )
+            continue
         else:
             got = got_dict.get(info_field)
             self.assertEqual(expected, got,

test/test_InfoExtractor.py

@@ -40,5 +40,23 @@ class TestInfoExtractor(unittest.TestCase):
         self.assertEqual(ie._og_search_description(html), 'Some video\'s description ')
         self.assertEqual(ie._og_search_thumbnail(html), 'http://domain.com/pic.jpg?key1=val1&key2=val2')
+    def test_html_search_meta(self):
+        ie = self.ie
+        html = '''
+            <meta name="a" content="1" />
+            <meta name='b' content='2'>
+            <meta name="c" content='3'>
+            <meta name=d content='4'>
+            <meta property="e" content='5' >
+            <meta content="6" name="f">
+        '''
+        self.assertEqual(ie._html_search_meta('a', html), '1')
+        self.assertEqual(ie._html_search_meta('b', html), '2')
+        self.assertEqual(ie._html_search_meta('c', html), '3')
+        self.assertEqual(ie._html_search_meta('d', html), '4')
+        self.assertEqual(ie._html_search_meta('e', html), '5')
+        self.assertEqual(ie._html_search_meta('f', html), '6')
 if __name__ == '__main__':
     unittest.main()

test/test_YoutubeDL.py

@@ -8,6 +8,8 @@ import sys
 import unittest
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+import copy
 from test.helper import FakeYDL, assertRegexpMatches
 from youtube_dl import YoutubeDL
 from youtube_dl.extractor import YoutubeIE
@@ -192,6 +194,37 @@ class TestFormatSelection(unittest.TestCase):
         downloaded = ydl.downloaded_info_dicts[0]
         self.assertEqual(downloaded['format_id'], 'vid-high')
+    def test_format_selection_audio_exts(self):
+        formats = [
+            {'format_id': 'mp3-64', 'ext': 'mp3', 'abr': 64, 'url': 'http://_', 'vcodec': 'none'},
+            {'format_id': 'ogg-64', 'ext': 'ogg', 'abr': 64, 'url': 'http://_', 'vcodec': 'none'},
+            {'format_id': 'aac-64', 'ext': 'aac', 'abr': 64, 'url': 'http://_', 'vcodec': 'none'},
+            {'format_id': 'mp3-32', 'ext': 'mp3', 'abr': 32, 'url': 'http://_', 'vcodec': 'none'},
+            {'format_id': 'aac-32', 'ext': 'aac', 'abr': 32, 'url': 'http://_', 'vcodec': 'none'},
+        ]
+        info_dict = _make_result(formats)
+        ydl = YDL({'format': 'best'})
+        ie = YoutubeIE(ydl)
+        ie._sort_formats(info_dict['formats'])
+        ydl.process_ie_result(copy.deepcopy(info_dict))
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'aac-64')
+        ydl = YDL({'format': 'mp3'})
+        ie = YoutubeIE(ydl)
+        ie._sort_formats(info_dict['formats'])
+        ydl.process_ie_result(copy.deepcopy(info_dict))
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'mp3-64')
+        ydl = YDL({'prefer_free_formats': True})
+        ie = YoutubeIE(ydl)
+        ie._sort_formats(info_dict['formats'])
+        ydl.process_ie_result(copy.deepcopy(info_dict))
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'ogg-64')
     def test_format_selection_video(self):
         formats = [
             {'format_id': 'dash-video-low', 'ext': 'mp4', 'preference': 1, 'acodec': 'none', 'url': '_'},
@@ -218,7 +251,7 @@ class TestFormatSelection(unittest.TestCase):
             # 3D
             '85', '84', '102', '83', '101', '82', '100',
             # Dash video
-            '138', '137', '248', '136', '247', '135', '246',
+            '137', '248', '136', '247', '135', '246',
             '245', '244', '134', '243', '133', '242', '160',
             # Dash audio
             '141', '172', '140', '171', '139',
@@ -248,6 +281,61 @@ class TestFormatSelection(unittest.TestCase):
         downloaded = ydl.downloaded_info_dicts[0]
         self.assertEqual(downloaded['format_id'], f1id)
+    def test_format_filtering(self):
+        formats = [
+            {'format_id': 'A', 'filesize': 500, 'width': 1000},
+            {'format_id': 'B', 'filesize': 1000, 'width': 500},
+            {'format_id': 'C', 'filesize': 1000, 'width': 400},
+            {'format_id': 'D', 'filesize': 2000, 'width': 600},
+            {'format_id': 'E', 'filesize': 3000},
+            {'format_id': 'F'},
+            {'format_id': 'G', 'filesize': 1000000},
+        ]
+        for f in formats:
+            f['url'] = 'http://_/'
+            f['ext'] = 'unknown'
+        info_dict = _make_result(formats)
+        ydl = YDL({'format': 'best[filesize<3000]'})
+        ydl.process_ie_result(info_dict)
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'D')
+        ydl = YDL({'format': 'best[filesize<=3000]'})
+        ydl.process_ie_result(info_dict)
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'E')
+        ydl = YDL({'format': 'best[filesize <= ? 3000]'})
+        ydl.process_ie_result(info_dict)
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'F')
+        ydl = YDL({'format': 'best [filesize = 1000] [width>450]'})
+        ydl.process_ie_result(info_dict)
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'B')
+        ydl = YDL({'format': 'best [filesize = 1000] [width!=450]'})
+        ydl.process_ie_result(info_dict)
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'C')
+        ydl = YDL({'format': '[filesize>?1]'})
+        ydl.process_ie_result(info_dict)
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'G')
+        ydl = YDL({'format': '[filesize<1M]'})
+        ydl.process_ie_result(info_dict)
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'E')
+        ydl = YDL({'format': '[filesize<1MiB]'})
+        ydl.process_ie_result(info_dict)
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'G')
     def test_add_extra_info(self):
         test_dict = {
             'extractor': 'Foo',

test/test_age_restriction.py

@@ -45,11 +45,6 @@ class TestAgeRestriction(unittest.TestCase):
             'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
             '505835.mp4', 2, old_age=25)
-    def test_pornotube(self):
-        self._assert_restricted(
-            'http://pornotube.com/c/173/m/1689755/Marilyn-Monroe-Bathing',
-            '1689755.flv', 13)
 if __name__ == '__main__':
     unittest.main()

test/test_all_urls.py

@@ -14,7 +14,6 @@ from test.helper import gettestcases
 from youtube_dl.extractor import (
     FacebookIE,
     gen_extractors,
-    TwitchIE,
     YoutubeIE,
 )
@@ -72,18 +71,6 @@ class TestAllURLsMatching(unittest.TestCase):
         self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
         self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
-    def test_twitch_channelid_matching(self):
-        self.assertTrue(TwitchIE.suitable('twitch.tv/vanillatv'))
-        self.assertTrue(TwitchIE.suitable('www.twitch.tv/vanillatv'))
-        self.assertTrue(TwitchIE.suitable('http://www.twitch.tv/vanillatv'))
-        self.assertTrue(TwitchIE.suitable('http://www.twitch.tv/vanillatv/'))
-    def test_twitch_videoid_matching(self):
-        self.assertTrue(TwitchIE.suitable('http://www.twitch.tv/vanillatv/b/328087483'))
-    def test_twitch_chapterid_matching(self):
-        self.assertTrue(TwitchIE.suitable('http://www.twitch.tv/tsm_theoddone/c/2349361'))
     def test_youtube_extract(self):
         assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
         assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
@@ -115,8 +102,6 @@ class TestAllURLsMatching(unittest.TestCase):
         self.assertMatch(':ythistory', ['youtube:history'])
         self.assertMatch(':thedailyshow', ['ComedyCentralShows'])
         self.assertMatch(':tds', ['ComedyCentralShows'])
-        self.assertMatch(':colbertreport', ['ComedyCentralShows'])
-        self.assertMatch(':cr', ['ComedyCentralShows'])
     def test_vimeo_matching(self):
         self.assertMatch('http://vimeo.com/channels/tributes', ['vimeo:channel'])

test/test_subtitles.py

@@ -17,6 +17,7 @@ from youtube_dl.extractor import (
     TEDIE,
     VimeoIE,
     WallaIE,
+    CeskaTelevizeIE,
 )
@@ -317,5 +318,32 @@ class TestWallaSubtitles(BaseTestSubtitles):
         self.assertEqual(len(subtitles), 0)
+class TestCeskaTelevizeSubtitles(BaseTestSubtitles):
+    url = 'http://www.ceskatelevize.cz/ivysilani/10600540290-u6-uzasny-svet-techniky'
+    IE = CeskaTelevizeIE
+    def test_list_subtitles(self):
+        self.DL.expect_warning('Automatic Captions not supported by this server')
+        self.DL.params['listsubtitles'] = True
+        info_dict = self.getInfoDict()
+        self.assertEqual(info_dict, None)
+    def test_allsubtitles(self):
+        self.DL.expect_warning('Automatic Captions not supported by this server')
+        self.DL.params['writesubtitles'] = True
+        self.DL.params['allsubtitles'] = True
+        subtitles = self.getSubtitles()
+        self.assertEqual(set(subtitles.keys()), set(['cs']))
+        self.assertEqual(md5(subtitles['cs']), '9bf52d9549533c32c427e264bf0847d4')
+    def test_nosubtitles(self):
+        self.DL.expect_warning('video doesn\'t have subtitles')
+        self.url = 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220'
+        self.DL.params['writesubtitles'] = True
+        self.DL.params['allsubtitles'] = True
+        subtitles = self.getSubtitles()
+        self.assertEqual(len(subtitles), 0)
 if __name__ == '__main__':
     unittest.main()

test/test_utils.py

@@ -16,6 +16,7 @@ import json
 import xml.etree.ElementTree
 from youtube_dl.utils import (
+    age_restricted,
     args_to_str,
     clean_html,
     DateRange,
@@ -78,6 +79,10 @@ class TestUtil(unittest.TestCase):
         tests = '\u043a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430'
         self.assertEqual(sanitize_filename(tests), tests)
+        self.assertEqual(
+            sanitize_filename('New World record at 0:12:34'),
+            'New World record at 0_12_34')
         forbidden = '"\0\\/'
         for fc in forbidden:
             for fbc in forbidden:
@@ -143,6 +148,7 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(unified_strdate('8/7/2009'), '20090708')
         self.assertEqual(unified_strdate('Dec 14, 2012'), '20121214')
         self.assertEqual(unified_strdate('2012/10/11 01:56:38 +0000'), '20121011')
+        self.assertEqual(unified_strdate('1968 12 10'), '19681210')
         self.assertEqual(unified_strdate('1968-12-10'), '19681210')
         self.assertEqual(unified_strdate('28/01/2014 21:00:00 +0100'), '20140128')
         self.assertEqual(
@@ -207,6 +213,8 @@ class TestUtil(unittest.TestCase):
     def test_parse_duration(self):
         self.assertEqual(parse_duration(None), None)
+        self.assertEqual(parse_duration(False), None)
+        self.assertEqual(parse_duration('invalid'), None)
         self.assertEqual(parse_duration('1'), 1)
         self.assertEqual(parse_duration('1337:12'), 80232)
         self.assertEqual(parse_duration('9:12:43'), 33163)
@@ -402,5 +410,12 @@ Trying to open render node...
 Success at /dev/dri/renderD128.
 ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
+    def test_age_restricted(self):
+        self.assertFalse(age_restricted(None, 10))  # unrestricted content
+        self.assertFalse(age_restricted(1, None))  # unrestricted policy
+        self.assertFalse(age_restricted(8, 10))
+        self.assertTrue(age_restricted(18, 14))
+        self.assertFalse(age_restricted(18, 18))
 if __name__ == '__main__':
     unittest.main()

youtube_dl/YoutubeDL.py

@@ -10,6 +10,7 @@ import io
 import itertools
 import json
 import locale
+import operator
 import os
 import platform
 import re
@@ -49,6 +50,7 @@ from .utils import (
     make_HTTPS_handler,
     MaxDownloadsReached,
     PagedList,
+    parse_filesize,
     PostProcessingError,
     platform_name,
     preferredencoding,
@@ -58,17 +60,20 @@ from .utils import (
     takewhile_inclusive,
     UnavailableVideoError,
     url_basename,
+    version_tuple,
     write_json_file,
     write_string,
     YoutubeDLHandler,
     prepend_extension,
     args_to_str,
+    age_restricted,
 )
 from .cache import Cache
 from .extractor import get_info_extractor, gen_extractors
 from .downloader import get_suitable_downloader
 from .downloader.rtmp import rtmpdump_version
 from .postprocessor import (
+    FFmpegFixupStretchedPP,
     FFmpegMergerPP,
     FFmpegPostProcessor,
     get_postprocessor,
@@ -202,6 +207,16 @@ class YoutubeDL(object):
     Progress hooks are guaranteed to be called at least once
     (with status "finished") if the download is successful.
+    merge_output_format: Extension to use when merging formats.
+    fixup: Automatically correct known faults of the file.
+        One of:
+        - "never": do nothing
+        - "warn": only emit a warning
+        - "detect_or_warn": check whether we can do anything
+          about it, warn otherwise
+    source_address: (Experimental) Client-side IP address to bind to.
+    call_home: Boolean, true iff we are allowed to contact the
+        youtube-dl servers for debugging.
 The following parameters are not used by YoutubeDL itself, they are used by
@@ -550,13 +565,8 @@ class YoutubeDL(object):
         max_views = self.params.get('max_views')
         if max_views is not None and view_count > max_views:
             return 'Skipping %s, because it has exceeded the maximum view count (%d/%d)' % (video_title, view_count, max_views)
-        age_limit = self.params.get('age_limit')
-        if age_limit is not None:
-            actual_age_limit = info_dict.get('age_limit')
-            if actual_age_limit is None:
-                actual_age_limit = 0
-            if age_limit < actual_age_limit:
-                return 'Skipping "' + title + '" because it is age restricted'
+        if age_restricted(info_dict.get('age_limit'), self.params.get('age_limit')):
+            return 'Skipping "%s" because it is age restricted' % title
         if self.in_download_archive(info_dict):
             return '%s has already been recorded in archive' % video_title
         return None
@@ -760,7 +770,59 @@ class YoutubeDL(object):
         else:
             raise Exception('Invalid result type: %s' % result_type)
+    def _apply_format_filter(self, format_spec, available_formats):
+        " Returns a tuple of the remaining format_spec and filtered formats "
+        OPERATORS = {
+            '<': operator.lt,
+            '<=': operator.le,
+            '>': operator.gt,
+            '>=': operator.ge,
+            '=': operator.eq,
+            '!=': operator.ne,
+        }
+        operator_rex = re.compile(r'''(?x)\s*\[
+            (?P<key>width|height|tbr|abr|vbr|filesize)
+            \s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
+            (?P<value>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)
+            \]$
+            ''' % '|'.join(map(re.escape, OPERATORS.keys())))
+        m = operator_rex.search(format_spec)
+        if not m:
+            raise ValueError('Invalid format specification %r' % format_spec)
+        try:
+            comparison_value = int(m.group('value'))
+        except ValueError:
+            comparison_value = parse_filesize(m.group('value'))
+            if comparison_value is None:
+                comparison_value = parse_filesize(m.group('value') + 'B')
+            if comparison_value is None:
+                raise ValueError(
+                    'Invalid value %r in format specification %r' % (
+                        m.group('value'), format_spec))
+        op = OPERATORS[m.group('op')]
+        def _filter(f):
+            actual_value = f.get(m.group('key'))
+            if actual_value is None:
+                return m.group('none_inclusive')
+            return op(actual_value, comparison_value)
+        new_formats = [f for f in available_formats if _filter(f)]
+        new_format_spec = format_spec[:-len(m.group(0))]
+        if not new_format_spec:
+            new_format_spec = 'best'
+        return (new_format_spec, new_formats)
     def select_format(self, format_spec, available_formats):
+        while format_spec.endswith(']'):
+            format_spec, available_formats = self._apply_format_filter(
+                format_spec, available_formats)
+        if not available_formats:
+            return None
         if format_spec == 'best' or format_spec is None:
             return available_formats[-1]
         elif format_spec == 'worst':
@@ -790,7 +852,7 @@ class YoutubeDL(object):
             if video_formats:
                 return video_formats[0]
         else:
-            extensions = ['mp4', 'flv', 'webm', '3gp', 'm4a']
+            extensions = ['mp4', 'flv', 'webm', '3gp', 'm4a', 'mp3', 'ogg', 'aac', 'wav']
             if format_spec in extensions:
                 filter_f = lambda f: f['ext'] == format_spec
             else:
@@ -913,10 +975,24 @@ class YoutubeDL(object):
                         'contain the video, try using '
                         '"-f %s+%s"' % (format_2, format_1))
                     return
+                output_ext = (
+                    formats_info[0]['ext']
+                    if self.params.get('merge_output_format') is None
+                    else self.params['merge_output_format'])
                 selected_format = {
                     'requested_formats': formats_info,
                     'format': rf,
-                    'ext': formats_info[0]['ext'],
+                    'width': formats_info[0].get('width'),
+                    'height': formats_info[0].get('height'),
+                    'resolution': formats_info[0].get('resolution'),
+                    'fps': formats_info[0].get('fps'),
+                    'vcodec': formats_info[0].get('vcodec'),
+                    'vbr': formats_info[0].get('vbr'),
+                    'stretched_ratio': formats_info[0].get('stretched_ratio'),
+                    'acodec': formats_info[1].get('acodec'),
+                    'abr': formats_info[1].get('abr'),
+                    'ext': output_ext,
                 }
             else:
                 selected_format = None
@@ -1099,51 +1175,69 @@ class YoutubeDL(object):
(info_dict['thumbnail'], compat_str(err)))
if not self.params.get('skip_download', False):
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(filename)):
success = True
else:
try:
def dl(name, info):
fd = get_suitable_downloader(info)(self, self.params)
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
if self.params.get('verbose'):
self.to_stdout('[debug] Invoking downloader on %r' % info.get('url'))
return fd.download(name, info)
if info_dict.get('requested_formats') is not None:
downloaded = []
success = True
merger = FFmpegMergerPP(self, not self.params.get('keepvideo'))
if not merger._executable:
postprocessors = []
self.report_warning('You have requested multiple '
'formats but ffmpeg or avconv are not installed.'
' The formats won\'t be merged')
else:
postprocessors = [merger]
for f in info_dict['requested_formats']:
new_info = dict(info_dict)
new_info.update(f)
fname = self.prepare_filename(new_info)
fname = prepend_extension(fname, 'f%s' % f['format_id'])
downloaded.append(fname)
partial_success = dl(fname, new_info)
success = success and partial_success
info_dict['__postprocessors'] = postprocessors
info_dict['__files_to_merge'] = downloaded
else:
# Just a single file
success = dl(filename, info_dict)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_error('unable to download video data: %s' % str(err))
return
except (OSError, IOError) as err:
raise UnavailableVideoError(err)
except (ContentTooShortError, ) as err:
self.report_error('content too short (expected %s bytes and served %s)' % (err.expected, err.downloaded))
return
if success:
# Fixup content
stretched_ratio = info_dict.get('stretched_ratio')
if stretched_ratio is not None and stretched_ratio != 1:
fixup_policy = self.params.get('fixup')
if fixup_policy is None:
fixup_policy = 'detect_or_warn'
if fixup_policy == 'warn':
self.report_warning('%s: Non-uniform pixel ratio (%s)' % (
info_dict['id'], stretched_ratio))
elif fixup_policy == 'detect_or_warn':
stretched_pp = FFmpegFixupStretchedPP(self)
if stretched_pp.available:
info_dict.setdefault('__postprocessors', [])
info_dict['__postprocessors'].append(stretched_pp)
else:
self.report_warning(
'%s: Non-uniform pixel ratio (%s). Install ffmpeg or avconv to fix this automatically.' % (
info_dict['id'], stretched_ratio))
else:
assert fixup_policy == 'ignore'
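
The policy strings here map onto the --fixup option; per the branches above, warn only reports the problem, detect_or_warn (the default when the option is unset) queues FFmpegFixupStretchedPP when ffmpeg/avconv is available, and ignore does nothing:

youtube-dl --fixup warn URL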
try:
self.post_process(filename, info_dict)
except (PostProcessingError) as err:
@@ -1192,14 +1286,15 @@ class YoutubeDL(object):
"""Run all the postprocessors on the given file."""
info = dict(ie_info)
info['filepath'] = filename
keep_video = None
pps_chain = []
if ie_info.get('__postprocessors') is not None:
pps_chain.extend(ie_info['__postprocessors'])
pps_chain.extend(self._pps)
for pp in pps_chain:
keep_video = None
old_filename = info['filepath']
try:
keep_video_wish, new_info = pp.run(info)
keep_video_wish, info = pp.run(info)
if keep_video_wish is not None:
if keep_video_wish:
keep_video = keep_video_wish
@@ -1208,12 +1303,12 @@ class YoutubeDL(object):
keep_video = keep_video_wish
except PostProcessingError as e:
self.report_error(e.msg)
if keep_video is False and not self.params.get('keepvideo', False):
try:
self.to_screen('Deleting original file %s (pass -k to keep)' % filename)
os.remove(encodeFilename(filename))
except (IOError, OSError):
self.report_warning('Unable to remove downloaded video file')
if keep_video is False and not self.params.get('keepvideo', False):
try:
self.to_screen('Deleting original file %s (pass -k to keep)' % old_filename)
os.remove(encodeFilename(old_filename))
except (IOError, OSError):
self.report_warning('Unable to remove downloaded video file')
def _make_archive_id(self, info_dict):
# Future-proof against any change in case
@@ -1333,7 +1428,9 @@ class YoutubeDL(object):
formats = info_dict.get('formats', [info_dict])
idlen = max(len('format code'),
max(len(f['format_id']) for f in formats))
formats_s = [line(f, idlen) for f in formats]
formats_s = [
line(f, idlen) for f in formats
if f.get('preference') is None or f['preference'] >= -1000]
if len(formats) > 1:
formats_s[0] += (' ' if self._format_note(formats[0]) else '') + '(worst)'
formats_s[-1] += (' ' if self._format_note(formats[-1]) else '') + '(best)'
@@ -1422,6 +1519,17 @@ class YoutubeDL(object):
proxy_map.update(handler.proxies)
self._write_string('[debug] Proxy map: ' + compat_str(proxy_map) + '\n')
if self.params.get('call_home', False):
ipaddr = self.urlopen('https://yt-dl.org/ip').read().decode('utf-8')
self._write_string('[debug] Public IP address: %s\n' % ipaddr)
latest_version = self.urlopen(
'https://yt-dl.org/latest/version').read().decode('utf-8')
if version_tuple(latest_version) > version_tuple(__version__):
self.report_warning(
'You are using an outdated version (newest version: %s)! '
'See https://yt-dl.org/update if you need help updating.' %
latest_version)
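
Both probes run only when call_home is set, i.e. with the opt-in --call-home flag added to the option handling below:

youtube-dl --call-home -v URL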
def _setup_opener(self):
timeout_val = self.params.get('socket_timeout')
self._socket_timeout = 600 if timeout_val is None else float(timeout_val)
@@ -1452,9 +1560,8 @@ class YoutubeDL(object):
proxy_handler = compat_urllib_request.ProxyHandler(proxies)
debuglevel = 1 if self.params.get('debug_printtraffic') else 0
https_handler = make_HTTPS_handler(
self.params.get('nocheckcertificate', False), debuglevel=debuglevel)
ydlh = YoutubeDLHandler(debuglevel=debuglevel)
https_handler = make_HTTPS_handler(self.params, debuglevel=debuglevel)
ydlh = YoutubeDLHandler(self.params, debuglevel=debuglevel)
opener = compat_urllib_request.build_opener(
https_handler, proxy_handler, cookie_processor, ydlh)
# Delete the default user-agent header, which would otherwise apply in

View File

@@ -38,7 +38,7 @@ from .update import update_self
from .downloader import (
FileDownloader,
)
from .extractor import gen_extractors
from .extractor import gen_extractors, list_extractors
from .YoutubeDL import YoutubeDL
@@ -95,17 +95,15 @@ def _real_main(argv=None):
_enc = preferredencoding()
all_urls = [url.decode(_enc, 'ignore') if isinstance(url, bytes) else url for url in all_urls]
extractors = gen_extractors()
if opts.list_extractors:
for ie in sorted(extractors, key=lambda ie: ie.IE_NAME.lower()):
for ie in list_extractors(opts.age_limit):
compat_print(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else ''))
matchedUrls = [url for url in all_urls if ie.suitable(url)]
for mu in matchedUrls:
compat_print(' ' + mu)
sys.exit(0)
if opts.list_extractor_descriptions:
for ie in sorted(extractors, key=lambda ie: ie.IE_NAME.lower()):
for ie in list_extractors(opts.age_limit):
if not ie._WORKING:
continue
desc = getattr(ie, 'IE_DESC', ie.IE_NAME)
@@ -168,6 +166,7 @@ def _real_main(argv=None):
if opts.recodevideo is not None:
if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg', 'mkv']:
parser.error('invalid video recode format specified')
if opts.date is not None:
date = DateRange.day(opts.date)
else:
@@ -199,7 +198,8 @@ def _real_main(argv=None):
' file! Use "{0}.%(ext)s" instead of "{0}" as the output'
' template'.format(outtmpl))
any_printing = opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.getduration or opts.dumpjson or opts.dump_single_json
any_getting = opts.geturl or opts.gettitle or opts.getid or opts.getthumbnail or opts.getdescription or opts.getfilename or opts.getformat or opts.getduration or opts.dumpjson or opts.dump_single_json
any_printing = opts.print_json
download_archive_fn = compat_expanduser(opts.download_archive) if opts.download_archive is not None else opts.download_archive
# PostProcessors
@@ -245,7 +245,7 @@ def _real_main(argv=None):
'password': opts.password,
'twofactor': opts.twofactor,
'videopassword': opts.videopassword,
'quiet': (opts.quiet or any_printing),
'quiet': (opts.quiet or any_getting or any_printing),
'no_warnings': opts.no_warnings,
'forceurl': opts.geturl,
'forcetitle': opts.gettitle,
@@ -255,9 +255,9 @@ def _real_main(argv=None):
'forceduration': opts.getduration,
'forcefilename': opts.getfilename,
'forceformat': opts.getformat,
'forcejson': opts.dumpjson,
'forcejson': opts.dumpjson or opts.print_json,
'dump_single_json': opts.dump_single_json,
'simulate': opts.simulate or any_printing,
'simulate': opts.simulate or any_getting,
'skip_download': opts.skip_download,
'format': opts.format,
'format_limit': opts.format_limit,
@@ -324,7 +324,11 @@ def _real_main(argv=None):
'encoding': opts.encoding,
'exec_cmd': opts.exec_cmd,
'extract_flat': opts.extract_flat,
'merge_output_format': opts.merge_output_format,
'postprocessors': postprocessors,
'fixup': opts.fixup,
'source_address': opts.source_address,
'call_home': opts.call_home,
}
with YoutubeDL(ydl_opts) as ydl:
@@ -365,3 +369,5 @@ def main(argv=None):
sys.exit('ERROR: fixed output name but more than one file to download')
except KeyboardInterrupt:
sys.exit('\nERROR: Interrupted by user')
__all__ = ['main', 'YoutubeDL', 'gen_extractors', 'list_extractors']
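
With list_extractors now exported next to YoutubeDL, embedding keeps the shape used in _real_main above; a minimal example (the URL is the project's usual test video, the options dict is a placeholder):

from youtube_dl import YoutubeDL

ydl_opts = {'format': 'best'}  # any of the option keys shown above
with YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])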

View File

@@ -4,6 +4,7 @@ import getpass
import optparse
import os
import re
import socket
import subprocess
import sys
@@ -307,6 +308,32 @@ else:
compat_kwargs = lambda kwargs: kwargs
if sys.version_info < (2, 7):
def compat_socket_create_connection(address, timeout, source_address=None):
host, port = address
err = None
for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
af, socktype, proto, canonname, sa = res
sock = None
try:
sock = socket.socket(af, socktype, proto)
sock.settimeout(timeout)
if source_address:
sock.bind(source_address)
sock.connect(sa)
return sock
except socket.error as _:
err = _
if sock is not None:
sock.close()
if err is not None:
raise err
else:
raise socket.error("getaddrinfo returns an empty list")
else:
compat_socket_create_connection = socket.create_connection
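
The backport exists because socket.create_connection() on Python 2.6 lacks the source_address parameter that the new --source-address option relies on; a hedged usage sketch (addresses are placeholders, the call is left commented out):

from youtube_dl.compat import compat_socket_create_connection

# Bind the outgoing connection to a specific local address:
# sock = compat_socket_create_connection(
#     ('example.com', 80), timeout=10, source_address=('192.0.2.10', 0))
# sock.close()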
# Fix https://github.com/rg3/youtube-dl/issues/4223
# See http://bugs.python.org/issue9161 for what is broken
def workaround_optparse_bug9161():
@@ -342,6 +369,7 @@ __all__ = [
'compat_ord',
'compat_parse_qs',
'compat_print',
'compat_socket_create_connection',
'compat_str',
'compat_subprocess_get_DEVNULL',
'compat_urllib_error',

View File

@@ -284,8 +284,19 @@ class FileDownloader(object):
"""Download to a filename using the info from info_dict
Return True on success and False otherwise
"""
nooverwrites_and_exists = (
self.params.get('nooverwrites', False)
and os.path.exists(encodeFilename(filename))
)
continuedl_and_exists = (
self.params.get('continuedl', False)
and os.path.isfile(encodeFilename(filename))
and not self.params.get('nopart', False)
)
# Check file already present
if filename != '-' and self.params.get('continuedl', False) and os.path.isfile(encodeFilename(filename)) and not self.params.get('nopart', False):
if filename != '-' and (nooverwrites_and_exists or continuedl_and_exists):
self.report_file_already_downloaded(filename)
self._hook_progress({
'filename': filename,

View File

@@ -187,24 +187,34 @@ def build_fragments_list(boot_info):
return res
def write_flv_header(stream, metadata):
"""Writes the FLV header and the metadata to stream"""
def write_unsigned_int(stream, val):
stream.write(struct_pack('!I', val))
def write_unsigned_int_24(stream, val):
stream.write(struct_pack('!I', val)[1:])
def write_flv_header(stream):
"""Writes the FLV header to stream"""
# FLV header
stream.write(b'FLV\x01')
stream.write(b'\x05')
stream.write(b'\x00\x00\x00\x09')
# FLV File body
stream.write(b'\x00\x00\x00\x00')
# FLVTAG
# Script data
stream.write(b'\x12')
# Size of the metadata with 3 bytes
stream.write(struct_pack('!L', len(metadata))[1:])
stream.write(b'\x00\x00\x00\x00\x00\x00\x00')
stream.write(metadata)
# Magic numbers extracted from the output files produced by AdobeHDS.php
# (https://github.com/K-S-V/Scripts)
stream.write(b'\x00\x00\x01\x73')
def write_metadata_tag(stream, metadata):
"""Writes optional metadata tag to stream"""
SCRIPT_TAG = b'\x12'
FLV_TAG_HEADER_LEN = 11
if metadata:
stream.write(SCRIPT_TAG)
write_unsigned_int_24(stream, len(metadata))
stream.write(b'\x00\x00\x00\x00\x00\x00\x00')
stream.write(metadata)
write_unsigned_int(stream, FLV_TAG_HEADER_LEN + len(metadata))
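
A quick sanity check of what the two helpers emit: the 9-byte FLV header plus PreviousTagSize0, and the 3-byte big-endian length used for the script tag; a standalone sketch:

import struct

header = b'FLV\x01' + b'\x05' + b'\x00\x00\x00\x09' + b'\x00\x00\x00\x00'
print(len(header))                # 13 bytes before the first tag (9-byte header + 4-byte PreviousTagSize0)
print(struct.pack('!I', 11)[1:])  # b'\x00\x00\x0b': a 24-bit length, as in write_unsigned_int_24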
def _add_ns(prop):
@@ -256,7 +266,11 @@ class F4mFD(FileDownloader):
bootstrap = self.ydl.urlopen(bootstrap_url).read()
else:
bootstrap = base64.b64decode(bootstrap_node.text)
metadata = base64.b64decode(media.find(_add_ns('metadata')).text)
metadata_node = media.find(_add_ns('metadata'))
if metadata_node is not None:
metadata = base64.b64decode(metadata_node.text)
else:
metadata = None
boot_info = read_bootstrap_info(bootstrap)
fragments_list = build_fragments_list(boot_info)
@@ -269,7 +283,8 @@ class F4mFD(FileDownloader):
tmpfilename = self.temp_name(filename)
(dest_stream, tmpfilename) = sanitize_open(tmpfilename, 'wb')
write_flv_header(dest_stream, metadata)
write_flv_header(dest_stream)
write_metadata_tag(dest_stream, metadata)
# This dict stores the download progress, it's updated by the progress
# hook

View File

@@ -11,7 +11,6 @@ from ..compat import (
compat_urllib_request,
)
from ..utils import (
check_executable,
encodeFilename,
)
@@ -27,16 +26,13 @@ class HlsFD(FileDownloader):
'-bsf:a', 'aac_adtstoasc',
encodeFilename(tmpfilename, for_subprocess=True)]
for program in ['avconv', 'ffmpeg']:
if check_executable(program, ['-version']):
break
else:
ffpp = FFmpegPostProcessor(downloader=self)
program = ffpp._executable
if program is None:
self.report_error('m3u8 download detected but ffmpeg or avconv could not be found. Please install one.')
return False
ffpp.check_version()
cmd = [program] + args
retval = subprocess.call(cmd)
if retval == 0:

View File

@@ -4,8 +4,8 @@ import os
import subprocess
from .common import FileDownloader
from ..compat import compat_subprocess_get_DEVNULL
from ..utils import (
check_executable,
encodeFilename,
)
@@ -20,11 +20,7 @@ class MplayerFD(FileDownloader):
'mplayer', '-really-quiet', '-vo', 'null', '-vc', 'dummy',
'-dumpstream', '-dumpfile', tmpfilename, url]
# Check for mplayer first
try:
subprocess.call(
['mplayer', '-h'],
stdout=compat_subprocess_get_DEVNULL(), stderr=subprocess.STDOUT)
except (OSError, IOError):
if not check_executable('mplayer', ['-h']):
self.report_error('MMS or RTSP download detected but "%s" could not be run' % args[0])
return False

View File

@@ -1,6 +1,7 @@
from __future__ import unicode_literals
from .abc import ABCIE
from .abc7news import Abc7NewsIE
from .academicearth import AcademicEarthCourseIE
from .addanime import AddAnimeIE
from .adobetv import AdobeTVIE
@@ -26,7 +27,8 @@ from .arte import (
ArteTVEmbedIE,
)
from .atresplayer import AtresPlayerIE
from .audiomack import AudiomackIE
from .atttechchannel import ATTTechChannelIE
from .audiomack import AudiomackIE, AudiomackAlbumIE
from .auengine import AUEngineIE
from .azubu import AzubuIE
from .bambuser import BambuserIE, BambuserChannelIE
@@ -69,6 +71,7 @@ from .cnn import (
CNNArticleIE,
)
from .collegehumor import CollegeHumorIE
from .collegerama import CollegeRamaIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE
@@ -91,6 +94,7 @@ from .deezer import DeezerPlaylistIE
from .dfb import DFBIE
from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .drbonanza import DRBonanzaIE
from .drtuber import DrTuberIE
from .drtv import DRTVIE
from .dvtv import DVTVIE
@@ -159,6 +163,7 @@ from .gametrailers import GametrailersIE
from .gdcvault import GDCVaultIE
from .generic import GenericIE
from .giantbomb import GiantBombIE
from .giga import GigaIE
from .glide import GlideIE
from .globo import GloboIE
from .godtube import GodTubeIE
@@ -171,6 +176,7 @@ from .goshgay import GoshgayIE
from .grooveshark import GroovesharkIE
from .groupon import GrouponIE
from .hark import HarkIE
from .hearthisat import HearThisAtIE
from .heise import HeiseIE
from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE
@@ -205,6 +211,7 @@ from .jove import JoveIE
from .jukebox import JukeboxIE
from .jpopsukitv import JpopsukiIE
from .kankan import KankanIE
from .karaoketv import KaraoketvIE
from .keezmovies import KeezMoviesIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
@@ -221,6 +228,7 @@ from .livestream import (
LivestreamOriginalIE,
LivestreamShortenerIE,
)
from .lnkgo import LnkGoIE
from .lrt import LRTIE
from .lynda import (
LyndaIE,
@@ -273,6 +281,7 @@ from .nbc import (
)
from .ndr import NDRIE
from .ndtv import NDTVIE
from .netzkino import NetzkinoIE
from .nerdcubed import NerdCubedFeedIE
from .newgrounds import NewgroundsIE
from .newstube import NewstubeIE
@@ -289,6 +298,7 @@ from .nowness import NownessIE
from .nowvideo import NowVideoIE
from .npo import (
NPOIE,
NPOLiveIE,
TegenlichtVproIE,
)
from .nrk import (
@@ -325,6 +335,7 @@ from .prosiebensat1 import ProSiebenSat1IE
from .pyvideo import PyvideoIE
from .quickvid import QuickVidIE
from .radiode import RadioDeIE
from .radiobremen import RadioBremenIE
from .radiofrance import RadioFranceIE
from .rai import RaiIE
from .rbmaradio import RBMARadioIE
@@ -336,6 +347,7 @@ from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtbf import RTBFIE
from .rte import RteIE
from .rtlnl import RtlXlIE
from .rtlnow import RTLnowIE
from .rtp import RTPIE
@@ -345,6 +357,7 @@ from .ruhd import RUHDIE
from .rutube import (
RutubeIE,
RutubeChannelIE,
RutubeEmbedIE,
RutubeMovieIE,
RutubePersonIE,
)
@@ -397,6 +410,7 @@ from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE
from .sunporno import SunPornoIE
from .swrmediathek import SWRMediathekIE
from .syfy import SyfyIE
@@ -418,6 +432,7 @@ from .telemb import TeleMBIE
from .teletask import TeleTaskIE
from .tenplay import TenPlayIE
from .testurl import TestURLIE
from .testtube import TestTubeIE
from .tf1 import TF1IE
from .theonion import TheOnionIE
from .theplatform import ThePlatformIE
@@ -443,10 +458,17 @@ from .tunein import TuneInIE
from .turbo import TurboIE
from .tutv import TutvIE
from .tvigle import TvigleIE
from .tvp import TvpIE
from .tvp import TvpIE, TvpSeriesIE
from .tvplay import TVPlayIE
from .twentyfourvideo import TwentyFourVideoIE
from .twitch import TwitchIE
from .twitch import (
TwitchVideoIE,
TwitchChapterIE,
TwitchVodIE,
TwitchProfileIE,
TwitchPastBroadcastsIE,
TwitchStreamIE,
)
from .ubu import UbuIE
from .udemy import (
UdemyIE,
@@ -510,6 +532,7 @@ from .wdr import (
WDRMobileIE,
WDRMausIE,
)
from .webofstories import WebOfStoriesIE
from .weibo import WeiboIE
from .wimp import WimpIE
from .wistia import WistiaIE
@@ -545,7 +568,6 @@ from .youtube import (
YoutubeSearchURLIE,
YoutubeShowIE,
YoutubeSubscriptionsIE,
YoutubeTopListIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
YoutubeUserIE,
@@ -572,6 +594,17 @@ def gen_extractors():
return [klass() for klass in _ALL_CLASSES]
def list_extractors(age_limit):
"""
Return a list of extractors that are suitable for the given age,
sorted by extractor ID.
"""
return sorted(
filter(lambda ie: ie.is_suitable(age_limit), gen_extractors()),
key=lambda ie: ie.IE_NAME.lower())
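
A hedged usage sketch of the new helper (passing None disables the age filter, mirroring age_restricted):

from youtube_dl.extractor import list_extractors

for ie in list_extractors(age_limit=18):
    print(ie.IE_NAME)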
def get_info_extractor(ie_name):
"""Returns the info extractor class with the given ie_name"""
return globals()[ie_name + 'IE']

View File

@@ -0,0 +1,68 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import parse_iso8601
class Abc7NewsIE(InfoExtractor):
_VALID_URL = r'https?://abc7news\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
_TESTS = [
{
'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
'info_dict': {
'id': '472581',
'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
'ext': 'mp4',
'title': 'East Bay museum celebrates history of synthesized music',
'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1421123075,
'upload_date': '20150113',
'uploader': 'Jonathan Bloom',
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://abc7news.com/472581',
'only_matching': True,
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
m3u8 = self._html_search_meta(
'contentURL', webpage, 'm3u8 url', fatal=True)
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
self._sort_formats(formats)
title = self._og_search_title(webpage).strip()
description = self._og_search_description(webpage).strip()
thumbnail = self._og_search_thumbnail(webpage)
timestamp = parse_iso8601(self._search_regex(
r'<div class="meta">\s*<time class="timeago" datetime="([^"]+)">',
webpage, 'upload date', fatal=False))
uploader = self._search_regex(
r'rel="author">([^<]+)</a>',
webpage, 'uploader', default=None)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'uploader': uploader,
'formats': formats,
}

View File

@@ -4,9 +4,12 @@ import time
import hmac
from .common import InfoExtractor
from ..utils import (
from ..compat import (
compat_str,
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
int_or_none,
float_or_none,
xpath_text,
@@ -44,6 +47,33 @@ class AtresPlayerIE(InfoExtractor):
_PLAYER_URL_TEMPLATE = 'https://servicios.atresplayer.com/episode/getplayer.json?episodePk=%s'
_EPISODE_URL_TEMPLATE = 'http://www.atresplayer.com/episodexml/%s'
_LOGIN_URL = 'https://servicios.atresplayer.com/j_spring_security_check'
def _real_initialize(self):
self._login()
def _login(self):
(username, password) = self._get_login_info()
if username is None:
return
login_form = {
'j_username': username,
'j_password': password,
}
request = compat_urllib_request.Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = self._download_webpage(
request, None, 'Logging in as %s' % username)
error = self._html_search_regex(
r'(?s)<ul class="list_error">(.+?)</ul>', response, 'error', default=None)
if error:
raise ExtractorError(
'Unable to login: %s' % error, expected=True)
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -0,0 +1,55 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_strdate
class ATTTechChannelIE(InfoExtractor):
_VALID_URL = r'https?://techchannel\.att\.com/play-video\.cfm/([^/]+/)*(?P<id>.+)'
_TEST = {
'url': 'http://techchannel.att.com/play-video.cfm/2014/1/27/ATT-Archives-The-UNIX-System-Making-Computers-Easier-to-Use',
'info_dict': {
'id': '11316',
'display_id': 'ATT-Archives-The-UNIX-System-Making-Computers-Easier-to-Use',
'ext': 'flv',
'title': 'AT&T Archives : The UNIX System: Making Computers Easier to Use',
'description': 'A 1982 film about UNIX is the foundation for software in use around Bell Labs and AT&T.',
'thumbnail': 're:^https?://.*\.jpg$',
'upload_date': '20140127',
},
'params': {
# rtmp download
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_url = self._search_regex(
r"url\s*:\s*'(rtmp://[^']+)'",
webpage, 'video URL')
video_id = self._search_regex(
r'mediaid\s*=\s*(\d+)',
webpage, 'video id', fatal=False)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
upload_date = unified_strdate(self._search_regex(
r'[Rr]elease\s+date:\s*(\d{1,2}/\d{1,2}/\d{4})',
webpage, 'upload date', fatal=False), False)
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'ext': 'flv',
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
}

View File

@@ -1,11 +1,15 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import time
from .common import InfoExtractor
from .soundcloud import SoundcloudIE
from ..utils import ExtractorError
import time
from ..utils import (
ExtractorError,
url_basename,
)
class AudiomackIE(InfoExtractor):
@@ -17,12 +21,13 @@ class AudiomackIE(InfoExtractor):
'url': 'http://www.audiomack.com/song/roosh-williams/extraordinary',
'info_dict':
{
'id': 'roosh-williams/extraordinary',
'id': '310086',
'ext': 'mp3',
'title': 'Roosh Williams - Extraordinary'
'uploader': 'Roosh Williams',
'title': 'Extraordinary'
}
},
# hosted on soundcloud via audiomack
# audiomack wrapper around soundcloud song
{
'add_ie': ['Soundcloud'],
'url': 'http://www.audiomack.com/song/xclusiveszone/take-kare',
@@ -38,32 +43,97 @@ class AudiomackIE(InfoExtractor):
]
def _real_extract(self, url):
video_id = self._match_id(url)
# URLs end with [uploader name]/[uploader title]
# this title is whatever the user types in, and is rarely
# the proper song title. Real metadata is in the api response
album_url_tag = self._match_id(url)
# Request the extended version of the api for extra fields like artist and title
api_response = self._download_json(
"http://www.audiomack.com/api/music/url/song/%s?_=%d" % (
video_id, time.time()),
video_id)
'http://www.audiomack.com/api/music/url/song/%s?extended=1&_=%d' % (
album_url_tag, time.time()),
album_url_tag)
if "url" not in api_response:
raise ExtractorError("Unable to deduce api url of song")
realurl = api_response["url"]
# API is inconsistent with errors
if 'url' not in api_response or not api_response['url'] or 'error' in api_response:
raise ExtractorError('Invalid url %s' % url)
# Audiomack wraps a lot of soundcloud tracks in their branded wrapper
# - if so, pass the work off to the soundcloud extractor
if SoundcloudIE.suitable(realurl):
return {'_type': 'url', 'url': realurl, 'ie_key': 'Soundcloud'}
webpage = self._download_webpage(url, video_id)
artist = self._html_search_regex(
r'<span class="artist">(.*?)</span>', webpage, "artist")
songtitle = self._html_search_regex(
r'<h1 class="profile-title song-title"><span class="artist">.*?</span>(.*?)</h1>',
webpage, "title")
title = artist + " - " + songtitle
# if so, pass the work off to the soundcloud extractor
if SoundcloudIE.suitable(api_response['url']):
return {'_type': 'url', 'url': api_response['url'], 'ie_key': 'Soundcloud'}
return {
'id': video_id,
'title': title,
'url': realurl,
'id': api_response.get('id', album_url_tag),
'uploader': api_response.get('artist'),
'title': api_response.get('title'),
'url': api_response['url'],
}
class AudiomackAlbumIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audiomack\.com/album/(?P<id>[\w/-]+)'
IE_NAME = 'audiomack:album'
_TESTS = [
# Standard album playlist
{
'url': 'http://www.audiomack.com/album/flytunezcom/tha-tour-part-2-mixtape',
'playlist_count': 15,
'info_dict':
{
'id': '812251',
'title': 'Tha Tour: Part 2 (Official Mixtape)'
}
},
# Album playlist ripped from fakeshoredrive with no metadata
{
'url': 'http://www.audiomack.com/album/fakeshoredrive/ppp-pistol-p-project',
'playlist': [{
'info_dict': {
'title': '9.-heaven-or-hell-chimaca-ft-zuse-prod-by-dj-fu',
'id': '9.-heaven-or-hell-chimaca-ft-zuse-prod-by-dj-fu',
'ext': 'mp3',
}
}],
'params': {
'playliststart': 8,
'playlistend': 8,
}
}
]
def _real_extract(self, url):
# URLs end with [uploader name]/[uploader title]
# this title is whatever the user types in, and is rarely
# the proper song title. Real metadata is in the api response
album_url_tag = self._match_id(url)
result = {'_type': 'playlist', 'entries': []}
# There is no one endpoint for album metadata - instead it is included/repeated in each song's metadata
# Therefore we don't know how many songs the album has and must loop indefinitely until failure
for track_no in itertools.count():
# Get song's metadata
api_response = self._download_json(
'http://www.audiomack.com/api/music/url/album/%s/%d?extended=1&_=%d'
% (album_url_tag, track_no, time.time()), album_url_tag,
note='Querying song information (%d)' % (track_no + 1))
# Total failure, only occurs when url is totally wrong
# Won't happen in middle of valid playlist (next case)
if 'url' not in api_response or 'error' in api_response:
raise ExtractorError('Invalid url for track %d of album url %s' % (track_no, url))
# URL is good but song id doesn't exist - usually means end of playlist
elif not api_response['url']:
break
else:
# Pull out the album metadata and add to result (if it exists)
for resultkey, apikey in [('id', 'album_id'), ('title', 'album_title')]:
if apikey in api_response and resultkey not in result:
result[resultkey] = api_response[apikey]
song_id = url_basename(api_response['url']).rpartition('.')[0]
result['entries'].append({
'id': api_response.get('id', song_id),
'uploader': api_response.get('artist'),
'title': api_response.get('title', song_id),
'url': api_response['url'],
})
return result
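
The loop above probes successive track numbers until the API answers with an empty url; the same probe-until-empty pattern in isolation, with a hypothetical stand-in for the API call:

import itertools

def fetch(track_no):
    # Hypothetical stand-in for the per-track API request
    return {'url': 'http://cdn.example.com/%d.mp3' % track_no} if track_no < 3 else {'url': ''}

entries = []
for track_no in itertools.count():
    resp = fetch(track_no)
    if not resp['url']:  # end of the playlist
        break
    entries.append(resp['url'])
print(len(entries))  # 3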

View File

@@ -7,6 +7,7 @@ from ..compat import compat_urllib_parse
from ..utils import (
determine_ext,
ExtractorError,
remove_end,
)
@@ -27,23 +28,18 @@ class AUEngineIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<title>(?P<title>.+?)</title>', webpage, 'title')
title = title.strip()
links = re.findall(r'\s(?:file|url):\s*["\']([^\'"]+)["\']', webpage)
links = map(compat_urllib_parse.unquote, links)
title = self._html_search_regex(
r'<title>\s*(?P<title>.+?)\s*</title>', webpage, 'title')
video_urls = re.findall(r'http://\w+.auengine.com/vod/.*[^\W]', webpage)
video_url = compat_urllib_parse.unquote(video_urls[0])
thumbnails = re.findall(r'http://\w+.auengine.com/thumb/.*[^\W]', webpage)
thumbnail = compat_urllib_parse.unquote(thumbnails[0])
thumbnail = None
video_url = None
for link in links:
if link.endswith('.png'):
thumbnail = link
elif '/videos/' in link:
video_url = link
if not video_url:
raise ExtractorError('Could not find video URL')
ext = '.' + determine_ext(video_url)
if ext == title[-len(ext):]:
title = title[:-len(ext)]
title = remove_end(title, ext)
return {
'id': video_id,

View File

@@ -161,7 +161,8 @@ class BandcampAlbumIE(InfoExtractor):
entries = [
self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key())
for t_path in tracks_paths]
title = self._search_regex(r'album_title : "(.*?)"', webpage, 'title')
title = self._search_regex(
r'album_title\s*:\s*"(.*?)"', webpage, 'title', fatal=False)
return {
'_type': 'playlist',
'id': playlist_id,

View File

@@ -10,7 +10,7 @@ from ..compat import compat_HTTPError
class BBCCoUkIE(SubtitlesInfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:programmes|iplayer/(?:episode|playlist))/(?P<id>[\da-z]{8})'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:(?:(?:programmes|iplayer/(?:episode|playlist))/)|music/clips[/#])(?P<id>[\da-z]{8})'
_TESTS = [
{
@@ -18,8 +18,8 @@ class BBCCoUkIE(SubtitlesInfoExtractor):
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
'title': 'Kaleidoscope: Leonard Cohen',
'description': 'md5:db4755d7a665ae72343779f7dacb402c',
'title': 'Kaleidoscope, Leonard Cohen',
'description': 'The Canadian poet and songwriter reflects on his musical career.',
'duration': 1740,
},
'params': {
@@ -84,9 +84,40 @@ class BBCCoUkIE(SubtitlesInfoExtractor):
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/music/clips/p02frcc3',
'note': 'Audio',
'info_dict': {
'id': 'p02frcch',
'ext': 'flv',
'title': 'Pete Tong, Past, Present and Future Special, Madeon - After Hours mix',
'description': 'French house superstar Madeon takes us out of the club and onto the after party.',
'duration': 3507,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/music/clips/p025c0zz',
'note': 'Video',
'info_dict': {
'id': 'p025c103',
'ext': 'flv',
'title': 'Reading and Leeds Festival, 2014, Rae Morris - Closer (Live on BBC Three)',
'description': 'Rae Morris performs Closer for BBC Three at Reading 2014',
'duration': 226,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/music/clips#p02frcc3',
'only_matching': True,
}
]

View File

@@ -16,7 +16,7 @@ class BetIE(InfoExtractor):
{
'url': 'http://www.bet.com/news/politics/2014/12/08/in-bet-exclusive-obama-talks-race-and-racism.html',
'info_dict': {
'id': '417cd61c-c793-4e8e-b006-e445ecc45add',
'id': '740ab250-bb94-4a8a-8787-fe0de7c74471',
'display_id': 'in-bet-exclusive-obama-talks-race-and-racism',
'ext': 'flv',
'title': 'BET News Presents: A Conversation With President Obama',
@@ -35,7 +35,7 @@ class BetIE(InfoExtractor):
{
'url': 'http://www.bet.com/video/news/national/2014/justice-for-ferguson-a-community-reacts.html',
'info_dict': {
'id': '4160e53b-ad41-43b1-980f-8d85f63121f4',
'id': 'bcd1b1df-673a-42cf-8d01-b282db608f2d',
'display_id': 'justice-for-ferguson-a-community-reacts',
'ext': 'flv',
'title': 'Justice for Ferguson: A Community Reacts',
@@ -55,7 +55,6 @@ class BetIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
media_url = compat_urllib_parse.unquote(self._search_regex(

View File

@@ -4,9 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_parse_qs
from ..utils import (
ExtractorError,
int_or_none,
unified_strdate,
)
@@ -54,45 +52,38 @@ class BiliBiliIE(InfoExtractor):
thumbnail = self._html_search_meta(
'thumbnailUrl', video_code, 'thumbnail', fatal=False)
cid = self._search_regex(r'cid=(\d+)', webpage, 'cid')
lq_doc = self._download_xml(
'http://interface.bilibili.com/v_cdn_play?appkey=1&cid=%s' % cid,
video_id,
note='Downloading LQ video info'
)
lq_durl = lq_doc.find('./durl')
formats = [{
'format_id': 'lq',
'quality': 1,
'url': lq_durl.find('./url').text,
'filesize': int_or_none(
lq_durl.find('./size'), get_attr='text'),
}]
hq_doc = self._download_xml(
'http://interface.bilibili.com/playurl?appkey=1&cid=%s' % cid,
video_id,
note='Downloading HQ video info',
fatal=False,
)
if hq_doc is not False:
hq_durl = hq_doc.find('./durl')
formats.append({
'format_id': 'hq',
'quality': 2,
'ext': 'flv',
'url': hq_durl.find('./url').text,
'filesize': int_or_none(
hq_durl.find('./size'), get_attr='text'),
})
self._sort_formats(formats)
return {

View File

@@ -33,7 +33,7 @@ class BuzzFeedIE(InfoExtractor):
'skip_download': True, # Got enough YouTube download tests
},
'info_dict': {
'description': 'Munchkin the Teddy Bear is back !',
'description': 're:Munchkin the Teddy Bear is back ?!',
'title': 'You Need To Stop What You\'re Doing And Watching This Dog Walk On A Treadmill',
},
'playlist': [{
@@ -42,9 +42,9 @@ class BuzzFeedIE(InfoExtractor):
'ext': 'mp4',
'upload_date': '20141124',
'uploader_id': 'CindysMunchkin',
'description': '© 2014 Munchkin the Shih Tzu\nAll rights reserved\nFacebook: http://facebook.com/MunchkintheShihTzu',
'description': 're:© 2014 Munchkin the Shih Tzu',
'uploader': 'Munchkin the Shih Tzu',
'title': 'Munchkin the Teddy Bear gets her exercise',
'title': 're:Munchkin the Teddy Bear gets her exercise',
},
}]
}]

View File

@@ -5,6 +5,8 @@ import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
HEADRequest,
unified_strdate,
url_basename,
qualities,
@@ -76,6 +78,16 @@ class CanalplusIE(InfoExtractor):
preference = qualities(['MOBILE', 'BAS_DEBIT', 'HAUT_DEBIT', 'HD', 'HLS', 'HDS'])
fmt_url = next(iter(media.find('VIDEOS'))).text
if '/geo' in fmt_url.lower():
response = self._request_webpage(
HEADRequest(fmt_url), video_id,
'Checking if the video is georestricted')
if '/blocage' in response.geturl():
raise ExtractorError(
'The video is not available in your country',
expected=True)
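
The geo check is just a HEAD request followed by a look at the post-redirect URL; the core of the pattern outside the extractor (the URL is a placeholder, HEADRequest mirrors youtube_dl.utils, and the network call is left commented out):

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

class HEADRequest(Request):
    def get_method(self):
        return 'HEAD'

# response = urlopen(HEADRequest('http://example.com/geo-checked-media'))
# blocked = '/blocage' in response.geturl()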
formats = []
for fmt in media.find('VIDEOS'):
format_url = fmt.text

View File

@@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .subtitles import SubtitlesInfoExtractor
from ..compat import (
compat_urllib_request,
compat_urllib_parse,
@@ -15,7 +15,7 @@ from ..utils import (
)
class CeskaTelevizeIE(InfoExtractor):
class CeskaTelevizeIE(SubtitlesInfoExtractor):
_VALID_URL = r'https?://www\.ceskatelevize\.cz/(porady|ivysilani)/(.+/)?(?P<id>[^?#]+)'
_TESTS = [
@@ -104,6 +104,17 @@ class CeskaTelevizeIE(InfoExtractor):
duration = float_or_none(item.get('duration'))
thumbnail = item.get('previewImageUrl')
subtitles = {}
subs = item.get('subtitles')
if subs:
subtitles['cs'] = subs[0]['url']
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id, subtitles)
return
subtitles = self._fix_subtitles(self.extract_subtitles(video_id, subtitles))
return {
'id': episode_id,
'title': title,
@@ -111,4 +122,34 @@ class CeskaTelevizeIE(InfoExtractor):
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
@staticmethod
def _fix_subtitles(subtitles):
""" Convert millisecond-based subtitles to SRT """
if subtitles is None:
return subtitles # subtitles not requested
def _msectotimecode(msec):
""" Helper utility to convert milliseconds to timecode """
components = []
for divider in [1000, 60, 60, 100]:
components.append(msec % divider)
msec //= divider
return "{3:02}:{2:02}:{1:02},{0:03}".format(*components)
def _fix_subtitle(subtitle):
for line in subtitle.splitlines():
m = re.match(r"^\s*([0-9]+);\s*([0-9]+)\s+([0-9]+)\s*$", line)
if m:
yield m.group(1)
start, stop = (_msectotimecode(int(t)) for t in m.groups()[1:])
yield "{0} --> {1}".format(start, stop)
else:
yield line
fixed_subtitles = {}
for k, v in subtitles.items():
fixed_subtitles[k] = "\r\n".join(_fix_subtitle(v))
return fixed_subtitles

View File

@@ -51,7 +51,7 @@ class CNNIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
path = mobj.group('path')
page_title = mobj.group('title')
info_url = 'http://cnn.com/video/data/3.0/%s/index.xml' % path
info_url = 'http://edition.cnn.com/video/data/3.0/%s/index.xml' % path
info = self._download_xml(info_url, page_title)
formats = []
@@ -143,13 +143,13 @@ class CNNArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:edition|www)\.)?cnn\.com/(?!video/)'
_TEST = {
'url': 'http://www.cnn.com/2014/12/21/politics/obama-north-koreas-hack-not-war-but-cyber-vandalism/',
'md5': '275b326f85d80dff7592a9820f5dc887',
'md5': '689034c2a3d9c6dc4aa72d65a81efd01',
'info_dict': {
'id': 'bestoftv/2014/12/21/sotu-crowley-president-obama-north-korea-not-going-to-be-intimidated.cnn',
'id': 'bestoftv/2014/12/21/ip-north-korea-obama.cnn',
'ext': 'mp4',
'title': 'Obama: We\'re not going to be intimidated',
'description': 'md5:e735586f3dc936075fa654a4d91b21f9',
'upload_date': '20141220',
'title': 'Obama: Cyberattack not an act of war',
'description': 'md5:51ce6750450603795cad0cdfbd7d05c5',
'upload_date': '20141221',
},
'add_ie': ['CNN'],
}

View File

@@ -0,0 +1,92 @@
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..compat import compat_urllib_request
from ..utils import (
float_or_none,
int_or_none,
)
class CollegeRamaIE(InfoExtractor):
_VALID_URL = r'https?://collegerama\.tudelft\.nl/Mediasite/Play/(?P<id>[\da-f]+)'
_TESTS = [
{
'url': 'https://collegerama.tudelft.nl/Mediasite/Play/585a43626e544bdd97aeb71a0ec907a01d',
'md5': '481fda1c11f67588c0d9d8fbdced4e39',
'info_dict': {
'id': '585a43626e544bdd97aeb71a0ec907a01d',
'ext': 'mp4',
'title': 'Een nieuwe wereld: waarden, bewustzijn en techniek van de mensheid 2.0.',
'description': '',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 7713.088,
'timestamp': 1413309600,
'upload_date': '20141014',
},
},
{
'url': 'https://collegerama.tudelft.nl/Mediasite/Play/86a9ea9f53e149079fbdb4202b521ed21d?catalog=fd32fd35-6c99-466c-89d4-cd3c431bc8a4',
'md5': 'ef1fdded95bdf19b12c5999949419c92',
'info_dict': {
'id': '86a9ea9f53e149079fbdb4202b521ed21d',
'ext': 'wmv',
'title': '64ste Vakantiecursus: Afvalwater',
'description': 'md5:7fd774865cc69d972f542b157c328305',
'duration': 10853,
'timestamp': 1326446400,
'upload_date': '20120113',
},
},
]
def _real_extract(self, url):
video_id = self._match_id(url)
player_options_request = {
"getPlayerOptionsRequest": {
"ResourceId": video_id,
"QueryString": "",
}
}
request = compat_urllib_request.Request(
'http://collegerama.tudelft.nl/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions',
json.dumps(player_options_request))
request.add_header('Content-Type', 'application/json')
player_options = self._download_json(request, video_id)
presentation = player_options['d']['Presentation']
title = presentation['Title']
description = presentation.get('Description')
thumbnail = None
duration = float_or_none(presentation.get('Duration'), 1000)
timestamp = int_or_none(presentation.get('UnixTime'), 1000)
formats = []
for stream in presentation['Streams']:
for video in stream['VideoUrls']:
thumbnail_url = stream.get('ThumbnailUrl')
if thumbnail_url:
thumbnail = 'http://collegerama.tudelft.nl' + thumbnail_url
format_id = video['MediaType']
if format_id == 'SS':
continue
formats.append({
'url': video['Location'],
'format_id': format_id,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}
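
The player metadata comes from a JSON-bodied POST; the same request in isolation with plain urllib, using the ResourceId from the first test above (the network call is left commented out):

import json
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

payload = {'getPlayerOptionsRequest': {
    'ResourceId': '585a43626e544bdd97aeb71a0ec907a01d',
    'QueryString': '',
}}
req = Request(
    'http://collegerama.tudelft.nl/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions',
    json.dumps(payload).encode('utf-8'))
req.add_header('Content-Type', 'application/json')
# player_options = json.loads(urlopen(req).read().decode('utf-8'))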

View File

@@ -34,12 +34,12 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
class ComedyCentralShowsIE(MTVServicesInfoExtractor):
IE_DESC = 'The Daily Show / The Colbert Report'
# urls can be abbreviations like :thedailyshow or :colbert
# urls can be abbreviations like :thedailyshow
# urls for episodes like:
# or urls for clips like: http://www.thedailyshow.com/watch/mon-december-10-2012/any-given-gun-day
# or: http://www.colbertnation.com/the-colbert-report-videos/421667/november-29-2012/moon-shattering-news
# or: http://www.colbertnation.com/the-colbert-report-collections/422008/festival-of-lights/79524
_VALID_URL = r'''(?x)^(:(?P<shortname>tds|thedailyshow|cr|colbert|colbertnation|colbertreport)
_VALID_URL = r'''(?x)^(:(?P<shortname>tds|thedailyshow)
|https?://(:www\.)?
(?P<showname>thedailyshow|thecolbertreport)\.(?:cc\.)?com/
((?:full-)?episodes/(?:[0-9a-z]{6}/)?(?P<episode>.*)|

View File

@@ -21,6 +21,7 @@ from ..compat import (
compat_str,
)
from ..utils import (
age_restricted,
clean_html,
compiled_regex_type,
ExtractorError,
@@ -92,6 +93,8 @@ class InfoExtractor(object):
by this field, regardless of all other values.
-1 for default (order by other properties),
-2 or smaller for less than default.
< -1000 to hide the format (if there is
another one which is strictly better)
* language_preference Is this in the correct requested
language?
10 if it's what the URL is about,
@@ -111,6 +114,9 @@ class InfoExtractor(object):
to add to the request.
* http_post_data Additional data to send with a POST
request.
* stretched_ratio If given and not 1, indicates that the
video's pixels are not square.
width : height ratio as float.
url: Final video URL.
ext: Video filename extension.
format: The video format, defaults to ext (used for --get-format)
@@ -144,6 +150,17 @@ class InfoExtractor(object):
like_count: Number of positive ratings of the video
dislike_count: Number of negative ratings of the video
comment_count: Number of comments on the video
comments: A list of comments, each with one or more of the following
properties (all but one of text or html optional):
* "author" - human-readable name of the comment author
* "author_id" - user ID of the comment author
* "id" - Comment ID
* "html" - Comment as HTML
* "text" - Plain text of the comment
* "timestamp" - UNIX timestamp of comment
* "parent" - ID of the comment this one is replying to.
Set to "root" to indicate that this is a
comment to the original video.
age_limit: Age restriction for the video, as an integer (years)
webpage_url: The url to the video webpage, if given to youtube-dl it
should allow to get the same result again. (It will be set
@@ -362,9 +379,19 @@ class InfoExtractor(object):
return content
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True):
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5):
""" Returns the data of the page as a string """
res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal)
success = False
try_count = 0
while success is False:
try:
res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal)
success = True
except compat_http_client.IncompleteRead as e:
try_count += 1
if try_count >= tries:
raise e
self._sleep(timeout, video_id)
if res is False:
return res
else:
@@ -591,7 +618,7 @@ class InfoExtractor(object):
return self._html_search_regex(
r'''(?isx)<meta
(?=[^>]+(?:itemprop|name|property)=(["\']?)%s\1)
[^>]+content=(["\'])(?P<content>.*?)\1''' % re.escape(name),
[^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(name),
html, display_name, fatal=fatal, group='content', **kwargs)
def _dc_search_uploader(self, html):
@@ -715,8 +742,14 @@ class InfoExtractor(object):
'Unable to download f4m manifest')
formats = []
manifest_version = '1.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/1.0}media')
if not media_nodes:
manifest_version = '2.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
for i, media_el in enumerate(media_nodes):
if manifest_version == '2.0':
manifest_url = '/'.join(manifest_url.split('/')[:-1]) + '/' + media_el.attrib.get('href')
tbr = int_or_none(media_el.attrib.get('bitrate'))
format_id = 'f4m-%d' % (i if tbr is None else tbr)
formats.append({
@@ -875,6 +908,35 @@ class InfoExtractor(object):
None, '/', True, False, expire_time, '', None, None, None)
self._downloader.cookiejar.set_cookie(cookie)
def get_testcases(self, include_onlymatching=False):
t = getattr(self, '_TEST', None)
if t:
assert not hasattr(self, '_TESTS'), \
'%s has _TEST and _TESTS' % type(self).__name__
tests = [t]
else:
tests = getattr(self, '_TESTS', [])
for t in tests:
if not include_onlymatching and t.get('only_matching', False):
continue
t['name'] = type(self).__name__[:-len('IE')]
yield t
def is_suitable(self, age_limit):
""" Test whether the extractor is generally suitable for the given
age limit (i.e. pornographic sites are not, all others usually are) """
any_restricted = False
for tc in self.get_testcases(include_onlymatching=False):
if 'playlist' in tc:
tc = tc['playlist'][0]
is_restricted = age_restricted(
tc.get('info_dict', {}).get('age_limit'), age_limit)
if not is_restricted:
return True
any_restricted = any_restricted or is_restricted
return not any_restricted
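
In effect an extractor survives as long as at least one of its test cases is viewable under the given limit; the decision in isolation, over hypothetical test cases:

test_cases = [{'info_dict': {'age_limit': 18}}, {'info_dict': {}}]
age_limit = 16

def restricted(tc):
    content_limit = tc.get('info_dict', {}).get('age_limit')
    return content_limit is not None and age_limit < content_limit

print(any(not restricted(tc) for tc in test_cases))  # True: the unrated test keeps the extractor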
class SearchInfoExtractor(InfoExtractor):
"""

View File

@@ -228,7 +228,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
video_thumbnail = self._search_regex(r'<episode_image_url>([^<]+)', playerdata, 'thumbnail', fatal=False)
formats = []
for fmt in re.findall(r'\?p([0-9]{3,4})=1', webpage):
for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage):
stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p'
streamdata_req = compat_urllib_request.Request('http://www.crunchyroll.com/xml/')

View File

@@ -1,47 +1,45 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
parse_iso8601,
int_or_none,
)
class DiscoveryIE(InfoExtractor):
_VALID_URL = r'http://www\.discovery\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9\-]*)(.htm)?'
_VALID_URL = r'http://www\.discovery\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9_\-]*)(?:\.htm)?'
_TEST = {
'url': 'http://www.discovery.com/tv-shows/mythbusters/videos/mission-impossible-outtakes.htm',
'md5': 'e12614f9ee303a6ccef415cb0793eba2',
'md5': '3c69d77d9b0d82bfd5e5932a60f26504',
'info_dict': {
'id': '614784',
'ext': 'mp4',
'title': 'MythBusters: Mission Impossible Outtakes',
'id': 'mission-impossible-outtakes',
'ext': 'flv',
'title': 'Mission Impossible Outtakes',
'description': ('Watch Jamie Hyneman and Adam Savage practice being'
' each other -- to the point of confusing Jamie\'s dog -- and '
'don\'t miss Adam moon-walking as Jamie ... behind Jamie\'s'
' back.'),
'duration': 156,
'timestamp': 1303099200,
'upload_date': '20110418',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_list_json = self._search_regex(r'var videoListJSON = ({.*?});',
webpage, 'video list', flags=re.DOTALL)
video_list = json.loads(video_list_json)
info = video_list['clips'][0]
formats = []
for f in info['mp4']:
formats.append(
{'url': f['src'], 'ext': 'mp4', 'tbr': int(f['bitrate'][:-1])})
info = self._parse_json(self._search_regex(
r'(?s)<script type="application/ld\+json">(.*?)</script>',
webpage, 'video info'), video_id)
return {
'id': info['contentId'],
'title': video_list['name'],
'formats': formats,
'description': info['videoCaption'],
'thumbnail': info.get('videoStillURL') or info.get('thumbnailURL'),
'duration': info['duration'],
'id': video_id,
'title': info['name'],
'url': info['contentURL'],
'description': info.get('description'),
'thumbnail': info.get('thumbnailUrl'),
'timestamp': parse_iso8601(info.get('uploadDate')),
'duration': int_or_none(info.get('duration')),
}

View File

@@ -0,0 +1,131 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
)
class DRBonanzaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dr\.dk/bonanza/(?:[^/]+/)+(?:[^/])+?(?:assetId=(?P<id>\d+))?(?:[#&]|$)'
_TESTS = [{
'url': 'http://www.dr.dk/bonanza/serie/portraetter/Talkshowet.htm?assetId=65517',
'md5': 'fe330252ddea607635cf2eb2c99a0af3',
'info_dict': {
'id': '65517',
'ext': 'mp4',
'title': 'Talkshowet - Leonard Cohen',
'description': 'md5:8f34194fb30cd8c8a30ad8b27b70c0ca',
'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
'timestamp': 1295537932,
'upload_date': '20110120',
'duration': 3664,
},
}, {
'url': 'http://www.dr.dk/bonanza/radio/serie/sport/fodbold.htm?assetId=59410',
'md5': '6dfe039417e76795fb783c52da3de11d',
'info_dict': {
'id': '59410',
'ext': 'mp3',
'title': 'EM fodbold 1992 Danmark - Tyskland finale Transmission',
'description': 'md5:501e5a195749480552e214fbbed16c4e',
'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
'timestamp': 1223274900,
'upload_date': '20081006',
'duration': 7369,
},
}]
def _real_extract(self, url):
url_id = self._match_id(url)
webpage = self._download_webpage(url, url_id)
if url_id:
info = json.loads(self._html_search_regex(r'({.*?%s.*})' % url_id, webpage, 'json'))
else:
# Just fetch the first video on that page
info = json.loads(self._html_search_regex(r'bonanzaFunctions.newPlaylist\(({.*})\)', webpage, 'json'))
asset_id = str(info['AssetId'])
title = info['Title'].rstrip(' \'\"-,.:;!?')
duration = int_or_none(info.get('Duration'), scale=1000)
# First published online. "FirstPublished" contains the date for original airing.
timestamp = parse_iso8601(
re.sub(r'\.\d+$', '', info['Created']))
def parse_filename_info(url):
match = re.search(r'/\d+_(?P<width>\d+)x(?P<height>\d+)x(?P<bitrate>\d+)K\.(?P<ext>\w+)$', url)
if match:
return {
'width': int(match.group('width')),
'height': int(match.group('height')),
'vbr': int(match.group('bitrate')),
'ext': match.group('ext')
}
match = re.search(r'/\d+_(?P<bitrate>\d+)K\.(?P<ext>\w+)$', url)
if match:
return {
'vbr': int(match.group('bitrate')),
'ext': match.group(2)
}
return {}
video_types = ['VideoHigh', 'VideoMid', 'VideoLow']
preferencemap = {
'VideoHigh': -1,
'VideoMid': -2,
'VideoLow': -3,
'Audio': -4,
}
formats = []
for file in info['Files']:
if info['Type'] == "Video":
if file['Type'] in video_types:
format = parse_filename_info(file['Location'])
format.update({
'url': file['Location'],
'format_id': file['Type'].replace('Video', ''),
'preference': preferencemap.get(file['Type'], -10),
})
formats.append(format)
elif file['Type'] == "Thumb":
thumbnail = file['Location']
elif info['Type'] == "Audio":
if file['Type'] == "Audio":
format = parse_filename_info(file['Location'])
format.update({
'url': file['Location'],
'format_id': file['Type'],
'vcodec': 'none',
})
formats.append(format)
elif file['Type'] == "Thumb":
thumbnail = file['Location']
description = '%s\n%s\n%s\n' % (
info['Description'], info['Actors'], info['Colophon'])
for f in formats:
f['url'] = f['url'].replace('rtmp://vod-bonanza.gss.dr.dk/bonanza/', 'http://vodfiles.dr.dk/')
f['url'] = f['url'].replace('mp4:bonanza', 'bonanza')
self._sort_formats(formats)
display_id = re.sub(r'[^\w\d-]', '', re.sub(r' ', '-', title.lower())) + '-' + asset_id
display_id = re.sub(r'-+', '-', display_id)
return {
'id': asset_id,
'display_id': display_id,
'title': title,
'formats': formats,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
}

View File

@@ -6,7 +6,7 @@ from ..utils import parse_iso8601
class DRTVIE(SubtitlesInfoExtractor):
_VALID_URL = r'http://(?:www\.)?dr\.dk/tv/se/(?:[^/]+/)+(?P<id>[\da-z-]+)(?:[/#?]|$)'
_VALID_URL = r'https?://(?:www\.)?dr\.dk/tv/se/(?:[^/]+/)+(?P<id>[\da-z-]+)(?:[/#?]|$)'
_TEST = {
'url': 'http://www.dr.dk/tv/se/partiets-mand/partiets-mand-7-8',

View File

@@ -9,6 +9,9 @@ from .common import InfoExtractor
from ..compat import (
compat_str,
)
from ..utils import (
ExtractorError,
)
class EightTracksIE(InfoExtractor):
@@ -112,14 +115,29 @@ class EightTracksIE(InfoExtractor):
session = str(random.randint(0, 1000000000))
mix_id = data['id']
track_count = data['tracks_count']
duration = data['duration']
avg_song_duration = float(duration) / track_count
first_url = 'http://8tracks.com/sets/%s/play?player=sm&mix_id=%s&format=jsonh' % (session, mix_id)
next_url = first_url
entries = []
for i in range(track_count):
api_json = self._download_webpage(
next_url, playlist_id,
note='Downloading song information %d/%d' % (i + 1, track_count),
errnote='Failed to download song information')
api_json = None
download_tries = 0
while api_json is None:
try:
api_json = self._download_webpage(
next_url, playlist_id,
note='Downloading song information %d/%d' % (i + 1, track_count),
errnote='Failed to download song information')
except ExtractorError:
if download_tries > 3:
raise
else:
download_tries += 1
self._sleep(avg_song_duration, playlist_id)
api_data = json.loads(api_json)
track_data = api_data['set']['track']
info = {
@@ -131,6 +149,7 @@ class EightTracksIE(InfoExtractor):
'ext': 'm4a',
}
entries.append(info)
next_url = 'http://8tracks.com/sets/%s/next?player=sm&mix_id=%s&format=jsonh&track_id=%s' % (
session, mix_id, track_data['id'])
return {

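The retry loop added here is a bounded-retry pattern: try the download, and on failure sleep roughly one average song length before trying again, giving up after a few attempts. A minimal standalone sketch of the same idea (the names are illustrative, not youtube-dl API):

import time

def fetch_with_retries(fetch, max_retries=3, delay_seconds=30.0):
    # fetch() should raise on failure, the way _download_webpage
    # raises ExtractorError in the extractor above.
    attempts = 0
    while True:
        try:
            return fetch()
        except Exception:
            if attempts >= max_retries:
                raise
            attempts += 1
            time.sleep(delay_seconds)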
View File

@@ -1,7 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
@@ -12,32 +11,49 @@ from ..utils import (
class EllenTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ellentv\.com/videos/(?P<id>[a-z0-9_-]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?(?:ellentv|ellentube)\.com/videos/(?P<id>[a-z0-9_-]+)'
_TESTS = [{
'url': 'http://www.ellentv.com/videos/0-7jqrsr18/',
'md5': 'e4af06f3bf0d5f471921a18db5764642',
'info_dict': {
'id': '0-7jqrsr18',
'ext': 'mp4',
'title': 'What\'s Wrong with These Photos? A Whole Lot',
'description': 'md5:35f152dc66b587cf13e6d2cf4fa467f6',
'timestamp': 1406876400,
'upload_date': '20140801',
}
}
}, {
'url': 'http://ellentube.com/videos/0-dvzmabd5/',
'md5': '98238118eaa2bbdf6ad7f708e3e4f4eb',
'info_dict': {
'id': '0-dvzmabd5',
'ext': 'mp4',
'title': '1 year old twin sister makes her brother laugh',
'description': '1 year old twin sister makes her brother laugh',
'timestamp': 1419542075,
'upload_date': '20141225',
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_meta('VideoURL', webpage, 'url')
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'pageName\s*=\s*"([^"]+)"', webpage, 'title')
description = self._html_search_meta(
'description', webpage, 'description') or self._og_search_description(webpage)
timestamp = parse_iso8601(self._search_regex(
r'<span class="publish-date"><time datetime="([^"]+)">',
webpage, 'timestamp'))
return {
'id': video_id,
'title': self._og_search_title(webpage),
'url': self._html_search_meta('VideoURL', webpage, 'url'),
'url': video_url,
'title': title,
'description': description,
'timestamp': timestamp,
}
@@ -55,8 +71,7 @@ class EllenTVClipsIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
playlist = self._extract_playlist(webpage)

View File

@@ -1,8 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import unified_strdate
@@ -24,9 +22,7 @@ class ElPaisIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
prefix = self._html_search_regex(

View File

@@ -13,7 +13,7 @@ from ..utils import (
class FKTVIE(InfoExtractor):
IE_NAME = 'fernsehkritik.tv'
_VALID_URL = r'http://(?:www\.)?fernsehkritik\.tv/folge-(?P<ep>[0-9]+)(?:/.*)?'
_VALID_URL = r'http://(?:www\.)?fernsehkritik\.tv/folge-(?P<id>[0-9]+)(?:/.*)?'
_TEST = {
'url': 'http://fernsehkritik.tv/folge-1',
@@ -26,29 +26,32 @@ class FKTVIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = int(mobj.group('ep'))
episode = int(self._match_id(url))
server = random.randint(2, 4)
video_thumbnail = 'http://fernsehkritik.tv/images/magazin/folge%d.jpg' % episode
start_webpage = self._download_webpage('http://fernsehkritik.tv/folge-%d/Start' % episode,
video_thumbnail = 'http://fernsehkritik.tv/images/magazin/folge%s.jpg' % episode
start_webpage = self._download_webpage('http://fernsehkritik.tv/folge-%s/Start' % episode,
episode)
playlist = self._search_regex(r'playlist = (\[.*?\]);', start_webpage,
'playlist', flags=re.DOTALL)
files = json.loads(re.sub('{[^{}]*?}', '{}', playlist))
# TODO: return a single multipart video
videos = []
for i, _ in enumerate(files, 1):
video_id = '%04d%d' % (episode, i)
video_url = 'http://dl%d.fernsehkritik.tv/fernsehkritik%d%s.flv' % (server, episode, '' if i == 1 else '-%d' % i)
video_url = 'http://fernsehkritik.tv/js/directme.php?file=%s%s.flv' % (episode, '' if i == 1 else '-%d' % i)
videos.append({
'ext': 'flv',
'id': video_id,
'url': video_url,
'title': clean_html(get_element_by_id('eptitle', start_webpage)),
'description': clean_html(get_element_by_id('contentlist', start_webpage)),
'thumbnail': video_thumbnail
})
return videos
return {
'_type': 'multi_video',
'entries': videos,
'id': 'folge-%s' % episode,
}
class FKTVPosteckeIE(InfoExtractor):

View File

@@ -7,10 +7,9 @@ from ..compat import (
compat_urllib_request,
)
from ..utils import (
clean_html,
parse_duration,
parse_iso8601,
str_to_int,
unified_strdate,
)
@@ -28,68 +27,81 @@ class FourTubeIE(InfoExtractor):
'uploader': 'WCP Club',
'uploader_id': 'wcp-club',
'upload_date': '20131031',
'timestamp': 1383263892,
'duration': 583,
'view_count': int,
'like_count': int,
'categories': list,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage_url = 'http://www.4tube.com/videos/' + video_id
webpage = self._download_webpage(webpage_url, video_id)
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)
title = self._html_search_meta('name', webpage)
timestamp = parse_iso8601(self._html_search_meta(
'uploadDate', webpage))
thumbnail = self._html_search_meta('thumbnailUrl', webpage)
uploader_id = self._html_search_regex(
r'<a class="img-avatar" href="[^"]+/channels/([^/"]+)" title="Go to [^"]+ page">',
webpage, 'uploader id')
uploader = self._html_search_regex(
r'<a class="img-avatar" href="[^"]+/channels/[^/"]+" title="Go to ([^"]+) page">',
webpage, 'uploader')
playlist_json = self._html_search_regex(r'var playerConfigPlaylist\s+=\s+([^;]+)', webpage, 'Playlist')
media_id = self._search_regex(r'idMedia:\s*(\d+)', playlist_json, 'Media Id')
sources = self._search_regex(r'sources:\s*\[([^\]]*)\]', playlist_json, 'Sources').split(',')
title = self._search_regex(r'title:\s*"([^"]*)', playlist_json, 'Title')
thumbnail_url = self._search_regex(r'image:\s*"([^"]*)', playlist_json, 'Thumbnail', fatal=False)
categories_html = self._search_regex(
r'(?s)><i class="icon icon-tag"></i>\s*Categories / Tags\s*.*?<ul class="list">(.*?)</ul>',
webpage, 'categories', fatal=False)
categories = None
if categories_html:
categories = [
c.strip() for c in re.findall(
r'(?s)<li><a.*?>(.*?)</a>', categories_html)]
uploader_str = self._search_regex(r'<span>Uploaded by</span>(.*?)<span>', webpage, 'uploader', fatal=False)
mobj = re.search(r'<a href="/sites/(?P<id>[^"]+)"><strong>(?P<name>[^<]+)</strong></a>', uploader_str)
(uploader, uploader_id) = (mobj.group('name'), mobj.group('id')) if mobj else (clean_html(uploader_str), None)
view_count = str_to_int(self._search_regex(
r'<meta itemprop="interactionCount" content="UserPlays:([0-9,]+)">',
webpage, 'view count', fatal=False))
like_count = str_to_int(self._search_regex(
r'<meta itemprop="interactionCount" content="UserLikes:([0-9,]+)">',
webpage, 'like count', fatal=False))
duration = parse_duration(self._html_search_meta('duration', webpage))
upload_date = None
view_count = None
duration = None
description = self._html_search_meta('description', webpage, 'description')
if description:
upload_date = self._search_regex(r'Published Date: (\d{2} [a-zA-Z]{3} \d{4})', description, 'upload date',
fatal=False)
if upload_date:
upload_date = unified_strdate(upload_date)
view_count = self._search_regex(r'Views: ([\d,\.]+)', description, 'view count', fatal=False)
if view_count:
view_count = str_to_int(view_count)
duration = parse_duration(self._search_regex(r'Length: (\d+m\d+s)', description, 'duration', fatal=False))
params_js = self._search_regex(
r'\$\.ajax\(url,\ opts\);\s*\}\s*\}\)\(([0-9,\[\] ]+)\)',
webpage, 'initialization parameters'
)
params = self._parse_json('[%s]' % params_js, video_id)
media_id = params[0]
sources = ['%s' % p for p in params[2]]
token_url = "http://tkn.4tube.com/{0}/desktop/{1}".format(media_id, "+".join(sources))
token_url = 'http://tkn.4tube.com/{0}/desktop/{1}'.format(
media_id, '+'.join(sources))
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
b'Origin': b'http://www.4tube.com',
}
token_req = compat_urllib_request.Request(token_url, b'{}', headers)
tokens = self._download_json(token_req, video_id)
formats = [{
'url': tokens[format]['token'],
'format_id': format + 'p',
'resolution': format + 'p',
'quality': int(format),
} for format in sources]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail_url,
'categories': categories,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'upload_date': upload_date,
'timestamp': timestamp,
'like_count': like_count,
'view_count': view_count,
'duration': duration,
'age_limit': 18,
'webpage_url': webpage_url,
}

View File

@@ -57,8 +57,7 @@ class GameOneIE(InfoExtractor):
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
og_video = self._og_search_video_url(webpage, secure=False)

View File

@@ -39,7 +39,8 @@ class GDCVaultIE(InfoExtractor):
'id': '1015301',
'ext': 'flv',
'title': 'Thexder Meets Windows 95, or Writing Great Games in the Windows 95 Environment',
}
},
'skip': 'Requires login',
}
]

View File

@@ -131,12 +131,13 @@ class GenericIE(InfoExtractor):
# ooyala video
{
'url': 'http://www.rollingstone.com/music/videos/norwegian-dj-cashmere-cat-goes-spartan-on-with-me-premiere-20131219',
'md5': '5644c6ca5d5782c1d0d350dad9bd840c',
'md5': '166dd577b433b4d4ebfee10b0824d8ff',
'info_dict': {
'id': 'BwY2RxaTrTkslxOfcan0UCf0YqyvWysJ',
'ext': 'mp4',
'title': '2cc213299525360.mov', # that's what we get
},
'add_ie': ['Ooyala'],
},
# google redirect
{
@@ -146,7 +147,7 @@ class GenericIE(InfoExtractor):
'ext': 'mp4',
'upload_date': '20130224',
'uploader_id': 'TheVerge',
'description': 'Chris Ziegler takes a look at the Alcatel OneTouch Fire and the ZTE Open; two of the first Firefox OS handsets to be officially announced.',
'description': 're:^Chris Ziegler takes a look at the.*',
'uploader': 'The Verge',
'title': 'First Firefox OS phones side-by-side',
},
@@ -181,6 +182,14 @@ class GenericIE(InfoExtractor):
'description': 'Episode 18: President Barack Obama sits down with Zach Galifianakis for his most memorable interview yet.',
},
},
# BBC iPlayer embeds
{
'url': 'http://www.bbc.co.uk/blogs/adamcurtis/posts/BUGGER',
'info_dict': {
'title': 'BBC - Blogs - Adam Curtis - BUGGER',
},
'playlist_mincount': 18,
},
# RUTV embed
{
'url': 'http://www.rg.ru/2014/03/15/reg-dfo/anklav-anons.html',
@@ -699,9 +708,9 @@ class GenericIE(InfoExtractor):
r'^(?:https?://)?([^/]*)/.*', url, 'video uploader')
# Helper method
def _playlist_from_matches(matches, getter, ie=None):
def _playlist_from_matches(matches, getter=None, ie=None):
urlrs = orderedSet(
self.url_result(self._proto_relative_url(getter(m)), ie)
self.url_result(self._proto_relative_url(getter(m) if getter else m), ie)
for m in matches)
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
@@ -908,7 +917,7 @@ class GenericIE(InfoExtractor):
# Look for BBC iPlayer embed
matches = re.findall(r'setPlaylist\("(https?://www\.bbc\.co\.uk/iplayer/[^/]+/[\da-z]{8})"\)', webpage)
if matches:
return self.playlist_result([self.url_result(video_url, ie='BBCCoUk') for video_url in matches])
return _playlist_from_matches(matches, ie='BBCCoUk')
# Look for embedded RUTV player
rutv_url = RUTVIE._extract_url(webpage)
@@ -917,7 +926,7 @@ class GenericIE(InfoExtractor):
# Look for embedded TED player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://embed\.ted\.com/.+?)\1', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>https?://embed(?:-ssl)?\.ted\.com/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'TED')
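Making getter optional in _playlist_from_matches lets callers such as the BBC iPlayer branch pass plain URL strings instead of wrapping every match in a lambda. A rough standalone illustration of the two call styles:

def playlist_from_matches(matches, getter=None):
    # With getter=None, each match is assumed to already be a URL string.
    return [getter(m) if getter else m for m in matches]

# Both call styles yield the same URLs:
assert playlist_from_matches(['http://a', 'http://b']) == \
    playlist_from_matches([('http://a',), ('http://b',)], getter=lambda m: m[0])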

View File

@@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
from .common import InfoExtractor
from ..utils import (
qualities,
compat_str,
parse_duration,
parse_iso8601,
str_to_int,
)
class GigaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?giga\.de/(?:[^/]+/)*(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.giga.de/filme/anime-awesome/trailer/anime-awesome-chihiros-reise-ins-zauberland-das-beste-kommt-zum-schluss/',
'md5': '6bc5535e945e724640664632055a584f',
'info_dict': {
'id': '2622086',
'display_id': 'anime-awesome-chihiros-reise-ins-zauberland-das-beste-kommt-zum-schluss',
'ext': 'mp4',
'title': 'Anime Awesome: Chihiros Reise ins Zauberland Das Beste kommt zum Schluss',
'description': 'md5:afdf5862241aded4718a30dff6a57baf',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 578,
'timestamp': 1414749706,
'upload_date': '20141031',
'uploader': 'Robin Schweiger',
'view_count': int,
},
}, {
'url': 'http://www.giga.de/games/channel/giga-top-montag/giga-topmontag-die-besten-serien-2014/',
'only_matching': True,
}, {
'url': 'http://www.giga.de/extra/netzkultur/videos/giga-games-tom-mats-robin-werden-eigene-wege-gehen-eine-ankuendigung/',
'only_matching': True,
}, {
'url': 'http://www.giga.de/tv/jonas-liest-spieletitel-eingedeutscht-episode-2/',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
[r'data-video-id="(\d+)"', r'/api/video/jwplayer/#v=(\d+)'],
webpage, 'video id')
playlist = self._download_json(
'http://www.giga.de/api/syndication/video/video_id/%s/playlist.json?content=syndication/key/368b5f151da4ae05ced7fa296bdff65a/'
% video_id, video_id)[0]
quality = qualities(['normal', 'hd720'])
formats = []
for format_id in itertools.count(0):
fmt = playlist.get(compat_str(format_id))
if not fmt:
break
formats.append({
'url': fmt['src'],
'format_id': '%s-%s' % (fmt['quality'], fmt['type'].split('/')[-1]),
'quality': quality(fmt['quality']),
})
self._sort_formats(formats)
title = self._html_search_meta(
'title', webpage, 'title', fatal=True)
description = self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage)
duration = parse_duration(self._search_regex(
r'(?s)(?:data-video-id="{0}"|data-video="[^"]*/api/video/jwplayer/#v={0}[^"]*")[^>]*>.+?<span class="duration">([^<]+)</span>'.format(video_id),
webpage, 'duration', fatal=False))
timestamp = parse_iso8601(self._search_regex(
r'datetime="([^"]+)"', webpage, 'upload date', fatal=False))
uploader = self._search_regex(
r'class="author">([^<]+)</a>', webpage, 'uploader', fatal=False)
view_count = str_to_int(self._search_regex(
r'<span class="views"><strong>([\d.]+)</strong>', webpage, 'view count', fatal=False))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'uploader': uploader,
'view_count': view_count,
'formats': formats,
}
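The qualities() helper turns a known quality ordering into sortable integers, with unknown labels sorting lowest. A minimal re-implementation of the idea, for illustration:

def qualities(quality_ids):
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q

q = qualities(['normal', 'hd720'])
assert q('hd720') > q('normal') > q('unknown')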

View File

@@ -0,0 +1,117 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
compat_urlparse,
)
from ..utils import (
HEADRequest,
str_to_int,
urlencode_postdata,
urlhandle_detect_ext,
)
class HearThisAtIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hearthis\.at/(?P<artist>[^/]+)/(?P<title>[A-Za-z0-9\-]+)/?$'
_PLAYLIST_URL = 'https://hearthis.at/playlist.php'
_TEST = {
'url': 'https://hearthis.at/moofi/dr-kreep',
'md5': 'ab6ec33c8fed6556029337c7885eb4e0',
'info_dict': {
'id': '150939',
'ext': 'wav',
'title': 'Moofi - Dr. Kreep',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1421564134,
'description': 'Creepy Patch. Mutable Instruments Braids Vowel + Formant Mode.',
'upload_date': '20150118',
'comment_count': int,
'view_count': int,
'like_count': int,
'duration': 71,
'categories': ['Experimental'],
}
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
display_id = '{artist:s} - {title:s}'.format(**m.groupdict())
webpage = self._download_webpage(url, display_id)
track_id = self._search_regex(
r'intTrackId\s*=\s*(\d+)', webpage, 'track ID')
payload = urlencode_postdata({'tracks[]': track_id})
req = compat_urllib_request.Request(self._PLAYLIST_URL, payload)
req.add_header('Content-type', 'application/x-www-form-urlencoded')
track = self._download_json(req, track_id, 'Downloading playlist')[0]
title = '{artist:s} - {title:s}'.format(**track)
categories = None
if track.get('category'):
categories = [track['category']]
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
meta_span = r'<span[^>]+class="%s".*?</i>([^<]+)</span>'
view_count = str_to_int(self._search_regex(
meta_span % 'plays_count', webpage, 'view count', fatal=False))
like_count = str_to_int(self._search_regex(
meta_span % 'likes_count', webpage, 'like count', fatal=False))
comment_count = str_to_int(self._search_regex(
meta_span % 'comment_count', webpage, 'comment count', fatal=False))
duration = str_to_int(self._search_regex(
r'data-length="(\d+)', webpage, 'duration', fatal=False))
timestamp = str_to_int(self._search_regex(
r'<span[^>]+class="calctime"[^>]+data-time="(\d+)', webpage, 'timestamp', fatal=False))
formats = []
mp3_url = self._search_regex(
r'(?s)<a class="player-link"\s+(?:[a-zA-Z0-9_:-]+="[^"]+"\s+)*?data-mp3="([^"]+)"',
webpage, 'mp3 URL', fatal=False)
if mp3_url:
formats.append({
'format_id': 'mp3',
'vcodec': 'none',
'acodec': 'mp3',
'url': mp3_url,
})
download_path = self._search_regex(
r'<a class="[^"]*download_fct[^"]*"\s+href="([^"]+)"',
webpage, 'download URL', default=None)
if download_path:
download_url = compat_urlparse.urljoin(url, download_path)
ext_req = HEADRequest(download_url)
ext_handle = self._request_webpage(
ext_req, display_id, note='Determining extension')
ext = urlhandle_detect_ext(ext_handle)
formats.append({
'format_id': 'download',
'vcodec': 'none',
'ext': ext,
'url': download_url,
'preference': 2, # Usually better quality
})
self._sort_formats(formats)
return {
'id': track_id,
'display_id': display_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'description': description,
'duration': duration,
'timestamp': timestamp,
'view_count': view_count,
'comment_count': comment_count,
'like_count': like_count,
'categories': categories,
}

View File

@@ -39,8 +39,9 @@ class HuffPostIE(InfoExtractor):
data = self._download_json(api_url, video_id)['data']
video_title = data['title']
duration = parse_duration(data['running_time'])
upload_date = unified_strdate(data['schedule']['starts_at'])
duration = parse_duration(data.get('running_time'))
upload_date = unified_strdate(
data.get('schedule', {}).get('starts_at') or data.get('segment_start_date_time'))
description = data.get('description')
thumbnails = []
@@ -59,16 +60,11 @@ class HuffPostIE(InfoExtractor):
'ext': 'mp4',
'url': url,
'vcodec': 'none' if key.startswith('audio/') else None,
} for key, url in data['sources']['live'].items()]
if data.get('fivemin_id'):
fid = data['fivemin_id']
fcat = str(int(fid) // 100 + 1)
furl = 'http://avideos.5min.com/2/' + fcat[-3:] + '/' + fcat + '/' + fid + '.mp4'
formats.append({
'format': 'fivemin',
'url': furl,
'preference': 1,
})
} for key, url in data.get('sources', {}).get('live', {}).items()]
if not formats and data.get('fivemin_id'):
return self.url_result('5min:%s' % data['fivemin_id'])
self._sort_formats(formats)
return {

View File

@@ -16,7 +16,6 @@ class ImdbIE(InfoExtractor):
_TEST = {
'url': 'http://www.imdb.com/video/imdb/vi2524815897',
'md5': '9f34fa777ade3a6e57a054fdbcb3a068',
'info_dict': {
'id': '2524815897',
'ext': 'mp4',

View File

@@ -0,0 +1,40 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..utils import (
js_to_json,
)
class KaraoketvIE(InfoExtractor):
_VALID_URL = r'http://karaoketv\.co\.il/\?container=songs&id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://karaoketv.co.il/?container=songs&id=171568',
'info_dict': {
'id': '171568',
'ext': 'mp4',
'title': 'אל העולם שלך - רותם כהן - שרים קריוקי',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
page_video_url = self._og_search_video_url(webpage, video_id)
config_json = compat_urllib_parse.unquote_plus(self._search_regex(
r'config=(.*)', page_video_url, 'configuration'))
urls_info_json = self._download_json(
config_json, video_id, 'Downloading configuration',
transform_source=js_to_json)
url = urls_info_json['playlist'][0]['url']
return {
'id': video_id,
'title': self._og_search_title(webpage),
'url': url,
}
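js_to_json is used here because the player configuration is a JavaScript object literal rather than strict JSON. Very roughly, it performs rewrites like the following (a simplified sketch, not the real implementation, which also handles escapes and comments):

import json
import re

def js_to_json_sketch(code):
    # Convert single- to double-quoted strings, then quote bare keys.
    code = re.sub(r"'([^']*)'", r'"\1"', code)
    code = re.sub(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)\s*:', r'\1"\2":', code)
    return code

assert json.loads(js_to_json_sketch("{playlist: [{url: 'http://example.com/v.mp4'}]}")) == \
    {'playlist': [{'url': 'http://example.com/v.mp4'}]}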

View File

@@ -22,8 +22,10 @@ class KhanAcademyIE(InfoExtractor):
'description': 'The perfect cipher',
'duration': 176,
'uploader': 'Brit Cruise',
'uploader_id': 'khanacademy',
'upload_date': '20120411',
}
},
'add_ie': ['Youtube'],
}, {
'url': 'https://www.khanacademy.org/math/applied-math/cryptography',
'info_dict': {

View File

@@ -10,13 +10,14 @@ from ..utils import int_or_none
class KontrTubeIE(InfoExtractor):
IE_NAME = 'kontrtube'
IE_DESC = 'KontrTube.ru - Труба зовёт'
_VALID_URL = r'http://(?:www\.)?kontrtube\.ru/videos/(?P<id>\d+)/.+'
_VALID_URL = r'http://(?:www\.)?kontrtube\.ru/videos/(?P<id>\d+)/(?P<display_id>[^/]+)/'
_TEST = {
'url': 'http://www.kontrtube.ru/videos/2678/nad-olimpiyskoy-derevney-v-sochi-podnyat-rossiyskiy-flag/',
'md5': '975a991a4926c9a85f383a736a2e6b80',
'info_dict': {
'id': '2678',
'display_id': 'nad-olimpiyskoy-derevney-v-sochi-podnyat-rossiyskiy-flag',
'ext': 'mp4',
'title': 'Над олимпийской деревней в Сочи поднят российский флаг',
'description': 'md5:80edc4c613d5887ae8ccf1d59432be41',
@@ -28,21 +29,28 @@ class KontrTubeIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, video_id, 'Downloading page')
webpage = self._download_webpage(
url, display_id, 'Downloading page')
video_url = self._html_search_regex(r"video_url: '(.+?)/?',", webpage, 'video URL')
thumbnail = self._html_search_regex(r"preview_url: '(.+?)/?',", webpage, 'video thumbnail', fatal=False)
video_url = self._html_search_regex(
r"video_url\s*:\s*'(.+?)/?',", webpage, 'video URL')
thumbnail = self._html_search_regex(
r"preview_url\s*:\s*'(.+?)/?',", webpage, 'video thumbnail', fatal=False)
title = self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'video title')
description = self._html_search_meta('description', webpage, 'video description')
description = self._html_search_meta(
'description', webpage, 'video description')
mobj = re.search(
r'<div class="col_2">Длительность: <span>(?P<minutes>\d+)м:(?P<seconds>\d+)с</span></div>', webpage)
r'<div class="col_2">Длительность: <span>(?P<minutes>\d+)м:(?P<seconds>\d+)с</span></div>',
webpage)
duration = int(mobj.group('minutes')) * 60 + int(mobj.group('seconds')) if mobj else None
view_count = self._html_search_regex(
r'<div class="col_2">Просмотров: <span>(\d+)</span></div>', webpage, 'view count', fatal=False)
r'<div class="col_2">Просмотров: <span>(\d+)</span></div>',
webpage, 'view count', fatal=False)
comment_count = None
comment_str = self._html_search_regex(
@@ -56,6 +64,7 @@ class KontrTubeIE(InfoExtractor):
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'thumbnail': thumbnail,
'title': title,

View File

@@ -0,0 +1,124 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
js_to_json,
unified_strdate,
)
class LnkGoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lnkgo\.alfa\.lt/visi\-video/(?P<show>[^/]+)/ziurek\-(?P<display_id>[A-Za-z0-9\-]+)'
_TESTS = [{
'url': 'http://lnkgo.alfa.lt/visi-video/yra-kaip-yra/ziurek-yra-kaip-yra-162',
'info_dict': {
'id': '46712',
'ext': 'mp4',
'title': 'Yra kaip yra',
'upload_date': '20150107',
'description': 'md5:d82a5e36b775b7048617f263a0e3475e',
'age_limit': 7,
'duration': 3019,
'thumbnail': 're:^https?://.*\.jpg$'
},
'params': {
'skip_download': True, # HLS download
},
}, {
'url': 'http://lnkgo.alfa.lt/visi-video/aktualai-pratesimas/ziurek-nerdas-taiso-kompiuteri-2',
'info_dict': {
'id': '47289',
'ext': 'mp4',
'title': 'Nėrdas: Kompiuterio Valymas',
'upload_date': '20150113',
'description': 'md5:7352d113a242a808676ff17e69db6a69',
'age_limit': 18,
'duration': 346,
'thumbnail': 're:^https?://.*\.jpg$'
},
'params': {
'skip_download': True, # HLS download
},
}]
_AGE_LIMITS = {
'N-7': 7,
'N-14': 14,
'S': 18,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
webpage = self._download_webpage(
url, display_id, 'Downloading player webpage')
video_id = self._search_regex(
r'data-ep="([^"]+)"', webpage, 'video ID')
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail_w = int_or_none(
self._og_search_property('image:width', webpage, 'thumbnail width', fatal=False))
thumbnail_h = int_or_none(
self._og_search_property('image:height', webpage, 'thumbnail height', fatal=False))
thumbnail = {
'url': self._og_search_thumbnail(webpage),
}
if thumbnail_w and thumbnail_h:
thumbnail.update({
'width': thumbnail_w,
'height': thumbnail_h,
})
upload_date = unified_strdate(self._search_regex(
r'class="meta-item\sair-time">.*?<strong>([^<]+)</strong>', webpage, 'upload date', fatal=False))
duration = int_or_none(self._search_regex(
r'VideoDuration = "([^"]+)"', webpage, 'duration', fatal=False))
pg_rating = self._search_regex(
r'pgrating="([^"]+)"', webpage, 'PG rating', fatal=False, default='')
age_limit = self._AGE_LIMITS.get(pg_rating.upper(), 0)
sources_js = self._search_regex(
r'(?s)sources:\s(\[.*?\]),', webpage, 'sources')
sources = self._parse_json(
sources_js, video_id, transform_source=js_to_json)
formats = []
for source in sources:
if source.get('provider') == 'rtmp':
m = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<play_path>.+)$', source['file'])
if not m:
continue
formats.append({
'format_id': 'rtmp',
'ext': 'flv',
'url': m.group('url'),
'play_path': m.group('play_path'),
'page_url': url,
})
elif source.get('file').endswith('.m3u8'):
formats.append({
'format_id': 'hls',
'ext': source.get('type', 'mp4'),
'url': source['file'],
})
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'formats': formats,
'thumbnails': [thumbnail],
'duration': duration,
'description': description,
'age_limit': age_limit,
'upload_date': upload_date,
}
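The rtmp branch splits a full RTMP URL into the connection URL and the play path, which the RTMP downloader consumes separately. For example, with an invented stream URL:

import re

m = re.search(
    r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<play_path>.+)$',
    'rtmp://stream.example.com/vod/mp4:shows/episode-162.mp4')
assert m.group('url') == 'rtmp://stream.example.com/vod'
assert m.group('play_path') == 'mp4:shows/episode-162.mp4'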

View File

@@ -2,7 +2,6 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
@@ -28,7 +27,6 @@ class LRTIE(InfoExtractor):
'params': {
'skip_download': True, # HLS download
},
}
def _real_extract(self, url):
@@ -44,7 +42,9 @@ class LRTIE(InfoExtractor):
formats = []
for js in re.findall(r'(?s)config:\s*(\{.*?\})', webpage):
data = json.loads(js_to_json(js))
data = self._parse_json(js, video_id, transform_source=js_to_json)
if 'provider' not in data:
continue
if data['provider'] == 'rtmp':
formats.append({
'format_id': 'rtmp',

View File

@@ -105,6 +105,9 @@ class OCWMITIE(InfoExtractor):
'ext': 'mp4',
'title': 'Lecture 7: Multiple Discrete Random Variables: Expectations, Conditioning, Independence',
'description': 'In this lecture, the professor discussed multiple random variables, expectations, and binomial distribution.',
'upload_date': '20121109',
'uploader_id': 'MIT',
'uploader': 'MIT OpenCourseWare',
# 'subtitles': 'http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/video-lectures/lecture-7-multiple-variables-expectations-independence/MIT6_041F11_lec07_300k.mp4.srt'
}
},
@@ -114,6 +117,9 @@ class OCWMITIE(InfoExtractor):
'id': '7K1sB05pE0A',
'ext': 'mp4',
'title': 'Session 1: Introduction to Derivatives',
'upload_date': '20090818',
'uploader_id': 'MIT',
'uploader': 'MIT OpenCourseWare',
'description': 'This section contains lecture video excerpts, lecture notes, an interactive mathlet with supporting documents, and problem solving videos.',
# 'subtitles': 'http://ocw.mit.edu//courses/mathematics/18-01sc-single-variable-calculus-fall-2010/ocw-18.01-f07-lec01_300k.SRT'
}

View File

@@ -1,63 +1,49 @@
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import json
import time
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
)
from ..utils import (
int_or_none,
compat_urlparse,
)
class MotorsportIE(InfoExtractor):
IE_DESC = 'motorsport.com'
_VALID_URL = r'http://www\.motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/(?:$|[?#])'
_VALID_URL = r'http://www\.motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'
_TEST = {
'url': 'http://www.motorsport.com/f1/video/main-gallery/red-bull-racing-2014-rules-explained/',
'md5': '5592cb7c5005d9b2c163df5ac3dc04e4',
'info_dict': {
'id': '7063',
'id': '2-T3WuR-KMM',
'ext': 'mp4',
'title': 'Red Bull Racing: 2014 Rules Explained',
'duration': 207,
'duration': 208,
'description': 'A new clip from Red Bull sees Daniel Ricciardo and Sebastian Vettel explain the 2014 Formula One regulations which are arguably the most complex the sport has ever seen.',
'uploader': 'rainiere',
'thumbnail': r're:^http://.*motorsport\.com/.+\.jpg$'
}
'uploader': 'mcomstaff',
'uploader_id': 'UC334JIYKkVnyFoNCclfZtHQ',
'upload_date': '20140903',
'thumbnail': r're:^https?://.+\.jpg$'
},
'add_ie': ['Youtube'],
'params': {
'skip_download': True,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
flashvars_code = self._html_search_regex(
r'<embed id="player".*?flashvars="([^"]+)"', webpage, 'flashvars')
flashvars = compat_parse_qs(flashvars_code)
params = json.loads(flashvars['parameters'][0])
e = compat_str(int(time.time()) + 24 * 60 * 60)
base_video_url = params['location'] + '?e=' + e
s = 'h3hg713fh32'
h = hashlib.md5((s + base_video_url).encode('utf-8')).hexdigest()
video_url = base_video_url + '&h=' + h
uploader = self._html_search_regex(
r'(?s)<span class="label">Video by: </span>(.*?)</a>', webpage,
'uploader', fatal=False)
iframe_path = self._html_search_regex(
r'<iframe id="player_iframe"[^>]+src="([^"]+)"', webpage,
'iframe path')
iframe = self._download_webpage(
compat_urlparse.urljoin(url, iframe_path), display_id,
'Downloading iframe')
youtube_id = self._search_regex(
r'www\.youtube\.com/embed/(.{11})', iframe, 'youtube id')
return {
'id': params['video_id'],
'_type': 'url_transparent',
'display_id': display_id,
'title': params['title'],
'url': video_url,
'description': params.get('description'),
'thumbnail': params.get('main_thumb'),
'duration': int_or_none(params.get('duration')),
'uploader': uploader,
'url': 'https://youtube.com/watch?v=%s' % youtube_id,
}

View File

@@ -6,6 +6,7 @@ import json
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_HTTPError,
)
from ..utils import (
ExtractorError,
@@ -78,6 +79,16 @@ class NBCNewsIE(InfoExtractor):
},
'add_ie': ['ThePlatform'],
},
{
'url': 'http://www.nbcnews.com/feature/dateline-full-episodes/full-episode-family-business-n285156',
'md5': 'fdbf39ab73a72df5896b6234ff98518a',
'info_dict': {
'id': 'Wjf9EDR3A_60',
'ext': 'mp4',
'title': 'FULL EPISODE: Family Business',
'description': 'md5:757988edbaae9d7be1d585eb5d55cc04',
},
},
]
def _real_extract(self, url):
@@ -115,10 +126,19 @@ class NBCNewsIE(InfoExtractor):
if not base_url:
continue
playlist_url = base_url + '?form=MPXNBCNewsAPI'
all_videos = self._download_json(playlist_url, title)['videos']
try:
info = next(v for v in all_videos if v['mpxId'] == mpxid)
all_videos = self._download_json(playlist_url, title)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError):
continue
raise
if not all_videos or 'videos' not in all_videos:
continue
try:
info = next(v for v in all_videos['videos'] if v['mpxId'] == mpxid)
break
except StopIteration:
continue

View File

@@ -27,9 +27,7 @@ class NDTVIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
filename = self._search_regex(

View File

@@ -0,0 +1,86 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
js_to_json,
parse_iso8601,
)
class NetzkinoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?netzkino\.de/\#!/(?P<category>[^/]+)/(?P<id>[^/]+)'
_TEST = {
'url': 'http://www.netzkino.de/#!/scifikino/rakete-zum-mond',
'md5': '92a3f8b76f8d7220acce5377ea5d4873',
'info_dict': {
'id': 'rakete-zum-mond',
'ext': 'mp4',
'title': 'Rakete zum Mond (Endstation Mond, Destination Moon)',
'comments': 'mincount:3',
'description': 'md5:1eddeacc7e62d5a25a2d1a7290c64a28',
'upload_date': '20120813',
'thumbnail': 're:https?://.*\.jpg$',
'timestamp': 1344858571,
'age_limit': 12,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
category_id = mobj.group('category')
video_id = mobj.group('id')
api_url = 'http://api.netzkino.de.simplecache.net/capi-2.0a/categories/%s.json?d=www' % category_id
api_info = self._download_json(api_url, video_id)
info = next(
p for p in api_info['posts'] if p['slug'] == video_id)
custom_fields = info['custom_fields']
production_js = self._download_webpage(
'http://www.netzkino.de/beta/dist/production.min.js', video_id,
note='Downloading player code')
avo_js = self._search_regex(
r'window\.avoCore\s*=.*?urlTemplate:\s*(\{.*?"\})',
production_js, 'URL templates')
templates = self._parse_json(
avo_js, video_id, transform_source=js_to_json)
suffix = {
'hds': '.mp4/manifest.f4m',
'hls': '.mp4/master.m3u8',
'pmd': '.mp4',
}
film_fn = custom_fields['Streaming'][0]
formats = [{
'format_id': key,
'ext': 'mp4',
'url': tpl.replace('{}', film_fn) + suffix[key],
} for key, tpl in templates.items()]
self._sort_formats(formats)
comments = [{
'timestamp': parse_iso8601(c.get('date'), delimiter=' '),
'id': c['id'],
'author': c['name'],
'html': c['content'],
'parent': 'root' if c.get('parent', 0) == 0 else c['parent'],
} for c in info.get('comments', [])]
return {
'id': video_id,
'formats': formats,
'comments': comments,
'title': info['title'],
'age_limit': int_or_none(custom_fields.get('FSK', [None])[0]),
'timestamp': parse_iso8601(info.get('date'), delimiter=' '),
'description': clean_html(info.get('content')),
'thumbnail': info.get('thumbnail'),
'playlist_title': api_info.get('title'),
'playlist_id': category_id,
}
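Each urlTemplate entry maps a delivery method to a URL template containing a literal '{}' placeholder; the formats list fills in the film filename and appends a per-protocol suffix. Illustrated with made-up template values:

templates = {
    'hls': 'http://example-cdn.example/streams/{}',  # invented
    'pmd': 'http://example-cdn.example/files/{}',    # invented
}
suffix = {'hds': '.mp4/manifest.f4m', 'hls': '.mp4/master.m3u8', 'pmd': '.mp4'}
film_fn = 'rakete-zum-mond'

formats = [{
    'format_id': key,
    'ext': 'mp4',
    'url': tpl.replace('{}', film_fn) + suffix[key],
} for key, tpl in templates.items()]
# e.g. the 'pmd' entry becomes .../files/rakete-zum-mond.mp4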

View File

@@ -22,7 +22,11 @@ class NormalbootsIE(InfoExtractor):
'description': 'Jon is late for Christmas. Typical. Thanks to: Paul Ritchey for Co-Writing/Filming: http://www.youtube.com/user/ContinueShow Michael Azzi for Christmas Intro Animation: http://michafrar.tumblr.com/ Jerrod Waters for Christmas Intro Music: http://www.youtube.com/user/xXJerryTerryXx Casey Ormond for Tense Battle Theme:\xa0http://www.youtube.com/Kiamet/',
'uploader': 'JonTron',
'upload_date': '20140125',
}
},
'params': {
# rtmp download
'skip_download': True,
},
}
def _real_extract(self, url):

View File

@@ -1,19 +1,26 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
unified_strdate,
fix_xml_ampersands,
parse_duration,
qualities,
strip_jsonp,
unified_strdate,
url_basename,
fix_xml_ampersands,
)
class NPOIE(InfoExtractor):
class NPOBaseIE(InfoExtractor):
def _get_token(self, video_id):
token_page = self._download_webpage(
'http://ida.omroep.nl/npoplayer/i.js',
video_id, note='Downloading token')
return self._search_regex(
r'npoplayer\.token = "(.+?)"', token_page, 'token')
class NPOIE(NPOBaseIE):
IE_NAME = 'npo.nl'
_VALID_URL = r'https?://www\.npo\.nl/[^/]+/[^/]+/(?P<id>[^/?]+)'
@@ -67,11 +74,20 @@ class NPOIE(InfoExtractor):
'skip_download': True,
}
},
# non asf in streams
{
'url': 'http://www.npo.nl/hoe-gaat-europa-verder-na-parijs/10-01-2015/WO_NOS_762771',
'md5': 'b3da13de374cbe2d5332a7e910bef97f',
'info_dict': {
'id': 'WO_NOS_762771',
'ext': 'mp4',
'title': 'Hoe gaat Europa verder na Parijs?',
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
return self._get_info(video_id)
def _get_info(self, video_id):
@@ -81,12 +97,8 @@ class NPOIE(InfoExtractor):
# We have to remove the javascript callback
transform_source=strip_jsonp,
)
token_page = self._download_webpage(
'http://ida.omroep.nl/npoplayer/i.js',
video_id,
note='Downloading token'
)
token = self._search_regex(r'npoplayer\.token = "(.+?)"', token_page, 'token')
token = self._get_token(video_id)
formats = []
@@ -125,6 +137,12 @@ class NPOIE(InfoExtractor):
stream_url = stream.get('url')
if not stream_url:
continue
if '.asf' not in stream_url:
formats.append({
'url': stream_url,
'quality': stream.get('kwaliteit'),
})
continue
asx = self._download_xml(
stream_url, video_id,
'Downloading stream %d ASX playlist' % i,
@@ -154,6 +172,83 @@ class NPOIE(InfoExtractor):
}
class NPOLiveIE(NPOBaseIE):
IE_NAME = 'npo.nl:live'
_VALID_URL = r'https?://www\.npo\.nl/live/(?P<id>.+)'
_TEST = {
'url': 'http://www.npo.nl/live/npo-1',
'info_dict': {
'id': 'LI_NEDERLAND1_136692',
'display_id': 'npo-1',
'ext': 'mp4',
'title': 're:^Nederland 1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'Livestream',
'is_live': True,
},
'params': {
'skip_download': True,
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
live_id = self._search_regex(
r'data-prid="([^"]+)"', webpage, 'live id')
metadata = self._download_json(
'http://e.omroep.nl/metadata/%s' % live_id,
display_id, transform_source=strip_jsonp)
token = self._get_token(display_id)
formats = []
streams = metadata.get('streams')
if streams:
for stream in streams:
stream_type = stream.get('type').lower()
if stream_type == 'ss':
continue
stream_info = self._download_json(
'http://ida.omroep.nl/aapi/?stream=%s&token=%s&type=jsonp'
% (stream.get('url'), token),
display_id, 'Downloading %s JSON' % stream_type)
if stream_info.get('error_code', 0) or stream_info.get('errorcode', 0):
continue
stream_url = self._download_json(
stream_info['stream'], display_id,
'Downloading %s URL' % stream_type,
transform_source=strip_jsonp)
if stream_type == 'hds':
f4m_formats = self._extract_f4m_formats(stream_url, display_id)
# the f4m downloader only fetches a piece of a live stream
for f4m_format in f4m_formats:
f4m_format['preference'] = -1
formats.extend(f4m_formats)
elif stream_type == 'hls':
formats.extend(self._extract_m3u8_formats(stream_url, display_id, 'mp4'))
else:
formats.append({
'url': stream_url,
})
self._sort_formats(formats)
return {
'id': live_id,
'display_id': display_id,
'title': self._live_title(metadata['titel']),
'description': metadata['info'],
'thumbnail': metadata.get('images', [{'url': None}])[-1]['url'],
'formats': formats,
'is_live': True,
}
class TegenlichtVproIE(NPOIE):
IE_NAME = 'tegenlicht.vpro.nl'
_VALID_URL = r'https?://tegenlicht\.vpro\.nl/afleveringen/.*?'

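strip_jsonp, used above for both the metadata and the stream URL responses, peels the callback wrapper off a JSONP payload before JSON parsing. A simplified sketch of the idea (the real utility is more defensive):

import json
import re

def strip_jsonp_sketch(code):
    # parse_metadata({"titel": "NPO 1"}); -> {"titel": "NPO 1"}
    return re.sub(r'^\s*\w+\s*\(|\)\s*;?\s*$', '', code)

assert json.loads(strip_jsonp_sketch('parse_metadata({"titel": "NPO 1"});')) == {'titel': 'NPO 1'}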
View File

@@ -7,8 +7,10 @@ from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
parse_duration,
unified_strdate,
)
from .subtitles import SubtitlesInfoExtractor
class NRKIE(InfoExtractor):
@@ -71,8 +73,8 @@ class NRKIE(InfoExtractor):
}
class NRKTVIE(InfoExtractor):
_VALID_URL = r'http://tv\.nrk(?:super)?\.no/(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})'
class NRKTVIE(SubtitlesInfoExtractor):
_VALID_URL = r'(?P<baseurl>http://tv\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
_TESTS = [
{
@@ -85,7 +87,7 @@ class NRKTVIE(InfoExtractor):
'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
'upload_date': '20140523',
'duration': 1741.52,
}
},
},
{
'url': 'http://tv.nrk.no/program/mdfp15000514',
@@ -97,42 +99,155 @@ class NRKTVIE(InfoExtractor):
'description': 'md5:654c12511f035aed1e42bdf5db3b206a',
'upload_date': '20140524',
'duration': 4605.0,
}
},
},
{
# single playlist video
'url': 'http://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
},
'skip': 'Only works from Norway',
},
{
'url': 'http://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
'playlist': [
{
'md5': '9480285eff92d64f06e02a5367970a7a',
'info_dict': {
'id': 'MSPO40010515-part1',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 1:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
},
},
{
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
},
},
],
'info_dict': {
'id': 'MSPO40010515',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
'duration': 6947.5199999999995,
},
'skip': 'Only works from Norway',
}
]
def _seconds2str(self, s):
return '%02d:%02d:%02d.%03d' % (s / 3600, (s % 3600) / 60, s % 60, (s % 1) * 1000)
def _debug_print(self, txt):
if self._downloader.params.get('verbose', False):
self.to_screen('[debug] %s' % txt)
def _extract_captions(self, subtitlesurl, video_id, baseurl):
url = "%s%s" % (baseurl, subtitlesurl)
self._debug_print('%s: Subtitle url: %s' % (video_id, url))
captions = self._download_xml(url, video_id, 'Downloading subtitles')
lang = captions.get('lang', 'no')
ps = captions.findall('./{0}body/{0}div/{0}p'.format('{http://www.w3.org/ns/ttml}'))
srt = ''
for pos, p in enumerate(ps):
begin = parse_duration(p.get('begin'))
duration = parse_duration(p.get('dur'))
starttime = self._seconds2str(begin)
endtime = self._seconds2str(begin + duration)
text = '\n'.join(p.itertext())
srt += '%s\r\n%s --> %s\r\n%s\r\n\r\n' % (str(pos), starttime, endtime, text)
return {lang: srt}
def _extract_f4m(self, manifest_url, video_id):
return self._extract_f4m_formats(manifest_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124', video_id)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
part_id = mobj.group('part_id')
baseurl = mobj.group('baseurl')
page = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta('title', page, 'title')
description = self._html_search_meta('description', page, 'description')
thumbnail = self._html_search_regex(r'data-posterimage="([^"]+)"', page, 'thumbnail', fatal=False)
upload_date = unified_strdate(self._html_search_meta('rightsfrom', page, 'upload date', fatal=False))
duration = float_or_none(
self._html_search_regex(r'data-duration="([^"]+)"', page, 'duration', fatal=False))
title = self._html_search_meta(
'title', webpage, 'title')
description = self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._html_search_regex(
r'data-posterimage="([^"]+)"',
webpage, 'thumbnail', fatal=False)
upload_date = unified_strdate(self._html_search_meta(
'rightsfrom', webpage, 'upload date', fatal=False))
duration = float_or_none(self._html_search_regex(
r'data-duration="([^"]+)"',
webpage, 'duration', fatal=False))
# playlist
parts = re.findall(
r'<a href="#del=(\d+)"[^>]+data-argument="([^"]+)">([^<]+)</a>', webpage)
if parts:
entries = []
for current_part_id, stream_url, part_title in parts:
if part_id and current_part_id != part_id:
continue
video_part_id = '%s-part%s' % (video_id, current_part_id)
formats = self._extract_f4m(stream_url, video_part_id)
entries.append({
'id': video_part_id,
'title': part_title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'formats': formats,
})
if part_id:
if entries:
return entries[0]
else:
playlist = self.playlist_result(entries, video_id, title, description)
playlist.update({
'thumbnail': thumbnail,
'upload_date': upload_date,
'duration': duration,
})
return playlist
formats = []
f4m_url = re.search(r'data-media="([^"]+)"', page)
f4m_url = re.search(r'data-media="([^"]+)"', webpage)
if f4m_url:
formats.append({
'url': f4m_url.group(1) + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
'format_id': 'f4m',
'ext': 'flv',
})
formats.extend(self._extract_f4m(f4m_url.group(1), video_id))
m3u8_url = re.search(r'data-hls-media="([^"]+)"', page)
m3u8_url = re.search(r'data-hls-media="([^"]+)"', webpage)
if m3u8_url:
formats.append({
'url': m3u8_url.group(1),
'format_id': 'm3u8',
})
formats.extend(self._extract_m3u8_formats(m3u8_url.group(1), video_id, 'mp4'))
self._sort_formats(formats)
subtitles_url = self._html_search_regex(
r'data-subtitlesurl[ ]*=[ ]*"([^"]+)"',
webpage, 'subtitle URL', default=None)
subtitles = None
if subtitles_url:
subtitles = self._extract_captions(subtitles_url, video_id, baseurl)
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id, subtitles)
return
return {
'id': video_id,
'title': title,
@@ -141,4 +256,5 @@ class NRKTVIE(InfoExtractor):
'upload_date': upload_date,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
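_seconds2str builds SRT timestamps with plain integer and modulo arithmetic on the float offsets. A quick worked check:

def seconds2str(s):
    return '%02d:%02d:%02d.%03d' % (s / 3600, (s % 3600) / 60, s % 60, (s % 1) * 1000)

assert seconds2str(3723.5) == '01:02:03.500'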

View File

@@ -128,13 +128,16 @@ class ORFTVthekIE(InfoExtractor):
}
# Audios on ORF radio are only available for 7 days, so we can't add tests.
class ORFOE1IE(InfoExtractor):
IE_NAME = 'orf:oe1'
IE_DESC = 'Radio Österreich 1'
_VALID_URL = r'http://oe1\.orf\.at/programm/(?P<id>[0-9]+)'
_VALID_URL = r'http://oe1\.orf\.at/(?:programm/|konsole.*?#\?track_id=)(?P<id>[0-9]+)'
# Audios on ORF radio are only available for 7 days, so we can't add tests.
_TEST = {
'url': 'http://oe1.orf.at/konsole?show=on_demand#?track_id=394211',
'only_matching': True,
}
def _real_extract(self, url):
show_id = self._match_id(url)
@@ -160,7 +163,7 @@ class ORFOE1IE(InfoExtractor):
class ORFFM4IE(InfoExtractor):
IE_DESC = 'orf:fm4'
IE_NAME = 'orf:fm4'
IE_DESC = 'radio FM4'
_VALID_URL = r'http://fm4\.orf\.at/7tage/?#(?P<date>[0-9]+)/(?P<show>\w+)'

View File

@@ -26,6 +26,7 @@ class PlayedIE(InfoExtractor):
'ext': 'flv',
'title': 'youtube-dl_test_video.mp4',
},
'skip': 'Removed for copyright infringement.', # oh wow
}
def _real_extract(self, url):

View File

@@ -10,6 +10,7 @@ from ..compat import (
compat_urllib_request,
)
from ..utils import (
ExtractorError,
str_to_int,
)
from ..aes import (
@@ -44,6 +45,15 @@ class PornHubIE(InfoExtractor):
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
error_msg = self._html_search_regex(
r'(?s)<div class="userMessageSection[^"]*".*?>(.*?)</div>',
webpage, 'error message', default=None)
if error_msg:
error_msg = re.sub(r'\s+', ' ', error_msg)
raise ExtractorError(
'PornHub said: %s' % error_msg,
expected=True, video_id=video_id)
video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
video_uploader = self._html_search_regex(
r'(?s)From:&nbsp;.+?<(?:a href="/users/|a href="/channels/|<span class="username)[^>]+>(.+?)<',

View File

@@ -0,0 +1,63 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import parse_duration
class RadioBremenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?radiobremen\.de/mediathek/(?:index\.html)?\?id=(?P<id>[0-9]+)'
IE_NAME = 'radiobremen'
_TEST = {
'url': 'http://www.radiobremen.de/mediathek/index.html?id=114720',
'info_dict': {
'id': '114720',
'ext': 'mp4',
'duration': 1685,
'width': 512,
'title': 'buten un binnen vom 22. Dezember',
'thumbnail': 're:https?://.*\.jpg$',
'description': 'Unter anderem mit diesen Themen: 45 Flüchtlinge sind in Worpswede angekommen +++ Freies Internet für alle: Bremer arbeiten an einem flächendeckenden W-Lan-Netzwerk +++ Aktivisten kämpfen für das Unibad +++ So war das Wetter 2014 +++',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
meta_url = "http://www.radiobremen.de/apps/php/mediathek/metadaten.php?id=%s" % video_id
meta_doc = self._download_webpage(
meta_url, video_id, 'Downloading metadata')
title = self._html_search_regex(
r"<h1.*>(?P<title>.+)</h1>", meta_doc, "title")
description = self._html_search_regex(
r"<p>(?P<description>.*)</p>", meta_doc, "description", fatal=False)
duration = parse_duration(self._html_search_regex(
r"L&auml;nge:</td>\s+<td>(?P<duration>[0-9]+:[0-9]+)</td>",
meta_doc, "duration", fatal=False))
page_doc = self._download_webpage(
url, video_id, 'Downloading video information')
mobj = re.search(
r"ardformatplayerclassic\(\'playerbereich\',\'(?P<width>[0-9]+)\',\'.*\',\'(?P<video_id>[0-9]+)\',\'(?P<secret>[0-9]+)\',\'(?P<thumbnail>.+)\',\'\'\)",
page_doc)
video_url = (
"http://dl-ondemand.radiobremen.de/mediabase/%s/%s_%s_%s.mp4" %
(video_id, video_id, mobj.group("secret"), mobj.group('width')))
formats = [{
'url': video_url,
'ext': 'mp4',
'width': int(mobj.group("width")),
}]
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'formats': formats,
'thumbnail': mobj.group('thumbnail'),
}

View File

@@ -0,0 +1,62 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
float_or_none,
)
class RteIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rte\.ie/player/[^/]{2,3}/show/(?P<id>[0-9]+)/'
_TEST = {
'url': 'http://www.rte.ie/player/de/show/10363114/',
'info_dict': {
'id': '10363114',
'ext': 'mp4',
'title': 'One News',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'The One O\'Clock News followed by Weather.',
'duration': 436.844,
},
'params': {
'skip_download': 'f4m fails with --test atm'
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
description = self._html_search_meta('description', webpage, 'description')
duration = float_or_none(self._html_search_meta(
'duration', webpage, 'duration', fatal=False), 1000)
thumbnail_id = self._search_regex(
r'<meta name="thumbnail" content="uri:irus:(.*?)" />', webpage, 'thumbnail')
thumbnail = 'http://img.rasset.ie/' + thumbnail_id + '.jpg'
feeds_url = self._html_search_meta("feeds-prefix", webpage, 'feeds url') + video_id
json_string = self._download_json(feeds_url, video_id)
# f4m_url = server + relative_url
f4m_url = json_string['shows'][0]['media:group'][0]['rte:server'] + json_string['shows'][0]['media:group'][0]['url']
f4m_formats = self._extract_f4m_formats(f4m_url, video_id)
f4m_formats = [{
'format_id': f['format_id'],
'url': f['url'],
'ext': 'mp4',
'width': f['width'],
'height': f['height'],
} for f in f4m_formats]
return {
'id': video_id,
'title': title,
'formats': f4m_formats,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
}

View File

@@ -8,7 +8,7 @@ from ..utils import parse_duration
class RtlXlIE(InfoExtractor):
IE_NAME = 'rtlxl.nl'
_VALID_URL = r'https?://www\.rtlxl\.nl/#!/[^/]+/(?P<uuid>[^/?]+)'
_VALID_URL = r'https?://(www\.)?rtlxl\.nl/#!/[^/]+/(?P<uuid>[^/?]+)'
_TEST = {
'url': 'http://www.rtlxl.nl/#!/rtl-nieuws-132237/6e4203a6-0a5e-3596-8424-c599a59e0677',

View File

@@ -70,6 +70,37 @@ class RutubeIE(InfoExtractor):
}
class RutubeEmbedIE(InfoExtractor):
IE_NAME = 'rutube:embed'
IE_DESC = 'Rutube embedded videos'
_VALID_URL = r'https?://rutube\.ru/video/embed/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://rutube.ru/video/embed/6722881?vk_puid37=&vk_puid38=',
'info_dict': {
'id': 'a10e53b86e8f349080f718582ce4c661',
'ext': 'mp4',
'upload_date': '20131223',
'uploader_id': '297833',
'description': 'Видео группы ★http://vk.com/foxkidsreset★ музей Fox Kids и Jetix<br/><br/> восстановлено и сделано в шикоформате subziro89 http://vk.com/subziro89',
'uploader': 'subziro89 ILya',
'title': 'Мистический городок Эйри в Индиан 5 серия озвучка subziro89',
},
'params': {
'skip_download': 'Requires ffmpeg',
},
}
def _real_extract(self, url):
embed_id = self._match_id(url)
webpage = self._download_webpage(url, embed_id)
canonical_url = self._html_search_regex(
r'<link\s+rel="canonical"\s+href="([^"]+?)"', webpage,
'Canonical URL')
return self.url_result(canonical_url, 'Rutube')
class RutubeChannelIE(InfoExtractor):
IE_NAME = 'rutube:channel'
IE_DESC = 'Rutube channels'

View File

@@ -24,7 +24,7 @@ class SexyKarmaIE(InfoExtractor):
'title': 'Taking a quick pee.',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'wildginger7',
'upload_date': '20141007',
'upload_date': '20141008',
'duration': 22,
'view_count': int,
'comment_count': int,
@@ -45,6 +45,7 @@ class SexyKarmaIE(InfoExtractor):
'view_count': int,
'comment_count': int,
'categories': list,
'age_limit': 18,
}
}, {
'url': 'http://www.watchindianporn.net/video/desi-dancer-namrata-stripping-completely-nude-and-dancing-on-a-hot-number-dW2mtctxJfs.html',
@@ -61,6 +62,7 @@ class SexyKarmaIE(InfoExtractor):
'view_count': int,
'comment_count': int,
'categories': list,
'age_limit': 18,
}
}]
@@ -114,4 +116,5 @@ class SexyKarmaIE(InfoExtractor):
'view_count': view_count,
'comment_count': comment_count,
'categories': categories,
'age_limit': 18,
}

View File

@@ -90,6 +90,20 @@ class SmotriIE(InfoExtractor):
},
'skip': 'Video is not approved by moderator',
},
# not approved by moderator, but available
{
'url': 'http://smotri.com/video/view/?id=v28888533b73',
'md5': 'f44bc7adac90af518ef1ecf04893bb34',
'info_dict': {
'id': 'v28888533b73',
'ext': 'mp4',
'title': 'Russian Spies Killed By ISIL Child Soldier',
'uploader': 'Mopeder',
'uploader_id': 'mopeder',
'duration': 71,
'thumbnail': 'http://frame9.loadup.ru/d7/32/2888853.2.3.jpg',
},
},
# swf player
{
'url': 'http://pics.smotri.com/scrubber_custom8.swf?file=v9188090500',
@@ -146,13 +160,16 @@ class SmotriIE(InfoExtractor):
video = self._download_json(request, video_id, 'Downloading video JSON')
if video.get('_moderate_no') or not video.get('moderated'):
raise ExtractorError('Video %s has not been approved by moderator' % video_id, expected=True)
if video.get('error'):
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
video_url = video.get('_vidURL') or video.get('_vidURL_mp4')
if not video_url:
if video.get('_moderate_no') or not video.get('moderated'):
raise ExtractorError(
'Video %s has not been approved by moderator' % video_id, expected=True)
if video.get('error'):
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
title = video['title']
thumbnail = video['_imgURL']
upload_date = unified_strdate(video['added'])

View File

@@ -0,0 +1,80 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
HEADRequest,
urlhandle_detect_ext,
)
class SoulAnimeWatchingIE(InfoExtractor):
IE_NAME = "soulanime:watching"
IE_DESC = "SoulAnime video"
_TEST = {
'url': 'http://www.soul-anime.net/watching/seirei-tsukai-no-blade-dance-episode-9/',
'md5': '05fae04abf72298098b528e98abf4298',
'info_dict': {
'id': 'seirei-tsukai-no-blade-dance-episode-9',
'ext': 'mp4',
'title': 'seirei-tsukai-no-blade-dance-episode-9',
'description': 'seirei-tsukai-no-blade-dance-episode-9'
}
}
_VALID_URL = r'http://[w.]*soul-anime\.(?P<domain>[^/]+)/watch[^/]*/(?P<id>[^/]+)'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
domain = mobj.group('domain')
page = self._download_webpage(url, video_id)
video_url_encoded = self._html_search_regex(
r'<div id="download">[^<]*<a href="(?P<url>[^"]+)"', page, 'url')
video_url = "http://www.soul-anime." + domain + video_url_encoded
ext_req = HEADRequest(video_url)
ext_handle = self._request_webpage(
ext_req, video_id, note='Determining extension')
ext = urlhandle_detect_ext(ext_handle)
return {
'id': video_id,
'url': video_url,
'ext': ext,
'title': video_id,
'description': video_id
}
class SoulAnimeSeriesIE(InfoExtractor):
IE_NAME = "soulanime:series"
IE_DESC = "SoulAnime Series"
_VALID_URL = r'http://[w.]*soul-anime\.(?P<domain>[^/]+)/anime./(?P<id>[^/]+)'
_EPISODE_REGEX = r'<option value="(/watch[^/]*/[^"]+)">[^<]*</option>'
_TEST = {
'url': 'http://www.soul-anime.net/anime1/black-rock-shooter-tv/',
'info_dict': {
'id': 'black-rock-shooter-tv'
},
'playlist_count': 8
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
series_id = mobj.group('id')
domain = mobj.group('domain')
pattern = re.compile(self._EPISODE_REGEX)
page = self._download_webpage(url, series_id, "Downloading series page")
mobj = pattern.findall(page)
entries = [self.url_result("http://www.soul-anime." + domain + obj) for obj in mobj]
return self.playlist_result(entries, series_id)
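
# The watching extractor above cannot infer the container from the download
# link, so it issues a HEADRequest and lets urlhandle_detect_ext read the
# type off the response. A rough Python 3 sketch of that idea; the
# mimetypes lookup approximates youtube-dl's helper, it is not its
# implementation.
import mimetypes
import urllib.request

def detect_ext(video_url, default='mp4'):
    # HEAD request: fetch headers only, never the video body
    req = urllib.request.Request(video_url, method='HEAD')
    with urllib.request.urlopen(req) as handle:
        ctype = handle.headers.get('Content-Type', '')
    guess = mimetypes.guess_extension(ctype.split(';')[0].strip())
    return guess.lstrip('.') if guess else default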

View File

@@ -4,7 +4,14 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..compat import (
compat_urlparse,
compat_HTTPError,
)
from ..utils import (
HEADRequest,
ExtractorError,
)
from .spiegeltv import SpiegeltvIE
@@ -60,21 +67,31 @@ class SpiegelIE(InfoExtractor):
xml_url = base_url + video_id + '.xml'
idoc = self._download_xml(xml_url, video_id)
formats = [
{
'format_id': n.tag.rpartition('type')[2],
'url': base_url + n.find('./filename').text,
'width': int(n.find('./width').text),
'height': int(n.find('./height').text),
'abr': int(n.find('./audiobitrate').text),
'vbr': int(n.find('./videobitrate').text),
'vcodec': n.find('./codec').text,
'acodec': 'MP4A',
}
for n in list(idoc)
# Blacklist type 6, it's extremely LQ and not available on the same server
if n.tag.startswith('type') and n.tag != 'type6'
]
formats = []
for n in list(idoc):
if n.tag.startswith('type') and n.tag != 'type6':
format_id = n.tag.rpartition('type')[2]
video_url = base_url + n.find('./filename').text
# Test video URLs beforehand as some of them are invalid
try:
self._request_webpage(
HEADRequest(video_url), video_id,
'Checking %s video URL' % format_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
self.report_warning(
'%s video URL is invalid, skipping' % format_id, video_id)
continue
formats.append({
'format_id': format_id,
'url': video_url,
'width': int(n.find('./width').text),
'height': int(n.find('./height').text),
'abr': int(n.find('./audiobitrate').text),
'vbr': int(n.find('./videobitrate').text),
'vcodec': n.find('./codec').text,
'acodec': 'MP4A',
})
duration = float(idoc[0].findall('./duration')[0].text)
self._sort_formats(formats)
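
# The try/except added above exists because some Spiegel format URLs 404;
# reduced to a generic sketch, with Python 3 urllib standing in for
# youtube-dl's HEADRequest and error types.
import urllib.error
import urllib.request

def filter_live_urls(urls):
    alive = []
    for url in urls:
        try:
            # HEAD probe: cheap existence check before offering the format
            urllib.request.urlopen(urllib.request.Request(url, method='HEAD'))
        except urllib.error.HTTPError as e:
            if e.code == 404:
                continue  # invalid variant; the real code warns and skips
            raise
        alive.append(url)
    return alive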

View File

@@ -0,0 +1,51 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import unified_strdate
class StreetVoiceIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?streetvoice\.com/[^/]+/songs/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://streetvoice.com/skippylu/songs/94440/',
'md5': '15974627fc01a29e492c98593c2fd472',
'info_dict': {
'id': '94440',
'ext': 'mp3',
'filesize': 4167053,
'title': '輸',
'description': 'Crispy脆樂團 - 輸',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 260,
'upload_date': '20091018',
'uploader': 'Crispy脆樂團',
'uploader_id': '627810',
}
}, {
'url': 'http://tw.streetvoice.com/skippylu/songs/94440/',
'only_matching': True,
}]
def _real_extract(self, url):
song_id = self._match_id(url)
song = self._download_json(
'http://streetvoice.com/music/api/song/%s' % song_id, song_id)
title = song['name']
author = song['musician']['name']
return {
'id': song_id,
'url': song['file'],
'filesize': song.get('size'),
'title': title,
'description': '%s - %s' % (author, title),
'thumbnail': self._proto_relative_url(song.get('image'), 'http:'),
'duration': song.get('length'),
'upload_date': unified_strdate(song.get('created_at')),
'uploader': author,
'uploader_id': compat_str(song['musician']['id']),
}
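
# The _proto_relative_url call above guards against scheme-less image URLs
# from the StreetVoice API; its effect, as a hedged one-function sketch:
def proto_relative_url(url, scheme='http:'):
    # prefix protocol-relative URLs ('//host/path') with a default scheme
    return scheme + url if url and url.startswith('//') else url

assert proto_relative_url('//cdn.example.com/cover.jpg') == 'http://cdn.example.com/cover.jpg'
assert proto_relative_url('http://cdn.example.com/cover.jpg') == 'http://cdn.example.com/cover.jpg'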

View File

@@ -57,9 +57,7 @@ class TeacherTubeIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta('title', webpage, 'title', fatal=True)

View File

@@ -13,7 +13,7 @@ from ..compat import (
class TEDIE(SubtitlesInfoExtractor):
_VALID_URL = r'''(?x)
(?P<proto>https?://)
(?P<type>www|embed)(?P<urlmain>\.ted\.com/
(?P<type>www|embed(?:-ssl)?)(?P<urlmain>\.ted\.com/
(
(?P<type_playlist>playlists(?:/\d+)?) # We have a playlist
|
@@ -98,7 +98,7 @@ class TEDIE(SubtitlesInfoExtractor):
def _real_extract(self, url):
m = re.match(self._VALID_URL, url, re.VERBOSE)
if m.group('type') == 'embed':
if m.group('type').startswith('embed'):
desktop_url = m.group('proto') + 'www' + m.group('urlmain')
return self.url_result(desktop_url, 'TED')
name = m.group('name')
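
# A quick self-check of the widened pattern above (simplified here to the
# embed alternation only; the real _VALID_URL is a verbose regex with more
# branches): the new embed-ssl subdomain takes the same desktop-URL
# redirect as plain embed.
import re

VALID_URL = r'(?P<proto>https?://)(?P<type>www|embed(?:-ssl)?)(?P<urlmain>\.ted\.com/.+)'
m = re.match(VALID_URL, 'https://embed-ssl.ted.com/talks/some_talk.html')
assert m.group('type').startswith('embed')
assert m.group('proto') + 'www' + m.group('urlmain') == 'https://www.ted.com/talks/some_talk.html'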

View File

@@ -0,0 +1,60 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class TestTubeIE(InfoExtractor):
_VALID_URL = r'https?://testtube\.com/[^/?#]+/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://testtube.com/dnews/5-weird-ways-plants-can-eat-animals?utm_source=FB&utm_medium=DNews&utm_campaign=DNewsSocial',
'info_dict': {
'id': '60163',
'display_id': '5-weird-ways-plants-can-eat-animals',
'duration': 275,
'ext': 'mp4',
'title': '5 Weird Ways Plants Can Eat Animals',
'description': 'Why have some plants evolved to eat meat?',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'DNews',
'uploader_id': 'dnews',
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r"player\.loadRevision3Item\('video_id',\s*([0-9]+)\);",
webpage, 'video ID')
all_info = self._download_json(
'https://testtube.com/api/getPlaylist.json?api_key=ba9c741bce1b9d8e3defcc22193f3651b8867e62&codecs=h264,vp8,theora&video_id=%s' % video_id,
video_id)
info = all_info['items'][0]
formats = []
for vcodec, fdatas in info['media'].items():
for name, fdata in fdatas.items():
formats.append({
'format_id': '%s-%s' % (vcodec, name),
'url': fdata['url'],
'vcodec': vcodec,
'tbr': fdata.get('bitrate'),
})
self._sort_formats(formats)
duration = int_or_none(info.get('duration'))
return {
'id': video_id,
'display_id': display_id,
'title': info['title'],
'description': info.get('summary'),
'thumbnail': info.get('images', {}).get('large'),
'uploader': info.get('show', {}).get('name'),
'uploader_id': info.get('show', {}).get('slug'),
'duration': duration,
'formats': formats,
}
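
# The nested loop above flattens a codec -> quality-name -> format-data
# mapping; an assumed example of the getPlaylist media payload and the
# resulting formats (URLs and bitrates invented for illustration):
media = {
    'h264': {
        'high': {'url': 'http://example.com/v_high.mp4', 'bitrate': 2000},
        'low': {'url': 'http://example.com/v_low.mp4', 'bitrate': 700},
    },
}
formats = [{
    'format_id': '%s-%s' % (vcodec, name),  # e.g. 'h264-high'
    'url': fdata['url'],
    'vcodec': vcodec,
    'tbr': fdata.get('bitrate'),
} for vcodec, fdatas in media.items() for name, fdata in fdatas.items()]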

View File

@@ -1,15 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class TF1IE(InfoExtractor):
"""TF1 uses the wat.tv player."""
_VALID_URL = r'http://videos\.tf1\.fr/.*-(?P<id>.*?)\.html'
_TEST = {
_VALID_URL = r'http://(?:videos\.tf1|www\.tfou)\.fr/.*?-(?P<id>\d+)(?:-\d+)?\.html'
_TESTS = [{
'url': 'http://videos.tf1.fr/auto-moto/citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html',
'info_dict': {
'id': '10635995',
@@ -21,14 +19,26 @@ class TF1IE(InfoExtractor):
# Sometimes wat serves the whole file with the --test option
'skip_download': True,
},
}, {
'url': 'http://www.tfou.fr/chuggington/videos/le-grand-mysterioso-chuggington-7085291-739.html',
'info_dict': {
'id': '12043945',
'ext': 'mp4',
'title': 'Le grand Mystérioso - Chuggington',
'description': 'Le grand Mystérioso - Emery rêve qu\'un article lui soit consacré dans le journal.',
'upload_date': '20150103',
},
'params': {
# Sometimes wat serves the whole file with the --test option
'skip_download': True,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
embed_url = self._html_search_regex(
r'"(https://www.wat.tv/embedframe/.*?)"', webpage, 'embed url')
r'["\'](https?://www.wat.tv/embedframe/.*?)["\']', webpage, 'embed url')
embed_page = self._download_webpage(embed_url, video_id,
'Downloading embed player page')
wat_id = self._search_regex(r'UVID=(.*?)&', embed_page, 'wat id')

View File

@@ -9,17 +9,23 @@ from ..utils import ExtractorError
class TinyPicIE(InfoExtractor):
IE_NAME = 'tinypic'
IE_DESC = 'tinypic.com videos'
_VALID_URL = r'http://tinypic\.com/player\.php\?v=(?P<id>[^&]+)&s=\d+'
_VALID_URL = r'http://(?:.+?\.)?tinypic\.com/player\.php\?v=(?P<id>[^&]+)&s=\d+'
_TEST = {
'url': 'http://tinypic.com/player.php?v=6xw7tc%3E&s=5#.UtqZmbRFCM8',
'md5': '609b74432465364e72727ebc6203f044',
'info_dict': {
'id': '6xw7tc',
'ext': 'flv',
'title': 'shadow phenomenon weird',
_TESTS = [
{
'url': 'http://tinypic.com/player.php?v=6xw7tc%3E&s=5#.UtqZmbRFCM8',
'md5': '609b74432465364e72727ebc6203f044',
'info_dict': {
'id': '6xw7tc',
'ext': 'flv',
'title': 'shadow phenomenon weird',
},
},
{
'url': 'http://de.tinypic.com/player.php?v=dy90yh&s=8',
'only_matching': True,
}
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

View File

@@ -9,7 +9,7 @@ from .common import InfoExtractor
class TudouIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?tudou\.com/(?:listplay|programs|albumplay)/(?:view|(.+?))/(?:([^/]+)|([^/]+))(?:\.html)?'
_VALID_URL = r'https?://(?:www\.)?tudou\.com/(?:listplay|programs(?:/view)?|albumplay)/.*?/(?P<id>[^/?#]+?)(?:\.html)?/?(?:$|[?#])'
_TESTS = [{
'url': 'http://www.tudou.com/listplay/zzdE77v6Mmo/2xN2duXMxmw.html',
'md5': '140a49ed444bd22f93330985d8475fcb',
@@ -27,13 +27,6 @@ class TudouIE(InfoExtractor):
'title': 'La Sylphide-Bolshoi-Ekaterina Krysanova & Vyacheslav Lopatin 2012',
'thumbnail': 're:^https?://.*\.jpg$',
}
}, {
'url': 'http://www.tudou.com/albumplay/TenTw_JgiPM/PzsAs5usU9A.html',
'info_dict': {
'title': 'todo.mp4',
},
'add_ie': ['Youku'],
'skip': 'Only works from China'
}]
def _url_for_id(self, id, quality=None):
@@ -45,8 +38,7 @@ class TudouIE(InfoExtractor):
return final_url
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(2)
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
m = re.search(r'vcode:\s*[\'"](.+?)[\'"]', webpage)
@@ -87,4 +79,9 @@ class TudouIE(InfoExtractor):
}
result.append(part_info)
return result
return {
'_type': 'multi_video',
'entries': result,
'id': video_id,
'title': title,
}
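
# The new return value above uses youtube-dl's multi-part convention: a
# '_type': 'multi_video' dict groups the per-part entries under one id and
# title so segmented Tudou videos download as a single logical video. A
# minimal assumed example of the shape:
entries = [
    {'id': 'clip_part1', 'url': 'http://example.com/1.f4v', 'title': 'clip part 1'},
    {'id': 'clip_part2', 'url': 'http://example.com/2.f4v', 'title': 'clip part 2'},
]
info = {
    '_type': 'multi_video',
    'entries': entries,
    'id': 'clip',
    'title': 'clip',
}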

View File

@@ -24,7 +24,7 @@ class TuneInIE(InfoExtractor):
_INFO_DICT = {
'id': '34682',
'title': 'Jazz 24 on 88.5 Jazz24 - KPLU-HD2',
'ext': 'AAC',
'ext': 'aac',
'thumbnail': 're:^https?://.*\.png$',
'location': 'Tacoma, WA',
}
@@ -78,14 +78,21 @@ class TuneInIE(InfoExtractor):
for stream in streams:
if stream.get('Type') == 'Live':
is_live = True
reliability = stream.get('Reliability')
format_note = (
'Reliability: %d%%' % reliability
if reliability is not None else None)
formats.append({
'preference': (
0 if reliability is None or reliability > 90
else 1),
'abr': stream.get('Bandwidth'),
'ext': stream.get('MediaType'),
'ext': stream.get('MediaType').lower(),
'acodec': stream.get('MediaType'),
'vcodec': 'none',
'url': stream.get('Url'),
# Sometimes streams with the highest quality do not exist
'preference': stream.get('Reliability'),
'source_preference': reliability,
'format_note': format_note,
})
self._sort_formats(formats)

View File

@@ -1,37 +1,139 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class TvpIE(InfoExtractor):
IE_NAME = 'tvp.pl'
_VALID_URL = r'https?://www\.tvp\.pl/.*?wideo/(?P<date>\d+)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:vod|www)\.tvp\.pl/.*/(?P<id>\d+)$'
_TEST = {
'url': 'http://www.tvp.pl/warszawa/magazyny/campusnews/wideo/31102013/12878238',
'md5': '148408967a6a468953c0a75cbdaf0d7a',
_TESTS = [{
'url': 'http://vod.tvp.pl/filmy-fabularne/filmy-za-darmo/ogniem-i-mieczem/wideo/odc-2/4278035',
'md5': 'cdd98303338b8a7f7abab5cd14092bf2',
'info_dict': {
'id': '12878238',
'id': '4278035',
'ext': 'wmv',
'title': '31.10.2013 - Odcinek 2',
'description': '31.10.2013 - Odcinek 2',
'title': 'Ogniem i mieczem, odc. 2',
},
'skip': 'Download has to use same server IP as extraction. Therefore, a good (load-balancing) DNS resolver will make the download fail.'
}
}, {
'url': 'http://vod.tvp.pl/seriale/obyczajowe/czas-honoru/sezon-1-1-13/i-seria-odc-13/194536',
'md5': '8aa518c15e5cc32dfe8db400dc921fbb',
'info_dict': {
'id': '194536',
'ext': 'mp4',
'title': 'Czas honoru, I seria odc. 13',
},
}, {
'url': 'http://www.tvp.pl/there-can-be-anything-so-i-shortened-it/17916176',
'md5': 'c3b15ed1af288131115ff17a17c19dda',
'info_dict': {
'id': '17916176',
'ext': 'mp4',
'title': 'TVP Gorzów pokaże filmy studentów z podroży dookoła świata',
},
}, {
'url': 'http://vod.tvp.pl/seriale/obyczajowe/na-sygnale/sezon-2-27-/odc-39/17834272',
'md5': 'c3b15ed1af288131115ff17a17c19dda',
'info_dict': {
'id': '17834272',
'ext': 'mp4',
'title': 'Na sygnale, odc. 39',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
json_url = 'http://www.tvp.pl/pub/stat/videofileinfo?video_id=%s' % video_id
params = self._download_json(
json_url, video_id, "Downloading video metadata")
video_url = params['video_url']
webpage = self._download_webpage(
'http://www.tvp.pl/sess/tvplayer.php?object_id=%s' % video_id, video_id)
title = self._search_regex(
r'name\s*:\s*([\'"])Title\1\s*,\s*value\s*:\s*\1(?P<title>.+?)\1',
webpage, 'title', group='title')
series_title = self._search_regex(
r'name\s*:\s*([\'"])SeriesTitle\1\s*,\s*value\s*:\s*\1(?P<series>.+?)\1',
webpage, 'series', group='series', default=None)
if series_title:
title = '%s, %s' % (series_title, title)
thumbnail = self._search_regex(
r"poster\s*:\s*'([^']+)'", webpage, 'thumbnail', default=None)
video_url = self._search_regex(
r'0:{src:([\'"])(?P<url>.*?)\1', webpage, 'formats', group='url', default=None)
if not video_url:
video_url = self._download_json(
'http://www.tvp.pl/pub/stat/videofileinfo?video_id=%s' % video_id,
video_id)['video_url']
ext = video_url.rsplit('.', 1)[-1]
if ext != 'ism/manifest':
if '/' in ext:
ext = 'mp4'
formats = [{
'format_id': 'direct',
'url': video_url,
'ext': ext,
}]
else:
m3u8_url = re.sub('([^/]*)\.ism/manifest', r'\1.ism/\1.m3u8', video_url)
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': self._og_search_title(webpage),
'ext': 'wmv',
'url': video_url,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}
class TvpSeriesIE(InfoExtractor):
IE_NAME = 'tvp.pl:Series'
_VALID_URL = r'https?://vod\.tvp\.pl/(?:[^/]+/){2}(?P<id>[^/]+)/?$'
_TESTS = [{
'url': 'http://vod.tvp.pl/filmy-fabularne/filmy-za-darmo/ogniem-i-mieczem',
'info_dict': {
'title': 'Ogniem i mieczem',
'id': '4278026',
},
'playlist_count': 4,
}, {
'url': 'http://vod.tvp.pl/audycje/podroze/boso-przez-swiat',
'info_dict': {
'title': 'Boso przez świat',
'id': '9329207',
},
'playlist_count': 86,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id, tries=5)
title = self._html_search_regex(
r'(?s) id=[\'"]path[\'"]>(?:.*? / ){2}(.*?)</span>', webpage, 'series')
playlist_id = self._search_regex(r'nodeId:\s*(\d+)', webpage, 'playlist id')
playlist = self._download_webpage(
'http://vod.tvp.pl/vod/seriesAjax?type=series&nodeId=%s&recommend'
'edId=0&sort=&page=0&pageSize=10000' % playlist_id, display_id, tries=5,
note='Downloading playlist')
videos_paths = re.findall(
'(?s)class="shortTitle">.*?href="(/[^"]+)', playlist)
entries = [
self.url_result('http://vod.tvp.pl%s' % v_path, ie=TvpIE.ie_key())
for v_path in videos_paths]
return {
'_type': 'playlist',
'id': playlist_id,
'display_id': display_id,
'title': title,
'entries': entries,
}
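
# The re.sub in TvpIE above turns a Smooth Streaming manifest URL into its
# HLS playlist counterpart; a worked example with an assumed URL shape:
import re

video_url = 'http://stream.tvp.pl/token/video12345.ism/manifest'  # assumed shape
m3u8_url = re.sub(r'([^/]*)\.ism/manifest', r'\1.ism/\1.m3u8', video_url)
assert m3u8_url == 'http://stream.tvp.pl/token/video12345.ism/video12345.m3u8'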

View File

@@ -3,9 +3,11 @@ from __future__ import unicode_literals
import itertools
import re
import random
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse,
compat_urllib_request,
)
@@ -15,44 +17,12 @@ from ..utils import (
)
class TwitchIE(InfoExtractor):
# TODO: One broadcast may be split into multiple videos. The key
# 'broadcast_id' is the same for all parts, and 'broadcast_part'
# starts at 1 and increases. Can we treat all parts as one video?
_VALID_URL = r"""(?x)^(?:http://)?(?:www\.)?twitch\.tv/
(?:
(?P<channelid>[^/]+)|
(?:(?:[^/]+)/v/(?P<vodid>[^/]+))|
(?:(?:[^/]+)/b/(?P<videoid>[^/]+))|
(?:(?:[^/]+)/c/(?P<chapterid>[^/]+))
)
/?(?:\#.*)?$
"""
_PAGE_LIMIT = 100
class TwitchBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?twitch\.tv'
_API_BASE = 'https://api.twitch.tv'
_USHER_BASE = 'http://usher.twitch.tv'
_LOGIN_URL = 'https://secure.twitch.tv/user/login'
_TESTS = [{
'url': 'http://www.twitch.tv/riotgames/b/577357806',
'info_dict': {
'id': 'a577357806',
'title': 'Worlds Semifinals - Star Horn Royal Club vs. OMG',
},
'playlist_mincount': 12,
}, {
'url': 'http://www.twitch.tv/acracingleague/c/5285812',
'info_dict': {
'id': 'c5285812',
'title': 'ACRL Off Season - Sports Cars @ Nordschleife',
},
'playlist_mincount': 3,
}, {
'url': 'http://www.twitch.tv/vanillatv',
'info_dict': {
'id': 'vanillatv',
'title': 'VanillaTV',
},
'playlist_mincount': 412,
}]
def _handle_error(self, response):
if not isinstance(response, dict):
@@ -64,71 +34,10 @@ class TwitchIE(InfoExtractor):
expected=True)
def _download_json(self, url, video_id, note='Downloading JSON metadata'):
response = super(TwitchIE, self)._download_json(url, video_id, note)
response = super(TwitchBaseIE, self)._download_json(url, video_id, note)
self._handle_error(response)
return response
def _extract_media(self, item, item_id):
ITEMS = {
'a': 'video',
'v': 'vod',
'c': 'chapter',
}
info = self._extract_info(self._download_json(
'%s/kraken/videos/%s%s' % (self._API_BASE, item, item_id), item_id,
'Downloading %s info JSON' % ITEMS[item]))
if item == 'v':
access_token = self._download_json(
'%s/api/vods/%s/access_token' % (self._API_BASE, item_id), item_id,
'Downloading %s access token' % ITEMS[item])
formats = self._extract_m3u8_formats(
'http://usher.twitch.tv/vod/%s?nauth=%s&nauthsig=%s'
% (item_id, access_token['token'], access_token['sig']),
item_id, 'mp4')
info['formats'] = formats
return info
response = self._download_json(
'%s/api/videos/%s%s' % (self._API_BASE, item, item_id), item_id,
'Downloading %s playlist JSON' % ITEMS[item])
entries = []
chunks = response['chunks']
qualities = list(chunks.keys())
for num, fragment in enumerate(zip(*chunks.values()), start=1):
formats = []
for fmt_num, fragment_fmt in enumerate(fragment):
format_id = qualities[fmt_num]
fmt = {
'url': fragment_fmt['url'],
'format_id': format_id,
'quality': 1 if format_id == 'live' else 0,
}
m = re.search(r'^(?P<height>\d+)[Pp]', format_id)
if m:
fmt['height'] = int(m.group('height'))
formats.append(fmt)
self._sort_formats(formats)
entry = dict(info)
entry['id'] = '%s_%d' % (entry['id'], num)
entry['title'] = '%s part %d' % (entry['title'], num)
entry['formats'] = formats
entries.append(entry)
return self.playlist_result(entries, info['id'], info['title'])
def _extract_info(self, info):
return {
'id': info['_id'],
'title': info['title'],
'description': info['description'],
'duration': info['length'],
'thumbnail': info['preview'],
'uploader': info['channel']['display_name'],
'uploader_id': info['channel']['name'],
'timestamp': parse_iso8601(info['recorded_at']),
'view_count': info['views'],
}
def _real_initialize(self):
self._login()
@@ -167,81 +76,276 @@ class TwitchIE(InfoExtractor):
raise ExtractorError(
'Unable to login: %s' % m.group('msg').strip(), expected=True)
class TwitchItemBaseIE(TwitchBaseIE):
def _download_info(self, item, item_id):
return self._extract_info(self._download_json(
'%s/kraken/videos/%s%s' % (self._API_BASE, item, item_id), item_id,
'Downloading %s info JSON' % self._ITEM_TYPE))
def _extract_media(self, item_id):
info = self._download_info(self._ITEM_SHORTCUT, item_id)
response = self._download_json(
'%s/api/videos/%s%s' % (self._API_BASE, self._ITEM_SHORTCUT, item_id), item_id,
'Downloading %s playlist JSON' % self._ITEM_TYPE)
entries = []
chunks = response['chunks']
qualities = list(chunks.keys())
for num, fragment in enumerate(zip(*chunks.values()), start=1):
formats = []
for fmt_num, fragment_fmt in enumerate(fragment):
format_id = qualities[fmt_num]
fmt = {
'url': fragment_fmt['url'],
'format_id': format_id,
'quality': 1 if format_id == 'live' else 0,
}
m = re.search(r'^(?P<height>\d+)[Pp]', format_id)
if m:
fmt['height'] = int(m.group('height'))
formats.append(fmt)
self._sort_formats(formats)
entry = dict(info)
entry['id'] = '%s_%d' % (entry['id'], num)
entry['title'] = '%s part %d' % (entry['title'], num)
entry['formats'] = formats
entries.append(entry)
return self.playlist_result(entries, info['id'], info['title'])
def _extract_info(self, info):
return {
'id': info['_id'],
'title': info['title'],
'description': info['description'],
'duration': info['length'],
'thumbnail': info['preview'],
'uploader': info['channel']['display_name'],
'uploader_id': info['channel']['name'],
'timestamp': parse_iso8601(info['recorded_at']),
'view_count': info['views'],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj.group('chapterid'):
return self._extract_media('c', mobj.group('chapterid'))
return self._extract_media(self._match_id(url))
"""
webpage = self._download_webpage(url, chapter_id)
m = re.search(r'PP\.archive_id = "([0-9]+)";', webpage)
class TwitchVideoIE(TwitchItemBaseIE):
IE_NAME = 'twitch:video'
_VALID_URL = r'%s/[^/]+/b/(?P<id>[^/]+)' % TwitchBaseIE._VALID_URL_BASE
_ITEM_TYPE = 'video'
_ITEM_SHORTCUT = 'a'
_TEST = {
'url': 'http://www.twitch.tv/riotgames/b/577357806',
'info_dict': {
'id': 'a577357806',
'title': 'Worlds Semifinals - Star Horn Royal Club vs. OMG',
},
'playlist_mincount': 12,
}
class TwitchChapterIE(TwitchItemBaseIE):
IE_NAME = 'twitch:chapter'
_VALID_URL = r'%s/[^/]+/c/(?P<id>[^/]+)' % TwitchBaseIE._VALID_URL_BASE
_ITEM_TYPE = 'chapter'
_ITEM_SHORTCUT = 'c'
_TESTS = [{
'url': 'http://www.twitch.tv/acracingleague/c/5285812',
'info_dict': {
'id': 'c5285812',
'title': 'ACRL Off Season - Sports Cars @ Nordschleife',
},
'playlist_mincount': 3,
}, {
'url': 'http://www.twitch.tv/tsm_theoddone/c/2349361',
'only_matching': True,
}]
class TwitchVodIE(TwitchItemBaseIE):
IE_NAME = 'twitch:vod'
_VALID_URL = r'%s/[^/]+/v/(?P<id>[^/]+)' % TwitchBaseIE._VALID_URL_BASE
_ITEM_TYPE = 'vod'
_ITEM_SHORTCUT = 'v'
_TEST = {
'url': 'http://www.twitch.tv/ksptv/v/3622000',
'info_dict': {
'id': 'v3622000',
'ext': 'mp4',
'title': '''KSPTV: Squadcast: "Everyone's on vacation so here's Dahud" Edition!''',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 6951,
'timestamp': 1419028564,
'upload_date': '20141219',
'uploader': 'KSPTV',
'uploader_id': 'ksptv',
'view_count': int,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
item_id = self._match_id(url)
info = self._download_info(self._ITEM_SHORTCUT, item_id)
access_token = self._download_json(
'%s/api/vods/%s/access_token' % (self._API_BASE, item_id), item_id,
'Downloading %s access token' % self._ITEM_TYPE)
formats = self._extract_m3u8_formats(
'%s/vod/%s?nauth=%s&nauthsig=%s'
% (self._USHER_BASE, item_id, access_token['token'], access_token['sig']),
item_id, 'mp4')
info['formats'] = formats
return info
class TwitchPlaylistBaseIE(TwitchBaseIE):
_PLAYLIST_URL = '%s/kraken/channels/%%s/videos/?offset=%%d&limit=%%d' % TwitchBaseIE._API_BASE
_PAGE_LIMIT = 100
def _extract_playlist(self, channel_id):
info = self._download_json(
'%s/kraken/channels/%s' % (self._API_BASE, channel_id),
channel_id, 'Downloading channel info JSON')
channel_name = info.get('display_name') or info.get('name')
entries = []
offset = 0
limit = self._PAGE_LIMIT
for counter in itertools.count(1):
response = self._download_json(
self._PLAYLIST_URL % (channel_id, offset, limit),
channel_id, 'Downloading %s videos JSON page %d' % (self._PLAYLIST_TYPE, counter))
videos = response['videos']
if not videos:
break
entries.extend([self.url_result(video['url']) for video in videos])
offset += limit
return self.playlist_result(entries, channel_id, channel_name)
def _real_extract(self, url):
return self._extract_playlist(self._match_id(url))
class TwitchProfileIE(TwitchPlaylistBaseIE):
IE_NAME = 'twitch:profile'
_VALID_URL = r'%s/(?P<id>[^/]+)/profile/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
_PLAYLIST_TYPE = 'profile'
_TEST = {
'url': 'http://www.twitch.tv/vanillatv/profile',
'info_dict': {
'id': 'vanillatv',
'title': 'VanillaTV',
},
'playlist_mincount': 412,
}
class TwitchPastBroadcastsIE(TwitchPlaylistBaseIE):
IE_NAME = 'twitch:past_broadcasts'
_VALID_URL = r'%s/(?P<id>[^/]+)/profile/past_broadcasts/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
_PLAYLIST_URL = TwitchPlaylistBaseIE._PLAYLIST_URL + '&broadcasts=true'
_PLAYLIST_TYPE = 'past broadcasts'
_TEST = {
'url': 'http://www.twitch.tv/spamfish/profile/past_broadcasts',
'info_dict': {
'id': 'spamfish',
'title': 'Spamfish',
},
'playlist_mincount': 54,
}
class TwitchStreamIE(TwitchBaseIE):
IE_NAME = 'twitch:stream'
_VALID_URL = r'%s/(?P<id>[^/]+)/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
_TEST = {
'url': 'http://www.twitch.tv/shroomztv',
'info_dict': {
'id': '12772022048',
'display_id': 'shroomztv',
'ext': 'mp4',
'title': 're:^ShroomzTV [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'H1Z1 - lonewolfing with ShroomzTV | A3 Battle Royale later - @ShroomzTV',
'is_live': True,
'timestamp': 1421928037,
'upload_date': '20150122',
'uploader': 'ShroomzTV',
'uploader_id': 'shroomztv',
'view_count': int,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
channel_id = self._match_id(url)
stream = self._download_json(
'%s/kraken/streams/%s' % (self._API_BASE, channel_id), channel_id,
'Downloading stream JSON').get('stream')
# Fallback on profile extraction if stream is offline
if not stream:
return self.url_result(
'http://www.twitch.tv/%s/profile' % channel_id,
'TwitchProfile', channel_id)
access_token = self._download_json(
'%s/api/channels/%s/access_token' % (self._API_BASE, channel_id), channel_id,
'Downloading channel access token')
query = {
'allow_source': 'true',
'p': random.randint(1000000, 10000000),
'player': 'twitchweb',
'segment_preference': '4',
'sig': access_token['sig'],
'token': access_token['token'],
}
formats = self._extract_m3u8_formats(
'%s/api/channel/hls/%s.m3u8?%s'
% (self._USHER_BASE, channel_id, compat_urllib_parse.urlencode(query).encode('utf-8')),
channel_id, 'mp4')
view_count = stream.get('viewers')
timestamp = parse_iso8601(stream.get('created_at'))
channel = stream['channel']
title = self._live_title(channel.get('display_name') or channel.get('name'))
description = channel.get('status')
thumbnails = []
for thumbnail_key, thumbnail_url in stream['preview'].items():
m = re.search(r'(?P<width>\d+)x(?P<height>\d+)\.jpg$', thumbnail_key)
if not m:
raise ExtractorError('Cannot find archive of a chapter')
archive_id = m.group(1)
continue
thumbnails.append({
'url': thumbnail_url,
'width': int(m.group('width')),
'height': int(m.group('height')),
})
api = api_base + '/broadcast/by_chapter/%s.xml' % chapter_id
doc = self._download_xml(
api, chapter_id,
note='Downloading chapter information',
errnote='Chapter information download failed')
for a in doc.findall('.//archive'):
if archive_id == a.find('./id').text:
break
else:
raise ExtractorError('Could not find chapter in chapter information')
video_url = a.find('./video_file_url').text
video_ext = video_url.rpartition('.')[2] or 'flv'
chapter_api_url = 'https://api.twitch.tv/kraken/videos/c' + chapter_id
chapter_info = self._download_json(
chapter_api_url, 'c' + chapter_id,
note='Downloading chapter metadata',
errnote='Download of chapter metadata failed')
bracket_start = int(doc.find('.//bracket_start').text)
bracket_end = int(doc.find('.//bracket_end').text)
# TODO determine start (and probably fix up file)
# youtube-dl -v http://www.twitch.tv/firmbelief/c/1757457
#video_url += '?start=' + TODO:start_timestamp
# bracket_start is 13290, but we want 51670615
self._downloader.report_warning('Chapter detected, but we can just download the whole file. '
'Chapter starts at %s and ends at %s' % (formatSeconds(bracket_start), formatSeconds(bracket_end)))
info = {
'id': 'c' + chapter_id,
'url': video_url,
'ext': video_ext,
'title': chapter_info['title'],
'thumbnail': chapter_info['preview'],
'description': chapter_info['description'],
'uploader': chapter_info['channel']['display_name'],
'uploader_id': chapter_info['channel']['name'],
}
return info
"""
elif mobj.group('videoid'):
return self._extract_media('a', mobj.group('videoid'))
elif mobj.group('vodid'):
return self._extract_media('v', mobj.group('vodid'))
elif mobj.group('channelid'):
channel_id = mobj.group('channelid')
info = self._download_json(
'%s/kraken/channels/%s' % (self._API_BASE, channel_id),
channel_id, 'Downloading channel info JSON')
channel_name = info.get('display_name') or info.get('name')
entries = []
offset = 0
limit = self._PAGE_LIMIT
for counter in itertools.count(1):
response = self._download_json(
'%s/kraken/channels/%s/videos/?offset=%d&limit=%d'
% (self._API_BASE, channel_id, offset, limit),
channel_id, 'Downloading channel videos JSON page %d' % counter)
videos = response['videos']
if not videos:
break
entries.extend([self.url_result(video['url'], 'Twitch') for video in videos])
offset += limit
return self.playlist_result(entries, channel_id, channel_name)
return {
'id': compat_str(stream['_id']),
'display_id': channel_id,
'title': title,
'description': description,
'thumbnails': thumbnails,
'uploader': channel.get('display_name'),
'uploader_id': channel.get('name'),
'timestamp': timestamp,
'view_count': view_count,
'formats': formats,
'is_live': True,
}
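
# TwitchPlaylistBaseIE above pages through the kraken API with an
# offset/limit cursor until an empty page comes back; the same loop as a
# generic, self-contained sketch (fetch_page stands in for the JSON
# download).
import itertools

def fetch_all(fetch_page, limit=100):
    entries, offset = [], 0
    for counter in itertools.count(1):  # counter only feeds progress notes
        videos = fetch_page(offset, limit)
        if not videos:
            break
        entries.extend(videos)
        offset += limit
    return entries

# 250 fake videos, fetched 100 at a time
assert fetch_all(lambda off, lim: list(range(off, min(off + lim, 250)))) == list(range(250))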

View File

@@ -8,6 +8,7 @@ from ..compat import (
compat_urlparse,
)
from ..utils import (
ExtractorError,
clean_html,
get_element_by_id,
)
@@ -17,13 +18,13 @@ class VeeHDIE(InfoExtractor):
_VALID_URL = r'https?://veehd\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'http://veehd.com/video/4686958',
'url': 'http://veehd.com/video/4639434_Solar-Sinter',
'info_dict': {
'id': '4686958',
'id': '4639434',
'ext': 'mp4',
'title': 'Time Lapse View from Space ( ISS)',
'uploader_id': 'spotted',
'description': 'md5:f0094c4cf3a72e22bc4e4239ef767ad7',
'title': 'Solar Sinter',
'uploader_id': 'VideoEyes',
'description': 'md5:46a840e8692ddbaffb5f81d9885cb457',
},
}
@@ -34,6 +35,10 @@ class VeeHDIE(InfoExtractor):
# See https://github.com/rg3/youtube-dl/issues/2102
self._download_webpage(url, video_id, 'Requesting webpage')
webpage = self._download_webpage(url, video_id)
if 'This video has been removed<' in webpage:
raise ExtractorError('Video %s has been removed' % video_id, expected=True)
player_path = self._search_regex(
r'\$\("#playeriframe"\).attr\({src : "(.+?)"',
webpage, 'player path')
@@ -42,18 +47,35 @@ class VeeHDIE(InfoExtractor):
self._download_webpage(player_url, video_id, 'Requesting player page')
player_page = self._download_webpage(
player_url, video_id, 'Downloading player page')
config_json = self._search_regex(
r'value=\'config=({.+?})\'', player_page, 'config json')
config = json.loads(config_json)
video_url = compat_urlparse.unquote(config['clip']['url'])
config_json = self._search_regex(
r'value=\'config=({.+?})\'', player_page, 'config json', default=None)
if config_json:
config = json.loads(config_json)
video_url = compat_urlparse.unquote(config['clip']['url'])
else:
iframe_src = self._search_regex(
r'<iframe[^>]+src="/?([^"]+)"', player_page, 'iframe url')
iframe_url = 'http://veehd.com/%s' % iframe_src
self._download_webpage(iframe_url, video_id, 'Requesting iframe page')
iframe_page = self._download_webpage(
iframe_url, video_id, 'Downloading iframe page')
video_url = self._search_regex(
r"file\s*:\s*'([^']+)'", iframe_page, 'video url')
title = clean_html(get_element_by_id('videoName', webpage).rpartition('|')[0])
uploader_id = self._html_search_regex(r'<a href="/profile/\d+">(.+?)</a>',
webpage, 'uploader')
thumbnail = self._search_regex(r'<img id="veehdpreview" src="(.+?)"',
webpage, 'thumbnail')
description = self._html_search_regex(r'<td class="infodropdown".*?<div>(.*?)<ul',
webpage, 'description', flags=re.DOTALL)
uploader_id = self._html_search_regex(
r'<a href="/profile/\d+">(.+?)</a>',
webpage, 'uploader')
thumbnail = self._search_regex(
r'<img id="veehdpreview" src="(.+?)"',
webpage, 'thumbnail')
description = self._html_search_regex(
r'<td class="infodropdown".*?<div>(.*?)<ul',
webpage, 'description', flags=re.DOTALL)
return {
'_type': 'video',
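
# A condensed sketch of the two-path extraction added above (the function
# name is mine, and the config-URL unquoting is omitted): prefer the
# flowplayer config JSON when the player page carries one, otherwise fall
# back to the jwplayer-style iframe.
import json
import re

def find_video_url(player_page, iframe_page=''):
    m = re.search(r"value='config=({.+?})'", player_page)
    if m:  # flowplayer config JSON present
        return json.loads(m.group(1))['clip']['url']
    m = re.search(r"file\s*:\s*'([^']+)'", iframe_page)
    return m.group(1) if m else None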

View File

@@ -1,11 +1,15 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
ExtractorError,
remove_start,
)
@@ -16,34 +20,40 @@ class VideoMegaIE(InfoExtractor):
(?:iframe\.php)?\?ref=(?P<id>[A-Za-z0-9]+)
'''
_TEST = {
'url': 'http://videomega.tv/?ref=GKeGPVedBe',
'md5': '240fb5bcf9199961f48eb17839b084d6',
'url': 'http://videomega.tv/?ref=QR0HCUHI1661IHUCH0RQ',
'md5': 'bf5c2f95c4c917536e80936af7bc51e1',
'info_dict': {
'id': 'GKeGPVedBe',
'id': 'QR0HCUHI1661IHUCH0RQ',
'ext': 'mp4',
'title': 'XXL - All Sports United',
'title': 'Big Buck Bunny',
'thumbnail': 're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
url = 'http://videomega.tv/iframe.php?ref={0:}'.format(video_id)
webpage = self._download_webpage(url, video_id)
escaped_data = self._search_regex(
r'unescape\("([^"]+)"\)', webpage, 'escaped data')
iframe_url = 'http://videomega.tv/iframe.php?ref={0:}'.format(video_id)
req = compat_urllib_request.Request(iframe_url)
req.add_header('Referer', url)
webpage = self._download_webpage(req, video_id)
try:
escaped_data = re.findall(r'unescape\("([^"]+)"\)', webpage)[-1]
except IndexError:
raise ExtractorError('Unable to extract escaped data')
playlist = compat_urllib_parse.unquote(escaped_data)
thumbnail = self._search_regex(
r'image:\s*"([^"]+)"', playlist, 'thumbnail', fatal=False)
url = self._search_regex(r'file:\s*"([^"]+)"', playlist, 'URL')
video_url = self._search_regex(r'file:\s*"([^"]+)"', playlist, 'URL')
title = remove_start(self._html_search_regex(
r'<title>(.*?)</title>', webpage, 'title'), 'VideoMega.tv - ')
formats = [{
'format_id': 'sd',
'url': url,
'url': video_url,
}]
self._sort_formats(formats)
@@ -52,4 +62,5 @@ class VideoMegaIE(InfoExtractor):
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'http_referer': iframe_url,
}
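
# The Request object added above exists to carry a Referer header, since
# videomega's iframe endpoint rejects referer-less requests; the same move
# in a standalone Python 3 sketch:
import urllib.request

def fetch_with_referer(iframe_url, referer):
    req = urllib.request.Request(iframe_url)
    req.add_header('Referer', referer)  # the iframe refuses requests without it
    return urllib.request.urlopen(req).read()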

View File

@@ -63,7 +63,7 @@ class VierIE(InfoExtractor):
class VierVideosIE(InfoExtractor):
IE_NAME = 'vier:videos'
_VALID_URL = r'https?://(?:www\.)?vier\.be/(?P<program>[^/]+)/videos(?:\?.*\bpage=(?P<page>\d+))?'
_VALID_URL = r'https?://(?:www\.)?vier\.be/(?P<program>[^/]+)/videos(?:\?.*\bpage=(?P<page>\d+)|$)'
_TESTS = [{
'url': 'http://www.vier.be/demoestuin/videos',
'info_dict': {

View File

@@ -17,7 +17,6 @@ class VikiIE(SubtitlesInfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?viki\.com/videos/(?P<id>[0-9]+v)'
_TEST = {
'url': 'http://www.viki.com/videos/1023585v-heirs-episode-14',
'md5': 'a21454021c2646f5433514177e2caa5f',
'info_dict': {
'id': '1023585v',
'ext': 'mp4',
@@ -31,8 +30,7 @@ class VikiIE(SubtitlesInfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)

View File

@@ -14,28 +14,17 @@ class VimpleIE(InfoExtractor):
IE_DESC = 'Vimple.ru'
_VALID_URL = r'https?://(player.vimple.ru/iframe|vimple.ru)/(?P<id>[a-f0-9]{10,})'
_TESTS = [
# Quality: Large, from iframe
{
'url': 'http://player.vimple.ru/iframe/b132bdfd71b546d3972f9ab9a25f201c',
'url': 'http://vimple.ru/c0f6b1687dcd4000a97ebe70068039cf',
'md5': '2e750a330ed211d3fd41821c6ad9a279',
'info_dict': {
'id': 'b132bdfd71b546d3972f9ab9a25f201c',
'title': 'great-escape-minecraft.flv',
'id': 'c0f6b1687dcd4000a97ebe70068039cf',
'ext': 'mp4',
'duration': 352,
'webpage_url': 'http://vimple.ru/b132bdfd71b546d3972f9ab9a25f201c',
'title': 'Sunset',
'duration': 20,
'thumbnail': 're:https?://.*?\.jpg',
},
},
# Quality: Medium, from mainpage
{
'url': 'http://vimple.ru/a15950562888453b8e6f9572dc8600cd',
'info_dict': {
'id': 'a15950562888453b8e6f9572dc8600cd',
'title': 'DB 01',
'ext': 'flv',
'duration': 1484,
'webpage_url': 'http://vimple.ru/a15950562888453b8e6f9572dc8600cd',
}
},
]
def _real_extract(self, url):

View File

@@ -164,6 +164,14 @@ class VKIE(InfoExtractor):
self.to_screen('Youtube video detected')
return self.url_result(m_yt.group(1), 'Youtube')
m_rutube = re.search(
r'\ssrc="((?:https?:)?//rutube\.ru\\?/video\\?/embed(?:.*?))\\?"', info_page)
if m_rutube is not None:
self.to_screen('rutube video detected')
rutube_url = self._proto_relative_url(
m_rutube.group(1).replace('\\', ''))
return self.url_result(rutube_url)
m_opts = re.search(r'(?s)var\s+opts\s*=\s*({.*?});', info_page)
if m_opts:
m_opts_url = re.search(r"url\s*:\s*'([^']+)", m_opts.group(1))

View File

@@ -10,14 +10,14 @@ from ..utils import (
class WashingtonPostIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?washingtonpost\.com/.*?/(?P<id>[^/]+)/(?:$|[?#])'
_VALID_URL = r'https?://(?:www\.)?washingtonpost\.com/.*?/(?P<id>[^/]+)/(?:$|[?#])'
_TEST = {
'url': 'http://www.washingtonpost.com/sf/national/2014/03/22/sinkhole-of-bureaucracy/',
'info_dict': {
'title': 'Sinkhole of bureaucracy',
},
'playlist': [{
'md5': 'c3f4b4922ffa259243f68e928db2db8c',
'md5': '79132cc09ec5309fa590ae46e4cc31bc',
'info_dict': {
'id': 'fc433c38-b146-11e3-b8b3-44b1d1cd4c1f',
'ext': 'mp4',
@@ -29,7 +29,7 @@ class WashingtonPostIE(InfoExtractor):
'upload_date': '20140322',
},
}, {
'md5': 'f645a07652c2950cd9134bb852c5f5eb',
'md5': 'e1d5734c06865cc504ad99dc2de0d443',
'info_dict': {
'id': '41255e28-b14a-11e3-b8b3-44b1d1cd4c1f',
'ext': 'mp4',
@@ -44,10 +44,9 @@ class WashingtonPostIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_id = mobj.group('id')
page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id)
title = self._og_search_title(webpage)
uuids = re.findall(r'data-video-uuid="([^"]+)"', webpage)
entries = []

View File

@@ -1,6 +1,7 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import itertools
import re
from .common import InfoExtractor
@@ -67,6 +68,10 @@ class WDRIE(InfoExtractor):
'upload_date': '20140717',
},
},
{
'url': 'http://www1.wdr.de/mediathek/video/sendungen/quarks_und_co/filterseite-quarks-und-co100.html',
'playlist_mincount': 146,
}
]
def _real_extract(self, url):
@@ -81,6 +86,27 @@ class WDRIE(InfoExtractor):
self.url_result(page_url + href, 'WDR')
for href in re.findall(r'<a href="/?(.+?%s\.html)" rel="nofollow"' % self._PLAYER_REGEX, webpage)
]
if entries: # Playlist page
return self.playlist_result(entries, page_id)
# Overview page
entries = []
for page_num in itertools.count(2):
hrefs = re.findall(
r'<li class="mediathekvideo"\s*>\s*<img[^>]*>\s*<a href="(/mediathek/video/[^"]+)"',
webpage)
entries.extend(
self.url_result(page_url + href, 'WDR')
for href in hrefs)
next_url_m = re.search(
r'<li class="nextToLast">\s*<a href="([^"]+)"', webpage)
if not next_url_m:
break
next_url = page_url + next_url_m.group(1)
webpage = self._download_webpage(
next_url, page_id,
note='Downloading playlist page %d' % page_num)
return self.playlist_result(entries, page_id)
flashvars = compat_parse_qs(
@@ -172,8 +198,7 @@ class WDRMausIE(InfoExtractor):
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
param_code = self._html_search_regex(
@@ -224,5 +249,3 @@ class WDRMausIE(InfoExtractor):
'thumbnail': thumbnail,
'upload_date': upload_date,
}
# TODO test _1
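
# The WDR overview-page handling added above keeps following the
# nextToLast link until it disappears; condensed into a sketch where
# fetch_page is an assumed download callback and the href regex is
# simplified from the original.
import itertools
import re

def crawl_overview(first_page, fetch_page):
    page, hrefs = first_page, []
    for page_num in itertools.count(2):
        hrefs += re.findall(r'<a href="(/mediathek/video/[^"]+)"', page)
        next_m = re.search(r'<li class="nextToLast">\s*<a href="([^"]+)"', page)
        if not next_m:  # no further page link: done
            break
        page = fetch_page(next_m.group(1), page_num)
    return hrefs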

View File

@@ -0,0 +1,102 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class WebOfStoriesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?webofstories\.com/play/(?:[^/]+/)?(?P<id>[0-9]+)'
_VIDEO_DOMAIN = 'http://eu-mobile.webofstories.com/'
_GREAT_LIFE_STREAMER = 'rtmp://eu-cdn1.webofstories.com/cfx/st/'
_USER_STREAMER = 'rtmp://eu-users.webofstories.com/cfx/st/'
_TESTS = [
{
'url': 'http://www.webofstories.com/play/hans.bethe/71',
'md5': '373e4dd915f60cfe3116322642ddf364',
'info_dict': {
'id': '4536',
'ext': 'mp4',
'title': 'The temperature of the sun',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'Hans Bethe talks about calculating the temperature of the sun',
'duration': 238,
}
},
{
'url': 'http://www.webofstories.com/play/55908',
'md5': '2985a698e1fe3211022422c4b5ed962c',
'info_dict': {
'id': '55908',
'ext': 'mp4',
'title': 'The story of Gemmata obscuriglobus',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'Planctomycete talks about The story of Gemmata obscuriglobus',
'duration': 169,
}
},
]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
description = self._html_search_meta('description', webpage)
thumbnail = self._og_search_thumbnail(webpage)
story_filename = self._search_regex(
r'\.storyFileName\("([^"]+)"\)', webpage, 'story filename')
speaker_id = self._search_regex(
r'\.speakerId\("([^"]+)"\)', webpage, 'speaker ID')
story_id = self._search_regex(
r'\.storyId\((\d+)\)', webpage, 'story ID')
speaker_type = self._search_regex(
r'\.speakerType\("([^"]+)"\)', webpage, 'speaker type')
great_life = self._search_regex(
r'isGreatLifeStory\s*=\s*(true|false)', webpage, 'great life story')
is_great_life_series = great_life == 'true'
duration = int_or_none(self._search_regex(
r'\.duration\((\d+)\)', webpage, 'duration', fatal=False))
# URL building, see: http://www.webofstories.com/scripts/player.js
ms_prefix = ''
if speaker_type.lower() == 'ms':
ms_prefix = 'mini_sites/'
if is_great_life_series:
mp4_url = '{0:}lives/{1:}/{2:}.mp4'.format(
self._VIDEO_DOMAIN, speaker_id, story_filename)
rtmp_ext = 'flv'
streamer = self._GREAT_LIFE_STREAMER
play_path = 'stories/{0:}/{1:}'.format(
speaker_id, story_filename)
else:
mp4_url = '{0:}{1:}{2:}/{3:}.mp4'.format(
self._VIDEO_DOMAIN, ms_prefix, speaker_id, story_filename)
rtmp_ext = 'mp4'
streamer = self._USER_STREAMER
play_path = 'mp4:{0:}{1:}/{2}.mp4'.format(
ms_prefix, speaker_id, story_filename)
formats = [{
'format_id': 'mp4_sd',
'url': mp4_url,
}, {
'format_id': 'rtmp_sd',
'page_url': url,
'url': streamer,
'ext': rtmp_ext,
'play_path': play_path,
}]
self._sort_formats(formats)
return {
'id': story_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'description': description,
'duration': duration,
}
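
# The branching above (see webofstories.com/scripts/player.js) builds both
# a progressive MP4 URL and an RTMP play path; restated as one function
# with the same inputs (a sketch, not the extractor itself):
def build_urls(speaker_id, story_filename, speaker_type, is_great_life,
               video_domain='http://eu-mobile.webofstories.com/'):
    ms_prefix = 'mini_sites/' if speaker_type.lower() == 'ms' else ''
    if is_great_life:
        mp4_url = video_domain + 'lives/%s/%s.mp4' % (speaker_id, story_filename)
        play_path = 'stories/%s/%s' % (speaker_id, story_filename)
    else:
        mp4_url = video_domain + '%s%s/%s.mp4' % (ms_prefix, speaker_id, story_filename)
        play_path = 'mp4:%s%s/%s.mp4' % (ms_prefix, speaker_id, story_filename)
    return mp4_url, play_path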

View File

@@ -30,7 +30,7 @@ class XboxClipsIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'>(?:Link|Download): <a href="([^"]+)">', webpage, 'video URL')
r'>(?:Link|Download): <a[^>]+href="([^"]+)"', webpage, 'video URL')
title = self._html_search_regex(
r'<title>XboxClips \| ([^<]+)</title>', webpage, 'title')
upload_date = unified_strdate(self._html_search_regex(

View File

@@ -14,7 +14,7 @@ from ..utils import (
class XHamsterIE(InfoExtractor):
"""Information Extractor for xHamster"""
_VALID_URL = r'http://(?:.+?\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.+?)\.html(?:\?.*)?'
_VALID_URL = r'(?P<proto>https?)://(?:.+?\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.+?)\.html(?:\?.*)?'
_TESTS = [
{
'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
@@ -39,7 +39,11 @@ class XHamsterIE(InfoExtractor):
'duration': 200,
'age_limit': 18,
}
}
},
{
'url': 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html',
'only_matching': True,
},
]
def _real_extract(self, url):
@@ -57,7 +61,8 @@ class XHamsterIE(InfoExtractor):
video_id = mobj.group('id')
seo = mobj.group('seo')
mrss_url = 'http://xhamster.com/movies/%s/%s.html' % (video_id, seo)
proto = mobj.group('proto')
mrss_url = '%s://xhamster.com/movies/%s/%s.html' % (proto, video_id, seo)
webpage = self._download_webpage(mrss_url, video_id)
title = self._html_search_regex(r'<title>(?P<title>.+?) - xHamster\.com</title>', webpage, 'title')
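
# What the new (?P<proto>https?) group above buys, checked end to end: the
# mRSS request reuses whatever scheme the input URL carried, so https
# input stays https.
import re

VALID_URL = r'(?P<proto>https?)://(?:.+?\.)?xhamster\.com/movies/(?P<id>[0-9]+)/(?P<seo>.+?)\.html'
m = re.match(VALID_URL, 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html')
mrss_url = '%s://xhamster.com/movies/%s/%s.html' % (m.group('proto'), m.group('id'), m.group('seo'))
assert mrss_url == 'https://xhamster.com/movies/2272726/amber_slayed_by_the_knight.html'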

Some files were not shown because too many files have changed in this diff.