Compare commits

..

290 Commits

Author SHA1 Message Date
d0f2ab6969 release 2014.04.13 2014-04-13 03:22:30 +02:00
de906ef543 [aol] Add support for playlists (Fixes #2730) 2014-04-13 03:22:24 +02:00
2fb3deeca1 [tube8] Fix extraction and modernize 2014-04-13 03:56:32 +07:00
66398056f1 Merge branch 'master' of github.com:rg3/youtube-dl 2014-04-12 17:15:16 +02:00
77477fa4c9 Merge branch 'atomicparsley' (closes #2436) 2014-04-12 15:52:42 +02:00
a169e18ce1 [atomicparsley] Remove unneeded __init__ method 2014-04-12 15:51:40 +02:00
381640e3ac [brightcove] Only use url from meta element if it has the 'playerKey' field (fixes #2738) 2014-04-12 12:53:48 +02:00
37e3410137 [prosiebensat1] Add one more clip id pattern (Closes #2737) 2014-04-12 02:53:55 +07:00
97b5196960 [weibo] Modernize 2014-04-11 16:02:34 +02:00
6a4f3528c8 [firstpost] Fix extraction 2014-04-11 20:40:42 +07:00
b9c76aa1a9 [youtube] Add support for cleanvideosearch.com (Fixes #2734) 2014-04-11 13:53:05 +02:00
0d3070d364 release 2014.04.11.2 2014-04-11 09:44:33 +02:00
7753cadbfa [comedycentral:shows] Add support for TDS special editions (Fixes #2733) 2014-04-11 09:30:07 +02:00
3950450342 [pyvideo] Fix title 2014-04-11 02:20:50 +02:00
c82b1fdad6 [slideshare] Fix description 2014-04-11 02:19:15 +02:00
b0fb63abe8 [dailymotion:playlist] Fix title 2014-04-11 02:16:46 +02:00
3ab34c603e [comedycentral] Fix test md5sum 2014-04-11 02:14:31 +02:00
7d6413341a release 2014.04.11.1 2014-04-11 01:29:54 +02:00
140012d0f6 release 2014.04.11 2014-04-11 01:28:30 +02:00
4be9f8c814 [ninegag] Add support for p/ URLs 2014-04-11 01:25:24 +02:00
5c802bac37 [byutv] Fix test 2014-04-10 19:37:55 +07:00
6c30ff756a [mpora] Fix test 2014-04-10 19:10:03 +07:00
62749e4708 [morningstar] Also support 'Cover' (#2729) 2014-04-09 20:51:28 +02:00
6b7dee4b38 [morningstar] Recognize urls that use 'videoCenter' (fixes #2729) 2014-04-09 20:45:49 +02:00
ef2041eb4e [br] Add audio extraction and support more URLs (Closes #2728) 2014-04-09 20:19:27 +07:00
29e3e682af [comedycentral] Match more URLs
Looks like they only offer clips instead of full episodes now. We'll need to add new parsing code as well.
2014-04-09 11:43:15 +02:00
f983c44199 Merge pull request #2725 from foolscap/subtitles-error-fix
Fix subtitle download error reporting (Fixes #2724)
2014-04-09 10:16:06 +02:00
e4db19511a Fix subtitle download error reporting (Fixes #2724) 2014-04-08 15:59:27 +01:00
c47d21da80 [ntv] Update test 2014-04-08 19:11:40 +07:00
269aecd0c0 [ffmpeg] Do not pass in byets to subprocess (Fixes #2717) 2014-04-07 23:33:05 +02:00
aafddb2b0a Merge remote-tracking branch 'anisse/fix-content-encoding-charset' 2014-04-07 23:27:03 +02:00
6262ac8ac5 release 2014.04.07.4 2014-04-07 23:23:54 +02:00
89938c719e Fix Windows output for non-BMP unicode characters 2014-04-07 23:23:48 +02:00
ec0fafbb19 [extractor/common] fallback on utf-8 when charset is not found
fixes #2721
2014-04-07 23:10:16 +02:00
a5863bdf33 release 2014.04.07.3 2014-04-07 22:48:45 +02:00
b58ddb32ba [utils] Completely rewrite Windows output (Fixes #2672) 2014-04-07 22:48:13 +02:00
b9e12a8140 release 2014.04.07.2 2014-04-07 21:41:20 +02:00
104aa7388a Use our own encoding when writing strings 2014-04-07 21:40:34 +02:00
c3855d28b0 Merge branch 'master' of github.com:rg3/youtube-dl 2014-04-07 19:57:51 +02:00
734f90bb41 Use --encoding when outputting 2014-04-07 19:57:42 +02:00
91a6addeeb Add support for rtve.es/alacarta 2014-04-07 17:30:32 +02:00
9afb76c5ad release 2014.04.07.1 2014-04-07 15:28:55 +02:00
dfb2cb5cfd [teamcoco] Simplify ID management (Closes #2715) 2014-04-07 15:25:35 +02:00
650d688d10 release 2014.04.07 2014-04-07 13:11:37 +02:00
0ba77818f3 [ted] Add width and height (Fixes #2716) 2014-04-07 13:11:30 +02:00
09baa7da7e [rts] Update test 2014-04-07 00:34:23 +07:00
85e787f51d [cbsnews] Add support for cbsnews.com (Closes #2691) 2014-04-06 06:03:58 +07:00
2a9e1e453a Merge branch 'master' of github.com:rg3/youtube-dl 2014-04-05 20:05:47 +02:00
ee1e199685 [justin.tv] Modernize (Fixes #2705) 2014-04-05 17:56:36 +02:00
17c5a00774 [novamov] Simplify 2014-04-05 19:36:22 +07:00
15c0e8e7b2 [generic] Generalize novamov based embeds 2014-04-05 17:20:05 +07:00
cca37fba48 [divxstage] Fix typo in IE_NAME 2014-04-05 17:15:43 +07:00
9d0993ec4a [movshare] Support more domains 2014-04-05 17:00:18 +07:00
342f33bf9e [divxstage] Support more domains 2014-04-05 16:50:05 +07:00
7cd3bc5f99 [nowvideo] Support more domains 2014-04-05 16:38:57 +07:00
931055e6cb [videoweed] Revert _FILE_DELETED_REGEX 2014-04-05 16:32:14 +07:00
d0e4cf82f1 [movshare] Add _FILE_DELETED_REGEX 2014-04-05 16:31:38 +07:00
6f88df2c57 [divxstage] Add support for divxstage.eu 2014-04-05 16:29:44 +07:00
4479bf2762 [videoweed] Simplify 2014-04-05 16:09:28 +07:00
1ff7c0f7d8 [movshare] Add support for movshare.net 2014-04-05 16:09:03 +07:00
610e47c87e Credit @sainyamkapoor for videoweed extractor 2014-04-05 15:53:50 +07:00
50f566076f [generic] Add support for videoweed embeds 2014-04-05 15:49:45 +07:00
92810ff497 [nowvideo] Improve _VALID_URL 2014-04-05 15:35:21 +07:00
60ccc59a1c [novamov] Improve _VALID_URL 2014-04-05 15:34:54 +07:00
91745595d3 [videoweed] Simplify 2014-04-05 15:32:55 +07:00
d6e40507d0 [videoweed]Cleanup 2014-04-05 10:53:22 +05:30
deed48b472 [Videoweed] Added support for videoweed. 2014-04-05 10:40:03 +05:30
e4d41bfca5 Merge pull request #2696 from anovicecodemonkey/support-ustream-embeds
[UstreamIE] [generic] Added support for Ustream embed URLs (Fixes #2694)
2014-04-04 23:33:08 +02:00
a355b70f27 [cspan] Do not test number of playlist entries
Apparently, CSpan switches between single-file and multiple-file results. Either one is fine as long as we get the full four hours.
2014-04-04 23:16:22 +02:00
f8514f6186 [rts] Use visible id in file names
Maybe the internal ID is more precise, but it's totally confusing, and the obvious ID still allows a google search.
2014-04-04 23:13:55 +02:00
e09b8fcd9d [ro220] Make test case more flexible
Either one or two spaces is fine here.
2014-04-04 23:08:33 +02:00
7d1b527ff9 [motorsport] Fix on Python 3 2014-04-04 23:06:27 +02:00
f943c7b622 release 2014.04.04.7 2014-04-04 23:01:45 +02:00
676eb3f2dd Fix unicode_escape (Fixes #2695) 2014-04-04 23:00:51 +02:00
98b7cf1ace release 2014.04.04.6 2014-04-04 22:48:35 +02:00
c465afd736 [teamcoco] Fix regex in 2.6 (#2700)
The re engine does not want to repeat an empty string, for fear that something like

    (.*)*

could be matching the tokens ...

    ""
    "" ""
    "" "" ""

Of course, that's harmless with a question mark, although still somewhat strange.
2014-04-04 22:46:47 +02:00
b84d6e7fc4 Merge remote-tracking branch 'AGSPhoenix/teamcoco-fix' 2014-04-04 22:44:49 +02:00
2efd5d78c1 release 2014.04.04.5 2014-04-04 22:24:45 +02:00
c8edf47b3a [yahoo] Support https and -uploader URLs (Fixes #2701) 2014-04-04 22:23:59 +02:00
3b4c26a428 [pornhd] Avoid shadowing variable url 2014-04-04 22:22:30 +02:00
1525148114 Remove unused imports 2014-04-04 22:22:11 +02:00
9e0c5791c1 release 2014.04.04.4 2014-04-04 22:15:32 +02:00
29a1ab2afc Add alternative --prefer-unsecure spelling (Closes #2697) 2014-04-04 22:15:21 +02:00
fa387d2d99 Revert "Workaround for regex engine limitation"
This reverts commit 6d0d573eca.
2014-04-04 15:37:49 -04:00
6d0d573eca Workaround for regex engine limitation 2014-04-04 15:25:28 -04:00
bb799e811b Add a test for the new URL pages
Add a test for the pages with the video_id in the URL.
2014-04-04 13:52:35 -04:00
04ee53eca1 Support TeamCoco URLs with video_id in the title
If the URL has the video_id in it, use that since the current method of
finding the id breaks on those pages.

Fixes 2698.
2014-04-04 13:42:34 -04:00
659eb98a53 [breakcom] Fix YouTube videos extraction (fixes #2699) 2014-04-04 19:01:18 +02:00
ca6aada48e Fix _TEST for Ustream embed URLs 2014-04-05 03:26:29 +10:30
43df5a7e71 [keezmovies] Modernize 2014-04-04 18:52:43 +02:00
88f1c6de7b [yahoo] Modernize 2014-04-04 18:52:43 +02:00
65a40ab82b [pornhd] Update test checksum 2014-04-04 22:47:38 +07:00
4b9cced103 [pornhd] Fix extraction (Closes #2693) 2014-04-04 22:45:39 +07:00
5c38625259 [UstreamIE] [generic] Added support for Ustream embed URLs (Fixes #2694) 2014-04-05 00:53:09 +10:30
6344fa04bb [rts] Add more formats and audio support (Closes #2689) 2014-04-04 20:42:06 +07:00
e3ced9ed61 [downloader/common] Use compat_str with the error in try_rename (appeared in #2389)
Otherwise on python 2.x we get `UnicodeDecodeError` because it may contain non ascii characters.
2014-04-04 14:59:11 +02:00
5075d598bc release 2014.04.04.2 2014-04-04 02:24:21 +02:00
68eb8e90e6 [youtube:playlist] Fix playlists for logged-in users (Fixes #2690) 2014-04-04 02:23:36 +02:00
d3a96346c4 release 2014.04.04.3 2014-04-04 02:09:16 +02:00
0e518e2fea [cnet] Fall back to "videos" key 2014-04-04 02:09:04 +02:00
1e0a235f39 [dailymotion] Fix playlist+user 2014-04-04 02:04:16 +02:00
9ad400f75e [generic] Remove test case that has become a 404 2014-04-04 01:47:17 +02:00
3537b93d8a [tests] Fix YoutubeDL tests
Since bec1fad, the id, title, and url (also in formats) keys are mandatory. Change the tests to reflect that.
2014-04-04 01:45:49 +02:00
56eca2e956 release 2014.04.04.1 2014-04-04 00:25:43 +02:00
2ad4d1ba07 [morningstar] Add new extractor (Fixes #2687) 2014-04-04 00:25:35 +02:00
4853de808b release 2014.04.04 2014-04-04 00:06:06 +02:00
6ff5f12218 [motorsport] Add extractor (Fixes #2688) 2014-04-04 00:05:43 +02:00
52a180684f [README] Fix VALID_URL in extractor example 2014-04-03 23:25:23 +02:00
b21e25702f Merge pull request #2681 from phihag/readme-dev-instructions
[README] Improve developer instructions
2014-04-03 23:06:15 +02:00
983af2600f [wimp] Detect youtube videos (fixes #2686) 2014-04-03 20:44:51 +02:00
f34e6a2cd6 [comedycentral:shows] Do no include 6-digit identifier in display ID 2014-04-03 18:39:00 +02:00
a9f304031b release 2014.04.03.3 2014-04-03 16:21:54 +02:00
9271bc8355 [cnet] Add new extractor (Fixes #2679) 2014-04-03 16:21:21 +02:00
d1b3e3dd75 [README] Add md5 to code example 2014-04-03 15:59:04 +02:00
968ed2a777 [comedycentral] Add test for #2677 2014-04-03 15:31:04 +02:00
24de5d2556 release 2014.04.03.2 2014-04-03 15:28:56 +02:00
d26e981df4 Correct check for empty dirname (Fixes #2683) 2014-04-03 15:28:41 +02:00
e45d40b171 [youtube:subscriptions] Add space to the description 2014-04-03 15:13:52 +02:00
4a419b8851 [c56] Modernize and add duration extraction 2014-04-03 19:53:11 +07:00
5fbd672c38 [README] Improve developer instructions
Add a longer tutorial that should cover everything needed to start developing IEs.

Fixes #2676
2014-04-03 14:46:24 +02:00
bec1fad223 [YouTubeDL] Throw an early error if the info_dict result is invalid 2014-04-03 14:38:16 +02:00
177fed41bc [comedycentral:shows] Support guest/ URLs (Fixes #2677) 2014-04-03 14:38:16 +02:00
b900e7cba4 [downloader/f4m] Close the final video 2014-04-03 13:35:07 +02:00
14cb4979f0 MANIFEST.in: Only list the files from the docs folder that will be included (closes #2623)
Pruning the _build folder produced the message `no previously-included directories found matching 'docs/_build'` when installing from the source distribution.
2014-04-03 13:26:27 +02:00
69e61e30fe release 2014.04.03.1 2014-04-03 08:55:59 +02:00
cce929eaac [franceculture] Add extractor (Fixes #2669) 2014-04-03 08:55:38 +02:00
b6cfde99b7 Only mention websense URL once 2014-04-03 08:12:53 +02:00
1be99f052d release 2014.04.03 2014-04-03 06:09:45 +02:00
2410c43d83 Detect Websense censorship (Fixes #2670) 2014-04-03 06:09:38 +02:00
aea6e7fc3c [cspan] Support multiple segments (Fixes #2674) 2014-04-03 06:09:38 +02:00
91a76c40c0 [musicplayon] Add support for musicplayon.com 2014-04-02 22:10:20 +07:00
d2b194607c release 2014.04.02 2014-04-02 14:26:34 +02:00
f6177462db [youtube] feeds: Also look for the html in the 'content_html' field (fixes #2671) 2014-04-02 14:13:08 +02:00
9ddaf4ef8c [comedycentral] Change XPath .//guid to ./guid (fixes #2668)
It fails to find the element in python 2.6 and it's not required, the
element is a direct child of the item node.
2014-04-01 21:38:07 +02:00
97b5573848 [comedycentral] Update test title for 34cbc7ee8d 2014-04-01 21:29:40 +02:00
18c95c1ab0 [rutube] Use _download_json 2014-04-01 20:30:22 +02:00
0479c625a4 [brightcove] Encode object_str with utf-8 2014-04-01 20:17:35 +07:00
f659951e22 [vk] Support optional dash for oid in embedded links 2014-04-01 19:38:42 +07:00
5853a7316e release 2014.04.01.3 2014-04-01 13:17:15 +02:00
a612753db9 [utils] Correct decoding of large unicode codepoints in uppercase_escape (Fixes #2664) 2014-04-01 13:17:07 +02:00
c8fc3fb524 release 2014.04.01.2 2014-04-01 05:57:15 +02:00
5912c639df [youtube] Transform google's JSON dialect (fixes #2663) 2014-04-01 05:56:56 +02:00
017e4dd58c release 2014.04.01.1 2014-04-01 00:25:17 +02:00
651486621d [comedycentral] Allow URLs with query parts (fixes #2661) 2014-04-01 00:25:11 +02:00
28d9032c88 release 2014.04.01 2014-04-01 00:02:39 +02:00
16f4eb723a [comedycentral] Add support for /videos URLs (Fixes #2660) 2014-04-01 00:02:32 +02:00
1cbd410620 [pyvideo] Modernize 2014-03-31 19:31:48 +07:00
d41ac5f5dc release 2014.03.30.1 2014-03-30 15:57:47 +02:00
9c1fc022ae [generic] Warn before fallback to automatic search 2014-03-30 15:57:35 +02:00
83d548ef0f [youtube] Encode ytsearch query 2014-03-30 15:57:35 +02:00
c72477bd32 [rutube] Modernize 2014-03-30 15:35:07 +07:00
9a7b072e38 [wdr] Add support for more wdrmaus subpages 2014-03-30 07:42:35 +02:00
cbc4a6cc7e release 2014.03.30 2014-03-30 07:25:48 +02:00
cd7481a39e [wdr] Add support for wdrmaus.de (Fixes #2651) 2014-03-30 07:25:42 +02:00
acd213ed6d Remove unusued imports 2014-03-30 07:16:07 +02:00
77ffa95701 [jsinterp] Better error messages 2014-03-30 07:15:14 +02:00
2b25cb5d76 [youtube] Move JavaScript interpreter into its own module 2014-03-30 07:02:58 +02:00
62fec3b2ff Add new --encoding option (Fixes #2650) 2014-03-30 06:08:22 +02:00
e79162558e [wat] Modernize 2014-03-29 15:15:16 +01:00
2da67107ee [tf1] Modernize 2014-03-29 15:05:15 +01:00
2ff7f8975e [nba] Modernize 2014-03-29 14:57:48 +01:00
87a2566048 [metacritic] Modernize test 2014-03-29 14:57:48 +01:00
986f56736b [roxwel] Modernize 2014-03-29 14:57:44 +01:00
2583a0308b [huffpost] Modernize test 2014-03-29 14:35:45 +01:00
40c716d2a2 [ign] Modernize 2014-03-29 14:34:03 +01:00
79bfd01001 [kickstarter] Fix extraction, extract more info and modernize 2014-03-29 14:22:28 +01:00
f2bcdd8e02 [discovery] modernize 2014-03-29 14:22:27 +01:00
8c5850eeb4 release 2014.03.29 2014-03-29 14:01:53 +01:00
bd3e077a2d Merge branch 'master' of github.com:rg3/youtube-dl 2014-03-29 14:01:19 +01:00
7e70ac36b3 [bloomberg] Fix extraction (fixes #2154)
Stop using the OoyalaIE, extract the f4m url instead.
2014-03-29 11:55:12 +01:00
2cc0082dc0 Credit @phaer for OE1 (#2646) 2014-03-29 10:11:32 +01:00
056b56688a [ntv] Simplify 2014-03-29 15:55:03 +07:00
b17418313f [oe1] Simplify (#2646) 2014-03-28 23:23:58 +01:00
e9a6fd6a68 Merge remote-tracking branch 'phaer/add-oe1-support' 2014-03-28 23:21:58 +01:00
bf30f3bd9d release 2014.03.28 2014-03-28 23:14:54 +01:00
330edf2d84 Mention where to find keys in --dump-json (Fixes #2648) 2014-03-28 23:13:03 +01:00
43f775e4ca [comedycentral] Duration can now be a float (Fixes #2647) 2014-03-28 23:06:34 +01:00
8f6562448c [ntv] Move app guess outside formats loop 2014-03-28 23:09:56 +07:00
263f4b514b [ntv] Add support for ntv.ru (Closes #2581) 2014-03-28 23:01:08 +07:00
f0da3f1ef9 [oe1] Add support for oe1.orf.at. 2014-03-28 17:57:25 +02:00
cb3ac1c610 [smotri] Modernize and add support for emdebbed videos (Closes #2585) 2014-03-28 19:58:49 +07:00
8efd15f477 [canalplus] Fix video id extraction (Closes #2645) 2014-03-28 18:47:15 +07:00
d26ebe990f [ehow] Modernize 2014-03-27 21:23:02 +01:00
28acf5500a [appletrailers] Modernize 2014-03-27 21:10:51 +01:00
214c22c704 [niconico] Modernize 2014-03-27 21:01:09 +01:00
8cdafb47b9 [mooshare] Add support for URLs starting with 'www' 2014-03-27 19:08:35 +07:00
0dae5083f1 [urort] Add date 2014-03-27 02:56:23 +01:00
4c89bbd22c release 2014.03.27.1 2014-03-27 02:52:06 +01:00
e2b06e76c1 [urort] Add extractor (Fixes #2634) 2014-03-27 02:51:50 +01:00
e9c076c317 [clipsyndicate] Modernize 2014-03-27 02:30:00 +01:00
6c072e7d25 release 2014.03.27 2014-03-27 02:22:57 +01:00
ac6c104871 [ted] Add support for watch/ URLs (Fixes #2637) 2014-03-27 02:22:40 +01:00
69c01a9f68 [comedycentral] Add a testcase for extended-interviews URLs (#2636) 2014-03-27 02:02:48 +01:00
e55213ce35 Merge remote-tracking branch 'malept/tds-extended-interviews' 2014-03-27 02:02:18 +01:00
24a2aac445 [comedycentral] fix TDS extended interviews
The new website broke the URL format.
Added "playlist" as a valid ID keyword.
2014-03-26 10:51:02 -07:00
784763c565 we don't need to run ffmpeg more times 2014-03-26 15:22:52 +01:00
39c68260c0 fix ffmpeg metadatapp 2014-03-26 15:22:52 +01:00
149254d0d5 fix ffmpeg error, if youtube-dl runs more than once with --embed-thumbnail with same video 2014-03-26 15:22:52 +01:00
0c14e2fbe3 add post processor 2014-03-26 15:22:51 +01:00
98acdc895b Merge remote-tracking branch 'dstftw/download-referer-header' (closes #2628) 2014-03-26 15:20:11 +01:00
bd3b5b8b10 [slashdot] Remove extractor
The generic ooyala detection works fine.
2014-03-26 15:09:14 +01:00
9a90636805 [vice] Remove extractor
The generic ooyala detection works fine.
2014-03-26 15:03:34 +01:00
6a66ae96ed [cspan] Roll back unfinished rtmp support 2014-03-26 19:51:54 +07:00
2c8a4ba6b5 Makefile: include the docs in the tarball 2014-03-26 12:01:08 +01:00
ad8915b729 Add --no-warnings option (Fixes #2630) 2014-03-26 00:43:46 +01:00
34cbc7ee8d [comedycentral] Better titles 2014-03-25 23:46:51 +01:00
a59e40a1ea Replace 'referer' with 'http_referer' 2014-03-25 21:53:26 +07:00
ad0a75db6b [auengine] Add referer 2014-03-25 21:22:41 +07:00
1d0e49e1c7 Use explicitly set Referer header for downloading 2014-03-25 21:22:27 +07:00
b4461b6ebe [auengine] Modernize 2014-03-25 21:16:10 +07:00
80959224fe release 2014.03.25.1 2014-03-25 14:27:40 +01:00
865cbf4fc5 [comedycentral] Correct uri (Fixes #2627) 2014-03-25 14:27:23 +01:00
196f061cac release 2014.03.25 2014-03-25 04:01:13 +01:00
99b380c33b [comedycentral] Fix thedailyshow / thecolbertreport (Fixes #2600, #2596) 2014-03-25 04:00:57 +01:00
02e4482e22 release 2014.03.24.5 2014-03-24 23:23:38 +01:00
b8a792de80 Merge remote-tracking branch 'origin/master' into HEAD
Conflicts:
	youtube_dl/extractor/arte.py
2014-03-24 23:23:17 +01:00
fac55558ad [washingtonpost] Add extractor (Fixes #2622) 2014-03-24 23:21:20 +01:00
b2799ff96d [arte] Fix videos.arte.tv extraction 2014-03-24 22:38:51 +01:00
7a249480b4 [arte] Fix video.arte.tv extractor 2014-03-24 22:34:03 +01:00
f605128d13 [rts] Add thumbnail support 2014-03-24 22:32:04 +01:00
ba40a74666 [clipfish] Modernize 2014-03-24 22:30:32 +01:00
fb8ae2d438 release 2014.03.24.4 2014-03-24 22:03:51 +01:00
893f8832b5 [arte] Add support for embedded videos (Fixes #2620) 2014-03-24 22:01:47 +01:00
878d11ec29 [arte] Add support for multiple formats 2014-03-24 21:36:26 +01:00
515bbe4b5b [arte] Remove liveweb support
liveweb.arte.tv is no longer functional, everything has moved to concert
2014-03-24 21:31:19 +01:00
75f2e25ba9 [downloader/hls] Encode filename (Fixes #2609) 2014-03-24 21:23:05 +01:00
0d466d34a3 release 2014.03.24.3 2014-03-24 17:12:42 +01:00
6949d81095 [byutv] Add support (Fixes #2612) 2014-03-24 17:12:15 +01:00
f847ca02d3 [addanime] Modernize 2014-03-24 16:39:53 +01:00
510243ba58 release 2014.03.24.2 2014-03-24 15:00:47 +01:00
b540697a8a [veoh] Improve extraction, fix youtube extraction (Closes #2616) 2014-03-24 20:53:03 +07:00
0d3641e589 [cinemassacre] Fix #2815 2014-03-24 13:43:13 +01:00
72546c831e Merge pull request #2553 from anisse/master
Add an option to specify custom HTTP headers
2014-03-24 10:42:58 +01:00
d26db9269d release 2014.03.24.1 2014-03-24 10:25:58 +01:00
4c0941853a [devscripts/release] Check version number 2014-03-24 10:25:49 +01:00
c11726364e release 2014.03.24 2014-03-24 10:17:35 +01:00
c577d735c6 release 2013.03.24.2 2014-03-24 02:24:31 +01:00
9f0375f61a release 2013.03.24.1 2014-03-24 02:22:12 +01:00
5e114e4bfe [soundcloud] Always add streaming formats 2014-03-24 02:21:17 +01:00
83622b6d2f [soundcloud] Simplify string literals 2014-03-24 02:15:31 +01:00
3d87426c2d release 2013.03.24 2014-03-24 01:42:14 +01:00
ce328530a9 Merge remote-tracking branch 'origin/master' 2014-03-24 01:42:11 +01:00
f70daac108 [RTS] Add extractor (Fixes #2608) 2014-03-24 01:41:14 +01:00
912b38b428 [instagram] Fix info_dict key name 2014-03-24 01:40:09 +01:00
6e25c58ed7 Merge pull request #2567 from jaimeMF/sphinx-docs
Add initial sphinx docs
2014-03-24 00:50:32 +01:00
51fb2e98d2 [radiofrance] Modernize 2014-03-23 17:43:33 +01:00
38d63d846e [extractor/common] Clarify preference key in formats 2014-03-23 17:41:43 +01:00
07cec9776e release 2014.03.23 2014-03-23 16:06:41 +01:00
ea38e55fff [instagram] Add support for user profiles (Fixes #2606) 2014-03-23 16:06:07 +01:00
257cfebfe6 [test] Move expect_info_dict out of test_download 2014-03-23 15:52:21 +01:00
6eefe53329 [utils] Simplify setproctitle 2014-03-23 14:28:22 +01:00
1986025d2b [xbef] (Add extractor) 2014-03-23 14:04:36 +01:00
c9aa111b4f [worldstarhiphop] Modernize 2014-03-23 13:49:15 +01:00
bfcb6e3917 Merge remote-tracking branch 'fiocfun/xtube-user-extractor' 2014-03-23 13:36:14 +01:00
2c1396073e [metacafe] Remove accidently inserted comment string 2014-03-23 05:16:02 +07:00
401983c6a0 [metacafe] More modernize 2014-03-23 05:13:15 +07:00
391dc3ee07 [metacafe] Replace cbs test 2014-03-23 05:08:11 +07:00
be3b8fa30f [metacafe] Modernize 2014-03-23 05:05:31 +07:00
9f5809b3e8 [xtube] user playlist extractor 2014-03-23 00:16:35 +06:00
0320ddc192 [pornhub] Fix uploader extraction and extract counts 2014-03-22 21:30:22 +07:00
56dd55721c Remove unused imports and clarify variable names 2014-03-22 15:17:32 +01:00
231f76b530 [toypics] Separate user and video extraction (#2601) 2014-03-22 15:15:01 +01:00
55442a7812 Merge remote-tracking branch 'fiocfun/toypics-support' 2014-03-22 14:24:44 +01:00
43b81eb98a [youtube] Remove useless resolution fields from format definitions
These can be - and are - calculated automatically by the YoutubeDL core.
2014-03-22 14:22:41 +01:00
bfd718793c Merge remote-tracking branch 'hurda/patch-1' 2014-03-22 14:21:04 +01:00
a9c2896e22 Make missing test definition fields an error
If the result is not testable (for example, because a description changes often), either pass in a type or a regular expression (a string starting with 're:')
2014-03-22 14:20:07 +01:00
278229d195 itag 160 is 144p, not 192p 2014-03-22 12:15:45 +01:00
fa154d1dbe [videolectures.net] Make description optional 2014-03-22 12:10:56 +01:00
7e2ede9891 [generic] Run TED detection before JW Player detection
Otherwise it overwrittes the `mobj` variable.
2014-03-22 10:20:44 +01:00
74af99fc2f toypics.net support 2014-03-22 04:07:44 +06:00
0f2a2ba14b Merge remote-tracking branch 'dstftw/generic-webpage-unescape'
Conflicts:
	youtube_dl/extractor/generic.py
2014-03-21 22:14:24 +01:00
e24b5a8610 [ooyala] Modernize 2014-03-21 21:55:51 +01:00
750f9020ae [generic] Recognize more Ooyala embedded videos (#2569) 2014-03-21 21:51:33 +01:00
f82863851e Add an extractor for on.aol.com 2014-03-21 19:54:44 +01:00
933a5b3792 Add extractor for Engadget and 5min (closes #2465)
engadget.com uses the generic 5min.com service.
2014-03-21 19:13:46 +01:00
aa488e1385 [xtube] Fix formats extraction 2014-03-21 23:58:40 +07:00
d77650525d release 2014.03.21.5 2014-03-21 14:52:57 +01:00
3e50c29984 release 2014.03.21.4 2014-03-21 14:38:55 +01:00
64e7ad6045 [videolectures] (New extractor) 2014-03-21 14:38:41 +01:00
23f4a93bb4 [daum] Modernize 2014-03-21 14:38:41 +01:00
6f13b055f1 [cspan] Fix typo in a comment 2014-03-21 08:01:20 +01:00
1f91bd15c3 release 2014.03.21.3 2014-03-21 02:10:35 +01:00
11a15be4ce [cspan] Add support for newer videos (Fixes #2577) 2014-03-21 02:10:24 +01:00
14e17e18cb release 2014.03.21.2 2014-03-21 01:42:45 +01:00
1b124d1942 [parliamentliveuk] Add extractor 2014-03-21 01:42:28 +01:00
410afb2003 Add an option to specify custom HTTP headers 2014-03-17 16:40:41 +01:00
685052fc7b Add initial sphinx docs
With an initial guide for using youtube_dl from python programs.
2014-03-15 19:08:09 +01:00
d95e35d659 [generic] Add nowvideo test hidden behind percent encoding 2014-03-15 04:39:53 +07:00
1439073049 [generic] Add comment for unescaping webpage contents 2014-03-15 04:38:49 +07:00
1f7659dbe9 [generic] Unescape webpage contents 2014-03-15 04:21:17 +07:00
117 changed files with 4437 additions and 1429 deletions

View File

@ -3,3 +3,4 @@ include test/*.py
include test/*.json
include youtube-dl.bash-completion
include youtube-dl.1
recursive-include docs Makefile conf.py *.rst

View File

@ -72,8 +72,9 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude '__pycache' \
--exclude '.git' \
--exclude 'testdata' \
--exclude 'docs/_build' \
-- \
bin devscripts test youtube_dl \
bin devscripts test youtube_dl docs \
CHANGELOG LICENSE README.md README.txt \
Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion setup.py \
youtube-dl

View File

@ -28,6 +28,9 @@ which means you can modify it, redistribute it or use it however you like.
--user-agent UA specify a custom user agent
--referer REF specify a custom referer, use if the video
access is restricted to one domain
--add-header FIELD:VALUE specify a custom HTTP header and its value,
separated by a colon ':'. You can use this
option multiple times
--list-extractors List all supported extractors and the URLs
they would handle
--extractor-descriptions Output descriptions of all supported
@ -62,6 +65,7 @@ which means you can modify it, redistribute it or use it however you like.
configuration in ~/.config/youtube-dl.conf
(%APPDATA%/youtube-dl/config.txt on
Windows)
--encoding ENCODING Force the specified encoding (experimental)
## Video Selection:
--playlist-start NUMBER playlist video to start at (default is 1)
@ -166,6 +170,7 @@ which means you can modify it, redistribute it or use it however you like.
## Verbosity / Simulation Options:
-q, --quiet activates quiet mode
--no-warnings Ignore warnings
-s, --simulate do not download the video and do not write
anything to disk
--skip-download do not download the video
@ -177,7 +182,9 @@ which means you can modify it, redistribute it or use it however you like.
--get-duration simulate, quiet but print video length
--get-filename simulate, quiet but print output filename
--get-format simulate, quiet but print output format
-j, --dump-json simulate, quiet but print JSON information
-j, --dump-json simulate, quiet but print JSON information.
See --output for a description of available
keys.
--newline output progress bar as new lines
--no-progress do not print progress bar
--console-title display progress in console titlebar
@ -243,6 +250,7 @@ which means you can modify it, redistribute it or use it however you like.
default
--embed-subs embed subtitles in the video (only for mp4
videos)
--embed-thumbnail embed thumbnail in the audio as cover art
--add-metadata write metadata to the video file
--xattrs write metadata to the video file's xattrs
(using dublin core and xdg standards)
@ -364,7 +372,67 @@ If you want to create a build of youtube-dl yourself, you'll need
### Adding support for a new site
If you want to add support for a new site, copy *any* [recently modified](https://github.com/rg3/youtube-dl/commits/master/youtube_dl/extractor) file in `youtube_dl/extractor`, add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py). Have a look at [`youtube_dl/common/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L38). Don't forget to run the tests with `python test/test_download.py TestDownload.test_YourExtractor`! For a detailed tutorial, refer to [this blog post](http://filippo.io/add-support-for-a-new-video-site-to-youtube-dl/).
If you want to add support for a new site, you can follow this quick list (assuming your service is called `yourextractor`):
1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class YourExtractorIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://yourextractor.com/watch/42',
'md5': 'TODO: md5 sum of the first 10KiB of the video file',
'info_dict': {
'id': '42',
'ext': 'mp4',
'title': 'Video title goes here',
# TODO more properties, either as:
# * A value
# * MD5 checksum; start the string with md5:
# * A regular expression; start the string with re:
# * Any Python type (for example int or float)
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
# TODO more code goes here, for example ...
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<h1>(.*?)</h1>', webpage, 'title')
return {
'id': video_id,
'title': title,
# TODO more properties (see youtube_dl/extractor/common.py)
}
5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done.
7. Have a look at [`youtube_dl/common/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L38). Add tests and code for as many as you want.
8. If you can, check the code with [pyflakes](https://pypi.python.org/pypi/pyflakes) (a good idea) and [pep8](https://pypi.python.org/pypi/pep8) (optional, ignore E501).
9. When the tests pass, [add](https://www.kernel.org/pub/software/scm/git/docs/git-add.html) the new files and [commit](https://www.kernel.org/pub/software/scm/git/docs/git-commit.html) them and [push](https://www.kernel.org/pub/software/scm/git/docs/git-push.html) the result, like this:
$ git add youtube_dl/extractor/__init__.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions!
# BUGS

View File

@ -22,6 +22,12 @@ fi
if [ -z "$1" ]; then echo "ERROR: specify version number like this: $0 1994.09.06"; exit 1; fi
version="$1"
major_version=$(echo "$version" | sed -n 's#^\([0-9]*\.[0-9]*\.[0-9]*\).*#\1#p')
if test "$major_version" '!=' "$(date '+%Y.%m.%d')"; then
echo "$version does not start with today's date!"
exit 1
fi
if [ ! -z "`git tag | grep "$version"`" ]; then echo 'ERROR: version already present'; exit 1; fi
if [ ! -z "`git status --porcelain | grep -v CHANGELOG`" ]; then echo 'ERROR: the working directory is not clean; commit or stash changes'; exit 1; fi
useless_files=$(find youtube_dl -type f -not -name '*.py')

1
docs/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
_build/

177
docs/Makefile Normal file
View File

@ -0,0 +1,177 @@
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@echo " json to make JSON files"
@echo " htmlhelp to make HTML files and a HTML help project"
@echo " qthelp to make HTML files and a qthelp project"
@echo " devhelp to make HTML files and a Devhelp project"
@echo " epub to make an epub"
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
@echo " latexpdf to make LaTeX files and run them through pdflatex"
@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
@echo " text to make text files"
@echo " man to make manual pages"
@echo " texinfo to make Texinfo files"
@echo " info to make Texinfo files and run them through makeinfo"
@echo " gettext to make PO message catalogs"
@echo " changes to make an overview of all changed/added/deprecated items"
@echo " xml to make Docutils-native XML files"
@echo " pseudoxml to make pseudoxml-XML files for display purposes"
@echo " linkcheck to check all external links for integrity"
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
clean:
rm -rf $(BUILDDIR)/*
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
singlehtml:
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
pickle:
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
@echo
@echo "Build finished; now you can process the pickle files."
json:
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
@echo
@echo "Build finished; now you can process the JSON files."
htmlhelp:
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
@echo
@echo "Build finished; now you can run HTML Help Workshop with the" \
".hhp project file in $(BUILDDIR)/htmlhelp."
qthelp:
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/youtube-dl.qhcp"
@echo "To view the help file:"
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/youtube-dl.qhc"
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
@echo "# mkdir -p $$HOME/.local/share/devhelp/youtube-dl"
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/youtube-dl"
@echo "# devhelp"
epub:
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
@echo
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
latex:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
@echo "Run \`make' in that directory to run these through (pdf)latex" \
"(use \`make latexpdf' here to do that automatically)."
latexpdf:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through pdflatex..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
latexpdfja:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through platex and dvipdfmx..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
@echo
@echo "Build finished. The text files are in $(BUILDDIR)/text."
man:
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
@echo
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
texinfo:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
@echo "Run \`make' in that directory to run these through makeinfo" \
"(use \`make info' here to do that automatically)."
info:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo "Running Texinfo files through makeinfo..."
make -C $(BUILDDIR)/texinfo info
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
gettext:
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
@echo
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
changes:
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
@echo
@echo "The overview file is in $(BUILDDIR)/changes."
linkcheck:
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
@echo
@echo "Link check complete; look for any errors in the above output " \
"or in $(BUILDDIR)/linkcheck/output.txt."
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."
xml:
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
@echo
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
pseudoxml:
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
@echo
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."

71
docs/conf.py Normal file
View File

@ -0,0 +1,71 @@
# -*- coding: utf-8 -*-
#
# youtube-dl documentation build configuration file, created by
# sphinx-quickstart on Fri Mar 14 21:05:43 2014.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys
import os
# Allows to import youtube_dl
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# -- General configuration ------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'youtube-dl'
copyright = u'2014, Ricardo Garcia Gonzalez'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
import youtube_dl
version = youtube_dl.__version__
# The full version, including alpha/beta/rc tags.
release = version
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'default'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Output file base name for HTML help builder.
htmlhelp_basename = 'youtube-dldoc'

23
docs/index.rst Normal file
View File

@ -0,0 +1,23 @@
Welcome to youtube-dl's documentation!
======================================
*youtube-dl* is a command-line program to download videos from YouTube.com and more sites.
It can also be used in Python code.
Developer guide
---------------
This section contains information for using *youtube-dl* from Python programs.
.. toctree::
:maxdepth: 2
module_guide
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

67
docs/module_guide.rst Normal file
View File

@ -0,0 +1,67 @@
Using the ``youtube_dl`` module
===============================
When using the ``youtube_dl`` module, you start by creating an instance of :class:`YoutubeDL` and adding all the available extractors:
.. code-block:: python
>>> from youtube_dl import YoutubeDL
>>> ydl = YoutubeDL()
>>> ydl.add_default_info_extractors()
Extracting video information
----------------------------
You use the :meth:`YoutubeDL.extract_info` method for getting the video information, which returns a dictionary:
.. code-block:: python
>>> info = ydl.extract_info('http://www.youtube.com/watch?v=BaW_jenozKc', download=False)
[youtube] Setting language
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
>>> info['title']
'youtube-dl test video "\'/\\ä↭𝕐'
>>> info['height'], info['width']
(720, 1280)
If you want to download or play the video you can get its url:
.. code-block:: python
>>> info['url']
'https://...'
Extracting playlist information
-------------------------------
The playlist information is extracted in a similar way, but the dictionary is a bit different:
.. code-block:: python
>>> playlist = ydl.extract_info('http://www.ted.com/playlists/13/open_source_open_world', download=False)
[TED] open_source_open_world: Downloading playlist webpage
...
>>> playlist['title']
'Open-source, open world'
You can access the videos in the playlist with the ``entries`` field:
.. code-block:: python
>>> for video in playlist['entries']:
... print('Video #%d: %s' % (video['playlist_index'], video['title']))
Video #1: How Arduino is open-sourcing imagination
Video #2: The year open data went worldwide
Video #3: Massive-scale online collaboration
Video #4: The art of asking
Video #5: How cognitive surplus will change the world
Video #6: The birth of Wikipedia
Video #7: Coding a better government
Video #8: The era of open innovation
Video #9: The currency of the new economy is trust

View File

@ -9,7 +9,10 @@ import sys
import youtube_dl.extractor
from youtube_dl import YoutubeDL
from youtube_dl.utils import preferredencoding
from youtube_dl.utils import (
compat_str,
preferredencoding,
)
def get_params(override=None):
@ -83,3 +86,45 @@ def gettestcases():
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
def expect_info_dict(self, expected_dict, got_dict):
for info_field, expected in expected_dict.items():
if isinstance(expected, compat_str) and expected.startswith('re:'):
got = got_dict.get(info_field)
match_str = expected[len('re:'):]
match_rex = re.compile(match_str)
self.assertTrue(
isinstance(got, compat_str) and match_rex.match(got),
u'field %s (value: %r) should match %r' % (info_field, got, match_str))
elif isinstance(expected, type):
got = got_dict.get(info_field)
self.assertTrue(isinstance(got, expected),
u'Expected type %r, but got value %r of type %r' % (expected, got, type(got)))
else:
if isinstance(expected, compat_str) and expected.startswith('md5:'):
got = 'md5:' + md5(got_dict.get(info_field))
else:
got = got_dict.get(info_field)
self.assertEqual(expected, got,
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
# Check for the presence of mandatory fields
for key in ('id', 'url', 'title', 'ext'):
self.assertTrue(got_dict.get(key), 'Missing mandatory field %s' % key)
# Check for mandatory fields that are automatically set by YoutubeDL
for key in ['webpage_url', 'extractor', 'extractor_key']:
self.assertTrue(got_dict.get(key), u'Missing field: %s' % key)
# Are checkable fields missing from the test case definition?
test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
for key, value in got_dict.items()
if value and key in ('title', 'description', 'uploader', 'upload_date', 'timestamp', 'uploader_id', 'location'))
missing_keys = set(test_info_dict.keys()) - set(expected_dict.keys())
if missing_keys:
sys.stderr.write(u'\n"info_dict": ' + json.dumps(test_info_dict, ensure_ascii=False, indent=4) + u'\n')
self.assertFalse(
missing_keys,
'Missing keys in test definition: %s' % (
', '.join(sorted(missing_keys))))

View File

@ -26,16 +26,27 @@ class YDL(FakeYDL):
self.msgs.append(msg)
def _make_result(formats, **kwargs):
res = {
'formats': formats,
'id': 'testid',
'title': 'testttitle',
'extractor': 'testex',
}
res.update(**kwargs)
return res
class TestFormatSelection(unittest.TestCase):
def test_prefer_free_formats(self):
# Same resolution => download webm
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{'ext': 'webm', 'height': 460},
{'ext': 'mp4', 'height': 460},
{'ext': 'webm', 'height': 460, 'url': 'x'},
{'ext': 'mp4', 'height': 460, 'url': 'y'},
]
info_dict = {'formats': formats, 'extractor': 'test'}
info_dict = _make_result(formats)
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict)
@ -46,8 +57,8 @@ class TestFormatSelection(unittest.TestCase):
ydl = YDL()
ydl.params['prefer_free_formats'] = True
formats = [
{'ext': 'webm', 'height': 720},
{'ext': 'mp4', 'height': 1080},
{'ext': 'webm', 'height': 720, 'url': 'a'},
{'ext': 'mp4', 'height': 1080, 'url': 'b'},
]
info_dict['formats'] = formats
yie = YoutubeIE(ydl)
@ -60,9 +71,9 @@ class TestFormatSelection(unittest.TestCase):
ydl = YDL()
ydl.params['prefer_free_formats'] = False
formats = [
{'ext': 'webm', 'height': 720},
{'ext': 'mp4', 'height': 720},
{'ext': 'flv', 'height': 720},
{'ext': 'webm', 'height': 720, 'url': '_'},
{'ext': 'mp4', 'height': 720, 'url': '_'},
{'ext': 'flv', 'height': 720, 'url': '_'},
]
info_dict['formats'] = formats
yie = YoutubeIE(ydl)
@ -74,8 +85,8 @@ class TestFormatSelection(unittest.TestCase):
ydl = YDL()
ydl.params['prefer_free_formats'] = False
formats = [
{'ext': 'flv', 'height': 720},
{'ext': 'webm', 'height': 720},
{'ext': 'flv', 'height': 720, 'url': '_'},
{'ext': 'webm', 'height': 720, 'url': '_'},
]
info_dict['formats'] = formats
yie = YoutubeIE(ydl)
@ -91,8 +102,7 @@ class TestFormatSelection(unittest.TestCase):
{'format_id': 'great', 'url': 'http://example.com/great', 'preference': 3},
{'format_id': 'excellent', 'url': 'http://example.com/exc', 'preference': 4},
]
info_dict = {
'formats': formats, 'extractor': 'test', 'id': 'testvid'}
info_dict = _make_result(formats)
ydl = YDL()
ydl.process_ie_result(info_dict)
@ -120,12 +130,12 @@ class TestFormatSelection(unittest.TestCase):
def test_format_selection(self):
formats = [
{'format_id': '35', 'ext': 'mp4', 'preference': 1},
{'format_id': '45', 'ext': 'webm', 'preference': 2},
{'format_id': '47', 'ext': 'webm', 'preference': 3},
{'format_id': '2', 'ext': 'flv', 'preference': 4},
{'format_id': '35', 'ext': 'mp4', 'preference': 1, 'url': '_'},
{'format_id': '45', 'ext': 'webm', 'preference': 2, 'url': '_'},
{'format_id': '47', 'ext': 'webm', 'preference': 3, 'url': '_'},
{'format_id': '2', 'ext': 'flv', 'preference': 4, 'url': '_'},
]
info_dict = {'formats': formats, 'extractor': 'test'}
info_dict = _make_result(formats)
ydl = YDL({'format': '20/47'})
ydl.process_ie_result(info_dict.copy())
@ -154,12 +164,12 @@ class TestFormatSelection(unittest.TestCase):
def test_format_selection_audio(self):
formats = [
{'format_id': 'audio-low', 'ext': 'webm', 'preference': 1, 'vcodec': 'none'},
{'format_id': 'audio-mid', 'ext': 'webm', 'preference': 2, 'vcodec': 'none'},
{'format_id': 'audio-high', 'ext': 'flv', 'preference': 3, 'vcodec': 'none'},
{'format_id': 'vid', 'ext': 'mp4', 'preference': 4},
{'format_id': 'audio-low', 'ext': 'webm', 'preference': 1, 'vcodec': 'none', 'url': '_'},
{'format_id': 'audio-mid', 'ext': 'webm', 'preference': 2, 'vcodec': 'none', 'url': '_'},
{'format_id': 'audio-high', 'ext': 'flv', 'preference': 3, 'vcodec': 'none', 'url': '_'},
{'format_id': 'vid', 'ext': 'mp4', 'preference': 4, 'url': '_'},
]
info_dict = {'formats': formats, 'extractor': 'test'}
info_dict = _make_result(formats)
ydl = YDL({'format': 'bestaudio'})
ydl.process_ie_result(info_dict.copy())
@ -172,10 +182,10 @@ class TestFormatSelection(unittest.TestCase):
self.assertEqual(downloaded['format_id'], 'audio-low')
formats = [
{'format_id': 'vid-low', 'ext': 'mp4', 'preference': 1},
{'format_id': 'vid-high', 'ext': 'mp4', 'preference': 2},
{'format_id': 'vid-low', 'ext': 'mp4', 'preference': 1, 'url': '_'},
{'format_id': 'vid-high', 'ext': 'mp4', 'preference': 2, 'url': '_'},
]
info_dict = {'formats': formats, 'extractor': 'test'}
info_dict = _make_result(formats)
ydl = YDL({'format': 'bestaudio/worstaudio/best'})
ydl.process_ie_result(info_dict.copy())
@ -184,11 +194,11 @@ class TestFormatSelection(unittest.TestCase):
def test_format_selection_video(self):
formats = [
{'format_id': 'dash-video-low', 'ext': 'mp4', 'preference': 1, 'acodec': 'none'},
{'format_id': 'dash-video-high', 'ext': 'mp4', 'preference': 2, 'acodec': 'none'},
{'format_id': 'vid', 'ext': 'mp4', 'preference': 3},
{'format_id': 'dash-video-low', 'ext': 'mp4', 'preference': 1, 'acodec': 'none', 'url': '_'},
{'format_id': 'dash-video-high', 'ext': 'mp4', 'preference': 2, 'acodec': 'none', 'url': '_'},
{'format_id': 'vid', 'ext': 'mp4', 'preference': 3, 'url': '_'},
]
info_dict = {'formats': formats, 'extractor': 'test'}
info_dict = _make_result(formats)
ydl = YDL({'format': 'bestvideo'})
ydl.process_ie_result(info_dict.copy())
@ -217,10 +227,12 @@ class TestFormatSelection(unittest.TestCase):
for f1id, f2id in zip(order, order[1:]):
f1 = YoutubeIE._formats[f1id].copy()
f1['format_id'] = f1id
f1['url'] = 'url:' + f1id
f2 = YoutubeIE._formats[f2id].copy()
f2['format_id'] = f2id
f2['url'] = 'url:' + f2id
info_dict = {'formats': [f1, f2], 'extractor': 'youtube'}
info_dict = _make_result([f1, f2], extractor='youtube')
ydl = YDL()
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
@ -228,7 +240,7 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], f1id)
info_dict = {'formats': [f2, f1], 'extractor': 'youtube'}
info_dict = _make_result([f2, f1], extractor='youtube')
ydl = YDL()
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])

View File

@ -49,6 +49,7 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch('http://youtu.be/BaW_jenozKc', ['youtube'])
self.assertMatch('http://www.youtube.com/v/BaW_jenozKc', ['youtube'])
self.assertMatch('https://youtube.googleapis.com/v/BaW_jenozKc', ['youtube'])
self.assertMatch('http://www.cleanvideosearch.com/media/action/yt/watch?videoId=8v_4O44sfjM', ['youtube'])
def test_youtube_channel_matching(self):
assertChannel = lambda url: self.assertMatch(url, ['youtube:channel'])
@ -143,5 +144,37 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertMatch('http://video.pbs.org/viralplayer/2365173446/', ['PBS'])
self.assertMatch('http://video.pbs.org/widget/partnerplayer/980042464/', ['PBS'])
def test_ComedyCentralShows(self):
self.assertMatch(
'http://thedailyshow.cc.com/extended-interviews/xm3fnq/andrew-napolitano-extended-interview',
['ComedyCentralShows'])
self.assertMatch(
'http://thecolbertreport.cc.com/videos/29w6fx/-realhumanpraise-for-fox-news',
['ComedyCentralShows'])
self.assertMatch(
'http://thecolbertreport.cc.com/videos/gh6urb/neil-degrasse-tyson-pt--1?xrs=eml_col_031114',
['ComedyCentralShows'])
self.assertMatch(
'http://thedailyshow.cc.com/guests/michael-lewis/3efna8/exclusive---michael-lewis-extended-interview-pt--3',
['ComedyCentralShows'])
self.assertMatch(
'http://thedailyshow.cc.com/episodes/sy7yv0/april-8--2014---denis-leary',
['ComedyCentralShows'])
self.assertMatch(
'http://thecolbertreport.cc.com/episodes/8ase07/april-8--2014---jane-goodall',
['ComedyCentralShows'])
self.assertMatch(
'http://thedailyshow.cc.com/video-playlists/npde3s/the-daily-show-19088-highlights',
['ComedyCentralShows'])
self.assertMatch(
'http://thedailyshow.cc.com/special-editions/2l8fdb/special-edition---a-look-back-at-food',
['ComedyCentralShows'])
def test_yahoo_https(self):
# https://github.com/rg3/youtube-dl/issues/2701
self.assertMatch(
'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html',
['Yahoo'])
if __name__ == '__main__':
unittest.main()

View File

@ -9,16 +9,16 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import (
get_params,
gettestcases,
try_rm,
expect_info_dict,
md5,
report_warning
try_rm,
report_warning,
)
import hashlib
import io
import json
import re
import socket
import youtube_dl.YoutubeDL
@ -135,40 +135,8 @@ def generator(test_case):
self.assertEqual(md5_for_file, tc['md5'])
with io.open(info_json_fn, encoding='utf-8') as infof:
info_dict = json.load(infof)
for (info_field, expected) in tc.get('info_dict', {}).items():
if isinstance(expected, compat_str) and expected.startswith('re:'):
got = info_dict.get(info_field)
match_str = expected[len('re:'):]
match_rex = re.compile(match_str)
self.assertTrue(
isinstance(got, compat_str) and match_rex.match(got),
u'field %s (value: %r) should match %r' % (info_field, got, match_str))
elif isinstance(expected, type):
got = info_dict.get(info_field)
self.assertTrue(isinstance(got, expected),
u'Expected type %r, but got value %r of type %r' % (expected, got, type(got)))
else:
if isinstance(expected, compat_str) and expected.startswith('md5:'):
got = 'md5:' + md5(info_dict.get(info_field))
else:
got = info_dict.get(info_field)
self.assertEqual(expected, got,
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
# Check for the presence of mandatory fields
for key in ('id', 'url', 'title', 'ext'):
self.assertTrue(key in info_dict.keys() and info_dict[key])
# Check for mandatory fields that are automatically set by YoutubeDL
for key in ['webpage_url', 'extractor', 'extractor_key']:
self.assertTrue(info_dict.get(key), u'Missing field: %s' % key)
# If checkable fields are missing from the test case, print the info_dict
test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
for key, value in info_dict.items()
if value and key in ('title', 'description', 'uploader', 'upload_date', 'timestamp', 'uploader_id', 'location'))
if not all(key in tc.get('info_dict', {}).keys() for key in test_info_dict.keys()):
sys.stderr.write(u'\n"info_dict": ' + json.dumps(test_info_dict, ensure_ascii=False, indent=4) + u'\n')
expect_info_dict(self, tc.get('info_dict', {}), info_dict)
finally:
try_rm_tcs_files()

View File

@ -9,8 +9,10 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from test.helper import (
expect_info_dict,
FakeYDL,
)
from youtube_dl.extractor import (
AcademicEarthCourseIE,
@ -37,6 +39,11 @@ from youtube_dl.extractor import (
GoogleSearchIE,
GenericIE,
TEDIE,
ToypicsUserIE,
XTubeUserIE,
InstagramUserIE,
CSpanIE,
AolIE,
)
@ -269,5 +276,68 @@ class TestPlaylists(unittest.TestCase):
self.assertEqual(result['title'], 'Who are the hackers?')
self.assertTrue(len(result['entries']) >= 6)
def test_toypics_user(self):
dl = FakeYDL()
ie = ToypicsUserIE(dl)
result = ie.extract('http://videos.toypics.net/Mikey')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'Mikey')
self.assertTrue(len(result['entries']) >= 17)
def test_xtube_user(self):
dl = FakeYDL()
ie = XTubeUserIE(dl)
result = ie.extract('http://www.xtube.com/community/profile.php?user=greenshowers')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'greenshowers')
self.assertTrue(len(result['entries']) >= 155)
def test_InstagramUser(self):
dl = FakeYDL()
ie = InstagramUserIE(dl)
result = ie.extract('http://instagram.com/porsche')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], 'porsche')
self.assertTrue(len(result['entries']) >= 2)
test_video = next(
e for e in result['entries']
if e['id'] == '614605558512799803_462752227')
dl.add_default_extra_info(test_video, ie, '(irrelevant URL)')
dl.process_video_result(test_video, download=False)
EXPECTED = {
'id': '614605558512799803_462752227',
'ext': 'mp4',
'title': '#Porsche Intelligent Performance.',
'thumbnail': 're:^https?://.*\.jpg',
'uploader': 'Porsche',
'uploader_id': 'porsche',
'timestamp': 1387486713,
'upload_date': '20131219',
}
expect_info_dict(self, EXPECTED, test_video)
def test_CSpan_playlist(self):
dl = FakeYDL()
ie = CSpanIE(dl)
result = ie.extract(
'http://www.c-span.org/video/?318608-1/gm-ignition-switch-recall')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '342759')
self.assertEqual(
result['title'], 'General Motors Ignition Switch Recall')
whole_duration = sum(e['duration'] for e in result['entries'])
self.assertEqual(whole_duration, 14855)
def test_aol_playlist(self):
dl = FakeYDL()
ie = AolIE(dl)
result = ie.extract(
'http://on.aol.com/playlist/brace-yourself---todays-weirdest-news-152147?icid=OnHomepageC4_Omg_Img#_videoid=518184316')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], '152147')
self.assertEqual(
result['title'], 'Brace Yourself - Today\'s Weirdest News')
self.assertTrue(len(result['entries']) >= 10)
if __name__ == '__main__':
unittest.main()

View File

@ -10,6 +10,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Various small unit tests
import io
import json
import xml.etree.ElementTree
#from youtube_dl.utils import htmlentity_transform
@ -35,6 +36,9 @@ from youtube_dl.utils import (
url_basename,
urlencode_postdata,
xpath_with_ns,
parse_iso8601,
strip_jsonp,
uppercase_escape,
)
if sys.version_info < (3, 0):
@ -266,5 +270,19 @@ class TestUtil(unittest.TestCase):
data = urlencode_postdata({'username': 'foo@bar.com', 'password': '1234'})
self.assertTrue(isinstance(data, bytes))
def test_parse_iso8601(self):
self.assertEqual(parse_iso8601('2014-03-23T23:04:26+0100'), 1395612266)
self.assertEqual(parse_iso8601('2014-03-23T22:04:26+0000'), 1395612266)
self.assertEqual(parse_iso8601('2014-03-23T22:04:26Z'), 1395612266)
def test_strip_jsonp(self):
stripped = strip_jsonp('cb ([ {"id":"532cb",\n\n\n"x":\n3}\n]\n);')
d = json.loads(stripped)
self.assertEqual(d, [{"id": "532cb", "x": 3}])
def test_uppercase_escpae(self):
self.assertEqual(uppercase_escape(u''), u'')
self.assertEqual(uppercase_escape(u'\\U0001d550'), u'𝕐')
if __name__ == '__main__':
unittest.main()

82
youtube_dl/YoutubeDL.py Normal file → Executable file
View File

@ -8,6 +8,7 @@ import datetime
import errno
import io
import json
import locale
import os
import platform
import re
@ -94,6 +95,7 @@ class YoutubeDL(object):
usenetrc: Use netrc for authentication instead.
verbose: Print additional info to stdout.
quiet: Do not print messages to stdout.
no_warnings: Do not print out anything for warnings.
forceurl: Force printing final URL.
forcetitle: Force printing title.
forceid: Force printing ID.
@ -158,6 +160,7 @@ class YoutubeDL(object):
include_ads: Download ads as well
default_search: Prepend this string if an input url is not valid.
'auto' for elaborate guessing
encoding: Use this encoding instead of the system-specified.
The following parameters are not used by YoutubeDL itself, they are used by
the FileDownloader:
@ -283,6 +286,9 @@ class YoutubeDL(object):
"""Print message to stdout if not in quiet mode."""
return self.to_stdout(message, skip_eol, check_quiet=True)
def _write_string(self, s, out=None):
write_string(s, out=out, encoding=self.params.get('encoding'))
def to_stdout(self, message, skip_eol=False, check_quiet=False):
"""Print message to stdout if not in quiet mode."""
if self.params.get('logger'):
@ -292,7 +298,7 @@ class YoutubeDL(object):
terminator = ['\n', ''][skip_eol]
output = message + terminator
write_string(output, self._screen_file)
self._write_string(output, self._screen_file)
def to_stderr(self, message):
"""Print message to stderr."""
@ -302,7 +308,7 @@ class YoutubeDL(object):
else:
message = self._bidi_workaround(message)
output = message + '\n'
write_string(output, self._err_file)
self._write_string(output, self._err_file)
def to_console_title(self, message):
if not self.params.get('consoletitle', False):
@ -312,21 +318,21 @@ class YoutubeDL(object):
# already of type unicode()
ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
elif 'TERM' in os.environ:
write_string('\033]0;%s\007' % message, self._screen_file)
self._write_string('\033]0;%s\007' % message, self._screen_file)
def save_console_title(self):
if not self.params.get('consoletitle', False):
return
if 'TERM' in os.environ:
# Save the title on stack
write_string('\033[22;0t', self._screen_file)
self._write_string('\033[22;0t', self._screen_file)
def restore_console_title(self):
if not self.params.get('consoletitle', False):
return
if 'TERM' in os.environ:
# Restore the title from stack
write_string('\033[23;0t', self._screen_file)
self._write_string('\033[23;0t', self._screen_file)
def __enter__(self):
self.save_console_title()
@ -376,6 +382,8 @@ class YoutubeDL(object):
if self.params.get('logger') is not None:
self.params['logger'].warning(message)
else:
if self.params.get('no_warnings'):
return
if self._err_file.isatty() and os.name != 'nt':
_msg_header = '\033[0;33mWARNING:\033[0m'
else:
@ -512,13 +520,7 @@ class YoutubeDL(object):
'_type': 'compat_list',
'entries': ie_result,
}
self.add_extra_info(ie_result,
{
'extractor': ie.IE_NAME,
'webpage_url': url,
'webpage_url_basename': url_basename(url),
'extractor_key': ie.ie_key(),
})
self.add_default_extra_info(ie_result, ie, url)
if process:
return self.process_ie_result(ie_result, download, extra_info)
else:
@ -537,6 +539,14 @@ class YoutubeDL(object):
else:
self.report_error('no suitable InfoExtractor for URL %s' % url)
def add_default_extra_info(self, ie_result, ie, url):
self.add_extra_info(ie_result, {
'extractor': ie.IE_NAME,
'webpage_url': url,
'webpage_url_basename': url_basename(url),
'extractor_key': ie.ie_key(),
})
def process_ie_result(self, ie_result, download=True, extra_info={}):
"""
Take the result of the ie(may be modified) and resolve all unresolved
@ -695,6 +705,11 @@ class YoutubeDL(object):
def process_video_result(self, info_dict, download=True):
assert info_dict.get('_type', 'video') == 'video'
if 'id' not in info_dict:
raise ExtractorError('Missing "id" field in extractor result')
if 'title' not in info_dict:
raise ExtractorError('Missing "title" field in extractor result')
if 'playlist' not in info_dict:
# It isn't part of a playlist
info_dict['playlist'] = None
@ -726,6 +741,9 @@ class YoutubeDL(object):
# We check that all the formats have the format and format_id fields
for i, format in enumerate(formats):
if 'url' not in format:
raise ExtractorError('Missing "url" key in result (index %d)' % i)
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
if format.get('format') is None:
@ -736,7 +754,7 @@ class YoutubeDL(object):
)
# Automatically determine file extension if missing
if 'ext' not in format:
format['ext'] = determine_ext(format['url'])
format['ext'] = determine_ext(format['url']).lower()
format_limit = self.params.get('format_limit', None)
if format_limit:
@ -861,7 +879,7 @@ class YoutubeDL(object):
try:
dn = os.path.dirname(encodeFilename(filename))
if dn != '' and not os.path.exists(dn):
if dn and not os.path.exists(dn):
os.makedirs(dn)
except (OSError, IOError) as err:
self.report_error('unable to create directory ' + compat_str(err))
@ -918,7 +936,7 @@ class YoutubeDL(object):
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
subfile.write(sub)
except (OSError, IOError):
self.report_error('Cannot write subtitles file ' + descfn)
self.report_error('Cannot write subtitles file ' + sub_filename)
return
if self.params.get('writeinfojson', False):
@ -1195,7 +1213,17 @@ class YoutubeDL(object):
def print_debug_header(self):
if not self.params.get('verbose'):
return
write_string('[debug] youtube-dl version ' + __version__ + '\n')
write_string(
'[debug] Encodings: locale %s, fs %s, out %s, pref %s\n' % (
locale.getpreferredencoding(),
sys.getfilesystemencoding(),
sys.stdout.encoding,
self.get_encoding()),
encoding=None
)
self._write_string('[debug] youtube-dl version ' + __version__ + '\n')
try:
sp = subprocess.Popen(
['git', 'rev-parse', '--short', 'HEAD'],
@ -1204,20 +1232,20 @@ class YoutubeDL(object):
out, err = sp.communicate()
out = out.decode().strip()
if re.match('[0-9a-f]+', out):
write_string('[debug] Git HEAD: ' + out + '\n')
self._write_string('[debug] Git HEAD: ' + out + '\n')
except:
try:
sys.exc_clear()
except:
pass
write_string('[debug] Python version %s - %s' %
self._write_string('[debug] Python version %s - %s' %
(platform.python_version(), platform_name()) + '\n')
proxy_map = {}
for handler in self._opener.handlers:
if hasattr(handler, 'proxies'):
proxy_map.update(handler.proxies)
write_string('[debug] Proxy map: ' + compat_str(proxy_map) + '\n')
self._write_string('[debug] Proxy map: ' + compat_str(proxy_map) + '\n')
def _setup_opener(self):
timeout_val = self.params.get('socket_timeout')
@ -1259,3 +1287,19 @@ class YoutubeDL(object):
# (See https://github.com/rg3/youtube-dl/issues/1309 for details)
opener.addheaders = []
self._opener = opener
def encode(self, s):
if isinstance(s, bytes):
return s # Already encoded
try:
return s.encode(self.get_encoding())
except UnicodeEncodeError as err:
err.reason = err.reason + '. Check your system encoding configuration or use the --encoding option.'
raise
def get_encoding(self):
encoding = self.params.get('encoding')
if encoding is None:
encoding = preferredencoding()
return encoding

View File

@ -51,6 +51,8 @@ __authors__ = (
'David Wagner',
'Juan C. Olivares',
'Mattias Harrysson',
'phaer',
'Sainyam Kapoor',
)
__license__ = 'Public Domain'
@ -90,6 +92,8 @@ from .extractor import gen_extractors
from .version import __version__
from .YoutubeDL import YoutubeDL
from .postprocessor import (
AtomicParsleyPP,
FFmpegAudioFixPP,
FFmpegMetadataPP,
FFmpegVideoConvertor,
FFmpegExtractAudioPP,
@ -227,6 +231,9 @@ def parseOpts(overrideArguments=None):
general.add_option('--referer',
dest='referer', help='specify a custom referer, use if the video access is restricted to one domain',
metavar='REF', default=None)
general.add_option('--add-header',
dest='headers', help='specify a custom HTTP header and its value, separated by a colon \':\'. You can use this option multiple times', action="append",
metavar='FIELD:VALUE')
general.add_option('--list-extractors',
action='store_true', dest='list_extractors',
help='List all supported extractors and the URLs they would handle', default=False)
@ -238,7 +245,7 @@ def parseOpts(overrideArguments=None):
help='Use the specified HTTP/HTTPS proxy. Pass in an empty string (--proxy "") for direct connection')
general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.')
general.add_option(
'--prefer-insecure', action='store_true', dest='prefer_insecure',
'--prefer-insecure', '--prefer-unsecure', action='store_true', dest='prefer_insecure',
help='Use an unencrypted connection to retrieve information about the video. (Currently supported only for YouTube)')
general.add_option(
'--cache-dir', dest='cachedir', default=get_cachedir(), metavar='DIR',
@ -252,13 +259,17 @@ def parseOpts(overrideArguments=None):
general.add_option(
'--bidi-workaround', dest='bidi_workaround', action='store_true',
help=u'Work around terminals that lack bidirectional text support. Requires bidiv or fribidi executable in PATH')
general.add_option('--default-search',
dest='default_search', metavar='PREFIX',
help='Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dl "large apple". By default (with value "auto") youtube-dl guesses.')
general.add_option(
'--default-search',
dest='default_search', metavar='PREFIX',
help='Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dl "large apple". By default (with value "auto") youtube-dl guesses.')
general.add_option(
'--ignore-config',
action='store_true',
help='Do not read configuration files. When given in the global configuration file /etc/youtube-dl.conf: do not read the user configuration in ~/.config/youtube-dl.conf (%APPDATA%/youtube-dl/config.txt on Windows)')
general.add_option(
'--encoding', dest='encoding', metavar='ENCODING',
help='Force the specified encoding (experimental)')
selection.add_option(
'--playlist-start',
@ -361,6 +372,10 @@ def parseOpts(overrideArguments=None):
verbosity.add_option('-q', '--quiet',
action='store_true', dest='quiet', help='activates quiet mode', default=False)
verbosity.add_option(
'--no-warnings',
dest='no_warnings', action='store_true', default=False,
help='Ignore warnings')
verbosity.add_option('-s', '--simulate',
action='store_true', dest='simulate', help='do not download the video and do not write anything to disk', default=False)
verbosity.add_option('--skip-download',
@ -388,7 +403,7 @@ def parseOpts(overrideArguments=None):
help='simulate, quiet but print output format', default=False)
verbosity.add_option('-j', '--dump-json',
action='store_true', dest='dumpjson',
help='simulate, quiet but print JSON information', default=False)
help='simulate, quiet but print JSON information. See --output for a description of available keys.', default=False)
verbosity.add_option('--newline',
action='store_true', dest='progress_with_newline', help='output progress bar as new lines', default=False)
verbosity.add_option('--no-progress',
@ -490,6 +505,8 @@ def parseOpts(overrideArguments=None):
help='do not overwrite post-processed files; the post-processed files are overwritten by default')
postproc.add_option('--embed-subs', action='store_true', dest='embedsubtitles', default=False,
help='embed subtitles in the video (only for mp4 videos)')
postproc.add_option('--embed-thumbnail', action='store_true', dest='embedthumbnail', default=False,
help='embed thumbnail in the audio as cover art')
postproc.add_option('--add-metadata', action='store_true', dest='addmetadata', default=False,
help='write metadata to the video file')
postproc.add_option('--xattrs', action='store_true', dest='xattrs', default=False,
@ -532,8 +549,6 @@ def parseOpts(overrideArguments=None):
write_string(u'[debug] System config: ' + repr(_hide_login_info(systemConf)) + '\n')
write_string(u'[debug] User config: ' + repr(_hide_login_info(userConf)) + '\n')
write_string(u'[debug] Command-line args: ' + repr(_hide_login_info(commandLineConf)) + '\n')
write_string(u'[debug] Encodings: locale %r, fs %r, out %r, pref: %r\n' %
(locale.getpreferredencoding(), sys.getfilesystemencoding(), sys.stdout.encoding, preferredencoding()))
return parser, opts, args
@ -556,6 +571,16 @@ def _real_main(argv=None):
if opts.referer is not None:
std_headers['Referer'] = opts.referer
# Custom HTTP headers
if opts.headers is not None:
for h in opts.headers:
if h.find(':', 1) < 0:
parser.error(u'wrong header formatting, it should be key:value, not "%s"'%h)
key, value = h.split(':', 2)
if opts.verbose:
write_string(u'[debug] Adding header from command line option %s:%s\n'%(key, value))
std_headers[key] = value
# Dump user agent
if opts.dump_user_agent:
compat_print(std_headers['User-Agent'])
@ -657,7 +682,7 @@ def _real_main(argv=None):
date = DateRange.day(opts.date)
else:
date = DateRange(opts.dateafter, opts.datebefore)
if opts.default_search not in ('auto', None) and ':' not in opts.default_search:
if opts.default_search not in ('auto', 'auto_warning', None) and ':' not in opts.default_search:
parser.error(u'--default-search invalid; did you forget a colon (:) at the end?')
# Do not download videos when there are audio-only formats
@ -695,6 +720,7 @@ def _real_main(argv=None):
'password': opts.password,
'videopassword': opts.videopassword,
'quiet': (opts.quiet or any_printing),
'no_warnings': opts.no_warnings,
'forceurl': opts.geturl,
'forcetitle': opts.gettitle,
'forceid': opts.getid,
@ -767,6 +793,7 @@ def _real_main(argv=None):
'include_ads': opts.include_ads,
'default_search': opts.default_search,
'youtube_include_dash_manifest': opts.youtube_include_dash_manifest,
'encoding': opts.encoding,
}
with YoutubeDL(ydl_opts) as ydl:
@ -785,6 +812,10 @@ def _real_main(argv=None):
ydl.add_post_processor(FFmpegEmbedSubtitlePP(subtitlesformat=opts.subtitlesformat))
if opts.xattrs:
ydl.add_post_processor(XAttrMetadataPP())
if opts.embedthumbnail:
if not opts.addmetadata:
ydl.add_post_processor(FFmpegAudioFixPP())
ydl.add_post_processor(AtomicParsleyPP())
# Update version
if opts.update_self:

View File

@ -4,9 +4,10 @@ import sys
import time
from ..utils import (
compat_str,
encodeFilename,
timeconvert,
format_bytes,
timeconvert,
)
@ -173,7 +174,7 @@ class FileDownloader(object):
return
os.rename(encodeFilename(old_filename), encodeFilename(new_filename))
except (IOError, OSError) as err:
self.report_error(u'unable to rename file: %s' % str(err))
self.report_error(u'unable to rename file: %s' % compat_str(err))
def try_utime(self, filename, last_modified_hdr):
"""Try to set the last-modified time of the given file."""

View File

@ -297,6 +297,7 @@ class F4mFD(FileDownloader):
break
frags_filenames.append(frag_filename)
dest_stream.close()
self.report_finish(format_bytes(state['downloaded_bytes']), time.time() - start)
self.try_rename(tmpfilename, filename)

View File

@ -13,8 +13,10 @@ class HlsFD(FileDownloader):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
args = ['-y', '-i', url, '-f', 'mp4', '-c', 'copy',
'-bsf:a', 'aac_adtstoasc', tmpfilename]
args = [
'-y', '-i', url, '-f', 'mp4', '-c', 'copy',
'-bsf:a', 'aac_adtstoasc',
encodeFilename(tmpfilename, for_subprocess=True)]
for program in ['avconv', 'ffmpeg']:
try:

View File

@ -23,6 +23,8 @@ class HttpFD(FileDownloader):
headers = {'Youtubedl-no-compression': 'True'}
if 'user_agent' in info_dict:
headers['Youtubedl-user-agent'] = info_dict['user_agent']
if 'http_referer' in info_dict:
headers['Referer'] = info_dict['http_referer']
basic_request = compat_urllib_request.Request(url, None, headers)
request = compat_urllib_request.Request(url, None, headers)

View File

@ -2,6 +2,7 @@ from .academicearth import AcademicEarthCourseIE
from .addanime import AddAnimeIE
from .aftonbladet import AftonbladetIE
from .anitube import AnitubeIE
from .aol import AolIE
from .aparat import AparatIE
from .appletrailers import AppleTrailersIE
from .archiveorg import ArchiveOrgIE
@ -13,6 +14,7 @@ from .arte import (
ArteTVConcertIE,
ArteTVFutureIE,
ArteTVDDCIE,
ArteTVEmbedIE,
)
from .auengine import AUEngineIE
from .bambuser import BambuserIE, BambuserChannelIE
@ -24,11 +26,13 @@ from .bloomberg import BloombergIE
from .br import BRIE
from .breakcom import BreakIE
from .brightcove import BrightcoveIE
from .byutv import BYUtvIE
from .c56 import C56IE
from .canal13cl import Canal13clIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .cbs import CBSIE
from .cbsnews import CBSNewsIE
from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE
from .chilloutzone import ChilloutzoneIE
@ -37,6 +41,7 @@ from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE
from .clipsyndicate import ClipsyndicateIE
from .cmt import CMTIE
from .cnet import CNETIE
from .cnn import (
CNNIE,
CNNBlogsIE,
@ -58,12 +63,14 @@ from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .divxstage import DivxStageIE
from .dropbox import DropboxIE
from .ebaumsworld import EbaumsWorldIE
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .eitb import EitbIE
from .elpais import ElPaisIE
from .engadget import EngadgetIE
from .escapist import EscapistIE
from .everyonesmixtape import EveryonesMixtapeIE
from .exfm import ExfmIE
@ -72,12 +79,14 @@ from .facebook import FacebookIE
from .faz import FazIE
from .firstpost import FirstpostIE
from .firsttv import FirstTVIE
from .fivemin import FiveMinIE
from .fktv import (
FKTVIE,
FKTVPosteckeIE,
)
from .flickr import FlickrIE
from .fourtube import FourTubeIE
from .franceculture import FranceCultureIE
from .franceinter import FranceInterIE
from .francetv import (
PluzzIE,
@ -109,7 +118,7 @@ from .imdb import (
)
from .ina import InaIE
from .infoq import InfoQIE
from .instagram import InstagramIE
from .instagram import InstagramIE, InstagramUserIE
from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE
from .ivi import (
@ -147,10 +156,14 @@ from .mixcloud import MixcloudIE
from .mpora import MporaIE
from .mofosex import MofosexIE
from .mooshare import MooshareIE
from .morningstar import MorningstarIE
from .motorsport import MotorsportIE
from .movshare import MovShareIE
from .mtv import (
MTVIE,
MTVIggyIE,
)
from .musicplayon import MusicPlayOnIE
from .muzu import MuzuTVIE
from .myspace import MySpaceIE
from .myspass import MySpassIE
@ -172,8 +185,11 @@ from .normalboots import NormalbootsIE
from .novamov import NovaMovIE
from .nowness import NownessIE
from .nowvideo import NowVideoIE
from .ntv import NTVIE
from .oe1 import OE1IE
from .ooyala import OoyalaIE
from .orf import ORFIE
from .parliamentliveuk import ParliamentLiveUKIE
from .pbs import PBSIE
from .photobucket import PhotobucketIE
from .playvid import PlayvidIE
@ -191,6 +207,8 @@ from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtlnow import RTLnowIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE
from .rutube import (
RutubeIE,
RutubeChannelIE,
@ -201,7 +219,6 @@ from .rutv import RUTVIE
from .savefrom import SaveFromIE
from .servingsys import ServingSysIE
from .sina import SinaIE
from .slashdot import SlashdotIE
from .slideshare import SlideshareIE
from .smotri import (
SmotriIE,
@ -235,6 +252,7 @@ from .theplatform import ThePlatformIE
from .thisav import ThisAVIE
from .tinypic import TinyPicIE
from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
from .trutube import TruTubeIE
@ -249,18 +267,20 @@ from .udemy import (
UdemyCourseIE
)
from .unistra import UnistraIE
from .urort import UrortIE
from .ustream import UstreamIE, UstreamChannelIE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vesti import VestiIE
from .vevo import VevoIE
from .vice import ViceIE
from .viddler import ViddlerIE
from .videobam import VideoBamIE
from .videodetective import VideoDetectiveIE
from .videolecturesnet import VideoLecturesNetIE
from .videofyme import VideofyMeIE
from .videopremium import VideoPremiumIE
from .videoweed import VideoWeedIE
from .vimeo import (
VimeoIE,
VimeoChannelIE,
@ -273,16 +293,21 @@ from .vine import VineIE
from .viki import VikiIE
from .vk import VKIE
from .vube import VubeIE
from .washingtonpost import WashingtonPostIE
from .wat import WatIE
from .wdr import WDRIE
from .wdr import (
WDRIE,
WDRMausIE,
)
from .weibo import WeiboIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
from .xbef import XBefIE
from .xhamster import XHamsterIE
from .xnxx import XNXXIE
from .xvideos import XVideosIE
from .xtube import XTubeIE
from .xtube import XTubeUserIE, XTubeIE
from .yahoo import (
YahooIE,
YahooNewsIE,

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -14,14 +16,14 @@ from ..utils import (
class AddAnimeIE(InfoExtractor):
_VALID_URL = r'^http://(?:\w+\.)?add-anime\.net/watch_video\.php\?(?:.*?)v=(?P<video_id>[\w_]+)(?:.*)'
IE_NAME = u'AddAnime'
_TEST = {
u'url': u'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
u'file': u'24MR3YO5SAS9.mp4',
u'md5': u'72954ea10bc979ab5e2eb288b21425a0',
u'info_dict': {
u"description": u"One Piece 606",
u"title": u"One Piece 606"
'url': 'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
'md5': '72954ea10bc979ab5e2eb288b21425a0',
'info_dict': {
'id': '24MR3YO5SAS9',
'ext': 'mp4',
'description': 'One Piece 606',
'title': 'One Piece 606',
}
}
@ -38,10 +40,10 @@ class AddAnimeIE(InfoExtractor):
redir_webpage = ee.cause.read().decode('utf-8')
action = self._search_regex(
r'<form id="challenge-form" action="([^"]+)"',
redir_webpage, u'Redirect form')
redir_webpage, 'Redirect form')
vc = self._search_regex(
r'<input type="hidden" name="jschl_vc" value="([^"]+)"/>',
redir_webpage, u'redirect vc value')
redir_webpage, 'redirect vc value')
av = re.search(
r'a\.value = ([0-9]+)[+]([0-9]+)[*]([0-9]+);',
redir_webpage)
@ -52,19 +54,19 @@ class AddAnimeIE(InfoExtractor):
parsed_url = compat_urllib_parse_urlparse(url)
av_val = av_res + len(parsed_url.netloc)
confirm_url = (
parsed_url.scheme + u'://' + parsed_url.netloc +
parsed_url.scheme + '://' + parsed_url.netloc +
action + '?' +
compat_urllib_parse.urlencode({
'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
self._download_webpage(
confirm_url, video_id,
note=u'Confirming after redirect')
note='Confirming after redirect')
webpage = self._download_webpage(url, video_id)
formats = []
for format_id in ('normal', 'hq'):
rex = r"var %s_video_file = '(.*?)';" % re.escape(format_id)
video_url = self._search_regex(rex, webpage, u'video file URLx',
video_url = self._search_regex(rex, webpage, 'video file URLx',
fatal=False)
if not video_url:
continue
@ -72,14 +74,13 @@ class AddAnimeIE(InfoExtractor):
'format_id': format_id,
'url': video_url,
})
if not formats:
raise ExtractorError(u'Cannot find any video format!')
self._sort_formats(formats)
video_title = self._og_search_title(webpage)
video_description = self._og_search_description(webpage)
return {
'_type': 'video',
'id': video_id,
'id': video_id,
'formats': formats,
'title': video_title,
'description': video_description

View File

@ -0,0 +1,65 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .fivemin import FiveMinIE
class AolIE(InfoExtractor):
IE_NAME = 'on.aol.com'
_VALID_URL = r'''(?x)
(?:
aol-video:|
http://on\.aol\.com/
(?:
video/.*-|
playlist/(?P<playlist_display_id>[^/?#]+?)-(?P<playlist_id>[0-9]+)[?#].*_videoid=
)
)
(?P<id>[0-9]+)
(?:$|\?)
'''
_TEST = {
'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
'md5': '18ef68f48740e86ae94b98da815eec42',
'info_dict': {
'id': '518167793',
'ext': 'mp4',
'title': 'U.S. Official Warns Of \'Largest Ever\' IRS Phone Scam',
},
'add_ie': ['FiveMin'],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
playlist_id = mobj.group('playlist_id')
if playlist_id and not self._downloader.params.get('noplaylist'):
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
webpage = self._download_webpage(url, playlist_id)
title = self._html_search_regex(
r'<h1 class="video-title[^"]*">(.+?)</h1>', webpage, 'title')
playlist_html = self._search_regex(
r"(?s)<ul\s+class='video-related[^']*'>(.*?)</ul>", webpage,
'playlist HTML')
entries = [{
'_type': 'url',
'url': 'aol-video:%s' % m.group('id'),
'ie_key': 'Aol',
} for m in re.finditer(
r"<a\s+href='.*videoid=(?P<id>[0-9]+)'\s+class='video-thumb'>",
playlist_html)]
return {
'_type': 'playlist',
'id': playlist_id,
'display_id': mobj.group('playlist_display_id'),
'title': title,
'entries': entries,
}
return FiveMinIE._build_result(video_id)

View File

@ -6,7 +6,6 @@ import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
determine_ext,
)
@ -16,9 +15,10 @@ class AppleTrailersIE(InfoExtractor):
"url": "http://trailers.apple.com/trailers/wb/manofsteel/",
"playlist": [
{
"file": "manofsteel-trailer4.mov",
"md5": "d97a8e575432dbcb81b7c3acb741f8a8",
"info_dict": {
"id": "manofsteel-trailer4",
"ext": "mov",
"duration": 111,
"title": "Trailer 4",
"upload_date": "20130523",
@ -26,9 +26,10 @@ class AppleTrailersIE(InfoExtractor):
},
},
{
"file": "manofsteel-trailer3.mov",
"md5": "b8017b7131b721fb4e8d6f49e1df908c",
"info_dict": {
"id": "manofsteel-trailer3",
"ext": "mov",
"duration": 182,
"title": "Trailer 3",
"upload_date": "20130417",
@ -36,9 +37,10 @@ class AppleTrailersIE(InfoExtractor):
},
},
{
"file": "manofsteel-trailer.mov",
"md5": "d0f1e1150989b9924679b441f3404d48",
"info_dict": {
"id": "manofsteel-trailer",
"ext": "mov",
"duration": 148,
"title": "Trailer",
"upload_date": "20121212",
@ -46,15 +48,16 @@ class AppleTrailersIE(InfoExtractor):
},
},
{
"file": "manofsteel-teaser.mov",
"md5": "5fe08795b943eb2e757fa95cb6def1cb",
"info_dict": {
"id": "manofsteel-teaser",
"ext": "mov",
"duration": 93,
"title": "Teaser",
"upload_date": "20120721",
"uploader_id": "wb",
},
}
},
]
}
@ -65,16 +68,16 @@ class AppleTrailersIE(InfoExtractor):
movie = mobj.group('movie')
uploader_id = mobj.group('company')
playlist_url = compat_urlparse.urljoin(url, u'includes/playlists/itunes.inc')
playlist_url = compat_urlparse.urljoin(url, 'includes/playlists/itunes.inc')
def fix_html(s):
s = re.sub(r'(?s)<script[^<]*?>.*?</script>', u'', s)
s = re.sub(r'(?s)<script[^<]*?>.*?</script>', '', s)
s = re.sub(r'<img ([^<]*?)>', r'<img \1/>', s)
# The ' in the onClick attributes are not escaped, it couldn't be parsed
# like: http://trailers.apple.com/trailers/wb/gravity/
def _clean_json(m):
return u'iTunes.playURL(%s);' % m.group(1).replace('\'', '&#39;')
return 'iTunes.playURL(%s);' % m.group(1).replace('\'', '&#39;')
s = re.sub(self._JSON_RE, _clean_json, s)
s = u'<html>' + s + u'</html>'
s = '<html>' + s + u'</html>'
return s
doc = self._download_xml(playlist_url, movie, transform_source=fix_html)
@ -82,7 +85,7 @@ class AppleTrailersIE(InfoExtractor):
for li in doc.findall('./div/ul/li'):
on_click = li.find('.//a').attrib['onClick']
trailer_info_json = self._search_regex(self._JSON_RE,
on_click, u'trailer info')
on_click, 'trailer info')
trailer_info = json.loads(trailer_info_json)
title = trailer_info['title']
video_id = movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', title).lower()
@ -98,8 +101,7 @@ class AppleTrailersIE(InfoExtractor):
first_url = trailer_info['url']
trailer_id = first_url.split('/')[-1].rpartition('_')[0].lower()
settings_json_url = compat_urlparse.urljoin(url, 'includes/settings/%s.json' % trailer_id)
settings_json = self._download_webpage(settings_json_url, trailer_id, u'Downloading settings json')
settings = json.loads(settings_json)
settings = self._download_json(settings_json_url, trailer_id, 'Downloading settings json')
formats = []
for format in settings['metadata']['sizes']:
@ -107,7 +109,6 @@ class AppleTrailersIE(InfoExtractor):
format_url = re.sub(r'_(\d*p.mov)', r'_h\1', format['src'])
formats.append({
'url': format_url,
'ext': determine_ext(format_url),
'format': format['type'],
'width': format['width'],
'height': int(format['height']),

View File

@ -2,7 +2,6 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
@ -19,114 +18,41 @@ from ..utils import (
# is different for each one. The videos usually expire in 7 days, so we can't
# add tests.
class ArteTvIE(InfoExtractor):
_VIDEOS_URL = r'(?:http://)?videos\.arte\.tv/(?P<lang>fr|de)/.*-(?P<id>.*?)\.html'
_LIVEWEB_URL = r'(?:http://)?liveweb\.arte\.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)'
_LIVE_URL = r'index-[0-9]+\.html$'
class ArteTvIE(InfoExtractor):
_VALID_URL = r'http://videos\.arte\.tv/(?P<lang>fr|de)/.*-(?P<id>.*?)\.html'
IE_NAME = 'arte.tv'
@classmethod
def suitable(cls, url):
return any(re.match(regex, url) for regex in (cls._VIDEOS_URL, cls._LIVEWEB_URL))
# TODO implement Live Stream
# from ..utils import compat_urllib_parse
# def extractLiveStream(self, url):
# video_lang = url.split('/')[-4]
# info = self.grep_webpage(
# url,
# r'src="(.*?/videothek_js.*?\.js)',
# 0,
# [
# (1, 'url', 'Invalid URL: %s' % url)
# ]
# )
# http_host = url.split('/')[2]
# next_url = 'http://%s%s' % (http_host, compat_urllib_parse.unquote(info.get('url')))
# info = self.grep_webpage(
# next_url,
# r'(s_artestras_scst_geoFRDE_' + video_lang + '.*?)\'.*?' +
# '(http://.*?\.swf).*?' +
# '(rtmp://.*?)\'',
# re.DOTALL,
# [
# (1, 'path', 'could not extract video path: %s' % url),
# (2, 'player', 'could not extract video player: %s' % url),
# (3, 'url', 'could not extract video url: %s' % url)
# ]
# )
# video_url = '%s/%s' % (info.get('url'), info.get('path'))
def _real_extract(self, url):
mobj = re.match(self._VIDEOS_URL, url)
if mobj is not None:
id = mobj.group('id')
lang = mobj.group('lang')
return self._extract_video(url, id, lang)
mobj = re.match(self._VALID_URL, url)
lang = mobj.group('lang')
video_id = mobj.group('id')
mobj = re.match(self._LIVEWEB_URL, url)
if mobj is not None:
name = mobj.group('name')
lang = mobj.group('lang')
return self._extract_liveweb(url, name, lang)
if re.search(self._LIVE_URL, url) is not None:
raise ExtractorError('Arte live streams are not yet supported, sorry')
# self.extractLiveStream(url)
# return
raise ExtractorError('No video found')
def _extract_video(self, url, video_id, lang):
"""Extract from videos.arte.tv"""
ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
ref_xml_url = ref_xml_url.replace('.html', ',view,asPlayerXml.xml')
ref_xml_doc = self._download_xml(
ref_xml_url, video_id, note='Downloading metadata')
config_node = find_xpath_attr(ref_xml_doc, './/video', 'lang', lang)
config_xml_url = config_node.attrib['ref']
config_xml = self._download_webpage(
config = self._download_xml(
config_xml_url, video_id, note='Downloading configuration')
video_urls = list(re.finditer(r'<url quality="(?P<quality>.*?)">(?P<url>.*?)</url>', config_xml))
def _key(m):
quality = m.group('quality')
if quality == 'hd':
return 2
else:
return 1
# We pick the best quality
video_urls = sorted(video_urls, key=_key)
video_url = list(video_urls)[-1].group('url')
title = self._html_search_regex(r'<name>(.*?)</name>', config_xml, 'title')
thumbnail = self._html_search_regex(r'<firstThumbnailUrl>(.*?)</firstThumbnailUrl>',
config_xml, 'thumbnail')
return {'id': video_id,
'title': title,
'thumbnail': thumbnail,
'url': video_url,
'ext': 'flv',
}
formats = [{
'forma_id': q.attrib['quality'],
'url': q.text,
'ext': 'flv',
'quality': 2 if q.attrib['quality'] == 'hd' else 1,
} for q in config.findall('./urls/url')]
self._sort_formats(formats)
def _extract_liveweb(self, url, name, lang):
"""Extract form http://liveweb.arte.tv/"""
webpage = self._download_webpage(url, name)
video_id = self._search_regex(r'eventId=(\d+?)("|&)', webpage, 'event id')
config_doc = self._download_xml('http://download.liveweb.arte.tv/o21/liveweb/events/event-%s.xml' % video_id,
video_id, 'Downloading information')
event_doc = config_doc.find('event')
url_node = event_doc.find('video').find('urlHd')
if url_node is None:
url_node = event_doc.find('urlSd')
return {'id': video_id,
'title': event_doc.find('name%s' % lang.capitalize()).text,
'url': url_node.text.replace('MP4', 'mp4'),
'ext': 'flv',
'thumbnail': self._og_search_thumbnail(webpage),
}
title = config.find('.//name').text
thumbnail = config.find('.//firstThumbnailUrl').text
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}
class ArteTVPlus7IE(InfoExtractor):
@ -152,9 +78,7 @@ class ArteTVPlus7IE(InfoExtractor):
return self._extract_from_json_url(json_url, video_id, lang)
def _extract_from_json_url(self, json_url, video_id, lang):
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
self.report_extraction(video_id)
info = json.loads(json_info)
info = self._download_json(json_url, video_id)
player_info = info['videoJsonPlayer']
info_dict = {
@ -176,6 +100,8 @@ class ArteTVPlus7IE(InfoExtractor):
l = 'F'
elif lang == 'de':
l = 'A'
else:
l = lang
regexes = [r'VO?%s' % l, r'VO?.-ST%s' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
@ -302,5 +228,25 @@ class ArteTVConcertIE(ArteTVPlus7IE):
'ext': 'mp4',
'title': 'The Notwist im Pariser Konzertclub "Divan du Monde"',
'upload_date': '20140128',
'description': 'md5:486eb08f991552ade77439fe6d82c305',
},
}
class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
http://www\.arte\.tv
/playerv2/embed\.php\?json_url=
(?P<json_url>
http://arte\.tv/papi/tvguide/videos/stream/player/
(?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*
)
'''
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
lang = mobj.group('lang')
json_url = mobj.group('json_url')
return self._extract_from_json_url(json_url, video_id, lang)

View File

@ -11,22 +11,24 @@ from ..utils import (
class AUEngineIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?auengine\.com/embed\.php\?.*?file=(?P<id>[^&]+).*?'
_TEST = {
'url': 'http://auengine.com/embed.php?file=lfvlytY6&w=650&h=370',
'file': 'lfvlytY6.mp4',
'md5': '48972bdbcf1a3a2f5533e62425b41d4f',
'info_dict': {
'id': 'lfvlytY6',
'ext': 'mp4',
'title': '[Commie]The Legend of the Legendary Heroes - 03 - Replication Eye (Alpha Stigma)[F9410F5A]'
}
}
_VALID_URL = r'(?:http://)?(?:www\.)?auengine\.com/embed\.php\?.*?file=([^&]+).*?'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<title>(?P<title>.+?)</title>',
webpage, 'title')
title = self._html_search_regex(r'<title>(?P<title>.+?)</title>', webpage, 'title')
title = title.strip()
links = re.findall(r'\s(?:file|url):\s*["\']([^\'"]+)["\']', webpage)
links = map(compat_urllib_parse.unquote, links)
@ -39,14 +41,15 @@ class AUEngineIE(InfoExtractor):
elif '/videos/' in link:
video_url = link
if not video_url:
raise ExtractorError(u'Could not find video URL')
raise ExtractorError('Could not find video URL')
ext = '.' + determine_ext(video_url)
if ext == title[-len(ext):]:
title = title[:-len(ext)]
return {
'id': video_id,
'url': video_url,
'title': title,
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
'http_referer': 'http://www.auengine.com/flowplayer/flowplayer.commercial-3.2.14.swf',
}

View File

@ -1,22 +1,21 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .ooyala import OoyalaIE
class BloombergIE(InfoExtractor):
_VALID_URL = r'https?://www\.bloomberg\.com/video/(?P<name>.+?)\.html'
_TEST = {
u'url': u'http://www.bloomberg.com/video/shah-s-presentation-on-foreign-exchange-strategies-qurhIVlJSB6hzkVi229d8g.html',
u'file': u'12bzhqZTqQHmmlA8I-i0NpzJgcG5NNYX.mp4',
u'info_dict': {
u'title': u'Shah\'s Presentation on Foreign-Exchange Strategies',
u'description': u'md5:abc86e5236f9f0e4866c59ad36736686',
},
u'params': {
# Requires ffmpeg (m3u8 manifest)
u'skip_download': True,
'url': 'http://www.bloomberg.com/video/shah-s-presentation-on-foreign-exchange-strategies-qurhIVlJSB6hzkVi229d8g.html',
'md5': '7bf08858ff7c203c870e8a6190e221e5',
'info_dict': {
'id': 'qurhIVlJSB6hzkVi229d8g',
'ext': 'flv',
'title': 'Shah\'s Presentation on Foreign-Exchange Strategies',
'description': 'md5:0681e0d30dcdfc6abf34594961d8ea88',
},
}
@ -24,7 +23,16 @@ class BloombergIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
embed_code = self._search_regex(
r'<source src="https?://[^/]+/[^/]+/[^/]+/([^/]+)', webpage,
'embed code')
return OoyalaIE._build_url_result(embed_code)
f4m_url = self._search_regex(
r'<source src="(https?://[^"]+\.f4m.*?)"', webpage,
'f4m url')
title = re.sub(': Video$', '', self._og_search_title(webpage))
return {
'id': name.split('-')[-1],
'title': title,
'url': f4m_url,
'ext': 'flv',
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}

View File

@ -4,39 +4,72 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
int_or_none,
)
class BRIE(InfoExtractor):
IE_DESC = "Bayerischer Rundfunk Mediathek"
_VALID_URL = r"^https?://(?:www\.)?br\.de/mediathek/video/(?:sendungen/)?(?:[a-z0-9\-/]+/)?(?P<id>[a-z0-9\-]+)\.html$"
_BASE_URL = "http://www.br.de"
IE_DESC = 'Bayerischer Rundfunk Mediathek'
_VALID_URL = r'https?://(?:www\.)?br\.de/(?:[a-z0-9\-]+/)+(?P<id>[a-z0-9\-]+)\.html'
_BASE_URL = 'http://www.br.de'
_TESTS = [
{
"url": "http://www.br.de/mediathek/video/anselm-gruen-114.html",
"md5": "c4f83cf0f023ba5875aba0bf46860df2",
"info_dict": {
"id": "2c8d81c5-6fb7-4a74-88d4-e768e5856532",
"ext": "mp4",
"title": "Feiern und Verzichten",
"description": "Anselm Grün: Feiern und Verzichten",
"uploader": "BR/Birgit Baier",
"upload_date": "20140301"
'url': 'http://www.br.de/mediathek/video/anselm-gruen-114.html',
'md5': 'c4f83cf0f023ba5875aba0bf46860df2',
'info_dict': {
'id': '2c8d81c5-6fb7-4a74-88d4-e768e5856532',
'ext': 'mp4',
'title': 'Feiern und Verzichten',
'description': 'Anselm Grün: Feiern und Verzichten',
'uploader': 'BR/Birgit Baier',
'upload_date': '20140301',
}
},
{
"url": "http://www.br.de/mediathek/video/sendungen/unter-unserem-himmel/unter-unserem-himmel-alpen-ueber-den-pass-100.html",
"md5": "ab451b09d861dbed7d7cc9ab0be19ebe",
"info_dict": {
"id": "2c060e69-3a27-4e13-b0f0-668fac17d812",
"ext": "mp4",
"title": "Über den Pass",
"description": "Die Eroberung der Alpen: Über den Pass",
"uploader": None,
"upload_date": None
'url': 'http://www.br.de/mediathek/video/sendungen/unter-unserem-himmel/unter-unserem-himmel-alpen-ueber-den-pass-100.html',
'md5': 'ab451b09d861dbed7d7cc9ab0be19ebe',
'info_dict': {
'id': '2c060e69-3a27-4e13-b0f0-668fac17d812',
'ext': 'mp4',
'title': 'Über den Pass',
'description': 'Die Eroberung der Alpen: Über den Pass',
}
}
},
{
'url': 'http://www.br.de/nachrichten/schaeuble-haushaltsentwurf-bundestag-100.html',
'md5': '3db0df1a9a9cd9fa0c70e6ea8aa8e820',
'info_dict': {
'id': 'c6aae3de-2cf9-43f2-957f-f17fef9afaab',
'ext': 'aac',
'title': '"Keine neuen Schulden im nächsten Jahr"',
'description': 'Haushaltsentwurf: "Keine neuen Schulden im nächsten Jahr"',
}
},
{
'url': 'http://www.br.de/radio/bayern1/service/team/videos/team-video-erdelt100.html',
'md5': 'dbab0aef2e047060ea7a21fc1ce1078a',
'info_dict': {
'id': '6ba73750-d405-45d3-861d-1ce8c524e059',
'ext': 'mp4',
'title': 'Umweltbewusster Häuslebauer',
'description': 'Uwe Erdelt: Umweltbewusster Häuslebauer',
}
},
{
'url': 'http://www.br.de/fernsehen/br-alpha/sendungen/kant-fuer-anfaenger/kritik-der-reinen-vernunft/kant-kritik-01-metaphysik100.html',
'md5': '23bca295f1650d698f94fc570977dae3',
'info_dict': {
'id': 'd982c9ce-8648-4753-b358-98abb8aec43d',
'ext': 'mp4',
'title': 'Folge 1 - Metaphysik',
'description': 'Kant für Anfänger: Folge 1 - Metaphysik',
'uploader': 'Eva Maria Steimle',
'upload_date': '20140117',
}
},
]
def _real_extract(self, url):
@ -44,56 +77,63 @@ class BRIE(InfoExtractor):
display_id = mobj.group('id')
page = self._download_webpage(url, display_id)
xml_url = self._search_regex(
r"return BRavFramework\.register\(BRavFramework\('avPlayer_(?:[a-f0-9-]{36})'\)\.setup\({dataURL:'(/mediathek/video/[a-z0-9/~_.-]+)'}\)\);", page, "XMLURL")
r"return BRavFramework\.register\(BRavFramework\('avPlayer_(?:[a-f0-9-]{36})'\)\.setup\({dataURL:'(/(?:[a-z0-9\-]+/)+[a-z0-9/~_.-]+)'}\)\);", page, 'XMLURL')
xml = self._download_xml(self._BASE_URL + xml_url, None)
videos = []
for xml_video in xml.findall("video"):
video = {
"id": xml_video.get("externalId"),
"title": xml_video.find("title").text,
"formats": self._extract_formats(xml_video.find("assets")),
"thumbnails": self._extract_thumbnails(xml_video.find("teaserImage/variants")),
"description": " ".join(xml_video.find("shareTitle").text.splitlines()),
"webpage_url": xml_video.find("permalink").text
}
if xml_video.find("author").text:
video["uploader"] = xml_video.find("author").text
if xml_video.find("broadcastDate").text:
video["upload_date"] = "".join(reversed(xml_video.find("broadcastDate").text.split(".")))
videos.append(video)
medias = []
if len(videos) > 1:
for xml_media in xml.findall('video') + xml.findall('audio'):
media = {
'id': xml_media.get('externalId'),
'title': xml_media.find('title').text,
'formats': self._extract_formats(xml_media.find('assets')),
'thumbnails': self._extract_thumbnails(xml_media.find('teaserImage/variants')),
'description': ' '.join(xml_media.find('shareTitle').text.splitlines()),
'webpage_url': xml_media.find('permalink').text
}
if xml_media.find('author').text:
media['uploader'] = xml_media.find('author').text
if xml_media.find('broadcastDate').text:
media['upload_date'] = ''.join(reversed(xml_media.find('broadcastDate').text.split('.')))
medias.append(media)
if len(medias) > 1:
self._downloader.report_warning(
'found multiple videos; please '
'found multiple medias; please '
'report this with the video URL to http://yt-dl.org/bug')
if not videos:
raise ExtractorError('No video entries found')
return videos[0]
if not medias:
raise ExtractorError('No media entries found')
return medias[0]
def _extract_formats(self, assets):
def text_or_none(asset, tag):
elem = asset.find(tag)
return None if elem is None else elem.text
formats = [{
"url": asset.find("downloadUrl").text,
"ext": asset.find("mediaType").text,
"format_id": asset.get("type"),
"width": int(asset.find("frameWidth").text),
"height": int(asset.find("frameHeight").text),
"tbr": int(asset.find("bitrateVideo").text),
"abr": int(asset.find("bitrateAudio").text),
"vcodec": asset.find("codecVideo").text,
"container": asset.find("mediaType").text,
"filesize": int(asset.find("size").text),
} for asset in assets.findall("asset")
if asset.find("downloadUrl") is not None]
'url': text_or_none(asset, 'downloadUrl'),
'ext': text_or_none(asset, 'mediaType'),
'format_id': asset.get('type'),
'width': int_or_none(text_or_none(asset, 'frameWidth')),
'height': int_or_none(text_or_none(asset, 'frameHeight')),
'tbr': int_or_none(text_or_none(asset, 'bitrateVideo')),
'abr': int_or_none(text_or_none(asset, 'bitrateAudio')),
'vcodec': text_or_none(asset, 'codecVideo'),
'acodec': text_or_none(asset, 'codecAudio'),
'container': text_or_none(asset, 'mediaType'),
'filesize': int_or_none(text_or_none(asset, 'size')),
} for asset in assets.findall('asset')
if asset.find('downloadUrl') is not None]
self._sort_formats(formats)
return formats
def _extract_thumbnails(self, variants):
thumbnails = [{
"url": self._BASE_URL + variant.find("url").text,
"width": int(variant.find("width").text),
"height": int(variant.find("height").text),
} for variant in variants.findall("variant")]
thumbnails.sort(key=lambda x: x["width"] * x["height"], reverse=True)
'url': self._BASE_URL + variant.find('url').text,
'width': int_or_none(variant.find('width').text),
'height': int_or_none(variant.find('height').text),
} for variant in variants.findall('variant')]
thumbnails.sort(key=lambda x: x['width'] * x['height'], reverse=True)
return thumbnails

View File

@ -27,9 +27,10 @@ class BreakIE(InfoExtractor):
webpage, 'info json', flags=re.DOTALL)
info = json.loads(info_json)
video_url = info['videoUri']
m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', video_url)
if m_youtube is not None:
return self.url_result(m_youtube.group(1), 'Youtube')
youtube_id = info.get('youtubeId')
if youtube_id:
return self.url_result(youtube_id, 'Youtube')
final_url = video_url + '?' + info['AuthToken']
return {
'id': video_id,

View File

@ -87,7 +87,7 @@ class BrightcoveIE(InfoExtractor):
object_str = object_str.replace('<--', '<!--')
object_str = fix_xml_ampersands(object_str)
object_doc = xml.etree.ElementTree.fromstring(object_str)
object_doc = xml.etree.ElementTree.fromstring(object_str.encode('utf-8'))
fv_el = find_xpath_attr(object_doc, './param', 'name', 'flashVars')
if fv_el is not None:
@ -140,7 +140,11 @@ class BrightcoveIE(InfoExtractor):
url_m = re.search(r'<meta\s+property="og:video"\s+content="(http://c.brightcove.com/[^"]+)"', webpage)
if url_m:
return [unescapeHTML(url_m.group(1))]
url = unescapeHTML(url_m.group(1))
# Some sites don't add it, we can't download with this url, for example:
# http://www.ktvu.com/videos/news/raw-video-caltrain-releases-video-of-man-almost/vCTZdY/
if 'playerKey' in url:
return [url]
matches = re.findall(
r'''(?sx)<object

View File

@ -0,0 +1,48 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class BYUtvIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?byutv.org/watch/[0-9a-f-]+/(?P<video_id>[^/?#]+)'
_TEST = {
'url': 'http://www.byutv.org/watch/44e80f7b-e3ba-43ba-8c51-b1fd96c94a79/granite-flats-talking',
'info_dict': {
'id': 'granite-flats-talking',
'ext': 'mp4',
'description': 'md5:4e9a7ce60f209a33eca0ac65b4918e1c',
'title': 'Talking',
'thumbnail': 're:^https?://.*promo.*'
},
'params': {
'skip_download': True,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id)
episode_code = self._search_regex(
r'(?s)episode:(.*?\}),\s*\n', webpage, 'episode information')
episode_json = re.sub(
r'(\n\s+)([a-zA-Z]+):\s+\'(.*?)\'', r'\1"\2": "\3"', episode_code)
ep = json.loads(episode_json)
if ep['providerType'] == 'Ooyala':
return {
'_type': 'url_transparent',
'ie_key': 'Ooyala',
'url': 'ooyala:%s' % ep['providerId'],
'id': video_id,
'title': ep['title'],
'description': ep.get('description'),
'thumbnail': ep.get('imageThumbnail'),
}
else:
raise ExtractorError('Unsupported provider %s' % ep['provider'])

View File

@ -2,39 +2,46 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
class C56IE(InfoExtractor):
_VALID_URL = r'https?://((www|player)\.)?56\.com/(.+?/)?(v_|(play_album.+-))(?P<textid>.+?)\.(html|swf)'
_VALID_URL = r'https?://(?:(?:www|player)\.)?56\.com/(?:.+?/)?(?:v_|(?:play_album.+-))(?P<textid>.+?)\.(?:html|swf)'
IE_NAME = '56.com'
_TEST = {
'url': 'http://www.56.com/u39/v_OTM0NDA3MTY.html',
'file': '93440716.flv',
'md5': 'e59995ac63d0457783ea05f93f12a866',
'info_dict': {
'id': '93440716',
'ext': 'flv',
'title': '网事知多少 第32期车怒',
'duration': 283.813,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, flags=re.VERBOSE)
text_id = mobj.group('textid')
info_page = self._download_webpage('http://vxml.56.com/json/%s/' % text_id,
text_id, 'Downloading video info')
info = json.loads(info_page)['info']
formats = [{
'format_id': f['type'],
'filesize': int(f['filesize']),
'url': f['url']
} for f in info['rfiles']]
page = self._download_json(
'http://vxml.56.com/json/%s/' % text_id, text_id, 'Downloading video info')
info = page['info']
formats = [
{
'format_id': f['type'],
'filesize': int(f['filesize']),
'url': f['url']
} for f in info['rfiles']
]
self._sort_formats(formats)
return {
'id': info['vid'],
'title': info['Subject'],
'duration': int(info['duration']) / 1000.0,
'formats': formats,
'thumbnail': info.get('bimg') or info.get('img'),
}

View File

@ -28,7 +28,7 @@ class CanalplusIE(InfoExtractor):
video_id = mobj.groupdict().get('id')
if video_id is None:
webpage = self._download_webpage(url, mobj.group('path'))
video_id = self._search_regex(r'videoId = "(\d+)";', webpage, u'video id')
video_id = self._search_regex(r'<canal:player videoId="(\d+)"', webpage, u'video id')
info_url = self._VIDEO_INFO_TEMPLATE % video_id
doc = self._download_xml(info_url,video_id,
u'Downloading video info')

View File

@ -0,0 +1,87 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
class CBSNewsIE(InfoExtractor):
IE_DESC = 'CBS News'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:[^/]+/)+(?P<id>[\da-z_-]+)'
_TESTS = [
{
'url': 'http://www.cbsnews.com/news/tesla-and-spacex-elon-musks-industrial-empire/',
'info_dict': {
'id': 'tesla-and-spacex-elon-musks-industrial-empire',
'ext': 'flv',
'title': 'Tesla and SpaceX: Elon Musk\'s industrial empire',
'thumbnail': 'http://beta.img.cbsnews.com/i/2014/03/30/60147937-2f53-4565-ad64-1bdd6eb64679/60-0330-pelley-640x360.jpg',
'duration': 791,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
'info_dict': {
'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
'ext': 'flv',
'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
'thumbnail': 'http://cbsnews2.cbsistatic.com/hub/i/r/2014/04/04/0c9fbc66-576b-41ca-8069-02d122060dd2/thumbnail/140x90/6dad7a502f88875ceac38202984b6d58/en-0404-werner-replace-640x360.jpg',
'duration': 205,
},
'params': {
# rtmp download
'skip_download': True,
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_info = json.loads(self._html_search_regex(
r'(?:<ul class="media-list items" id="media-related-items"><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
webpage, 'video JSON info'))
item = video_info['item'] if 'item' in video_info else video_info
title = item.get('articleTitle') or item.get('hed')
duration = item.get('duration')
thumbnail = item.get('mediaImage') or item.get('thumbnail')
formats = []
for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
uri = item.get('media' + format_id + 'URI')
if not uri:
continue
fmt = {
'url': uri,
'format_id': format_id,
}
if uri.startswith('rtmp'):
fmt.update({
'app': 'ondemand?auth=cbs',
'play_path': 'mp4:' + uri.split('<break>')[-1],
'player_url': 'http://www.cbsnews.com/[[IMPORT]]/vidtech.cbsinteractive.com/player/3_3_0/CBSI_PLAYER_HD.swf',
'page_url': 'http://www.cbsnews.com',
'ext': 'flv',
})
elif uri.endswith('.m3u8'):
fmt['ext'] = 'mp4'
formats.append(fmt)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@ -9,12 +9,12 @@ from ..utils import (
class CinemassacreIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?cinemassacre\.com/(?P<date_Y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/.+?'
_VALID_URL = r'http://(?:www\.)?cinemassacre\.com/(?P<date_Y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/(?P<display_id>[^?#/]+)'
_TESTS = [
{
'url': 'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
'file': '19911.mp4',
'md5': 'fde81fbafaee331785f58cd6c0d46190',
'md5': '782f8504ca95a0eba8fc9177c373eec7',
'info_dict': {
'upload_date': '20121110',
'title': '“Angry Video Game Nerd: The Movie” Trailer',
@ -24,7 +24,7 @@ class CinemassacreIE(InfoExtractor):
{
'url': 'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
'file': '521be8ef82b16.mp4',
'md5': 'd72f10cd39eac4215048f62ab477a511',
'md5': 'dec39ee5118f8d9cc067f45f9cbe3a35',
'info_dict': {
'upload_date': '20131002',
'title': 'The Mummys Hand (1940)',
@ -34,8 +34,9 @@ class CinemassacreIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, None) # Don't know video id yet
webpage = self._download_webpage(url, display_id)
video_date = mobj.group('date_Y') + mobj.group('date_m') + mobj.group('date_d')
mobj = re.search(r'src="(?P<embed_url>http://player\.screenwavemedia\.com/play/[a-zA-Z]+\.php\?id=(?:Cinemassacre-)?(?P<video_id>.+?))"', webpage)
if not mobj:
@ -43,33 +44,36 @@ class CinemassacreIE(InfoExtractor):
playerdata_url = mobj.group('embed_url')
video_id = mobj.group('video_id')
video_title = self._html_search_regex(r'<title>(?P<title>.+?)\|',
webpage, 'title')
video_description = self._html_search_regex(r'<div class="entry-content">(?P<description>.+?)</div>',
video_title = self._html_search_regex(
r'<title>(?P<title>.+?)\|', webpage, 'title')
video_description = self._html_search_regex(
r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, 'description', flags=re.DOTALL, fatal=False)
if len(video_description) == 0:
video_description = None
playerdata = self._download_webpage(playerdata_url, video_id)
sd_url = self._html_search_regex(r'file: \'(?P<sd_file>[^\']+)\', label: \'SD\'', playerdata, 'sd_file')
hd_url = self._html_search_regex(r'file: \'(?P<hd_file>[^\']+)\', label: \'HD\'', playerdata, 'hd_file')
sd_url = self._html_search_regex(r'file: \'([^\']+)\', label: \'SD\'', playerdata, 'sd_file')
hd_url = self._html_search_regex(
r'file: \'([^\']+)\', label: \'HD\'', playerdata, 'hd_file',
default=None)
video_thumbnail = self._html_search_regex(r'image: \'(?P<thumbnail>[^\']+)\'', playerdata, 'thumbnail', fatal=False)
formats = [
{
'url': sd_url,
'ext': 'mp4',
'format': 'sd',
'format_id': 'sd',
},
{
formats = [{
'url': sd_url,
'ext': 'mp4',
'format': 'sd',
'format_id': 'sd',
'quality': 1,
}]
if hd_url:
formats.append({
'url': hd_url,
'ext': 'mp4',
'format': 'hd',
'format_id': 'hd',
},
]
'quality': 2,
})
self._sort_formats(formats)
return {
'id': video_id,

View File

@ -1,22 +1,28 @@
from __future__ import unicode_literals
import re
import time
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
parse_duration,
)
class ClipfishIE(InfoExtractor):
IE_NAME = u'clipfish'
IE_NAME = 'clipfish'
_VALID_URL = r'^https?://(?:www\.)?clipfish\.de/.*?/video/(?P<id>[0-9]+)/'
_TEST = {
u'url': u'http://www.clipfish.de/special/game-trailer/video/3966754/fifa-14-e3-2013-trailer/',
u'file': u'3966754.mp4',
u'md5': u'2521cd644e862936cf2e698206e47385',
u'info_dict': {
u'title': u'FIFA 14 - E3 2013 Trailer',
u'duration': 82,
'url': 'http://www.clipfish.de/special/game-trailer/video/3966754/fifa-14-e3-2013-trailer/',
'md5': '2521cd644e862936cf2e698206e47385',
'info_dict': {
'id': '3966754',
'ext': 'mp4',
'title': 'FIFA 14 - E3 2013 Trailer',
'duration': 82,
},
u'skip': 'Blocked in the US'
}
@ -33,21 +39,10 @@ class ClipfishIE(InfoExtractor):
video_url = doc.find('filename').text
if video_url is None:
xml_bytes = xml.etree.ElementTree.tostring(doc)
raise ExtractorError(u'Cannot find video URL in document %r' %
raise ExtractorError('Cannot find video URL in document %r' %
xml_bytes)
thumbnail = doc.find('imageurl').text
duration_str = doc.find('duration').text
m = re.match(
r'^(?P<hours>[0-9]+):(?P<minutes>[0-9]{2}):(?P<seconds>[0-9]{2}):(?P<ms>[0-9]*)$',
duration_str)
if m:
duration = (
(int(m.group('hours')) * 60 * 60) +
(int(m.group('minutes')) * 60) +
(int(m.group('seconds')))
)
else:
duration = None
duration = parse_duration(doc.find('duration').text)
return {
'id': video_id,

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -11,13 +13,14 @@ class ClipsyndicateIE(InfoExtractor):
_VALID_URL = r'http://www\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.clipsyndicate.com/video/play/4629301/brick_briscoe',
u'md5': u'4d7d549451bad625e0ff3d7bd56d776c',
u'info_dict': {
u'id': u'4629301',
u'ext': u'mp4',
u'title': u'Brick Briscoe',
u'duration': 612,
'url': 'http://www.clipsyndicate.com/video/play/4629301/brick_briscoe',
'md5': '4d7d549451bad625e0ff3d7bd56d776c',
'info_dict': {
'id': '4629301',
'ext': 'mp4',
'title': 'Brick Briscoe',
'duration': 612,
'thumbnail': 're:^https?://.+\.jpg',
},
}
@ -26,13 +29,13 @@ class ClipsyndicateIE(InfoExtractor):
video_id = mobj.group('id')
js_player = self._download_webpage(
'http://eplayer.clipsyndicate.com/embed/player.js?va_id=%s' % video_id,
video_id, u'Downlaoding player')
video_id, 'Downlaoding player')
# it includes a required token
flvars = self._search_regex(r'flvars: "(.*?)"', js_player, u'flvars')
flvars = self._search_regex(r'flvars: "(.*?)"', js_player, 'flvars')
pdoc = self._download_xml(
'http://eplayer.clipsyndicate.com/osmf/playlist?%s' % flvars,
video_id, u'Downloading video info',
video_id, 'Downloading video info',
transform_source=fix_xml_ampersands)
track_doc = pdoc.find('trackList/track')

View File

@ -0,0 +1,75 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
)
class CNETIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cnet\.com/videos/(?P<id>[^/]+)/'
_TEST = {
'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/',
'md5': '041233212a0d06b179c87cbcca1577b8',
'info_dict': {
'id': '56f4ea68-bd21-4852-b08c-4de5b8354c60',
'ext': 'mp4',
'title': 'Hands-on with Microsoft Windows 8.1 Update',
'description': 'The new update to the Windows 8 OS brings improved performance for mouse and keyboard users.',
'thumbnail': 're:^http://.*/flmswindows8.jpg$',
'uploader_id': 'sarah.mitroff@cbsinteractive.com',
'uploader': 'Sarah Mitroff',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex(
r"<div class=\"cnetVideoPlayer\" data-cnet-video-options='([^']+)'",
webpage, 'data json')
data = json.loads(data_json)
vdata = data['video']
if not vdata:
vdata = data['videos'][0]
if not vdata:
raise ExtractorError('Cannot find video data')
video_id = vdata['id']
title = vdata['headline']
description = vdata.get('dek')
thumbnail = vdata.get('image', {}).get('path')
author = vdata.get('author')
if author:
uploader = '%s %s' % (author['firstName'], author['lastName'])
uploader_id = author.get('email')
else:
uploader = None
uploader_id = None
formats = [{
'format_id': '%s-%s-%s' % (
f['type'], f['format'],
int_or_none(f.get('bitrate'), 1000, default='')),
'url': f['uri'],
'tbr': int_or_none(f.get('bitrate'), 1000),
} for f in vdata['files']['data']]
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'formats': formats,
'description': description,
'uploader': uploader,
'uploader_id': uploader_id,
'thumbnail': thumbnail,
}

View File

@ -7,8 +7,8 @@ from .mtv import MTVServicesInfoExtractor
from ..utils import (
compat_str,
compat_urllib_parse,
ExtractorError,
float_or_none,
unified_strdate,
)
@ -21,7 +21,7 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
_TEST = {
'url': 'http://www.comedycentral.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
'md5': '4167875aae411f903b751a21f357f1ee',
'md5': 'c4f48e9eda1b16dd10add0744344b6d8',
'info_dict': {
'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354',
'ext': 'mp4',
@ -32,31 +32,34 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
class ComedyCentralShowsIE(InfoExtractor):
IE_DESC = 'The Daily Show / Colbert Report'
IE_DESC = 'The Daily Show / The Colbert Report'
# urls can be abbreviations like :thedailyshow or :colbert
# urls for episodes like:
# or urls for clips like: http://www.thedailyshow.com/watch/mon-december-10-2012/any-given-gun-day
# or: http://www.colbertnation.com/the-colbert-report-videos/421667/november-29-2012/moon-shattering-news
# or: http://www.colbertnation.com/the-colbert-report-collections/422008/festival-of-lights/79524
_VALID_URL = r"""^(:(?P<shortname>tds|thedailyshow|cr|colbert|colbertnation|colbertreport)
|(https?://)?(www\.)?
(?P<showname>thedailyshow|colbertnation)\.com/
(full-episodes/(?P<episode>.*)|
_VALID_URL = r'''(?x)^(:(?P<shortname>tds|thedailyshow|cr|colbert|colbertnation|colbertreport)
|https?://(:www\.)?
(?P<showname>thedailyshow|thecolbertreport)\.(?:cc\.)?com/
((?:full-)?episodes/(?:[0-9a-z]{6}/)?(?P<episode>.*)|
(?P<clip>
(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
|(watch/(?P<date>[^/]*)/(?P<tdstitle>.*)))|
(?:(?:guests/[^/]+|videos|video-playlists|special-editions)/[^/]+/(?P<videotitle>[^/?#]+))
|(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
|(watch/(?P<date>[^/]*)/(?P<tdstitle>.*))
)|
(?P<interview>
extended-interviews/(?P<interID>[0-9]+)/playlist_tds_extended_(?P<interview_title>.*?)/.*?)))
$"""
extended-interviews/(?P<interID>[0-9a-z]+)/(?:playlist_tds_extended_)?(?P<interview_title>.*?)(/.*?)?)))
(?:[?#].*|$)'''
_TEST = {
'url': 'http://www.thedailyshow.com/watch/thu-december-13-2012/kristen-stewart',
'file': '422212.mp4',
'url': 'http://thedailyshow.cc.com/watch/thu-december-13-2012/kristen-stewart',
'md5': '4e2f5cb088a83cd8cdb7756132f9739d',
'info_dict': {
"upload_date": "20121214",
"description": "Kristen Stewart",
"uploader": "thedailyshow",
"title": "thedailyshow-kristen-stewart part 1"
'id': 'ab9ab3e7-5a98-4dbe-8b21-551dc0523d55',
'ext': 'mp4',
'upload_date': '20121213',
'description': 'Kristen Stewart learns to let loose in "On the Road."',
'uploader': 'thedailyshow',
'title': 'thedailyshow kristen-stewart part 1',
}
}
@ -79,11 +82,6 @@ class ComedyCentralShowsIE(InfoExtractor):
'400': (384, 216),
}
@classmethod
def suitable(cls, url):
"""Receives a URL and returns True if suitable for this IE."""
return re.match(cls._VALID_URL, url, re.VERBOSE) is not None
@staticmethod
def _transform_rtmp_url(rtmp_video_url):
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp\.comedystor/.*)$', rtmp_video_url)
@ -99,14 +97,16 @@ class ComedyCentralShowsIE(InfoExtractor):
if mobj.group('shortname'):
if mobj.group('shortname') in ('tds', 'thedailyshow'):
url = 'http://www.thedailyshow.com/full-episodes/'
url = 'http://thedailyshow.cc.com/full-episodes/'
else:
url = 'http://www.colbertnation.com/full-episodes/'
url = 'http://thecolbertreport.cc.com/full-episodes/'
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
assert mobj is not None
if mobj.group('clip'):
if mobj.group('showname') == 'thedailyshow':
if mobj.group('videotitle'):
epTitle = mobj.group('videotitle')
elif mobj.group('showname') == 'thedailyshow':
epTitle = mobj.group('tdstitle')
else:
epTitle = mobj.group('cntitle')
@ -120,9 +120,9 @@ class ComedyCentralShowsIE(InfoExtractor):
epTitle = mobj.group('showname')
else:
epTitle = mobj.group('episode')
show_name = mobj.group('showname')
self.report_extraction(epTitle)
webpage,htmlHandle = self._download_webpage_handle(url, epTitle)
webpage, htmlHandle = self._download_webpage_handle(url, epTitle)
if dlNewest:
url = htmlHandle.geturl()
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
@ -130,71 +130,86 @@ class ComedyCentralShowsIE(InfoExtractor):
raise ExtractorError('Invalid redirected URL: ' + url)
if mobj.group('episode') == '':
raise ExtractorError('Redirected URL is still not specific: ' + url)
epTitle = mobj.group('episode')
epTitle = mobj.group('episode').rpartition('/')[-1]
mMovieParams = re.findall('(?:<param name="movie" value="|var url = ")(http://media.mtvnservices.com/([^"]*(?:episode|video).*?:.*?))"', webpage)
if len(mMovieParams) == 0:
# The Colbert Report embeds the information in a without
# a URL prefix; so extract the alternate reference
# and then add the URL prefix manually.
altMovieParams = re.findall('data-mgid="([^"]*(?:episode|video).*?:.*?)"', webpage)
altMovieParams = re.findall('data-mgid="([^"]*(?:episode|video|playlist).*?:.*?)"', webpage)
if len(altMovieParams) == 0:
raise ExtractorError('unable to find Flash URL in webpage ' + url)
else:
mMovieParams = [("http://media.mtvnservices.com/" + altMovieParams[0], altMovieParams[0])]
uri = mMovieParams[0][1]
indexUrl = 'http://shadow.comedycentral.com/feeds/video_player/mrss/?' + compat_urllib_parse.urlencode({'uri': uri})
idoc = self._download_xml(indexUrl, epTitle,
'Downloading show index',
'unable to download episode index')
# Correct cc.com in uri
uri = re.sub(r'(episode:[^.]+)(\.cc)?\.com', r'\1.cc.com', uri)
results = []
index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse.urlencode({'uri': uri}))
idoc = self._download_xml(
index_url, epTitle,
'Downloading show index', 'Unable to download episode index')
itemEls = idoc.findall('.//item')
for partNum,itemEl in enumerate(itemEls):
mediaId = itemEl.findall('./guid')[0].text
shortMediaId = mediaId.split(':')[-1]
showId = mediaId.split(':')[-2].replace('.com', '')
officialTitle = itemEl.findall('./title')[0].text
officialDate = unified_strdate(itemEl.findall('./pubDate')[0].text)
title = idoc.find('./channel/title').text
description = idoc.find('./channel/description').text
configUrl = ('http://www.comedycentral.com/global/feeds/entertainment/media/mediaGenEntertainment.jhtml?' +
compat_urllib_parse.urlencode({'uri': mediaId}))
cdoc = self._download_xml(configUrl, epTitle,
'Downloading configuration for %s' % shortMediaId)
entries = []
item_els = idoc.findall('.//item')
for part_num, itemEl in enumerate(item_els):
upload_date = unified_strdate(itemEl.findall('./pubDate')[0].text)
thumbnail = itemEl.find('.//{http://search.yahoo.com/mrss/}thumbnail').attrib.get('url')
content = itemEl.find('.//{http://search.yahoo.com/mrss/}content')
duration = float_or_none(content.attrib.get('duration'))
mediagen_url = content.attrib['url']
guid = itemEl.find('./guid').text.rpartition(':')[-1]
cdoc = self._download_xml(
mediagen_url, epTitle,
'Downloading configuration for segment %d / %d' % (part_num + 1, len(item_els)))
turls = []
for rendition in cdoc.findall('.//rendition'):
finfo = (rendition.attrib['bitrate'], rendition.findall('./src')[0].text)
turls.append(finfo)
if len(turls) == 0:
self._downloader.report_error('unable to download ' + mediaId + ': No videos found')
continue
formats = []
for format, rtmp_video_url in turls:
w, h = self._video_dimensions.get(format, (None, None))
formats.append({
'format_id': 'vhttp-%s' % format,
'url': self._transform_rtmp_url(rtmp_video_url),
'ext': self._video_extensions.get(format, 'mp4'),
'format_id': format,
'height': h,
'width': w,
})
formats.append({
'format_id': 'rtmp-%s' % format,
'url': rtmp_video_url,
'ext': self._video_extensions.get(format, 'mp4'),
'height': h,
'width': w,
})
self._sort_formats(formats)
effTitle = showId + '-' + epTitle + ' part ' + compat_str(partNum+1)
results.append({
'id': shortMediaId,
virtual_id = show_name + ' ' + epTitle + ' part ' + compat_str(part_num + 1)
entries.append({
'id': guid,
'title': virtual_id,
'formats': formats,
'uploader': showId,
'upload_date': officialDate,
'title': effTitle,
'thumbnail': None,
'description': compat_str(officialTitle),
'uploader': show_name,
'upload_date': upload_date,
'duration': duration,
'thumbnail': thumbnail,
'description': description,
})
return results
return {
'_type': 'playlist',
'entries': entries,
'title': show_name + ' ' + title,
'description': description,
}

View File

@ -74,7 +74,7 @@ class InfoExtractor(object):
"http", "https", "rtsp", "rtmp", "m3u8" or so.
* preference Order number of this format. If this field is
present and not None, the formats get sorted
by this field.
by this field, regardless of all other values.
-1 for default (order by other properties),
-2 or smaller for less than default.
* quality Order number of the video quality of this
@ -251,7 +251,21 @@ class InfoExtractor(object):
with open(filename, 'wb') as outf:
outf.write(webpage_bytes)
content = webpage_bytes.decode(encoding, 'replace')
try:
content = webpage_bytes.decode(encoding, 'replace')
except LookupError:
content = webpage_bytes.decode('utf-8', 'replace')
if (u'<title>Access to this site is blocked</title>' in content and
u'Websense' in content[:512]):
msg = u'Access to this webpage has been blocked by Websense filtering software in your network.'
blocked_iframe = self._html_search_regex(
r'<iframe src="([^"]+)"', content,
u'Websense information URL', default=None)
if blocked_iframe:
msg += u' Visit %s for more details' % blocked_iframe
raise ExtractorError(msg, expected=True)
return (content, urlh)
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True):

View File

@ -4,15 +4,16 @@ import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
unescapeHTML,
find_xpath_attr,
)
class CSpanIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?c-span\.org/video/\?(?P<id>\d+)'
_VALID_URL = r'http://(?:www\.)?c-span\.org/video/\?(?P<id>[0-9a-f]+)'
IE_DESC = 'C-SPAN'
_TEST = {
_TESTS = [{
'url': 'http://www.c-span.org/video/?313572-1/HolderonV',
'md5': '8e44ce11f0f725527daccc453f553eb0',
'info_dict': {
@ -22,13 +23,24 @@ class CSpanIE(InfoExtractor):
'description': 'Attorney General Eric Holder spoke to reporters following the Supreme Court decision in Shelby County v. Holder in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced until Congress established new guidelines for review.',
},
'skip': 'Regularly fails on travis, for unknown reasons',
}
}, {
'url': 'http://www.c-span.org/video/?c4486943/cspan-international-health-care-models',
# For whatever reason, the served video alternates between
# two different ones
#'md5': 'dbb0f047376d457f2ab8b3929cbb2d0c',
'info_dict': {
'id': '340723',
'ext': 'mp4',
'title': 'International Health Care Models',
'description': 'md5:7a985a2d595dba00af3d9c9f0783c967',
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_id = mobj.group('id')
webpage = self._download_webpage(url, page_id)
video_id = self._search_regex(r'data-progid=\'(\d+)\'>', webpage, 'video id')
video_id = self._search_regex(r'progid=\'?([0-9]+)\'?>', webpage, 'video id')
description = self._html_search_regex(
[
@ -43,18 +55,29 @@ class CSpanIE(InfoExtractor):
info_url = 'http://c-spanvideo.org/videoLibrary/assets/player/ajax-player.php?os=android&html5=program&id=' + video_id
data = self._download_json(info_url, video_id)
url = unescapeHTML(data['video']['files'][0]['path']['#text'])
doc = self._download_xml('http://www.c-span.org/common/services/flashXml.php?programid=' + video_id,
doc = self._download_xml(
'http://www.c-span.org/common/services/flashXml.php?programid=' + video_id,
video_id)
def find_string(s):
return find_xpath_attr(doc, './/string', 'name', s).text
title = find_xpath_attr(doc, './/string', 'name', 'title').text
thumbnail = find_xpath_attr(doc, './/string', 'name', 'poster').text
files = data['video']['files']
entries = [{
'id': '%s_%d' % (video_id, partnum + 1),
'title': (
title if len(files) == 1 else
'%s part %d' % (title, partnum + 1)),
'url': unescapeHTML(f['path']['#text']),
'description': description,
'thumbnail': thumbnail,
'duration': int_or_none(f.get('length', {}).get('#text')),
} for partnum, f in enumerate(files)]
return {
'_type': 'playlist',
'entries': entries,
'title': title,
'id': video_id,
'title': find_string('title'),
'url': url,
'description': description,
'thumbnail': find_string('poster'),
}

View File

@ -8,7 +8,6 @@ from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_urllib_request,
compat_str,
get_element_by_attribute,
get_element_by_id,
orderedSet,
str_to_int,
@ -180,7 +179,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = u'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/playlist/.+?".*?>.*?</a>.*?</div>'
_MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
_PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s'
def _extract_entries(self, id):
@ -190,10 +189,9 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
webpage = self._download_webpage(request,
id, u'Downloading page %s' % pagenum)
playlist_el = get_element_by_attribute(u'class', u'row video_list', webpage)
video_ids.extend(re.findall(r'data-id="(.+?)"', playlist_el))
video_ids.extend(re.findall(r'data-id="(.+?)"', webpage))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
break
return [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
for video_id in orderedSet(video_ids)]
@ -203,17 +201,17 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
return {'_type': 'playlist',
'id': playlist_id,
'title': get_element_by_id(u'playlist_name', webpage),
'entries': self._extract_entries(playlist_id),
}
return {
'_type': 'playlist',
'id': playlist_id,
'title': self._og_search_title(webpage),
'entries': self._extract_entries(playlist_id),
}
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = u'dailymotion:user'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/user/(?P<user>[^/]+)'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/user/.+?".*?>.*?</a>.*?</div>'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/user/(?P<user>[^/]+)'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
def _real_extract(self, url):

View File

@ -1,25 +1,28 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
determine_ext,
)
class DaumIE(InfoExtractor):
_VALID_URL = r'https?://(?:m\.)?tvpot\.daum\.net/.*?clipid=(?P<id>\d+)'
IE_NAME = u'daum.net'
IE_NAME = 'daum.net'
_TEST = {
u'url': u'http://tvpot.daum.net/clip/ClipView.do?clipid=52554690',
u'file': u'52554690.mp4',
u'info_dict': {
u'title': u'DOTA 2GETHER 시즌2 6회 - 2부',
u'description': u'DOTA 2GETHER 시즌2 6회 - 2부',
u'upload_date': u'20130831',
u'duration': 3868,
'url': 'http://tvpot.daum.net/clip/ClipView.do?clipid=52554690',
'info_dict': {
'id': '52554690',
'ext': 'mp4',
'title': 'DOTA 2GETHER 시즌2 6회 - 2부',
'description': 'DOTA 2GETHER 시즌2 6회 - 2부',
'upload_date': '20130831',
'duration': 3868,
},
}
@ -30,14 +33,14 @@ class DaumIE(InfoExtractor):
webpage = self._download_webpage(canonical_url, video_id)
full_id = self._search_regex(
r'<iframe src="http://videofarm.daum.net/controller/video/viewer/Video.html\?.*?vid=(.+?)[&"]',
webpage, u'full id')
webpage, 'full id')
query = compat_urllib_parse.urlencode({'vid': full_id})
info = self._download_xml(
'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
u'Downloading video info')
'Downloading video info')
urls = self._download_xml(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieData.apixml?' + query,
video_id, u'Downloading video formats info')
video_id, 'Downloading video formats info')
self.to_screen(u'%s: Getting video urls' % video_id)
formats = []
@ -53,7 +56,6 @@ class DaumIE(InfoExtractor):
format_url = url_doc.find('result/url').text
formats.append({
'url': format_url,
'ext': determine_ext(format_url),
'format_id': profile,
})

View File

@ -10,9 +10,10 @@ class DiscoveryIE(InfoExtractor):
_VALID_URL = r'http://dsc\.discovery\.com\/[a-zA-Z0-9\-]*/[a-zA-Z0-9\-]*/videos/(?P<id>[a-zA-Z0-9\-]*)(.htm)?'
_TEST = {
'url': 'http://dsc.discovery.com/tv-shows/mythbusters/videos/mission-impossible-outtakes.htm',
'file': '614784.mp4',
'md5': 'e12614f9ee303a6ccef415cb0793eba2',
'info_dict': {
'id': '614784',
'ext': 'mp4',
'title': 'MythBusters: Mission Impossible Outtakes',
'description': ('Watch Jamie Hyneman and Adam Savage practice being'
' each other -- to the point of confusing Jamie\'s dog -- and '
@ -34,7 +35,7 @@ class DiscoveryIE(InfoExtractor):
formats = []
for f in info['mp4']:
formats.append(
{'url': f['src'], r'ext': r'mp4', 'tbr': int(f['bitrate'][:-1])})
{'url': f['src'], 'ext': 'mp4', 'tbr': int(f['bitrate'][:-1])})
return {
'id': info['contentId'],

View File

@ -0,0 +1,27 @@
from __future__ import unicode_literals
from .novamov import NovaMovIE
class DivxStageIE(NovaMovIE):
IE_NAME = 'divxstage'
IE_DESC = 'DivxStage'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'divxstage\.(?:eu|net|ch|co|at|ag)'}
_HOST = 'www.divxstage.eu'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<div class="video_det">\s*<strong>([^<]+)</strong>'
_DESCRIPTION_REGEX = r'<div class="video_det">\s*<strong>[^<]+</strong>\s*<p>([^<]+)</p>'
_TEST = {
'url': 'http://www.divxstage.eu/video/57f238e2e5e01',
'md5': '63969f6eb26533a1968c4d325be63e72',
'info_dict': {
'id': '57f238e2e5e01',
'ext': 'flv',
'title': 'youtubedl test video',
'description': 'This is a test video for youtubedl.',
}
}

View File

@ -1,23 +1,25 @@
from __future__ import unicode_literals
import re
from ..utils import (
compat_urllib_parse,
determine_ext
)
from .common import InfoExtractor
class EHowIE(InfoExtractor):
IE_NAME = u'eHow'
_VALID_URL = r'(?:https?://)?(?:www\.)?ehow\.com/[^/_?]*_(?P<id>[0-9]+)'
IE_NAME = 'eHow'
_VALID_URL = r'https?://(?:www\.)?ehow\.com/[^/_?]*_(?P<id>[0-9]+)'
_TEST = {
u'url': u'http://www.ehow.com/video_12245069_hardwood-flooring-basics.html',
u'file': u'12245069.flv',
u'md5': u'9809b4e3f115ae2088440bcb4efbf371',
u'info_dict': {
u"title": u"Hardwood Flooring Basics",
u"description": u"Hardwood flooring may be time consuming, but its ultimately a pretty straightforward concept. Learn about hardwood flooring basics with help from a hardware flooring business owner in this free video...",
u"uploader": u"Erick Nathan"
'url': 'http://www.ehow.com/video_12245069_hardwood-flooring-basics.html',
'md5': '9809b4e3f115ae2088440bcb4efbf371',
'info_dict': {
'id': '12245069',
'ext': 'flv',
'title': 'Hardwood Flooring Basics',
'description': 'Hardwood flooring may be time consuming, but its ultimately a pretty straightforward concept. Learn about hardwood flooring basics with help from a hardware flooring business owner in this free video...',
'uploader': 'Erick Nathan',
}
}
@ -26,21 +28,16 @@ class EHowIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r'(?:file|source)=(http[^\'"&]*)',
webpage, u'video URL')
final_url = compat_urllib_parse.unquote(video_url)
uploader = self._search_regex(r'<meta name="uploader" content="(.+?)" />',
webpage, u'uploader')
webpage, 'video URL')
final_url = compat_urllib_parse.unquote(video_url)
uploader = self._html_search_meta('uploader', webpage)
title = self._og_search_title(webpage).replace(' | eHow', '')
ext = determine_ext(final_url)
return {
'_type': 'video',
'id': video_id,
'url': final_url,
'ext': ext,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'id': video_id,
'url': final_url,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage),
'description': self._og_search_description(webpage),
'uploader': uploader,
'uploader': uploader,
}

View File

@ -0,0 +1,43 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .fivemin import FiveMinIE
from ..utils import (
url_basename,
)
class EngadgetIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://www.engadget.com/
(?:video/5min/(?P<id>\d+)|
[\d/]+/.*?)
'''
_TEST = {
'url': 'http://www.engadget.com/video/5min/518153925/',
'md5': 'c6820d4828a5064447a4d9fc73f312c9',
'info_dict': {
'id': '518153925',
'ext': 'mp4',
'title': 'Samsung Galaxy Tab Pro 8.4 Review',
},
'add_ie': ['FiveMin'],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
if video_id is not None:
return FiveMinIE._build_result(video_id)
else:
title = url_basename(url)
webpage = self._download_webpage(url, title)
ids = re.findall(r'<iframe[^>]+?playList=(\d+)', webpage)
return {
'_type': 'playlist',
'title': title,
'entries': [FiveMinIE._build_result(id) for id in ids]
}

View File

@ -6,7 +6,6 @@ from .common import InfoExtractor
class FirstpostIE(InfoExtractor):
IE_NAME = 'Firstpost.com'
_VALID_URL = r'http://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'
_TEST = {
@ -16,7 +15,6 @@ class FirstpostIE(InfoExtractor):
'id': '1025403',
'ext': 'mp4',
'title': 'India to launch indigenous aircraft carrier INS Vikrant today',
'description': 'Its flight deck is over twice the size of a football field, its power unit can light up the entire Kochi city and the cabling is enough to cover the distance between here to Delhi.',
}
}
@ -24,15 +22,26 @@ class FirstpostIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'<div.*?name="div_video".*?flashvars="([^"]+)">',
webpage, 'video URL')
data = self._download_xml(
'http://www.firstpost.com/getvideoxml-%s.xml' % video_id, video_id,
'Downloading video XML')
item = data.find('./playlist/item')
thumbnail = item.find('./image').text
title = item.find('./title').text
formats = [
{
'url': details.find('./file').text,
'format_id': details.find('./label').text.strip(),
'width': int(details.find('./width').text.strip()),
'height': int(details.find('./height').text.strip()),
} for details in item.findall('./source/file_details') if details.find('./file').text
]
return {
'id': video_id,
'url': video_url,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@ -0,0 +1,56 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_str,
)
class FiveMinIE(InfoExtractor):
IE_NAME = '5min'
_VALID_URL = r'''(?x)
(?:https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(.*?&)?playList=|
5min:)
(?P<id>\d+)
'''
_TEST = {
# From http://www.engadget.com/2013/11/15/ipad-mini-retina-display-review/
'url': 'http://pshared.5min.com/Scripts/PlayerSeed.js?sid=281&width=560&height=345&playList=518013791',
'md5': '4f7b0b79bf1a470e5004f7112385941d',
'info_dict': {
'id': '518013791',
'ext': 'mp4',
'title': 'iPad Mini with Retina Display Review',
},
}
@classmethod
def _build_result(cls, video_id):
return cls.url_result('5min:%s' % video_id, cls.ie_key())
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info = self._download_json(
'https://syn.5min.com/handlers/SenseHandler.ashx?func=GetResults&'
'playlist=%s&url=https' % video_id,
video_id)['binding'][0]
second_id = compat_str(int(video_id[:-2]) + 1)
formats = []
for quality, height in [(1, 320), (2, 480), (4, 720), (8, 1080)]:
if any(r['ID'] == quality for r in info['Renditions']):
formats.append({
'format_id': compat_str(quality),
'url': 'http://avideos.5min.com/%s/%s/%s_%s.mp4' % (second_id[-3:], second_id, video_id, quality),
'height': height,
})
return {
'id': video_id,
'title': info['Title'],
'formats': formats,
}

View File

@ -0,0 +1,77 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
compat_parse_qs,
compat_urlparse,
)
class FranceCultureIE(InfoExtractor):
_VALID_URL = r'(?P<baseurl>http://(?:www\.)?franceculture\.fr/)player/reecouter\?play=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.franceculture.fr/player/reecouter?play=4795174',
'info_dict': {
'id': '4795174',
'ext': 'mp3',
'title': 'Rendez-vous au pays des geeks',
'vcodec': 'none',
'uploader': 'Colette Fellous',
'upload_date': '20140301',
'duration': 3601,
'thumbnail': r're:^http://www\.franceculture\.fr/.*/images/player/Carnet-nomade\.jpg$',
'description': 'Avec :Jean-Baptiste Péretié pour son documentaire sur Arte "La revanche des « geeks », une enquête menée aux Etats-Unis dans la S ...',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
baseurl = mobj.group('baseurl')
webpage = self._download_webpage(url, video_id)
params_code = self._search_regex(
r"<param name='movie' value='/sites/all/modules/rf/rf_player/swf/loader.swf\?([^']+)' />",
webpage, 'parameter code')
params = compat_parse_qs(params_code)
video_url = compat_urlparse.urljoin(baseurl, params['urlAOD'][0])
title = self._html_search_regex(
r'<h1 class="title[^"]+">(.+?)</h1>', webpage, 'title')
uploader = self._html_search_regex(
r'(?s)<div id="emission".*?<span class="author">(.*?)</span>',
webpage, 'uploader', fatal=False)
thumbnail_part = self._html_search_regex(
r'(?s)<div id="emission".*?<img src="([^"]+)"', webpage,
'thumbnail', fatal=False)
if thumbnail_part is None:
thumbnail = None
else:
thumbnail = compat_urlparse.urljoin(baseurl, thumbnail_part)
description = self._html_search_regex(
r'(?s)<p class="desc">(.*?)</p>', webpage, 'description')
info = json.loads(params['infoData'][0])[0]
duration = info.get('media_length')
upload_date_candidate = info.get('media_section5')
upload_date = (
upload_date_candidate
if (upload_date_candidate is not None and
re.match(r'[0-9]{8}$', upload_date_candidate))
else None)
return {
'id': video_id,
'url': video_url,
'vcodec': 'none' if video_url.lower().endswith('.mp3') else None,
'duration': duration,
'uploader': uploader,
'upload_date': upload_date,
'title': title,
'thumbnail': thumbnail,
'description': description,
}

View File

@ -25,6 +25,7 @@ from ..utils import (
from .brightcove import BrightcoveIE
from .ooyala import OoyalaIE
from .rutv import RUTVIE
from .smotri import SmotriIE
class GenericIE(InfoExtractor):
@ -81,6 +82,17 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Brightcove'],
},
{
'url': 'http://www.championat.com/video/football/v/87/87499.html',
'md5': 'fb973ecf6e4a78a67453647444222983',
'info_dict': {
'id': '3414141473001',
'ext': 'mp4',
'title': 'Видео. Удаление Дзагоева (ЦСКА)',
'description': 'Онлайн-трансляция матча ЦСКА - "Волга"',
'uploader': 'Championat',
},
},
# Direct link to a video
{
'url': 'http://media.w3.org/2010/05/sintel/trailer.mp4',
@ -171,7 +183,59 @@ class GenericIE(InfoExtractor):
'uploader': 'Ze Frank',
'description': 'md5:ddb2a40ecd6b6a147e400e535874947b',
}
}
},
# Embeded Ustream video
{
'url': 'http://www.american.edu/spa/pti/nsa-privacy-janus-2014.cfm',
'md5': '27b99cdb639c9b12a79bca876a073417',
'info_dict': {
'id': '45734260',
'ext': 'flv',
'uploader': 'AU SPA: The NSA and Privacy',
'title': 'NSA and Privacy Forum Debate featuring General Hayden and Barton Gellman'
}
},
# nowvideo embed hidden behind percent encoding
{
'url': 'http://www.waoanime.tv/the-super-dimension-fortress-macross-episode-1/',
'md5': '2baf4ddd70f697d94b1c18cf796d5107',
'info_dict': {
'id': '06e53103ca9aa',
'ext': 'flv',
'title': 'Macross Episode 001 Watch Macross Episode 001 onl',
'description': 'No description',
},
},
# arte embed
{
'url': 'http://www.tv-replay.fr/redirection/20-03-14/x-enius-arte-10753389.html',
'md5': '7653032cbb25bf6c80d80f217055fa43',
'info_dict': {
'id': '048195-004_PLUS7-F',
'ext': 'flv',
'title': 'X:enius',
'description': 'md5:d5fdf32ef6613cdbfd516ae658abf168',
'upload_date': '20140320',
},
'params': {
'skip_download': 'Requires rtmpdump'
}
},
# smotri embed
{
'url': 'http://rbctv.rbc.ru/archive/news/562949990879132.shtml',
'md5': 'ec40048448e9284c9a1de77bb188108b',
'info_dict': {
'id': 'v27008541fad',
'ext': 'mp4',
'title': 'Крым и Севастополь вошли в состав России',
'description': 'md5:fae01b61f68984c7bd2fa741e11c3175',
'duration': 900,
'upload_date': '20140318',
'uploader': 'rbctv_2012_4',
'uploader_id': 'rbctv_2012_4',
},
},
]
def report_download_webpage(self, video_id):
@ -260,13 +324,16 @@ class GenericIE(InfoExtractor):
if not parsed_url.scheme:
default_search = self._downloader.params.get('default_search')
if default_search is None:
default_search = 'auto'
default_search = 'auto_warning'
if default_search == 'auto':
if default_search in ('auto', 'auto_warning'):
if '/' in url:
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url)
else:
if default_search == 'auto_warning':
self._downloader.report_warning(
'Falling back to youtube search for %s . Set --default-search to "auto" to suppress this warning.' % url)
return self.url_result('ytsearch:' + url)
else:
assert ':' in default_search
@ -323,6 +390,11 @@ class GenericIE(InfoExtractor):
except compat_xml_parse_error:
pass
# Sometimes embedded video player is hidden behind percent encoding
# (e.g. https://github.com/rg3/youtube-dl/issues/2448)
# Unescaping the whole page allows to handle those cases in a generic way
webpage = compat_urllib_parse.unquote(webpage)
# it's tempting to parse this further, but you would
# have to take into account all the variations like
# Video Title - Site Name
@ -424,9 +496,10 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'))
# Look for Ooyala videos
mobj = re.search(r'player.ooyala.com/[^"?]+\?[^"]*?(?:embedCode|ec)=([^"&]+)', webpage)
mobj = (re.search(r'player.ooyala.com/[^"?]+\?[^"]*?(?:embedCode|ec)=(?P<ec>[^"&]+)', webpage) or
re.search(r'OO.Player.create\([\'"].*?[\'"],\s*[\'"](?P<ec>.{32})[\'"]', webpage))
if mobj is not None:
return OoyalaIE._build_url_result(mobj.group(1))
return OoyalaIE._build_url_result(mobj.group('ec'))
# Look for Aparat videos
mobj = re.search(r'<iframe src="(http://www\.aparat\.com/video/[^"]+)"', webpage)
@ -438,17 +511,18 @@ class GenericIE(InfoExtractor):
if mobj is not None:
return self.url_result(mobj.group(1), 'Mpora')
# Look for embedded NovaMov player
# Look for embedded NovaMov-based player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://(?:(?:embed|www)\.)?novamov\.com/embed\.php.+?)\1', webpage)
r'''(?x)<iframe[^>]+?src=(["\'])
(?P<url>http://(?:(?:embed|www)\.)?
(?:novamov\.com|
nowvideo\.(?:ch|sx|eu|at|ag|co)|
videoweed\.(?:es|com)|
movshare\.(?:net|sx|ag)|
divxstage\.(?:eu|net|ch|co|at|ag))
/embed\.php.+?)\1''', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'NovaMov')
# Look for embedded NowVideo player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://(?:(?:embed|www)\.)?nowvideo\.(?:ch|sx|eu)/embed\.php.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'NowVideo')
return self.url_result(mobj.group('url'))
# Look for embedded Facebook player
mobj = re.search(
@ -488,6 +562,30 @@ class GenericIE(InfoExtractor):
if rutv_url:
return self.url_result(rutv_url, 'RUTV')
# Look for embedded TED player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://embed\.ted\.com/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'TED')
# Look for embedded Ustream videos
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://www\.ustream\.tv/embed/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'Ustream')
# Look for embedded arte.tv player
mobj = re.search(
r'<script [^>]*?src="(?P<url>http://www\.arte\.tv/playerv2/embed[^"]+)"',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'ArteTVEmbed')
# Look for embedded smotri.com player
smotri_url = SmotriIE._extract_url(webpage)
if smotri_url:
return self.url_result(smotri_url, 'Smotri')
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:
@ -500,12 +598,6 @@ class GenericIE(InfoExtractor):
# Broaden the search a little bit: JWPlayer JS loader
mobj = re.search(r'[^A-Za-z0-9]?file["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage)
# Look for embedded TED player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://embed\.ted\.com/.+?)\1', webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'TED')
if mobj is None:
# Try to find twitter cards info
mobj = re.search(r'<meta (?:property|name)="twitter:player:stream" (?:content|value)="(.+?)"', webpage)

View File

@ -21,9 +21,10 @@ class HuffPostIE(InfoExtractor):
_TEST = {
'url': 'http://live.huffingtonpost.com/r/segment/legalese-it/52dd3e4b02a7602131000677',
'file': '52dd3e4b02a7602131000677.mp4',
'md5': '55f5e8981c1c80a64706a44b74833de8',
'info_dict': {
'id': '52dd3e4b02a7602131000677',
'ext': 'mp4',
'title': 'Legalese It! with @MikeSacksHP',
'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more. ',
'duration': 1549,

View File

@ -1,10 +1,8 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
determine_ext,
)
class IGNIE(InfoExtractor):
@ -14,52 +12,57 @@ class IGNIE(InfoExtractor):
"""
_VALID_URL = r'https?://.+?\.ign\.com/(?P<type>videos|show_videos|articles|(?:[^/]*/feature))(/.+)?/(?P<name_or_id>.+)'
IE_NAME = u'ign.com'
IE_NAME = 'ign.com'
_CONFIG_URL_TEMPLATE = 'http://www.ign.com/videos/configs/id/%s.config'
_DESCRIPTION_RE = [r'<span class="page-object-description">(.+?)</span>',
r'id="my_show_video">.*?<p>(.*?)</p>',
]
_DESCRIPTION_RE = [
r'<span class="page-object-description">(.+?)</span>',
r'id="my_show_video">.*?<p>(.*?)</p>',
]
_TESTS = [
{
u'url': u'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
u'file': u'8f862beef863986b2785559b9e1aa599.mp4',
u'md5': u'eac8bdc1890980122c3b66f14bdd02e9',
u'info_dict': {
u'title': u'The Last of Us Review',
u'description': u'md5:c8946d4260a4d43a00d5ae8ed998870c',
'url': 'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
'md5': 'eac8bdc1890980122c3b66f14bdd02e9',
'info_dict': {
'id': '8f862beef863986b2785559b9e1aa599',
'ext': 'mp4',
'title': 'The Last of Us Review',
'description': 'md5:c8946d4260a4d43a00d5ae8ed998870c',
}
},
{
u'url': u'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
u'playlist': [
'url': 'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
'playlist': [
{
u'file': u'5ebbd138523268b93c9141af17bec937.mp4',
u'info_dict': {
u'title': u'GTA 5 Video Review',
u'description': u'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
'info_dict': {
'id': '5ebbd138523268b93c9141af17bec937',
'ext': 'mp4',
'title': 'GTA 5 Video Review',
'description': 'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
},
},
{
u'file': u'638672ee848ae4ff108df2a296418ee2.mp4',
u'info_dict': {
u'title': u'26 Twisted Moments from GTA 5 in Slow Motion',
u'description': u'The twisted beauty of GTA 5 in stunning slow motion.',
'info_dict': {
'id': '638672ee848ae4ff108df2a296418ee2',
'ext': 'mp4',
'title': '26 Twisted Moments from GTA 5 in Slow Motion',
'description': 'The twisted beauty of GTA 5 in stunning slow motion.',
},
},
],
u'params': {
u'skip_download': True,
'params': {
'skip_download': True,
},
},
]
def _find_video_id(self, webpage):
res_id = [r'data-video-id="(.+?)"',
r'<object id="vid_(.+?)"',
r'<meta name="og:image" content=".*/(.+?)-(.+?)/.+.jpg"',
]
res_id = [
r'data-video-id="(.+?)"',
r'<object id="vid_(.+?)"',
r'<meta name="og:image" content=".*/(.+?)-(.+?)/.+.jpg"',
]
return self._search_regex(res_id, webpage, 'video id')
def _real_extract(self, url):
@ -68,7 +71,7 @@ class IGNIE(InfoExtractor):
page_type = mobj.group('type')
webpage = self._download_webpage(url, name_or_id)
if page_type == 'articles':
video_url = self._search_regex(r'var videoUrl = "(.+?)"', webpage, u'video url')
video_url = self._search_regex(r'var videoUrl = "(.+?)"', webpage, 'video url')
return self.url_result(video_url, ie='IGN')
elif page_type != 'video':
multiple_urls = re.findall(
@ -80,41 +83,37 @@ class IGNIE(InfoExtractor):
video_id = self._find_video_id(webpage)
result = self._get_video_info(video_id)
description = self._html_search_regex(self._DESCRIPTION_RE,
webpage, 'video description',
flags=re.DOTALL)
webpage, 'video description', flags=re.DOTALL)
result['description'] = description
return result
def _get_video_info(self, video_id):
config_url = self._CONFIG_URL_TEMPLATE % video_id
config = json.loads(self._download_webpage(config_url, video_id,
u'Downloading video info'))
config = self._download_json(config_url, video_id)
media = config['playlist']['media']
video_url = media['url']
return {'id': media['metadata']['videoId'],
'url': video_url,
'ext': determine_ext(video_url),
'title': media['metadata']['title'],
'thumbnail': media['poster'][0]['url'].replace('{size}', 'grande'),
}
return {
'id': media['metadata']['videoId'],
'url': media['url'],
'title': media['metadata']['title'],
'thumbnail': media['poster'][0]['url'].replace('{size}', 'grande'),
}
class OneUPIE(IGNIE):
"""Extractor for 1up.com, it uses the ign videos system."""
_VALID_URL = r'https?://gamevideos\.1up\.com/(?P<type>video)/id/(?P<name_or_id>.+)'
IE_NAME = '1up.com'
_DESCRIPTION_RE = r'<div id="vid_summary">(.+?)</div>'
_TEST = {
u'url': u'http://gamevideos.1up.com/video/id/34976',
u'file': u'34976.mp4',
u'md5': u'68a54ce4ebc772e4b71e3123d413163d',
u'info_dict': {
u'title': u'Sniper Elite V2 - Trailer',
u'description': u'md5:5d289b722f5a6d940ca3136e9dae89cf',
'url': 'http://gamevideos.1up.com/video/id/34976',
'md5': '68a54ce4ebc772e4b71e3123d413163d',
'info_dict': {
'id': '34976',
'ext': 'mp4',
'title': 'Sniper Elite V2 - Trailer',
'description': 'md5:5d289b722f5a6d940ca3136e9dae89cf',
}
}
@ -123,7 +122,6 @@ class OneUPIE(IGNIE):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
id = mobj.group('name_or_id')
result = super(OneUPIE, self)._real_extract(url)
result['id'] = id
result['id'] = mobj.group('name_or_id')
return result

View File

@ -3,6 +3,9 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
)
class InstagramIE(InfoExtractor):
@ -37,3 +40,68 @@ class InstagramIE(InfoExtractor):
'uploader_id': uploader_id,
'description': desc,
}
class InstagramUserIE(InfoExtractor):
_VALID_URL = r'http://instagram\.com/(?P<username>[^/]{2,})/?(?:$|[?#])'
IE_DESC = 'Instagram user profile'
IE_NAME = 'instagram:user'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
uploader_id = mobj.group('username')
entries = []
page_count = 0
media_url = 'http://instagram.com/%s/media' % uploader_id
while True:
page = self._download_json(
media_url, uploader_id,
note='Downloading page %d ' % (page_count + 1),
)
page_count += 1
for it in page['items']:
if it.get('type') != 'video':
continue
like_count = int_or_none(it.get('likes', {}).get('count'))
user = it.get('user', {})
formats = [{
'format_id': k,
'height': v.get('height'),
'width': v.get('width'),
'url': v['url'],
} for k, v in it['videos'].items()]
self._sort_formats(formats)
thumbnails_el = it.get('images', {})
thumbnail = thumbnails_el.get('thumbnail', {}).get('url')
title = it.get('caption', {}).get('text', it['id'])
entries.append({
'id': it['id'],
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'webpage_url': it.get('link'),
'uploader': user.get('full_name'),
'uploader_id': user.get('username'),
'like_count': like_count,
'timestamp': int_or_none(it.get('created_time')),
})
if not page['items']:
break
max_id = page['items'][-1]['id']
media_url = (
'http://instagram.com/%s/media?max_id=%s' % (
uploader_id, max_id))
return {
'_type': 'playlist',
'entries': entries,
'id': uploader_id,
'title': uploader_id,
}

View File

@ -1,9 +1,12 @@
from __future__ import unicode_literals
import json
import os
import re
from .common import InfoExtractor
from ..utils import (
compat_str,
ExtractorError,
formatSeconds,
)
@ -24,34 +27,31 @@ class JustinTVIE(InfoExtractor):
/?(?:\#.*)?$
"""
_JUSTIN_PAGE_LIMIT = 100
IE_NAME = u'justin.tv'
IE_NAME = 'justin.tv'
IE_DESC = 'justin.tv and twitch.tv'
_TEST = {
u'url': u'http://www.twitch.tv/thegamedevhub/b/296128360',
u'file': u'296128360.flv',
u'md5': u'ecaa8a790c22a40770901460af191c9a',
u'info_dict': {
u"upload_date": u"20110927",
u"uploader_id": 25114803,
u"uploader": u"thegamedevhub",
u"title": u"Beginner Series - Scripting With Python Pt.1"
'url': 'http://www.twitch.tv/thegamedevhub/b/296128360',
'md5': 'ecaa8a790c22a40770901460af191c9a',
'info_dict': {
'id': '296128360',
'ext': 'flv',
'upload_date': '20110927',
'uploader_id': 25114803,
'uploader': 'thegamedevhub',
'title': 'Beginner Series - Scripting With Python Pt.1'
}
}
def report_download_page(self, channel, offset):
"""Report attempt to download a single page of videos."""
self.to_screen(u'%s: Downloading video information from %d to %d' %
(channel, offset, offset + self._JUSTIN_PAGE_LIMIT))
# Return count of items, list of *valid* items
def _parse_page(self, url, video_id):
info_json = self._download_webpage(url, video_id,
u'Downloading video info JSON',
u'unable to download video info JSON')
'Downloading video info JSON',
'unable to download video info JSON')
response = json.loads(info_json)
if type(response) != list:
error_text = response.get('error', 'unknown error')
raise ExtractorError(u'Justin.tv API: %s' % error_text)
raise ExtractorError('Justin.tv API: %s' % error_text)
info = []
for clip in response:
video_url = clip['video_file_url']
@ -62,7 +62,7 @@ class JustinTVIE(InfoExtractor):
video_id = clip['id']
video_title = clip.get('title', video_id)
info.append({
'id': video_id,
'id': compat_str(video_id),
'url': video_url,
'title': video_title,
'uploader': clip.get('channel_name', video_uploader_id),
@ -74,8 +74,6 @@ class JustinTVIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'invalid URL: %s' % url)
api_base = 'http://api.justin.tv'
paged = False
@ -89,40 +87,41 @@ class JustinTVIE(InfoExtractor):
webpage = self._download_webpage(url, chapter_id)
m = re.search(r'PP\.archive_id = "([0-9]+)";', webpage)
if not m:
raise ExtractorError(u'Cannot find archive of a chapter')
raise ExtractorError('Cannot find archive of a chapter')
archive_id = m.group(1)
api = api_base + '/broadcast/by_chapter/%s.xml' % chapter_id
doc = self._download_xml(api, chapter_id,
note=u'Downloading chapter information',
errnote=u'Chapter information download failed')
doc = self._download_xml(
api, chapter_id,
note='Downloading chapter information',
errnote='Chapter information download failed')
for a in doc.findall('.//archive'):
if archive_id == a.find('./id').text:
break
else:
raise ExtractorError(u'Could not find chapter in chapter information')
raise ExtractorError('Could not find chapter in chapter information')
video_url = a.find('./video_file_url').text
video_ext = video_url.rpartition('.')[2] or u'flv'
video_ext = video_url.rpartition('.')[2] or 'flv'
chapter_api_url = u'https://api.twitch.tv/kraken/videos/c' + chapter_id
chapter_info_json = self._download_webpage(chapter_api_url, u'c' + chapter_id,
note='Downloading chapter metadata',
errnote='Download of chapter metadata failed')
chapter_info = json.loads(chapter_info_json)
chapter_api_url = 'https://api.twitch.tv/kraken/videos/c' + chapter_id
chapter_info = self._download_json(
chapter_api_url, 'c' + chapter_id,
note='Downloading chapter metadata',
errnote='Download of chapter metadata failed')
bracket_start = int(doc.find('.//bracket_start').text)
bracket_end = int(doc.find('.//bracket_end').text)
# TODO determine start (and probably fix up file)
# youtube-dl -v http://www.twitch.tv/firmbelief/c/1757457
#video_url += u'?start=' + TODO:start_timestamp
#video_url += '?start=' + TODO:start_timestamp
# bracket_start is 13290, but we want 51670615
self._downloader.report_warning(u'Chapter detected, but we can just download the whole file. '
u'Chapter starts at %s and ends at %s' % (formatSeconds(bracket_start), formatSeconds(bracket_end)))
self._downloader.report_warning('Chapter detected, but we can just download the whole file. '
'Chapter starts at %s and ends at %s' % (formatSeconds(bracket_start), formatSeconds(bracket_end)))
info = {
'id': u'c' + chapter_id,
'id': 'c' + chapter_id,
'url': video_url,
'ext': video_ext,
'title': chapter_info['title'],
@ -131,14 +130,12 @@ class JustinTVIE(InfoExtractor):
'uploader': chapter_info['channel']['display_name'],
'uploader_id': chapter_info['channel']['name'],
}
return [info]
return info
else:
video_id = mobj.group('videoid')
api = api_base + '/broadcast/by_archive/%s.json' % video_id
self.report_extraction(video_id)
info = []
entries = []
offset = 0
limit = self._JUSTIN_PAGE_LIMIT
while True:
@ -146,8 +143,12 @@ class JustinTVIE(InfoExtractor):
self.report_download_page(video_id, offset)
page_url = api + ('?offset=%d&limit=%d' % (offset, limit))
page_count, page_info = self._parse_page(page_url, video_id)
info.extend(page_info)
entries.extend(page_info)
if not paged or page_count != limit:
break
offset += limit
return info
return {
'_type': 'playlist',
'id': video_id,
'entries': entries,
}

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import os
import re
@ -11,22 +13,22 @@ from ..aes import (
aes_decrypt_text
)
class KeezMoviesIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>keezmovies\.com/video/.+?(?P<videoid>[0-9]+))(?:[/?&]|$)'
_VALID_URL = r'^https?://(?:www\.)?keezmovies\.com/video/.+?(?P<videoid>[0-9]+)(?:[/?&]|$)'
_TEST = {
u'url': u'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
u'file': u'1214711.mp4',
u'md5': u'6e297b7e789329923fcf83abb67c9289',
u'info_dict': {
u"title": u"Petite Asian Lady Mai Playing In Bathtub",
u"age_limit": 18,
'url': 'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
'file': '1214711.mp4',
'md5': '6e297b7e789329923fcf83abb67c9289',
'info_dict': {
'title': 'Petite Asian Lady Mai Playing In Bathtub',
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
@ -38,10 +40,10 @@ class KeezMoviesIE(InfoExtractor):
embedded_url = mobj.group(1)
return self.url_result(embedded_url)
video_title = self._html_search_regex(r'<h1 [^>]*>([^<]+)', webpage, u'title')
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'video_url=(.+?)&amp;', webpage, u'video_url'))
if webpage.find('encrypted=true')!=-1:
password = self._html_search_regex(r'video_title=(.+?)&amp;', webpage, u'password')
video_title = self._html_search_regex(r'<h1 [^>]*>([^<]+)', webpage, 'title')
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'video_url=(.+?)&amp;', webpage, 'video_url'))
if 'encrypted=true' in webpage:
password = self._html_search_regex(r'video_title=(.+?)&amp;', webpage, 'password')
video_url = aes_decrypt_text(video_url, password, 32).decode('utf-8')
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]

View File

@ -1,37 +1,39 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class KickStarterIE(InfoExtractor):
_VALID_URL = r'https?://www\.kickstarter\.com/projects/(?P<id>\d*)/.*'
_VALID_URL = r'https?://www\.kickstarter\.com/projects/(?P<id>[^/]*)/.*'
_TEST = {
u"url": u"https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant?ref=home_location",
u"file": u"1404461844.mp4",
u"md5": u"c81addca81327ffa66c642b5d8b08cab",
u"info_dict": {
u"title": u"Intersection: The Story of Josh Grant by Kyle Cowling",
'url': 'https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant?ref=home_location',
'md5': 'c81addca81327ffa66c642b5d8b08cab',
'info_dict': {
'id': '1404461844',
'ext': 'mp4',
'title': 'Intersection: The Story of Josh Grant by Kyle Cowling',
'description': 'A unique motocross documentary that examines the '
'life and mind of one of sports most elite athletes: Josh Grant.',
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
webpage_src = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r'data-video="(.*?)">',
webpage_src, u'video URL')
if 'mp4' in video_url:
ext = 'mp4'
else:
ext = 'flv'
video_title = self._html_search_regex(r"<title>(.*?)</title>",
webpage_src, u'title').rpartition(u'\u2014 Kickstarter')[0].strip()
video_url = self._search_regex(r'data-video-url="(.*?)"',
webpage, 'video URL')
video_title = self._html_search_regex(r'<title>(.*?)</title>',
webpage, 'title').rpartition('— Kickstarter')[0].strip()
results = [{
'id': video_id,
'url': video_url,
'title': video_title,
'ext': ext,
}]
return results
return {
'id': video_id,
'url': video_url,
'title': video_title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -9,104 +11,103 @@ from ..utils import (
ExtractorError,
)
class MetacafeIE(InfoExtractor):
"""Information Extractor for metacafe.com."""
_VALID_URL = r'(?:http://)?(?:www\.)?metacafe\.com/watch/([^/]+)/([^/]+)/.*'
class MetacafeIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?metacafe\.com/watch/([^/]+)/([^/]+)/.*'
_DISCLAIMER = 'http://www.metacafe.com/family_filter/'
_FILTER_POST = 'http://www.metacafe.com/f/index.php?inputType=filter&controllerGroup=user'
IE_NAME = u'metacafe'
IE_NAME = 'metacafe'
_TESTS = [
# Youtube video
{
u"add_ie": ["Youtube"],
u"url": u"http://metacafe.com/watch/yt-_aUehQsCQtM/the_electric_company_short_i_pbs_kids_go/",
u"file": u"_aUehQsCQtM.mp4",
u"info_dict": {
u"upload_date": u"20090102",
u"title": u"The Electric Company | \"Short I\" | PBS KIDS GO!",
u"description": u"md5:2439a8ef6d5a70e380c22f5ad323e5a8",
u"uploader": u"PBS",
u"uploader_id": u"PBS"
}
},
# Normal metacafe video
{
u'url': u'http://www.metacafe.com/watch/11121940/news_stuff_you_wont_do_with_your_playstation_4/',
u'md5': u'6e0bca200eaad2552e6915ed6fd4d9ad',
u'info_dict': {
u'id': u'11121940',
u'ext': u'mp4',
u'title': u'News: Stuff You Won\'t Do with Your PlayStation 4',
u'uploader': u'ign',
u'description': u'Sony released a massive FAQ on the PlayStation Blog detailing the PS4\'s capabilities and limitations.',
# Youtube video
{
'add_ie': ['Youtube'],
'url': 'http://metacafe.com/watch/yt-_aUehQsCQtM/the_electric_company_short_i_pbs_kids_go/',
'info_dict': {
'id': '_aUehQsCQtM',
'ext': 'mp4',
'upload_date': '20090102',
'title': 'The Electric Company | "Short I" | PBS KIDS GO!',
'description': 'md5:2439a8ef6d5a70e380c22f5ad323e5a8',
'uploader': 'PBS',
'uploader_id': 'PBS'
}
},
},
# AnyClip video
{
u"url": u"http://www.metacafe.com/watch/an-dVVXnuY7Jh77J/the_andromeda_strain_1971_stop_the_bomb_part_3/",
u"file": u"an-dVVXnuY7Jh77J.mp4",
u"info_dict": {
u"title": u"The Andromeda Strain (1971): Stop the Bomb Part 3",
u"uploader": u"anyclip",
u"description": u"md5:38c711dd98f5bb87acf973d573442e67",
# Normal metacafe video
{
'url': 'http://www.metacafe.com/watch/11121940/news_stuff_you_wont_do_with_your_playstation_4/',
'md5': '6e0bca200eaad2552e6915ed6fd4d9ad',
'info_dict': {
'id': '11121940',
'ext': 'mp4',
'title': 'News: Stuff You Won\'t Do with Your PlayStation 4',
'uploader': 'ign',
'description': 'Sony released a massive FAQ on the PlayStation Blog detailing the PS4\'s capabilities and limitations.',
},
},
},
# age-restricted video
{
u'url': u'http://www.metacafe.com/watch/5186653/bbc_internal_christmas_tape_79_uncensored_outtakes_etc/',
u'md5': u'98dde7c1a35d02178e8ab7560fe8bd09',
u'info_dict': {
u'id': u'5186653',
u'ext': u'mp4',
u'title': u'BBC INTERNAL Christmas Tape \'79 - UNCENSORED Outtakes, Etc.',
u'uploader': u'Dwayne Pipe',
u'description': u'md5:950bf4c581e2c059911fa3ffbe377e4b',
u'age_limit': 18,
# AnyClip video
{
'url': 'http://www.metacafe.com/watch/an-dVVXnuY7Jh77J/the_andromeda_strain_1971_stop_the_bomb_part_3/',
'info_dict': {
'id': 'an-dVVXnuY7Jh77J',
'ext': 'mp4',
'title': 'The Andromeda Strain (1971): Stop the Bomb Part 3',
'uploader': 'anyclip',
'description': 'md5:38c711dd98f5bb87acf973d573442e67',
},
},
},
# cbs video
{
u'url': u'http://www.metacafe.com/watch/cb-0rOxMBabDXN6/samsung_galaxy_note_2_samsungs_next_generation_phablet/',
u'info_dict': {
u'id': u'0rOxMBabDXN6',
u'ext': u'flv',
u'title': u'Samsung Galaxy Note 2: Samsung\'s next-generation phablet',
u'description': u'md5:54d49fac53d26d5a0aaeccd061ada09d',
u'duration': 129,
# age-restricted video
{
'url': 'http://www.metacafe.com/watch/5186653/bbc_internal_christmas_tape_79_uncensored_outtakes_etc/',
'md5': '98dde7c1a35d02178e8ab7560fe8bd09',
'info_dict': {
'id': '5186653',
'ext': 'mp4',
'title': 'BBC INTERNAL Christmas Tape \'79 - UNCENSORED Outtakes, Etc.',
'uploader': 'Dwayne Pipe',
'description': 'md5:950bf4c581e2c059911fa3ffbe377e4b',
'age_limit': 18,
},
},
u'params': {
# rtmp download
u'skip_download': True,
# cbs video
{
'url': 'http://www.metacafe.com/watch/cb-8VD4r_Zws8VP/open_this_is_face_the_nation_february_9/',
'info_dict': {
'id': '8VD4r_Zws8VP',
'ext': 'flv',
'title': 'Open: This is Face the Nation, February 9',
'description': 'md5:8a9ceec26d1f7ed6eab610834cc1a476',
'duration': 96,
},
'params': {
# rtmp download
'skip_download': True,
},
},
},
]
def report_disclaimer(self):
"""Report disclaimer retrieval."""
self.to_screen(u'Retrieving disclaimer')
self.to_screen('Retrieving disclaimer')
def _real_initialize(self):
# Retrieve disclaimer
self.report_disclaimer()
self._download_webpage(self._DISCLAIMER, None, False, u'Unable to retrieve disclaimer')
self._download_webpage(self._DISCLAIMER, None, False, 'Unable to retrieve disclaimer')
# Confirm age
disclaimer_form = {
'filters': '0',
'submit': "Continue - I'm over 18",
}
}
request = compat_urllib_request.Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self.report_age_confirmation()
self._download_webpage(request, None, False, u'Unable to confirm age')
self._download_webpage(request, None, False, 'Unable to confirm age')
def _real_extract(self, url):
# Extract id and simplified title from URL
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError('Invalid URL: %s' % url)
video_id = mobj.group(1)
@ -153,23 +154,24 @@ class MetacafeIE(InfoExtractor):
else:
mobj = re.search(r' name="flashvars" value="(.*?)"', webpage)
if mobj is None:
raise ExtractorError(u'Unable to extract media URL')
raise ExtractorError('Unable to extract media URL')
vardict = compat_parse_qs(mobj.group(1))
if 'mediaData' not in vardict:
raise ExtractorError(u'Unable to extract media URL')
mobj = re.search(r'"mediaURL":"(?P<mediaURL>http.*?)",(.*?)"key":"(?P<key>.*?)"', vardict['mediaData'][0])
raise ExtractorError('Unable to extract media URL')
mobj = re.search(
r'"mediaURL":"(?P<mediaURL>http.*?)",(.*?)"key":"(?P<key>.*?)"', vardict['mediaData'][0])
if mobj is None:
raise ExtractorError(u'Unable to extract media URL')
raise ExtractorError('Unable to extract media URL')
mediaURL = mobj.group('mediaURL').replace('\\/', '/')
video_url = '%s?__gda__=%s' % (mediaURL, mobj.group('key'))
video_ext = determine_ext(video_url)
video_title = self._html_search_regex(r'(?im)<title>(.*) - Video</title>', webpage, u'title')
video_title = self._html_search_regex(r'(?im)<title>(.*) - Video</title>', webpage, 'title')
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
video_uploader = self._html_search_regex(
r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);',
webpage, u'uploader nickname', fatal=False)
webpage, 'uploader nickname', fatal=False)
if re.search(r'"contentRating":"restricted"', webpage) is not None:
age_limit = 18
@ -177,14 +179,12 @@ class MetacafeIE(InfoExtractor):
age_limit = 0
return {
'_type': 'video',
'id': video_id,
'url': video_url,
'id': video_id,
'url': video_url,
'description': description,
'uploader': video_uploader,
'upload_date': None,
'title': video_title,
'title': video_title,
'thumbnail':thumbnail,
'ext': video_ext,
'ext': video_ext,
'age_limit': age_limit,
}

View File

@ -13,8 +13,9 @@ class MetacriticIE(InfoExtractor):
_TEST = {
'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',
'file': '3698222.mp4',
'info_dict': {
'id': '3698222',
'ext': 'mp4',
'title': 'inFamous: Second Son - inSide Sucker Punch: Smoke & Mirrors',
'description': 'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.',
'duration': 221,

View File

@ -14,7 +14,7 @@ from ..utils import (
class MooshareIE(InfoExtractor):
IE_NAME = 'mooshare'
IE_DESC = 'Mooshare.biz'
_VALID_URL = r'http://mooshare\.biz/(?P<id>[\da-z]{12})'
_VALID_URL = r'http://(?:www\.)?mooshare\.biz/(?P<id>[\da-z]{12})'
_TESTS = [
{

View File

@ -0,0 +1,47 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class MorningstarIE(InfoExtractor):
IE_DESC = 'morningstar.com'
_VALID_URL = r'https?://(?:www\.)?morningstar\.com/[cC]over/video[cC]enter\.aspx\?id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.morningstar.com/cover/videocenter.aspx?id=615869',
'md5': '6c0acface7a787aadc8391e4bbf7b0f5',
'info_dict': {
'id': '615869',
'ext': 'mp4',
'title': 'Get Ahead of the Curve on 2013 Taxes',
'description': "Vanguard's Joel Dickson on managing higher tax rates for high-income earners and fund capital-gain distributions in 2013.",
'thumbnail': r're:^https?://.*m(?:orning)?star\.com/.+thumb\.jpg$'
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<h1 id="titleLink">(.*?)</h1>', webpage, 'title')
video_url = self._html_search_regex(
r'<input type="hidden" id="hidVideoUrl" value="([^"]+)"',
webpage, 'video URL')
thumbnail = self._html_search_regex(
r'<input type="hidden" id="hidSnapshot" value="([^"]+)"',
webpage, 'thumbnail', fatal=False)
description = self._html_search_regex(
r'<div id="mstarDeck".*?>(.*?)</div>',
webpage, 'description', fatal=False)
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
'description': description,
}

View File

@ -0,0 +1,63 @@
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import json
import re
import time
from .common import InfoExtractor
from ..utils import (
compat_parse_qs,
compat_str,
int_or_none,
)
class MotorsportIE(InfoExtractor):
IE_DESC = 'motorsport.com'
_VALID_URL = r'http://www\.motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/(?:$|[?#])'
_TEST = {
'url': 'http://www.motorsport.com/f1/video/main-gallery/red-bull-racing-2014-rules-explained/',
'md5': '5592cb7c5005d9b2c163df5ac3dc04e4',
'info_dict': {
'id': '7063',
'ext': 'mp4',
'title': 'Red Bull Racing: 2014 Rules Explained',
'duration': 207,
'description': 'A new clip from Red Bull sees Daniel Ricciardo and Sebastian Vettel explain the 2014 Formula One regulations which are arguably the most complex the sport has ever seen.',
'uploader': 'rainiere',
'thumbnail': r're:^http://.*motorsport\.com/.+\.jpg$'
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
flashvars_code = self._html_search_regex(
r'<embed id="player".*?flashvars="([^"]+)"', webpage, 'flashvars')
flashvars = compat_parse_qs(flashvars_code)
params = json.loads(flashvars['parameters'][0])
e = compat_str(int(time.time()) + 24 * 60 * 60)
base_video_url = params['location'] + '?e=' + e
s = 'h3hg713fh32'
h = hashlib.md5((s + base_video_url).encode('utf-8')).hexdigest()
video_url = base_video_url + '&h=' + h
uploader = self._html_search_regex(
r'(?s)<span class="label">Video by: </span>(.*?)</a>', webpage,
'uploader', fatal=False)
return {
'id': params['video_id'],
'display_id': display_id,
'title': params['title'],
'url': video_url,
'description': params.get('description'),
'thumbnail': params.get('main_thumb'),
'duration': int_or_none(params.get('duration')),
'uploader': uploader,
}

View File

@ -0,0 +1,27 @@
from __future__ import unicode_literals
from .novamov import NovaMovIE
class MovShareIE(NovaMovIE):
IE_NAME = 'movshare'
IE_DESC = 'MovShare'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'movshare\.(?:net|sx|ag)'}
_HOST = 'www.movshare.net'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<strong>Title:</strong> ([^<]+)</p>'
_DESCRIPTION_REGEX = r'<strong>Description:</strong> ([^<]+)</p>'
_TEST = {
'url': 'http://www.movshare.net/video/559e28be54d96',
'md5': 'abd31a2132947262c50429e1d16c1bfd',
'info_dict': {
'id': '559e28be54d96',
'ext': 'flv',
'title': 'dissapeared image',
'description': 'optical illusion dissapeared image magic illusion',
}
}

View File

@ -4,9 +4,7 @@ import json
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
)
from ..utils import int_or_none
class MporaIE(InfoExtractor):
@ -20,7 +18,7 @@ class MporaIE(InfoExtractor):
'info_dict': {
'title': 'Katy Curd - Winter in the Forest',
'duration': 416,
'uploader': 'petenewman',
'uploader': 'Peter Newman Media',
},
}

View File

@ -0,0 +1,75 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class MusicPlayOnIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=100&play)=(?P<id>\d+)'
_TEST = {
'url': 'http://en.musicplayon.com/play?v=433377',
'info_dict': {
'id': '433377',
'ext': 'mp4',
'title': 'Rick Ross - Interview On Chelsea Lately (2014)',
'description': 'Rick Ross Interview On Chelsea Lately',
'duration': 342,
'uploader': 'ultrafish',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id)
title = self._og_search_title(page)
description = self._og_search_description(page)
thumbnail = self._og_search_thumbnail(page)
duration = self._html_search_meta('video:duration', page, 'duration', fatal=False)
view_count = self._og_search_property('count', page, fatal=False)
uploader = self._html_search_regex(
r'<div>by&nbsp;<a href="[^"]+" class="purple">([^<]+)</a></div>', page, 'uploader', fatal=False)
formats = [
{
'url': 'http://media0-eu-nl.musicplayon.com/stream-mobile?id=%s&type=.mp4' % video_id,
'ext': 'mp4',
}
]
manifest = self._download_webpage(
'http://en.musicplayon.com/manifest.m3u8?v=%s' % video_id, video_id, 'Downloading manifest')
for entry in manifest.split('#')[1:]:
if entry.startswith('EXT-X-STREAM-INF:'):
meta, url, _ = entry.split('\n')
params = dict(param.split('=') for param in meta.split(',')[1:])
formats.append({
'url': url,
'ext': 'mp4',
'tbr': int(params['BANDWIDTH']),
'width': int(params['RESOLUTION'].split('x')[1]),
'height': int(params['RESOLUTION'].split('x')[-1]),
'format_note': params['NAME'].replace('"', '').strip(),
})
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'duration': int_or_none(duration),
'view_count': int_or_none(view_count),
'formats': formats,
}

View File

@ -6,12 +6,13 @@ from .common import InfoExtractor
class NBAIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:watch\.|www\.)?nba\.com/(?:nba/)?video(/[^?]*?)(?:/index\.html)?(?:\?.*)?$'
_VALID_URL = r'https?://(?:watch\.|www\.)?nba\.com/(?:nba/)?video(?P<id>/[^?]*?)(?:/index\.html)?(?:\?.*)?$'
_TEST = {
'url': 'http://www.nba.com/video/games/nets/2012/12/04/0021200253-okc-bkn-recap.nba/index.html',
'file': u'0021200253-okc-bkn-recap.nba.mp4',
'md5': u'c0edcfc37607344e2ff8f13c378c88a4',
'info_dict': {
'id': '0021200253-okc-bkn-recap.nba',
'ext': 'mp4',
'description': 'Kevin Durant scores 32 points and dishes out six assists as the Thunder beat the Nets in Brooklyn.',
'title': 'Thunder vs. Nets',
},
@ -19,7 +20,7 @@ class NBAIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
@ -33,7 +34,6 @@ class NBAIE(InfoExtractor):
return {
'id': shortened_video_id,
'url': video_url,
'ext': 'mp4',
'title': title,
'description': description,
}

View File

@ -1,12 +1,10 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import socket
from .common import InfoExtractor
from ..utils import (
compat_http_client,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
@ -18,57 +16,54 @@ from ..utils import (
class NiconicoIE(InfoExtractor):
IE_NAME = u'niconico'
IE_DESC = u'ニコニコ動画'
IE_NAME = 'niconico'
IE_DESC = 'ニコニコ動画'
_TEST = {
u'url': u'http://www.nicovideo.jp/watch/sm22312215',
u'file': u'sm22312215.mp4',
u'md5': u'd1a75c0823e2f629128c43e1212760f9',
u'info_dict': {
u'title': u'Big Buck Bunny',
u'uploader': u'takuya0301',
u'uploader_id': u'2698420',
u'upload_date': u'20131123',
u'description': u'(c) copyright 2008, Blender Foundation / www.bigbuckbunny.org',
'url': 'http://www.nicovideo.jp/watch/sm22312215',
'md5': 'd1a75c0823e2f629128c43e1212760f9',
'info_dict': {
'id': 'sm22312215',
'ext': 'mp4',
'title': 'Big Buck Bunny',
'uploader': 'takuya0301',
'uploader_id': '2698420',
'upload_date': '20131123',
'description': '(c) copyright 2008, Blender Foundation / www.bigbuckbunny.org',
},
u'params': {
u'username': u'ydl.niconico@gmail.com',
u'password': u'youtube-dl',
'params': {
'username': 'ydl.niconico@gmail.com',
'password': 'youtube-dl',
},
}
_VALID_URL = r'^https?://(?:www\.|secure\.)?nicovideo\.jp/watch/([a-z][a-z][0-9]+)(?:.*)$'
_NETRC_MACHINE = 'niconico'
# If True it will raise an error if no login info is provided
_LOGIN_REQUIRED = True
def _real_initialize(self):
self._login()
def _login(self):
(username, password) = self._get_login_info()
# No authentication to be performed
if username is None:
if self._LOGIN_REQUIRED:
raise ExtractorError(u'No login info available, needed for using %s.' % self.IE_NAME, expected=True)
return False
# Login is required
raise ExtractorError('No login info available, needed for using %s.' % self.IE_NAME, expected=True)
# Log in
login_form_strs = {
u'mail': username,
u'password': password,
'mail': username,
'password': password,
}
# Convert to UTF-8 *before* urlencode because Python 2.x's urlencode
# chokes on unicode
login_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k,v in login_form_strs.items())
login_form = dict((k.encode('utf-8'), v.encode('utf-8')) for k, v in login_form_strs.items())
login_data = compat_urllib_parse.urlencode(login_form).encode('utf-8')
request = compat_urllib_request.Request(
u'https://secure.nicovideo.jp/secure/login', login_data)
'https://secure.nicovideo.jp/secure/login', login_data)
login_results = self._download_webpage(
request, u'', note=u'Logging in', errnote=u'Unable to log in')
request, None, note='Logging in', errnote='Unable to log in')
if re.search(r'(?i)<h1 class="mb8p4">Log in error</h1>', login_results) is not None:
self._downloader.report_warning(u'unable to log in: bad username or password')
self._downloader.report_warning('unable to log in: bad username or password')
return False
return True
@ -82,12 +77,12 @@ class NiconicoIE(InfoExtractor):
video_info = self._download_xml(
'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id, video_id,
note=u'Downloading video info page')
note='Downloading video info page')
# Get flv info
flv_info_webpage = self._download_webpage(
u'http://flapi.nicovideo.jp/api/getflv?v=' + video_id,
video_id, u'Downloading flv info')
'http://flapi.nicovideo.jp/api/getflv?v=' + video_id,
video_id, 'Downloading flv info')
video_real_url = compat_urlparse.parse_qs(flv_info_webpage)['url'][0]
# Start extracting information
@ -106,22 +101,22 @@ class NiconicoIE(InfoExtractor):
url = 'http://seiga.nicovideo.jp/api/user/info?id=' + video_uploader_id
try:
user_info = self._download_xml(
url, video_id, note=u'Downloading user information')
url, video_id, note='Downloading user information')
video_uploader = user_info.find('.//nickname').text
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self._downloader.report_warning(u'Unable to download user info webpage: %s' % compat_str(err))
except ExtractorError as err:
self._downloader.report_warning('Unable to download user info webpage: %s' % compat_str(err))
return {
'id': video_id,
'url': video_real_url,
'title': video_title,
'ext': video_extension,
'format': video_format,
'thumbnail': video_thumbnail,
'id': video_id,
'url': video_real_url,
'title': video_title,
'ext': video_extension,
'format': video_format,
'thumbnail': video_thumbnail,
'description': video_description,
'uploader': video_uploader,
'uploader': video_uploader,
'upload_date': video_upload_date,
'uploader_id': video_uploader_id,
'view_count': video_view_count,
'view_count': video_view_count,
'webpage_url': video_webpage_url,
}

View File

@ -7,9 +7,14 @@ from .common import InfoExtractor
class NineGagIE(InfoExtractor):
IE_NAME = '9gag'
_VALID_URL = r'^https?://(?:www\.)?9gag\.tv/v/(?P<id>[0-9]+)'
_VALID_URL = r'''(?x)^https?://(?:www\.)?9gag\.tv/
(?:
v/(?P<numid>[0-9]+)|
p/(?P<id>[a-zA-Z0-9]+)/(?P<display_id>[^?#/]+)
)
'''
_TEST = {
_TESTS = [{
"url": "http://9gag.tv/v/1912",
"info_dict": {
"id": "1912",
@ -20,17 +25,33 @@ class NineGagIE(InfoExtractor):
"thumbnail": "re:^https?://",
},
'add_ie': ['Youtube']
}
},
{
'url': 'http://9gag.tv/p/KklwM/alternate-banned-opening-scene-of-gravity?ref=fsidebar',
'info_dict': {
'id': 'KklwM',
'ext': 'mp4',
'display_id': 'alternate-banned-opening-scene-of-gravity',
"description": "While Gravity was a pretty awesome movie already, YouTuber Krishna Shenoi came up with a way to improve upon it, introducing a much better solution to Sandra Bullock's seemingly endless tumble in space. The ending is priceless.",
'title': "Banned Opening Scene Of \"Gravity\" That Changes The Whole Movie",
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = mobj.group('numid') or mobj.group('id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, display_id)
youtube_id = self._html_search_regex(
r'(?s)id="jsid-video-post-container".*?data-external-id="([^"]+)"',
webpage, 'video ID')
title = self._html_search_regex(
r'(?s)id="jsid-video-post-container".*?data-title="([^"]+)"',
webpage, 'title', default=None)
if not title:
title = self._og_search_title(webpage)
description = self._html_search_regex(
r'(?s)<div class="video-caption">.*?<p>(.*?)</p>', webpage,
'description', fatal=False)
@ -46,7 +67,8 @@ class NineGagIE(InfoExtractor):
'url': youtube_id,
'ie_key': 'Youtube',
'id': video_id,
'title': self._og_search_title(webpage),
'display_id': display_id,
'title': title,
'description': description,
'view_count': view_count,
'thumbnail': self._og_search_thumbnail(webpage),

View File

@ -13,7 +13,8 @@ class NovaMovIE(InfoExtractor):
IE_NAME = 'novamov'
IE_DESC = 'NovaMov'
_VALID_URL = r'http://(?:(?:www\.)?%(host)s/video/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<videoid>[a-z\d]{13})' % {'host': 'novamov\.com'}
_VALID_URL_TEMPLATE = r'http://(?:(?:www\.)?%(host)s/(?:file|video)/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<id>[a-z\d]{13})'
_VALID_URL = _VALID_URL_TEMPLATE % {'host': 'novamov\.com'}
_HOST = 'www.novamov.com'
@ -36,18 +37,17 @@ class NovaMovIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
video_id = mobj.group('id')
page = self._download_webpage(
'http://%s/video/%s' % (self._HOST, video_id), video_id, 'Downloading video page')
if re.search(self._FILE_DELETED_REGEX, page) is not None:
raise ExtractorError(u'Video %s does not exist' % video_id, expected=True)
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
filekey = self._search_regex(self._FILEKEY_REGEX, page, 'filekey')
title = self._html_search_regex(self._TITLE_REGEX, page, 'title', fatal=False)
description = self._html_search_regex(self._DESCRIPTION_REGEX, page, 'description', default='', fatal=False)
api_response = self._download_webpage(

View File

@ -7,7 +7,7 @@ class NowVideoIE(NovaMovIE):
IE_NAME = 'nowvideo'
IE_DESC = 'NowVideo'
_VALID_URL = r'http://(?:(?:www\.)?%(host)s/video/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<videoid>[a-z\d]{13})' % {'host': 'nowvideo\.(?:ch|sx|eu)'}
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:ch|sx|eu|at|ag|co)'}
_HOST = 'www.nowvideo.ch'

157
youtube_dl/extractor/ntv.py Normal file
View File

@ -0,0 +1,157 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
unescapeHTML
)
class NTVIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?ntv\.ru/(?P<id>.+)'
_TESTS = [
{
'url': 'http://www.ntv.ru/novosti/863142/',
'info_dict': {
'id': '746000',
'ext': 'flv',
'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
'duration': 136,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.ntv.ru/video/novosti/750370/',
'info_dict': {
'id': '750370',
'ext': 'flv',
'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
'duration': 172,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.ntv.ru/peredacha/segodnya/m23700/o232416',
'info_dict': {
'id': '747480',
'ext': 'flv',
'title': '«Сегодня». 21 марта 2014 года. 16:00 ',
'description': '«Сегодня». 21 марта 2014 года. 16:00 ',
'duration': 1496,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.ntv.ru/kino/Koma_film',
'info_dict': {
'id': '758100',
'ext': 'flv',
'title': 'Остросюжетный фильм «Кома»',
'description': 'Остросюжетный фильм «Кома»',
'duration': 5592,
},
'params': {
# rtmp download
'skip_download': True,
},
},
{
'url': 'http://www.ntv.ru/serial/Delo_vrachey/m31760/o233916/',
'info_dict': {
'id': '751482',
'ext': 'flv',
'title': '«Дело врачей»: «Деревце жизни»',
'description': '«Дело врачей»: «Деревце жизни»',
'duration': 2590,
},
'params': {
# rtmp download
'skip_download': True,
},
},
]
_VIDEO_ID_REGEXES = [
r'<meta property="og:url" content="http://www\.ntv\.ru/video/(\d+)',
r'<video embed=[^>]+><id>(\d+)</id>',
r'<video restriction[^>]+><key>(\d+)</key>'
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id, 'Downloading page')
for pattern in self._VIDEO_ID_REGEXES:
mobj = re.search(pattern, page)
if mobj:
break
if not mobj:
raise ExtractorError('No media links available for %s' % video_id)
video_id = mobj.group(1)
player = self._download_xml('http://www.ntv.ru/vi%s/' % video_id, video_id, 'Downloading video XML')
title = unescapeHTML(player.find('./data/title').text)
description = unescapeHTML(player.find('./data/description').text)
video = player.find('./data/video')
video_id = video.find('./id').text
thumbnail = video.find('./splash').text
duration = int(video.find('./totaltime').text)
view_count = int(video.find('./views').text)
puid22 = video.find('./puid22').text
apps = {
'4': 'video1',
'7': 'video2',
}
app = apps[puid22] if puid22 in apps else apps['4']
formats = []
for format_id in ['', 'hi', 'webm']:
file = video.find('./%sfile' % format_id)
if file is None:
continue
size = video.find('./%ssize' % format_id)
formats.append({
'url': 'rtmp://media.ntv.ru/%s' % app,
'app': app,
'play_path': file.text,
'rtmp_conn': 'B:1',
'player_url': 'http://www.ntv.ru/swf/vps1.swf?update=20131128',
'page_url': 'http://www.ntv.ru',
'flash_ver': 'LNX 11,2,202,341',
'rtmp_live': True,
'ext': 'flv',
'filesize': int(size.text),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'view_count': view_count,
'formats': formats,
}

View File

@ -0,0 +1,40 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import datetime
import re
from .common import InfoExtractor
# audios on oe1.orf.at are only available for 7 days, so we can't
# add tests.
class OE1IE(InfoExtractor):
IE_DESC = 'oe1.orf.at'
_VALID_URL = r'http://oe1\.orf\.at/programm/(?P<id>[0-9]+)'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
show_id = mobj.group('id')
data = self._download_json(
'http://oe1.orf.at/programm/%s/konsole' % show_id,
show_id
)
timestamp = datetime.datetime.strptime('%s %s' % (
data['item']['day_label'],
data['item']['time']
), '%d.%m.%Y %H:%M')
unix_timestamp = calendar.timegm(timestamp.utctimetuple())
return {
'id': show_id,
'title': data['item']['title'],
'url': data['item']['url_stream'],
'ext': 'mp3',
'description': data['item'].get('info'),
'timestamp': unix_timestamp
}

View File

@ -1,20 +1,23 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import unescapeHTML
class OoyalaIE(InfoExtractor):
_VALID_URL = r'https?://.+?\.ooyala\.com/.*?(?:embedCode|ec)=(?P<id>.+?)(&|$)'
_VALID_URL = r'(?:ooyala:|https?://.+?\.ooyala\.com/.*?(?:embedCode|ec)=)(?P<id>.+?)(&|$)'
_TEST = {
# From http://it.slashdot.org/story/13/04/25/178216/recovering-data-from-broken-hard-drives-and-ssds-video
u'url': u'http://player.ooyala.com/player.js?embedCode=pxczE2YjpfHfn1f3M-ykG_AmJRRn0PD8',
u'file': u'pxczE2YjpfHfn1f3M-ykG_AmJRRn0PD8.mp4',
u'md5': u'3f5cceb3a7bf461d6c29dc466cf8033c',
u'info_dict': {
u'title': u'Explaining Data Recovery from Hard Drives and SSDs',
u'description': u'How badly damaged does a drive have to be to defeat Russell and his crew? Apparently, smashed to bits.',
'url': 'http://player.ooyala.com/player.js?embedCode=pxczE2YjpfHfn1f3M-ykG_AmJRRn0PD8',
'md5': '3f5cceb3a7bf461d6c29dc466cf8033c',
'info_dict': {
'id': 'pxczE2YjpfHfn1f3M-ykG_AmJRRn0PD8',
'ext': 'mp4',
'title': 'Explaining Data Recovery from Hard Drives and SSDs',
'description': 'How badly damaged does a drive have to be to defeat Russell and his crew? Apparently, smashed to bits.',
},
}
@ -28,13 +31,14 @@ class OoyalaIE(InfoExtractor):
ie=cls.ie_key())
def _extract_result(self, info, more_info):
return {'id': info['embedCode'],
'ext': 'mp4',
'title': unescapeHTML(info['title']),
'url': info.get('ipad_url') or info['url'],
'description': unescapeHTML(more_info['description']),
'thumbnail': more_info['promo'],
}
return {
'id': info['embedCode'],
'ext': 'mp4',
'title': unescapeHTML(info['title']),
'url': info.get('ipad_url') or info['url'],
'description': unescapeHTML(more_info['description']),
'thumbnail': more_info['promo'],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@ -42,22 +46,23 @@ class OoyalaIE(InfoExtractor):
player_url = 'http://player.ooyala.com/player.js?embedCode=%s' % embedCode
player = self._download_webpage(player_url, embedCode)
mobile_url = self._search_regex(r'mobile_player_url="(.+?)&device="',
player, u'mobile player url')
player, 'mobile player url')
mobile_player = self._download_webpage(mobile_url, embedCode)
videos_info = self._search_regex(
r'var streams=window.oo_testEnv\?\[\]:eval\("\((\[{.*?}\])\)"\);',
mobile_player, u'info').replace('\\"','"')
videos_more_info = self._search_regex(r'eval\("\(({.*?\\"promo\\".*?})\)"', mobile_player, u'more info').replace('\\"','"')
mobile_player, 'info').replace('\\"','"')
videos_more_info = self._search_regex(r'eval\("\(({.*?\\"promo\\".*?})\)"', mobile_player, 'more info').replace('\\"','"')
videos_info = json.loads(videos_info)
videos_more_info =json.loads(videos_more_info)
if videos_more_info.get('lineup'):
videos = [self._extract_result(info, more_info) for (info, more_info) in zip(videos_info, videos_more_info['lineup'])]
return {'_type': 'playlist',
'id': embedCode,
'title': unescapeHTML(videos_more_info['title']),
'entries': videos,
}
return {
'_type': 'playlist',
'id': embedCode,
'title': unescapeHTML(videos_more_info['title']),
'entries': videos,
}
else:
return self._extract_result(videos_info[0], videos_more_info)

View File

@ -0,0 +1,53 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class ParliamentLiveUKIE(InfoExtractor):
IE_NAME = 'parliamentlive.tv'
IE_DESC = 'UK parliament videos'
_VALID_URL = r'https?://www\.parliamentlive\.tv/Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia',
'info_dict': {
'id': '15121',
'ext': 'asf',
'title': 'hoc home affairs committee, 18 mar 2014.pm',
'description': 'md5:033b3acdf83304cd43946b2d5e5798d1',
},
'params': {
'skip_download': True, # Requires mplayer (mms)
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
asx_url = self._html_search_regex(
r'embed.*?src="([^"]+)" name="MediaPlayer"', webpage,
'metadata URL')
asx = self._download_xml(asx_url, video_id, 'Downloading ASX metadata')
video_url = asx.find('.//REF').attrib['HREF']
title = self._search_regex(
r'''(?x)player\.setClipDetails\(
(?:(?:[0-9]+|"[^"]+"),\s*){2}
"([^"]+",\s*"[^"]+)"
''',
webpage, 'title').replace('", "', ', ')
description = self._html_search_regex(
r'(?s)<span id="MainContentPlaceHolder_CaptionsBlock_WitnessInfo">(.*?)</span>',
webpage, 'description')
return {
'id': video_id,
'ext': 'asf',
'url': video_url,
'title': title,
'description': description,
}

View File

@ -1,44 +1,81 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import compat_urllib_parse
from ..utils import int_or_none
class PornHdIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?pornhd\.com/(?:[a-z]{2,4}/)?videos/(?P<video_id>[0-9]+)/(?P<video_title>.+)'
_VALID_URL = r'http://(?:www\.)?pornhd\.com/(?:[a-z]{2,4}/)?videos/(?P<id>\d+)'
_TEST = {
'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
'file': '1962.flv',
'md5': '35272469887dca97abd30abecc6cdf75',
'md5': '956b8ca569f7f4d8ec563e2c41598441',
'info_dict': {
"title": "sierra-day-gets-his-cum-all-over-herself-hd-porn-video",
"age_limit": 18,
'id': '1962',
'ext': 'mp4',
'title': 'Sierra loves doing laundry',
'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294',
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
video_title = mobj.group('video_title')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
next_url = self._html_search_regex(
r'&hd=(http.+?)&', webpage, 'video URL')
next_url = compat_urllib_parse.unquote(next_url)
title = self._og_search_title(webpage)
TITLE_SUFFIX = ' porn HD Video | PornHD.com '
if title.endswith(TITLE_SUFFIX):
title = title[:-len(TITLE_SUFFIX)]
video_url = self._download_webpage(
next_url, video_id, note='Retrieving video URL',
errnote='Could not retrieve video URL')
age_limit = 18
description = self._html_search_regex(
r'<div class="description">([^<]+)</div>', webpage, 'description', fatal=False)
view_count = int_or_none(self._html_search_regex(
r'(\d+) views </span>', webpage, 'view count', fatal=False))
formats = [
{
'url': format_url,
'ext': format.lower(),
'format_id': '%s-%s' % (format.lower(), quality.lower()),
'quality': 1 if quality.lower() == 'high' else 0,
} for format, quality, format_url in re.findall(
r'var __video([\da-zA-Z]+?)(Low|High)StreamUrl = \'(http://.+?)\?noProxy=1\'', webpage)
]
mobj = re.search(r'flashVars = (?P<flashvars>{.+?});', webpage)
if mobj:
flashvars = json.loads(mobj.group('flashvars'))
formats.extend([
{
'url': flashvars['hashlink'].replace('?noProxy=1', ''),
'ext': 'flv',
'format_id': 'flv-low',
'quality': 0,
},
{
'url': flashvars['hd'].replace('?noProxy=1', ''),
'ext': 'flv',
'format_id': 'flv-high',
'quality': 1,
}
])
thumbnail = flashvars['urlWallpaper']
else:
thumbnail = self._og_search_thumbnail(webpage)
self._sort_formats(formats)
return {
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': video_title,
'age_limit': age_limit,
'title': title,
'description': description,
'thumbnail': thumbnail,
'view_count': view_count,
'formats': formats,
'age_limit': 18,
}

View File

@ -8,6 +8,7 @@ from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urllib_parse,
str_to_int,
)
from ..aes import (
aes_decrypt_text
@ -27,6 +28,12 @@ class PornHubIE(InfoExtractor):
}
}
def _extract_count(self, pattern, webpage, name):
count = self._html_search_regex(pattern, webpage, '%s count' % name, fatal=False)
if count:
count = str_to_int(count)
return count
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
@ -37,11 +44,19 @@ class PornHubIE(InfoExtractor):
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
video_uploader = self._html_search_regex(r'<b>From: </b>(?:\s|<[^>]*>)*(.+?)<', webpage, 'uploader', fatal=False)
video_uploader = self._html_search_regex(
r'(?s)<div class="video-info-row">\s*From:&nbsp;.+?<(?:a href="/users/|<span class="username)[^>]+>(.+?)<',
webpage, 'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, 'thumbnail', fatal=False)
if thumbnail:
thumbnail = compat_urllib_parse.unquote(thumbnail)
view_count = self._extract_count(r'<span class="count">([\d,\.]+)</span> views', webpage, 'view')
like_count = self._extract_count(r'<span class="votesUp">([\d,\.]+)</span>', webpage, 'like')
dislike_count = self._extract_count(r'<span class="votesDown">([\d,\.]+)</span>', webpage, 'dislike')
comment_count = self._extract_count(
r'All comments \(<var class="videoCommentCount">([\d,\.]+)</var>', webpage, 'comment')
video_urls = list(map(compat_urllib_parse.unquote , re.findall(r'"quality_[0-9]{3}p":"([^"]+)', webpage)))
if webpage.find('"encrypted":true') != -1:
password = compat_urllib_parse.unquote_plus(self._html_search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
@ -77,6 +92,10 @@ class PornHubIE(InfoExtractor):
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'view_count': view_count,
'like_count': like_count,
'dislike_count': dislike_count,
'comment_count': comment_count,
'formats': formats,
'age_limit': 18,
}

View File

@ -160,6 +160,7 @@ class ProSiebenSat1IE(InfoExtractor):
_CLIPID_REGEXES = [
r'"clip_id"\s*:\s+"(\d+)"',
r'clipid: "(\d+)"',
r'clipId=(\d+)',
]
_TITLE_REGEXES = [
r'<h2 class="subtitle" itemprop="name">\s*(.+?)</h2>',

View File

@ -1,3 +1,5 @@
from __future__ import unicode_literals
import re
import os
@ -5,45 +7,51 @@ from .common import InfoExtractor
class PyvideoIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?pyvideo\.org/video/(?P<id>\d+)/(.*)'
_TESTS = [{
u'url': u'http://pyvideo.org/video/1737/become-a-logging-expert-in-30-minutes',
u'file': u'24_4WWkSmNo.mp4',
u'md5': u'de317418c8bc76b1fd8633e4f32acbc6',
u'info_dict': {
u"title": u"Become a logging expert in 30 minutes",
u"description": u"md5:9665350d466c67fb5b1598de379021f7",
u"upload_date": u"20130320",
u"uploader": u"NextDayVideo",
u"uploader_id": u"NextDayVideo",
_VALID_URL = r'http://(?:www\.)?pyvideo\.org/video/(?P<id>\d+)/(.*)'
_TESTS = [
{
'url': 'http://pyvideo.org/video/1737/become-a-logging-expert-in-30-minutes',
'md5': 'de317418c8bc76b1fd8633e4f32acbc6',
'info_dict': {
'id': '24_4WWkSmNo',
'ext': 'mp4',
'title': 'Become a logging expert in 30 minutes',
'description': 'md5:9665350d466c67fb5b1598de379021f7',
'upload_date': '20130320',
'uploader': 'NextDayVideo',
'uploader_id': 'NextDayVideo',
},
'add_ie': ['Youtube'],
},
u'add_ie': ['Youtube'],
},
{
u'url': u'http://pyvideo.org/video/2542/gloriajw-spotifywitherikbernhardsson182m4v',
u'md5': u'5fe1c7e0a8aa5570330784c847ff6d12',
u'info_dict': {
u'id': u'2542',
u'ext': u'm4v',
u'title': u'Gloriajw-SpotifyWithErikBernhardsson182',
{
'url': 'http://pyvideo.org/video/2542/gloriajw-spotifywitherikbernhardsson182m4v',
'md5': '5fe1c7e0a8aa5570330784c847ff6d12',
'info_dict': {
'id': '2542',
'ext': 'm4v',
'title': 'Gloriajw-SpotifyWithErikBernhardsson182',
},
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', webpage)
webpage = self._download_webpage(url, video_id)
m_youtube = re.search(r'(https?://www\.youtube\.com/watch\?v=.*)', webpage)
if m_youtube is not None:
return self.url_result(m_youtube.group(1), 'Youtube')
title = self._html_search_regex(r'<div class="section">.*?<h3>([^>]+?)</h3>',
webpage, u'title', flags=re.DOTALL)
video_url = self._search_regex([r'<source src="(.*?)"',
r'<dt>Download</dt>.*?<a href="(.+?)"'],
webpage, u'video url', flags=re.DOTALL)
title = self._html_search_regex(
r'<div class="section">.*?<h3(?:\s+class="[^"]*")?>([^>]+?)</h3>',
webpage, 'title', flags=re.DOTALL)
video_url = self._search_regex(
[r'<source src="(.*?)"', r'<dt>Download</dt>.*?<a href="(.+?)"'],
webpage, 'video url', flags=re.DOTALL)
return {
'id': video_id,
'title': os.path.splitext(title)[0],

View File

@ -1,4 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
@ -6,16 +8,17 @@ from .common import InfoExtractor
class RadioFranceIE(InfoExtractor):
_VALID_URL = r'^https?://maison\.radiofrance\.fr/radiovisions/(?P<id>[^?#]+)'
IE_NAME = u'radiofrance'
IE_NAME = 'radiofrance'
_TEST = {
u'url': u'http://maison.radiofrance.fr/radiovisions/one-one',
u'file': u'one-one.ogg',
u'md5': u'bdbb28ace95ed0e04faab32ba3160daf',
u'info_dict': {
u"title": u"One to one",
u"description": u"Plutôt que d'imaginer la radio de demain comme technologie ou comme création de contenu, je veux montrer que quelles que soient ses évolutions, j'ai l'intime conviction que la radio continuera d'être un grand média de proximité pour les auditeurs.",
u"uploader": u"Thomas Hercouët",
'url': 'http://maison.radiofrance.fr/radiovisions/one-one',
'md5': 'bdbb28ace95ed0e04faab32ba3160daf',
'info_dict': {
'id': 'one-one',
'ext': 'ogg',
"title": "One to one",
"description": "Plutôt que d'imaginer la radio de demain comme technologie ou comme création de contenu, je veux montrer que quelles que soient ses évolutions, j'ai l'intime conviction que la radio continuera d'être un grand média de proximité pour les auditeurs.",
"uploader": "Thomas Hercouët",
},
}
@ -24,27 +27,28 @@ class RadioFranceIE(InfoExtractor):
video_id = m.group('id')
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<h1>(.*?)</h1>', webpage, u'title')
title = self._html_search_regex(r'<h1>(.*?)</h1>', webpage, 'title')
description = self._html_search_regex(
r'<div class="bloc_page_wrapper"><div class="text">(.*?)</div>',
webpage, u'description', fatal=False)
webpage, 'description', fatal=False)
uploader = self._html_search_regex(
r'<div class="credit">&nbsp;&nbsp;&copy;&nbsp;(.*?)</div>',
webpage, u'uploader', fatal=False)
webpage, 'uploader', fatal=False)
formats_str = self._html_search_regex(
r'class="jp-jplayer[^"]*" data-source="([^"]+)">',
webpage, u'audio URLs')
webpage, 'audio URLs')
formats = [
{
'format_id': fm[0],
'url': fm[1],
'vcodec': 'none',
'preference': i,
}
for fm in
re.findall(r"([a-z0-9]+)\s*:\s*'([^']+)'", formats_str)
for i, fm in
enumerate(re.findall(r"([a-z0-9]+)\s*:\s*'([^']+)'", formats_str))
]
# No sorting, we don't know any more about these formats
self._sort_formats(formats)
return {
'id': video_id,

View File

@ -18,7 +18,7 @@ class Ro220IE(InfoExtractor):
'md5': '03af18b73a07b4088753930db7a34add',
'info_dict': {
"title": "Luati-le Banii sez 4 ep 1",
"description": "Iata-ne reveniti dupa o binemeritata vacanta. Va astept si pe Facebook cu pareri si comentarii.",
"description": "re:^Iata-ne reveniti dupa o binemeritata vacanta\. +Va astept si pe Facebook cu pareri si comentarii.$",
}
}

View File

@ -1,5 +1,6 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import unified_strdate, determine_ext
@ -9,41 +10,44 @@ class RoxwelIE(InfoExtractor):
_VALID_URL = r'https?://www\.roxwel\.com/player/(?P<filename>.+?)(\.|\?|$)'
_TEST = {
u'url': u'http://www.roxwel.com/player/passionpittakeawalklive.html',
u'file': u'passionpittakeawalklive.flv',
u'md5': u'd9dea8360a1e7d485d2206db7fe13035',
u'info_dict': {
u'title': u'Take A Walk (live)',
u'uploader': u'Passion Pit',
u'description': u'Passion Pit performs "Take A Walk\" live at The Backyard in Austin, Texas. ',
'url': 'http://www.roxwel.com/player/passionpittakeawalklive.html',
'info_dict': {
'id': 'passionpittakeawalklive',
'ext': 'flv',
'title': 'Take A Walk (live)',
'uploader': 'Passion Pit',
'uploader_id': 'passionpit',
'upload_date': '20120928',
'description': 'Passion Pit performs "Take A Walk\" live at The Backyard in Austin, Texas. ',
},
u'skip': u'Requires rtmpdump',
'params': {
# rtmp download
'skip_download': True,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
filename = mobj.group('filename')
info_url = 'http://www.roxwel.com/api/videos/%s' % filename
info_page = self._download_webpage(info_url, filename,
u'Downloading video info')
info = self._download_json(info_url, filename)
self.report_extraction(filename)
info = json.loads(info_page)
rtmp_rates = sorted([int(r.replace('flv_', '')) for r in info['media_rates'] if r.startswith('flv_')])
best_rate = rtmp_rates[-1]
url_page_url = 'http://roxwel.com/pl_one_time.php?filename=%s&quality=%s' % (filename, best_rate)
rtmp_url = self._download_webpage(url_page_url, filename, u'Downloading video url')
rtmp_url = self._download_webpage(url_page_url, filename, 'Downloading video url')
ext = determine_ext(rtmp_url)
if ext == 'f4v':
rtmp_url = rtmp_url.replace(filename, 'mp4:%s' % filename)
return {'id': filename,
'title': info['title'],
'url': rtmp_url,
'ext': 'flv',
'description': info['description'],
'thumbnail': info.get('player_image_url') or info.get('image_url_large'),
'uploader': info['artist'],
'uploader_id': info['artistname'],
'upload_date': unified_strdate(info['dbdate']),
}
return {
'id': filename,
'title': info['title'],
'url': rtmp_url,
'ext': 'flv',
'description': info['description'],
'thumbnail': info.get('player_image_url') or info.get('image_url_large'),
'uploader': info['artist'],
'uploader_id': info['artistname'],
'upload_date': unified_strdate(info['dbdate']),
}

154
youtube_dl/extractor/rts.py Normal file
View File

@ -0,0 +1,154 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
parse_iso8601,
unescapeHTML,
compat_str,
)
class RTSIE(InfoExtractor):
IE_DESC = 'RTS.ch'
_VALID_URL = r'^https?://(?:www\.)?rts\.ch/(?:[^/]+/){2,}(?P<id>[0-9]+)-.*?\.html'
_TESTS = [
{
'url': 'http://www.rts.ch/archives/tv/divers/3449373-les-enfants-terribles.html',
'md5': '753b877968ad8afaeddccc374d4256a5',
'info_dict': {
'id': '3449373',
'ext': 'mp4',
'duration': 1488,
'title': 'Les Enfants Terribles',
'description': 'France Pommier et sa soeur Luce Feral, les deux filles de ce groupe de 5.',
'uploader': 'Divers',
'upload_date': '19680921',
'timestamp': -40280400,
'thumbnail': 're:^https?://.*\.image'
},
},
{
'url': 'http://www.rts.ch/emissions/passe-moi-les-jumelles/5624067-entre-ciel-et-mer.html',
'md5': 'c148457a27bdc9e5b1ffe081a7a8337b',
'info_dict': {
'id': '5624067',
'ext': 'mp4',
'duration': 3720,
'title': 'Les yeux dans les cieux - Mon homard au Canada',
'description': 'md5:d22ee46f5cc5bac0912e5a0c6d44a9f7',
'uploader': 'Passe-moi les jumelles',
'upload_date': '20140404',
'timestamp': 1396635300,
'thumbnail': 're:^https?://.*\.image'
},
},
{
'url': 'http://www.rts.ch/video/sport/hockey/5745975-1-2-kloten-fribourg-5-2-second-but-pour-gotteron-par-kwiatowski.html',
'md5': 'b4326fecd3eb64a458ba73c73e91299d',
'info_dict': {
'id': '5745975',
'ext': 'mp4',
'duration': 48,
'title': '1/2, Kloten - Fribourg (5-2): second but pour Gottéron par Kwiatowski',
'description': 'Hockey - Playoff',
'uploader': 'Hockey',
'upload_date': '20140403',
'timestamp': 1396556882,
'thumbnail': 're:^https?://.*\.image'
},
'skip': 'Blocked outside Switzerland',
},
{
'url': 'http://www.rts.ch/video/info/journal-continu/5745356-londres-cachee-par-un-epais-smog.html',
'md5': '9bb06503773c07ce83d3cbd793cebb91',
'info_dict': {
'id': '5745356',
'ext': 'mp4',
'duration': 33,
'title': 'Londres cachée par un épais smog',
'description': 'Un important voile de smog recouvre Londres depuis mercredi, provoqué par la pollution et du sable du Sahara.',
'uploader': 'Le Journal en continu',
'upload_date': '20140403',
'timestamp': 1396537322,
'thumbnail': 're:^https?://.*\.image'
},
},
{
'url': 'http://www.rts.ch/audio/couleur3/programmes/la-belle-video-de-stephane-laurenceau/5706148-urban-hippie-de-damien-krisl-03-04-2014.html',
'md5': 'dd8ef6a22dff163d063e2a52bc8adcae',
'info_dict': {
'id': '5706148',
'ext': 'mp3',
'duration': 123,
'title': '"Urban Hippie", de Damien Krisl',
'description': 'Des Hippies super glam.',
'upload_date': '20140403',
'timestamp': 1396551600,
},
},
]
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
def download_json(internal_id):
return self._download_json(
'http://www.rts.ch/a/%s.html?f=json/article' % internal_id,
video_id)
all_info = download_json(video_id)
# video_id extracted out of URL is not always a real id
if 'video' not in all_info and 'audio' not in all_info:
page = self._download_webpage(url, video_id)
internal_id = self._html_search_regex(
r'<(?:video|audio) data-id="([0-9]+)"', page,
'internal video id')
all_info = download_json(internal_id)
info = all_info['video']['JSONinfo'] if 'video' in all_info else all_info['audio']
upload_timestamp = parse_iso8601(info.get('broadcast_date'))
duration = info.get('duration') or info.get('cutout') or info.get('cutduration')
if isinstance(duration, compat_str):
duration = parse_duration(duration)
view_count = info.get('plays')
thumbnail = unescapeHTML(info.get('preview_image_url'))
def extract_bitrate(url):
return int_or_none(self._search_regex(
r'-([0-9]+)k\.', url, 'bitrate', default=None))
formats = [{
'format_id': fid,
'url': furl,
'tbr': extract_bitrate(furl),
} for fid, furl in info['streams'].items()]
if 'media' in info:
formats.extend([{
'format_id': '%s-%sk' % (media['ext'], media['rate']),
'url': 'http://download-video.rts.ch/%s' % media['url'],
'tbr': media['rate'] or extract_bitrate(media['url']),
} for media in info['media'] if media.get('rate')])
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'title': info['title'],
'description': info.get('intro'),
'duration': duration,
'view_count': view_count,
'uploader': info.get('programName'),
'timestamp': upload_timestamp,
'thumbnail': thumbnail,
}

View File

@ -0,0 +1,84 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import base64
from .common import InfoExtractor
from ..utils import (
struct_unpack,
)
class RTVEALaCartaIE(InfoExtractor):
IE_NAME = 'rtve.es:alacarta'
IE_DESC = 'RTVE a la carta'
_VALID_URL = r'http://www\.rtve\.es/alacarta/videos/[^/]+/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.rtve.es/alacarta/videos/balonmano/o-swiss-cup-masculina-final-espana-suecia/2491869/',
'md5': '18fcd45965bdd076efdb12cd7f6d7b9e',
'info_dict': {
'id': '2491869',
'ext': 'mp4',
'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia',
},
}
def _decrypt_url(self, png):
encrypted_data = base64.b64decode(png)
text_index = encrypted_data.find(b'tEXt')
text_chunk = encrypted_data[text_index-4:]
length = struct_unpack('!I', text_chunk[:4])[0]
# Use bytearray to get integers when iterating in both python 2.x and 3.x
data = bytearray(text_chunk[8:8+length])
data = [chr(b) for b in data if b != 0]
hash_index = data.index('#')
alphabet_data = data[:hash_index]
url_data = data[hash_index+1:]
alphabet = []
e = 0
d = 0
for l in alphabet_data:
if d == 0:
alphabet.append(l)
d = e = (e + 1) % 4
else:
d -= 1
url = ''
f = 0
e = 3
b = 1
for letter in url_data:
if f == 0:
l = int(letter)*10
f = 1
else:
if e == 0:
l += int(letter)
url += alphabet[l]
e = (b + 3) % 4
f = 0
b += 1
else:
e -= 1
return url
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info = self._download_json(
'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
video_id)['page']['items'][0]
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/default/videos/%s.png' % video_id
png = self._download_webpage(png_url, video_id, 'Downloading url information')
video_url = self._decrypt_url(png)
return {
'id': video_id,
'title': info['title'],
'url': video_url,
'thumbnail': info['image'],
}

View File

@ -2,7 +2,6 @@
from __future__ import unicode_literals
import re
import json
import itertools
from .common import InfoExtractor
@ -20,8 +19,9 @@ class RutubeIE(InfoExtractor):
_TEST = {
'url': 'http://rutube.ru/video/3eac3b4561676c17df9132a9a1e62e3e/',
'file': '3eac3b4561676c17df9132a9a1e62e3e.mp4',
'info_dict': {
'id': '3eac3b4561676c17df9132a9a1e62e3e',
'ext': 'mp4',
'title': 'Раненный кенгуру забежал в аптеку',
'description': 'http://www.ntdtv.ru ',
'duration': 80,
@ -38,15 +38,15 @@ class RutubeIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
api_response = self._download_webpage('http://rutube.ru/api/video/%s/?format=json' % video_id,
video_id, 'Downloading video JSON')
video = json.loads(api_response)
api_response = self._download_webpage('http://rutube.ru/api/play/trackinfo/%s/?format=json' % video_id,
video_id, 'Downloading trackinfo JSON')
trackinfo = json.loads(api_response)
video = self._download_json(
'http://rutube.ru/api/video/%s/?format=json' % video_id,
video_id, 'Downloading video JSON')
trackinfo = self._download_json(
'http://rutube.ru/api/play/trackinfo/%s/?format=json' % video_id,
video_id, 'Downloading trackinfo JSON')
# Some videos don't have the author field
author = trackinfo.get('author') or {}
m3u8_url = trackinfo['video_balancer'].get('m3u8')
@ -79,10 +79,9 @@ class RutubeChannelIE(InfoExtractor):
def _extract_videos(self, channel_id, channel_title=None):
entries = []
for pagenum in itertools.count(1):
api_response = self._download_webpage(
page = self._download_json(
self._PAGE_TEMPLATE % (channel_id, pagenum),
channel_id, 'Downloading page %s' % pagenum)
page = json.loads(api_response)
results = page['results']
if not results:
break
@ -108,10 +107,9 @@ class RutubeMovieIE(RutubeChannelIE):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
movie_id = mobj.group('id')
api_response = self._download_webpage(
movie = self._download_json(
self._MOVIE_TEMPLATE % movie_id, movie_id,
'Downloading movie JSON')
movie = json.loads(api_response)
movie_name = movie['name']
return self._extract_videos(movie_id, movie_name)

View File

@ -1,24 +0,0 @@
import re
from .common import InfoExtractor
class SlashdotIE(InfoExtractor):
_VALID_URL = r'https?://tv\.slashdot\.org/video/\?embed=(?P<id>.*?)(&|$)'
_TEST = {
u'add_ie': ['Ooyala'],
u'url': u'http://tv.slashdot.org/video/?embed=JscHMzZDplD0p-yNLOzTfzC3Q3xzJaUz',
u'file': u'JscHMzZDplD0p-yNLOzTfzC3Q3xzJaUz.mp4',
u'md5': u'd2222e7a4a4c1541b3e0cf732fb26735',
u'info_dict': {
u'title': u' Meet the Stampede Supercomputing Cluster\'s Administrator',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
ooyala_url = self._search_regex(r'<script src="(.*?)"', webpage, 'ooyala url')
return self.url_result(ooyala_url, 'Ooyala')

View File

@ -39,7 +39,8 @@ class SlideshareIE(InfoExtractor):
ext = info['jsplayer']['video_extension']
video_url = compat_urlparse.urljoin(bucket, doc + '-SD.' + ext)
description = self._html_search_regex(
r'<p class="description.*?"[^>]*>(.*?)</p>', webpage, 'description')
r'<p\s+(?:style="[^"]*"\s+)?class="description.*?"[^>]*>(.*?)</p>', webpage,
'description', fatal=False)
return {
'_type': 'video',

View File

@ -13,22 +13,24 @@ from ..utils import (
compat_urllib_request,
ExtractorError,
url_basename,
int_or_none,
)
class SmotriIE(InfoExtractor):
IE_DESC = 'Smotri.com'
IE_NAME = 'smotri'
_VALID_URL = r'^https?://(?:www\.)?(?P<url>smotri\.com/video/view/\?id=(?P<videoid>v(?P<realvideoid>[0-9]+)[a-z0-9]{4}))'
_VALID_URL = r'^https?://(?:www\.)?(?:smotri\.com/video/view/\?id=|pics\.smotri\.com/(?:player|scrubber_custom8)\.swf\?file=)(?P<videoid>v(?P<realvideoid>[0-9]+)[a-z0-9]{4})'
_NETRC_MACHINE = 'smotri'
_TESTS = [
# real video id 2610366
{
'url': 'http://smotri.com/video/view/?id=v261036632ab',
'file': 'v261036632ab.mp4',
'md5': '2a7b08249e6f5636557579c368040eb9',
'info_dict': {
'id': 'v261036632ab',
'ext': 'mp4',
'title': 'катастрофа с камер видеонаблюдения',
'uploader': 'rbc2008',
'uploader_id': 'rbc08',
@ -40,9 +42,10 @@ class SmotriIE(InfoExtractor):
# real video id 57591
{
'url': 'http://smotri.com/video/view/?id=v57591cb20',
'file': 'v57591cb20.flv',
'md5': '830266dfc21f077eac5afd1883091bcd',
'info_dict': {
'id': 'v57591cb20',
'ext': 'flv',
'title': 'test',
'uploader': 'Support Photofile@photofile',
'uploader_id': 'support-photofile',
@ -54,9 +57,10 @@ class SmotriIE(InfoExtractor):
# video-password
{
'url': 'http://smotri.com/video/view/?id=v1390466a13c',
'file': 'v1390466a13c.mp4',
'md5': 'f6331cef33cad65a0815ee482a54440b',
'info_dict': {
'id': 'v1390466a13c',
'ext': 'mp4',
'title': 'TOCCA_A_NOI_-_LE_COSE_NON_VANNO_CAMBIAMOLE_ORA-1',
'uploader': 'timoxa40',
'uploader_id': 'timoxa40',
@ -71,9 +75,10 @@ class SmotriIE(InfoExtractor):
# age limit + video-password
{
'url': 'http://smotri.com/video/view/?id=v15408898bcf',
'file': 'v15408898bcf.flv',
'md5': '91e909c9f0521adf5ee86fbe073aad70',
'info_dict': {
'id': 'v15408898bcf',
'ext': 'flv',
'title': 'этот ролик не покажут по ТВ',
'uploader': 'zzxxx',
'uploader_id': 'ueggb',
@ -85,7 +90,22 @@ class SmotriIE(InfoExtractor):
'params': {
'videopassword': '333'
}
}
},
# swf player
{
'url': 'http://pics.smotri.com/scrubber_custom8.swf?file=v9188090500',
'md5': '4d47034979d9390d14acdf59c4935bc2',
'info_dict': {
'id': 'v9188090500',
'ext': 'mp4',
'title': 'Shakira - Don\'t Bother',
'uploader': 'HannahL',
'uploader_id': 'lisaha95',
'upload_date': '20090331',
'description': 'Shakira - Don\'t Bother, видео Shakira - Don\'t Bother',
'thumbnail': 'http://frame8.loadup.ru/44/0b/918809.7.3.jpg',
},
},
]
_SUCCESS = 0
@ -93,6 +113,21 @@ class SmotriIE(InfoExtractor):
_PASSWORD_DETECTED = 2
_VIDEO_NOT_FOUND = 3
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<embed[^>]src=(["\'])(?P<url>http://pics\.smotri\.com/(?:player|scrubber_custom8)\.swf\?file=v.+?\1)',
webpage)
if mobj is not None:
return mobj.group('url')
mobj = re.search(
r'''(?x)<div\s+class="video_file">http://smotri\.com/video/download/file/[^<]+</div>\s*
<div\s+class="video_image">[^<]+</div>\s*
<div\s+class="video_id">(?P<id>[^<]+)</div>''', webpage)
if mobj is not None:
return 'http://smotri.com/video/view/?id=%s' % mobj.group('id')
def _search_meta(self, name, html, display_name=None):
if display_name is None:
display_name = name
@ -134,7 +169,7 @@ class SmotriIE(InfoExtractor):
# Video JSON does not provide enough meta data
# We will extract some from the video web page instead
video_page_url = 'http://' + mobj.group('url')
video_page_url = 'http://smotri.com/video/view/?id=%s' % video_id
video_page = self._download_webpage(video_page_url, video_id, 'Downloading video page')
# Warning if video is unavailable
@ -222,7 +257,7 @@ class SmotriIE(InfoExtractor):
'upload_date': video_upload_date,
'uploader_id': video_uploader_id,
'duration': video_duration,
'view_count': video_view_count,
'view_count': int_or_none(video_view_count),
'age_limit': 18 if adult_content else 0,
'video_page_url': video_page_url
}

View File

@ -100,7 +100,7 @@ class SoundcloudIE(InfoExtractor):
def report_resolve(self, video_id):
"""Report information extraction."""
self.to_screen(u'%s: Resolving id' % video_id)
self.to_screen('%s: Resolving id' % video_id)
@classmethod
def _resolv_url(cls, url):
@ -124,45 +124,46 @@ class SoundcloudIE(InfoExtractor):
'description': info['description'],
'thumbnail': thumbnail,
}
formats = []
if info.get('downloadable', False):
# We can build a direct link to the song
format_url = (
'https://api.soundcloud.com/tracks/{0}/download?client_id={1}'.format(
track_id, self._CLIENT_ID))
result['formats'] = [{
formats.append({
'format_id': 'download',
'ext': info.get('original_format', 'mp3'),
'url': format_url,
'vcodec': 'none',
}]
else:
# We have to retrieve the url
streams_url = ('http://api.soundcloud.com/i1/tracks/{0}/streams?'
'client_id={1}&secret_token={2}'.format(track_id, self._IPHONE_CLIENT_ID, secret_token))
stream_json = self._download_webpage(
streams_url,
track_id, 'Downloading track url')
'preference': 10,
})
formats = []
format_dict = json.loads(stream_json)
for key, stream_url in format_dict.items():
if key.startswith(u'http'):
formats.append({
'format_id': key,
'ext': ext,
'url': stream_url,
'vcodec': 'none',
})
elif key.startswith(u'rtmp'):
# The url doesn't have an rtmp app, we have to extract the playpath
url, path = stream_url.split('mp3:', 1)
formats.append({
'format_id': key,
'url': url,
'play_path': 'mp3:' + path,
'ext': ext,
'vcodec': 'none',
})
# We have to retrieve the url
streams_url = ('http://api.soundcloud.com/i1/tracks/{0}/streams?'
'client_id={1}&secret_token={2}'.format(track_id, self._IPHONE_CLIENT_ID, secret_token))
stream_json = self._download_webpage(
streams_url,
track_id, 'Downloading track url')
format_dict = json.loads(stream_json)
for key, stream_url in format_dict.items():
if key.startswith('http'):
formats.append({
'format_id': key,
'ext': ext,
'url': stream_url,
'vcodec': 'none',
})
elif key.startswith('rtmp'):
# The url doesn't have an rtmp app, we have to extract the playpath
url, path = stream_url.split('mp3:', 1)
formats.append({
'format_id': key,
'url': url,
'play_path': 'mp3:' + path,
'ext': ext,
'vcodec': 'none',
})
if not formats:
# We fallback to the stream_url in the original info, this
@ -188,7 +189,7 @@ class SoundcloudIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, flags=re.VERBOSE)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError('Invalid URL: %s' % url)
track_id = mobj.group('track_id')
token = None
@ -226,7 +227,7 @@ class SoundcloudSetIE(SoundcloudIE):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError('Invalid URL: %s' % url)
# extract uploader (which is in the url)
uploader = mobj.group(1)
@ -243,7 +244,7 @@ class SoundcloudSetIE(SoundcloudIE):
info = json.loads(info_json)
if 'errors' in info:
for err in info['errors']:
self._downloader.report_error(u'unable to download video webpage: %s' % compat_str(err['error_message']))
self._downloader.report_error('unable to download video webpage: %s' % compat_str(err['error_message']))
return
self.report_extraction(full_title)

View File

@ -9,8 +9,18 @@ from ..utils import (
class TeamcocoIE(InfoExtractor):
_VALID_URL = r'http://teamcoco\.com/video/(?P<url_title>.*)'
_TEST = {
_VALID_URL = r'http://teamcoco\.com/video/(?P<video_id>[0-9]+)?/?(?P<display_id>.*)'
_TESTS = [
{
'url': 'http://teamcoco.com/video/80187/conan-becomes-a-mary-kay-beauty-consultant',
'file': '80187.mp4',
'md5': '3f7746aa0dc86de18df7539903d399ea',
'info_dict': {
'title': 'Conan Becomes A Mary Kay Beauty Consultant',
'description': 'Mary Kay is perhaps the most trusted name in female beauty, so of course Conan is a natural choice to sell their products.'
}
},
{
'url': 'http://teamcoco.com/video/louis-ck-interview-george-w-bush',
'file': '19705.mp4',
'md5': 'cde9ba0fa3506f5f017ce11ead928f9a',
@ -19,22 +29,23 @@ class TeamcocoIE(InfoExtractor):
"title": "Louis C.K. Interview Pt. 1 11/3/11"
}
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError('Invalid URL: %s' % url)
url_title = mobj.group('url_title')
webpage = self._download_webpage(url, url_title)
video_id = self._html_search_regex(
r'<article class="video" data-id="(\d+?)"',
webpage, 'video id')
self.report_extraction(video_id)
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
video_id = mobj.group("video_id")
if not video_id:
video_id = self._html_search_regex(
r'<article class="video" data-id="(\d+?)"',
webpage, 'video id')
data_url = 'http://teamcoco.com/cvp/2.0/%s.xml' % video_id
data = self._download_xml(data_url, video_id, 'Downloading data webpage')
data = self._download_xml(
data_url, display_id, 'Downloading data webpage')
qualities = ['500k', '480p', '1000k', '720p', '1080p']
formats = []
@ -69,6 +80,7 @@ class TeamcocoIE(InfoExtractor):
return {
'id': video_id,
'display_id': display_id,
'formats': formats,
'title': self._og_search_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage),

View File

@ -18,12 +18,14 @@ class TEDIE(SubtitlesInfoExtractor):
(?P<type_playlist>playlists(?:/\d+)?) # We have a playlist
|
((?P<type_talk>talks)) # We have a simple talk
|
(?P<type_watch>watch)/[^/]+/[^/]+
)
(/lang/(.*?))? # The url may contain the language
/(?P<name>\w+) # Here goes the name and then ".html"
/(?P<name>[\w-]+) # Here goes the name and then ".html"
.*)$
'''
_TEST = {
_TESTS = [{
'url': 'http://www.ted.com/talks/dan_dennett_on_our_consciousness.html',
'md5': '4ea1dada91e4174b53dac2bb8ace429d',
'info_dict': {
@ -35,13 +37,24 @@ class TEDIE(SubtitlesInfoExtractor):
'consciousness, but that half the time our brains are '
'actively fooling us.'),
'uploader': 'Dan Dennett',
'width': 854,
}
}
}, {
'url': 'http://www.ted.com/watch/ted-institute/ted-bcg/vishal-sikka-the-beauty-and-power-of-algorithms',
'md5': '226f4fb9c62380d11b7995efa4c87994',
'info_dict': {
'id': 'vishal-sikka-the-beauty-and-power-of-algorithms',
'ext': 'mp4',
'title': 'Vishal Sikka: The beauty and power of algorithms',
'thumbnail': 're:^https?://.+\.jpg',
'description': 'Adaptive, intelligent, and consistent, algorithms are emerging as the ultimate app for everything from matching consumers to products to assessing medical diagnoses. Vishal Sikka shares his appreciation for the algorithm, charting both its inherent beauty and its growing power.',
}
}]
_FORMATS_PREFERENCE = {
'low': 1,
'medium': 2,
'high': 3,
_NATIVE_FORMATS = {
'low': {'preference': 1, 'width': 320, 'height': 180},
'medium': {'preference': 2, 'width': 512, 'height': 288},
'high': {'preference': 3, 'width': 854, 'height': 480},
}
def _extract_info(self, webpage):
@ -57,6 +70,8 @@ class TEDIE(SubtitlesInfoExtractor):
name = m.group('name')
if m.group('type_talk'):
return self._talk_info(url, name)
elif m.group('type_watch'):
return self._watch_info(url, name)
else:
return self._playlist_videos_info(url, name)
@ -84,12 +99,14 @@ class TEDIE(SubtitlesInfoExtractor):
talk_info = self._extract_info(webpage)['talks'][0]
formats = [{
'ext': 'mp4',
'url': format_url,
'format_id': format_id,
'format': format_id,
'preference': self._FORMATS_PREFERENCE.get(format_id, -1),
} for (format_id, format_url) in talk_info['nativeDownloads'].items()]
for f in formats:
finfo = self._NATIVE_FORMATS.get(f['format_id'])
if finfo:
f.update(finfo)
self._sort_formats(formats)
video_id = compat_str(talk_info['id'])
@ -123,3 +140,26 @@ class TEDIE(SubtitlesInfoExtractor):
else:
self._downloader.report_warning(u'video doesn\'t have subtitles')
return {}
def _watch_info(self, url, name):
webpage = self._download_webpage(url, name)
config_json = self._html_search_regex(
r"data-config='([^']+)", webpage, 'config')
config = json.loads(config_json)
video_url = config['video']['url']
thumbnail = config.get('image', {}).get('url')
title = self._html_search_regex(
r"(?s)<h1(?:\s+class='[^']+')?>(.+?)</h1>", webpage, 'title')
description = self._html_search_regex(
r'(?s)<h4 class="[^"]+" id="h3--about-this-talk">.*?</h4>(.*?)</div>',
webpage, 'description', fatal=False)
return {
'id': name,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
'description': description,
}

View File

@ -1,33 +1,37 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
class TF1IE(InfoExtractor):
"""TF1 uses the wat.tv player."""
_VALID_URL = r'http://videos\.tf1\.fr/.*-(.*?)\.html'
_VALID_URL = r'http://videos\.tf1\.fr/.*-(?P<id>.*?)\.html'
_TEST = {
u'url': u'http://videos.tf1.fr/auto-moto/citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html',
u'file': u'10635995.mp4',
u'md5': u'2e378cc28b9957607d5e88f274e637d8',
u'info_dict': {
u'title': u'Citroën Grand C4 Picasso 2013 : présentation officielle',
u'description': u'Vidéo officielle du nouveau Citroën Grand C4 Picasso, lancé à l\'automne 2013.',
'url': 'http://videos.tf1.fr/auto-moto/citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html',
'info_dict': {
'id': '10635995',
'ext': 'mp4',
'title': 'Citroën Grand C4 Picasso 2013 : présentation officielle',
'description': 'Vidéo officielle du nouveau Citroën Grand C4 Picasso, lancé à l\'automne 2013.',
},
'params': {
# Sometimes wat serves the whole file with the --test option
'skip_download': True,
},
u'skip': u'Sometimes wat serves the whole file with the --test option',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
id = mobj.group(1)
webpage = self._download_webpage(url, id)
embed_url = self._html_search_regex(r'"(https://www.wat.tv/embedframe/.*?)"',
webpage, 'embed url')
embed_page = self._download_webpage(embed_url, id, u'Downloading embed player page')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
embed_url = self._html_search_regex(
r'"(https://www.wat.tv/embedframe/.*?)"', webpage, 'embed url')
embed_page = self._download_webpage(embed_url, video_id,
'Downloading embed player page')
wat_id = self._search_regex(r'UVID=(.*?)&', embed_page, 'wat id')
wat_info = self._download_webpage('http://www.wat.tv/interface/contentv3/%s' % wat_id, id, u'Downloading Wat info')
wat_info = json.loads(wat_info)['media']
wat_url = wat_info['url']
return self.url_result(wat_url, 'Wat')
wat_info = self._download_json(
'http://www.wat.tv/interface/contentv3/%s' % wat_id, video_id)
return self.url_result(wat_info['media']['url'], 'Wat')

View File

@ -0,0 +1,75 @@
from .common import InfoExtractor
import re
class ToypicsIE(InfoExtractor):
IE_DESC = 'Toypics user profile'
_VALID_URL = r'http://videos\.toypics\.net/view/(?P<id>[0-9]+)/.*'
_TEST = {
'url': 'http://videos.toypics.net/view/514/chancebulged,-2-1/',
'md5': '16e806ad6d6f58079d210fe30985e08b',
'info_dict': {
'id': '514',
'ext': 'mp4',
'title': 'Chance-Bulge\'d, 2',
'age_limit': 18,
'uploader': 'kidsune',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
page = self._download_webpage(url, video_id)
video_url = self._html_search_regex(
r'src:\s+"(http://static[0-9]+\.toypics\.net/flvideo/[^"]+)"', page, 'video URL')
title = self._html_search_regex(
r'<title>Toypics - ([^<]+)</title>', page, 'title')
username = self._html_search_regex(
r'toypics.net/([^/"]+)" class="user-name">', page, 'username')
return {
'id': video_id,
'url': video_url,
'title': title,
'uploader': username,
'age_limit': 18,
}
class ToypicsUserIE(InfoExtractor):
IE_DESC = 'Toypics user profile'
_VALID_URL = r'http://videos\.toypics\.net/(?P<username>[^/?]+)(?:$|[?#])'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
username = mobj.group('username')
profile_page = self._download_webpage(
url, username, note='Retrieving profile page')
video_count = int(self._search_regex(
r'public/">Public Videos \(([0-9]+)\)</a></li>', profile_page,
'video count'))
PAGE_SIZE = 8
urls = []
page_count = (video_count + PAGE_SIZE + 1) // PAGE_SIZE
for n in range(1, page_count + 1):
lpage_url = url + '/public/%d' % n
lpage = self._download_webpage(
lpage_url, username,
note='Downloading page %d/%d' % (n, page_count))
urls.extend(
re.findall(
r'<p class="video-entry-title">\n\s*<a href="(http://videos.toypics.net/view/[^"]+)">',
lpage))
return {
'_type': 'playlist',
'id': username,
'entries': [{
'_type': 'url',
'url': eurl,
'ie_key': 'Toypics',
} for eurl in urls]
}

View File

@ -1,63 +1,83 @@
import os
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
compat_urllib_request,
int_or_none,
str_to_int,
)
from ..aes import (
aes_decrypt_text
)
from ..aes import aes_decrypt_text
class Tube8IE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?(?P<url>tube8\.com/.+?/(?P<videoid>\d+)/?)$'
_VALID_URL = r'https?://(?:www\.)?tube8\.com/(?:[^/]+/){2}(?P<id>\d+)'
_TEST = {
u'url': u'http://www.tube8.com/teen/kasia-music-video/229795/',
u'file': u'229795.mp4',
u'md5': u'e9e0b0c86734e5e3766e653509475db0',
u'info_dict': {
u"description": u"hot teen Kasia grinding",
u"uploader": u"unknown",
u"title": u"Kasia music video",
u"age_limit": 18,
'url': 'http://www.tube8.com/teen/kasia-music-video/229795/',
'file': '229795.mp4',
'md5': 'e9e0b0c86734e5e3766e653509475db0',
'info_dict': {
'description': 'hot teen Kasia grinding',
'uploader': 'unknown',
'title': 'Kasia music video',
'age_limit': 18,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
url = 'http://www.' + mobj.group('url')
video_id = mobj.group('id')
req = compat_urllib_request.Request(url)
req.add_header('Cookie', 'age_verified=1')
webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'videotitle ="([^"]+)', webpage, u'title')
video_description = self._html_search_regex(r'>Description:</strong>(.+?)<', webpage, u'description', fatal=False)
video_uploader = self._html_search_regex(r'>Submitted by:</strong>(?:\s|<[^>]*>)*(.+?)<', webpage, u'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, u'thumbnail', fatal=False)
if thumbnail:
thumbnail = thumbnail.replace('\\/', '/')
flashvars = json.loads(self._html_search_regex(
r'var flashvars\s*=\s*({.+?})', webpage, 'flashvars'))
video_url = self._html_search_regex(r'"video_url":"([^"]+)', webpage, u'video_url')
if webpage.find('"encrypted":true')!=-1:
password = self._html_search_regex(r'"video_title":"([^"]+)', webpage, u'password')
video_url = aes_decrypt_text(video_url, password, 32).decode('utf-8')
video_url = flashvars['video_url']
if flashvars.get('encrypted') is True:
video_url = aes_decrypt_text(video_url, flashvars['video_title'], 32).decode('utf-8')
path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:]
format = path.split('/')[4].split('_')[:2]
format = "-".join(format)
format_id = '-'.join(path.split('/')[4].split('_')[:2])
thumbnail = flashvars.get('image_url')
title = self._html_search_regex(
r'videotitle\s*=\s*"([^"]+)', webpage, 'title')
description = self._html_search_regex(
r'>Description:</strong>(.+?)<', webpage, 'description', fatal=False)
uploader = self._html_search_regex(
r'<strong class="video-username">(?:<a href="[^"]+">)?([^<]+)(?:</a>)?</strong>',
webpage, 'uploader', fatal=False)
like_count = int_or_none(self._html_search_regex(
r"rupVar\s*=\s*'(\d+)'", webpage, 'like count', fatal=False))
dislike_count = int_or_none(self._html_search_regex(
r"rdownVar\s*=\s*'(\d+)'", webpage, 'dislike count', fatal=False))
view_count = self._html_search_regex(
r'<strong>Views: </strong>([\d,\.]+)</li>', webpage, 'view count', fatal=False)
if view_count:
view_count = str_to_int(view_count)
comment_count = self._html_search_regex(
r'<span id="allCommentsCount">(\d+)</span>', webpage, 'comment count', fatal=False)
if comment_count:
comment_count = str_to_int(comment_count)
return {
'id': video_id,
'uploader': video_uploader,
'title': video_title,
'thumbnail': thumbnail,
'description': video_description,
'url': video_url,
'ext': extension,
'format': format,
'format_id': format,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'format_id': format_id,
'view_count': view_count,
'like_count': like_count,
'dislike_count': dislike_count,
'comment_count': comment_count,
'age_limit': 18,
}

View File

@ -0,0 +1,61 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
unified_strdate,
)
class UrortIE(InfoExtractor):
IE_DESC = 'NRK P3 Urørt'
_VALID_URL = r'https?://(?:www\.)?urort\.p3\.no/#!/Band/(?P<id>[^/]+)$'
_TEST = {
'url': 'https://urort.p3.no/#!/Band/Gerilja',
'md5': '5ed31a924be8a05e47812678a86e127b',
'info_dict': {
'id': '33124-4',
'ext': 'mp3',
'title': 'The Bomb',
'thumbnail': 're:^https?://.+\.jpg',
'like_count': int,
'uploader': 'Gerilja',
'uploader_id': 'Gerilja',
'upload_date': '20100323',
},
'params': {
'matchtitle': '^The Bomb$', # To test, we want just one video
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
fstr = compat_urllib_parse.quote("InternalBandUrl eq '%s'" % playlist_id)
json_url = 'http://urort.p3.no/breeze/urort/TrackDtos?$filter=' + fstr
songs = self._download_json(json_url, playlist_id)
print(songs[0])
entries = [{
'id': '%d-%s' % (s['BandId'], s['$id']),
'title': s['Title'],
'url': s['TrackUrl'],
'ext': 'mp3',
'uploader_id': playlist_id,
'uploader': s.get('BandName', playlist_id),
'like_count': s.get('LikeCount'),
'thumbnail': 'http://urort.p3.no/cloud/images/%s' % s['Image'],
'upload_date': unified_strdate(s.get('Released')),
} for s in songs]
return {
'_type': 'playlist',
'id': playlist_id,
'title': playlist_id,
'entries': entries,
}

View File

@ -11,7 +11,7 @@ from ..utils import (
class UstreamIE(InfoExtractor):
_VALID_URL = r'https?://www\.ustream\.tv/recorded/(?P<videoID>\d+)'
_VALID_URL = r'https?://www\.ustream\.tv/(?P<type>recorded|embed)/(?P<videoID>\d+)'
IE_NAME = 'ustream'
_TEST = {
'url': 'http://www.ustream.tv/recorded/20274954',
@ -25,6 +25,13 @@ class UstreamIE(InfoExtractor):
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
if m.group('type') == 'embed':
video_id = m.group('videoID')
webpage = self._download_webpage(url, video_id)
desktop_video_id = self._html_search_regex(r'ContentVideoIds=\["([^"]*?)"\]', webpage, 'desktop_video_id')
desktop_url = 'http://www.ustream.tv/recorded/' + desktop_video_id
return self.url_result(desktop_url, 'Ustream')
video_id = m.group('videoID')
video_url = 'http://tcdn.ustream.tv/video/%s' % video_id

View File

@ -4,26 +4,99 @@ import re
import json
from .common import InfoExtractor
from ..utils import compat_urllib_request
from ..utils import (
compat_urllib_request,
int_or_none,
)
class VeohIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?veoh\.com/(?:watch|iphone/#_Watch)/v(?P<id>\d*)'
_VALID_URL = r'http://(?:www\.)?veoh\.com/(?:watch|iphone/#_Watch)/(?P<id>(?:v|yapi-)[\da-zA-Z]+)'
_TEST = {
'url': 'http://www.veoh.com/watch/v56314296nk7Zdmz3',
'file': '56314296.mp4',
'md5': '620e68e6a3cff80086df3348426c9ca3',
'info_dict': {
'title': 'Straight Backs Are Stronger',
'uploader': 'LUMOback',
'description': 'At LUMOback, we believe straight backs are stronger. The LUMOback Posture & Movement Sensor: It gently vibrates when you slouch, inspiring improved posture and mobility. Use the app to track your data and improve your posture over time. ',
_TESTS = [
{
'url': 'http://www.veoh.com/watch/v56314296nk7Zdmz3',
'md5': '620e68e6a3cff80086df3348426c9ca3',
'info_dict': {
'id': '56314296',
'ext': 'mp4',
'title': 'Straight Backs Are Stronger',
'uploader': 'LUMOback',
'description': 'At LUMOback, we believe straight backs are stronger. The LUMOback Posture & Movement Sensor: It gently vibrates when you slouch, inspiring improved posture and mobility. Use the app to track your data and improve your posture over time. ',
},
},
{
'url': 'http://www.veoh.com/watch/v27701988pbTc4wzN?h1=Chile+workers+cover+up+to+avoid+skin+damage',
'md5': '4a6ff84b87d536a6a71e6aa6c0ad07fa',
'info_dict': {
'id': '27701988',
'ext': 'mp4',
'title': 'Chile workers cover up to avoid skin damage',
'description': 'md5:2bd151625a60a32822873efc246ba20d',
'uploader': 'afp-news',
'duration': 123,
},
},
{
'url': 'http://www.veoh.com/watch/v69525809F6Nc4frX',
'md5': '4fde7b9e33577bab2f2f8f260e30e979',
'note': 'Embedded ooyala video',
'info_dict': {
'id': '69525809',
'ext': 'mp4',
'title': 'Doctors Alter Plan For Preteen\'s Weight Loss Surgery',
'description': 'md5:f5a11c51f8fb51d2315bca0937526891',
'uploader': 'newsy-videos',
},
},
]
def _extract_formats(self, source):
formats = []
link = source.get('aowPermalink')
if link:
formats.append({
'url': link,
'ext': 'mp4',
'format_id': 'aow',
})
link = source.get('fullPreviewHashLowPath')
if link:
formats.append({
'url': link,
'format_id': 'low',
})
link = source.get('fullPreviewHashHighPath')
if link:
formats.append({
'url': link,
'format_id': 'high',
})
return formats
def _extract_video(self, source):
return {
'id': source.get('videoId'),
'title': source.get('title'),
'description': source.get('description'),
'thumbnail': source.get('highResImage') or source.get('medResImage'),
'uploader': source.get('username'),
'duration': int_or_none(source.get('length')),
'view_count': int_or_none(source.get('views')),
'age_limit': 18 if source.get('isMature') == 'true' or source.get('isSexy') == 'true' else 0,
'formats': self._extract_formats(source),
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
if video_id.startswith('v'):
rsp = self._download_xml(
r'http://www.veoh.com/api/findByPermalink?permalink=%s' % video_id, video_id, 'Downloading video XML')
if rsp.get('stat') == 'ok':
return self._extract_video(rsp.find('./videoList/video'))
webpage = self._download_webpage(url, video_id)
age_limit = 0
if 'class="adultwarning-container"' in webpage:
@ -33,24 +106,16 @@ class VeohIE(InfoExtractor):
request.add_header('Cookie', 'confirmedAdult=true')
webpage = self._download_webpage(request, video_id)
m_youtube = re.search(r'http://www\.youtube\.com/v/(.*?)(\&|")', webpage)
m_youtube = re.search(r'http://www\.youtube\.com/v/(.*?)(\&|"|\?)', webpage)
if m_youtube is not None:
youtube_id = m_youtube.group(1)
self.to_screen('%s: detected Youtube video.' % video_id)
return self.url_result(youtube_id, 'Youtube')
self.report_extraction(video_id)
info = self._search_regex(r'videoDetailsJSON = \'({.*?})\';', webpage, 'info')
info = json.loads(info)
video_url = info.get('fullPreviewHashHighPath') or info.get('fullPreviewHashLowPath')
info = json.loads(
self._search_regex(r'videoDetailsJSON = \'({.*?})\';', webpage, 'info').replace('\\\'', '\''))
return {
'id': info['videoId'],
'title': info['title'],
'url': video_url,
'uploader': info['username'],
'thumbnail': info.get('highResImage') or info.get('medResImage'),
'description': info['description'],
'view_count': info['views'],
'age_limit': age_limit,
}
video = self._extract_video(info)
video['age_limit'] = age_limit
return video

View File

@ -1,38 +0,0 @@
import re
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..utils import ExtractorError
class ViceIE(InfoExtractor):
_VALID_URL = r'http://www\.vice\.com/.*?/(?P<name>.+)'
_TEST = {
u'url': u'http://www.vice.com/Fringes/cowboy-capitalists-part-1',
u'file': u'43cW1mYzpia9IlestBjVpd23Yu3afAfp.mp4',
u'info_dict': {
u'title': u'VICE_COWBOYCAPITALISTS_PART01_v1_VICE_WM_1080p.mov',
},
u'params': {
# Requires ffmpeg (m3u8 manifest)
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
try:
ooyala_url = self._og_search_video_url(webpage)
except ExtractorError:
try:
embed_code = self._search_regex(
r'OO.Player.create\(\'ooyalaplayer\', \'(.+?)\'', webpage,
u'ooyala embed code')
ooyala_url = OoyalaIE._url_for_embed_code(embed_code)
except ExtractorError:
raise ExtractorError(u'The page doesn\'t contain a video', expected=True)
return self.url_result(ooyala_url, ie='Ooyala')

View File

@ -0,0 +1,70 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
find_xpath_attr,
int_or_none,
parse_duration,
unified_strdate,
)
class VideoLecturesNetIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?videolectures\.net/(?P<id>[^/#?]+)/'
IE_NAME = 'videolectures.net'
_TEST = {
'url': 'http://videolectures.net/promogram_igor_mekjavic_eng/',
'info_dict': {
'id': 'promogram_igor_mekjavic_eng',
'ext': 'mp4',
'title': 'Automatics, robotics and biocybernetics',
'description': 'md5:815fc1deb6b3a2bff99de2d5325be482',
'upload_date': '20130627',
'duration': 565,
'thumbnail': 're:http://.*\.jpg',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
smil_url = 'http://videolectures.net/%s/video/1/smil.xml' % video_id
smil = self._download_xml(smil_url, video_id)
title = find_xpath_attr(smil, './/meta', 'name', 'title').attrib['content']
description_el = find_xpath_attr(smil, './/meta', 'name', 'abstract')
description = (
None if description_el is None
else description_el.attrib['content'])
upload_date = unified_strdate(
find_xpath_attr(smil, './/meta', 'name', 'date').attrib['content'])
switch = smil.find('.//switch')
duration = parse_duration(switch.attrib.get('dur'))
thumbnail_el = find_xpath_attr(switch, './image', 'type', 'thumbnail')
thumbnail = (
None if thumbnail_el is None else thumbnail_el.attrib.get('src'))
formats = [{
'url': v.attrib['src'],
'width': int_or_none(v.attrib.get('width')),
'height': int_or_none(v.attrib.get('height')),
'filesize': int_or_none(v.attrib.get('size')),
'tbr': int_or_none(v.attrib.get('systemBitrate')) / 1000.0,
'ext': v.attrib.get('ext'),
} for v in switch.findall('./video')
if v.attrib.get('proto') == 'http']
return {
'id': video_id,
'title': title,
'description': description,
'upload_date': upload_date,
'duration': duration,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@ -0,0 +1,26 @@
from __future__ import unicode_literals
from .novamov import NovaMovIE
class VideoWeedIE(NovaMovIE):
IE_NAME = 'videoweed'
IE_DESC = 'VideoWeed'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'videoweed\.(?:es|com)'}
_HOST = 'www.videoweed.es'
_FILE_DELETED_REGEX = r'>This file no longer exists on our servers.<'
_TITLE_REGEX = r'<h1 class="text_shadow">([^<]+)</h1>'
_TEST = {
'url': 'http://www.videoweed.es/file/b42178afbea14',
'md5': 'abd31a2132947262c50429e1d16c1bfd',
'info_dict': {
'id': 'b42178afbea14',
'ext': 'flv',
'title': 'optical illusion dissapeared image magic illusion',
'description': ''
},
}

Some files were not shown because too many files have changed in this diff Show More