Compare commits

...

479 Commits

Author SHA1 Message Date
Sergey M․
8e751a185c release 2017.10.07 2017-10-07 05:02:53 +07:00
Sergey M․
3fc8f5b7c2 [ChangeLog] Actualize 2017-10-07 05:01:38 +07:00
Sergey M․
665f42d8c1 [reddit] Sort formats (closes #14430) 2017-10-07 01:40:00 +07:00
Sergey M․
e952847541 [PULL_REQUEST_TEMPLATE.md] Add explicit entry on flake8 2017-10-07 00:58:19 +07:00
remis
b1a7bf44b9 [lnkgo] Relax _VALID_URL 2017-10-06 23:59:09 +07:00
Jalaz Kumar
2e2a8e97d5 [pornflip] Extend _VALID_URL (closes #14405) 2017-10-06 23:56:31 +07:00
Sergey M․
ac93c09ab2 [xtube] Add support for embedded URLs (closes #14417) 2017-10-06 23:53:32 +07:00
Sergey M․
cd6fc19ed7 [YoutubeDL] Ignore duplicates in --playlist-items
E.g. '--playlist-items 2-4,3-4,3' should result in '[2,3,4]', not '[2,3,4,3,4,3]'
2017-10-06 23:50:34 +07:00
Sergey M․
86a15ed64b [test_YoutubeDL] Add test for #14425 2017-10-06 23:41:28 +07:00
Sergey M․
7e85e8729f [YoutubeDL] Fix out of range --playlist-items for iterable playlists and reduce code duplication (closes #14425) 2017-10-06 23:34:46 +07:00
Sergey M․
6be08ce602 [utils] Use in OnDemandPagedList by default
Not using cache results in redundant network I/O due to downloading the same pages while using --playlist-items n-m
2017-10-06 23:13:53 +07:00
Sergey M․
cf5f6ed5be [xvideos] Add support for embed URLs and improve extraction (closes #14409) 2017-10-05 00:27:24 +07:00
Philipp Hagemeister
6b46285e85 [comedycentral] new shortcut :theopposition for "The Opposition" show 2017-10-04 07:45:13 +02:00
Sergey M․
6e736d86e7 [beeg] Fix extraction (closes #14403) 2017-10-04 04:27:42 +07:00
M.K
c110944fa2 [extractor/common] Fix typo in _parse_mpd_formats 2017-10-04 03:50:27 +07:00
Sergey M․
9524dca3ac [README.md] Use revision bound link to YoutubeDL options (closes #14401) 2017-10-04 02:53:20 +07:00
Jakub Wilk
3e4cedf9e8 [tvn24] Relax _VALID_URL 2017-10-03 23:28:13 +07:00
remitamine
bfd484ccff Merge pull request #14392 from snipem/nbc-fix
Fix for JSON meta data download(closes #13651)
2017-10-03 14:49:55 +00:00
Matthias Küch
b7e14f06a4 Fix for JSON meta data download
Added fixes according to #13651 and user @remitamine
2017-10-03 15:17:28 +02:00
Sergey M․
d2ae7e24e5 [postprocessor/ffmpeg] Convert to opus using libopus (closes #14381) 2017-10-02 04:43:25 +07:00
Sergey M․
544ffb7790 [ketnet] Add support for videos without direct sources (closes #14377) 2017-10-02 04:15:12 +07:00
Sergey M․
117589dfa2 [canvas] Generalize mediazone.vrt.be extractor and rework canvas and een 2017-10-02 04:14:36 +07:00
Sergey M․
839728f5bf [afreecatv] Add support for adult videos (closes #14376) 2017-10-02 03:28:25 +07:00
Sergey M․
fcdd37d053 release 2017.10.01 2017-10-01 21:54:11 +07:00
Sergey M․
1dd126180e [ChangeLog] Actualize 2017-10-01 21:45:56 +07:00
Rafal Borczuch
4e599194d6 [tvp] Add support for new URL schema (closes #14368) 2017-10-01 18:59:00 +07:00
Sergey M․
c5b7014a9c [generic] Add support for single format Video.js embeds (closes #14371) 2017-10-01 07:01:42 +07:00
Sergey M․
c8da40d834 [yahoo] Bypass geo restriction for brightcove (#14210) 2017-10-01 04:49:27 +07:00
Sergey M․
b69ca0ccfc [yahoo] Use extracted brightcove account id (closes #14210) 2017-10-01 04:37:42 +07:00
Giuseppe Fabiano
2c53bd51c6 [rtve:alacarta] Fix extraction (closes #14290) 2017-10-01 03:21:17 +07:00
Sergey M․
3836b02ce8 [YoutubeDL] PEP 8 2017-09-30 22:56:40 +07:00
Sergey M․
fa3fdeb41f [yahoo] Fix some tests 2017-09-30 22:54:22 +07:00
Sergey M․
eb9a15be60 [yahoo] Add support for custom brigthcove embeds (closes #14210) 2017-09-30 22:47:03 +07:00
Sergey M․
3600fd591d [YoutubeDL] Document youtube_include_dash_manifest 2017-09-28 00:46:48 +07:00
Sergey M․
63d990d285 [generic] Add support for Video.js embeds 2017-09-28 00:37:30 +07:00
Timendum
b14b2283a0 [gfycat] Add support for /gifs/detail URLs (closes #14322) 2017-09-27 22:48:47 +07:00
Sergey M․
02d01e15f1 [generic] Fix infinite recursion for twitter:player URLs (closes #14339) 2017-09-26 21:47:18 +07:00
Sergey M․
db96252831 [xhamsterembed] Fix extraction (closes #14308) 2017-09-24 19:23:08 +07:00
Sergey M․
8b389f7e3c Credit the author of multiple generic HTML5 embeds fix 2017-09-24 18:21:38 +07:00
Sergey M․
9fc41bcb6b release 2017.09.24 2017-09-24 00:22:50 +07:00
Sergey M․
10cab6613f [ChangeLog] Actualize 2017-09-24 00:21:34 +07:00
Sergey M․
4d182955a2 [kakao] Fix _VALID_URL 2017-09-24 00:19:27 +07:00
Sergey M․
011da618bd [openload] Fix _load_cookies for python 2.6 2017-09-24 00:12:40 +07:00
Sergey M․
4c54b89e03 Hide experimental phantomjs wrapper 2017-09-24 00:08:27 +07:00
Sergey M․
a87d7b4953 Credit @nbppp2 for americastestkitchen (#13996) 2017-09-23 23:27:28 +07:00
Sergey M․
2f3933aa1e Credit @ishitatsuyuki for mixcloud fix (#14132) 2017-09-23 23:26:35 +07:00
Sergey M․
aab20aabfc Credit @jdong92 for voot (#14059) 2017-09-23 23:23:27 +07:00
Sergey M․
16f54d0751 Credit @codeasashu for voot (#11814) 2017-09-23 23:20:20 +07:00
Sergey M․
07d1344c85 Credit @coreynicholson for vlive:playlist (#13613) 2017-09-23 23:16:27 +07:00
Sergey M․
47b5dfb047 Credit @luboss for joj (#13268) 2017-09-23 23:14:41 +07:00
Sergey M․
e3440d824a [24video] Fix timestamp extraction and make non fatal (#14295) 2017-09-23 07:46:53 +07:00
Sergey M․
136507b39a [24video] Add support for 24video.adult (closes #14295) 2017-09-23 07:41:22 +07:00
Sergey M․
7f4921b38d [heise] PEP 8 2017-09-23 07:28:29 +07:00
Sergey M․
f70ddd4aeb [kakao] Improve (closes #14007) 2017-09-23 07:28:24 +07:00
Namnamseo
1c22d7a7f3 [kakao] Add extractor (closes #12298) 2017-09-23 07:28:19 +07:00
Giuseppe Fabiano
5c1452e8f1 [twitter] Add support for user_id-less URLs (closes #14270) 2017-09-23 06:38:09 +07:00
Sergey M․
4bb58fa118 [americastestkitchen] Improve (closes #13996) 2017-09-23 06:29:20 +07:00
Dan Weber
13de91c9e9 [americastestkitchen] Add extractor (closes #10764) 2017-09-23 06:29:07 +07:00
kayb94
9ce1ac4046 [generic] Fix support for multiple HTML5 videos on one page (closes #14080) 2017-09-23 05:49:48 +07:00
Sergey M․
095774e591 [mixcloud] Improve and simplify (closes #14132) 2017-09-23 05:37:03 +07:00
Tatsuyuki Ishi
2384f5a64e [mixcloud] Fix extraction (closes #14088) 2017-09-23 05:36:57 +07:00
Yen Chi Hsuan
8c2895305d [options] Accept lrc as a subtitle conversion target format (closes #14292) 2017-09-23 02:30:03 +08:00
Sergey M․
8c6919e433 [lynda] Add support for educourse.ga (closes #14286) 2017-09-21 23:00:35 +07:00
Giuseppe Fabiano
f6ff52b473 [beeg] Fix extraction (closes #14275) 2017-09-21 04:05:33 +07:00
Parmjit Virk
12ea5c79fb [nbcsports:vplayer] Correct theplatform URL (closes #13873) 2017-09-21 02:53:06 +07:00
capital-G
3b65a6fbf3 [twitter] Fix duration extraction 2017-09-20 03:58:06 +07:00
Sergey M․
dc76eef092 [tvplay] Bypass geo restriction 2017-09-20 00:00:04 +07:00
Kareem Moussa
8a1a60d173 [devscripts/check-porn] Fix gettestcases import 2017-09-19 22:51:20 +07:00
kayb94
4d8c4b46d5 [heise] Add support for YouTube embeds 2017-09-17 22:46:52 +07:00
Sergey M․
9c2a17f2ce [popcorntv] Add extractor (closes #5914, closes #14211) 2017-09-17 22:19:57 +07:00
Yen Chi Hsuan
4ed2d7b7d1 Fix flake8 issues after #14225 2017-09-17 13:53:04 +08:00
Vijay Singh
8251af63a1 [viki] Update app data (closes #14181) 2017-09-16 22:45:23 +07:00
Windom
790d379e4d [morningstar] Relax _VALID_URL 2017-09-16 22:39:46 +07:00
Yen Chi Hsuan
3869028ffb [utils] Use bytes-like objects in dfxp2srt
This fixes handling of non-UTF8 TTML subtitles

Closes #14191
2017-09-16 12:18:38 +08:00
Yen Chi Hsuan
68d43a61b5 Ignore TTML subtitles 2017-09-16 12:14:48 +08:00
Yen Chi Hsuan
a88d461dff Merge pull request #14225 from Tithen-Firion/openload-phantomjs-method
Openload phantomjs method
2017-09-16 02:28:28 +08:00
Sergey M․
a4245acef8 [noovo] Fix extraction (closes #14214) 2017-09-15 23:12:19 +07:00
Sergey M․
6be44a50ed [dailymotion:playlist] Relax _VALID_URL (closes #14219) 2017-09-15 22:25:38 +07:00
Sergey M․
b763e1d68c [twitch] Add support for go.twitch.tv URLs (closes #14215) 2017-09-15 22:18:38 +07:00
Sergey M․
cbf85239bb [vgtv] Relax _VALID_URL (closes #14223) 2017-09-15 22:13:30 +07:00
Sergey M․
159d304a9f release 2017.09.15 2017-09-15 21:48:06 +07:00
Sergey M․
86e55e317c [ChangeLog] Actualize 2017-09-15 21:45:18 +07:00
Sergey M․
c46680fb2a [condenast] Fix extraction (closes #14196, closes #14207) 2017-09-15 02:01:17 +07:00
Philipp Hagemeister
fad9fc537d [tv4] fix a test URL 2017-09-14 20:47:23 +02:00
Philipp Hagemeister
0732a90579 [orf] Add new extractor for f4m stories 2017-09-14 20:37:46 +02:00
Sergey M․
319fc70676 [tv4] Relax _VALID_URL (closes #14206) 2017-09-14 23:50:19 +07:00
Sergey M․
e7c3e33456 [downloader/fragment] Restart inconsistent incomplete fragment downloads (#13731) 2017-09-14 23:19:53 +07:00
Yen Chi Hsuan
757984af90 Merge pull request #12909 from remitamine/raw-sub
[YoutubeDL] write raw subtitle files
2017-09-13 17:36:40 +08:00
Sergey M․
2f483758bc [animeondemand] Improve and modernize 2017-09-11 04:32:35 +07:00
Sergey M․
018cc61549 [animeondemand] Bypass geo restriction 2017-09-11 04:23:42 +07:00
Sergey M․
2709d9fa28 [animeondemand] Add support for flash videos (closes #9944) 2017-09-11 04:23:42 +07:00
Sergey M․
7dacceae75 release 2017.09.11 2017-09-11 03:30:33 +07:00
Sergey M․
43df248f10 [ChangeLog] Actualize 2017-09-11 03:27:43 +07:00
Sergey M․
f12a6e88b2 [rutube:playlist] Fix suitable (closes #14166) 2017-09-11 03:23:00 +07:00
Sergey M․
806498cf2f release 2017.09.10 2017-09-10 22:16:55 +07:00
Sergey M․
b98339b54b [ChangeLog] Actualize 2017-09-10 22:15:55 +07:00
Sergey M․
bf6ec2fea9 [fox] Fix extraction (#14147) 2017-09-10 22:08:32 +07:00
Sergey M․
c3dd44e085 [rutube] Use bool_or_none 2017-09-10 19:09:27 +07:00
Sergey M․
c7e327c4d4 [utils] Introduce bool_or_none 2017-09-10 19:08:39 +07:00
Sergey M․
48b813748d [rutube] Rework and generalize playlist extractors (closes #13565) 2017-09-10 18:40:33 +07:00
luceatnobis
debed8d759 [rutube:playlist] Add extractor (closes #13534) 2017-09-10 18:40:33 +07:00
kayb94
51aee72d16 [README.md] Clarify how to run extractor specific test cases 2017-09-08 22:13:17 +07:00
Olivier Bilodeau
931edb2ada [radiocanada] Add fallback for title extraction 2017-09-08 21:53:24 +07:00
Sergey M․
5113b69124 [abcnews,chilloutsoze,cracked,vice,vk] Use dedicated YouTube embeds extraction routines 2017-09-06 00:50:25 +07:00
Sergey M․
66c9fa36c1 [youtube] Separate methods for embeds extraction 2017-09-06 00:48:37 +07:00
Sergey M․
c5c9bf0c12 [YoutubeDL] Ensure dir existence for each requested format (closes #14116) 2017-09-05 23:31:34 +07:00
Sergey M․
880fa66f4f [redtube] Fix formats extraction (closes #14122) 2017-09-05 22:45:49 +07:00
Sergey M․
6348671c4a [arte] Relax unavailability check (closes #14112) 2017-09-04 23:08:40 +07:00
Sergey M․
efc57145c1 [manyvids] Improve (closes #14059) 2017-09-03 17:32:23 +07:00
John D
e9b865267a [manyvids] Add support for preview videos (closes #14053) 2017-09-03 17:31:53 +07:00
Sergey M․
bc35f07537 [vidme:user] Make tests only matching (closes #14054) 2017-09-03 17:03:51 +07:00
theychx
0b4a8eb3ac [vidme:user] Relax _VALID_URLs 2017-09-03 17:03:45 +07:00
Sergey M․
c1c1585b31 [bpb] Improve (closes #14086) 2017-09-03 16:43:33 +07:00
Timendum
0cbb841ba9 [bpb] Fix extraction (closes #14043) 2017-09-03 16:39:12 +07:00
Sergey M
d7c7100e3d [soundcloud] Simplify and add test (closes #14093) 2017-09-03 16:29:58 +07:00
Tatsuyuki Ishi
73602bcd0c [soundcloud] Fix download URL with private tracks 2017-09-03 16:28:34 +07:00
Sergey M․
23b2df82c7 [aliexpress:live] Fix issues (closes #13698, closes #13707) 2017-09-03 16:05:31 +07:00
dubber0
503115540d [aliexpress:live] Add extractor 2017-09-03 16:05:00 +07:00
Sergey M․
64f0e30b93 [viidea] Capture and output lecture error message (#14099) 2017-09-02 15:44:49 +07:00
Sergey M․
a3431e1224 [radiocanada] Skip unsupported platforms (closes #14100) 2017-09-02 15:33:54 +07:00
Sergey M․
a2022b0c40 release 2017.09.02 2017-09-02 01:08:32 +07:00
Sergey M․
8681ed7fc8 [ChangeLog] Actualize 2017-09-02 01:04:22 +07:00
Sergey M․
8d81f3e36d [youtube] Force old layout for each webpage (closes #14083) 2017-09-02 00:58:19 +07:00
Sergey M․
7998520933 [youtube] Fix upload date extraction (closes #14065) 2017-08-31 00:47:58 +07:00
Sergey M․
5b4bfbfc3b [charlierose] Add support for episodes (closes #14062) 2017-08-30 23:50:33 +07:00
Sergey M․
53647dfd0a [bbccouk] Add support for w-prefixed ids (closes #14056) 2017-08-30 05:27:56 +07:00
Yen Chi Hsuan
22f65a9efc Merge pull request #14048 from ryandesign/patch-1
Fix build failures with old cp and zip
2017-08-28 11:22:27 +08:00
Ryan Schmidt
c75c384fb6 Fix build failures with old cp and zip 2017-08-27 18:07:09 -05:00
Sergey M․
1b41da488d [googledrive] Extend _VALID_URL (closes #9785) 2017-08-28 00:50:41 +07:00
Sergey M․
fea82c1780 [googledrive] Add support for source format (closes #14046) 2017-08-28 00:39:22 +07:00
Sergey M․
3902cdd0e3 [pornhd] Fix extraction (closes #14005) 2017-08-27 22:37:26 +07:00
Sergey M․
2cfa7cbdd0 release 2017.08.27.1 2017-08-27 06:09:29 +07:00
Sergey M․
cc0412ef91 [ChangeLog] Actualize 2017-08-27 06:06:49 +07:00
Sergey M․
1c9c8de29e [youtube] Fix extraction with --youtube-skip-dash-manifest enabled (closes #14037) 2017-08-27 06:06:39 +07:00
Sergey M․
f031b76065 release 2017.08.27 2017-08-27 04:28:04 +07:00
Sergey M․
62c06c593d [ChangeLog] Actualize 2017-08-27 04:27:19 +07:00
Sergey M․
ff17be3ac9 [extractor/generic] Extract from LD-JSON last of all
Previous sources may contain several formats, e.g. http://tamasha.com/v/PgGZ
2017-08-27 03:31:40 +07:00
Sergey M․
1ed4549942 [extractor/common] Extract format id from label attribute of source tag for HTML5 videos (#14034) 2017-08-27 03:27:05 +07:00
Sergey M․
dd121cc1ca [extractor/common] Extract height from res attribute of source tag for HTML5 videos (closes #14034) 2017-08-27 03:12:56 +07:00
Sergey M․
a3c3a1e128 [http] Rework HTTP downloader
* Simplify code and split into separate routines to facilitate maintaining
* Make retry mechanism work on errors during actual download not only during connection establishment phase
* Retry on ECONNRESET and ETIMEDOUT during reading data from network
* Retry on content too short and various timeout errors
* Show error description on retry
* Closes #506, closes #809, closes #2849, closes #4240, closes #6023, closes #8625, closes #9483
2017-08-27 02:22:30 +07:00
Sergey M․
085d9dd9be [rai] Fix audio formats extraction (closes #14024) 2017-08-26 22:02:49 +07:00
Vijay Singh
151978f38a [mixcloud] Fix extraction (closes #14020) 2017-08-26 19:32:57 +07:00
Sergey M․
c7121fa7b8 [youtube] Fix controversy videos extraction (closes #14027, closes #14029) 2017-08-26 15:38:38 +07:00
Vijay Singh
745968bc72 [mixcloud] Fix extraction (closes #14015) 2017-08-24 22:28:44 +07:00
Sergey M․
df235dbba8 release 2017.08.23 2017-08-23 23:23:13 +07:00
Sergey M․
c4bdc68113 [ChangeLog] Actualize 2017-08-23 23:21:19 +07:00
Sergey M․
5bae33485c [toutv] PEP 8 2017-08-23 22:50:00 +07:00
Sergey M․
0830f3e048 [cbc:watch] Bypass geo-restriction (closes #13993) 2017-08-23 22:45:45 +07:00
Sergey M․
8d7a24aff6 [toutv] Relax DRM check (closes #13994) 2017-08-23 22:28:09 +07:00
Sergey M․
37d9af306a [googledrive] Simplify and carry long lines (#13638) 2017-08-23 00:33:53 +07:00
Sergey M․
e01c3d2ef7 [extractor/common] Introduce _parse_xml 2017-08-23 00:32:41 +07:00
Parmjit Virk
05915e379a [googledrive] Add support for subtitles (fixes #13619) 2017-08-22 23:48:59 +07:00
Yen Chi Hsuan
7b67b60773 Merge pull request #13669 from bmwiedemann/master
[build] Override timestamps in zip file
2017-08-22 21:51:20 +08:00
Sergey M․
8d9c2a681a [pornhub] Relax uploader regex (closes #13906, closes #13975) 2017-08-21 23:06:27 +07:00
Alan Yee
903d4d1625 [README.md] Switch to HTTPS URLs 2017-08-20 23:35:39 +07:00
Luca Steeb
8239c6791a [bandcamp:album] Extract track titles 2017-08-20 23:32:33 +07:00
Sergey M․
b359e977b9 [extractor/common] Make HLS and DASH extraction non fatal in _parse_html5_media_entries (closes #13970) 2017-08-20 14:16:58 +07:00
Bernhard M. Wiedemann
305d99f0bd [build] Override timestamps in zip file
to make build reproducible.
See https://reproducible-builds.org/ for why this is good

Copying files to not interfere with freshness detection.
2017-08-19 21:43:48 +02:00
Sergey M․
d3d45e0a45 [bbccouk] Add support for events URLs (closes #13893) 2017-08-19 23:54:15 +07:00
Yen Chi Hsuan
381ad4f309 [liveleak] Support multi-video pages (closes #6542) 2017-08-19 22:48:00 +08:00
Yen Chi Hsuan
e2481b9b6e [ChangeLog] Fix 2017-08-19 22:28:58 +08:00
Yen Chi Hsuan
09747ba766 [liveleak] Support another liveleak embedding pattern (closes #13336) 2017-08-19 22:28:13 +08:00
Yen Chi Hsuan
f8f18f332f [cda] Fix extraction (closes #13935) 2017-08-19 21:44:47 +08:00
Yen Chi Hsuan
95f3f7c20a [utils] Fix unescapeHTML for misformed string like "&a"" (#13935) 2017-08-19 21:40:53 +08:00
Sergey M․
f5469da9e6 [laola1tv] Add support for tv.ittf.com (closes #13965) 2017-08-19 19:48:20 +07:00
Sergey M․
d14d9d8903 [mixcloud] Fix extraction (closes #13958) 2017-08-18 23:31:42 +07:00
Sergey M․
ea004d34f8 release 2017.08.18 2017-08-18 01:05:27 +07:00
Sergey M․
2738965d98 [ChangeLog] Actualize 2017-08-18 01:03:20 +07:00
Sergey M․
4a91910365 [qqmusic:toplist] PEP 8 2017-08-18 01:00:07 +07:00
Sergey M․
c0892b2b46 [arte] Detect unavailable videos (closes #13945) 2017-08-18 00:58:23 +07:00
Sergey M․
a5ac0c4755 [YoutubeDL] Sanitize byte string format URLs (#13951) 2017-08-17 23:59:12 +07:00
Sergey M․
5551d7714d [generic] Convert redirect URLs to unicode strings (closes #13951) 2017-08-17 23:58:01 +07:00
Sergey M․
5f5c7b92dd [udemy] Fix paid course detection (#13943) 2017-08-17 23:14:46 +07:00
Sergey M․
93d0583e34 [pluralsight] Use RPC API for course extraction (closes #13937) 2017-08-17 22:45:40 +07:00
Yen Chi Hsuan
5d28169747 Credit Genki Sky for clippit (bfabd17b33) 2017-08-17 21:21:17 +08:00
Yen Chi Hsuan
7ddab7742c [ChangeLog] Add an entry for Genki Sky's patch 2017-08-17 16:56:37 +08:00
Genki Sky
bfabd17b33 Add new extractor 2017-08-17 16:56:06 +08:00
Yen Chi Hsuan
12f5304556 [ChangeLog] Add entry for #13805 2017-08-17 16:40:56 +08:00
Yen Chi Hsuan
25a6e769a1 [qqmusic] Fix tests and cleanup 2017-08-17 16:39:57 +08:00
Yen Chi Hsuan
d22b67f356 Merge pull request #13805 from gam2046/master
Fix QQ Music url changed
2017-08-17 16:11:35 +08:00
Sergey M․
a1aa659662 [periscope] Renew HLS extraction (closes #13917) 2017-08-16 23:03:42 +07:00
Sergey M․
4850478543 [extractor/common] Add support for float durations in _parse_mpd_formats (closes #13919) 2017-08-15 23:58:00 +07:00
forDream
134d85a7bd [qqmusic] review 2017-08-15 13:14:35 +08:00
forDream
5c037c0d1f [qqmusic]support QQMusicSingerIE 2017-08-15 13:14:35 +08:00
forDream
5d1bd3b907 [qqmusic]update valid url 2017-08-15 13:14:34 +08:00
forDream
19ada898dc fix QQ Music Url changed 2017-08-15 13:14:34 +08:00
Sergey M․
da20951a57 [mixcloud] Extract decrypt key 2017-08-14 22:39:05 +07:00
Sergey M․
16393d6535 release 2017.08.13 2017-08-13 08:58:30 +07:00
Sergey M․
4f049e4aa8 [ChangeLog] Actualize 2017-08-13 08:00:15 +07:00
Sergey M․
475bcb225f [pornhub:playlistbase] Skip videos from drop-down menu for all playlists (closes #12819, closes #13902) 2017-08-13 07:53:02 +07:00
Sergey M․
b3c6515365 [fourtube] Add support for other sites (closes #6022, closes #7859, closes #13901) 2017-08-13 07:23:29 +07:00
Sergey M․
eb02940cc7 [generic] Add test for #13895 2017-08-13 01:11:27 +07:00
Sergey M․
4ef9152428 [limelight] Improve embeds detection (closes #13895) 2017-08-13 00:58:39 +07:00
Sergey M․
0c43a481b9 [reddit] Add extractors (closes #13847) 2017-08-12 23:43:51 +07:00
Sergey M․
868f79db41 [extractor/common] Fix _media_formats 2017-08-12 19:24:26 +07:00
Sergey M․
70851a95c3 [aparat] Extract all formats (closes #13887) 2017-08-12 17:18:23 +07:00
Sergey M․
e74e3b63e3 [YoutubeDL] Make sure format id is not empty 2017-08-12 17:14:11 +07:00
Sergey M․
ac8491fcca [extractor/common] Make _family_friendly_search optional 2017-08-12 17:11:35 +07:00
Sergey M․
82889d4ae5 [extractor/common] Respect source's type attribute for HTML5 media (closes #13892) 2017-08-12 16:48:11 +07:00
Sergey M․
92a5c41532 [mixcloud] Fix play info decryption (closes #13885) 2017-08-12 16:30:50 +07:00
Sergey M․
1663bd6e1c [generic] Replace vzaar embed test 2017-08-11 22:02:00 +07:00
tetra-eder
41918eaa5c [generic] Add support for vzaar embeds 2017-08-11 22:00:39 +07:00
Sergey M․
6ed99754bb release 2017.08.09 2017-08-09 23:52:22 +07:00
Sergey M․
0e7dfa7d16 [ChangeLog] Actualize 2017-08-09 23:49:53 +07:00
Sergey M․
baba5f4d1d [xxxymovies] Fix title extraction (closes #13868) 2017-08-09 23:46:49 +07:00
Sergey M․
dee04d24a4 [nick] Add support for nick.com.pl (closes #13860) 2017-08-09 23:12:02 +07:00
Sergey M․
5b3ddadcc3 [mixcloud] Fix play info decryption (closes #13867) 2017-08-09 22:55:13 +07:00
Sergey M․
5b232f46dc [utils] Skip missing params in cli_bool_option (closes #13865) 2017-08-09 22:28:19 +07:00
Alex Seiler
4bf22f7a10 [20min] Fix embeds extraction 2017-08-08 05:41:38 +07:00
Sergey M․
15d1e8a23d [dplayit] Fix extraction (closes #13851) 2017-08-07 22:43:42 +07:00
Yen Chi Hsuan
ee6a611665 [niconico] Support videos with multiple formats (closes #13522) 2017-08-07 00:19:46 +08:00
Yen Chi Hsuan
463e7216c8 [niconico] Support HTML5-only videos (closes #13806) 2017-08-06 23:07:28 +08:00
Sergey M․
903a183b6a release 2017.08.06 2017-08-06 09:05:36 +07:00
Sergey M․
92740e4241 [ChangeLog] Actualize 2017-08-06 09:02:14 +07:00
Sergey M․
fac188c695 [pluralsight] Fix format selection 2017-08-06 08:44:28 +07:00
Sergey M․
16afce174e [mpora] Remove extractor (closes #13826) 2017-08-06 08:18:16 +07:00
Sergey M․
e2b4808fd8 [voot] Improve extraction (#10255, closes #11814) 2017-08-06 08:05:29 +07:00
Ashutosh Chaudhary
daaaf5f594 [voot] Add extractor 2017-08-06 08:05:24 +07:00
Sergey M․
f172c86dcd [vlive:channel] Limit number of videos per page to 100 (closes #13830) 2017-08-05 21:17:55 +07:00
Sergey M․
1d5472290f [podomatic] Extend _VALID_URL (closes #13827) 2017-08-05 08:28:12 +07:00
Sergey M․
c983cc3b71 [cinchcast] Extend _VALID_URL 2017-08-05 08:17:01 +07:00
Sergey M․
1141e9104b Use relative paths for DASH fragments (closes #12990)
10x reduced JSON size
refs #13810
2017-08-05 07:40:29 +07:00
Sergey M․
8519b88f67 [yandexdisk] Relax _VALID_URL (closes #13824) 2017-08-05 00:59:07 +07:00
Sergey M․
bbbe1cebfc [mlb] Update test (closes #13777) 2017-08-05 00:09:36 +07:00
Sergey M․
f31fd0693b [vidme] Extract DASH and HLS formats 2017-08-05 00:00:21 +07:00
Sergey M․
799802f368 [teamfour] Remove extractor (closes #13782)
Now covered with generic extractor
2017-08-04 23:54:28 +07:00
Sergey M․
b3b5870cba [pornhd] Fix extraction (closes #13783) 2017-08-04 23:51:03 +07:00
Sergey M․
57a38a38c3 [udemy] Fix subtitles extraction (closes #13812) 2017-08-04 23:45:13 +07:00
Matt Crupi
11a6793f80 [mlb] Extend _VALID_URL (closes #13740) 2017-08-04 22:46:54 +07:00
Justin Quan
1f03fef994 [README.md] Improve grammar 2017-08-04 22:43:44 +07:00
Sergey M․
183062a4ab [pbs] Add support for new URL schema (closes #13801) 2017-08-03 23:19:59 +07:00
Tithen-Firion
feee8d32e4 [phantomjs] add exe version to debug info 2017-08-03 14:17:25 +02:00
Sergey M․
8cda78ef72 [test_YoutubeDL] Add a test for #10083 2017-08-02 23:12:34 +07:00
Sergey M․
9118c9f18a [nrktv] Update API host (closes #13796) 2017-08-01 05:21:00 +07:00
Sergey M․
5c9ea67bc0 release 2017.07.30.1 2017-07-30 20:47:31 +07:00
Sergey M․
f701827e31 [ChangeLog] Actualize 2017-07-30 19:43:09 +07:00
Sergey M․
8b9f50d7cb [watchbox] Add extractor (#13739) 2017-07-30 19:09:44 +07:00
Sergey M․
0ed4758023 [clipfish] Remove extractor 2017-07-30 19:08:44 +07:00
Sergey M․
a0a477b885 [youjizz] Fix extraction (closes #13744) 2017-07-30 15:48:22 +07:00
Grzegorz Ruciński
198d4cb40c [generic] Add support for another ooyala embed pattern (closes #13727) 2017-07-30 01:30:04 +07:00
Sergey M․
ca127ab2c1 [ard] Add support for lives (closes #13771) 2017-07-29 23:07:28 +07:00
Sergey M․
e445850e69 [soundcloud] Update client id 2017-07-29 18:45:57 +07:00
Sergey M․
836ef26486 [soundcloud:trackstation] Add extractor (closes #13733) 2017-07-29 18:41:42 +07:00
Sergey M․
c04017519d [svtplay] Use geo verification proxy for API request 2017-07-29 15:30:53 +07:00
Sergey M․
2a7a823211 [svtplay] Update API URL (closes #13767) 2017-07-29 15:25:32 +07:00
Sergey M․
95908ce453 [extractor/generic] PEP 8 2017-07-29 15:13:12 +07:00
Sergey M․
cbbe66635f [yandexdisk] Add extractor (closes #13755) 2017-07-29 15:10:19 +07:00
Sergey M․
c5a49ff084 [downloader/hls] Use redirect URL as manifest base (#13755) 2017-07-29 15:02:41 +07:00
Philipp Hagemeister
24e966e8da [megaphone] Add extractor 2017-07-28 12:13:19 +02:00
Sergey M․
9682666bda [amcnetworks] Make rating optional (closes #12453) 2017-07-27 02:04:51 +07:00
Sergey M․
f9c48d895b [cloudy] Fix extraction (closes #13737) 2017-07-26 23:12:43 +07:00
Sergey M․
c99d6890cb [nickru] Add extractor 2017-07-23 21:02:06 +07:00
Sergey M․
70bfab0e9a [mtv] Improve thumbnal extraction 2017-07-23 21:02:06 +07:00
nyuszika7h
f0e31e32c9 [nick] Automate geo-restriction bypass (#13711) 2017-07-23 20:40:04 +07:00
nyuszika7h
3150976669 [ISSUE_TEMPLATE_tmpl.md] Minor improvements 2017-07-23 20:33:18 +07:00
Yen Chi Hsuan
e3ce912c3d [niconico] improve error reporting (#13696) 2017-07-23 16:25:30 +08:00
Yen Chi Hsuan
73095e013f [options] Typo 2017-07-23 16:24:18 +08:00
Yen Chi Hsuan
905d18a7aa [options] Correctly hide login info from debug outputs (#13696)
Iterate over opts instead of PRIVATE_OPTS for both performance and
correctness
2017-07-23 16:22:14 +08:00
Sergey M․
0db492c02a release 2017.07.23 2017-07-23 01:09:09 +07:00
Sergey M․
425f41319a [ChangeLog] Actualize 2017-07-23 01:06:08 +07:00
Sergey M․
71dde5eecf [itv] Fix production id extraction (closes #13671) 2017-07-23 00:59:07 +07:00
Sergey M․
935d6c20c0 [vidio] Make duration non fatal and fix typo 2017-07-23 00:44:50 +07:00
Sergey M․
e0f1fb0a27 [mtv] Skip missing video parts (closes #13690) 2017-07-23 00:25:23 +07:00
Sergey M․
0017d9ad6d [YoutubeDL] Improve default format specification (closes #13704) 2017-07-23 00:12:01 +07:00
Sergey M․
327c8364f1 [sportbox:embed] Fix extraction 2017-07-22 21:35:14 +07:00
dubber0
359aa2fdd1 [npo] Add support for npo3.nl URLs 2017-07-22 19:15:55 +07:00
Sergey M․
f76c02c87b [dramafever] Fix tests 2017-07-22 11:41:40 +07:00
Sergey M․
7d9a1db111 [dramafever] Remove video id from title (closes #13699) 2017-07-22 11:40:46 +07:00
Sergey M․
0396806f67 [YoutubeDL] Do not override id, extractor and extractor_key in url_transparent
All these meta fields must be borrowed from final extractor that actually performs extraction.
This commit fixes extractor id in download archives for url_transparent downloads. Previously, 'transparent' extractor was erroneously
used for extractor archive id, e.g. 'eggheadlesson 4n8ugwwj5t' instead of 'wistia 4n8ugwwj5t'.
2017-07-21 00:13:32 +07:00
Sergey M․
dc6520aa3d [egghead:lesson] Add extractor (#6635) 2017-07-20 23:22:36 +07:00
Sergey M․
c653326a14 [funnyordie] Extract more metadata (closes #13677) 2017-07-20 22:50:56 +07:00
Yen Chi Hsuan
3fcf346ac1 [youku:show] Refine playlist extraction
Handle playlists that the initial page is not the first page
2017-07-20 23:20:46 +08:00
Yen Chi Hsuan
fa63cf6c23 [youku:show] Fix playlist extraction (closes #13248) 2017-07-20 22:57:51 +08:00
Yen Chi Hsuan
85f5a74b6c [tbs] Mark as broken and skip invalid tests 2017-07-20 21:19:09 +08:00
Yen Chi Hsuan
d20b1c6725 [dispeak] Recognize sevt subdomain (closes #13276) 2017-07-20 18:14:14 +08:00
Sergey M․
bb176df3bb [spiegel:article] Move test 2017-07-17 22:19:40 +07:00
Sergey M․
83d00044c1 [adn] Improve error reporting (#13663) 2017-07-16 20:50:32 +07:00
Sergey M․
7abed4e06c [crunchyroll] Relax series and season regex (closes #13659) 2017-07-16 12:40:45 +07:00
Sergey M․
13eb526f11 [nexx:embed] PEP 8 2017-07-16 05:23:19 +07:00
Sergey M․
00d06e3cfc [spiegel:article] Add support for nexx iframe embeds (closes #13029) 2017-07-16 04:38:20 +07:00
Sergey M․
749ca5eced [extractor/common] Fix playlist_from_matches 2017-07-16 04:33:14 +07:00
Sergey M․
3f59b0154a [nexx:embed] Add extractor for iframe embeds 2017-07-16 04:32:37 +07:00
Sergey M․
089b97cfee [nexx] Improve JS embed extraction 2017-07-16 04:30:48 +07:00
Sergey M․
decf86044d [pearvideo] Improve (closes #13031) 2017-07-16 03:06:04 +07:00
troywith77
94b817edeb [pearvideo] Add extractor 2017-07-16 03:02:31 +07:00
Sergey M․
cea931a9e5 release 2017.07.15 2017-07-15 07:36:05 +07:00
Sergey M․
ef78563e9c [ChangeLog] Actualize 2017-07-15 07:33:26 +07:00
Sergey M․
961ea474b6 [YoutubeDL] PEP 8 2017-07-15 07:02:57 +07:00
Sergey M․
ea3f20494f [youtube] PEP 8 2017-07-15 07:02:57 +07:00
Sergey M․
c7604d79e9 [spiegeltv] Delegate extraction to nexx (closes #13159) 2017-07-15 07:02:57 +07:00
Sergey M․
4e826cd9ae [nexx] Add extractor (closes #10807, closes #13465) 2017-07-15 07:02:57 +07:00
Robin Neatherway
2583c0b54e Fix bugs caused by typos 2017-07-14 23:08:32 +07:00
Sergey M․
7d02dcfaa2 [youtube] Don't capture YouTube Red ad for creator meta field (closes #13621) 2017-07-14 22:37:04 +07:00
satunnainen
00dbdfc1f7 [slideshare] Fix extraction 2017-07-14 22:11:07 +07:00
rrooij
f354d84807 [5tv] Add another video URL pattern (closes #13354) 2017-07-14 22:10:17 +07:00
Sergey M․
15da37c7dc [YoutubeDL] Don't expand env variables in meta fields (closes #13637) 2017-07-14 00:42:12 +07:00
Sergey M․
9a0942ad55 [drtv] Make HLS and HDS extraction non fatal 2017-07-11 22:59:56 +07:00
Sergey M․
f2bb33a986 [ted] Fix subtitles extraction (closes #13628, closes #13629) 2017-07-11 21:36:45 +07:00
Yen Chi Hsuan
3615bfe1b4 [twitter] Fix remaining tests 2017-07-11 16:46:37 +08:00
Yen Chi Hsuan
e8f20ffa03 [vine] Make sure the title won't be empty
And fix a relevant TwitterCard test case
2017-07-11 16:05:15 +08:00
Yen Chi Hsuan
9be31e771c [twitter] Support HLS streams in vmap URLs 2017-07-11 15:48:48 +08:00
Yen Chi Hsuan
7f176ac477 [periscope] Support pscp.tv URLs in embedded frames
And fix a relevant twitter test
2017-07-11 15:35:19 +08:00
Yen Chi Hsuan
2edfd745df [twitter] Extract mp4 urls via mobile API (closes #12726) 2017-07-11 15:19:36 +08:00
Yen Chi Hsuan
708f6f511e [niconico] Fix authentication error handling (closes #12486) 2017-07-11 15:04:45 +08:00
Yen Chi Hsuan
bb13949197 [niconico] Check login errors (#12486) 2017-07-11 15:03:11 +08:00
Yen Chi Hsuan
c3c94ca4a4 [giantbomb] Extract m3u8 formats (closes #13626) 2017-07-10 21:34:27 +08:00
Sergey M․
e3cd1fcdd1 [vlive:playlist] Relax and simplify 2017-07-10 04:32:24 +07:00
coreynicholson
b71c18b434 [vlive:playlist] Add extractor 2017-07-10 04:24:04 +07:00
Sergey M․
7bf539edcc [eagleplatform] Fix test 2017-07-10 00:14:41 +07:00
Sergey M․
65c416dda8 release 2017.07.09 2017-07-09 20:16:38 +07:00
Sergey M․
207acd8465 [ChangeLog] Actualize 2017-07-09 20:15:15 +07:00
Sergey M․
71a1db8919 [dailymail] Add support for embeds 2017-07-09 20:06:24 +07:00
Sergey M․
6e925598d6 [csjw] Add coding cookie 2017-07-09 19:18:12 +07:00
Sergey M․
73cf76a93f [joj] Rewrite and add support for generic embeds (closes #13268) 2017-07-09 19:17:54 +07:00
luboss
256a746d21 [joj] Add extractor 2017-07-09 19:17:38 +07:00
Sergey M․
58179eb7d9 [abc.net.au:iview] Extract more formats (closes #13492, closes #13489) 2017-07-09 17:55:40 +07:00
Sergey M․
485cb37576 [egghead:course] Improve (closes #13370) 2017-07-09 17:30:49 +07:00
Santiago Calcagno
ed84454d35 [egghead:course] Fix extraction 2017-07-09 17:30:25 +07:00
Sergey M․
a02682fd13 Keep in sync with ffmpeg's current malformed AAC bitstream wording (closes #13587) 2017-07-09 17:09:44 +07:00
Sergey M․
0d2f0b0357 [csjw] Make description optional 2017-07-09 17:05:11 +07:00
Sergey M․
c319d1c483 [csjw] Fix issues and improve extraction (closes #13525) 2017-07-09 17:01:05 +07:00
Christopher Smith
d2b9f362fa [cjsw] Add extractor 2017-07-09 17:01:00 +07:00
Sergey M․
4328ddf82b [extractor/common] Add support for AMP tags in _parse_html5_media_entries 2017-07-09 16:29:52 +07:00
Sergey M․
250b042c7e [generic] Add tests for #13557 2017-07-09 16:02:38 +07:00
Sergey M․
665e945246 [eagleplatform] Add support for referrer protected videos (closes #13557) 2017-07-09 15:57:58 +07:00
Sergey M․
5af2fd7fa0 [eagleplatform] Add support for another embed pattern (#13557) 2017-07-09 15:55:04 +07:00
mlindner
15237fcd51 [veoh] Extend _VALID_URL 2017-07-09 14:54:52 +07:00
rrooij
7a57730907 [npo:live] Fix live stream id extraction (closes #13568) 2017-07-09 14:21:40 +07:00
Sergey M․
8b347a389e [googledrive] Fix height extraction (closes #13603) 2017-07-09 00:26:13 +07:00
Sergey M․
a49804816c [dailymotion] Add support for new layout (close #13580) 2017-07-08 18:12:15 +07:00
Yen Chi Hsuan
eadd313321 [yam] Remove extractor
mymedia.yam.com is dead. An wikipedia user also pointed out that Yam's
blog service is no longer available. [1]

[1] https://zh.wikipedia.org/zh-tw/%E5%A4%A9%E7%A9%BA%E9%83%A8%E8%90%BD
2017-07-08 15:48:05 +08:00
Sergey M․
d852c6bc59 [xhamster] Extract all formats and fix duration extraction (#13593) 2017-07-07 22:49:11 +07:00
Sergey M․
00e5c36315 [xhamster] Add support for new URL schema (closes #13593) 2017-07-07 22:27:34 +07:00
Sergey M․
8a04ade86b Credit @parmjitv for #13322, #13503, #13541, #13549 2017-07-06 23:15:23 +07:00
Sergey M․
ab328411d5 Credit @orng for ruv (#13396) 2017-07-06 23:15:16 +07:00
Sergey M․
ddeff4be3f Credit @gfabiano for #13382, #13385, #13415 2017-07-06 23:15:09 +07:00
Parmjit Virk
60d4401c5e [espn] Extend _VALID_URL (fixes #13244) 2017-07-06 22:55:59 +07:00
Sergey M․
dee2ff1d81 [test_utils] Fix tests under Windows 2017-07-06 00:25:37 +07:00
Sergey M․
6554708252 [kaltura] Fix typo in subtitles extraction (closes #13569) 2017-07-05 23:20:50 +07:00
Sergey M․
0a2e1b2e30 [vier] Adapt extraction to redesign (#13575) 2017-07-05 22:52:47 +07:00
Yen Chi Hsuan
babbc04d45 [xuite] Move to the new HTML5 API and reduce # of requests 2017-07-05 23:27:12 +08:00
Yen Chi Hsuan
609ff8ca19 [utils] Support attributes with no values in get_elements_by_attribute() 2017-07-05 23:27:12 +08:00
Sergey M․
b6c9fe4162 release 2017.07.02 2017-07-02 20:17:10 +07:00
Sergey M․
4d9ba27bba [ChangeLog] Actualize 2017-07-02 20:12:40 +07:00
Sergey M․
50ae3f646e [thisoldhouse] Add more fallbacks for video id (closes #13541) 2017-07-02 20:06:15 +07:00
Parmjit Virk
99a7e76240 [thisoldhouse] Update test 2017-07-02 20:05:11 +07:00
Parmjit Virk
a3a6d01a96 [thisoldhouse] Fix video id extraction (closes #13540) 2017-07-02 20:04:51 +07:00
Sergey M․
02d61a65e2 [xfileshare] Extend format regex (closes #13536) 2017-07-02 08:00:22 +07:00
Sergey M․
9b35297be1 [extractors] Add import for tastytrade 2017-07-01 18:39:29 +07:00
Sergey M․
4917478803 [ted] Fix extraction (closes #13535)) 2017-07-01 18:39:01 +07:00
Sergey M․
54faac2235 [tastytrade] Add extractor (closes #13521) 2017-06-30 22:20:30 +07:00
Sergey M․
c69701c6ab [extractor/common] Improve _json_ld 2017-06-30 22:19:06 +07:00
Sergey M․
d4f8ce6e91 [dplayit] Relax video id regex (closes #13524) 2017-06-30 21:55:45 +07:00
Sergey M․
b311b0ead2 [generic] Extract more generic metadata (closes #13527) 2017-06-30 21:42:04 +07:00
Sergey M․
72d256c434 [bbccouk] Extend _VALID_URL 2017-06-29 22:29:28 +07:00
Sergey M․
b2ed954fc6 [bbccouk] Capture and output error message (closes #13518) 2017-06-29 22:27:53 +07:00
Sergey M․
a919ca0ad6 [cbsnews] Actualize test 2017-06-28 22:30:12 +07:00
Parmjit Virk
88d6b7c2bd [cbsnews] Relax video info regex (fixes #13284) 2017-06-28 22:21:35 +07:00
Sergey M․
fd1c5fba6b [facebook] Add test for plugin video embed (#13493) 2017-06-27 22:38:59 +07:00
Sergey M․
0646e34c7d [facebook] Add support for plugin video embeds and multiple embeds (closes #13493) 2017-06-27 22:38:54 +07:00
Sergey M․
bf2dc9cc6e [soundcloud] Fix tests 2017-06-27 21:26:46 +07:00
Viktor Szakats
f1c051009b [soundcloud] Switch to https for API requests 2017-06-27 21:20:18 +07:00
Sergey M․
33ffb645a6 [pandatv] Switch to https for API and download URLs 2017-06-26 22:11:09 +07:00
Xuan Hu (Sean)
35544690e4 [pandatv] Add support for https URLs 2017-06-26 22:00:31 +07:00
Yen Chi Hsuan
136503e302 [ChangeLog] Update after #13494 2017-06-26 19:56:07 +08:00
Luca Steeb
4a87de72df [niconico] fix sp subdomain links 2017-06-25 21:30:05 +02:00
Sergey M․
a7ce8f16c4 release 2017.06.25 2017-06-25 05:16:06 +07:00
Sergey M․
a5aea53fc8 [ChangeLog] Actualize 2017-06-25 05:13:12 +07:00
Sergey M․
0c7a631b61 [adobepass] Add support for ATTOTT MSO (DIRECTV NOW) (closes #13472) 2017-06-25 05:03:17 +07:00
Sergey M․
fd9ee4de8c [wsj] Add support for barrons.com (closes #13470) 2017-06-25 02:15:35 +07:00
Argn0
5744cf6c03 [ign] Add another video id pattern (closes #13328) 2017-06-25 01:59:15 +07:00
Sergey M․
9c48b5a193 [raiplay:live] Improve and add test (closes #13414) 2017-06-25 01:49:27 +07:00
james
449c665776 [raiplay:live] Add extractor 2017-06-25 01:48:54 +07:00
Sergey M․
23aec3d623 [redbulltv] Restore hls format prefix 2017-06-25 01:10:31 +07:00
Sergey M․
27449ad894 [redbulltv] Add support for lives and segments (closes #13486)) 2017-06-25 01:09:12 +07:00
Sergey M․
bd65f18153 [onetpl] Add support for videos embedded via pulsembed (closes #13482) 2017-06-24 18:33:31 +07:00
Sergey M․
73af5cc817 [YoutubeDL] Skip malformed formats for better extraction robustness 2017-06-23 21:18:33 +07:00
Sergey M․
b5f523ed62 [ooyala] Add test for missing stream['url']['data'] 2017-06-23 20:56:48 +07:00
Sergey M․
4f4dd8d797 [ooyala] Make more robust 2017-06-23 20:56:21 +07:00
Sergey M․
4cb18ab1b9 [ooyala] Skip empty format URLs (closes #13471, closes #13476) 2017-06-23 20:50:48 +07:00
Sergey M․
ac7409eec5 [hgtv.com:show] Fix typo 2017-06-23 02:54:12 +07:00
Sergey M․
170719414d release 2017.06.23 2017-06-23 02:13:21 +07:00
Sergey M․
38dad4737f [ChangeLog] Actualize 2017-06-23 02:10:54 +07:00
Sergey M․
ddbb4c5c3e [youtube] Adapt to new automatic captions rendition (closes #13467) 2017-06-23 02:00:19 +07:00
Sergey M․
fa3ea7223a [hgtv.com:show] Relax video config regex and update test (closes #13279, closes #13461) 2017-06-23 00:42:42 +07:00
Parmjit Virk
0f4a5a73e7 [drtuber] Fix formats extraction (fixes 12058) 2017-06-23 00:08:36 +07:00
Sergey M․
18166bb8e8 [youporn] Fix upload date extraction 2017-06-22 00:47:02 +07:00
Sergey M․
d4893e764b [youporn] Improve formats extraction 2017-06-22 00:40:15 +07:00
Sergey M․
97b6e30113 [youporn] Fix title extraction (closes #13456) 2017-06-22 00:20:45 +07:00
Sergey M․
9be9ec5980 [googledrive] Fix formats' sorting (closes #13443) 2017-06-20 22:58:33 +07:00
Giuseppe Fabiano
048b55804d [watchindianporn] Fix extraction (closes #13411) 2017-06-20 04:30:45 +07:00
Giuseppe Fabiano
6ce79d7ac0 [abcotvs] Fix test md5 2017-06-20 04:07:00 +07:00
Sergey M․
1641ca402d [vimeo] Add fallback mp4 extension for original format 2017-06-20 01:27:59 +07:00
Sergey M․
85cbcede5b [ruv] Improve, extract all formats and metadata (closes #13396) 2017-06-19 23:46:03 +07:00
Orn
a1de83e5f0 [ruv] Add extractor 2017-06-19 23:45:45 +07:00
Sergey M․
fee00b3884 [viu] Fix extraction on older python 2.6 2017-06-19 22:57:37 +07:00
Sergey M․
2d2132ac6e [adobepass] Fix extraction on older python 2.6 2017-06-19 22:54:53 +07:00
Yen Chi Hsuan
cc2ffe5afe [pandora.tv] Fix upload_date extraction (closes #12846) 2017-06-19 16:20:36 +08:00
Sergey M․
560050669b [asiancrush] Add extractor (closes #13420) 2017-06-18 20:18:51 +07:00
Sergey M․
eaa006d1bd release 2017.06.18 2017-06-18 00:16:49 +07:00
Sergey M․
a6f29820c6 [ChangeLog] Actualize 2017-06-18 00:15:43 +07:00
Sergey M․
1433734c35 [downloader/common] Use utils.shell_quote for debug command line 2017-06-17 23:50:21 +07:00
Sergey M․
aefce8e6dc [utils] Use compat_shlex_quote in shell_quote 2017-06-17 23:48:58 +07:00
Sergey M․
8b6ac49ecc [postprocessor/execafterdownload] Encode command line (closes #13407) 2017-06-17 23:16:53 +07:00
Sergey M․
b08e235f09 [compat] Fix compat_shlex_quote on Windows (closes #5889, closes #10254) 2017-06-17 23:14:24 +07:00
Sergey M․
be80986ed9 [postprocessor/metadatafromtitle] Fix missing optional meta fields (closes #13408) 2017-06-17 19:05:10 +07:00
Yen Chi Hsuan
473e87064b [devscripts/prepare_manpage] Fix deprecated escape sequence on py36 2017-06-17 17:37:25 +08:00
Yen Chi Hsuan
4f90d2aeac [Makefile] Excluding __pycache__ correctly (#13400) 2017-06-17 17:09:24 +08:00
Jakub Adam Wieczorek
b230fefc3c [polskieradio] Fix extraction 2017-06-16 04:57:56 +07:00
Sergey M․
96a2daa1ee [extractor/common] Improve jwplayer subtitles extraction 2017-06-15 23:40:39 +07:00
gfabiano
0ea6efbb7a [xfileshare] Add support for fastvideo.me 2017-06-15 21:41:49 +07:00
Yen Chi Hsuan
6a9cb29509 [extractor/common] Fix json dumping with --geo-bypass
The line "[debug] Using fake IP %s (%s) as X-Forwarded-For." was printed
to stdout even with -j/-J, which breaks the resultant JSON.
2017-06-15 13:04:36 +08:00
Yen Chi Hsuan
ca27037171 [bilibili] Fix extraction of videos with double quotes in titles
Closes #13387
2017-06-15 11:19:03 +08:00
gfabiano
0bf4b71b75 [4tube] Fix extraction (closes #13381) 2017-06-15 04:16:50 +07:00
Marcin Cieślak
5215f45327 [disney] Add support for disneychannel.de 2017-06-15 04:13:04 +07:00
Sergey M․
0a268c6e11 [extractor/common] Improve jwplayer formats extraction (closes #13379) 2017-06-14 22:02:15 +07:00
Sergey M․
7dd5415cd0 [npo] Improve _VALID_URL (closes #13376) 2017-06-14 21:33:40 +07:00
Sergey M․
b5dc33daa9 [corus] Add support for showcase.ca 2017-06-13 23:27:27 +07:00
Sergey M․
97fa1f8dc4 [corus] Add support for history.ca (closes #13359) 2017-06-13 23:16:21 +07:00
Sergey M․
b081f53b08 [compat] Add compat_HTMLParseError to __all__ 2017-06-12 02:36:43 +07:00
Sergey M․
cb1e6d8985 release 2017.06.12 2017-06-12 02:23:17 +07:00
Sergey M․
9932ac5c58 [ChangeLog] Actualize 2017-06-12 02:01:15 +07:00
Sergey M․
bf87c36c93 [xfileshare] PEP 8 2017-06-12 02:01:12 +07:00
Sergey M․
b4a3d461e4 [utils] Handle HTMLParseError in extract_attributes (closes #13349) 2017-06-12 01:52:24 +07:00
Sergey M․
72b409559c [compat] Introduce compat_HTMLParseError 2017-06-12 01:50:32 +07:00
Sergey M․
534863e057 [xfileshare] Add support for rapidvideo (closes #13348) 2017-06-12 00:16:47 +07:00
Sergey M․
16bc958287 [xfileshare] Modernize and pass referrer 2017-06-12 00:14:04 +07:00
Sergey M․
624bd0104c [rutv] Add support for testplayer.vgtrk.com (closes #13347) 2017-06-11 21:36:19 +07:00
Sergey M․
28a4d6cce8 [newgrounds] Extract more metadata (closes #13232) 2017-06-11 21:30:06 +07:00
Sergey M․
2ae2ffda5e [utils] Improve unified_timestamp 2017-06-11 21:27:22 +07:00
Sergey M․
70e7967202 [newgrounds:playlist] Add extractor (closes #10611) 2017-06-11 20:50:55 +07:00
Sergey M․
6e999fbc12 [newgrounds] Improve formats and uploader extraction (closes #13346) 2017-06-11 19:44:44 +07:00
Sergey M․
7409af9eb3 [msn] Fix formats extraction 2017-06-11 08:56:53 +07:00
Sergey M․
4e3637034c [extractor/generic] Ensure format id is unicode string 2017-06-10 23:56:20 +07:00
Sergey M․
1afd0b0da7 [extractor/common] Return unicode string from _match_id 2017-06-09 00:40:03 +07:00
Sergey M․
7515830422 [turbo] Ensure format id is string 2017-06-09 00:31:56 +07:00
Sergey M․
f5521ea209 [sexu] Ensure height is int 2017-06-09 00:30:23 +07:00
Sergey M․
34646967ba [jove] Ensure comment count is int 2017-06-09 00:29:20 +07:00
Sergey M․
e4d2e76d8e [golem] Ensure format id is string 2017-06-09 00:27:11 +07:00
Sergey M․
87f5646937 [gfycat] Ensure filesize is int 2017-06-09 00:24:23 +07:00
Sergey M․
cc69a3de1b [foxgay] Ensure height is int 2017-06-09 00:22:14 +07:00
Sergey M․
15aeeb1188 [flickr] Ensure format id is string 2017-06-09 00:20:07 +07:00
Sergey M․
1693bebe4d [sohu] Fix numeric fields 2017-06-09 00:16:42 +07:00
Sergey M․
4244a13a1d [safari] Improve authentication detection (closes #13319) 2017-06-08 23:20:48 +07:00
Sergey M․
931adf8cc1 [liveleak] Ensure height is int (closes #13313) 2017-06-08 22:54:30 +07:00
Sergey M․
c996943418 [YoutubeDL] Sanitize more fields (#13313) 2017-06-08 22:53:14 +07:00
Sergey M․
76e6378358 [README.md] Improve man page formatting 2017-06-08 22:02:42 +07:00
Sergey M․
a355b57f58 [README.md] Clarify output template references (closes #13316) 2017-06-08 21:52:19 +07:00
Sergey M․
1508da30c2 [streamango] Skip download for test (closes #13292) 2017-06-07 21:53:40 +07:00
Luca Steeb
eb703e5380 [streamango] Make title optional 2017-06-07 21:53:33 +07:00
Sergey M․
0a3924e746 [rtlnl] Improve _VALID_URL (closes #13295) 2017-06-06 21:21:44 +07:00
Sergey M․
e1db730d86 [tvplayer] Fix extraction (closes #13291) 2017-06-06 00:13:57 +07:00
Sergey M․
537191826f release 2017.06.05 2017-06-05 00:48:07 +07:00
Sergey M․
130880ba48 [ChangeLog] Actualize 2017-06-05 00:43:38 +07:00
Sergey M․
f8ba3fda4d Credit @jktjkt for dvtv formats (#13063) 2017-06-05 00:38:44 +07:00
Sergey M․
e1b90cc3db Credit @mikf for beam:vod (#13032) 2017-06-05 00:35:41 +07:00
Sergey M․
43e6579558 Credit @adamvoss for bandcamp:weekly (#12758) 2017-06-04 23:22:19 +07:00
Sergey M․
6d923aab35 [bandcamp:weekly] Improve and extract more metadata (closes #12758) 2017-06-04 23:21:30 +07:00
Adam Voss
62bafabc09 [bandcamp:weekly] Add extractor 2017-06-04 23:21:07 +07:00
Sergey M․
9edcdac90c [pornhub:uservideos] Add missing raise 2017-06-04 20:39:55 +07:00
Sergey M․
cd138d8bd4 [pornhub:playlist] Fix extraction (closes #13281) 2017-06-04 15:54:19 +07:00
Sergey M․
cd750b731c [godtv] Remove extractor (closes #13175) 2017-06-03 22:08:12 +07:00
CeruleanSky
4bede0d8f5 [YoutubeDL] Don't emit ANSI escape codes on Windows 2017-06-03 19:14:23 +07:00
Sergey M․
f129c3f349 [safari] Fix typo (closes #13252) 2017-06-02 01:03:51 +07:00
Sergey M․
39d4c1be4d [youtube] Improve chapters extraction (closes #13247) 2017-06-01 23:29:45 +07:00
Sergey M․
f7a747ce59 [1tv] Lower preference for http formats (closes #13246) 2017-06-01 22:19:52 +07:00
Sergey M․
4489d41816 [francetv] Relax _VALID_URL 2017-06-01 00:15:15 +07:00
Sergey M․
87b5184a0d [drbonanza] Fix extraction (closes #13231) 2017-05-31 23:56:32 +07:00
Remita Amine
c56ad5c975 [packtpub] Fix authentication(closes #13240) 2017-05-31 15:44:29 +01:00
Sergey M․
6b7ce85cdc [README.md] Mention http_dash_segments protocol 2017-05-30 23:50:48 +07:00
Yen Chi Hsuan
d10d0e3cf8 [README.md] Add an example for how to use .netrc on Windows
That's a Python bug: http://bugs.python.org/issue28334
Most likely it will be fixed in Python 3.7: https://github.com/python/cpython/pull/123
2017-05-29 14:58:07 +08:00
Tithen-Firion
c89267d31a Merge branch 'master' into openload-phantomjs-method 2017-05-04 11:00:06 +02:00
Remita Amine
5ff1bc0cc1 [YoutubeDL] write raw subtitle files 2017-04-29 20:03:03 +01:00
Tithen-Firion
7552f96352 [openload] Add required version 2017-04-29 12:41:57 +02:00
Tithen-Firion
98f9d87381 [phantomjs] Add required version checking 2017-04-29 12:41:42 +02:00
Tithen-Firion
fcace2d1ad [openload] raise not found before executing js 2017-04-29 10:30:45 +02:00
Tithen-Firion
40e41780f1 [phantomjs] add cookie support 2017-04-25 15:12:54 +02:00
Tithen-Firion
da57ebaf84 [openload] separate PhantomJS code from extractor 2017-04-25 01:06:14 +02:00
Tithen-Firion
47e0cef46e [openload] rewrite extractor 2017-04-16 00:34:34 +02:00
218 changed files with 7977 additions and 2601 deletions

View File

@@ -1,16 +1,16 @@
## Please follow the guide below
- You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
- Use *Preview* tab to see how your issue will actually look like
- Put an `x` into all the boxes [ ] relevant to your *issue* (like this: `[x]`)
- Use the *Preview* tab to see what your issue will actually look like
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.05.29*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.05.29**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.10.07*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.10.07**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
### What is the purpose of your *issue*?
@@ -28,14 +28,14 @@
### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl -v <your command line>`), copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
```
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.05.29
[debug] youtube-dl version 2017.10.07
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -1,16 +1,16 @@
## Please follow the guide below
- You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
- Use *Preview* tab to see how your issue will actually look like
- Put an `x` into all the boxes [ ] relevant to your *issue* (like this: `[x]`)
- Use the *Preview* tab to see what your issue will actually look like
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *%(version)s*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *%(version)s*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **%(version)s**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
### What is the purpose of your *issue*?
@@ -28,9 +28,9 @@
### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl -v <your command line>`), copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
```
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']

View File

@@ -9,6 +9,7 @@
### Before submitting a *pull request* make sure you have:
- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
- [ ] Checked the code with [flake8](https://pypi.python.org/pypi/flake8)
### In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under [Unlicense](http://unlicense.org/). Check one of the following options:
- [ ] I am the original author of this code and I am willing to release it under [Unlicense](http://unlicense.org/)

1
.gitignore vendored
View File

@@ -22,6 +22,7 @@ cover/
updates_key.pem
*.egg-info
*.srt
*.ttml
*.sbv
*.vtt
*.flv

14
AUTHORS
View File

@@ -217,3 +217,17 @@ Marvin Ewald
Frédéric Bournival
Timendum
gritstub
Adam Voss
Mike Fährmann
Jan Kundrát
Giuseppe Fabiano
Örn Guðjónsson
Parmjit Virk
Genki Sky
Ľuboš Katrinec
Corey Nicholson
Ashutosh Chaudhary
John Dong
Tatsuyuki Ishi
Daniel Weber
Kay Bouché

View File

@@ -3,7 +3,7 @@
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Command-line args: [u'-v', u'https://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2015.12.06
[debug] Git HEAD: 135392e
@@ -34,7 +34,7 @@ For bug reports, this means that your report should contain the *complete* outpu
If your server has multiple IPs or you suspect censorship, adding `--call-home` may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `http://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `http://www.youtube.com/`) is *not* an example URL.
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `https://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `https://www.youtube.com/`) is *not* an example URL.
### Are you using the latest version?
@@ -70,7 +70,7 @@ It may sound strange, but some bug reports we receive are completely unrelated t
# DEVELOPER INSTRUCTIONS
Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
Most users do not need to build youtube-dl and can [download the builds](https://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
To run youtube-dl as a developer, you don't need to build anything either. Simply execute
@@ -82,6 +82,8 @@ To run the test, simply invoke your favorite test runner, or execute a test file
python test/test_download.py
nosetests
See item 6 of [new extractor tutorial](#adding-support-for-a-new-site) for how to run extractor specific test cases.
If you want to create a build of youtube-dl yourself, you'll need
* python
@@ -118,7 +120,7 @@ After you have ensured this site is distributing its content legally, you can fo
class YourExtractorIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://yourextractor.com/watch/42',
'url': 'https://yourextractor.com/watch/42',
'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use --test)',
'info_dict': {
'id': '42',
@@ -149,10 +151,10 @@ After you have ensured this site is distributing its content legally, you can fo
}
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py

496
ChangeLog
View File

@@ -1,3 +1,499 @@
version 2017.10.07
Core
* [YoutubeDL] Ignore duplicates in --playlist-items
* [YoutubeDL] Fix out of range --playlist-items for iterable playlists and
reduce code duplication (#14425)
+ [utils] Use cache in OnDemandPagedList by default
* [postprocessor/ffmpeg] Convert to opus using libopus (#14381)
Extractors
* [reddit] Sort formats (#14430)
* [lnkgo] Relax URL regular expression (#14423)
* [pornflip] Extend URL regular expression (#14405, #14406)
+ [xtube] Add support for embed URLs (#14417)
+ [xvideos] Add support for embed URLs and improve extraction (#14409)
* [beeg] Fix extraction (#14403)
* [tvn24] Relax URL regular expression (#14395)
* [nbc] Fix extraction (#13651, #13715, #14137, #14198, #14312, #14314, #14378,
#14392, #14414, #14419, #14431)
+ [ketnet] Add support for videos without direct sources (#14377)
* [canvas] Generalize mediazone.vrt.be extractor and rework canvas and een
+ [afreecatv] Add support for adult videos (#14376)
version 2017.10.01
Core
* [YoutubeDL] Document youtube_include_dash_manifest
Extractors
+ [tvp] Add support for new URL schema (#14368)
+ [generic] Add support for single format Video.js embeds (#14371)
* [yahoo] Bypass geo restriction for brightcove (#14210)
* [yahoo] Use extracted brightcove account id (#14210)
* [rtve:alacarta] Fix extraction (#14290)
+ [yahoo] Add support for custom brigthcove embeds (#14210)
+ [generic] Add support for Video.js embeds
+ [gfycat] Add support for /gifs/detail URLs (#14322)
* [generic] Fix infinite recursion for twitter:player URLs (#14339)
* [xhamsterembed] Fix extraction (#14308)
version 2017.09.24
Core
+ [options] Accept lrc as a subtitle conversion target format (#14292)
* [utils] Fix handling raw TTML subtitles (#14191)
Extractors
* [24video] Fix timestamp extraction and make non fatal (#14295)
+ [24video] Add support for 24video.adult (#14295)
+ [kakao] Add support for tv.kakao.com (#12298, #14007)
+ [twitter] Add support for URLs without user id (#14270)
+ [americastestkitchen] Add support for americastestkitchen.com (#10764,
#13996)
* [generic] Fix support for multiple HTML5 videos on one page (#14080)
* [mixcloud] Fix extraction (#14088, #14132)
+ [lynda] Add support for educourse.ga (#14286)
* [beeg] Fix extraction (#14275)
* [nbcsports:vplayer] Correct theplatform URL (#13873)
* [twitter] Fix duration extraction (#14141)
* [tvplay] Bypass geo restriction
+ [heise] Add support for YouTube embeds (#14109)
+ [popcorntv] Add support for popcorntv.it (#5914, #14211)
* [viki] Update app data (#14181)
* [morningstar] Relax URL regular expression (#14222)
* [openload] Fix extraction (#14225, #14257)
* [noovo] Fix extraction (#14214)
* [dailymotion:playlist] Relax URL regular expression (#14219)
+ [twitch] Add support for go.twitch.tv URLs (#14215)
* [vgtv] Relax URL regular expression (#14223)
version 2017.09.15
Core
* [downloader/fragment] Restart inconsistent incomplete fragment downloads
(#13731)
* [YoutubeDL] Download raw subtitles files (#12909, #14191)
Extractors
* [condenast] Fix extraction (#14196, #14207)
+ [orf] Add support for f4m stories
* [tv4] Relax URL regular expression (#14206)
* [animeondemand] Bypass geo restriction
+ [animeondemand] Add support for flash videos (#9944)
version 2017.09.11
Extractors
* [rutube:playlist] Fix suitable (#14166)
version 2017.09.10
Core
+ [utils] Introduce bool_or_none
* [YoutubeDL] Ensure dir existence for each requested format (#14116)
Extractors
* [fox] Fix extraction (#14147)
* [rutube] Use bool_or_none
* [rutube] Rework and generalize playlist extractors (#13565)
+ [rutube:playlist] Add support for playlists (#13534, #13565)
+ [radiocanada] Add fallback for title extraction (#14145)
* [vk] Use dedicated YouTube embeds extraction routine
* [vice] Use dedicated YouTube embeds extraction routine
* [cracked] Use dedicated YouTube embeds extraction routine
* [chilloutzone] Use dedicated YouTube embeds extraction routine
* [abcnews] Use dedicated YouTube embeds extraction routine
* [youtube] Separate methods for embeds extraction
* [redtube] Fix formats extraction (#14122)
* [arte] Relax unavailability check (#14112)
+ [manyvids] Add support for preview videos from manyvids.com (#14053, #14059)
* [vidme:user] Relax URL regular expression (#14054)
* [bpb] Fix extraction (#14043, #14086)
* [soundcloud] Fix download URL with private tracks (#14093)
* [aliexpress:live] Add support for live.aliexpress.com (#13698, #13707)
* [viidea] Capture and output lecture error message (#14099)
* [radiocanada] Skip unsupported platforms (#14100)
version 2017.09.02
Extractors
* [youtube] Force old layout for each webpage (#14068, #14072, #14074, #14076,
#14077, #14079, #14082, #14083, #14094, #14095, #14096)
* [youtube] Fix upload date extraction (#14065)
+ [charlierose] Add support for episodes (#14062)
+ [bbccouk] Add support for w-prefixed ids (#14056)
* [googledrive] Extend URL regular expression (#9785)
+ [googledrive] Add support for source format (#14046)
* [pornhd] Fix extraction (#14005)
version 2017.08.27.1
Extractors
* [youtube] Fix extraction with --youtube-skip-dash-manifest enabled (#14037)
version 2017.08.27
Core
+ [extractor/common] Extract height and format id for HTML5 videos (#14034)
* [downloader/http] Rework HTTP downloader (#506, #809, #2849, #4240, #6023,
#8625, #9483)
* Simplify code and split into separate routines to facilitate maintaining
* Make retry mechanism work on errors during actual download not only
during connection establishment phase
* Retry on ECONNRESET and ETIMEDOUT during reading data from network
* Retry on content too short
* Show error description on retry
Extractors
* [generic] Lower preference for extraction from LD-JSON
* [rai] Fix audio formats extraction (#14024)
* [youtube] Fix controversy videos extraction (#14027, #14029)
* [mixcloud] Fix extraction (#14015, #14020)
version 2017.08.23
Core
+ [extractor/common] Introduce _parse_xml
* [extractor/common] Make HLS and DASH extraction in_parse_html5_media_entries
non fatal (#13970)
* [utils] Fix unescapeHTML for misformed string like "&a&quot;" (#13935)
Extractors
* [cbc:watch] Bypass geo restriction (#13993)
* [toutv] Relax DRM check (#13994)
+ [googledrive] Add support for subtitles (#13619, #13638)
* [pornhub] Relax uploader regular expression (#13906, #13975)
* [bandcamp:album] Extract track titles (#13962)
+ [bbccouk] Add support for events URLs (#13893)
+ [liveleak] Support multi-video pages (#6542)
+ [liveleak] Support another liveleak embedding pattern (#13336)
* [cda] Fix extraction (#13935)
+ [laola1tv] Add support for tv.ittf.com (#13965)
* [mixcloud] Fix extraction (#13958, #13974, #13980, #14003)
version 2017.08.18
Core
* [YoutubeDL] Sanitize byte string format URLs (#13951)
+ [extractor/common] Add support for float durations in _parse_mpd_formats
(#13919)
Extractors
* [arte] Detect unavailable videos (#13945)
* [generic] Convert redirect URLs to unicode strings (#13951)
* [udemy] Fix paid course detection (#13943)
* [pluralsight] Use RPC API for course extraction (#13937)
+ [clippit] Add support for clippituser.tv
+ [qqmusic] Support new URL schemes (#13805)
* [periscope] Renew HLS extraction (#13917)
* [mixcloud] Extract decrypt key
version 2017.08.13
Core
* [YoutubeDL] Make sure format id is not empty
* [extractor/common] Make _family_friendly_search optional
* [extractor/common] Respect source's type attribute for HTML5 media (#13892)
Extractors
* [pornhub:playlistbase] Skip videos from drop-down menu (#12819, #13902)
+ [fourtube] Add support pornerbros.com (#6022)
+ [fourtube] Add support porntube.com (#7859, #13901)
+ [fourtube] Add support fux.com
* [limelight] Improve embeds detection (#13895)
+ [reddit] Add support for v.redd.it and reddit.com (#13847)
* [aparat] Extract all formats (#13887)
* [mixcloud] Fix play info decryption (#13885)
+ [generic] Add support for vzaar embeds (#13876)
version 2017.08.09
Core
* [utils] Skip missing params in cli_bool_option (#13865)
Extractors
* [xxxymovies] Fix title extraction (#13868)
+ [nick] Add support for nick.com.pl (#13860)
* [mixcloud] Fix play info decryption (#13867)
* [20min] Fix embeds extraction (#13852)
* [dplayit] Fix extraction (#13851)
+ [niconico] Support videos with multiple formats (#13522)
+ [niconico] Support HTML5-only videos (#13806)
version 2017.08.06
Core
* Use relative paths for DASH fragments (#12990)
Extractors
* [pluralsight] Fix format selection
- [mpora] Remove extractor (#13826)
+ [voot] Add support for voot.com (#10255, #11644, #11814, #12350, #13218)
* [vlive:channel] Limit number of videos per page to 100 (#13830)
* [podomatic] Extend URL regular expression (#13827)
* [cinchcast] Extend URL regular expression
* [yandexdisk] Relax URL regular expression (#13824)
* [vidme] Extract DASH and HLS formats
- [teamfour] Remove extractor (#13782)
* [pornhd] Fix extraction (#13783)
* [udemy] Fix subtitles extraction (#13812)
* [mlb] Extend URL regular expression (#13740, #13773)
+ [pbs] Add support for new URL schema (#13801)
* [nrktv] Update API host (#13796)
version 2017.07.30.1
Core
* [downloader/hls] Use redirect URL as manifest base (#13755)
* [options] Correctly hide login info from debug outputs (#13696)
Extractors
+ [watchbox] Add support for watchbox.de (#13739)
- [clipfish] Remove extractor
+ [youjizz] Fix extraction (#13744)
+ [generic] Add support for another ooyala embed pattern (#13727)
+ [ard] Add support for lives (#13771)
* [soundcloud] Update client id
+ [soundcloud:trackstation] Add support for track stations (#13733)
* [svtplay] Use geo verification proxy for API request
* [svtplay] Update API URL (#13767)
+ [yandexdisk] Add support for yadi.sk (#13755)
+ [megaphone] Add support for megaphone.fm
* [amcnetworks] Make rating optional (#12453)
* [cloudy] Fix extraction (#13737)
+ [nickru] Add support for nickelodeon.ru
* [mtv] Improve thumbnal extraction
* [nick] Automate geo-restriction bypass (#13711)
* [niconico] Improve error reporting (#13696)
version 2017.07.23
Core
* [YoutubeDL] Improve default format specification (#13704)
* [YoutubeDL] Do not override id, extractor and extractor_key for
url_transparent entities
* [extractor/common] Fix playlist_from_matches
Extractors
* [itv] Fix production id extraction (#13671, #13703)
* [vidio] Make duration non fatal and fix typo
* [mtv] Skip missing video parts (#13690)
* [sportbox:embed] Fix extraction
+ [npo] Add support for npo3.nl URLs (#13695)
* [dramafever] Remove video id from title (#13699)
+ [egghead:lesson] Add support for lessons (#6635)
* [funnyordie] Extract more metadata (#13677)
* [youku:show] Fix playlist extraction (#13248)
+ [dispeak] Recognize sevt subdomain (#13276)
* [adn] Improve error reporting (#13663)
* [crunchyroll] Relax series and season regular expression (#13659)
+ [spiegel:article] Add support for nexx iframe embeds (#13029)
+ [nexx:embed] Add support for iframe embeds
* [nexx] Improve JS embed extraction
+ [pearvideo] Add support for pearvideo.com (#13031)
version 2017.07.15
Core
* [YoutubeDL] Don't expand environment variables in meta fields (#13637)
Extractors
* [spiegeltv] Delegate extraction to nexx extractor (#13159)
+ [nexx] Add support for nexx.cloud (#10807, #13465)
* [generic] Fix rutube embeds extraction (#13641)
* [karrierevideos] Fix title extraction (#13641)
* [youtube] Don't capture YouTube Red ad for creator meta field (#13621)
* [slideshare] Fix extraction (#13617)
+ [5tv] Add another video URL pattern (#13354, #13606)
* [drtv] Make HLS and HDS extraction non fatal
* [ted] Fix subtitles extraction (#13628, #13629)
* [vine] Make sure the title won't be empty
+ [twitter] Support HLS streams in vmap URLs
+ [periscope] Support pscp.tv URLs in embedded frames
* [twitter] Extract mp4 urls via mobile API (#12726)
* [niconico] Fix authentication error handling (#12486)
* [giantbomb] Extract m3u8 formats (#13626)
+ [vlive:playlist] Add support for playlists (#13613)
version 2017.07.09
Core
+ [extractor/common] Add support for AMP tags in _parse_html5_media_entries
+ [utils] Support attributes with no values in get_elements_by_attribute
Extractors
+ [dailymail] Add support for embeds
+ [joj] Add support for joj.sk (#13268)
* [abc.net.au:iview] Extract more formats (#13492, #13489)
* [egghead:course] Fix extraction (#6635, #13370)
+ [cjsw] Add support for cjsw.com (#13525)
+ [eagleplatform] Add support for referrer protected videos (#13557)
+ [eagleplatform] Add support for another embed pattern (#13557)
* [veoh] Extend URL regular expression (#13601)
* [npo:live] Fix live stream id extraction (#13568, #13605)
* [googledrive] Fix height extraction (#13603)
+ [dailymotion] Add support for new layout (#13580)
- [yam] Remove extractor
* [xhamster] Extract all formats and fix duration extraction (#13593)
+ [xhamster] Add support for new URL schema (#13593)
* [espn] Extend URL regular expression (#13244, #13549)
* [kaltura] Fix typo in subtitles extraction (#13569)
* [vier] Adapt extraction to redesign (#13575)
version 2017.07.02
Core
* [extractor/common] Improve _json_ld
Extractors
+ [thisoldhouse] Add more fallbacks for video id
* [thisoldhouse] Fix video id extraction (#13540, #13541)
* [xfileshare] Extend format regular expression (#13536)
* [ted] Fix extraction (#13535)
+ [tastytrade] Add support for tastytrade.com (#13521)
* [dplayit] Relax video id regular expression (#13524)
+ [generic] Extract more generic metadata (#13527)
+ [bbccouk] Capture and output error message (#13501, #13518)
* [cbsnews] Relax video info regular expression (#13284, #13503)
+ [facebook] Add support for plugin video embeds and multiple embeds (#13493)
* [soundcloud] Switch to https for API requests (#13502)
* [pandatv] Switch to https for API and download URLs
+ [pandatv] Add support for https URLs (#13491)
+ [niconico] Support sp subdomain (#13494)
version 2017.06.25
Core
+ [adobepass] Add support for DIRECTV NOW (mso ATTOTT) (#13472)
* [YoutubeDL] Skip malformed formats for better extraction robustness
Extractors
+ [wsj] Add support for barrons.com (#13470)
+ [ign] Add another video id pattern (#13328)
+ [raiplay:live] Add support for live streams (#13414)
+ [redbulltv] Add support for live videos and segments (#13486)
+ [onetpl] Add support for videos embedded via pulsembed (#13482)
* [ooyala] Make more robust
* [ooyala] Skip empty format URLs (#13471, #13476)
* [hgtv.com:show] Fix typo
version 2017.06.23
Core
* [adobepass] Fix extraction on older python 2.6
Extractors
* [youtube] Adapt to new automatic captions rendition (#13467)
* [hgtv.com:show] Relax video config regular expression (#13279, #13461)
* [drtuber] Fix formats extraction (#12058)
* [youporn] Fix upload date extraction
* [youporn] Improve formats extraction
* [youporn] Fix title extraction (#13456)
* [googledrive] Fix formats sorting (#13443)
* [watchindianporn] Fix extraction (#13411, #13415)
+ [vimeo] Add fallback mp4 extension for original format
+ [ruv] Add support for ruv.is (#13396)
* [viu] Fix extraction on older python 2.6
* [pandora.tv] Fix upload_date extraction (#12846)
+ [asiancrush] Add support for asiancrush.com (#13420)
version 2017.06.18
Core
* [downloader/common] Use utils.shell_quote for debug command line
* [utils] Use compat_shlex_quote in shell_quote
* [postprocessor/execafterdownload] Encode command line (#13407)
* [compat] Fix compat_shlex_quote on Windows (#5889, #10254)
* [postprocessor/metadatafromtitle] Fix missing optional meta fields processing
in --metadata-from-title (#13408)
* [extractor/common] Fix json dumping with --geo-bypass
+ [extractor/common] Improve jwplayer subtitles extraction
+ [extractor/common] Improve jwplayer formats extraction (#13379)
Extractors
* [polskieradio] Fix extraction (#13392)
+ [xfileshare] Add support for fastvideo.me (#13385)
* [bilibili] Fix extraction of videos with double quotes in titles (#13387)
* [4tube] Fix extraction (#13381, #13382)
+ [disney] Add support for disneychannel.de (#13383)
* [npo] Improve URL regular expression (#13376)
+ [corus] Add support for showcase.ca
+ [corus] Add support for history.ca (#13359)
version 2017.06.12
Core
* [utils] Handle compat_HTMLParseError in extract_attributes (#13349)
+ [compat] Introduce compat_HTMLParseError
* [utils] Improve unified_timestamp
* [extractor/generic] Ensure format id is unicode string
* [extractor/common] Return unicode string from _match_id
+ [YoutubeDL] Sanitize more fields (#13313)
Extractors
+ [xfileshare] Add support for rapidvideo.tv (#13348)
* [xfileshare] Modernize and pass Referer
+ [rutv] Add support for testplayer.vgtrk.com (#13347)
+ [newgrounds] Extract more metadata (#13232)
+ [newgrounds:playlist] Add support for playlists (#10611)
* [newgrounds] Improve formats and uploader extraction (#13346)
* [msn] Fix formats extraction
* [turbo] Ensure format id is string
* [sexu] Ensure height is int
* [jove] Ensure comment count is int
* [golem] Ensure format id is string
* [gfycat] Ensure filesize is int
* [foxgay] Ensure height is int
* [flickr] Ensure format id is string
* [sohu] Fix numeric fields
* [safari] Improve authentication detection (#13319)
* [liveleak] Ensure height is int (#13313)
* [streamango] Make title optional (#13292)
* [rtlnl] Improve URL regular expression (#13295)
* [tvplayer] Fix extraction (#13291)
version 2017.06.05
Core
* [YoutubeDL] Don't emit ANSI escape codes on Windows (#13270)
Extractors
+ [bandcamp:weekly] Add support for bandcamp weekly (#12758)
* [pornhub:playlist] Fix extraction (#13281)
- [godtv] Remove extractor (#13175)
* [safari] Fix typo (#13252)
* [youtube] Improve chapters extraction (#13247)
* [1tv] Lower preference for HTTP formats (#13246)
* [francetv] Relax URL regular expression
* [drbonanza] Fix extraction (#13231)
* [packtpub] Fix authentication (#13240)
version 2017.05.29
Extractors

View File

@@ -46,8 +46,15 @@ tar: youtube-dl.tar.gz
pypi-files: youtube-dl.bash-completion README.txt youtube-dl.1 youtube-dl.fish
youtube-dl: youtube_dl/*.py youtube_dl/*/*.py
zip --quiet youtube-dl youtube_dl/*.py youtube_dl/*/*.py
zip --quiet --junk-paths youtube-dl youtube_dl/__main__.py
mkdir -p zip
for d in youtube_dl youtube_dl/downloader youtube_dl/extractor youtube_dl/postprocessor ; do \
mkdir -p zip/$$d ;\
cp -pPR $$d/*.py zip/$$d/ ;\
done
touch -t 200001010101 zip/youtube_dl/*.py zip/youtube_dl/*/*.py
mv zip/youtube_dl/__main__.py zip/
cd zip ; zip -q ../youtube-dl youtube_dl/*.py youtube_dl/*/*.py __main__.py
rm -rf zip
echo '#!$(PYTHON)' > youtube-dl
cat youtube-dl.zip >> youtube-dl
rm youtube-dl.zip
@@ -101,7 +108,7 @@ youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-
--exclude '*.pyc' \
--exclude '*.pyo' \
--exclude '*~' \
--exclude '__pycache' \
--exclude '__pycache__' \
--exclude '.git' \
--exclude 'testdata' \
--exclude 'docs/_build' \

101
README.md
View File

@@ -25,7 +25,7 @@ If you do not have curl, you can alternatively use a recent wget:
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).
Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](https://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).
You can also use pip:
@@ -33,7 +33,7 @@ You can also use pip:
This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
OS X users can install youtube-dl with [Homebrew](http://brew.sh/):
OS X users can install youtube-dl with [Homebrew](https://brew.sh/):
brew install youtube-dl
@@ -145,18 +145,18 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--max-views COUNT Do not download any videos with more than
COUNT views
--match-filter FILTER Generic video filter. Specify any key (see
help for -o for a list of available keys)
to match if the key is present, !key to
check if the key is not present, key >
NUMBER (like "comment_count > 12", also
works with >=, <, <=, !=, =) to compare
against a number, key = 'LITERAL' (like
"uploader = 'Mike Smith'", also works with
!=) to match against a string literal and &
to require multiple matches. Values which
are not known are excluded unless you put a
question mark (?) after the operator. For
example, to only match videos that have
the "OUTPUT TEMPLATE" for a list of
available keys) to match if the key is
present, !key to check if the key is not
present, key > NUMBER (like "comment_count
> 12", also works with >=, <, <=, !=, =) to
compare against a number, key = 'LITERAL'
(like "uploader = 'Mike Smith'", also works
with !=) to match against a string literal
and & to require multiple matches. Values
which are not known are excluded unless you
put a question mark (?) after the operator.
For example, to only match videos that have
been liked more than 100 times and disliked
less than 50 times (or the dislike
functionality is not available at the given
@@ -277,8 +277,8 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--get-filename Simulate, quiet but print output filename
--get-format Simulate, quiet but print output format
-j, --dump-json Simulate, quiet but print JSON information.
See --output for a description of available
keys.
See the "OUTPUT TEMPLATE" for a description
of available keys.
-J, --dump-single-json Simulate, quiet but print JSON information
for each command-line argument. If the URL
refers to a playlist, dump the whole
@@ -427,7 +427,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
syntax. Example: --exec 'adb push {}
/sdcard/Music/ && rm {}'
--convert-subs FORMAT Convert the subtitles to other format
(currently supported: srt|ass|vtt)
(currently supported: srt|ass|vtt|lrc)
# CONFIGURATION
@@ -458,7 +458,7 @@ You can also use `--config-location` if you want to use custom configuration fil
### Authentication with `.netrc` file
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you:
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](https://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you:
```
touch $HOME/.netrc
chmod a-rwx,u+rw $HOME/.netrc
@@ -474,7 +474,10 @@ machine twitch login my_twitch_account_name password my_twitch_password
```
To activate authentication with the `.netrc` file you should pass `--netrc` to youtube-dl or place it in the [configuration file](#configuration).
On Windows you may also need to setup the `%HOME%` environment variable manually.
On Windows you may also need to setup the `%HOME%` environment variable manually. For example:
```
set HOME=%USERPROFILE%
```
# OUTPUT TEMPLATE
@@ -482,7 +485,7 @@ The `-o` option allows users to indicate a template for the output file names.
**tl;dr:** [navigate me to examples](#output-template-examples).
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "http://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a formatting operations. Allowed names along with sequence type are:
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a formatting operations. Allowed names along with sequence type are:
- `id` (string): Video identifier
- `title` (string): Video title
@@ -532,13 +535,14 @@ The basic usage is not to set any template arguments when downloading a single f
- `playlist_id` (string): Playlist identifier
- `playlist_title` (string): Playlist title
Available for the video that belongs to some logical chapter or section:
- `chapter` (string): Name or title of the chapter the video belongs to
- `chapter_number` (numeric): Number of the chapter the video belongs to
- `chapter_id` (string): Id of the chapter the video belongs to
Available for the video that is an episode of some series or programme:
- `series` (string): Title of the series or programme the video episode belongs to
- `season` (string): Title of the season the video episode belongs to
- `season_number` (numeric): Number of the season the video episode belongs to
@@ -548,6 +552,7 @@ Available for the video that is an episode of some series or programme:
- `episode_id` (string): Id of the video episode
Available for the media that is a track or a part of a music album:
- `track` (string): Title of the track
- `track_number` (numeric): Number of the track within an album or a disc
- `track_id` (string): Id of the track
@@ -579,7 +584,7 @@ If you are using an output template inside a Windows batch file then you must es
#### Output template examples
Note on Windows you may need to use double quotes instead of single.
Note that on Windows you may need to use double quotes instead of single.
```bash
$ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc
@@ -598,7 +603,7 @@ $ youtube-dl -o '%(uploader)s/%(playlist)s/%(playlist_index)s - %(title)s.%(ext)
$ youtube-dl -u user -p password -o '~/MyVideos/%(playlist)s/%(chapter_number)s - %(chapter)s/%(title)s.%(ext)s' https://www.udemy.com/java-tutorial/
# Download entire series season keeping each series and each season in separate directory under C:/MyVideos
$ youtube-dl -o "C:/MyVideos/%(series)s/%(season_number)s - %(season)s/%(episode_number)s - %(episode)s.%(ext)s" http://videomore.ru/kino_v_detalayah/5_sezon/367617
$ youtube-dl -o "C:/MyVideos/%(series)s/%(season_number)s - %(season)s/%(episode_number)s - %(episode)s.%(ext)s" https://videomore.ru/kino_v_detalayah/5_sezon/367617
# Stream the video being downloaded to stdout
$ youtube-dl -o - BaW_jenozKc
@@ -649,7 +654,7 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
- `acodec`: Name of the audio codec in use
- `vcodec`: Name of the video codec in use
- `container`: Name of the container format
- `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `m3u8`, or `m3u8_native`)
- `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`)
- `format_id`: A short description of the format
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.
@@ -666,7 +671,7 @@ If you want to preserve the old format selection behavior (prior to youtube-dl 2
#### Format selection examples
Note on Windows you may need to use double quotes instead of single.
Note that on Windows you may need to use double quotes instead of single.
```bash
# Download best mp4 format available or any other best if no mp4 available
@@ -711,17 +716,17 @@ $ youtube-dl --dateafter 20000101 --datebefore 20091231
### How do I update youtube-dl?
If you've followed [our manual installation instructions](http://rg3.github.io/youtube-dl/download.html), you can simply run `youtube-dl -U` (or, on Linux, `sudo youtube-dl -U`).
If you've followed [our manual installation instructions](https://rg3.github.io/youtube-dl/download.html), you can simply run `youtube-dl -U` (or, on Linux, `sudo youtube-dl -U`).
If you have used pip, a simple `sudo pip install -U youtube-dl` is sufficient to update.
If you have installed youtube-dl using a package manager like *apt-get* or *yum*, use the standard system update mechanism to update. Note that distribution packages are often outdated. As a rule of thumb, youtube-dl releases at least once a month, and often weekly or even daily. Simply go to http://yt-dl.org/ to find out the current version. Unfortunately, there is nothing we youtube-dl developers can do if your distribution serves a really outdated version. You can (and should) complain to your distribution in their bugtracker or support forum.
If you have installed youtube-dl using a package manager like *apt-get* or *yum*, use the standard system update mechanism to update. Note that distribution packages are often outdated. As a rule of thumb, youtube-dl releases at least once a month, and often weekly or even daily. Simply go to https://yt-dl.org to find out the current version. Unfortunately, there is nothing we youtube-dl developers can do if your distribution serves a really outdated version. You can (and should) complain to your distribution in their bugtracker or support forum.
As a last resort, you can also uninstall the version installed by your package manager and follow our manual installation instructions. For that, remove the distribution's package, with a line like
sudo apt-get remove -y youtube-dl
Afterwards, simply follow [our manual installation instructions](http://rg3.github.io/youtube-dl/download.html):
Afterwards, simply follow [our manual installation instructions](https://rg3.github.io/youtube-dl/download.html):
```
sudo wget https://yt-dl.org/latest/youtube-dl -O /usr/local/bin/youtube-dl
@@ -761,11 +766,11 @@ Apparently YouTube requires you to pass a CAPTCHA test if you download too much.
youtube-dl works fine on its own on most sites. However, if you want to convert video/audio, you'll need [avconv](https://libav.org/) or [ffmpeg](https://www.ffmpeg.org/). On some sites - most notably YouTube - videos can be retrieved in a higher quality format without sound. youtube-dl will detect whether avconv/ffmpeg is present and automatically pick the best option.
Videos or video formats streamed via RTMP protocol can only be downloaded when [rtmpdump](https://rtmpdump.mplayerhq.hu/) is installed. Downloading MMS and RTSP videos requires either [mplayer](http://mplayerhq.hu/) or [mpv](https://mpv.io/) to be installed.
Videos or video formats streamed via RTMP protocol can only be downloaded when [rtmpdump](https://rtmpdump.mplayerhq.hu/) is installed. Downloading MMS and RTSP videos requires either [mplayer](https://mplayerhq.hu/) or [mpv](https://mpv.io/) to be installed.
### I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org/) or [mplayer](http://www.mplayerhq.hu/).
Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](https://www.videolan.org/) or [mplayer](https://www.mplayerhq.hu/).
### I extracted a video URL with `-g`, but it does not play on another machine / in my web browser.
@@ -840,10 +845,10 @@ Use the `-o` to specify an [output template](#output-template), for example `-o
### How do I download a video starting with a `-`?
Either prepend `http://www.youtube.com/watch?v=` or separate the ID from the options with `--`:
Either prepend `https://www.youtube.com/watch?v=` or separate the ID from the options with `--`:
youtube-dl -- -wNyEUrxzFU
youtube-dl "http://www.youtube.com/watch?v=-wNyEUrxzFU"
youtube-dl "https://www.youtube.com/watch?v=-wNyEUrxzFU"
### How do I pass cookies to youtube-dl?
@@ -857,9 +862,9 @@ Passing cookies to youtube-dl is a good way to workaround login when a particula
### How do I stream directly to media player?
You will first need to tell youtube-dl to stream media to stdout with `-o -`, and also tell your media player to read from stdin (it must be capable of this for streaming) and then pipe former to latter. For example, streaming to [vlc](http://www.videolan.org/) can be achieved with:
You will first need to tell youtube-dl to stream media to stdout with `-o -`, and also tell your media player to read from stdin (it must be capable of this for streaming) and then pipe former to latter. For example, streaming to [vlc](https://www.videolan.org/) can be achieved with:
youtube-dl -o - "http://www.youtube.com/watch?v=BaW_jenozKcj" | vlc -
youtube-dl -o - "https://www.youtube.com/watch?v=BaW_jenozKcj" | vlc -
### How do I download only new videos from a playlist?
@@ -879,7 +884,7 @@ When youtube-dl detects an HLS video, it can download it either with the built-i
When youtube-dl knows that one particular downloader works better for a given website, that downloader will be picked. Otherwise, youtube-dl will pick the best downloader for general compatibility, which at the moment happens to be ffmpeg. This choice may change in future versions of youtube-dl, with improvements of the built-in downloader and/or ffmpeg.
In particular, the generic extractor (used when your website is not in the [list of supported sites by youtube-dl](http://rg3.github.io/youtube-dl/supportedsites.html) cannot mandate one specific downloader.
In particular, the generic extractor (used when your website is not in the [list of supported sites by youtube-dl](https://rg3.github.io/youtube-dl/supportedsites.html) cannot mandate one specific downloader.
If you put either `--hls-prefer-native` or `--hls-prefer-ffmpeg` into your configuration, a different subset of videos will fail to download correctly. Instead, it is much better to [file an issue](https://yt-dl.org/bug) or a pull request which details why the native or the ffmpeg HLS downloader is a better choice for your use case.
@@ -905,7 +910,7 @@ Feel free to bump the issue from time to time by writing a small comment ("Issue
### How can I detect whether a given URL is supported by youtube-dl?
For one, have a look at the [list of supported sites](docs/supportedsites.md). Note that it can sometimes happen that the site changes its URL scheme (say, from http://example.com/video/1234567 to http://example.com/v/1234567 ) and youtube-dl reports an URL of a service in that list as unsupported. In that case, simply report a bug.
For one, have a look at the [list of supported sites](docs/supportedsites.md). Note that it can sometimes happen that the site changes its URL scheme (say, from https://example.com/video/1234567 to https://example.com/v/1234567 ) and youtube-dl reports an URL of a service in that list as unsupported. In that case, simply report a bug.
It is *not* possible to detect whether a URL is supported or not. That's because youtube-dl contains a generic extractor which matches **all** URLs. You may be tempted to disable, exclude, or remove the generic extractor, but the generic extractor not only allows users to extract videos from lots of websites that embed a video from another service, but may also be used to extract video from a service that it's hosting itself. Therefore, we neither recommend nor support disabling, excluding, or removing the generic extractor.
@@ -919,7 +924,7 @@ youtube-dl is an open-source project manned by too few volunteers, so we'd rathe
# DEVELOPER INSTRUCTIONS
Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
Most users do not need to build youtube-dl and can [download the builds](https://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
To run youtube-dl as a developer, you don't need to build anything either. Simply execute
@@ -931,6 +936,8 @@ To run the test, simply invoke your favorite test runner, or execute a test file
python test/test_download.py
nosetests
See item 6 of [new extractor tutorial](#adding-support-for-a-new-site) for how to run extractor specific test cases.
If you want to create a build of youtube-dl yourself, you'll need
* python
@@ -967,7 +974,7 @@ After you have ensured this site is distributing its content legally, you can fo
class YourExtractorIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://yourextractor.com/watch/42',
'url': 'https://yourextractor.com/watch/42',
'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use --test)',
'info_dict': {
'id': '42',
@@ -998,10 +1005,10 @@ After you have ensured this site is distributing its content legally, you can fo
}
```
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
@@ -1157,10 +1164,10 @@ import youtube_dl
ydl_opts = {}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```
Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L129-L279). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/3e4cedf9e8cd3157df2457df7274d0c842421945/youtube_dl/YoutubeDL.py#L137-L312). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:
@@ -1196,19 +1203,19 @@ ydl_opts = {
'progress_hooks': [my_hook],
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```
# BUGS
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](https://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
```
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Command-line args: [u'-v', u'https://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2015.12.06
[debug] Git HEAD: 135392e
@@ -1239,7 +1246,7 @@ For bug reports, this means that your report should contain the *complete* outpu
If your server has multiple IPs or you suspect censorship, adding `--call-home` may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `http://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `http://www.youtube.com/`) is *not* an example URL.
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `https://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `https://www.youtube.com/`) is *not* an example URL.
### Are you using the latest version?

View File

@@ -14,7 +14,7 @@ import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from test.helper import gettestcases
from youtube_dl.utils import compat_urllib_parse_urlparse
from youtube_dl.utils import compat_urllib_request
@@ -24,7 +24,7 @@ if len(sys.argv) > 1:
else:
METHOD = 'EURISTIC'
for test in get_testcases():
for test in gettestcases():
if METHOD == 'EURISTIC':
try:
webpage = compat_urllib_request.urlopen(test['url'], timeout=10).read()

View File

@@ -8,7 +8,7 @@ import re
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
README_FILE = os.path.join(ROOT_DIR, 'README.md')
PREFIX = '''%YOUTUBE-DL(1)
PREFIX = r'''%YOUTUBE-DL(1)
# NAME

View File

@@ -38,11 +38,13 @@
- **afreecatv**: afreecatv.com
- **afreecatv:global**: afreecatv.com
- **AirMozilla**
- **AliExpressLive**
- **AlJazeera**
- **Allocine**
- **AlphaPorno**
- **AMCNetworks**
- **anderetijden**: npo.nl and ntr.nl
- **AmericasTestKitchen**
- **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **AnimeOnDemand**
- **anitube.se**
- **Anvato**
@@ -67,6 +69,8 @@
- **arte.tv:info**
- **arte.tv:magazine**
- **arte.tv:playlist**
- **AsianCrush**
- **AsianCrushPlaylist**
- **AtresPlayer**
- **ATTTechChannel**
- **ATVAt**
@@ -87,6 +91,7 @@
- **bambuser:channel**
- **Bandcamp**
- **Bandcamp:album**
- **Bandcamp:weekly**
- **bangumi.bilibili.com**: BiliBili番剧
- **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer
@@ -125,7 +130,8 @@
- **CamWithHer**
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**: canvas.be and een.be
- **Canvas**
- **CanvasEen**: canvas.be and een.be
- **CarambaTV**
- **CarambaTVPage**
- **CartoonNetwork**
@@ -151,8 +157,9 @@
- **chirbit**
- **chirbit:profile**
- **Cinchcast**
- **Clipfish**
- **CJSW**
- **cliphunter**
- **Clippit**
- **ClipRs**
- **Clipsyndicate**
- **CloserToTruth**
@@ -234,6 +241,7 @@
- **EbaumsWorld**
- **EchoMsk**
- **egghead:course**: egghead.io course
- **egghead:lesson**: egghead.io lesson
- **eHow**
- **Einthusan**
- **eitb.tv**
@@ -290,6 +298,7 @@
- **Funimation**
- **FunnyOrDie**
- **Fusion**
- **Fux**
- **FXNetworks**
- **GameInformer**
- **GameOne**
@@ -310,7 +319,6 @@
- **Go**
- **Go90**
- **GodTube**
- **GodTV**
- **Golem**
- **GoogleDrive**
- **Goshgay**
@@ -358,6 +366,7 @@
- **IPrima**
- **iqiyi**: 爱奇艺
- **Ir90Tv**
- **ITTF**
- **ITV**
- **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations
@@ -367,9 +376,11 @@
- **Jamendo**
- **JamendoAlbum**
- **JeuxVideo**
- **Joj**
- **Jove**
- **jpopsuki.tv**
- **JWPlatform**
- **Kakao**
- **Kaltura**
- **Kamcord**
- **KanalPlay**: Kanal 5/9/11 Play
@@ -413,6 +424,7 @@
- **limelight:channel_list**
- **LiTV**
- **LiveLeak**
- **LiveLeakEmbed**
- **livestream**
- **livestream:original**
- **LnkGo**
@@ -429,12 +441,14 @@
- **MakerTV**
- **mangomolo:live**
- **mangomolo:video**
- **ManyVids**
- **MatchTV**
- **MDR**: MDR.DE and KiKA
- **media.ccc.de**
- **Medialaan**
- **Mediaset**
- **Medici**
- **megaphone.fm**: megaphone.fm embedded players
- **Meipai**: 美拍
- **MelonVOD**
- **META**
@@ -467,7 +481,6 @@
- **MovieFap**
- **Moviezine**
- **MovingImage**
- **MPORA**
- **MSN**
- **mtg**: MTG services
- **mtv**
@@ -512,10 +525,13 @@
- **netease:song**: 网易云音乐
- **Netzkino**
- **Newgrounds**
- **NewgroundsPlaylist**
- **Newstube**
- **NextMedia**: 蘋果日報
- **NextMediaActionNews**: 蘋果日報 - 動新聞
- **NextTV**: 壹電視
- **Nexx**
- **NexxEmbed**
- **nfb**: National Film Board of Canada
- **nfl.com**
- **NhkVod**
@@ -525,6 +541,7 @@
- **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com**
- **nick.de**
- **nickelodeonru**
- **nicknight**
- **niconico**: ニコニコ動画
- **NiconicoPlaylist**
@@ -546,7 +563,7 @@
- **NowTVList**
- **nowvideo**: NowVideo
- **Noz**
- **npo**: npo.nl and ntr.nl
- **npo**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **npo.nl:live**
- **npo.nl:radio**
- **npo.nl:radio:fragment**
@@ -579,6 +596,7 @@
- **Openload**
- **OraTV**
- **orf:fm4**: radio FM4
- **orf:fm4:story**: fm4.orf.at stories
- **orf:iptv**: iptv.ORF.at
- **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek
@@ -590,6 +608,7 @@
- **Patreon**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag**
- **PearVideo**
- **People**
- **periscope**: Periscope
- **periscope:user**: Periscope user videos
@@ -611,7 +630,9 @@
- **Pokemon**
- **PolskieRadio**
- **PolskieRadioCategory**
- **PopcornTV**
- **PornCom**
- **PornerBros**
- **PornFlip**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla
@@ -620,6 +641,7 @@
- **Pornotube**
- **PornoVoisines**
- **PornoXO**
- **PornTube**
- **PressTV**
- **PrimeShareTV**
- **PromptFile**
@@ -641,9 +663,12 @@
- **RadioJavan**
- **Rai**
- **RaiPlay**
- **RaiPlayLive**
- **RBMARadio**
- **RDS**: RDS.ca
- **RedBullTV**
- **Reddit**
- **RedditR**
- **RedTube**
- **RegioTV**
- **RENTV**
@@ -683,8 +708,10 @@
- **rutube:embed**: Rutube embedded videos
- **rutube:movie**: Rutube movies
- **rutube:person**: Rutube person videos
- **rutube:playlist**: Rutube playlists
- **RUTV**: RUTV.RU
- **Ruutu**
- **Ruv**
- **safari**: safaribooksonline.com online video
- **safari:api**
- **safari:course**: safaribooksonline.com online courses
@@ -723,6 +750,7 @@
- **soundcloud:playlist**
- **soundcloud:search**: Soundcloud search
- **soundcloud:set**
- **soundcloud:trackstation**
- **soundcloud:user**
- **soundgasm**
- **soundgasm:profile**
@@ -763,13 +791,13 @@
- **Tagesschau**
- **tagesschau:player**
- **Tass**
- **TBS**
- **TastyTrade**
- **TBS** (Currently broken)
- **TDSLifeway**
- **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos
- **TeachingChannel**
- **Teamcoco**
- **TeamFourStar**
- **TechTalks**
- **techtv.mit.edu**
- **ted**
@@ -934,13 +962,15 @@
- **vk:wallpost**
- **vlive**
- **vlive:channel**
- **vlive:playlist**
- **Vodlocker**
- **VODPl**
- **VODPlatform**
- **VoiceRepublic**
- **Voot**
- **VoxMedia**
- **Vporn**
- **vpro**: npo.nl and ntr.nl
- **vpro**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **Vrak**
- **VRT**: deredactie.be, sporza.be, cobra.be and cobra.canvas.be
- **vrv**
@@ -955,6 +985,7 @@
- **washingtonpost**
- **washingtonpost:article**
- **wat.tv**
- **WatchBox**
- **WatchIndianPorn**: Watch Indian Porn
- **WDR**
- **wdr:mobile**
@@ -966,7 +997,7 @@
- **wholecloud**: WholeCloud
- **Wimp**
- **Wistia**
- **wnl**: npo.nl and ntr.nl
- **wnl**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **WorldStarHipHop**
- **wrzuta.pl**
- **wrzuta.pl:playlist**
@@ -974,7 +1005,7 @@
- **WSJArticle**
- **XBef**
- **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV, FastVideo.me
- **XHamster**
- **XHamsterEmbed**
- **xiami:album**: 虾米音乐 - 专辑
@@ -990,7 +1021,7 @@
- **XVideos**
- **XXXYMovies**
- **Yahoo**: Yahoo screen and movies
- **Yam**: 蕃薯藤yam天空部落
- **YandexDisk**
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек

View File

@@ -10,6 +10,7 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, expect_dict, expect_value
from youtube_dl.compat import compat_etree_fromstring
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
@@ -488,6 +489,91 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
self.ie._sort_formats(formats)
expect_value(self, formats, expected_formats, None)
def test_parse_mpd_formats(self):
_TEST_CASES = [
(
# https://github.com/rg3/youtube-dl/issues/13919
'float_duration',
'http://unknown/manifest.mpd',
[{
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '318597',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.42001f',
'tbr': 318.597,
'width': 340,
'height': 192,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '638590',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.42001f',
'tbr': 638.59,
'width': 512,
'height': 288,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '1022565',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.4d001f',
'tbr': 1022.565,
'width': 688,
'height': 384,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '2046506',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.4d001f',
'tbr': 2046.506,
'width': 1024,
'height': 576,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '3998017',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.640029',
'tbr': 3998.017,
'width': 1280,
'height': 720,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '5997485',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.640032',
'tbr': 5997.485,
'width': 1920,
'height': 1080,
}]
),
]
for mpd_file, mpd_url, expected_formats in _TEST_CASES:
with io.open('./test/testdata/mpd/%s.mpd' % mpd_file,
mode='r', encoding='utf-8') as f:
formats = self.ie._parse_mpd_formats(
compat_etree_fromstring(f.read().encode('utf-8')),
mpd_url=mpd_url)
self.ie._sort_formats(formats)
expect_value(self, formats, expected_formats, None)
if __name__ == '__main__':
unittest.main()

View File

@@ -41,6 +41,7 @@ def _make_result(formats, **kwargs):
'id': 'testid',
'title': 'testttitle',
'extractor': 'testex',
'extractor_key': 'TestEx',
}
res.update(**kwargs)
return res
@@ -370,6 +371,19 @@ class TestFormatSelection(unittest.TestCase):
ydl = YDL({'format': 'best[height>360]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
def test_format_selection_issue_10083(self):
# See https://github.com/rg3/youtube-dl/issues/10083
formats = [
{'format_id': 'regular', 'height': 360, 'url': TEST_URL},
{'format_id': 'video', 'height': 720, 'acodec': 'none', 'url': TEST_URL},
{'format_id': 'audio', 'vcodec': 'none', 'url': TEST_URL},
]
info_dict = _make_result(formats)
ydl = YDL({'format': 'best[height>360]/bestvideo[height>360]+bestaudio'})
ydl.process_ie_result(info_dict.copy())
self.assertEqual(ydl.downloaded_info_dicts[0]['format_id'], 'video+audio')
def test_invalid_format_specs(self):
def assert_syntax_error(format_spec):
ydl = YDL({'format': format_spec})
@@ -448,6 +462,17 @@ class TestFormatSelection(unittest.TestCase):
pass
self.assertEqual(ydl.downloaded_info_dicts, [])
def test_default_format_spec(self):
ydl = YDL({'simulate': True})
self.assertEqual(ydl._default_format_spec({}), 'bestvideo+bestaudio/best')
ydl = YDL({'outtmpl': '-'})
self.assertEqual(ydl._default_format_spec({}), 'best')
ydl = YDL({})
self.assertEqual(ydl._default_format_spec({}, download=False), 'bestvideo+bestaudio/best')
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best')
class TestYoutubeDL(unittest.TestCase):
def test_subtitles(self):
@@ -527,6 +552,8 @@ class TestYoutubeDL(unittest.TestCase):
'ext': 'mp4',
'width': None,
'height': 1080,
'title1': '$PATH',
'title2': '%PATH%',
}
def fname(templ):
@@ -545,10 +572,14 @@ class TestYoutubeDL(unittest.TestCase):
self.assertEqual(fname('%(height)0 6d.%(ext)s'), ' 01080.mp4')
self.assertEqual(fname('%(height)0 6d.%(ext)s'), ' 01080.mp4')
self.assertEqual(fname('%(height) 0 6d.%(ext)s'), ' 01080.mp4')
self.assertEqual(fname('%%'), '%')
self.assertEqual(fname('%%%%'), '%%')
self.assertEqual(fname('%%(height)06d.%(ext)s'), '%(height)06d.mp4')
self.assertEqual(fname('%(width)06d.%(ext)s'), 'NA.mp4')
self.assertEqual(fname('%(width)06d.%%(ext)s'), 'NA.%(ext)s')
self.assertEqual(fname('%%(width)06d.%(ext)s'), '%(width)06d.mp4')
self.assertEqual(fname('Hello %(title1)s'), 'Hello $PATH')
self.assertEqual(fname('Hello %(title2)s'), 'Hello %PATH%')
def test_format_note(self):
ydl = YoutubeDL()
@@ -739,6 +770,12 @@ class TestYoutubeDL(unittest.TestCase):
result = get_ids({'playlist_items': '10'})
self.assertEqual(result, [])
result = get_ids({'playlist_items': '3-10'})
self.assertEqual(result, [3, 4])
result = get_ids({'playlist_items': '2-4,3-4,3'})
self.assertEqual(result, [2, 3, 4])
def test_urlopen_no_file_protocol(self):
# see https://github.com/rg3/youtube-dl/issues/8227
ydl = YDL()
@@ -755,7 +792,8 @@ class TestYoutubeDL(unittest.TestCase):
'_type': 'url_transparent',
'url': 'foo2:',
'ie_key': 'Foo2',
'title': 'foo1 title'
'title': 'foo1 title',
'id': 'foo1_id',
}
class Foo2IE(InfoExtractor):
@@ -781,6 +819,9 @@ class TestYoutubeDL(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['url'], TEST_URL)
self.assertEqual(downloaded['title'], 'foo1 title')
self.assertEqual(downloaded['id'], 'testid')
self.assertEqual(downloaded['extractor'], 'testex')
self.assertEqual(downloaded['extractor_key'], 'TestEx')
if __name__ == '__main__':

26
test/test_options.py Normal file
View File

@@ -0,0 +1,26 @@
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.options import _hide_login_info
class TestOptions(unittest.TestCase):
def test_hide_login_info(self):
self.assertEqual(_hide_login_info(['-u', 'foo', '-p', 'bar']),
['-u', 'PRIVATE', '-p', 'PRIVATE'])
self.assertEqual(_hide_login_info(['-u']), ['-u'])
self.assertEqual(_hide_login_info(['-u', 'foo', '-u', 'bar']),
['-u', 'PRIVATE', '-u', 'PRIVATE'])
self.assertEqual(_hide_login_info(['--username=foo']),
['--username=PRIVATE'])
if __name__ == '__main__':
unittest.main()

View File

@@ -98,6 +98,7 @@ from youtube_dl.compat import (
compat_chr,
compat_etree_fromstring,
compat_getenv,
compat_os_name,
compat_setenv,
compat_urlparse,
compat_parse_qs,
@@ -278,6 +279,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unescapeHTML('&#47;'), '/')
self.assertEqual(unescapeHTML('&eacute;'), 'é')
self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
self.assertEqual(unescapeHTML('&a&quot;'), '&a"')
# HTML5 entities
self.assertEqual(unescapeHTML('&period;&apos;'), '.\'')
@@ -340,6 +342,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
self.assertEqual(unified_timestamp('2017-03-30T17:52:41Q'), 1490896361)
self.assertEqual(unified_timestamp('Sep 11, 2013 | 5:49 AM'), 1378878540)
def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@@ -447,7 +450,9 @@ class TestUtil(unittest.TestCase):
def test_shell_quote(self):
args = ['ffmpeg', '-i', encodeFilename('ñ€ß\'.mp4')]
self.assertEqual(shell_quote(args), """ffmpeg -i 'ñ€ß'"'"'.mp4'""")
self.assertEqual(
shell_quote(args),
"""ffmpeg -i 'ñ€ß'"'"'.mp4'""" if compat_os_name != 'nt' else '''ffmpeg -i "ñ€ß'.mp4"''')
def test_str_to_int(self):
self.assertEqual(str_to_int('123,456'), 123456)
@@ -915,6 +920,8 @@ class TestUtil(unittest.TestCase):
supports_outside_bmp = False
if supports_outside_bmp:
self.assertEqual(extract_attributes('<e x="Smile &#128512;!">'), {'x': 'Smile \U0001f600!'})
# Malformed HTML should not break attributes extraction on older Python
self.assertEqual(extract_attributes('<mal"formed/>'), {})
def test_clean_html(self):
self.assertEqual(clean_html('a:\nb'), 'a: b')
@@ -929,7 +936,7 @@ class TestUtil(unittest.TestCase):
def test_args_to_str(self):
self.assertEqual(
args_to_str(['foo', 'ba/r', '-baz', '2 be', '']),
'foo ba/r -baz \'2 be\' \'\''
'foo ba/r -baz \'2 be\' \'\'' if compat_os_name != 'nt' else 'foo ba/r -baz "2 be" ""'
)
def test_parse_filesize(self):
@@ -1057,7 +1064,7 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
<p begin="3" dur="-1">Ignored, three</p>
</div>
</body>
</tt>'''
</tt>'''.encode('utf-8')
srt_data = '''1
00:00:00,000 --> 00:00:01,000
The following line contains Chinese characters and special symbols
@@ -1082,7 +1089,7 @@ Line
<p begin="0" end="1">The first line</p>
</div>
</body>
</tt>'''
</tt>'''.encode('utf-8')
srt_data = '''1
00:00:00,000 --> 00:00:01,000
The first line
@@ -1108,7 +1115,7 @@ The first line
<p style="s1" tts:textDecoration="underline" begin="00:00:09.56" id="p2" end="00:00:12.36"><span style="s2" tts:color="lime">inner<br /> </span>style</p>
</div>
</body>
</tt>'''
</tt>'''.encode('utf-8')
srt_data = '''1
00:00:02,080 --> 00:00:05,839
<font color="white" face="sansSerif" size="16">default style<font color="red">custom style</font></font>
@@ -1131,6 +1138,26 @@ part 3</font></u>
'''
self.assertEqual(dfxp2srt(dfxp_data_with_style), srt_data)
dfxp_data_non_utf8 = '''<?xml version="1.0" encoding="UTF-16"?>
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en" xmlns:tts="http://www.w3.org/ns/ttml#parameter">
<body>
<div xml:lang="en">
<p begin="0" end="1">Line 1</p>
<p begin="1" end="2">第二行</p>
</div>
</body>
</tt>'''.encode('utf-16')
srt_data = '''1
00:00:00,000 --> 00:00:01,000
Line 1
2
00:00:01,000 --> 00:00:02,000
第二行
'''
self.assertEqual(dfxp2srt(dfxp_data_non_utf8), srt_data)
def test_cli_option(self):
self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128'])
self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), [])
@@ -1176,6 +1203,10 @@ part 3</font></u>
cli_bool_option(
{'nocheckcertificate': False}, '--check-certificate', 'nocheckcertificate', 'false', 'true', '='),
['--check-certificate=true'])
self.assertEqual(
cli_bool_option(
{}, '--check-certificate', 'nocheckcertificate', 'false', 'true', '='),
[])
def test_ohdave_rsa_encrypt(self):
N = 0xab86b6371b5318aaa1d3c9e612a9f1264f372323c8c0f19875b5fc3b3fd3afcc1e5bec527aa94bfa85bffc157e4245aebda05389a5357b75115ac94f074aefcd
@@ -1225,6 +1256,12 @@ part 3</font></u>
self.assertEqual(get_element_by_attribute('class', 'foo', html), None)
self.assertEqual(get_element_by_attribute('class', 'no-such-foo', html), None)
html = '''
<div itemprop="author" itemscope>foo</div>
'''
self.assertEqual(get_element_by_attribute('itemprop', 'author', html), 'foo')
def test_get_elements_by_class(self):
html = '''
<span class="foo bar">nice</span><span class="foo bar">also nice</span>

View File

@@ -254,6 +254,13 @@ class TestYoutubeChapters(unittest.TestCase):
'title': '3 - Из серпов луны...[Iz serpov luny]',
}]
),
(
# https://www.youtube.com/watch?v=xZW70zEasOk
# time point more than duration
'''● LCS Spring finals: Saturday and Sunday from <a href="#" onclick="yt.www.watch.player.seekTo(13*60+30);return false;">13:30</a> outside the venue! <br />● PAX East: Fri, Sat & Sun - more info in tomorrows video on the main channel!''',
283,
[]
),
]
def test_youtube_chapters(self):

18
test/testdata/mpd/float_duration.mpd vendored Normal file
View File

@@ -0,0 +1,18 @@
<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:mpeg:dash:schema:mpd:2011" type="static" minBufferTime="PT2S" profiles="urn:mpeg:dash:profile:isoff-on-demand:2011" mediaPresentationDuration="PT6014S">
<Period bitstreamSwitching="true">
<AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" startWithSAP="1" segmentAlignment="true">
<SegmentTemplate timescale="1000000" presentationTimeOffset="0" initialization="ai_$RepresentationID$.mp4d" media="a_$RepresentationID$_$Number$.mp4d" duration="2000000.0" startNumber="0"></SegmentTemplate>
<Representation id="318597" bandwidth="61587"></Representation>
</AdaptationSet>
<AdaptationSet mimeType="video/mp4" startWithSAP="1" segmentAlignment="true">
<SegmentTemplate timescale="1000000" presentationTimeOffset="0" initialization="vi_$RepresentationID$.mp4d" media="v_$RepresentationID$_$Number$.mp4d" duration="2000000.0" startNumber="0"></SegmentTemplate>
<Representation id="318597" codecs="avc1.42001f" width="340" height="192" bandwidth="318597"></Representation>
<Representation id="638590" codecs="avc1.42001f" width="512" height="288" bandwidth="638590"></Representation>
<Representation id="1022565" codecs="avc1.4d001f" width="688" height="384" bandwidth="1022565"></Representation>
<Representation id="2046506" codecs="avc1.4d001f" width="1024" height="576" bandwidth="2046506"></Representation>
<Representation id="3998017" codecs="avc1.640029" width="1280" height="720" bandwidth="3998017"></Representation>
<Representation id="5997485" codecs="avc1.640032" width="1920" height="1080" bandwidth="5997485"></Representation>
</AdaptationSet>
</Period>
</MPD>

View File

@@ -26,6 +26,8 @@ import tokenize
import traceback
import random
from string import ascii_letters
from .compat import (
compat_basestring,
compat_cookiejar,
@@ -58,10 +60,12 @@ from .utils import (
format_bytes,
formatSeconds,
GeoRestrictedError,
int_or_none,
ISO3166Utils,
locked_file,
make_HTTPS_handler,
MaxDownloadsReached,
orderedSet,
PagedList,
parse_filesize,
PerRequestProxyHandler,
@@ -89,6 +93,7 @@ from .utils import (
)
from .cache import Cache
from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
from .extractor.openload import PhantomJSwrapper
from .downloader import get_suitable_downloader
from .downloader.rtmp import rtmpdump_version
from .postprocessor import (
@@ -300,8 +305,25 @@ class YoutubeDL(object):
otherwise prefer avconv.
postprocessor_args: A list of additional command-line arguments for the
postprocessor.
The following options are used by the Youtube extractor:
youtube_include_dash_manifest: If True (default), DASH manifests and related
data will be downloaded and processed by extractor.
You can reduce network I/O by disabling it if you don't
care about DASH.
"""
_NUMERIC_FIELDS = set((
'width', 'height', 'tbr', 'abr', 'asr', 'vbr', 'fps', 'filesize', 'filesize_approx',
'timestamp', 'upload_year', 'upload_month', 'upload_day',
'duration', 'view_count', 'like_count', 'dislike_count', 'repost_count',
'average_rating', 'comment_count', 'age_limit',
'start_time', 'end_time',
'chapter_number', 'season_number', 'episode_number',
'track_number', 'disc_number', 'release_year',
'playlist_index',
))
params = None
_ies = []
_pps = []
@@ -498,24 +520,25 @@ class YoutubeDL(object):
def to_console_title(self, message):
if not self.params.get('consoletitle', False):
return
if compat_os_name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
# c_wchar_p() might not be necessary if `message` is
# already of type unicode()
ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
if compat_os_name == 'nt':
if ctypes.windll.kernel32.GetConsoleWindow():
# c_wchar_p() might not be necessary if `message` is
# already of type unicode()
ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
elif 'TERM' in os.environ:
self._write_string('\033]0;%s\007' % message, self._screen_file)
def save_console_title(self):
if not self.params.get('consoletitle', False):
return
if 'TERM' in os.environ:
if compat_os_name != 'nt' and 'TERM' in os.environ:
# Save the title on stack
self._write_string('\033[22;0t', self._screen_file)
def restore_console_title(self):
if not self.params.get('consoletitle', False):
return
if 'TERM' in os.environ:
if compat_os_name != 'nt' and 'TERM' in os.environ:
# Restore the title from stack
self._write_string('\033[23;0t', self._screen_file)
@@ -638,22 +661,11 @@ class YoutubeDL(object):
r'%%(\1)0%dd' % field_size_compat_map[mobj.group('field')],
outtmpl)
NUMERIC_FIELDS = set((
'width', 'height', 'tbr', 'abr', 'asr', 'vbr', 'fps', 'filesize', 'filesize_approx',
'timestamp', 'upload_year', 'upload_month', 'upload_day',
'duration', 'view_count', 'like_count', 'dislike_count', 'repost_count',
'average_rating', 'comment_count', 'age_limit',
'start_time', 'end_time',
'chapter_number', 'season_number', 'episode_number',
'track_number', 'disc_number', 'release_year',
'playlist_index',
))
# Missing numeric fields used together with integer presentation types
# in format specification will break the argument substitution since
# string 'NA' is returned for missing fields. We will patch output
# template for missing fields to meet string presentation type.
for numeric_field in NUMERIC_FIELDS:
for numeric_field in self._NUMERIC_FIELDS:
if numeric_field not in template_dict:
# As of [1] format syntax is:
# %[mapping_key][conversion_flags][minimum_width][.precision][length_modifier]type
@@ -672,7 +684,19 @@ class YoutubeDL(object):
FORMAT_RE.format(numeric_field),
r'%({0})s'.format(numeric_field), outtmpl)
filename = expand_path(outtmpl % template_dict)
# expand_path translates '%%' into '%' and '$$' into '$'
# correspondingly that is not what we want since we need to keep
# '%%' intact for template dict substitution step. Working around
# with boundary-alike separator hack.
sep = ''.join([random.choice(ascii_letters) for _ in range(32)])
outtmpl = outtmpl.replace('%%', '%{0}%'.format(sep)).replace('$$', '${0}$'.format(sep))
# outtmpl should be expand_path'ed before template dict substitution
# because meta fields may contain env variables we don't want to
# be expanded. For example, for outtmpl "%(title)s.%(ext)s" and
# title "Hello $PATH", we don't want `$PATH` to be expanded.
filename = expand_path(outtmpl).replace(sep, '') % template_dict
# Temporary fix for #4787
# 'Treat' all problem characters by passing filename through preferredencoding
# to workaround encoding issues with subprocess on python2 @ Windows
@@ -844,7 +868,7 @@ class YoutubeDL(object):
force_properties = dict(
(k, v) for k, v in ie_result.items() if v is not None)
for f in ('_type', 'url', 'ie_key'):
for f in ('_type', 'url', 'id', 'extractor', 'extractor_key', 'ie_key'):
if f in force_properties:
del force_properties[f]
new_result = info.copy()
@@ -885,15 +909,25 @@ class YoutubeDL(object):
yield int(item)
else:
yield int(string_segment)
playlistitems = iter_playlistitems(playlistitems_str)
playlistitems = orderedSet(iter_playlistitems(playlistitems_str))
ie_entries = ie_result['entries']
def make_playlistitems_entries(list_ie_entries):
num_entries = len(list_ie_entries)
return [
list_ie_entries[i - 1] for i in playlistitems
if -num_entries <= i - 1 < num_entries]
def report_download(num_entries):
self.to_screen(
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, num_entries))
if isinstance(ie_entries, list):
n_all_entries = len(ie_entries)
if playlistitems:
entries = [
ie_entries[i - 1] for i in playlistitems
if -n_all_entries <= i - 1 < n_all_entries]
entries = make_playlistitems_entries(ie_entries)
else:
entries = ie_entries[playliststart:playlistend]
n_entries = len(entries)
@@ -911,20 +945,15 @@ class YoutubeDL(object):
entries = ie_entries.getslice(
playliststart, playlistend)
n_entries = len(entries)
self.to_screen(
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, n_entries))
report_download(n_entries)
else: # iterable
if playlistitems:
entry_list = list(ie_entries)
entries = [entry_list[i - 1] for i in playlistitems]
entries = make_playlistitems_entries(list(ie_entries))
else:
entries = list(itertools.islice(
ie_entries, playliststart, playlistend))
n_entries = len(entries)
self.to_screen(
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, n_entries))
report_download(n_entries)
if self.params.get('playlistreverse', False):
entries = entries[::-1]
@@ -1048,6 +1077,25 @@ class YoutubeDL(object):
return op(actual_value, comparison_value)
return _filter
def _default_format_spec(self, info_dict, download=True):
req_format_list = []
def can_have_partial_formats():
if self.params.get('simulate', False):
return True
if not download:
return True
if self.params.get('outtmpl', DEFAULT_OUTTMPL) == '-':
return False
if info_dict.get('is_live'):
return False
merger = FFmpegMergerPP(self)
return merger.available and merger.can_merge()
if can_have_partial_formats():
req_format_list.append('bestvideo+bestaudio')
req_format_list.append('best')
return '/'.join(req_format_list)
def build_format_selector(self, format_spec):
def syntax_error(note, start):
message = (
@@ -1344,9 +1392,28 @@ class YoutubeDL(object):
if 'title' not in info_dict:
raise ExtractorError('Missing "title" field in extractor result')
if not isinstance(info_dict['id'], compat_str):
self.report_warning('"id" field is not a string - forcing string conversion')
info_dict['id'] = compat_str(info_dict['id'])
def report_force_conversion(field, field_not, conversion):
self.report_warning(
'"%s" field is not %s - forcing %s conversion, there is an error in extractor'
% (field, field_not, conversion))
def sanitize_string_field(info, string_field):
field = info.get(string_field)
if field is None or isinstance(field, compat_str):
return
report_force_conversion(string_field, 'a string', 'string')
info[string_field] = compat_str(field)
def sanitize_numeric_fields(info):
for numeric_field in self._NUMERIC_FIELDS:
field = info.get(numeric_field)
if field is None or isinstance(field, compat_numeric_types):
continue
report_force_conversion(numeric_field, 'numeric', 'int')
info[numeric_field] = int_or_none(field)
sanitize_string_field(info_dict, 'id')
sanitize_numeric_fields(info_dict)
if 'playlist' not in info_dict:
# It isn't part of a playlist
@@ -1427,16 +1494,28 @@ class YoutubeDL(object):
if not formats:
raise ExtractorError('No video formats found!')
def is_wellformed(f):
url = f.get('url')
if not url:
self.report_warning(
'"url" field is missing or empty - skipping format, '
'there is an error in extractor')
return False
if isinstance(url, bytes):
sanitize_string_field(f, 'url')
return True
# Filter out malformed formats for better extraction robustness
formats = list(filter(is_wellformed, formats))
formats_dict = {}
# We check that all the formats have the format and format_id fields
for i, format in enumerate(formats):
if 'url' not in format:
raise ExtractorError('Missing "url" key in result (index %d)' % i)
sanitize_string_field(format, 'format_id')
sanitize_numeric_fields(format)
format['url'] = sanitize_url(format['url'])
if format.get('format_id') is None:
if not format.get('format_id'):
format['format_id'] = compat_str(i)
else:
# Sanitize format_id from characters used in format selector expression
@@ -1489,14 +1568,10 @@ class YoutubeDL(object):
req_format = self.params.get('format')
if req_format is None:
req_format_list = []
if (self.params.get('outtmpl', DEFAULT_OUTTMPL) != '-' and
not info_dict.get('is_live')):
merger = FFmpegMergerPP(self)
if merger.available and merger.can_merge():
req_format_list.append('bestvideo+bestaudio')
req_format_list.append('best')
req_format = '/'.join(req_format_list)
req_format = self._default_format_spec(info_dict, download=download)
if self.params.get('verbose'):
self.to_stdout('[debug] Default format spec: %s' % req_format)
format_selector = self.build_format_selector(req_format)
# While in format selection we may need to have an access to the original
@@ -1648,12 +1723,17 @@ class YoutubeDL(object):
if filename is None:
return
try:
dn = os.path.dirname(sanitize_path(encodeFilename(filename)))
if dn and not os.path.exists(dn):
os.makedirs(dn)
except (OSError, IOError) as err:
self.report_error('unable to create directory ' + error_to_compat_str(err))
def ensure_dir_exists(path):
try:
dn = os.path.dirname(path)
if dn and not os.path.exists(dn):
os.makedirs(dn)
return True
except (OSError, IOError) as err:
self.report_error('unable to create directory ' + error_to_compat_str(err))
return False
if not ensure_dir_exists(sanitize_path(encodeFilename(filename))):
return
if self.params.get('writedescription', False):
@@ -1696,29 +1776,30 @@ class YoutubeDL(object):
ie = self.get_info_extractor(info_dict['extractor_key'])
for sub_lang, sub_info in subtitles.items():
sub_format = sub_info['ext']
if sub_info.get('data') is not None:
sub_data = sub_info['data']
sub_filename = subtitles_filename(filename, sub_lang, sub_format)
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)):
self.to_screen('[info] Video subtitle %s.%s is already present' % (sub_lang, sub_format))
else:
try:
sub_data = ie._download_webpage(
sub_info['url'], info_dict['id'], note=False)
except ExtractorError as err:
self.report_warning('Unable to download subtitle for "%s": %s' %
(sub_lang, error_to_compat_str(err.cause)))
continue
try:
sub_filename = subtitles_filename(filename, sub_lang, sub_format)
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)):
self.to_screen('[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format))
self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
if sub_info.get('data') is not None:
try:
# Use newline='' to prevent conversion of newline characters
# See https://github.com/rg3/youtube-dl/issues/10268
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
subfile.write(sub_info['data'])
except (OSError, IOError):
self.report_error('Cannot write subtitles file ' + sub_filename)
return
else:
self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
# Use newline='' to prevent conversion of newline characters
# See https://github.com/rg3/youtube-dl/issues/10268
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
subfile.write(sub_data)
except (OSError, IOError):
self.report_error('Cannot write subtitles file ' + sub_filename)
return
try:
sub_data = ie._request_webpage(
sub_info['url'], info_dict['id'], note=False).read()
with io.open(encodeFilename(sub_filename), 'wb') as subfile:
subfile.write(sub_data)
except (ExtractorError, IOError, OSError, ValueError) as err:
self.report_warning('Unable to download subtitle for "%s": %s' %
(sub_lang, error_to_compat_str(err)))
continue
if self.params.get('writeinfojson', False):
infofn = replace_extension(filename, 'info.json', info_dict.get('ext'))
@@ -1791,8 +1872,11 @@ class YoutubeDL(object):
for f in requested_formats:
new_info = dict(info_dict)
new_info.update(f)
fname = self.prepare_filename(new_info)
fname = prepend_extension(fname, 'f%s' % f['format_id'], new_info['ext'])
fname = prepend_extension(
self.prepare_filename(new_info),
'f%s' % f['format_id'], new_info['ext'])
if not ensure_dir_exists(fname):
return
downloaded.append(fname)
partial_success = dl(fname, new_info)
success = success and partial_success
@@ -1859,7 +1943,7 @@ class YoutubeDL(object):
info_dict.get('protocol') == 'm3u8' and
self.params.get('hls_prefer_native')):
if fixup_policy == 'warn':
self.report_warning('%s: malformated aac bitstream.' % (
self.report_warning('%s: malformed AAC bitstream detected.' % (
info_dict['id']))
elif fixup_policy == 'detect_or_warn':
fixup_pp = FFmpegFixupM3u8PP(self)
@@ -1868,7 +1952,7 @@ class YoutubeDL(object):
info_dict['__postprocessors'].append(fixup_pp)
else:
self.report_warning(
'%s: malformated aac bitstream. %s'
'%s: malformed AAC bitstream detected. %s'
% (info_dict['id'], INSTALL_FFMPEG_MESSAGE))
else:
assert fixup_policy in ('ignore', 'never')
@@ -2146,6 +2230,7 @@ class YoutubeDL(object):
exe_versions = FFmpegPostProcessor.get_versions(self)
exe_versions['rtmpdump'] = rtmpdump_version()
exe_versions['phantomjs'] = PhantomJSwrapper._version()
exe_str = ', '.join(
'%s %s' % (exe, v)
for exe, v in sorted(exe_versions.items())

View File

@@ -206,7 +206,7 @@ def _real_main(argv=None):
if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg', 'mkv', 'avi']:
parser.error('invalid video recode format specified')
if opts.convertsubtitles is not None:
if opts.convertsubtitles not in ['srt', 'vtt', 'ass']:
if opts.convertsubtitles not in ['srt', 'vtt', 'ass', 'lrc']:
parser.error('invalid subtitle format specified')
if opts.date is not None:

View File

@@ -6,6 +6,7 @@ import collections
import email
import getpass
import io
import itertools
import optparse
import os
import re
@@ -15,7 +16,6 @@ import socket
import struct
import subprocess
import sys
import itertools
import xml.etree.ElementTree
@@ -2322,6 +2322,19 @@ try:
except ImportError: # Python 2
from HTMLParser import HTMLParser as compat_HTMLParser
try: # Python 2
from HTMLParser import HTMLParseError as compat_HTMLParseError
except ImportError: # Python <3.4
try:
from html.parser import HTMLParseError as compat_HTMLParseError
except ImportError: # Python >3.4
# HTMLParseError has been deprecated in Python 3.3 and removed in
# Python 3.5. Introducing dummy exception for Python >3.5 for compatible
# and uniform cross-version exceptiong handling
class compat_HTMLParseError(Exception):
pass
try:
from subprocess import DEVNULL
compat_subprocess_get_DEVNULL = lambda: DEVNULL
@@ -2604,14 +2617,22 @@ except ImportError: # Python 2
parsed_result[name] = [value]
return parsed_result
try:
from shlex import quote as compat_shlex_quote
except ImportError: # Python < 3.3
compat_os_name = os._name if os.name == 'java' else os.name
if compat_os_name == 'nt':
def compat_shlex_quote(s):
if re.match(r'^[-_\w./]+$', s):
return s
else:
return "'" + s.replace("'", "'\"'\"'") + "'"
return s if re.match(r'^[-_\w./]+$', s) else '"%s"' % s.replace('"', '\\"')
else:
try:
from shlex import quote as compat_shlex_quote
except ImportError: # Python < 3.3
def compat_shlex_quote(s):
if re.match(r'^[-_\w./]+$', s):
return s
else:
return "'" + s.replace("'", "'\"'\"'") + "'"
try:
@@ -2636,9 +2657,6 @@ def compat_ord(c):
return ord(c)
compat_os_name = os._name if os.name == 'java' else os.name
if sys.version_info >= (3, 0):
compat_getenv = os.getenv
compat_expanduser = os.path.expanduser
@@ -2880,8 +2898,16 @@ else:
compat_struct_pack = struct.pack
compat_struct_unpack = struct.unpack
try:
from future_builtins import zip as compat_zip
except ImportError: # not 2.6+ or is 3.x
try:
from itertools import izip as compat_zip # < 2.5 or 3.x
except ImportError:
compat_zip = zip
__all__ = [
'compat_HTMLParseError',
'compat_HTMLParser',
'compat_HTTPError',
'compat_basestring',
@@ -2929,5 +2955,6 @@ __all__ = [
'compat_urlretrieve',
'compat_xml_parse_error',
'compat_xpath',
'compat_zip',
'workaround_optparse_bug9161',
]

View File

@@ -8,10 +8,11 @@ import random
from ..compat import compat_os_name
from ..utils import (
decodeArgument,
encodeFilename,
error_to_compat_str,
decodeArgument,
format_bytes,
shell_quote,
timeconvert,
)
@@ -303,11 +304,11 @@ class FileDownloader(object):
"""Report attempt to resume at given byte."""
self.to_screen('[download] Resuming download at byte %s' % resume_len)
def report_retry(self, count, retries):
def report_retry(self, err, count, retries):
"""Report retry in case of HTTP error 5xx"""
self.to_screen(
'[download] Got server HTTP error. Retrying (attempt %d of %s)...'
% (count, self.format_retries(retries)))
'[download] Got server HTTP error: %s. Retrying (attempt %d of %s)...'
% (error_to_compat_str(err), count, self.format_retries(retries)))
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""
@@ -381,10 +382,5 @@ class FileDownloader(object):
if exe is None:
exe = os.path.basename(str_args[0])
try:
import pipes
shell_quote = lambda args: ' '.join(map(pipes.quote, str_args))
except ImportError:
shell_quote = repr
self.to_screen('[debug] %s command line: %s' % (
exe, shell_quote(str_args)))

View File

@@ -2,6 +2,7 @@ from __future__ import unicode_literals
from .fragment import FragmentFD
from ..compat import compat_urllib_error
from ..utils import urljoin
class DashSegmentsFD(FragmentFD):
@@ -12,12 +13,13 @@ class DashSegmentsFD(FragmentFD):
FD_NAME = 'dashsegments'
def real_download(self, filename, info_dict):
segments = info_dict['fragments'][:1] if self.params.get(
fragment_base_url = info_dict.get('fragment_base_url')
fragments = info_dict['fragments'][:1] if self.params.get(
'test', False) else info_dict['fragments']
ctx = {
'filename': filename,
'total_frags': len(segments),
'total_frags': len(fragments),
}
self._prepare_and_start_frag_download(ctx)
@@ -26,7 +28,7 @@ class DashSegmentsFD(FragmentFD):
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
frag_index = 0
for i, segment in enumerate(segments):
for i, fragment in enumerate(fragments):
frag_index += 1
if frag_index <= ctx['fragment_index']:
continue
@@ -36,7 +38,11 @@ class DashSegmentsFD(FragmentFD):
count = 0
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(ctx, segment['url'], info_dict)
fragment_url = fragment.get('url')
if not fragment_url:
assert fragment_base_url
fragment_url = urljoin(fragment_base_url, fragment['path'])
success, frag_content = self._download_fragment(ctx, fragment_url, info_dict)
if not success:
return False
self._append_fragment(ctx, frag_content)

View File

@@ -151,10 +151,15 @@ class FragmentFD(FileDownloader):
if self.__do_ytdl_file(ctx):
if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))):
self._read_ytdl_file(ctx)
if ctx['fragment_index'] > 0 and resume_len == 0:
self.report_error(
'Inconsistent state of incomplete fragment download. '
'Restarting from the beginning...')
ctx['fragment_index'] = resume_len = 0
self._write_ytdl_file(ctx)
else:
self._write_ytdl_file(ctx)
if ctx['fragment_index'] > 0:
assert resume_len > 0
assert ctx['fragment_index'] == 0
dest_stream, tmpfilename = sanitize_open(tmpfilename, open_mode)

View File

@@ -59,9 +59,9 @@ class HlsFD(FragmentFD):
man_url = info_dict['url']
self.to_screen('[%s] Downloading m3u8 manifest' % self.FD_NAME)
manifest = self.ydl.urlopen(self._prepare_url(info_dict, man_url)).read()
s = manifest.decode('utf-8', 'ignore')
urlh = self.ydl.urlopen(self._prepare_url(info_dict, man_url))
man_url = urlh.geturl()
s = urlh.read().decode('utf-8', 'ignore')
if not self.can_download(s, info_dict):
if info_dict.get('extra_param_to_segment_url'):

View File

@@ -22,8 +22,16 @@ from ..utils import (
class HttpFD(FileDownloader):
def real_download(self, filename, info_dict):
url = info_dict['url']
tmpfilename = self.temp_name(filename)
stream = None
class DownloadContext(dict):
__getattr__ = dict.get
__setattr__ = dict.__setitem__
__delattr__ = dict.__delitem__
ctx = DownloadContext()
ctx.filename = filename
ctx.tmpfilename = self.temp_name(filename)
ctx.stream = None
# Do not include the Accept-Encoding header
headers = {'Youtubedl-no-compression': 'True'}
@@ -38,46 +46,51 @@ class HttpFD(FileDownloader):
if is_test:
request.add_header('Range', 'bytes=0-%s' % str(self._TEST_FILE_SIZE - 1))
# Establish possible resume length
if os.path.isfile(encodeFilename(tmpfilename)):
resume_len = os.path.getsize(encodeFilename(tmpfilename))
else:
resume_len = 0
ctx.open_mode = 'wb'
ctx.resume_len = 0
open_mode = 'wb'
if resume_len != 0:
if self.params.get('continuedl', True):
self.report_resuming_byte(resume_len)
request.add_header('Range', 'bytes=%d-' % resume_len)
open_mode = 'ab'
else:
resume_len = 0
if self.params.get('continuedl', True):
# Establish possible resume length
if os.path.isfile(encodeFilename(ctx.tmpfilename)):
ctx.resume_len = os.path.getsize(encodeFilename(ctx.tmpfilename))
count = 0
retries = self.params.get('retries', 0)
while count <= retries:
class SucceedDownload(Exception):
pass
class RetryDownload(Exception):
def __init__(self, source_error):
self.source_error = source_error
def establish_connection():
if ctx.resume_len != 0:
self.report_resuming_byte(ctx.resume_len)
request.add_header('Range', 'bytes=%d-' % ctx.resume_len)
ctx.open_mode = 'ab'
# Establish connection
try:
data = self.ydl.urlopen(request)
ctx.data = self.ydl.urlopen(request)
# When trying to resume, Content-Range HTTP header of response has to be checked
# to match the value of requested Range HTTP header. This is due to a webservers
# that don't support resuming and serve a whole file with no Content-Range
# set in response despite of requested Range (see
# https://github.com/rg3/youtube-dl/issues/6057#issuecomment-126129799)
if resume_len > 0:
content_range = data.headers.get('Content-Range')
if ctx.resume_len > 0:
content_range = ctx.data.headers.get('Content-Range')
if content_range:
content_range_m = re.search(r'bytes (\d+)-', content_range)
# Content-Range is present and matches requested Range, resume is possible
if content_range_m and resume_len == int(content_range_m.group(1)):
break
if content_range_m and ctx.resume_len == int(content_range_m.group(1)):
return
# Content-Range is either not present or invalid. Assuming remote webserver is
# trying to send the whole file, resume is not possible, so wiping the local file
# and performing entire redownload
self.report_unable_to_resume()
resume_len = 0
open_mode = 'wb'
break
ctx.resume_len = 0
ctx.open_mode = 'wb'
return
except (compat_urllib_error.HTTPError, ) as err:
if (err.code < 500 or err.code >= 600) and err.code != 416:
# Unexpected HTTP error
@@ -86,15 +99,15 @@ class HttpFD(FileDownloader):
# Unable to resume (requested range not satisfiable)
try:
# Open the connection again without the range header
data = self.ydl.urlopen(basic_request)
content_length = data.info()['Content-Length']
ctx.data = self.ydl.urlopen(basic_request)
content_length = ctx.data.info()['Content-Length']
except (compat_urllib_error.HTTPError, ) as err:
if err.code < 500 or err.code >= 600:
raise
else:
# Examine the reported length
if (content_length is not None and
(resume_len - 100 < int(content_length) < resume_len + 100)):
(ctx.resume_len - 100 < int(content_length) < ctx.resume_len + 100)):
# The file had already been fully downloaded.
# Explanation to the above condition: in issue #175 it was revealed that
# YouTube sometimes adds or removes a few bytes from the end of the file,
@@ -102,152 +115,184 @@ class HttpFD(FileDownloader):
# I decided to implement a suggested change and consider the file
# completely downloaded if the file size differs less than 100 bytes from
# the one in the hard drive.
self.report_file_already_downloaded(filename)
self.try_rename(tmpfilename, filename)
self.report_file_already_downloaded(ctx.filename)
self.try_rename(ctx.tmpfilename, ctx.filename)
self._hook_progress({
'filename': filename,
'filename': ctx.filename,
'status': 'finished',
'downloaded_bytes': resume_len,
'total_bytes': resume_len,
'downloaded_bytes': ctx.resume_len,
'total_bytes': ctx.resume_len,
})
return True
raise SucceedDownload()
else:
# The length does not match, we start the download over
self.report_unable_to_resume()
resume_len = 0
open_mode = 'wb'
break
except socket.error as e:
if e.errno != errno.ECONNRESET:
ctx.resume_len = 0
ctx.open_mode = 'wb'
return
raise RetryDownload(err)
except socket.error as err:
if err.errno != errno.ECONNRESET:
# Connection reset is no problem, just retry
raise
raise RetryDownload(err)
# Retry
count += 1
if count <= retries:
self.report_retry(count, retries)
def download():
data_len = ctx.data.info().get('Content-length', None)
if count > retries:
self.report_error('giving up after %s retries' % retries)
return False
# Range HTTP header may be ignored/unsupported by a webserver
# (e.g. extractor/scivee.py, extractor/bambuser.py).
# However, for a test we still would like to download just a piece of a file.
# To achieve this we limit data_len to _TEST_FILE_SIZE and manually control
# block size when downloading a file.
if is_test and (data_len is None or int(data_len) > self._TEST_FILE_SIZE):
data_len = self._TEST_FILE_SIZE
data_len = data.info().get('Content-length', None)
# Range HTTP header may be ignored/unsupported by a webserver
# (e.g. extractor/scivee.py, extractor/bambuser.py).
# However, for a test we still would like to download just a piece of a file.
# To achieve this we limit data_len to _TEST_FILE_SIZE and manually control
# block size when downloading a file.
if is_test and (data_len is None or int(data_len) > self._TEST_FILE_SIZE):
data_len = self._TEST_FILE_SIZE
if data_len is not None:
data_len = int(data_len) + resume_len
min_data_len = self.params.get('min_filesize')
max_data_len = self.params.get('max_filesize')
if min_data_len is not None and data_len < min_data_len:
self.to_screen('\r[download] File is smaller than min-filesize (%s bytes < %s bytes). Aborting.' % (data_len, min_data_len))
return False
if max_data_len is not None and data_len > max_data_len:
self.to_screen('\r[download] File is larger than max-filesize (%s bytes > %s bytes). Aborting.' % (data_len, max_data_len))
return False
byte_counter = 0 + resume_len
block_size = self.params.get('buffersize', 1024)
start = time.time()
# measure time over whole while-loop, so slow_down() and best_block_size() work together properly
now = None # needed for slow_down() in the first loop run
before = start # start measuring
while True:
# Download and write
data_block = data.read(block_size if not is_test else min(block_size, data_len - byte_counter))
byte_counter += len(data_block)
# exit loop when download is finished
if len(data_block) == 0:
break
# Open destination file just in time
if stream is None:
try:
(stream, tmpfilename) = sanitize_open(tmpfilename, open_mode)
assert stream is not None
filename = self.undo_temp_name(tmpfilename)
self.report_destination(filename)
except (OSError, IOError) as err:
self.report_error('unable to open for writing: %s' % str(err))
if data_len is not None:
data_len = int(data_len) + ctx.resume_len
min_data_len = self.params.get('min_filesize')
max_data_len = self.params.get('max_filesize')
if min_data_len is not None and data_len < min_data_len:
self.to_screen('\r[download] File is smaller than min-filesize (%s bytes < %s bytes). Aborting.' % (data_len, min_data_len))
return False
if max_data_len is not None and data_len > max_data_len:
self.to_screen('\r[download] File is larger than max-filesize (%s bytes > %s bytes). Aborting.' % (data_len, max_data_len))
return False
if self.params.get('xattr_set_filesize', False) and data_len is not None:
byte_counter = 0 + ctx.resume_len
block_size = self.params.get('buffersize', 1024)
start = time.time()
# measure time over whole while-loop, so slow_down() and best_block_size() work together properly
now = None # needed for slow_down() in the first loop run
before = start # start measuring
def retry(e):
if ctx.tmpfilename != '-':
ctx.stream.close()
ctx.stream = None
ctx.resume_len = os.path.getsize(encodeFilename(ctx.tmpfilename))
raise RetryDownload(e)
while True:
try:
# Download and write
data_block = ctx.data.read(block_size if not is_test else min(block_size, data_len - byte_counter))
# socket.timeout is a subclass of socket.error but may not have
# errno set
except socket.timeout as e:
retry(e)
except socket.error as e:
if e.errno not in (errno.ECONNRESET, errno.ETIMEDOUT):
raise
retry(e)
byte_counter += len(data_block)
# exit loop when download is finished
if len(data_block) == 0:
break
# Open destination file just in time
if ctx.stream is None:
try:
write_xattr(tmpfilename, 'user.ytdl.filesize', str(data_len).encode('utf-8'))
except (XAttrUnavailableError, XAttrMetadataError) as err:
self.report_error('unable to set filesize xattr: %s' % str(err))
ctx.stream, ctx.tmpfilename = sanitize_open(
ctx.tmpfilename, ctx.open_mode)
assert ctx.stream is not None
ctx.filename = self.undo_temp_name(ctx.tmpfilename)
self.report_destination(ctx.filename)
except (OSError, IOError) as err:
self.report_error('unable to open for writing: %s' % str(err))
return False
try:
stream.write(data_block)
except (IOError, OSError) as err:
if self.params.get('xattr_set_filesize', False) and data_len is not None:
try:
write_xattr(ctx.tmpfilename, 'user.ytdl.filesize', str(data_len).encode('utf-8'))
except (XAttrUnavailableError, XAttrMetadataError) as err:
self.report_error('unable to set filesize xattr: %s' % str(err))
try:
ctx.stream.write(data_block)
except (IOError, OSError) as err:
self.to_stderr('\n')
self.report_error('unable to write data: %s' % str(err))
return False
# Apply rate limit
self.slow_down(start, now, byte_counter - ctx.resume_len)
# end measuring of one loop run
now = time.time()
after = now
# Adjust block size
if not self.params.get('noresizebuffer', False):
block_size = self.best_block_size(after - before, len(data_block))
before = after
# Progress message
speed = self.calc_speed(start, now, byte_counter - ctx.resume_len)
if data_len is None:
eta = None
else:
eta = self.calc_eta(start, time.time(), data_len - ctx.resume_len, byte_counter - ctx.resume_len)
self._hook_progress({
'status': 'downloading',
'downloaded_bytes': byte_counter,
'total_bytes': data_len,
'tmpfilename': ctx.tmpfilename,
'filename': ctx.filename,
'eta': eta,
'speed': speed,
'elapsed': now - start,
})
if is_test and byte_counter == data_len:
break
if ctx.stream is None:
self.to_stderr('\n')
self.report_error('unable to write data: %s' % str(err))
self.report_error('Did not get any data blocks')
return False
if ctx.tmpfilename != '-':
ctx.stream.close()
# Apply rate limit
self.slow_down(start, now, byte_counter - resume_len)
if data_len is not None and byte_counter != data_len:
err = ContentTooShortError(byte_counter, int(data_len))
if count <= retries:
retry(err)
raise err
# end measuring of one loop run
now = time.time()
after = now
self.try_rename(ctx.tmpfilename, ctx.filename)
# Adjust block size
if not self.params.get('noresizebuffer', False):
block_size = self.best_block_size(after - before, len(data_block))
before = after
# Progress message
speed = self.calc_speed(start, now, byte_counter - resume_len)
if data_len is None:
eta = None
else:
eta = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
# Update file modification time
if self.params.get('updatetime', True):
info_dict['filetime'] = self.try_utime(ctx.filename, ctx.data.info().get('last-modified', None))
self._hook_progress({
'status': 'downloading',
'downloaded_bytes': byte_counter,
'total_bytes': data_len,
'tmpfilename': tmpfilename,
'filename': filename,
'eta': eta,
'speed': speed,
'elapsed': now - start,
'total_bytes': byte_counter,
'filename': ctx.filename,
'status': 'finished',
'elapsed': time.time() - start,
})
if is_test and byte_counter == data_len:
break
return True
if stream is None:
self.to_stderr('\n')
self.report_error('Did not get any data blocks')
return False
if tmpfilename != '-':
stream.close()
while count <= retries:
try:
establish_connection()
download()
return True
except RetryDownload as e:
count += 1
if count <= retries:
self.report_retry(e.source_error, count, retries)
continue
except SucceedDownload:
return True
if data_len is not None and byte_counter != data_len:
raise ContentTooShortError(byte_counter, int(data_len))
self.try_rename(tmpfilename, filename)
# Update file modification time
if self.params.get('updatetime', True):
info_dict['filetime'] = self.try_utime(filename, data.info().get('last-modified', None))
self._hook_progress({
'downloaded_bytes': byte_counter,
'total_bytes': byte_counter,
'filename': filename,
'status': 'finished',
'elapsed': time.time() - start,
})
return True
self.report_error('giving up after %s retries' % retries)
return False

View File

@@ -98,7 +98,7 @@ def write_piff_header(stream, params):
if is_audio:
smhd_payload = s88.pack(0) # balance
smhd_payload = u16.pack(0) # reserved
smhd_payload += u16.pack(0) # reserved
media_header_box = full_box(b'smhd', 0, 0, smhd_payload) # Sound Media Header
else:
vmhd_payload = u16.pack(0) # graphics mode
@@ -126,7 +126,6 @@ def write_piff_header(stream, params):
if fourcc == 'AACL':
sample_entry_box = box(b'mp4a', sample_entry_payload)
else:
sample_entry_payload = sample_entry_payload
sample_entry_payload += u16.pack(0) # pre defined
sample_entry_payload += u16.pack(0) # reserved
sample_entry_payload += u32.pack(0) * 3 # pre defined

View File

@@ -3,11 +3,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
js_to_json,
int_or_none,
parse_iso8601,
try_get,
)
@@ -124,7 +126,20 @@ class ABCIViewIE(InfoExtractor):
title = video_params.get('title') or video_params['seriesTitle']
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
formats = self._extract_akamai_formats(stream['hds-unmetered'], video_id)
format_urls = [
try_get(stream, lambda x: x['hds-unmetered'], compat_str)]
# May have higher quality video
sd_url = try_get(
stream, lambda x: x['streams']['hds']['sd'], compat_str)
if sd_url:
format_urls.append(sd_url.replace('metered', 'um'))
formats = []
for format_url in format_urls:
if format_url:
formats.extend(
self._extract_akamai_formats(format_url, video_id))
self._sort_formats(formats)
subtitles = {}

View File

@@ -7,6 +7,7 @@ import time
from .amp import AMPIE
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import compat_urlparse
@@ -108,9 +109,7 @@ class AbcNewsIE(InfoExtractor):
r'window\.abcnvideo\.url\s*=\s*"([^"]+)"', webpage, 'video URL')
full_video_url = compat_urlparse.urljoin(url, video_url)
youtube_url = self._html_search_regex(
r'<iframe[^>]+src="(https://www\.youtube\.com/embed/[^"]+)"',
webpage, 'YouTube URL', default=None)
youtube_url = YoutubeIE._extract_url(webpage)
timestamp = None
date_str = self._html_search_regex(
@@ -140,7 +139,7 @@ class AbcNewsIE(InfoExtractor):
}
if youtube_url:
entries = [entry, self.url_result(youtube_url, 'Youtube')]
entries = [entry, self.url_result(youtube_url, ie=YoutubeIE.ie_key())]
return self.playlist_result(entries)
return entry

View File

@@ -22,7 +22,7 @@ class ABCOTVSIE(InfoExtractor):
'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
'ext': 'mp4',
'title': 'East Bay museum celebrates vintage synthesizers',
'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
'description': 'md5:24ed2bd527096ec2a5c67b9d5a9005f3',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1421123075,
'upload_date': '20150113',

View File

@@ -107,11 +107,13 @@ class ADNIE(InfoExtractor):
metas = options.get('metas') or {}
title = metas.get('title') or video_info['title']
links = player_config.get('links') or {}
error = None
if not links:
links_url = player_config['linksurl']
links_data = self._download_json(urljoin(
self._BASE_URL, links_url), video_id)
links = links_data.get('links') or {}
error = links_data.get('error')
formats = []
for format_id, qualities in links.items():
@@ -130,7 +132,8 @@ class ADNIE(InfoExtractor):
for f in m3u8_formats:
f['language'] = 'fr'
formats.extend(m3u8_formats)
error = options.get('error')
if not error:
error = options.get('error')
if not formats and error:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
self._sort_formats(formats)

View File

@@ -6,12 +6,16 @@ import time
import xml.etree.ElementTree as etree
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..compat import (
compat_kwargs,
compat_urlparse,
)
from ..utils import (
unescapeHTML,
urlencode_postdata,
unified_timestamp,
ExtractorError,
NO_DEFAULT,
)
@@ -21,6 +25,11 @@ MSO_INFO = {
'username_field': 'username',
'password_field': 'password',
},
'ATTOTT': {
'name': 'DIRECTV NOW',
'username_field': 'email',
'password_field': 'loginpassword',
},
'Rogers': {
'name': 'Rogers',
'username_field': 'UserName',
@@ -1313,11 +1322,14 @@ class AdobePassIE(InfoExtractor):
_USER_AGENT = 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0'
_MVPD_CACHE = 'ap-mvpd'
_DOWNLOADING_LOGIN_PAGE = 'Downloading Provider Login Page'
def _download_webpage_handle(self, *args, **kwargs):
headers = kwargs.get('headers', {})
headers.update(self.geo_verification_headers())
kwargs['headers'] = headers
return super(AdobePassIE, self)._download_webpage_handle(*args, **kwargs)
return super(AdobePassIE, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))
@staticmethod
def _get_mvpd_resource(provider_id, title, guid, rating):
@@ -1361,6 +1373,21 @@ class AdobePassIE(InfoExtractor):
'Use --ap-mso to specify Adobe Pass Multiple-system operator Identifier '
'and --ap-username and --ap-password or --netrc to provide account credentials.', expected=True)
def extract_redirect_url(html, url=None, fatal=False):
# TODO: eliminate code duplication with generic extractor and move
# redirection code into _download_webpage_handle
REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
redirect_url = self._search_regex(
r'(?i)<meta\s+(?=(?:[a-z-]+="[^"]+"\s+)*http-equiv="refresh")'
r'(?:[a-z-]+="[^"]+"\s+)*?content="%s' % REDIRECT_REGEX,
html, 'meta refresh redirect',
default=NO_DEFAULT if fatal else None, fatal=fatal)
if not redirect_url:
return None
if url:
redirect_url = compat_urlparse.urljoin(url, unescapeHTML(redirect_url))
return redirect_url
mvpd_headers = {
'ap_42': 'anonymous',
'ap_11': 'Linux i686',
@@ -1410,16 +1437,15 @@ class AdobePassIE(InfoExtractor):
if '<form name="signin"' in provider_redirect_page:
provider_login_page_res = provider_redirect_page_res
elif 'http-equiv="refresh"' in provider_redirect_page:
oauth_redirect_url = self._html_search_regex(
r'content="0;\s*url=([^\'"]+)',
provider_redirect_page, 'meta refresh redirect')
oauth_redirect_url = extract_redirect_url(
provider_redirect_page, fatal=True)
provider_login_page_res = self._download_webpage_handle(
oauth_redirect_url, video_id,
'Downloading Provider Login Page')
self._DOWNLOADING_LOGIN_PAGE)
else:
provider_login_page_res = post_form(
provider_redirect_page_res,
'Downloading Provider Login Page')
self._DOWNLOADING_LOGIN_PAGE)
mvpd_confirm_page_res = post_form(
provider_login_page_res, 'Logging in', {
@@ -1466,8 +1492,17 @@ class AdobePassIE(InfoExtractor):
'Content-Type': 'application/x-www-form-urlencoded'
})
else:
# Some providers (e.g. DIRECTV NOW) have another meta refresh
# based redirect that should be followed.
provider_redirect_page, urlh = provider_redirect_page_res
provider_refresh_redirect_url = extract_redirect_url(
provider_redirect_page, url=urlh.geturl())
if provider_refresh_redirect_url:
provider_redirect_page_res = self._download_webpage_handle(
provider_refresh_redirect_url, video_id,
'Downloading Provider Redirect Page (meta refresh)')
provider_login_page_res = post_form(
provider_redirect_page_res, 'Downloading Provider Login Page')
provider_redirect_page_res, self._DOWNLOADING_LOGIN_PAGE)
mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', {
mso_info.get('username_field', 'username'): username,
mso_info.get('password_field', 'password'): password,

View File

@@ -138,6 +138,23 @@ class AfreecaTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
# adult video
'url': 'http://vod.afreecatv.com/PLAYER/STATION/26542731',
'info_dict': {
'id': '20171001_F1AE1711_196617479_1',
'ext': 'mp4',
'title': '[생]서아 초심 찾기 방송 (part 1)',
'thumbnail': 're:^https?://(?:video|st)img.afreecatv.com/.*$',
'uploader': 'BJ서아',
'uploader_id': 'bjdyrksu',
'upload_date': '20171001',
'duration': 3600,
'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.afreecatv.com/player/Player.swf?szType=szBjId=djleegoon&nStationNo=11273158&nBbsNo=13161095&nTitleNo=36327652',
'only_matching': True,
@@ -160,7 +177,15 @@ class AfreecaTVIE(InfoExtractor):
video_xml = self._download_xml(
'http://afbbs.afreecatv.com:8080/api/video/get_video_info.php',
video_id, query={'nTitleNo': video_id})
video_id, query={
'nTitleNo': video_id,
'partialView': 'SKIP_ADULT',
})
flag = xpath_text(video_xml, './track/flag', 'flag', default=None)
if flag and flag != 'SUCCEED':
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, flag), expected=True)
video_element = video_xml.findall(compat_xpath('./track/video'))[1]
if video_element is None or video_element.text is None:

View File

@@ -0,0 +1,53 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
float_or_none,
try_get,
)
class AliExpressLiveIE(InfoExtractor):
_VALID_URL = r'https?://live\.aliexpress\.com/live/(?P<id>\d+)'
_TEST = {
'url': 'https://live.aliexpress.com/live/2800002704436634',
'md5': 'e729e25d47c5e557f2630eaf99b740a5',
'info_dict': {
'id': '2800002704436634',
'ext': 'mp4',
'title': 'CASIMA7.22',
'thumbnail': r're:http://.*\.jpg',
'uploader': 'CASIMA Official Store',
'timestamp': 1500717600,
'upload_date': '20170722',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data = self._parse_json(
self._search_regex(
r'(?s)runParams\s*=\s*({.+?})\s*;?\s*var',
webpage, 'runParams'),
video_id)
title = data['title']
formats = self._extract_m3u8_formats(
data['replyStreamUrl'], video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
return {
'id': video_id,
'title': title,
'thumbnail': data.get('coverUrl'),
'uploader': try_get(
data, lambda x: x['followBar']['name'], compat_str),
'timestamp': float_or_none(data.get('startTimeLong'), scale=1000),
'formats': formats,
}

View File

@@ -3,9 +3,10 @@ from __future__ import unicode_literals
from .theplatform import ThePlatformIE
from ..utils import (
update_url_query,
parse_age_limit,
int_or_none,
parse_age_limit,
try_get,
update_url_query,
)
@@ -68,7 +69,8 @@ class AMCNetworksIE(ThePlatformIE):
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
title = theplatform_metadata['title']
rating = theplatform_metadata['ratings'][0]['rating']
rating = try_get(
theplatform_metadata, lambda x: x['ratings'][0]['rating'])
auth_required = self._search_regex(
r'window\.authRequired\s*=\s*(true|false);',
webpage, 'auth required')

View File

@@ -0,0 +1,85 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
try_get,
unified_strdate,
)
class AmericasTestKitchenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.americastestkitchen.com/episode/548-summer-dinner-party',
'md5': 'b861c3e365ac38ad319cfd509c30577f',
'info_dict': {
'id': '1_5g5zua6e',
'title': 'Summer Dinner Party',
'ext': 'mp4',
'description': 'md5:858d986e73a4826979b6a5d9f8f6a1ec',
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1497285541,
'upload_date': '20170612',
'uploader_id': 'roger.metcalf@americastestkitchen.com',
'release_date': '20170617',
'series': "America's Test Kitchen",
'season_number': 17,
'episode': 'Summer Dinner Party',
'episode_number': 24,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.americastestkitchen.com/videos/3420-pan-seared-salmon',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
partner_id = self._search_regex(
r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
webpage, 'kaltura partner id')
video_data = self._parse_json(
self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
webpage, 'initial context'),
video_id)
ep_data = try_get(
video_data,
(lambda x: x['episodeDetail']['content']['data'],
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
external_id = ep_data.get('external_id') or ep_meta['external_id']
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
'description') or ep_meta.get('description'))
thumbnail = try_get(ep_meta, lambda x: x['photo']['image_url'])
release_date = unified_strdate(ep_data.get('aired_at'))
season_number = int_or_none(ep_meta.get('season_number'))
episode = ep_meta.get('title')
episode_number = int_or_none(ep_meta.get('episode_number'))
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, external_id),
'ie_key': 'Kaltura',
'title': title,
'description': description,
'thumbnail': thumbnail,
'release_date': release_date,
'series': "America's Test Kitchen",
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
}

View File

@@ -3,16 +3,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urlparse,
compat_str,
)
from ..compat import compat_str
from ..utils import (
determine_ext,
extract_attributes,
ExtractorError,
sanitized_Request,
urlencode_postdata,
urljoin,
)
@@ -21,6 +18,8 @@ class AnimeOnDemandIE(InfoExtractor):
_LOGIN_URL = 'https://www.anime-on-demand.de/users/sign_in'
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
_NETRC_MACHINE = 'animeondemand'
# German-speaking countries of Europe
_GEO_COUNTRIES = ['AT', 'CH', 'DE', 'LI', 'LU']
_TESTS = [{
# jap, OmU
'url': 'https://www.anime-on-demand.de/anime/161',
@@ -46,6 +45,10 @@ class AnimeOnDemandIE(InfoExtractor):
# Full length film, non-series, ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/185',
'only_matching': True,
}, {
# Flash videos
'url': 'https://www.anime-on-demand.de/anime/12',
'only_matching': True,
}]
def _login(self):
@@ -72,14 +75,13 @@ class AnimeOnDemandIE(InfoExtractor):
'post url', default=self._LOGIN_URL, group='url')
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
request = sanitized_Request(
post_url, urlencode_postdata(login_form))
request.add_header('Referer', self._LOGIN_URL)
post_url = urljoin(self._LOGIN_URL, post_url)
response = self._download_webpage(
request, None, 'Logging in as %s' % username)
post_url, None, 'Logging in as %s' % username,
data=urlencode_postdata(login_form), headers={
'Referer': self._LOGIN_URL,
})
if all(p not in response for p in ('>Logout<', 'href="/users/sign_out"')):
error = self._search_regex(
@@ -120,10 +122,11 @@ class AnimeOnDemandIE(InfoExtractor):
formats = []
for input_ in re.findall(
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', html):
r'<input[^>]+class=["\'].*?streamstarter[^>]+>', html):
attributes = extract_attributes(input_)
title = attributes.get('data-dialog-header')
playlist_urls = []
for playlist_key in ('data-playlist', 'data-otherplaylist'):
for playlist_key in ('data-playlist', 'data-otherplaylist', 'data-stream'):
playlist_url = attributes.get(playlist_key)
if isinstance(playlist_url, compat_str) and re.match(
r'/?[\da-zA-Z]+', playlist_url):
@@ -147,19 +150,38 @@ class AnimeOnDemandIE(InfoExtractor):
format_id_list.append(compat_str(num))
format_id = '-'.join(format_id_list)
format_note = ', '.join(filter(None, (kind, lang_note)))
request = sanitized_Request(
compat_urlparse.urljoin(url, playlist_url),
item_id_list = []
if format_id:
item_id_list.append(format_id)
item_id_list.append('videomaterial')
playlist = self._download_json(
urljoin(url, playlist_url), video_id,
'Downloading %s JSON' % ' '.join(item_id_list),
headers={
'X-Requested-With': 'XMLHttpRequest',
'X-CSRF-Token': csrf_token,
'Referer': url,
'Accept': 'application/json, text/javascript, */*; q=0.01',
})
playlist = self._download_json(
request, video_id, 'Downloading %s playlist JSON' % format_id,
fatal=False)
}, fatal=False)
if not playlist:
continue
stream_url = playlist.get('streamurl')
if stream_url:
rtmp = re.search(
r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+/))(?P<playpath>mp[34]:.+)',
stream_url)
if rtmp:
formats.append({
'url': rtmp.group('url'),
'app': rtmp.group('app'),
'play_path': rtmp.group('playpath'),
'page_url': url,
'player_url': 'https://www.anime-on-demand.de/assets/jwplayer.flash-55abfb34080700304d49125ce9ffb4a6.swf',
'rtmp_real_time': True,
'format_id': 'rtmp',
'ext': 'flv',
})
continue
start_video = playlist.get('startvideo', 0)
playlist = playlist.get('playlist')
if not playlist or not isinstance(playlist, list):
@@ -222,7 +244,7 @@ class AnimeOnDemandIE(InfoExtractor):
f.update({
'id': '%s-%s' % (f['id'], m.group('kind').lower()),
'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')),
'url': urljoin(url, m.group('href')),
})
entries.append(f)

View File

@@ -3,13 +3,13 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
HEADRequest,
int_or_none,
mimetype2ext,
)
class AparatIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://(?:www\.)?aparat\.com/(?:v/|video/video/embed/videohash/)(?P<id>[a-zA-Z0-9]+)'
_TEST = {
'url': 'http://www.aparat.com/v/wP8On',
@@ -29,30 +29,41 @@ class AparatIE(InfoExtractor):
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
embed_url = 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id
webpage = self._download_webpage(embed_url, video_id)
file_list = self._parse_json(self._search_regex(
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage, 'file list'), video_id)
for i, item in enumerate(file_list[0]):
video_url = item['file']
req = HEADRequest(video_url)
res = self._request_webpage(
req, video_id, note='Testing video URL %d' % i, errnote=False)
if res:
break
else:
raise ExtractorError('No working video URLs found')
webpage = self._download_webpage(
'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
video_id)
title = self._search_regex(r'\s+title:\s*"([^"]+)"', webpage, 'title')
file_list = self._parse_json(
self._search_regex(
r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage,
'file list'),
video_id)
formats = []
for item in file_list[0]:
file_url = item.get('file')
if not file_url:
continue
ext = mimetype2ext(item.get('type'))
label = item.get('label')
formats.append({
'url': file_url,
'ext': ext,
'format_id': label or ext,
'height': int_or_none(self._search_regex(
r'(\d+)[pP]', label or '', 'height', default=None)),
})
self._sort_formats(formats)
thumbnail = self._search_regex(
r'image:\s*"([^"]+)"', webpage, 'thumbnail', fatal=False)
return {
'id': video_id,
'title': title,
'url': video_url,
'ext': 'mp4',
'thumbnail': thumbnail,
'age_limit': self._family_friendly_search(webpage),
'formats': formats,
}

View File

@@ -93,6 +93,7 @@ class ARDMediathekIE(InfoExtractor):
duration = int_or_none(media_info.get('_duration'))
thumbnail = media_info.get('_previewImage')
is_live = media_info.get('_isLive') is True
subtitles = {}
subtitle_url = media_info.get('_subtitleUrl')
@@ -106,6 +107,7 @@ class ARDMediathekIE(InfoExtractor):
'id': video_id,
'duration': duration,
'thumbnail': thumbnail,
'is_live': is_live,
'formats': formats,
'subtitles': subtitles,
}
@@ -166,9 +168,11 @@ class ARDMediathekIE(InfoExtractor):
# determine video id from url
m = re.match(self._VALID_URL, url)
document_id = None
numid = re.search(r'documentId=([0-9]+)', url)
if numid:
video_id = numid.group(1)
document_id = video_id = numid.group(1)
else:
video_id = m.group('video_id')
@@ -228,12 +232,16 @@ class ARDMediathekIE(InfoExtractor):
'formats': formats,
}
else: # request JSON file
if not document_id:
video_id = self._search_regex(
r'/play/(?:config|media)/(\d+)', webpage, 'media id')
info = self._extract_media_info(
'http://www.ardmediathek.de/play/media/%s' % video_id, webpage, video_id)
'http://www.ardmediathek.de/play/media/%s' % video_id,
webpage, video_id)
info.update({
'id': video_id,
'title': title,
'title': self._live_title(title) if info.get('is_live') else title,
'description': description,
'thumbnail': thumbnail,
})

View File

@@ -9,12 +9,13 @@ from ..compat import (
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
find_xpath_attr,
unified_strdate,
get_element_by_attribute,
int_or_none,
NO_DEFAULT,
qualities,
unified_strdate,
)
# There are different sources of video in arte.tv, the extraction process
@@ -79,6 +80,13 @@ class ArteTVBaseIE(InfoExtractor):
info = self._download_json(json_url, video_id)
player_info = info['videoJsonPlayer']
vsr = player_info['VSR']
if not vsr:
raise ExtractorError(
'Video %s is not available' % player_info.get('VID') or video_id,
expected=True)
upload_date_str = player_info.get('shootingDate')
if not upload_date_str:
upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0]
@@ -107,7 +115,7 @@ class ArteTVBaseIE(InfoExtractor):
langcode = LANGS.get(lang, lang)
formats = []
for format_id, format_dict in player_info['VSR'].items():
for format_id, format_dict in vsr.items():
f = dict(format_dict)
versionCode = f.get('versionCode')
l = re.escape(langcode)

View File

@@ -0,0 +1,93 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
extract_attributes,
remove_end,
urlencode_postdata,
)
class AsianCrushIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/video/(?:[^/]+/)?0+(?P<id>\d+)v\b'
_TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/',
'md5': 'c3b740e48d0ba002a42c0b72857beae6',
'info_dict': {
'id': '1_y4tmjm5r',
'ext': 'mp4',
'title': 'Women Who Flirt',
'description': 'md5:3db14e9186197857e7063522cb89a805',
'timestamp': 1496936429,
'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com',
},
}, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://www.asiancrush.com/wp-admin/admin-ajax.php', video_id,
data=urlencode_postdata({
'postid': video_id,
'action': 'get_channel_kaltura_vars',
}))
entry_id = data['entry_id']
return self.url_result(
'kaltura:%s:%s' % (data['partner_id'], entry_id),
ie=KalturaIE.ie_key(), video_id=entry_id,
video_title=data.get('vid_label'))
class AsianCrushPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/series/0+(?P<id>\d+)s\b'
_TEST = {
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/',
'info_dict': {
'id': '12481',
'title': 'Scholar Who Walks the Night',
'description': 'md5:7addd7c5132a09fd4741152d96cce886',
},
'playlist_count': 20,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = []
for mobj in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>%s.*?)\1[^>]*>' % AsianCrushIE._VALID_URL,
webpage):
attrs = extract_attributes(mobj.group(0))
if attrs.get('class') == 'clearfix':
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))
title = remove_end(
self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False),
' | AsianCrush')
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'twitter:description', webpage, 'description', fatal=False)
return self.playlist_result(entries, playlist_id, title, description)

View File

@@ -43,7 +43,7 @@ class AudioBoomIE(InfoExtractor):
def from_clip(field):
if clip:
clip.get(field)
return clip.get(field)
audio_url = from_clip('clipURLPriorToLoading') or self._og_search_property(
'audio', webpage, 'audio url')

View File

@@ -14,14 +14,16 @@ from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
KNOWN_EXTENSIONS,
parse_filesize,
unescapeHTML,
update_url_query,
unified_strdate,
)
class BandcampIE(InfoExtractor):
_VALID_URL = r'https?://.*?\.bandcamp\.com/track/(?P<title>.*)'
_VALID_URL = r'https?://.*?\.bandcamp\.com/track/(?P<title>[^/?#&]+)'
_TESTS = [{
'url': 'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song',
'md5': 'c557841d5e50261777a6585648adf439',
@@ -155,7 +157,7 @@ class BandcampIE(InfoExtractor):
class BandcampAlbumIE(InfoExtractor):
IE_NAME = 'Bandcamp:album'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<album_id>[^?#]+)|/?(?:$|[?#]))'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<album_id>[^/?#&]+))?'
_TESTS = [{
'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
@@ -222,6 +224,12 @@ class BandcampAlbumIE(InfoExtractor):
'playlist_count': 2,
}]
@classmethod
def suitable(cls, url):
return (False
if BandcampWeeklyIE.suitable(url) or BandcampIE.suitable(url)
else super(BandcampAlbumIE, cls).suitable(url))
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
uploader_id = mobj.group('subdomain')
@@ -234,7 +242,12 @@ class BandcampAlbumIE(InfoExtractor):
raise ExtractorError('The page doesn\'t contain any tracks')
# Only tracks with duration info have songs
entries = [
self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key())
self.url_result(
compat_urlparse.urljoin(url, t_path),
ie=BandcampIE.ie_key(),
video_title=self._search_regex(
r'<span\b[^>]+\bitemprop=["\']name["\'][^>]*>([^<]+)',
elem_content, 'track title', fatal=False))
for elem_content, t_path in track_elements
if self._html_search_meta('duration', elem_content, default=None)]
@@ -250,3 +263,92 @@ class BandcampAlbumIE(InfoExtractor):
'title': title,
'entries': entries,
}
class BandcampWeeklyIE(InfoExtractor):
IE_NAME = 'Bandcamp:weekly'
_VALID_URL = r'https?://(?:www\.)?bandcamp\.com/?\?(?:.*?&)?show=(?P<id>\d+)'
_TESTS = [{
'url': 'https://bandcamp.com/?show=224',
'md5': 'b00df799c733cf7e0c567ed187dea0fd',
'info_dict': {
'id': '224',
'ext': 'opus',
'title': 'BC Weekly April 4th 2017 - Magic Moments',
'description': 'md5:5d48150916e8e02d030623a48512c874',
'duration': 5829.77,
'release_date': '20170404',
'series': 'Bandcamp Weekly',
'episode': 'Magic Moments',
'episode_number': 208,
'episode_id': '224',
}
}, {
'url': 'https://bandcamp.com/?blah/blah@&show=228',
'only_matching': True
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
blob = self._parse_json(
self._search_regex(
r'data-blob=(["\'])(?P<blob>{.+?})\1', webpage,
'blob', group='blob'),
video_id, transform_source=unescapeHTML)
show = blob['bcw_show']
# This is desired because any invalid show id redirects to `bandcamp.com`
# which happens to expose the latest Bandcamp Weekly episode.
show_id = int_or_none(show.get('show_id')) or int_or_none(video_id)
formats = []
for format_id, format_url in show['audio_stream'].items():
if not isinstance(format_url, compat_str):
continue
for known_ext in KNOWN_EXTENSIONS:
if known_ext in format_id:
ext = known_ext
break
else:
ext = None
formats.append({
'format_id': format_id,
'url': format_url,
'ext': ext,
'vcodec': 'none',
})
self._sort_formats(formats)
title = show.get('audio_title') or 'Bandcamp Weekly'
subtitle = show.get('subtitle')
if subtitle:
title += ' - %s' % subtitle
episode_number = None
seq = blob.get('bcw_seq')
if seq and isinstance(seq, list):
try:
episode_number = next(
int_or_none(e.get('episode_number'))
for e in seq
if isinstance(e, dict) and int_or_none(e.get('id')) == show_id)
except StopIteration:
pass
return {
'id': video_id,
'title': title,
'description': show.get('desc') or show.get('short_desc'),
'duration': float_or_none(show.get('audio_duration')),
'is_live': False,
'release_date': unified_strdate(show.get('published_date')),
'series': 'Bandcamp Weekly',
'episode': show.get('subtitle'),
'episode_number': episode_number,
'episode_id': compat_str(video_id),
'formats': formats
}

View File

@@ -29,15 +29,16 @@ from ..compat import (
class BBCCoUkIE(InfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
_ID_REGEX = r'[pb][\da-z]{7}'
_ID_REGEX = r'[pbw][\da-z]{7}'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?bbc\.co\.uk/
(?:
programmes/(?!articles/)|
iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
music/clips[/#]|
radio/player/
music/(?:clips|audiovideo/popular)[/#]|
radio/player/|
events/[^/]+/play/[^/]+/
)
(?P<id>%s)(?!/(?:episodes|broadcasts|clips))
''' % _ID_REGEX
@@ -229,8 +230,13 @@ class BBCCoUkIE(InfoExtractor):
}, {
'url': 'http://www.bbc.co.uk/radio/player/p03cchwf',
'only_matching': True,
}
]
}, {
'url': 'https://www.bbc.co.uk/music/audiovideo/popular#p055bc55',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/programmes/w3csv1y9',
'only_matching': True,
}]
_USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
@@ -523,6 +529,12 @@ class BBCCoUkIE(InfoExtractor):
webpage = self._download_webpage(url, group_id, 'Downloading video page')
error = self._search_regex(
r'<div\b[^>]+\bclass=["\']smp__message delta["\'][^>]*>([^<]+)<',
webpage, 'error', default=None)
if error:
raise ExtractorError(error, expected=True)
programme_id = None
duration = None

View File

@@ -9,6 +9,7 @@ from ..compat import (
from ..utils import (
int_or_none,
parse_iso8601,
urljoin,
)
@@ -36,9 +37,11 @@ class BeegIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
cpl_url = self._search_regex(
r'<script[^>]+src=(["\'])(?P<url>(?:https?:)?//static\.beeg\.com/cpl/\d+\.js.*?)\1',
r'<script[^>]+src=(["\'])(?P<url>(?:/static|(?:https?:)?//static\.beeg\.com)/cpl/\d+\.js.*?)\1',
webpage, 'cpl', default=None, group='url')
cpl_url = urljoin(url, cpl_url)
beeg_version, beeg_salt = [None] * 2
if cpl_url:
@@ -54,12 +57,16 @@ class BeegIE(InfoExtractor):
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg salt',
default=None, group='beeg_salt')
beeg_version = beeg_version or '2000'
beeg_version = beeg_version or '2185'
beeg_salt = beeg_salt or 'pmweAkq8lAYKdfWcFCUj0yoVgoPlinamH5UE1CB3H'
video = self._download_json(
'https://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id),
video_id)
for api_path in ('', 'api.'):
video = self._download_json(
'https://%sbeeg.com/api/v6/%s/video/%s'
% (api_path, beeg_version, video_id), video_id,
fatal=api_path == 'api.')
if video:
break
def split(o, e):
def cut(s, x):

View File

@@ -54,6 +54,22 @@ class BiliBiliIE(InfoExtractor):
'description': '如果你是神明并且能够让妄想成为现实。那你会进行怎么样的妄想是淫靡的世界独裁社会毁灭性的制裁还是……2015年涩谷。从6年前发生的大灾害“涩谷地震”之后复兴了的这个街区里新设立的私立高中...',
},
'skip': 'Geo-restricted to China',
}, {
# Title with double quotes
'url': 'http://www.bilibili.com/video/av8903802/',
'info_dict': {
'id': '8903802',
'ext': 'mp4',
'title': '阿滴英文|英文歌分享#6 "Closer',
'description': '滴妹今天唱Closer給你聽! 有史以来,被推最多次也是最久的歌曲,其实歌词跟我原本想像差蛮多的,不过还是好听! 微博@阿滴英文',
'uploader': '阿滴英文',
'uploader_id': '65880958',
'timestamp': 1488382620,
'upload_date': '20170301',
},
'params': {
'skip_download': True, # Test metadata only
},
}]
_APP_KEY = '84956560bc028eb7'
@@ -135,7 +151,7 @@ class BiliBiliIE(InfoExtractor):
'formats': formats,
})
title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title')
title = self._html_search_regex('<h1[^>]*>([^<]+)</h1>', webpage, 'title')
description = self._html_search_meta('description', webpage)
timestamp = unified_timestamp(self._html_search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', default=None))

View File

@@ -33,13 +33,18 @@ class BpbIE(InfoExtractor):
title = self._html_search_regex(
r'<h2 class="white">(.*?)</h2>', webpage, 'title')
video_info_dicts = re.findall(
r"({\s*src:\s*'http://film\.bpb\.de/[^}]+})", webpage)
r"({\s*src\s*:\s*'https?://film\.bpb\.de/[^}]+})", webpage)
formats = []
for video_info in video_info_dicts:
video_info = self._parse_json(video_info, video_id, transform_source=js_to_json)
quality = video_info['quality']
video_url = video_info['src']
video_info = self._parse_json(
video_info, video_id, transform_source=js_to_json, fatal=False)
if not video_info:
continue
video_url = video_info.get('src')
if not video_url:
continue
quality = 'high' if '_high' in video_url else 'low'
formats.append({
'url': video_url,
'preference': 10 if quality == 'high' else 0,

View File

@@ -84,9 +84,10 @@ class BuzzFeedIE(InfoExtractor):
continue
entries.append(self.url_result(video['url']))
facebook_url = FacebookIE._extract_url(webpage)
if facebook_url:
entries.append(self.url_result(facebook_url))
facebook_urls = FacebookIE._extract_urls(webpage)
entries.extend([
self.url_result(facebook_url)
for facebook_url in facebook_urls])
return {
'_type': 'playlist',

View File

@@ -3,24 +3,104 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import float_or_none
from ..utils import (
float_or_none,
strip_or_none,
)
class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet)/assets/(?P<id>m[dz]-ast-[^/?#&]+)'
_TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'md5': '90139b746a0a9bd7bb631283f6e2a64e',
'info_dict': {
'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'ext': 'flv',
'title': 'Nachtwacht: De Greystook',
'description': 'md5:1db3f5dc4c7109c821261e7512975be7',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1468.03,
},
'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
}, {
'url': 'https://mediazone.vrt.be/api/v1/canvas/assets/mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
site_id, video_id = mobj.group('site_id'), mobj.group('id')
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), video_id)
title = data['title']
description = data.get('description')
formats = []
for target in data['targetUrls']:
format_url, format_type = target.get('url'), target.get('type')
if not format_url or not format_type:
continue
if format_type == 'HLS':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_type, fatal=False))
elif format_type == 'HDS':
formats.extend(self._extract_f4m_formats(
format_url, video_id, f4m_id=format_type, fatal=False))
elif format_type == 'MPEG_DASH':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id=format_type, fatal=False))
elif format_type == 'HSS':
formats.extend(self._extract_ism_formats(
format_url, video_id, ism_id='mss', fatal=False))
else:
formats.append({
'format_id': format_type,
'url': format_url,
})
self._sort_formats(formats)
subtitles = {}
subtitle_urls = data.get('subtitleUrls')
if isinstance(subtitle_urls, list):
for subtitle in subtitle_urls:
subtitle_url = subtitle.get('url')
if subtitle_url and subtitle.get('type') == 'CLOSED':
subtitles.setdefault('nl', []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': video_id,
'title': title,
'description': description,
'formats': formats,
'duration': float_or_none(data.get('duration'), 1000),
'thumbnail': data.get('posterImageUrl'),
'subtitles': subtitles,
}
class CanvasEenIE(InfoExtractor):
IE_DESC = 'canvas.be and een.be'
_VALID_URL = r'https?://(?:www\.)?(?P<site_id>canvas|een)\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
'md5': 'ea838375a547ac787d4064d8c7860a6c',
'md5': 'ed66976748d12350b118455979cca293',
'info_dict': {
'id': 'mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
'display_id': 'de-afspraak-veilt-voor-de-warmste-week',
'ext': 'mp4',
'ext': 'flv',
'title': 'De afspraak veilt voor de Warmste Week',
'description': 'md5:24cb860c320dc2be7358e0e5aa317ba6',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 49.02,
}
},
'expected_warnings': ['is not a supported codec'],
}, {
# with subtitles
'url': 'http://www.canvas.be/video/panorama/2016/pieter-0167',
@@ -40,7 +120,8 @@ class CanvasIE(InfoExtractor):
},
'params': {
'skip_download': True,
}
},
'skip': 'Pagina niet gevonden',
}, {
'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
'info_dict': {
@@ -54,7 +135,8 @@ class CanvasIE(InfoExtractor):
},
'params': {
'skip_download': True,
}
},
'skip': 'Episode no longer available',
}, {
'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
'only_matching': True,
@@ -66,55 +148,21 @@ class CanvasIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
title = (self._search_regex(
title = strip_or_none(self._search_regex(
r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
webpage, 'title', default=None) or self._og_search_title(
webpage)).strip()
webpage, default=None))
video_id = self._html_search_regex(
r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id', group='id')
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), display_id)
formats = []
for target in data['targetUrls']:
format_url, format_type = target.get('url'), target.get('type')
if not format_url or not format_type:
continue
if format_type == 'HLS':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, entry_protocol='m3u8_native',
ext='mp4', preference=0, fatal=False, m3u8_id=format_type))
elif format_type == 'HDS':
formats.extend(self._extract_f4m_formats(
format_url, display_id, f4m_id=format_type, fatal=False))
elif format_type == 'MPEG_DASH':
formats.extend(self._extract_mpd_formats(
format_url, display_id, mpd_id=format_type, fatal=False))
else:
formats.append({
'format_id': format_type,
'url': format_url,
})
self._sort_formats(formats)
subtitles = {}
subtitle_urls = data.get('subtitleUrls')
if isinstance(subtitle_urls, list):
for subtitle in subtitle_urls:
subtitle_url = subtitle.get('url')
if subtitle_url and subtitle.get('type') == 'CLOSED':
subtitles.setdefault('nl', []).append({'url': subtitle_url})
r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id',
group='id')
return {
'_type': 'url_transparent',
'url': 'https://mediazone.vrt.be/api/v1/%s/assets/%s' % (site_id, video_id),
'ie_key': CanvasIE.ie_key(),
'id': video_id,
'display_id': display_id,
'title': title,
'description': self._og_search_description(webpage),
'formats': formats,
'duration': float_or_none(data.get('duration'), 1000),
'thumbnail': data.get('posterImageUrl'),
'subtitles': subtitles,
}

View File

@@ -200,6 +200,7 @@ class CBCWatchBaseIE(InfoExtractor):
'media': 'http://search.yahoo.com/mrss/',
'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
}
_GEO_COUNTRIES = ['CA']
def _call_api(self, path, video_id):
url = path if path.startswith('http') else self._API_BASE_URL + path
@@ -287,6 +288,11 @@ class CBCWatchBaseIE(InfoExtractor):
class CBCWatchVideoIE(CBCWatchBaseIE):
IE_NAME = 'cbc.ca:watch:video'
_VALID_URL = r'https?://api-cbc\.cloud\.clearleap\.com/cloffice/client/web/play/?\?.*?\bcontentId=(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TEST = {
# geo-restricted to Canada, bypassable
'url': 'https://api-cbc.cloud.clearleap.com/cloffice/client/web/play/?contentId=3c84472a-1eea-4dee-9267-2655d5055dcf&categoryId=ebc258f5-ee40-4cca-b66b-ba6bd55b7235',
'only_matching': True,
}
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -323,9 +329,10 @@ class CBCWatchIE(CBCWatchBaseIE):
IE_NAME = 'cbc.ca:watch'
_VALID_URL = r'https?://watch\.cbc\.ca/(?:[^/]+/)+(?P<id>[0-9a-f-]+)'
_TESTS = [{
# geo-restricted to Canada, bypassable
'url': 'http://watch.cbc.ca/doc-zone/season-6/customer-disservice/38e815a-009e3ab12e4',
'info_dict': {
'id': '38e815a-009e3ab12e4',
'id': '9673749a-5e77-484c-8b62-a1092a6b5168',
'ext': 'mp4',
'title': 'Customer (Dis)Service',
'description': 'md5:8bdd6913a0fe03d4b2a17ebe169c7c87',
@@ -337,8 +344,8 @@ class CBCWatchIE(CBCWatchBaseIE):
'skip_download': True,
'format': 'bestvideo',
},
'skip': 'Geo-restricted to Canada',
}, {
# geo-restricted to Canada, bypassable
'url': 'http://watch.cbc.ca/arthur/all/1ed4b385-cd84-49cf-95f0-80f004680057',
'info_dict': {
'id': '1ed4b385-cd84-49cf-95f0-80f004680057',
@@ -346,7 +353,6 @@ class CBCWatchIE(CBCWatchBaseIE):
'description': 'Arthur, the sweetest 8-year-old aardvark, and his pals solve all kinds of problems with humour, kindness and teamwork.',
},
'playlist_mincount': 30,
'skip': 'Geo-restricted to Canada',
}]
def _real_extract(self, url):

View File

@@ -15,19 +15,23 @@ class CBSNewsIE(CBSIE):
_TESTS = [
{
'url': 'http://www.cbsnews.com/news/tesla-and-spacex-elon-musks-industrial-empire/',
# 60 minutes
'url': 'http://www.cbsnews.com/news/artificial-intelligence-positioned-to-be-a-game-changer/',
'info_dict': {
'id': 'tesla-and-spacex-elon-musks-industrial-empire',
'ext': 'flv',
'title': 'Tesla and SpaceX: Elon Musk\'s industrial empire',
'thumbnail': 'http://beta.img.cbsnews.com/i/2014/03/30/60147937-2f53-4565-ad64-1bdd6eb64679/60-0330-pelley-640x360.jpg',
'duration': 791,
'id': '_B6Ga3VJrI4iQNKsir_cdFo9Re_YJHE_',
'ext': 'mp4',
'title': 'Artificial Intelligence',
'description': 'md5:8818145f9974431e0fb58a1b8d69613c',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1606,
'uploader': 'CBSI-NEW',
'timestamp': 1498431900,
'upload_date': '20170625',
},
'params': {
# rtmp download
# m3u8 download
'skip_download': True,
},
'skip': 'Subscribers only',
},
{
'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
@@ -52,6 +56,22 @@ class CBSNewsIE(CBSIE):
'skip_download': True,
},
},
{
# 48 hours
'url': 'http://www.cbsnews.com/news/maria-ridulph-murder-will-the-nations-oldest-cold-case-to-go-to-trial-ever-get-solved/',
'info_dict': {
'id': 'QpM5BJjBVEAUFi7ydR9LusS69DPLqPJ1',
'ext': 'mp4',
'title': 'Cold as Ice',
'description': 'Can a childhood memory of a friend\'s murder solve a 1957 cold case? "48 Hours" correspondent Erin Moriarty has the latest.',
'upload_date': '20170604',
'timestamp': 1496538000,
'uploader': 'CBSI-NEW',
},
'params': {
'skip_download': True,
},
},
]
def _real_extract(self, url):
@@ -60,7 +80,7 @@ class CBSNewsIE(CBSIE):
webpage = self._download_webpage(url, video_id)
video_info = self._parse_json(self._html_search_regex(
r'(?:<ul class="media-list items" id="media-related-items"><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
r'(?:<ul class="media-list items" id="media-related-items"[^>]*><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
webpage, 'video JSON info', default='{}'), video_id, fatal=False)
if video_info:

View File

@@ -124,7 +124,7 @@ class CDAIE(InfoExtractor):
}
def extract_format(page, version):
json_str = self._search_regex(
json_str = self._html_search_regex(
r'player_data=(\\?["\'])(?P<player_data>.+?)\1', page,
'%s player_json' % version, fatal=False, group='player_data')
if not json_str:

View File

@@ -5,7 +5,7 @@ from ..utils import remove_end
class CharlieRoseIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?charlierose\.com/video(?:s|/player)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?charlierose\.com/(?:video|episode)(?:s|/player)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://charlierose.com/videos/27996',
'md5': 'fda41d49e67d4ce7c2411fd2c4702e09',
@@ -24,6 +24,9 @@ class CharlieRoseIE(InfoExtractor):
}, {
'url': 'https://charlierose.com/videos/27996',
'only_matching': True,
}, {
'url': 'https://charlierose.com/episodes/30887?autoplay=true',
'only_matching': True,
}]
_PLAYER_BASE = 'https://charlierose.com/video/player/%s'

View File

@@ -5,6 +5,7 @@ import base64
import json
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import (
clean_html,
ExtractorError
@@ -70,11 +71,9 @@ class ChilloutzoneIE(InfoExtractor):
# If nativePlatform is None a fallback mechanism is used (i.e. youtube embed)
if native_platform is None:
youtube_url = self._html_search_regex(
r'<iframe.* src="((?:https?:)?//(?:[^.]+\.)?youtube\.com/.+?)"',
webpage, 'fallback video URL', default=None)
if youtube_url is not None:
return self.url_result(youtube_url, ie='Youtube')
youtube_url = YoutubeIE._extract_url(webpage)
if youtube_url:
return self.url_result(youtube_url, ie=YoutubeIE.ie_key())
# Non Fallback: Decide to use native source (e.g. youtube or vimeo) or
# the own CDN

View File

@@ -9,12 +9,20 @@ from ..utils import (
class CinchcastIE(InfoExtractor):
_VALID_URL = r'https?://player\.cinchcast\.com/.*?assetId=(?P<id>[0-9]+)'
_TEST = {
_VALID_URL = r'https?://player\.cinchcast\.com/.*?(?:assetId|show_id)=(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://player.cinchcast.com/?show_id=5258197&platformId=1&assetType=single',
'info_dict': {
'id': '5258197',
'ext': 'mp3',
'title': 'Train Your Brain to Up Your Game with Coach Mandy',
'upload_date': '20130816',
},
}, {
# Actual test is run in generic, look for undergroundwellness
'url': 'http://player.cinchcast.com/?platformId=1&#038;assetType=single&#038;assetId=7141703',
'only_matching': True,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -0,0 +1,72 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
unescapeHTML,
)
class CJSWIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cjsw\.com/program/(?P<program>[^/]+)/episode/(?P<id>\d+)'
_TESTS = [{
'url': 'http://cjsw.com/program/freshly-squeezed/episode/20170620',
'md5': 'cee14d40f1e9433632c56e3d14977120',
'info_dict': {
'id': '91d9f016-a2e7-46c5-8dcb-7cbcd7437c41',
'ext': 'mp3',
'title': 'Freshly Squeezed Episode June 20, 2017',
'description': 'md5:c967d63366c3898a80d0c7b0ff337202',
'series': 'Freshly Squeezed',
'episode_id': '20170620',
},
}, {
# no description
'url': 'http://cjsw.com/program/road-pops/episode/20170707/',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
program, episode_id = mobj.group('program', 'id')
audio_id = '%s/%s' % (program, episode_id)
webpage = self._download_webpage(url, episode_id)
title = unescapeHTML(self._search_regex(
(r'<h1[^>]+class=["\']episode-header__title["\'][^>]*>(?P<title>[^<]+)',
r'data-audio-title=(["\'])(?P<title>(?:(?!\1).)+)\1'),
webpage, 'title', group='title'))
audio_url = self._search_regex(
r'<button[^>]+data-audio-src=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'audio url', group='url')
audio_id = self._search_regex(
r'/([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})\.mp3',
audio_url, 'audio id', default=audio_id)
formats = [{
'url': audio_url,
'ext': determine_ext(audio_url, 'mp3'),
'vcodec': 'none',
}]
description = self._html_search_regex(
r'<p>(?P<description>.+?)</p>', webpage, 'description',
default=None)
series = self._search_regex(
r'data-showname=(["\'])(?P<name>(?:(?!\1).)+)\1', webpage,
'series', default=program, group='name')
return {
'id': audio_id,
'title': title,
'description': description,
'formats': formats,
'series': series,
'episode_id': episode_id,
}

View File

@@ -1,67 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_strdate,
)
class ClipfishIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clipfish\.de/(?:[^/]+/)+video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.clipfish.de/special/ugly-americans/video/4343170/s01-e01-ugly-americans-date-in-der-hoelle/',
'md5': 'b9a5dc46294154c1193e2d10e0c95693',
'info_dict': {
'id': '4343170',
'ext': 'mp4',
'title': 'S01 E01 - Ugly Americans - Date in der Hölle',
'description': 'Mark Lilly arbeitet im Sozialdienst der Stadt New York und soll Immigranten bei ihrer Einbürgerung in die USA zur Seite stehen.',
'upload_date': '20161005',
'duration': 1291,
'view_count': int,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_info = self._download_json(
'http://www.clipfish.de/devapi/id/%s?format=json&apikey=hbbtv' % video_id,
video_id)['items'][0]
formats = []
m3u8_url = video_info.get('media_videourl_hls')
if m3u8_url:
formats.append({
'url': m3u8_url.replace('de.hls.fra.clipfish.de', 'hls.fra.clipfish.de'),
'ext': 'mp4',
'format_id': 'hls',
})
mp4_url = video_info.get('media_videourl')
if mp4_url:
formats.append({
'url': mp4_url,
'format_id': 'mp4',
'width': int_or_none(video_info.get('width')),
'height': int_or_none(video_info.get('height')),
'tbr': int_or_none(video_info.get('bitrate')),
})
descr = video_info.get('descr')
if descr:
descr = descr.strip()
return {
'id': video_id,
'title': video_info['title'],
'description': descr,
'formats': formats,
'thumbnail': video_info.get('media_content_thumbnail_large') or video_info.get('media_thumbnail'),
'duration': int_or_none(video_info.get('media_length')),
'upload_date': unified_strdate(video_info.get('pubDate')),
'view_count': int_or_none(video_info.get('media_views'))
}

View File

@@ -0,0 +1,74 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
parse_iso8601,
qualities,
)
import re
class ClippitIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clippituser\.tv/c/(?P<id>[a-z]+)'
_TEST = {
'url': 'https://www.clippituser.tv/c/evmgm',
'md5': '963ae7a59a2ec4572ab8bf2f2d2c5f09',
'info_dict': {
'id': 'evmgm',
'ext': 'mp4',
'title': 'Bye bye Brutus. #BattleBots - Clippit',
'uploader': 'lizllove',
'uploader_url': 'https://www.clippituser.tv/p/lizllove',
'timestamp': 1472183818,
'upload_date': '20160826',
'description': 'BattleBots | ABC',
'thumbnail': r're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<title.*>(.+?)</title>', webpage, 'title')
FORMATS = ('sd', 'hd')
quality = qualities(FORMATS)
formats = []
for format_id in FORMATS:
url = self._html_search_regex(r'data-%s-file="(.+?)"' % format_id,
webpage, 'url', fatal=False)
if not url:
continue
match = re.search(r'/(?P<height>\d+)\.mp4', url)
formats.append({
'url': url,
'format_id': format_id,
'quality': quality(format_id),
'height': int(match.group('height')) if match else None,
})
uploader = self._html_search_regex(r'class="username".*>\s+(.+?)\n',
webpage, 'uploader', fatal=False)
uploader_url = ('https://www.clippituser.tv/p/' + uploader
if uploader else None)
timestamp = self._html_search_regex(r'datetime="(.+?)"',
webpage, 'date', fatal=False)
thumbnail = self._html_search_regex(r'data-image="(.+?)"',
webpage, 'thumbnail', fatal=False)
return {
'id': video_id,
'title': title,
'formats': formats,
'uploader': uploader,
'uploader_url': uploader_url,
'timestamp': parse_iso8601(timestamp),
'description': self._og_search_description(webpage),
'thumbnail': thumbnail,
}

View File

@@ -30,7 +30,11 @@ class CloudyIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://www.cloudy.ec/embed.php?id=%s' % video_id, video_id)
'https://www.cloudy.ec/embed.php', video_id, query={
'id': video_id,
'playerPage': 1,
'autoplay': 1,
})
info = self._parse_html5_media_entries(url, webpage, video_id)[0]

View File

@@ -120,13 +120,16 @@ class ComedyCentralTVIE(MTVServicesInfoExtractor):
class ComedyCentralShortnameIE(InfoExtractor):
_VALID_URL = r'^:(?P<id>tds|thedailyshow)$'
_VALID_URL = r'^:(?P<id>tds|thedailyshow|theopposition)$'
_TESTS = [{
'url': ':tds',
'only_matching': True,
}, {
'url': ':thedailyshow',
'only_matching': True,
}, {
'url': ':theopposition',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -134,5 +137,6 @@ class ComedyCentralShortnameIE(InfoExtractor):
shortcut_map = {
'tds': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'thedailyshow': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'theopposition': 'http://www.cc.com/shows/the-opposition-with-jordan-klepper/full-episodes',
}
return self.url_result(shortcut_map[video_id])

View File

@@ -27,6 +27,7 @@ from ..compat import (
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
compat_xml_parse_error,
)
from ..downloader.f4m import remove_encrypted_media
from ..utils import (
@@ -376,7 +377,7 @@ class InfoExtractor(object):
cls._VALID_URL_RE = re.compile(cls._VALID_URL)
m = cls._VALID_URL_RE.match(url)
assert m
return m.group('id')
return compat_str(m.group('id'))
@classmethod
def working(cls):
@@ -420,7 +421,7 @@ class InfoExtractor(object):
if country_code:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country_code)
if self._downloader.params.get('verbose', False):
self._downloader.to_stdout(
self._downloader.to_screen(
'[debug] Using fake IP %s (%s) as X-Forwarded-For.'
% (self._x_forwarded_for_ip, country_code.upper()))
@@ -646,15 +647,29 @@ class InfoExtractor(object):
def _download_xml(self, url_or_request, video_id,
note='Downloading XML', errnote='Unable to download XML',
transform_source=None, fatal=True, encoding=None, data=None, headers={}, query={}):
transform_source=None, fatal=True, encoding=None,
data=None, headers={}, query={}):
"""Return the xml as an xml.etree.ElementTree.Element"""
xml_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal, encoding=encoding, data=data, headers=headers, query=query)
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query)
if xml_string is False:
return xml_string
return self._parse_xml(
xml_string, video_id, transform_source=transform_source,
fatal=fatal)
def _parse_xml(self, xml_string, video_id, transform_source=None, fatal=True):
if transform_source:
xml_string = transform_source(xml_string)
return compat_etree_fromstring(xml_string.encode('utf-8'))
try:
return compat_etree_fromstring(xml_string.encode('utf-8'))
except compat_xml_parse_error as ve:
errmsg = '%s: Failed to parse XML ' % video_id
if fatal:
raise ExtractorError(errmsg, cause=ve)
else:
self.report_warning(errmsg + str(ve))
def _download_json(self, url_or_request, video_id,
note='Downloading JSON metadata',
@@ -730,12 +745,12 @@ class InfoExtractor(object):
video_info['title'] = video_title
return video_info
def playlist_from_matches(self, matches, video_id, video_title, getter=None, ie=None):
urlrs = orderedSet(
def playlist_from_matches(self, matches, playlist_id=None, playlist_title=None, getter=None, ie=None):
urls = orderedSet(
self.url_result(self._proto_relative_url(getter(m) if getter else m), ie)
for m in matches)
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
urls, playlist_id=playlist_id, playlist_title=playlist_title)
@staticmethod
def playlist_result(entries, playlist_id=None, playlist_title=None, playlist_description=None):
@@ -940,7 +955,8 @@ class InfoExtractor(object):
def _family_friendly_search(self, html):
# See http://schema.org/VideoObject
family_friendly = self._html_search_meta('isFamilyFriendly', html)
family_friendly = self._html_search_meta(
'isFamilyFriendly', html, default=None)
if not family_friendly:
return None
@@ -1002,17 +1018,17 @@ class InfoExtractor(object):
item_type = e.get('@type')
if expected_type is not None and expected_type != item_type:
return info
if item_type == 'TVEpisode':
if item_type in ('TVEpisode', 'Episode'):
info.update({
'episode': unescapeHTML(e.get('name')),
'episode_number': int_or_none(e.get('episodeNumber')),
'description': unescapeHTML(e.get('description')),
})
part_of_season = e.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'):
info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Article':
info.update({
@@ -1022,10 +1038,10 @@ class InfoExtractor(object):
})
elif item_type == 'VideoObject':
extract_video_object(e)
elif item_type == 'WebPage':
video = e.get('video')
if isinstance(video, dict) and video.get('@type') == 'VideoObject':
extract_video_object(video)
continue
video = e.get('video')
if isinstance(video, dict) and video.get('@type') == 'VideoObject':
extract_video_object(video)
break
return dict((k, v) for k, v in info.items() if v is not None)
@@ -1785,7 +1801,7 @@ class InfoExtractor(object):
ms_info['timescale'] = int(timescale)
segment_duration = source.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
ms_info['segment_duration'] = float(segment_duration)
def extract_Initialization(source):
initialization = source.find(_add_ns('Initialization'))
@@ -1892,19 +1908,23 @@ class InfoExtractor(object):
'Bandwidth': bandwidth,
}
def location_key(location):
return 'url' if re.match(r'^https?://', location) else 'path'
if 'segment_urls' not in representation_ms_info and 'media' in representation_ms_info:
media_template = prepare_template('media', ('Number', 'Bandwidth', 'Time'))
media_location_key = location_key(media_template)
# As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
# can't be used at the same time
if '%(Number' in media_template and 's' not in representation_ms_info:
segment_duration = None
if 'total_number' not in representation_ms_info and 'segment_duration':
if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
representation_ms_info['fragments'] = [{
'url': media_template % {
media_location_key: media_template % {
'Number': segment_number,
'Bandwidth': bandwidth,
},
@@ -1928,7 +1948,7 @@ class InfoExtractor(object):
'Number': segment_number,
}
representation_ms_info['fragments'].append({
'url': segment_url,
media_location_key: segment_url,
'duration': float_or_none(segment_d, representation_ms_info['timescale']),
})
@@ -1952,8 +1972,9 @@ class InfoExtractor(object):
for s in representation_ms_info['s']:
duration = float_or_none(s['d'], timescale)
for r in range(s.get('r', 0) + 1):
segment_uri = representation_ms_info['segment_urls'][segment_index]
fragments.append({
'url': representation_ms_info['segment_urls'][segment_index],
location_key(segment_uri): segment_uri,
'duration': duration,
})
segment_index += 1
@@ -1962,6 +1983,7 @@ class InfoExtractor(object):
# No fragments key is present in this case.
if 'fragments' in representation_ms_info:
f.update({
'fragment_base_url': base_url,
'fragments': [],
'protocol': 'http_dash_segments',
})
@@ -1969,10 +1991,8 @@ class InfoExtractor(object):
initialization_url = representation_ms_info['initialization_url']
if not f.get('url'):
f['url'] = initialization_url
f['fragments'].append({'url': initialization_url})
f['fragments'].append({location_key(initialization_url): initialization_url})
f['fragments'].extend(representation_ms_info['fragments'])
for fragment in f['fragments']:
fragment['url'] = urljoin(base_url, fragment['url'])
try:
existing_format = next(
fo for fo in formats
@@ -2110,19 +2130,19 @@ class InfoExtractor(object):
return f
return {}
def _media_formats(src, cur_media_type):
def _media_formats(src, cur_media_type, type_info={}):
full_url = absolute_url(src)
ext = determine_ext(full_url)
ext = type_info.get('ext') or determine_ext(full_url)
if ext == 'm3u8':
is_plain_url = False
formats = self._extract_m3u8_formats(
full_url, video_id, ext='mp4',
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id,
preference=preference)
preference=preference, fatal=False)
elif ext == 'mpd':
is_plain_url = False
formats = self._extract_mpd_formats(
full_url, video_id, mpd_id=mpd_id)
full_url, video_id, mpd_id=mpd_id, fatal=False)
else:
is_plain_url = True
formats = [{
@@ -2132,15 +2152,18 @@ class InfoExtractor(object):
return is_plain_url, formats
entries = []
# amp-video and amp-audio are very similar to their HTML5 counterparts
# so we wll include them right here (see
# https://www.ampproject.org/docs/reference/components/amp-video)
media_tags = [(media_tag, media_type, '')
for media_tag, media_type
in re.findall(r'(?s)(<(video|audio)[^>]*/>)', webpage)]
in re.findall(r'(?s)(<(?:amp-)?(video|audio)[^>]*/>)', webpage)]
media_tags.extend(re.findall(
# We only allow video|audio followed by a whitespace or '>'.
# Allowing more characters may end up in significant slow down (see
# https://github.com/rg3/youtube-dl/issues/11979, example URL:
# http://www.porntrex.com/maps/videositemap.xml).
r'(?s)(<(?P<tag>video|audio)(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage))
r'(?s)(<(?P<tag>(?:amp-)?(?:video|audio))(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage))
for media_tag, media_type, media_content in media_tags:
media_info = {
'formats': [],
@@ -2158,9 +2181,15 @@ class InfoExtractor(object):
src = source_attributes.get('src')
if not src:
continue
is_plain_url, formats = _media_formats(src, media_type)
f = parse_content_type(source_attributes.get('type'))
is_plain_url, formats = _media_formats(src, media_type, f)
if is_plain_url:
f = parse_content_type(source_attributes.get('type'))
# res attribute is not standard but seen several times
# in the wild
f.update({
'height': int_or_none(source_attributes.get('res')),
'format_id': source_attributes.get('label'),
})
f.update(formats[0])
media_info['formats'].append(f)
else:
@@ -2299,6 +2328,8 @@ class InfoExtractor(object):
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if not isinstance(track, dict):
continue
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
@@ -2328,6 +2359,8 @@ class InfoExtractor(object):
urls = []
formats = []
for source in jwplayer_sources_data:
if not isinstance(source, dict):
continue
source_url = self._proto_relative_url(source.get('file'))
if not source_url:
continue
@@ -2416,10 +2449,12 @@ class InfoExtractor(object):
self._downloader.report_warning(msg)
return res
def _set_cookie(self, domain, name, value, expire_time=None):
def _set_cookie(self, domain, name, value, expire_time=None, port=None,
path='/', secure=False, discard=False, rest={}, **kwargs):
cookie = compat_cookiejar.Cookie(
0, name, value, None, None, domain, None,
None, '/', True, False, expire_time, '', None, None, None)
0, name, value, port, port is not None, domain, True,
domain.startswith('.'), path, True, secure, expire_time,
discard, None, None, rest)
self._downloader.cookiejar.set_cookie(cookie)
def _get_cookies(self, url):

View File

@@ -116,16 +116,16 @@ class CondeNastIE(InfoExtractor):
entries = [self.url_result(build_url(path), 'CondeNast') for path in paths]
return self.playlist_result(entries, playlist_title=title)
def _extract_video_params(self, webpage):
query = {}
params = self._search_regex(
r'(?s)var params = {(.+?)}[;,]', webpage, 'player params', default=None)
if params:
query.update({
'videoId': self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id'),
'playerId': self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id'),
'target': self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target'),
})
def _extract_video_params(self, webpage, display_id):
query = self._parse_json(
self._search_regex(
r'(?s)var\s+params\s*=\s*({.+?})[;,]', webpage, 'player params',
default='{}'),
display_id, transform_source=js_to_json, fatal=False)
if query:
query['videoId'] = self._search_regex(
r'(?:data-video-id=|currentVideoId\s*=\s*)["\']([\da-f]+)',
webpage, 'video id', default=None)
else:
params = extract_attributes(self._search_regex(
r'(<[^>]+data-js="video-player"[^>]+>)',
@@ -141,17 +141,27 @@ class CondeNastIE(InfoExtractor):
video_id = params['videoId']
video_info = None
if params.get('playerId'):
info_page = self._download_json(
'http://player.cnevids.com/player/video.js',
video_id, 'Downloading video info', fatal=False, query=params)
if info_page:
video_info = info_page.get('video')
if not video_info:
info_page = self._download_webpage(
'http://player.cnevids.com/player/loader.js',
video_id, 'Downloading loader info', query=params)
else:
# New API path
query = params.copy()
query['embedType'] = 'inline'
info_page = self._download_json(
'http://player.cnevids.com/embed-api.json', video_id,
'Downloading embed info', fatal=False, query=query)
# Old fallbacks
if not info_page:
if params.get('playerId'):
info_page = self._download_json(
'http://player.cnevids.com/player/video.js', video_id,
'Downloading video info', fatal=False, query=params)
if info_page:
video_info = info_page.get('video')
if not video_info:
info_page = self._download_webpage(
'http://player.cnevids.com/player/loader.js',
video_id, 'Downloading loader info', query=params)
if not video_info:
info_page = self._download_webpage(
'https://player.cnevids.com/inline/video/%s.js' % video_id,
video_id, 'Downloading inline info', query={
@@ -215,7 +225,7 @@ class CondeNastIE(InfoExtractor):
if url_type == 'series':
return self._extract_series(url, webpage)
else:
params = self._extract_video_params(webpage)
params = self._extract_video_params(webpage, display_id)
info = self._search_json_ld(
webpage, display_id, fatal=False)
info.update(self._extract_video(params))

View File

@@ -8,7 +8,16 @@ from ..utils import int_or_none
class CorusIE(ThePlatformFeedIE):
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:globaltv|etcanada)\.com|(?:hgtv|foodnetwork|slice)\.ca)/(?:video/|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))(?P<id>\d+)'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?P<domain>
(?:globaltv|etcanada)\.com|
(?:hgtv|foodnetwork|slice|history|showcase)\.ca
)
/(?:video/|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/',
'md5': '05dcbca777bf1e58c2acbb57168ad3a6',
@@ -27,6 +36,12 @@ class CorusIE(ThePlatformFeedIE):
}, {
'url': 'http://etcanada.com/video/873675331955/meet-the-survivor-game-changers-castaways-part-2/',
'only_matching': True,
}, {
'url': 'http://www.history.ca/the-world-without-canada/video/full-episodes/natural-resources/video.html?v=955054659646#video',
'only_matching': True,
}, {
'url': 'http://www.showcase.ca/eyewitness/video/eyewitness++106/video.html?v=955070531919&p=1&s=da#video',
'only_matching': True,
}]
_TP_FEEDS = {
@@ -50,6 +65,14 @@ class CorusIE(ThePlatformFeedIE):
'feed_id': '5tUJLgV2YNJ5',
'account_id': 2414427935,
},
'history': {
'feed_id': 'tQFx_TyyEq4J',
'account_id': 2369613659,
},
'showcase': {
'feed_id': '9H6qyshBZU3E',
'account_id': 2414426607,
},
}
def _real_extract(self, url):

View File

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import (
parse_iso8601,
str_to_int,
@@ -41,11 +42,9 @@ class CrackedIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
youtube_url = self._search_regex(
r'<iframe[^>]+src="((?:https?:)?//www\.youtube\.com/embed/[^"]+)"',
webpage, 'youtube url', default=None)
youtube_url = YoutubeIE._extract_url(webpage)
if youtube_url:
return self.url_result(youtube_url, 'Youtube')
return self.url_result(youtube_url, ie=YoutubeIE.ie_key())
video_url = self._html_search_regex(
[r'var\s+CK_vidSrc\s*=\s*"([^"]+)"', r'<video\s+src="([^"]+)"'],

View File

@@ -510,7 +510,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
# webpage provide more accurate data than series_title from XML
series = self._html_search_regex(
r'id=["\']showmedia_about_episode_num[^>]+>\s*<a[^>]+>([^<]+)',
r'(?s)<h\d[^>]+\bid=["\']showmedia_about_episode_num[^>]+>(.+?)</h\d',
webpage, 'series', fatal=False)
season = xpath_text(metadata, 'series_title')
@@ -518,7 +518,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
season_number = int_or_none(self._search_regex(
r'(?s)<h4[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h4>\s*<h4>\s*Season (\d+)',
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
webpage, 'season number', default=None))
return {

View File

@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
@@ -12,8 +14,8 @@ from ..utils import (
class DailyMailIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/video/[^/]+/video-(?P<id>[0-9]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/(?:video/[^/]+/video-|embed/video/)(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.dailymail.co.uk/video/tvshowbiz/video-1295863/The-Mountain-appears-sparkling-water-ad-Heavy-Bubbles.html',
'md5': 'f6129624562251f628296c3a9ffde124',
'info_dict': {
@@ -22,7 +24,16 @@ class DailyMailIE(InfoExtractor):
'title': 'The Mountain appears in sparkling water ad for \'Heavy Bubbles\'',
'description': 'md5:a93d74b6da172dd5dc4d973e0b766a84',
}
}
}, {
'url': 'http://www.dailymail.co.uk/embed/video/1295863.html',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//(?:www\.)?dailymail\.co\.uk/embed/video/\d+\.html)',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -147,7 +147,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
view_count_str = self._search_regex(
(r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
webpage, 'view count', fatal=False)
webpage, 'view count', default=None)
if view_count_str:
view_count_str = re.sub(r'\s', '', view_count_str)
view_count = str_to_int(view_count_str)
@@ -159,7 +159,9 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
[r'buildPlayer\(({.+?})\);\n', # See https://github.com/rg3/youtube-dl/issues/7826
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
r'buildPlayer\(({.+?})\);',
r'var\s+config\s*=\s*({.+?});'],
r'var\s+config\s*=\s*({.+?});',
# New layout regex (see https://github.com/rg3/youtube-dl/issues/13580)
r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id)
@@ -323,7 +325,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = 'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>[^/?#&]+)'
_MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
_PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s'
_TESTS = [{

View File

@@ -15,7 +15,7 @@ from ..utils import (
class DisneyIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|(?:starwars|marvelkids)\.com))/(?:(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})|(?:[^/]+/)?(?P<display_id>[^/?#]+))'''
https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr|channel\.de)|(?:starwars|marvelkids)\.com))/(?:(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})|(?:[^/]+/)?(?P<display_id>[^/?#]+))'''
_TESTS = [{
# Disney.EmbedVideo
'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977',
@@ -68,6 +68,9 @@ class DisneyIE(InfoExtractor):
}, {
'url': 'http://disneyjunior.en.disneyme.com/dj/watch-my-friends-tigger-and-pooh-promo',
'only_matching': True,
}, {
'url': 'http://disneychannel.de/sehen/soy-luna-folge-118-5518518987ba27f3cc729268',
'only_matching': True,
}, {
'url': 'http://disneyjunior.disney.com/galactech-the-galactech-grab-galactech-an-admiral-rescue',
'only_matching': True,

View File

@@ -13,7 +13,7 @@ from ..utils import (
class DigitallySpeakingIE(InfoExtractor):
_VALID_URL = r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
_VALID_URL = r'https?://(?:s?evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
_TESTS = [{
# From http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface
@@ -28,6 +28,10 @@ class DigitallySpeakingIE(InfoExtractor):
# From http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC
'url': 'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml',
'only_matching': True,
}, {
# From http://www.gdcvault.com/play/1013700/Advanced-Material
'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml',
'only_matching': True,
}]
def _parse_mp4(self, metadata):

View File

@@ -7,16 +7,18 @@ import time
from .common import InfoExtractor
from ..compat import (
compat_urlparse,
compat_HTTPError,
compat_str,
compat_urlparse,
)
from ..utils import (
USER_AGENTS,
ExtractorError,
int_or_none,
unified_strdate,
remove_end,
try_get,
unified_strdate,
update_url_query,
USER_AGENTS,
)
@@ -183,28 +185,44 @@ class DPlayItIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
info_url = self._search_regex(
r'url\s*:\s*["\']((?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)',
webpage, 'video id')
title = remove_end(self._og_search_title(webpage), ' | Dplay')
try:
info = self._download_json(
info_url, display_id, headers={
'Authorization': 'Bearer %s' % self._get_cookies(url).get(
'dplayit_token').value,
'Referer': url,
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403):
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
error = info['errors'][0]
if error.get('code') == 'access.denied.geoblocked':
self.raise_geo_restricted(
msg=error.get('detail'), countries=self._GEO_COUNTRIES)
raise ExtractorError(info['errors'][0]['detail'], expected=True)
raise
video_id = None
info = self._search_regex(
r'playback_json\s*:\s*JSON\.parse\s*\(\s*("(?:\\.|[^"\\])+?")',
webpage, 'playback JSON', default=None)
if info:
for _ in range(2):
info = self._parse_json(info, display_id, fatal=False)
if not info:
break
else:
video_id = try_get(info, lambda x: x['data']['id'])
if not info:
info_url = self._search_regex(
r'url\s*[:=]\s*["\']((?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)',
webpage, 'info url')
video_id = info_url.rpartition('/')[-1]
try:
info = self._download_json(
info_url, display_id, headers={
'Authorization': 'Bearer %s' % self._get_cookies(url).get(
'dplayit_token').value,
'Referer': url,
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403):
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
error = info['errors'][0]
if error.get('code') == 'access.denied.geoblocked':
self.raise_geo_restricted(
msg=error.get('detail'), countries=self._GEO_COUNTRIES)
raise ExtractorError(info['errors'][0]['detail'], expected=True)
raise
hls_url = info['data']['attributes']['streaming']['hls']['url']
@@ -230,7 +248,7 @@ class DPlayItIE(InfoExtractor):
season_number = episode_number = upload_date = None
return {
'id': info_url.rpartition('/')[-1],
'id': compat_str(video_id or display_id),
'display_id': display_id,
'title': title,
'description': self._og_search_description(webpage),

View File

@@ -12,6 +12,7 @@ from ..utils import (
ExtractorError,
clean_html,
int_or_none,
remove_end,
sanitized_Request,
urlencode_postdata
)
@@ -72,15 +73,15 @@ class DramaFeverIE(DramaFeverBaseIE):
'url': 'http://www.dramafever.com/drama/4512/1/Cooking_with_Shin/',
'info_dict': {
'id': '4512.1',
'ext': 'mp4',
'title': 'Cooking with Shin 4512.1',
'ext': 'flv',
'title': 'Cooking with Shin',
'description': 'md5:a8eec7942e1664a6896fcd5e1287bfd0',
'episode': 'Episode 1',
'episode_number': 1,
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1404336058,
'upload_date': '20140702',
'duration': 343,
'duration': 344,
},
'params': {
# m3u8 download
@@ -90,15 +91,15 @@ class DramaFeverIE(DramaFeverBaseIE):
'url': 'http://www.dramafever.com/drama/4826/4/Mnet_Asian_Music_Awards_2015/?ap=1',
'info_dict': {
'id': '4826.4',
'ext': 'mp4',
'title': 'Mnet Asian Music Awards 2015 4826.4',
'ext': 'flv',
'title': 'Mnet Asian Music Awards 2015',
'description': 'md5:3ff2ee8fedaef86e076791c909cf2e91',
'episode': 'Mnet Asian Music Awards 2015 - Part 3',
'episode_number': 4,
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1450213200,
'upload_date': '20151215',
'duration': 5602,
'duration': 5359,
},
'params': {
# m3u8 download
@@ -122,6 +123,10 @@ class DramaFeverIE(DramaFeverBaseIE):
countries=self._GEO_COUNTRIES)
raise
# title is postfixed with video id for some reason, removing
if info.get('title'):
info['title'] = remove_end(info['title'], video_id).strip()
series_id, episode_number = video_id.split('.')
episode_info = self._download_json(
# We only need a single episode info, so restricting page size to one episode

View File

@@ -1,135 +1,59 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
js_to_json,
parse_duration,
unescapeHTML,
)
class DRBonanzaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dr\.dk/bonanza/(?:[^/]+/)+(?:[^/])+?(?:assetId=(?P<id>\d+))?(?:[#&]|$)'
_TESTS = [{
'url': 'http://www.dr.dk/bonanza/serie/portraetter/Talkshowet.htm?assetId=65517',
_VALID_URL = r'https?://(?:www\.)?dr\.dk/bonanza/[^/]+/\d+/[^/]+/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
_TEST = {
'url': 'http://www.dr.dk/bonanza/serie/154/matador/40312/matador---0824-komme-fremmede-',
'info_dict': {
'id': '65517',
'id': '40312',
'display_id': 'matador---0824-komme-fremmede-',
'ext': 'mp4',
'title': 'Talkshowet - Leonard Cohen',
'description': 'md5:8f34194fb30cd8c8a30ad8b27b70c0ca',
'title': 'MATADOR - 08:24. "Komme fremmede".',
'description': 'md5:77b4c1ac4d4c1b9d610ab4395212ff84',
'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
'timestamp': 1295537932,
'upload_date': '20110120',
'duration': 3664,
'duration': 4613,
},
'params': {
'skip_download': True, # requires rtmp
},
}, {
'url': 'http://www.dr.dk/bonanza/radio/serie/sport/fodbold.htm?assetId=59410',
'md5': '6dfe039417e76795fb783c52da3de11d',
'info_dict': {
'id': '59410',
'ext': 'mp3',
'title': 'EM fodbold 1992 Danmark - Tyskland finale Transmission',
'description': 'md5:501e5a195749480552e214fbbed16c4e',
'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
'timestamp': 1223274900,
'upload_date': '20081006',
'duration': 7369,
},
}]
}
def _real_extract(self, url):
url_id = self._match_id(url)
webpage = self._download_webpage(url, url_id)
mobj = re.match(self._VALID_URL, url)
video_id, display_id = mobj.group('id', 'display_id')
if url_id:
info = json.loads(self._html_search_regex(r'({.*?%s.*})' % url_id, webpage, 'json'))
else:
# Just fetch the first video on that page
info = json.loads(self._html_search_regex(r'bonanzaFunctions.newPlaylist\(({.*})\)', webpage, 'json'))
webpage = self._download_webpage(url, display_id)
asset_id = str(info['AssetId'])
title = info['Title'].rstrip(' \'\"-,.:;!?')
duration = int_or_none(info.get('Duration'), scale=1000)
# First published online. "FirstPublished" contains the date for original airing.
timestamp = parse_iso8601(
re.sub(r'\.\d+$', '', info['Created']))
info = self._parse_html5_media_entries(
url, webpage, display_id, m3u8_id='hls',
m3u8_entry_protocol='m3u8_native')[0]
self._sort_formats(info['formats'])
def parse_filename_info(url):
match = re.search(r'/\d+_(?P<width>\d+)x(?P<height>\d+)x(?P<bitrate>\d+)K\.(?P<ext>\w+)$', url)
if match:
return {
'width': int(match.group('width')),
'height': int(match.group('height')),
'vbr': int(match.group('bitrate')),
'ext': match.group('ext')
}
match = re.search(r'/\d+_(?P<bitrate>\d+)K\.(?P<ext>\w+)$', url)
if match:
return {
'vbr': int(match.group('bitrate')),
'ext': match.group(2)
}
return {}
asset = self._parse_json(
self._search_regex(
r'(?s)currentAsset\s*=\s*({.+?})\s*</script', webpage, 'asset'),
display_id, transform_source=js_to_json)
video_types = ['VideoHigh', 'VideoMid', 'VideoLow']
preferencemap = {
'VideoHigh': -1,
'VideoMid': -2,
'VideoLow': -3,
'Audio': -4,
}
title = unescapeHTML(asset['AssetTitle']).strip()
formats = []
for file in info['Files']:
if info['Type'] == 'Video':
if file['Type'] in video_types:
format = parse_filename_info(file['Location'])
format.update({
'url': file['Location'],
'format_id': file['Type'].replace('Video', ''),
'preference': preferencemap.get(file['Type'], -10),
})
if format['url'].startswith('rtmp'):
rtmp_url = format['url']
format['rtmp_live'] = True # --resume does not work
if '/bonanza/' in rtmp_url:
format['play_path'] = rtmp_url.split('/bonanza/')[1]
formats.append(format)
elif file['Type'] == 'Thumb':
thumbnail = file['Location']
elif info['Type'] == 'Audio':
if file['Type'] == 'Audio':
format = parse_filename_info(file['Location'])
format.update({
'url': file['Location'],
'format_id': file['Type'],
'vcodec': 'none',
})
formats.append(format)
elif file['Type'] == 'Thumb':
thumbnail = file['Location']
def extract(field):
return self._search_regex(
r'<div[^>]+>\s*<p>%s:<p>\s*</div>\s*<div[^>]+>\s*<p>([^<]+)</p>' % field,
webpage, field, default=None)
description = '%s\n%s\n%s\n' % (
info['Description'], info['Actors'], info['Colophon'])
self._sort_formats(formats)
display_id = re.sub(r'[^\w\d-]', '', re.sub(r' ', '-', title.lower())) + '-' + asset_id
display_id = re.sub(r'-+', '-', display_id)
return {
'id': asset_id,
info.update({
'id': asset.get('AssetId') or video_id,
'display_id': display_id,
'title': title,
'formats': formats,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
}
'description': extract('Programinfo'),
'duration': parse_duration(extract('Tid')),
'thumbnail': asset.get('AssetImageUrl'),
})
return info

View File

@@ -44,8 +44,23 @@ class DrTuberIE(InfoExtractor):
webpage = self._download_webpage(
'http://www.drtuber.com/video/%s' % video_id, display_id)
video_url = self._html_search_regex(
r'<source src="([^"]+)"', webpage, 'video URL')
video_data = self._download_json(
'http://www.drtuber.com/player_config_json/', video_id, query={
'vid': video_id,
'embed': 0,
'aid': 0,
'domain_id': 0,
})
formats = []
for format_id, video_url in video_data['files'].items():
if video_url:
formats.append({
'format_id': format_id,
'quality': 2 if format_id == 'hq' else 1,
'url': video_url
})
self._sort_formats(formats)
title = self._html_search_regex(
(r'class="title_watch"[^>]*><(?:p|h\d+)[^>]*>([^<]+)<',
@@ -75,7 +90,7 @@ class DrTuberIE(InfoExtractor):
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'formats': formats,
'title': title,
'thumbnail': thumbnail,
'like_count': like_count,

View File

@@ -118,7 +118,7 @@ class DRTVIE(InfoExtractor):
if target == 'HDS':
f4m_formats = self._extract_f4m_formats(
uri + '?hdcore=3.3.0&plugin=aasp-3.3.0.99.43',
video_id, preference, f4m_id=format_id)
video_id, preference, f4m_id=format_id, fatal=False)
if kind == 'AudioResource':
for f in f4m_formats:
f['vcodec'] = 'none'
@@ -126,7 +126,8 @@ class DRTVIE(InfoExtractor):
elif target == 'HLS':
formats.extend(self._extract_m3u8_formats(
uri, video_id, 'mp4', entry_protocol='m3u8_native',
preference=preference, m3u8_id=format_id))
preference=preference, m3u8_id=format_id,
fatal=False))
else:
bitrate = link.get('Bitrate')
if bitrate:

View File

@@ -11,6 +11,7 @@ from ..compat import (
from ..utils import (
ExtractorError,
int_or_none,
unsmuggle_url,
)
@@ -50,6 +51,10 @@ class EaglePlatformIE(InfoExtractor):
'view_count': int,
},
'skip': 'Georestricted',
}, {
# referrer protected video (https://tvrain.ru/lite/teleshow/kak_vse_nachinalos/namin-418921/)
'url': 'eagleplatform:tvrainru.media.eagleplatform.com:582306',
'only_matching': True,
}]
@staticmethod
@@ -60,16 +65,40 @@ class EaglePlatformIE(InfoExtractor):
webpage)
if mobj is not None:
return mobj.group('url')
# Basic usage embedding (see http://dultonmedia.github.io/eplayer/)
PLAYER_JS_RE = r'''
<script[^>]+
src=(?P<qjs>["\'])(?:https?:)?//(?P<host>(?:(?!(?P=qjs)).)+\.media\.eagleplatform\.com)/player/player\.js(?P=qjs)
.+?
'''
# "Basic usage" embedding (see http://dultonmedia.github.io/eplayer/)
mobj = re.search(
r'''(?xs)
<script[^>]+
src=(?P<q1>["\'])(?:https?:)?//(?P<host>.+?\.media\.eagleplatform\.com)/player/player\.js(?P=q1)
.+?
%s
<div[^>]+
class=(?P<q2>["\'])eagleplayer(?P=q2)[^>]+
class=(?P<qclass>["\'])eagleplayer(?P=qclass)[^>]+
data-id=["\'](?P<id>\d+)
''', webpage)
''' % PLAYER_JS_RE, webpage)
if mobj is not None:
return 'eagleplatform:%(host)s:%(id)s' % mobj.groupdict()
# Generalization of "Javascript code usage", "Combined usage" and
# "Usage without attaching to DOM" embeddings (see
# http://dultonmedia.github.io/eplayer/)
mobj = re.search(
r'''(?xs)
%s
<script>
.+?
new\s+EaglePlayer\(
(?:[^,]+\s*,\s*)?
{
.+?
\bid\s*:\s*["\']?(?P<id>\d+)
.+?
}
\s*\)
.+?
</script>
''' % PLAYER_JS_RE, webpage)
if mobj is not None:
return 'eagleplatform:%(host)s:%(id)s' % mobj.groupdict()
@@ -79,9 +108,10 @@ class EaglePlatformIE(InfoExtractor):
if status != 200:
raise ExtractorError(' '.join(response['errors']), expected=True)
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata', *args, **kwargs):
def _download_json(self, url_or_request, video_id, *args, **kwargs):
try:
response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
response = super(EaglePlatformIE, self)._download_json(
url_or_request, video_id, *args, **kwargs)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError):
response = self._parse_json(ee.cause.read().decode('utf-8'), video_id)
@@ -93,11 +123,24 @@ class EaglePlatformIE(InfoExtractor):
return self._download_json(url_or_request, video_id, note)['data'][0]
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url)
host, video_id = mobj.group('custom_host') or mobj.group('host'), mobj.group('id')
headers = {}
query = {
'id': video_id,
}
referrer = smuggled_data.get('referrer')
if referrer:
headers['Referer'] = referrer
query['referrer'] = referrer
player_data = self._download_json(
'http://%s/api/player_data?id=%s' % (host, video_id), video_id)
'http://%s/api/player_data' % host, video_id,
headers=headers, query=query)
media = player_data['data']['playlist']['viewports'][0]['medialist'][0]

View File

@@ -1,15 +1,18 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
try_get,
unified_timestamp,
)
class EggheadCourseIE(InfoExtractor):
IE_DESC = 'egghead.io course'
IE_NAME = 'egghead:course'
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[a-zA-Z_0-9-]+)'
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript',
'playlist_count': 29,
@@ -22,18 +25,60 @@ class EggheadCourseIE(InfoExtractor):
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
title = self._html_search_regex(r'<h1 class="title">([^<]+)</h1>', webpage, 'title')
ul = self._search_regex(r'(?s)<ul class="series-lessons-list">(.*?)</ul>', webpage, 'session list')
course = self._download_json(
'https://egghead.io/api/v1/series/%s' % playlist_id, playlist_id)
found = re.findall(r'(?s)<a class="[^"]*"\s*href="([^"]+)">\s*<li class="item', ul)
entries = [self.url_result(m) for m in found]
entries = [
self.url_result(
'wistia:%s' % lesson['wistia_id'], ie='Wistia',
video_id=lesson['wistia_id'], video_title=lesson.get('title'))
for lesson in course['lessons'] if lesson.get('wistia_id')]
return self.playlist_result(
entries, playlist_id, course.get('title'),
course.get('description'))
class EggheadLessonIE(InfoExtractor):
IE_DESC = 'egghead.io lesson'
IE_NAME = 'egghead:lesson'
_VALID_URL = r'https://egghead\.io/lessons/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'info_dict': {
'id': 'fv5yotjxcg',
'ext': 'mp4',
'title': 'Create linear data flow with container style types (Box)',
'description': 'md5:9aa2cdb6f9878ed4c39ec09e85a8150e',
'thumbnail': r're:^https?:.*\.jpg$',
'timestamp': 1481296768,
'upload_date': '20161209',
'duration': 304,
'view_count': 0,
'tags': ['javascript', 'free'],
},
'params': {
'skip_download': True,
},
}
def _real_extract(self, url):
lesson_id = self._match_id(url)
lesson = self._download_json(
'https://egghead.io/api/v1/lessons/%s' % lesson_id, lesson_id)
return {
'_type': 'playlist',
'id': playlist_id,
'title': title,
'description': self._og_search_description(webpage),
'entries': entries,
'_type': 'url_transparent',
'ie_key': 'Wistia',
'url': 'wistia:%s' % lesson['wistia_id'],
'id': lesson['wistia_id'],
'title': lesson.get('title'),
'description': lesson.get('summary'),
'thumbnail': lesson.get('thumb_nail'),
'timestamp': unified_timestamp(lesson.get('published_at')),
'duration': int_or_none(lesson.get('duration')),
'view_count': int_or_none(lesson.get('plays_count')),
'tags': try_get(lesson, lambda x: x['tag_list'], list),
}

View File

@@ -10,7 +10,25 @@ from ..utils import (
class ESPNIE(InfoExtractor):
_VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/video/clip(?:\?.*?\bid=|/_/id/)(?P<id>\d+)'
_VALID_URL = r'''(?x)
https?://
(?:
(?:(?:\w+\.)+)?espn\.go|
(?:www\.)?espn
)\.com/
(?:
(?:
video/clip|
watch/player
)
(?:
\?.*?\bid=|
/_/id/
)
)
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://espn.go.com/video/clip?id=10365079',
'info_dict': {
@@ -25,20 +43,34 @@ class ESPNIE(InfoExtractor):
'skip_download': True,
},
}, {
# intl video, from http://www.espnfc.us/video/mls-highlights/150/video/2743663/must-see-moments-best-of-the-mls-season
'url': 'http://espn.go.com/video/clip?id=2743663',
'url': 'https://broadband.espn.go.com/video/clip?id=18910086',
'info_dict': {
'id': '2743663',
'id': '18910086',
'ext': 'mp4',
'title': 'Must-See Moments: Best of the MLS season',
'description': 'md5:4c2d7232beaea572632bec41004f0aeb',
'timestamp': 1449446454,
'upload_date': '20151207',
'title': 'Kyrie spins around defender for two',
'description': 'md5:2b0f5bae9616d26fba8808350f0d2b9b',
'timestamp': 1489539155,
'upload_date': '20170315',
},
'params': {
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
'url': 'http://nonredline.sports.espn.go.com/video/clip?id=19744672',
'only_matching': True,
}, {
'url': 'https://cdn.espn.go.com/video/clip/_/id/19771774',
'only_matching': True,
}, {
'url': 'http://www.espn.com/watch/player?id=19141491',
'only_matching': True,
}, {
'url': 'http://www.espn.com/watch/player?bucketId=257&id=19505875',
'only_matching': True,
}, {
'url': 'http://www.espn.com/watch/player/_/id/19141491',
'only_matching': True,
}, {
'url': 'http://www.espn.com/video/clip?id=10365079',
'only_matching': True,

View File

@@ -39,12 +39,14 @@ from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
from .amcnetworks import AMCNetworksIE
from .americastestkitchen import AmericasTestKitchenIE
from .animeondemand import AnimeOnDemandIE
from .anitube import AnitubeIE
from .anvato import AnvatoIE
from .anysex import AnySexIE
from .aol import AolIE
from .allocine import AllocineIE
from .aliexpress import AliExpressLiveIE
from .aparat import AparatIE
from .appleconnect import AppleConnectIE
from .appletrailers import (
@@ -71,6 +73,10 @@ from .arte import (
TheOperaPlatformIE,
ArteTVPlaylistIE,
)
from .asiancrush import (
AsianCrushIE,
AsianCrushPlaylistIE,
)
from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE
from .atvat import ATVAtIE
@@ -90,7 +96,7 @@ from .azmedien import (
)
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
@@ -144,7 +150,10 @@ from .camdemy import (
from .camwithher import CamWithHerIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .canvas import CanvasIE
from .canvas import (
CanvasIE,
CanvasEenIE,
)
from .carambatv import (
CarambaTVIE,
CarambaTVPageIE,
@@ -181,8 +190,9 @@ from .chirbit import (
ChirbitProfileIE,
)
from .cinchcast import CinchcastIE
from .clipfish import ClipfishIE
from .cjsw import CJSWIE
from .cliphunter import CliphunterIE
from .clippit import ClippitIE
from .cliprs import ClipRsIE
from .clipsyndicate import ClipsyndicateIE
from .closertotruth import CloserToTruthIE
@@ -293,7 +303,10 @@ from .dw import (
from .eagleplatform import EaglePlatformIE
from .ebaumsworld import EbaumsWorldIE
from .echomsk import EchoMskIE
from .egghead import EggheadCourseIE
from .egghead import (
EggheadCourseIE,
EggheadLessonIE,
)
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .einthusan import EinthusanIE
@@ -343,7 +356,12 @@ from .flipagram import FlipagramIE
from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE
from .formula1 import Formula1IE
from .fourtube import FourTubeIE
from .fourtube import (
FourTubeIE,
PornTubeIE,
PornerBrosIE,
FuxIE,
)
from .fox import FOXIE
from .fox9 import FOX9IE
from .foxgay import FoxgayIE
@@ -392,7 +410,6 @@ from .globo import (
from .go import GoIE
from .go90 import Go90IE
from .godtube import GodTubeIE
from .godtv import GodTVIE
from .golem import GolemIE
from .googledrive import GoogleDriveIE
from .googleplus import GooglePlusIE
@@ -466,8 +483,10 @@ from .jamendo import (
)
from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE
from .joj import JojIE
from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kakao import KakaoIE
from .kaltura import KalturaIE
from .kamcord import KamcordIE
from .kanalplay import KanalPlayIE
@@ -496,6 +515,7 @@ from .la7 import LA7IE
from .laola1tv import (
Laola1TvEmbedIE,
Laola1TvIE,
ITTFIE,
)
from .lci import LCIIE
from .lcp import (
@@ -523,7 +543,10 @@ from .limelight import (
LimelightChannelListIE,
)
from .litv import LiTVIE
from .liveleak import LiveLeakIE
from .liveleak import (
LiveLeakIE,
LiveLeakEmbedIE,
)
from .livestream import (
LivestreamIE,
LivestreamOriginalIE,
@@ -546,10 +569,12 @@ from .mangomolo import (
MangomoloVideoIE,
MangomoloLiveIE,
)
from .manyvids import ManyVidsIE
from .matchtv import MatchTVIE
from .mdr import MDRIE
from .mediaset import MediasetIE
from .medici import MediciIE
from .megaphone import MegaphoneIE
from .meipai import MeipaiIE
from .melonvod import MelonVODIE
from .meta import METAIE
@@ -576,7 +601,6 @@ from .mixcloud import (
)
from .mlb import MLBIE
from .mnet import MnetIE
from .mpora import MporaIE
from .moevideo import MoeVideoIE
from .mofosex import MofosexIE
from .mojvideo import MojvideoIE
@@ -637,7 +661,10 @@ from .neteasemusic import (
NetEaseMusicProgramIE,
NetEaseMusicDjRadioIE,
)
from .newgrounds import NewgroundsIE
from .newgrounds import (
NewgroundsIE,
NewgroundsPlaylistIE,
)
from .newstube import NewstubeIE
from .nextmedia import (
NextMediaIE,
@@ -645,6 +672,10 @@ from .nextmedia import (
AppleDailyIE,
NextTVIE,
)
from .nexx import (
NexxIE,
NexxEmbedIE,
)
from .nfb import NFBIE
from .nfl import NFLIE
from .nhk import NhkVodIE
@@ -658,6 +689,7 @@ from .nick import (
NickIE,
NickDeIE,
NickNightIE,
NickRuIE,
)
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninecninemedia import (
@@ -741,6 +773,7 @@ from .ora import OraTVIE
from .orf import (
ORFTVthekIE,
ORFFM4IE,
ORFFM4StoryIE,
ORFOE1IE,
ORFIPTVIE,
)
@@ -753,6 +786,7 @@ from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
from .pearvideo import PearVideoIE
from .people import PeopleIE
from .periscope import (
PeriscopeIE,
@@ -779,6 +813,7 @@ from .polskieradio import (
PolskieRadioIE,
PolskieRadioCategoryIE,
)
from .popcorntv import PopcornTVIE
from .porn91 import Porn91IE
from .porncom import PornComIE
from .pornflip import PornFlipIE
@@ -818,11 +853,16 @@ from .radiobremen import RadioBremenIE
from .radiofrance import RadioFranceIE
from .rai import (
RaiPlayIE,
RaiPlayLiveIE,
RaiIE,
)
from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redbulltv import RedBullTVIE
from .reddit import (
RedditIE,
RedditRIE,
)
from .redtube import RedTubeIE
from .regiotv import RegioTVIE
from .rentv import (
@@ -866,9 +906,11 @@ from .rutube import (
RutubeEmbedIE,
RutubeMovieIE,
RutubePersonIE,
RutubePlaylistIE,
)
from .rutv import RUTVIE
from .ruutu import RuutuIE
from .ruv import RuvIE
from .sandia import SandiaIE
from .safari import (
SafariIE,
@@ -915,8 +957,9 @@ from .soundcloud import (
SoundcloudIE,
SoundcloudSetIE,
SoundcloudUserIE,
SoundcloudTrackStationIE,
SoundcloudPlaylistIE,
SoundcloudSearchIE
SoundcloudSearchIE,
)
from .soundgasm import (
SoundgasmIE,
@@ -965,6 +1008,7 @@ from .tagesschau import (
TagesschauIE,
)
from .tass import TassIE
from .tastytrade import TastyTradeIE
from .tbs import TBSIE
from .tdslifeway import TDSLifewayIE
from .teachertube import (
@@ -973,7 +1017,6 @@ from .teachertube import (
)
from .teachingchannel import TeachingChannelIE
from .teamcoco import TeamcocoIE
from .teamfourstar import TeamFourStarIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tele13 import Tele13IE
@@ -1195,12 +1238,14 @@ from .vk import (
)
from .vlive import (
VLiveIE,
VLiveChannelIE
VLiveChannelIE,
VLivePlaylistIE
)
from .vodlocker import VodlockerIE
from .vodpl import VODPlIE
from .vodplatform import VODPlatformIE
from .voicerepublic import VoiceRepublicIE
from .voot import VootIE
from .voxmedia import VoxMediaIE
from .vporn import VpornIE
from .vrt import VRTIE
@@ -1222,6 +1267,7 @@ from .washingtonpost import (
WashingtonPostArticleIE,
)
from .wat import WatIE
from .watchbox import WatchBoxIE
from .watchindianporn import WatchIndianPornIE
from .wdr import (
WDRIE,
@@ -1271,12 +1317,12 @@ from .yahoo import (
YahooIE,
YahooSearchIE,
)
from .yam import YamIE
from .yandexmusic import (
YandexMusicTrackIE,
YandexMusicAlbumIE,
YandexMusicPlaylistIE,
)
from .yandexdisk import YandexDiskIE
from .yesjapan import YesJapanIE
from .yinyuetai import YinYueTaiIE
from .ynet import YnetIE

View File

@@ -203,19 +203,19 @@ class FacebookIE(InfoExtractor):
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1', webpage)
if mobj is not None:
return mobj.group('url')
def _extract_urls(webpage):
urls = []
for mobj in re.finditer(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://www\.facebook\.com/(?:video/embed|plugins/video\.php).+?)\1',
webpage):
urls.append(mobj.group('url'))
# Facebook API embed
# see https://developers.facebook.com/docs/plugins/embedded-video-player
mobj = re.search(r'''(?x)<div[^>]+
for mobj in re.finditer(r'''(?x)<div[^>]+
class=(?P<q1>[\'"])[^\'"]*\bfb-(?:video|post)\b[^\'"]*(?P=q1)[^>]+
data-href=(?P<q2>[\'"])(?P<url>(?:https?:)?//(?:www\.)?facebook.com/.+?)(?P=q2)''', webpage)
if mobj is not None:
return mobj.group('url')
data-href=(?P<q2>[\'"])(?P<url>(?:https?:)?//(?:www\.)?facebook.com/.+?)(?P=q2)''', webpage):
urls.append(mobj.group('url'))
return urls
def _login(self):
(useremail, password) = self._get_login_info()

View File

@@ -102,6 +102,8 @@ class FirstTVIE(InfoExtractor):
'format_id': f.get('name'),
'tbr': tbr,
'source_preference': quality(f.get('name')),
# quality metadata of http formats may be incorrect
'preference': -1,
})
# m3u8 URL format is reverse engineered from [1] (search for
# master.m3u8). dashEdges (that is currently balancer-vod.1tv.ru)

View File

@@ -43,7 +43,7 @@ class FiveTVIE(InfoExtractor):
'info_dict': {
'id': 'glavnoe',
'ext': 'mp4',
'title': 'Итоги недели с 8 по 14 июня 2015 года',
'title': r're:^Итоги недели с \d+ по \d+ \w+ \d{4} года$',
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
@@ -70,7 +70,8 @@ class FiveTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'<a[^>]+?href="([^"]+)"[^>]+?class="videoplayer"',
[r'<div[^>]+?class="flowplayer[^>]+?data-href="([^"]+)"',
r'<a[^>]+?href="([^"]+)"[^>]+?class="videoplayer"'],
webpage, 'video url')
title = self._og_search_title(webpage, default=None) or self._search_regex(

View File

@@ -1,7 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlencode
from ..compat import (
compat_str,
compat_urllib_parse_urlencode,
)
from ..utils import (
ExtractorError,
int_or_none,
@@ -81,7 +84,7 @@ class FlickrIE(InfoExtractor):
formats = []
for stream in streams['stream']:
stream_type = str(stream.get('type'))
stream_type = compat_str(stream.get('type'))
formats.append({
'format_id': stream_type,
'url': stream['_content'],

View File

@@ -3,39 +3,22 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
parse_duration,
parse_iso8601,
sanitized_Request,
str_to_int,
)
class FourTubeIE(InfoExtractor):
IE_NAME = '4tube'
_VALID_URL = r'https?://(?:www\.)?4tube\.com/videos/(?P<id>\d+)'
_TEST = {
'url': 'http://www.4tube.com/videos/209733/hot-babe-holly-michaels-gets-her-ass-stuffed-by-black',
'md5': '6516c8ac63b03de06bc8eac14362db4f',
'info_dict': {
'id': '209733',
'ext': 'mp4',
'title': 'Hot Babe Holly Michaels gets her ass stuffed by black',
'uploader': 'WCP Club',
'uploader_id': 'wcp-club',
'upload_date': '20131031',
'timestamp': 1383263892,
'duration': 583,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
}
}
class FourTubeBaseIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
kind, video_id, display_id = mobj.group('kind', 'id', 'display_id')
if kind == 'm' or not display_id:
url = self._URL_TEMPLATE % video_id
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta('name', webpage)
@@ -43,10 +26,10 @@ class FourTubeIE(InfoExtractor):
'uploadDate', webpage))
thumbnail = self._html_search_meta('thumbnailUrl', webpage)
uploader_id = self._html_search_regex(
r'<a class="item-to-subscribe" href="[^"]+/channels/([^/"]+)" title="Go to [^"]+ page">',
r'<a class="item-to-subscribe" href="[^"]+/(?:channel|user)s?/([^/"]+)" title="Go to [^"]+ page">',
webpage, 'uploader id', fatal=False)
uploader = self._html_search_regex(
r'<a class="item-to-subscribe" href="[^"]+/channels/[^/"]+" title="Go to ([^"]+) page">',
r'<a class="item-to-subscribe" href="[^"]+/(?:channel|user)s?/[^/"]+" title="Go to ([^"]+) page">',
webpage, 'uploader', fatal=False)
categories_html = self._search_regex(
@@ -60,10 +43,10 @@ class FourTubeIE(InfoExtractor):
view_count = str_to_int(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([0-9,]+)">',
webpage, 'view count', fatal=False))
webpage, 'view count', default=None))
like_count = str_to_int(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserLikes:([0-9,]+)">',
webpage, 'like count', fatal=False))
webpage, 'like count', default=None))
duration = parse_duration(self._html_search_meta('duration', webpage))
media_id = self._search_regex(
@@ -85,14 +68,14 @@ class FourTubeIE(InfoExtractor):
media_id = params[0]
sources = ['%s' % p for p in params[2]]
token_url = 'http://tkn.4tube.com/{0}/desktop/{1}'.format(
token_url = 'https://tkn.kodicdn.com/{0}/desktop/{1}'.format(
media_id, '+'.join(sources))
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
b'Origin': b'http://www.4tube.com',
}
token_req = sanitized_Request(token_url, b'{}', headers)
tokens = self._download_json(token_req, video_id)
parsed_url = compat_urlparse.urlparse(url)
tokens = self._download_json(token_url, video_id, data=b'', headers={
'Origin': '%s://%s' % (parsed_url.scheme, parsed_url.hostname),
'Referer': url,
})
formats = [{
'url': tokens[format]['token'],
'format_id': format + 'p',
@@ -115,3 +98,126 @@ class FourTubeIE(InfoExtractor):
'duration': duration,
'age_limit': 18,
}
class FourTubeIE(FourTubeBaseIE):
IE_NAME = '4tube'
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?4tube\.com/(?:videos|embed)/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
_URL_TEMPLATE = 'https://www.4tube.com/videos/%s/video'
_TESTS = [{
'url': 'http://www.4tube.com/videos/209733/hot-babe-holly-michaels-gets-her-ass-stuffed-by-black',
'md5': '6516c8ac63b03de06bc8eac14362db4f',
'info_dict': {
'id': '209733',
'ext': 'mp4',
'title': 'Hot Babe Holly Michaels gets her ass stuffed by black',
'uploader': 'WCP Club',
'uploader_id': 'wcp-club',
'upload_date': '20131031',
'timestamp': 1383263892,
'duration': 583,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
},
}, {
'url': 'http://www.4tube.com/embed/209733',
'only_matching': True,
}, {
'url': 'http://m.4tube.com/videos/209733/hot-babe-holly-michaels-gets-her-ass-stuffed-by-black',
'only_matching': True,
}]
class FuxIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?fux\.com/(?:video|embed)/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
_URL_TEMPLATE = 'https://www.fux.com/video/%s/video'
_TESTS = [{
'url': 'https://www.fux.com/video/195359/awesome-fucking-kitchen-ends-cum-swallow',
'info_dict': {
'id': '195359',
'ext': 'mp4',
'title': 'Awesome fucking in the kitchen ends with cum swallow',
'uploader': 'alenci2342',
'uploader_id': 'alenci2342',
'upload_date': '20131230',
'timestamp': 1388361660,
'duration': 289,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.fux.com/embed/195359',
'only_matching': True,
}, {
'url': 'https://www.fux.com/video/195359/awesome-fucking-kitchen-ends-cum-swallow',
'only_matching': True,
}]
class PornTubeIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?porntube\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
_URL_TEMPLATE = 'https://www.porntube.com/videos/video_%s'
_TESTS = [{
'url': 'https://www.porntube.com/videos/teen-couple-doing-anal_7089759',
'info_dict': {
'id': '7089759',
'ext': 'mp4',
'title': 'Teen couple doing anal',
'uploader': 'Alexy',
'uploader_id': 'Alexy',
'upload_date': '20150606',
'timestamp': 1433595647,
'duration': 5052,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.porntube.com/embed/7089759',
'only_matching': True,
}, {
'url': 'https://m.porntube.com/videos/teen-couple-doing-anal_7089759',
'only_matching': True,
}]
class PornerBrosIE(FourTubeBaseIE):
_VALID_URL = r'https?://(?:(?P<kind>www|m)\.)?pornerbros\.com/(?:videos/(?P<display_id>[^/]+)_|embed/)(?P<id>\d+)'
_URL_TEMPLATE = 'https://www.pornerbros.com/videos/video_%s'
_TESTS = [{
'url': 'https://www.pornerbros.com/videos/skinny-brunette-takes-big-cock-down-her-anal-hole_181369',
'md5': '6516c8ac63b03de06bc8eac14362db4f',
'info_dict': {
'id': '181369',
'ext': 'mp4',
'title': 'Skinny brunette takes big cock down her anal hole',
'uploader': 'PornerBros HD',
'uploader_id': 'pornerbros-hd',
'upload_date': '20130130',
'timestamp': 1359527401,
'duration': 1224,
'view_count': int,
'categories': list,
'age_limit': 18,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.pornerbros.com/embed/181369',
'only_matching': True,
}, {
'url': 'https://m.pornerbros.com/videos/skinny-brunette-takes-big-cock-down-her-anal-hole_181369',
'only_matching': True,
}]

View File

@@ -3,56 +3,99 @@ from __future__ import unicode_literals
from .adobepass import AdobePassIE
from ..utils import (
smuggle_url,
update_url_query,
int_or_none,
parse_age_limit,
parse_duration,
try_get,
unified_timestamp,
)
class FOXIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.fox.com/watch/255180355939/7684182528',
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[\da-fA-F]+)'
_TESTS = [{
# clip
'url': 'https://www.fox.com/watch/4b765a60490325103ea69888fb2bd4e8/',
'md5': 'ebd296fcc41dd4b19f8115d8461a3165',
'info_dict': {
'id': '255180355939',
'id': '4b765a60490325103ea69888fb2bd4e8',
'ext': 'mp4',
'title': 'Official Trailer: Gotham',
'description': 'Tracing the rise of the great DC Comics Super-Villains and vigilantes, Gotham reveals an entirely new chapter that has never been told.',
'duration': 129,
'timestamp': 1400020798,
'upload_date': '20140513',
'uploader': 'NEWA-FNG-FOXCOM',
'title': 'Aftermath: Bruce Wayne Develops Into The Dark Knight',
'description': 'md5:549cd9c70d413adb32ce2a779b53b486',
'duration': 102,
'timestamp': 1504291893,
'upload_date': '20170901',
'creator': 'FOX',
'series': 'Gotham',
},
'add_ie': ['ThePlatform'],
}
'params': {
'skip_download': True,
},
}, {
# episode, geo-restricted
'url': 'https://www.fox.com/watch/087036ca7f33c8eb79b08152b4dd75c1/',
'only_matching': True,
}, {
# episode, geo-restricted, tv provided required
'url': 'https://www.fox.com/watch/30056b295fb57f7452aeeb4920bc3024/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), video_id)
fox_pdk_player = settings['fox_pdk_player']
release_url = fox_pdk_player['release_url']
query = {
'mbr': 'true',
'switch': 'http'
}
if fox_pdk_player.get('access') == 'locked':
ap_p = settings['foxAdobePassProvider']
rating = ap_p.get('videoRating')
if rating == 'n/a':
rating = None
resource = self._get_mvpd_resource('fbc-fox', None, ap_p['videoGUID'], rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, 'fbc-fox', resource)
video = self._download_json(
'https://api.fox.com/fbc-content/v1_4/video/%s' % video_id,
video_id, headers={
'apikey': 'abdcbed02c124d393b39e818a4312055',
'Content-Type': 'application/json',
'Referer': url,
})
info = self._search_json_ld(webpage, video_id, fatal=False)
info.update({
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
title = video['name']
m3u8_url = self._download_json(
video['videoRelease']['url'], video_id)['playURL']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
self._sort_formats(formats)
description = video.get('description')
duration = int_or_none(video.get('durationInSeconds')) or int_or_none(
video.get('duration')) or parse_duration(video.get('duration'))
timestamp = unified_timestamp(video.get('datePublished'))
age_limit = parse_age_limit(video.get('contentRating'))
data = try_get(
video, lambda x: x['trackingData']['properties'], dict) or {}
creator = data.get('brand') or data.get('network') or video.get('network')
series = video.get('seriesName') or data.get(
'seriesName') or data.get('show')
season_number = int_or_none(video.get('seasonNumber'))
episode = video.get('name')
episode_number = int_or_none(video.get('episodeNumber'))
release_year = int_or_none(video.get('releaseYear'))
if data.get('authRequired'):
# TODO: AP
pass
return {
'id': video_id,
})
return info
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'age_limit': age_limit,
'creator': creator,
'series': series,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
'release_year': release_year,
'formats': formats,
}

View File

@@ -5,6 +5,7 @@ import itertools
from .common import InfoExtractor
from ..utils import (
get_element_by_id,
int_or_none,
remove_end,
)
@@ -46,7 +47,7 @@ class FoxgayIE(InfoExtractor):
formats = [{
'url': source,
'height': resolution,
'height': int_or_none(resolution),
} for source, resolution in zip(
video_data['sources'], video_data.get('resolutions', itertools.repeat(None)))]

View File

@@ -112,7 +112,7 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
class FranceTVIE(FranceTVBaseInfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?france\.tv|mobile\.france\.tv)/(?:[^/]+/)+(?P<id>[^/]+)\.html'
_VALID_URL = r'https?://(?:(?:www\.)?france\.tv|mobile\.france\.tv)/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_TESTS = [{
'url': 'https://www.france.tv/france-2/13h15-le-dimanche/140921-les-mysteres-de-jesus.html',
@@ -157,6 +157,9 @@ class FranceTVIE(FranceTVBaseInfoExtractor):
}, {
'url': 'https://mobile.france.tv/france-5/c-dans-l-air/137347-emission-du-vendredi-12-mai-2017.html',
'only_matching': True,
}, {
'url': 'https://www.france.tv/142749-rouge-sang.html',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -1,10 +1,14 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
unified_timestamp,
)
class FunnyOrDieIE(InfoExtractor):
@@ -18,6 +22,10 @@ class FunnyOrDieIE(InfoExtractor):
'title': 'Heart-Shaped Box: Literal Video Version',
'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
'thumbnail': r're:^http:.*\.jpg$',
'uploader': 'DASjr',
'timestamp': 1317904928,
'upload_date': '20111006',
'duration': 318.3,
},
}, {
'url': 'http://www.funnyordie.com/embed/e402820827',
@@ -27,6 +35,8 @@ class FunnyOrDieIE(InfoExtractor):
'title': 'Please Use This Song (Jon Lajoie)',
'description': 'Please use this to sell something. www.jonlajoie.com',
'thumbnail': r're:^http:.*\.jpg$',
'timestamp': 1398988800,
'upload_date': '20140502',
},
'params': {
'skip_download': True,
@@ -100,15 +110,53 @@ class FunnyOrDieIE(InfoExtractor):
'url': 'http://www.funnyordie.com%s' % src,
}]
post_json = self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details')
post = json.loads(post_json)
timestamp = unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp', default=None))
uploader = self._html_search_regex(
r'<h\d[^>]+\bclass=["\']channel-preview-name[^>]+>(.+?)</h',
webpage, 'uploader', default=None)
title, description, thumbnail, duration = [None] * 4
medium = self._parse_json(
self._search_regex(
r'jsonMedium\s*=\s*({.+?});', webpage, 'JSON medium',
default='{}'),
video_id, fatal=False)
if medium:
title = medium.get('title')
duration = float_or_none(medium.get('duration'))
if not timestamp:
timestamp = unified_timestamp(medium.get('publishDate'))
post = self._parse_json(
self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details',
default='{}'),
video_id, fatal=False)
if post:
if not title:
title = post.get('name')
description = post.get('description')
thumbnail = post.get('picture')
if not title:
title = self._og_search_title(webpage)
if not description:
description = self._og_search_description(webpage)
if not duration:
duration = int_or_none(self._html_search_meta(
('video:duration', 'duration'), webpage, 'duration', default=False))
return {
'id': video_id,
'title': post['name'],
'description': post.get('description'),
'thumbnail': post.get('picture'),
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}

View File

@@ -10,6 +10,7 @@ from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import (
compat_etree_fromstring,
compat_str,
compat_urllib_parse_unquote,
compat_urlparse,
compat_xml_parse_error,
@@ -21,6 +22,8 @@ from ..utils import (
HEADRequest,
is_html,
js_to_json,
KNOWN_EXTENSIONS,
mimetype2ext,
orderedSet,
sanitized_Request,
smuggle_url,
@@ -35,6 +38,10 @@ from .brightcove import (
BrightcoveLegacyIE,
BrightcoveNewIE,
)
from .nexx import (
NexxIE,
NexxEmbedIE,
)
from .nbc import NBCSportsVPlayerIE
from .ooyala import OoyalaIE
from .rutv import RUTVIE
@@ -56,6 +63,7 @@ from .dailymotion import (
DailymotionIE,
DailymotionCloudIE,
)
from .dailymail import DailyMailIE
from .onionstudios import OnionStudiosIE
from .viewlift import ViewLiftEmbedIE
from .mtv import MTVServicesEmbeddedIE
@@ -90,6 +98,9 @@ from .anvato import AnvatoIE
from .washingtonpost import WashingtonPostIE
from .wistia import WistiaIE
from .mediaset import MediasetIE
from .joj import JojIE
from .megaphone import MegaphoneIE
from .vzaar import VzaarIE
class GenericIE(InfoExtractor):
@@ -567,6 +578,19 @@ class GenericIE(InfoExtractor):
},
'skip': 'movie expired',
},
# ooyala video embedded with http://player.ooyala.com/static/v4/production/latest/core.min.js
{
'url': 'http://wnep.com/2017/07/22/steampunk-fest-comes-to-honesdale/',
'info_dict': {
'id': 'lwYWYxYzE6V5uJMjNGyKtwwiw9ZJD7t2',
'ext': 'mp4',
'title': 'Steampunk Fest Comes to Honesdale',
'duration': 43.276,
},
'params': {
'skip_download': True,
}
},
# embed.ly video
{
'url': 'http://www.tested.com/science/weird/460206-tested-grinding-coffee-2000-frames-second/',
@@ -758,6 +782,20 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Dailymotion'],
},
# DailyMail embed
{
'url': 'http://www.bumm.sk/krimi/2017/07/05/biztonsagi-kamera-buktatta-le-az-agg-ferfit-utlegelo-apolot',
'info_dict': {
'id': '1495629',
'ext': 'mp4',
'title': 'Care worker punches elderly dementia patient in head 11 times',
'description': 'md5:3a743dee84e57e48ec68bf67113199a5',
},
'add_ie': ['DailyMail'],
'params': {
'skip_download': True,
},
},
# YouTube embed
{
'url': 'http://www.badzine.de/ansicht/datum/2014/06/09/so-funktioniert-die-neue-englische-badminton-liga.html',
@@ -1094,6 +1132,35 @@ class GenericIE(InfoExtractor):
'skip_download': True,
}
},
{
# Video.js embed, multiple formats
'url': 'http://ortcam.com/solidworks-урок-6-настройка-чертежа_33f9b7351.html',
'info_dict': {
'id': 'yygqldloqIk',
'ext': 'mp4',
'title': 'SolidWorks. Урок 6 Настройка чертежа',
'description': 'md5:baf95267792646afdbf030e4d06b2ab3',
'upload_date': '20130314',
'uploader': 'PROстое3D',
'uploader_id': 'PROstoe3D',
},
'params': {
'skip_download': True,
},
},
{
# Video.js embed, single format
'url': 'https://www.vooplayer.com/v3/watch/watch.php?v=NzgwNTg=',
'info_dict': {
'id': 'watch',
'ext': 'mp4',
'title': 'Step 1 - Good Foundation',
'description': 'md5:d1e7ff33a29fc3eb1673d6c270d344f4',
},
'params': {
'skip_download': True,
},
},
# rtl.nl embed
{
'url': 'http://www.rtlnieuws.nl/nieuws/buitenland/aanslagen-kopenhagen',
@@ -1184,7 +1251,7 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Kaltura'],
},
# Eagle.Platform embed (generic URL)
# EaglePlatform embed (generic URL)
{
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
@@ -1198,8 +1265,26 @@ class GenericIE(InfoExtractor):
'view_count': int,
'age_limit': 0,
},
'params': {
'skip_download': True,
},
},
# ClipYou (Eagle.Platform) embed (custom URL)
# referrer protected EaglePlatform embed
{
'url': 'https://tvrain.ru/lite/teleshow/kak_vse_nachinalos/namin-418921/',
'info_dict': {
'id': '582306',
'ext': 'mp4',
'title': 'Стас Намин: «Мы нарушили девственность Кремля»',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 3382,
'view_count': int,
},
'params': {
'skip_download': True,
},
},
# ClipYou (EaglePlatform) embed (custom URL)
{
'url': 'http://muz-tv.ru/play/7129/',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
@@ -1211,6 +1296,9 @@ class GenericIE(InfoExtractor):
'duration': 216,
'view_count': int,
},
'params': {
'skip_download': True,
},
},
# Pladform embed
{
@@ -1462,14 +1550,27 @@ class GenericIE(InfoExtractor):
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': 'ace83b9ed19b21f68e1b50e844fdf95d',
'md5': '7619da8c820e835bef21a1efa2a0fc71',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
}
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Another LiveLeak embed pattern (#13336)
{
'url': 'https://milo.yiannopoulos.net/2017/06/concealed-carry-robbery/',
'info_dict': {
'id': '2eb_1496309988',
'ext': 'mp4',
'title': 'Thief robs place where everyone was armed',
'description': 'md5:694d73ee79e535953cf2488562288eee',
'uploader': 'brazilwtf',
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Duplicated embedded video URLs
{
@@ -1511,6 +1612,22 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['BrightcoveLegacy'],
},
# Nexx embed
{
'url': 'https://www.funk.net/serien/5940e15073f6120001657956/items/593efbb173f6120001657503',
'info_dict': {
'id': '247746',
'ext': 'mp4',
'title': "Yesterday's Jam (OV)",
'description': 'md5:09bc0984723fed34e2581624a84e05f0',
'timestamp': 1492594816,
'upload_date': '20170419',
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
},
# Facebook <iframe> embed
{
'url': 'https://www.hostblogger.de/blog/archives/6181-Auto-jagt-Betonmischer.html',
@@ -1521,6 +1638,21 @@ class GenericIE(InfoExtractor):
'title': 'Facebook video #599637780109885',
},
},
# Facebook <iframe> embed, plugin video
{
'url': 'http://5pillarsuk.com/2017/06/07/tariq-ramadan-disagrees-with-pr-exercise-by-imams-refusing-funeral-prayers-for-london-attackers/',
'info_dict': {
'id': '1754168231264132',
'ext': 'mp4',
'title': 'About the Imams and Religious leaders refusing to perform funeral prayers for...',
'uploader': 'Tariq Ramadan (official)',
'timestamp': 1496758379,
'upload_date': '20170606',
},
'params': {
'skip_download': True,
},
},
# Facebook API embed
{
'url': 'http://www.lothype.com/blue-stars-2016-preview-standstill-full-show/',
@@ -1697,6 +1829,21 @@ class GenericIE(InfoExtractor):
},
'playlist_mincount': 5,
},
{
# Limelight embed (LimelightPlayerUtil.embed)
'url': 'https://tv5.ca/videos?v=xuu8qowr291ri',
'info_dict': {
'id': '95d035dc5c8a401588e9c0e6bd1e9c92',
'ext': 'mp4',
'title': '07448641',
'timestamp': 1499890639,
'upload_date': '20170712',
},
'params': {
'skip_download': True,
},
'add_ie': ['LimelightMedia'],
},
{
'url': 'http://kron4.com/2017/04/28/standoff-with-walnut-creek-murder-suspect-ends-with-arrest/',
'info_dict': {
@@ -1733,6 +1880,45 @@ class GenericIE(InfoExtractor):
},
'add_ie': [MediasetIE.ie_key()],
},
{
# JOJ.sk embeds
'url': 'https://www.noviny.sk/slovensko/238543-slovenskom-sa-prehnala-vlna-silnych-burok',
'info_dict': {
'id': '238543-slovenskom-sa-prehnala-vlna-silnych-burok',
'title': 'Slovenskom sa prehnala vlna silných búrok',
},
'playlist_mincount': 5,
'add_ie': [JojIE.ie_key()],
},
{
# AMP embed (see https://www.ampproject.org/docs/reference/components/amp-video)
'url': 'https://tvrain.ru/amp/418921/',
'md5': 'cc00413936695987e8de148b67d14f1d',
'info_dict': {
'id': '418921',
'ext': 'mp4',
'title': 'Стас Намин: «Мы нарушили девственность Кремля»',
},
},
{
# vzaar embed
'url': 'http://help.vzaar.com/article/165-embedding-video',
'md5': '7e3919d9d2620b89e3e00bec7fe8c9d4',
'info_dict': {
'id': '8707641',
'ext': 'mp4',
'title': 'Building A Business Online: Principal Chairs Q & A',
},
},
{
# multiple HTML5 videos on one page
'url': 'https://www.paragon-software.com/home/rk-free/keyscenarios.html',
'info_dict': {
'id': 'keyscenarios',
'title': 'Rescue Kit 14 Free Edition - Getting started',
},
'playlist_count': 4,
}
# {
# # TODO: find another test
# # http://schema.org/VideoObject
@@ -1882,7 +2068,7 @@ class GenericIE(InfoExtractor):
if head_response is not False:
# Check for redirect
new_url = head_response.geturl()
new_url = compat_str(head_response.geturl())
if url != new_url:
self.report_following_redirect(new_url)
if force_videoid:
@@ -1907,14 +2093,14 @@ class GenericIE(InfoExtractor):
content_type = head_response.headers.get('Content-Type', '').lower()
m = re.match(r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.apple\.|x-)?mpegurl)))/(?P<format_id>[^;\s]+)', content_type)
if m:
format_id = m.group('format_id')
format_id = compat_str(m.group('format_id'))
if format_id.endswith('mpegurl'):
formats = self._extract_m3u8_formats(url, video_id, 'mp4')
elif format_id == 'f4m':
formats = self._extract_f4m_formats(url, video_id)
else:
formats = [{
'format_id': m.group('format_id'),
'format_id': format_id,
'url': url,
'vcodec': 'none' if m.group('type') == 'audio' else None
}]
@@ -1983,7 +2169,7 @@ class GenericIE(InfoExtractor):
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats(
doc, video_id,
mpd_base_url=full_response.geturl().rpartition('/')[0],
mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0],
mpd_url=url)
self._sort_formats(info_dict['formats'])
return info_dict
@@ -2032,6 +2218,13 @@ class GenericIE(InfoExtractor):
video_description = self._og_search_description(webpage, default=None)
video_thumbnail = self._og_search_thumbnail(webpage, default=None)
info_dict.update({
'title': video_title,
'description': video_description,
'thumbnail': video_thumbnail,
'age_limit': age_limit,
})
# Look for Brightcove Legacy Studio embeds
bc_urls = BrightcoveLegacyIE._extract_brightcove_urls(webpage)
if bc_urls:
@@ -2053,6 +2246,16 @@ class GenericIE(InfoExtractor):
if bc_urls:
return self.playlist_from_matches(bc_urls, video_id, video_title, ie='BrightcoveNew')
# Look for Nexx embeds
nexx_urls = NexxIE._extract_urls(webpage)
if nexx_urls:
return self.playlist_from_matches(nexx_urls, video_id, video_title, ie=NexxIE.ie_key())
# Look for Nexx iFrame embeds
nexx_embed_urls = NexxEmbedIE._extract_urls(webpage)
if nexx_embed_urls:
return self.playlist_from_matches(nexx_embed_urls, video_id, video_title, ie=NexxEmbedIE.ie_key())
# Look for ThePlatform embeds
tp_urls = ThePlatformIE._extract_urls(webpage)
if tp_urls:
@@ -2080,36 +2283,11 @@ class GenericIE(InfoExtractor):
if vid_me_embed_url is not None:
return self.url_result(vid_me_embed_url, 'Vidme')
# Look for embedded YouTube player
matches = re.findall(r'''(?x)
(?:
<iframe[^>]+?src=|
data-video-url=|
<embed[^>]+?src=|
embedSWF\(?:\s*|
<object[^>]+data=|
new\s+SWFObject\(
)
(["\'])
(?P<url>(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/
(?:embed|v|p)/.+?)
\1''', webpage)
if matches:
# Look for YouTube embeds
youtube_urls = YoutubeIE._extract_urls(webpage)
if youtube_urls:
return self.playlist_from_matches(
matches, video_id, video_title, lambda m: unescapeHTML(m[1]))
# Look for lazyYT YouTube embed
matches = re.findall(
r'class="lazyYT" data-youtube-id="([^"]+)"', webpage)
if matches:
return self.playlist_from_matches(matches, video_id, video_title, lambda m: unescapeHTML(m))
# Look for Wordpress "YouTube Video Importer" plugin
matches = re.findall(r'''(?x)<div[^>]+
class=(?P<q1>[\'"])[^\'"]*\byvii_single_video_player\b[^\'"]*(?P=q1)[^>]+
data-video_id=(?P<q2>[\'"])([^\'"]+)(?P=q2)''', webpage)
if matches:
return self.playlist_from_matches(matches, video_id, video_title, lambda m: m[-1])
youtube_urls, video_id, video_title, ie=YoutubeIE.ie_key())
matches = DailymotionIE._extract_urls(webpage)
if matches:
@@ -2125,6 +2303,12 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
playlists, video_id, video_title, lambda p: '//dailymotion.com/playlist/%s' % p)
# Look for DailyMail embeds
dailymail_urls = DailyMailIE._extract_urls(webpage)
if dailymail_urls:
return self.playlist_from_matches(
dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key())
# Look for embedded Wistia player
wistia_url = WistiaIE._extract_url(webpage)
if wistia_url:
@@ -2176,6 +2360,7 @@ class GenericIE(InfoExtractor):
# Look for Ooyala videos
mobj = (re.search(r'player\.ooyala\.com/[^"?]+[?#][^"]*?(?:embedCode|ec)=(?P<ec>[^"&]+)', webpage) or
re.search(r'OO\.Player\.create\([\'"].*?[\'"],\s*[\'"](?P<ec>.{32})[\'"]', webpage) or
re.search(r'OO\.Player\.create\.apply\(\s*OO\.Player\s*,\s*op\(\s*\[\s*[\'"][^\'"]*[\'"]\s*,\s*[\'"](?P<ec>.{32})[\'"]', webpage) or
re.search(r'SBN\.VideoLinkset\.ooyala\([\'"](?P<ec>.{32})[\'"]\)', webpage) or
re.search(r'data-ooyala-video-id\s*=\s*[\'"](?P<ec>.{32})[\'"]', webpage))
if mobj is not None:
@@ -2221,9 +2406,9 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url'))
# Look for embedded Facebook player
facebook_url = FacebookIE._extract_url(webpage)
if facebook_url is not None:
return self.url_result(facebook_url, 'Facebook')
facebook_urls = FacebookIE._extract_urls(webpage)
if facebook_urls:
return self.playlist_from_matches(facebook_urls, video_id, video_title)
# Look for embedded VK player
mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk\.com/video_ext\.php.+?)\1', webpage)
@@ -2420,12 +2605,12 @@ class GenericIE(InfoExtractor):
if kaltura_url:
return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())
# Look for Eagle.Platform embeds
# Look for EaglePlatform embeds
eagleplatform_url = EaglePlatformIE._extract_url(webpage)
if eagleplatform_url:
return self.url_result(eagleplatform_url, EaglePlatformIE.ie_key())
return self.url_result(smuggle_url(eagleplatform_url, {'referrer': url}), EaglePlatformIE.ie_key())
# Look for ClipYou (uses Eagle.Platform) embeds
# Look for ClipYou (uses EaglePlatform) embeds
mobj = re.search(
r'<iframe[^>]+src="https?://(?P<host>media\.clipyou\.ru)/index/player\?.*\brecord_id=(?P<id>\d+).*"', webpage)
if mobj is not None:
@@ -2600,9 +2785,9 @@ class GenericIE(InfoExtractor):
self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
# Look for LiveLeak embeds
liveleak_url = LiveLeakIE._extract_url(webpage)
if liveleak_url:
return self.url_result(liveleak_url, 'LiveLeak')
liveleak_urls = LiveLeakIE._extract_urls(webpage)
if liveleak_urls:
return self.playlist_from_matches(liveleak_urls, video_id, video_title)
# Look for 3Q SDN embeds
threeqsdn_url = ThreeQSDNIE._extract_url(webpage)
@@ -2654,7 +2839,7 @@ class GenericIE(InfoExtractor):
rutube_urls = RutubeIE._extract_urls(webpage)
if rutube_urls:
return self.playlist_from_matches(
rutube_urls, ie=RutubeIE.ie_key())
rutube_urls, video_id, video_title, ie=RutubeIE.ie_key())
# Look for WashingtonPost embeds
wapo_urls = WashingtonPostIE._extract_urls(webpage)
@@ -2668,38 +2853,109 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
mediaset_urls, video_id, video_title, ie=MediasetIE.ie_key())
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject')
if json_ld.get('url'):
info_dict.update({
'title': video_title or info_dict['title'],
'description': video_description,
'thumbnail': video_thumbnail,
'age_limit': age_limit
})
info_dict.update(json_ld)
return info_dict
# Look for JOJ.sk embeds
joj_urls = JojIE._extract_urls(webpage)
if joj_urls:
return self.playlist_from_matches(
joj_urls, video_id, video_title, ie=JojIE.ie_key())
# Look for megaphone.fm embeds
mpfn_urls = MegaphoneIE._extract_urls(webpage)
if mpfn_urls:
return self.playlist_from_matches(
mpfn_urls, video_id, video_title, ie=MegaphoneIE.ie_key())
# Look for vzaar embeds
vzaar_urls = VzaarIE._extract_urls(webpage)
if vzaar_urls:
return self.playlist_from_matches(
vzaar_urls, video_id, video_title, ie=VzaarIE.ie_key())
def merge_dicts(dict1, dict2):
merged = {}
for k, v in dict1.items():
if v is not None:
merged[k] = v
for k, v in dict2.items():
if v is None:
continue
if (k not in merged or
(isinstance(v, compat_str) and v and
isinstance(merged[k], compat_str) and
not merged[k])):
merged[k] = v
return merged
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries:
for entry in entries:
entry.update({
if len(entries) == 1:
entries[0].update({
'id': video_id,
'title': video_title,
})
else:
for num, entry in enumerate(entries, start=1):
entry.update({
'id': '%s-%s' % (video_id, num),
'title': '%s (%d)' % (video_title, num),
})
for entry in entries:
self._sort_formats(entry['formats'])
return self.playlist_result(entries)
return self.playlist_result(entries, video_id, video_title)
jwplayer_data = self._find_jwplayer_data(
webpage, video_id, transform_source=js_to_json)
if jwplayer_data:
info = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, base_url=url)
if not info.get('title'):
info['title'] = video_title
return info
return merge_dicts(info, info_dict)
# Video.js embed
mobj = re.search(
r'(?s)\bvideojs\s*\(.+?\.src\s*\(\s*((?:\[.+?\]|{.+?}))\s*\)\s*;',
webpage)
if mobj is not None:
sources = self._parse_json(
mobj.group(1), video_id, transform_source=js_to_json,
fatal=False) or []
if not isinstance(sources, list):
sources = [sources]
formats = []
for source in sources:
src = source.get('src')
if not src or not isinstance(src, compat_str):
continue
src = compat_urlparse.urljoin(url, src)
src_type = source.get('type')
if isinstance(src_type, compat_str):
src_type = src_type.lower()
ext = determine_ext(src).lower()
if src_type == 'video/youtube':
return self.url_result(src, YoutubeIE.ie_key())
if src_type == 'application/dash+xml' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
src, video_id, mpd_id='dash', fatal=False))
elif src_type == 'application/x-mpegurl' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'url': src,
'ext': (mimetype2ext(src_type) or
ext if ext in KNOWN_EXTENSIONS else 'mp4'),
})
if formats:
self._sort_formats(formats)
info_dict['formats'] = formats
return info_dict
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject')
if json_ld.get('url'):
return merge_dicts(json_ld, info_dict)
def check_video(vurl):
if YoutubeIE.suitable(vurl):
@@ -2788,7 +3044,7 @@ class GenericIE(InfoExtractor):
# be supported by youtube-dl thus this is checked the very last (see
# https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser)
embed_url = self._html_search_meta('twitter:player', webpage, default=None)
if embed_url:
if embed_url and embed_url != url:
return self.url_result(embed_url)
if not found:

View File

@@ -11,7 +11,7 @@ from ..utils import (
class GfycatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/)?(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/|gifs/detail/)?(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
'info_dict': {
@@ -44,6 +44,9 @@ class GfycatIE(InfoExtractor):
'categories': list,
'age_limit': 0,
}
}, {
'url': 'https://gfycat.com/gifs/detail/UnconsciousLankyIvorygull',
'only_matching': True
}]
def _real_extract(self, url):
@@ -82,7 +85,7 @@ class GfycatIE(InfoExtractor):
video_url = gfy.get('%sUrl' % format_id)
if not video_url:
continue
filesize = gfy.get('%sSize' % format_id)
filesize = int_or_none(gfy.get('%sSize' % format_id))
formats.append({
'url': video_url,
'format_id': format_id,

View File

@@ -5,9 +5,10 @@ import json
from .common import InfoExtractor
from ..utils import (
unescapeHTML,
qualities,
determine_ext,
int_or_none,
qualities,
unescapeHTML,
)
@@ -15,7 +16,7 @@ class GiantBombIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?giantbomb\.com/videos/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
_TEST = {
'url': 'http://www.giantbomb.com/videos/quick-look-destiny-the-dark-below/2300-9782/',
'md5': '57badeface303ecf6b98b812de1b9018',
'md5': 'c8ea694254a59246a42831155dec57ac',
'info_dict': {
'id': '2300-9782',
'display_id': 'quick-look-destiny-the-dark-below',
@@ -51,11 +52,16 @@ class GiantBombIE(InfoExtractor):
for format_id, video_url in video['videoStreams'].items():
if format_id == 'f4m_stream':
continue
if video_url.endswith('.f4m'):
ext = determine_ext(video_url)
if ext == 'f4m':
f4m_formats = self._extract_f4m_formats(video_url + '?hdcore=3.3.1', display_id)
if f4m_formats:
f4m_formats[0]['quality'] = quality(format_id)
formats.extend(f4m_formats)
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, display_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'url': video_url,

View File

@@ -1,66 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..utils import js_to_json
class GodTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?god\.tv(?:/[^/]+)*/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://god.tv/jesus-image/video/jesus-conference-2016/randy-needham',
'info_dict': {
'id': 'lpd3g2MzE6D1g8zFAKz8AGpxWcpu6o_3',
'ext': 'mp4',
'title': 'Randy Needham',
'duration': 3615.08,
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://god.tv/playlist/bible-study',
'info_dict': {
'id': 'bible-study',
},
'playlist_mincount': 37,
}, {
'url': 'http://god.tv/node/15097',
'only_matching': True,
}, {
'url': 'http://god.tv/live/africa',
'only_matching': True,
}, {
'url': 'http://god.tv/liveevents',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
settings = self._parse_json(
self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'settings', default='{}'),
display_id, transform_source=js_to_json, fatal=False)
ooyala_id = None
if settings:
playlist = settings.get('playlist')
if playlist and isinstance(playlist, list):
entries = [
OoyalaIE._build_url_result(video['content_id'])
for video in playlist if video.get('content_id')]
if entries:
return self.playlist_result(entries, display_id)
ooyala_id = settings.get('ooyala', {}).get('content_id')
if not ooyala_id:
ooyala_id = self._search_regex(
r'["\']content_id["\']\s*:\s*(["\'])(?P<id>[\w-]+)\1',
webpage, 'ooyala id', group='id')
return OoyalaIE._build_url_result(ooyala_id)

View File

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
@@ -46,7 +47,7 @@ class GolemIE(InfoExtractor):
continue
formats.append({
'format_id': e.tag,
'format_id': compat_str(e.tag),
'url': compat_urlparse.urljoin(self._PREFIX, url),
'height': self._int(e.get('height'), 'height'),
'width': self._int(e.get('width'), 'width'),

View File

@@ -4,26 +4,61 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
lowercase_escape,
update_url_query,
)
class GoogleDriveIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:docs|drive)\.google\.com/(?:uc\?.*?id=|file/d/)|video\.google\.com/get_player\?.*?docid=)(?P<id>[a-zA-Z0-9_-]{28,})'
_VALID_URL = r'''(?x)
https?://
(?:
(?:docs|drive)\.google\.com/
(?:
(?:uc|open)\?.*?id=|
file/d/
)|
video\.google\.com/get_player\?.*?docid=
)
(?P<id>[a-zA-Z0-9_-]{28,})
'''
_TESTS = [{
'url': 'https://drive.google.com/file/d/0ByeS4oOUV-49Zzh4R1J6R09zazQ/edit?pli=1',
'md5': 'd109872761f7e7ecf353fa108c0dbe1e',
'md5': '5c602afbbf2c1db91831f5d82f678554',
'info_dict': {
'id': '0ByeS4oOUV-49Zzh4R1J6R09zazQ',
'ext': 'mp4',
'title': 'Big Buck Bunny.mp4',
'duration': 45,
}
}, {
# video can't be watched anonymously due to view count limit reached,
# but can be downloaded (see https://github.com/rg3/youtube-dl/issues/14046)
'url': 'https://drive.google.com/file/d/0B-vUyvmDLdWDcEt4WjBqcmI2XzQ/view',
'md5': 'bfbd670d03a470bb1e6d4a257adec12e',
'info_dict': {
'id': '0B-vUyvmDLdWDcEt4WjBqcmI2XzQ',
'ext': 'mp4',
'title': 'Annabelle Creation (2017)- Z.V1 [TH].MP4',
}
}, {
# video id is longer than 28 characters
'url': 'https://drive.google.com/file/d/1ENcQ_jeCuj7y19s66_Ou9dRP4GKGsodiDQ/edit',
'info_dict': {
'id': '1ENcQ_jeCuj7y19s66_Ou9dRP4GKGsodiDQ',
'ext': 'mp4',
'title': 'Andreea Banica feat Smiley - Hooky Song (Official Video).mp4',
'duration': 189,
},
'only_matching': True,
}, {
'url': 'https://drive.google.com/open?id=0B2fjwgkl1A_CX083Tkowdmt6d28',
'only_matching': True,
}, {
'url': 'https://drive.google.com/uc?id=0B2fjwgkl1A_CX083Tkowdmt6d28',
'only_matching': True,
}]
_FORMATS_EXT = {
@@ -44,6 +79,13 @@ class GoogleDriveIE(InfoExtractor):
'46': 'webm',
'59': 'mp4',
}
_BASE_URL_CAPTIONS = 'https://drive.google.com/timedtext'
_CAPTIONS_ENTRY_TAG = {
'subtitles': 'track',
'automatic_captions': 'target',
}
_caption_formats_ext = []
_captions_xml = None
@staticmethod
def _extract_url(webpage):
@@ -53,41 +95,183 @@ class GoogleDriveIE(InfoExtractor):
if mobj:
return 'https://drive.google.com/file/d/%s' % mobj.group('id')
def _download_subtitles_xml(self, video_id, subtitles_id, hl):
if self._captions_xml:
return
self._captions_xml = self._download_xml(
self._BASE_URL_CAPTIONS, video_id, query={
'id': video_id,
'vid': subtitles_id,
'hl': hl,
'v': video_id,
'type': 'list',
'tlangs': '1',
'fmts': '1',
'vssids': '1',
}, note='Downloading subtitles XML',
errnote='Unable to download subtitles XML', fatal=False)
if self._captions_xml:
for f in self._captions_xml.findall('format'):
if f.attrib.get('fmt_code') and not f.attrib.get('default'):
self._caption_formats_ext.append(f.attrib['fmt_code'])
def _get_captions_by_type(self, video_id, subtitles_id, caption_type,
origin_lang_code=None):
if not subtitles_id or not caption_type:
return
captions = {}
for caption_entry in self._captions_xml.findall(
self._CAPTIONS_ENTRY_TAG[caption_type]):
caption_lang_code = caption_entry.attrib.get('lang_code')
if not caption_lang_code:
continue
caption_format_data = []
for caption_format in self._caption_formats_ext:
query = {
'vid': subtitles_id,
'v': video_id,
'fmt': caption_format,
'lang': (caption_lang_code if origin_lang_code is None
else origin_lang_code),
'type': 'track',
'name': '',
'kind': '',
}
if origin_lang_code is not None:
query.update({'tlang': caption_lang_code})
caption_format_data.append({
'url': update_url_query(self._BASE_URL_CAPTIONS, query),
'ext': caption_format,
})
captions[caption_lang_code] = caption_format_data
return captions
def _get_subtitles(self, video_id, subtitles_id, hl):
if not subtitles_id or not hl:
return
self._download_subtitles_xml(video_id, subtitles_id, hl)
if not self._captions_xml:
return
return self._get_captions_by_type(video_id, subtitles_id, 'subtitles')
def _get_automatic_captions(self, video_id, subtitles_id, hl):
if not subtitles_id or not hl:
return
self._download_subtitles_xml(video_id, subtitles_id, hl)
if not self._captions_xml:
return
track = self._captions_xml.find('track')
if track is None:
return
origin_lang_code = track.attrib.get('lang_code')
if not origin_lang_code:
return
return self._get_captions_by_type(
video_id, subtitles_id, 'automatic_captions', origin_lang_code)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://docs.google.com/file/d/%s' % video_id, video_id)
reason = self._search_regex(r'"reason"\s*,\s*"([^"]+)', webpage, 'reason', default=None)
if reason:
raise ExtractorError(reason)
title = self._search_regex(r'"title"\s*,\s*"([^"]+)', webpage, 'title')
title = self._search_regex(
r'"title"\s*,\s*"([^"]+)', webpage, 'title',
default=None) or self._og_search_title(webpage)
duration = int_or_none(self._search_regex(
r'"length_seconds"\s*,\s*"([^"]+)', webpage, 'length seconds', default=None))
fmt_stream_map = self._search_regex(
r'"fmt_stream_map"\s*,\s*"([^"]+)', webpage, 'fmt stream map').split(',')
fmt_list = self._search_regex(r'"fmt_list"\s*,\s*"([^"]+)', webpage, 'fmt_list').split(',')
r'"length_seconds"\s*,\s*"([^"]+)', webpage, 'length seconds',
default=None))
formats = []
for fmt, fmt_stream in zip(fmt_list, fmt_stream_map):
fmt_id, fmt_url = fmt_stream.split('|')
resolution = fmt.split('/')[1]
width, height = resolution.split('x')
formats.append({
'url': lowercase_escape(fmt_url),
'format_id': fmt_id,
'resolution': resolution,
'width': int_or_none(width),
'height': int_or_none(height),
'ext': self._FORMATS_EXT[fmt_id],
fmt_stream_map = self._search_regex(
r'"fmt_stream_map"\s*,\s*"([^"]+)', webpage,
'fmt stream map', default='').split(',')
fmt_list = self._search_regex(
r'"fmt_list"\s*,\s*"([^"]+)', webpage,
'fmt_list', default='').split(',')
if fmt_stream_map and fmt_list:
resolutions = {}
for fmt in fmt_list:
mobj = re.search(
r'^(?P<format_id>\d+)/(?P<width>\d+)[xX](?P<height>\d+)', fmt)
if mobj:
resolutions[mobj.group('format_id')] = (
int(mobj.group('width')), int(mobj.group('height')))
for fmt_stream in fmt_stream_map:
fmt_stream_split = fmt_stream.split('|')
if len(fmt_stream_split) < 2:
continue
format_id, format_url = fmt_stream_split[:2]
f = {
'url': lowercase_escape(format_url),
'format_id': format_id,
'ext': self._FORMATS_EXT[format_id],
}
resolution = resolutions.get(format_id)
if resolution:
f.update({
'width': resolution[0],
'height': resolution[1],
})
formats.append(f)
source_url = update_url_query(
'https://drive.google.com/uc', {
'id': video_id,
'export': 'download',
})
urlh = self._request_webpage(
source_url, video_id, note='Requesting source file',
errnote='Unable to request source file', fatal=False)
if urlh:
def add_source_format(src_url):
formats.append({
'url': src_url,
'ext': determine_ext(title, 'mp4').lower(),
'format_id': 'source',
'quality': 1,
})
if urlh.headers.get('Content-Disposition'):
add_source_format(source_url)
else:
confirmation_webpage = self._webpage_read_content(
urlh, url, video_id, note='Downloading confirmation page',
errnote='Unable to confirm download', fatal=False)
if confirmation_webpage:
confirm = self._search_regex(
r'confirm=([^&"\']+)', confirmation_webpage,
'confirmation code', fatal=False)
if confirm:
add_source_format(update_url_query(source_url, {
'confirm': confirm,
}))
if not formats:
reason = self._search_regex(
r'"reason"\s*,\s*"([^"]+)', webpage, 'reason', default=None)
if reason:
raise ExtractorError(reason, expected=True)
self._sort_formats(formats)
hl = self._search_regex(
r'"hl"\s*,\s*"([^"]+)', webpage, 'hl', default=None)
subtitles_id = None
ttsurl = self._search_regex(
r'"ttsurl"\s*,\s*"([^"]+)', webpage, 'ttsurl', default=None)
if ttsurl:
# the video Id for subtitles will be the last value in the ttsurl
# query string
subtitles_id = ttsurl.encode('utf-8').decode(
'unicode_escape').split('=')[-1]
return {
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'duration': duration,
'formats': formats,
'subtitles': self.extract_subtitles(video_id, subtitles_id, hl),
'automatic_captions': self.extract_automatic_captions(
video_id, subtitles_id, hl),
}

View File

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import (
determine_ext,
int_or_none,
@@ -25,6 +26,22 @@ class HeiseIE(InfoExtractor):
'description': 'md5:c934cbfb326c669c2bcabcbe3d3fcd20',
'thumbnail': r're:^https?://.*/gallery/$',
}
}, {
# YouTube embed
'url': 'http://www.heise.de/newsticker/meldung/Netflix-In-20-Jahren-vom-Videoverleih-zum-TV-Revolutionaer-3814130.html',
'md5': 'e403d2b43fea8e405e88e3f8623909f1',
'info_dict': {
'id': '6kmWbXleKW4',
'ext': 'mp4',
'title': 'NEU IM SEPTEMBER | Netflix',
'description': 'md5:2131f3c7525e540d5fd841de938bd452',
'upload_date': '20170830',
'uploader': 'Netflix Deutschland, Österreich und Schweiz',
'uploader_id': 'netflixdach',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.heise.de/ct/artikel/c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2403911.html',
'only_matching': True,
@@ -40,6 +57,16 @@ class HeiseIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta('fulltitle', webpage, default=None)
if not title or title == "c't":
title = self._search_regex(
r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
webpage, 'title')
yt_urls = YoutubeIE._extract_urls(webpage)
if yt_urls:
return self.playlist_from_matches(yt_urls, video_id, title, ie=YoutubeIE.ie_key())
container_id = self._search_regex(
r'<div class="videoplayerjw"[^>]+data-container="([0-9]+)"',
webpage, 'container ID')
@@ -47,12 +74,6 @@ class HeiseIE(InfoExtractor):
r'<div class="videoplayerjw"[^>]+data-sequenz="([0-9]+)"',
webpage, 'sequenz ID')
title = self._html_search_meta('fulltitle', webpage, default=None)
if not title or title == "c't":
title = self._search_regex(
r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
webpage, 'title')
doc = self._download_xml(
'http://www.heise.de/videout/feed', video_id, query={
'container': container_id,

View File

@@ -7,14 +7,19 @@ from .common import InfoExtractor
class HGTVComShowIE(InfoExtractor):
IE_NAME = 'hgtv.com:show'
_VALID_URL = r'https?://(?:www\.)?hgtv\.com/shows/[^/]+/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.hgtv.com/shows/flip-or-flop/flip-or-flop-full-episodes-videos',
_TESTS = [{
# data-module="video"
'url': 'http://www.hgtv.com/shows/flip-or-flop/flip-or-flop-full-episodes-season-4-videos',
'info_dict': {
'id': 'flip-or-flop-full-episodes-videos',
'id': 'flip-or-flop-full-episodes-season-4-videos',
'title': 'Flip or Flop Full Episodes',
},
'playlist_mincount': 15,
}
}, {
# data-deferred-module="video"
'url': 'http://www.hgtv.com/shows/good-bones/episodes/an-old-victorian-house-gets-a-new-facelift',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
@@ -23,7 +28,7 @@ class HGTVComShowIE(InfoExtractor):
config = self._parse_json(
self._search_regex(
r'(?s)data-module=["\']video["\'][^>]*>.*?<script[^>]+type=["\']text/x-config["\'][^>]*>(.+?)</script',
r'(?s)data-(?:deferred-)?module=["\']video["\'][^>]*>.*?<script[^>]+type=["\']text/x-config["\'][^>]*>(.+?)</script',
webpage, 'video config'),
display_id)['channels'][0]

View File

@@ -89,6 +89,11 @@ class IGNIE(InfoExtractor):
'url': 'http://me.ign.com/ar/angry-birds-2/106533/video/lrd-ldyy-lwl-lfylm-angry-birds',
'only_matching': True,
},
{
# videoId pattern
'url': 'http://www.ign.com/articles/2017/06/08/new-ducktales-short-donalds-birthday-doesnt-go-as-planned',
'only_matching': True,
},
]
def _find_video_id(self, webpage):
@@ -98,6 +103,8 @@ class IGNIE(InfoExtractor):
r'data-video-id="(.+?)"',
r'<object id="vid_(.+?)"',
r'<meta name="og:image" content=".*/(.+?)-(.+?)/.+.jpg"',
r'videoId&quot;\s*:\s*&quot;(.+?)&quot;',
r'videoId["\']\s*:\s*["\']([^"\']+?)["\']',
]
return self._search_regex(res_id, webpage, 'video id', default=None)

View File

@@ -59,12 +59,18 @@ class ITVIE(InfoExtractor):
def _add_sub_element(element, name):
return etree.SubElement(element, _add_ns(name))
production_id = (
params.get('data-video-autoplay-id') or
'%s#001' % (
params.get('data-video-episode-id') or
video_id.replace('a', '/')))
req_env = etree.Element(_add_ns('soapenv:Envelope'))
_add_sub_element(req_env, 'soapenv:Header')
body = _add_sub_element(req_env, 'soapenv:Body')
get_playlist = _add_sub_element(body, ('tem:GetPlaylist'))
request = _add_sub_element(get_playlist, 'tem:request')
_add_sub_element(request, 'itv:ProductionId').text = params['data-video-id']
_add_sub_element(request, 'itv:ProductionId').text = production_id
_add_sub_element(request, 'itv:RequestGuid').text = compat_str(uuid.uuid4()).upper()
vodcrid = _add_sub_element(request, 'itv:Vodcrid')
_add_sub_element(vodcrid, 'com:Id')

100
youtube_dl/extractor/joj.py Executable file
View File

@@ -0,0 +1,100 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
js_to_json,
try_get,
)
class JojIE(InfoExtractor):
_VALID_URL = r'''(?x)
(?:
joj:|
https?://media\.joj\.sk/embed/
)
(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})
'''
_TESTS = [{
'url': 'https://media.joj.sk/embed/a388ec4c-6019-4a4a-9312-b1bee194e932',
'info_dict': {
'id': 'a388ec4c-6019-4a4a-9312-b1bee194e932',
'ext': 'mp4',
'title': 'NOVÉ BÝVANIE',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 3118,
}
}, {
'url': 'joj:a388ec4c-6019-4a4a-9312-b1bee194e932',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//media\.joj\.sk/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://media.joj.sk/embed/%s' % video_id, video_id)
title = self._search_regex(
(r'videoTitle\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
r'<title>(?P<title>[^<]+)'), webpage, 'title',
default=None, group='title') or self._og_search_title(webpage)
bitrates = self._parse_json(
self._search_regex(
r'(?s)bitrates\s*=\s*({.+?});', webpage, 'bitrates',
default='{}'),
video_id, transform_source=js_to_json, fatal=False)
formats = []
for format_url in try_get(bitrates, lambda x: x['mp4'], list) or []:
if isinstance(format_url, compat_str):
height = self._search_regex(
r'(\d+)[pP]\.', format_url, 'height', default=None)
formats.append({
'url': format_url,
'format_id': '%sp' % height if height else None,
'height': int(height),
})
if not formats:
playlist = self._download_xml(
'https://media.joj.sk/services/Video.php?clip=%s' % video_id,
video_id)
for file_el in playlist.findall('./files/file'):
path = file_el.get('path')
if not path:
continue
format_id = file_el.get('id') or file_el.get('label')
formats.append({
'url': 'http://n16.joj.sk/storage/%s' % path.replace(
'dat/', '', 1),
'format_id': format_id,
'height': int_or_none(self._search_regex(
r'(\d+)[pP]', format_id or path, 'height',
default=None)),
})
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)
duration = int_or_none(self._search_regex(
r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@@ -65,9 +65,9 @@ class JoveIE(InfoExtractor):
webpage, 'description', fatal=False)
publish_date = unified_strdate(self._html_search_meta(
'citation_publication_date', webpage, 'publish date', fatal=False))
comment_count = self._html_search_regex(
comment_count = int(self._html_search_regex(
r'<meta name="num_comments" content="(\d+) Comments?"',
webpage, 'comment count', fatal=False)
webpage, 'comment count', fatal=False))
return {
'id': video_id,

View File

@@ -0,0 +1,149 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
unified_timestamp,
update_url_query,
)
class KakaoIE(InfoExtractor):
_VALID_URL = r'https?://tv\.kakao\.com/channel/(?P<channel>\d+)/cliplink/(?P<id>\d+)'
_API_BASE = 'http://tv.kakao.com/api/v1/ft/cliplinks'
_TESTS = [{
'url': 'http://tv.kakao.com/channel/2671005/cliplink/301965083',
'md5': '702b2fbdeb51ad82f5c904e8c0766340',
'info_dict': {
'id': '301965083',
'ext': 'mp4',
'title': '乃木坂46 バナナマン 「3期生紹介コーナーが始動顔高低差GPも」 『乃木坂工事中』',
'uploader_id': 2671005,
'uploader': '그랑그랑이',
'timestamp': 1488160199,
'upload_date': '20170227',
}
}, {
'url': 'http://tv.kakao.com/channel/2653210/cliplink/300103180',
'md5': 'a8917742069a4dd442516b86e7d66529',
'info_dict': {
'id': '300103180',
'ext': 'mp4',
'description': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)\r\n\r\n[쇼! 음악중심] 20160611, 507회',
'title': '러블리즈 - Destiny (나의 지구) (Lovelyz - Destiny)',
'uploader_id': 2653210,
'uploader': '쇼 음악중심',
'timestamp': 1485684628,
'upload_date': '20170129',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
player_header = {
'Referer': update_url_query(
'http://tv.kakao.com/embed/player/cliplink/%s' % video_id, {
'service': 'kakao_tv',
'autoplay': '1',
'profile': 'HIGH',
'wmode': 'transparent',
})
}
QUERY_COMMON = {
'player': 'monet_html5',
'referer': url,
'uuid': '',
'service': 'kakao_tv',
'section': '',
'dteType': 'PC',
}
query = QUERY_COMMON.copy()
query['fields'] = 'clipLink,clip,channel,hasPlusFriend,-service,-tagList'
impress = self._download_json(
'%s/%s/impress' % (self._API_BASE, video_id),
video_id, 'Downloading video info',
query=query, headers=player_header)
clip_link = impress['clipLink']
clip = clip_link['clip']
title = clip.get('title') or clip_link.get('displayTitle')
tid = impress.get('tid', '')
query = QUERY_COMMON.copy()
query.update({
'tid': tid,
'profile': 'HIGH',
})
raw = self._download_json(
'%s/%s/raw' % (self._API_BASE, video_id),
video_id, 'Downloading video formats info',
query=query, headers=player_header)
formats = []
for fmt in raw.get('outputList', []):
try:
profile_name = fmt['profile']
fmt_url_json = self._download_json(
'%s/%s/raw/videolocation' % (self._API_BASE, video_id),
video_id,
'Downloading video URL for profile %s' % profile_name,
query={
'service': 'kakao_tv',
'section': '',
'tid': tid,
'profile': profile_name
}, headers=player_header, fatal=False)
if fmt_url_json is None:
continue
fmt_url = fmt_url_json['url']
formats.append({
'url': fmt_url,
'format_id': profile_name,
'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')),
'format_note': fmt.get('label'),
'filesize': int_or_none(fmt.get('filesize'))
})
except KeyError:
pass
self._sort_formats(formats)
thumbs = []
for thumb in clip.get('clipChapterThumbnailList', []):
thumbs.append({
'url': thumb.get('thumbnailUrl'),
'id': compat_str(thumb.get('timeInSec')),
'preference': -1 if thumb.get('isDefault') else 0
})
top_thumbnail = clip.get('thumbnailUrl')
if top_thumbnail:
thumbs.append({
'url': top_thumbnail,
'preference': 10,
})
return {
'id': video_id,
'title': title,
'description': clip.get('description'),
'uploader': clip_link.get('channel', {}).get('name'),
'uploader_id': clip_link.get('channelId'),
'thumbnails': thumbs,
'timestamp': unified_timestamp(clip_link.get('createTime')),
'duration': int_or_none(clip.get('duration')),
'view_count': int_or_none(clip.get('playCount')),
'like_count': int_or_none(clip.get('likeCount')),
'comment_count': int_or_none(clip.get('commentCount')),
'formats': formats,
}

Some files were not shown because too many files have changed in this diff Show More