Compare commits

250 Commits

Author SHA1 Message Date
Sergey M․
c797db4a2f release 2018.06.14 2018-06-14 01:24:53 +07:00
Sergey M․
03eef0f032 [ChangeLog] Actualize
[ci skip]
2018-06-14 01:22:42 +07:00
Remita Amine
aa56061627 [discoverynetworks] Add support for disco-api videos(closes #16724) 2018-06-13 16:46:59 +01:00
Remita Amine
18d66f0410 [dailymotion] use compat_struct_pack 2018-06-13 15:12:42 +01:00
Remita Amine
f15f7a674b [dailymotion] add support for password protected videos(closes #9789) 2018-06-13 14:51:19 +01:00
Sergey M․
9aca7fe6a3 [abc:iview] Extract more series metadata 2018-06-12 20:25:50 +07:00
Remita Amine
e0671819e7 [abc] fix ABC IView extraction and add support for livestreams(closes #16704)(closes #12354) 2018-06-12 13:07:57 +01:00
Sergey M․
5d6c81b63f [downloader/http] Fix resume when writing to stdout (closes #16699) 2018-06-12 03:12:29 +07:00
Sergey M․
dc53c78634 [crackle] Add support for sonycrackle.com (closes #16698) 2018-06-12 02:06:30 +07:00
Sergey M․
7dc9c60b4b [tvnet] Fix _VALID_URL 2018-06-12 02:05:58 +07:00
Sergey M․
e51752754d [tvnet] Improve video id extraction 2018-06-12 01:50:43 +07:00
Sergey M․
0645be49cb [inc] PEP 8 2018-06-12 01:41:23 +07:00
Sergey M․
a572ae6114 [tvnet] Improve and fix issues (closes #15462) 2018-06-12 01:37:34 +07:00
Thomas van der Berg
b2df66aeca [tvnet] Add extractor 2018-06-12 01:37:29 +07:00
Sergey M․
93cffb1444 [nrk] Update API hosts and try all previously known ones (closes #16690) 2018-06-11 03:08:36 +07:00
Sergey M․
d253df2f65 [wimp] Fix Youtube embeds extraction 2018-06-11 02:40:17 +07:00
Sergey M․
e8c6afc168 release 2018.06.11 2018-06-11 01:57:30 +07:00
Sergey M․
cc37cc3f99 [ChangeLog] Actualize
[ci skip]
2018-06-11 01:55:16 +07:00
Sergey M․
9d581efe05 [npo] Extend _VALID_URL (closes #16682) 2018-06-10 00:26:16 +07:00
Sergey M․
ff2e486221 [inc] Add support for another embed schema (closes #16666) 2018-06-09 02:53:04 +07:00
Remita Amine
6ae36035d9 [tv4] fix format extraction(closes #16650) 2018-06-06 00:41:08 +01:00
Remita Amine
9afd74d705 [nexx] extract free cdn http formats 2018-06-05 01:02:46 +01:00
Sergey M․
2e6975306a [nexx] Update tests 2018-06-05 02:59:25 +07:00
Sergey M․
06ea7bdd99 [nexx] Add support for free cdn (closes #16538) 2018-06-05 02:55:54 +07:00
Sergey M․
d7be705308 [pbs] Add another cove id pattern (closes #15373) 2018-06-05 00:17:26 +07:00
Sergey M․
2e190c2ad9 [rbmaradio] Add support for 192k format (closes #16631) 2018-06-04 23:51:25 +07:00
Sergey M․
94418c8eb3 release 2018.06.04 2018-06-04 02:41:53 +07:00
Sergey M․
f7560859a3 [devscripts/update-copyright] Update copyright year 2018-06-04 02:33:54 +07:00
Sergey M․
c6c478f40d [ChangeLog] Actualize
[ci skip]
2018-06-04 02:16:33 +07:00
Sergey M․
c3023e9f2e [camtube] Add extractor 2018-06-03 17:09:20 +07:00
Sergey M․
77053237c5 [twitter:card] Generalize base API URL 2018-06-03 15:58:12 +07:00
Sergey M․
b6b2ccb72f [twitter:card] Extract guest token (closes #16609) 2018-06-03 15:57:45 +07:00
Sergey M․
0a10f50e2f [chaturbate] Use geo verification headers 2018-06-03 04:30:33 +07:00
Sergey M․
6d155707e6 [bbc] Add support for bbcthree (closes #16612) 2018-06-03 04:07:59 +07:00
Sergey M․
eb6793ba97 [youtube] Update tests 2018-06-03 02:23:45 +07:00
Sergey M․
7e72694b5e [youtube] Move metadata extraction after video availability check 2018-06-03 02:08:38 +07:00
Sergey M․
936784b272 [youtube] Extract track and artist 2018-06-03 02:05:14 +07:00
Sergey M․
003fe73ccf [safari] Add support for new URL schema (closes #16614) 2018-06-03 00:53:11 +07:00
Remita Amine
1ea559c445 [adn] fix extraction 2018-06-02 18:14:22 +01:00
Sergey M․
19e42ead9b release 2018.06.02 2018-06-02 01:51:31 +07:00
Sergey M․
73c938e460 [ChangeLog] Actualize
[ci skip]
2018-06-02 01:49:48 +07:00
Sergey M․
9b89daefa6 [facebook] Improve extraction (closes #16554) 2018-06-02 01:42:05 +07:00
Nathan Rossi
9d082e7cb8 [facebook] Add support for tahoe player videos (closes #15441)
Certain videos appear to use a newer/different player; these require a
second request for the video data, since the initial response is missing
it.

Additionally, these videos expose the uploader value in different page
content: it is stored in the `<meta property="og:title"...>` element of
the initial response.
2018-06-02 01:32:53 +07:00
Sergey M․
f20f636596 [cbc] Improve extraction (closes #16583, closes #16593) 2018-06-02 00:35:07 +07:00
Logan Fleur
b995043ab8 Ignore venv directory 2018-06-02 00:18:57 +07:00
Enes
85750f8972 [openload] Improve ext extraction 2018-06-02 00:16:22 +07:00
Sergey M․
926d97fc6b [9c9media] PEP 8 2018-06-01 05:17:49 +07:00
Sergey M․
2593725a9b [twitter:card] Add support for another endpoint (closes #16586) 2018-06-01 05:16:00 +07:00
DroidFreak32
0bfdcc1495 [openload] Add support for oload.win and oload.download 2018-05-31 22:01:44 +07:00
Remita Amine
c3f75e2454 [audimedia] fix extraction(closes #15309) 2018-05-31 12:39:45 +01:00
Remita Amine
3a8e3730c1 [francetv] add support for sport.francetvinfo.fr(closes #15645) 2018-05-31 11:40:37 +01:00
Remita Amine
acca2ac7f3 [mlb] improve extraction(closes #16587) 2018-05-31 02:50:14 +01:00
Remita Amine
128b58ad13 [nhl] remove old extractors 2018-05-31 02:49:35 +01:00
Remita Amine
4fd1437d9d [rbmaradio] check formats availability(closes #16585) 2018-05-30 17:08:32 +01:00
Sergey M․
e425710554 release 2018.05.30 2018-05-30 21:54:30 +07:00
Sergey M․
bc3143ac5e [ChangeLog] Actualize
[ci skip]
2018-05-30 21:52:33 +07:00
Remita Amine
e0d42dd4b2 [teamcoco] Fix extraction for full episodes(closes #16573) 2018-05-30 13:21:07 +01:00
Remita Amine
a07879d6b2 [spiegel] fix info extraction(#16538) 2018-05-28 00:10:46 +01:00
Sergey M․
cfd7f2a636 [apa] Add extractor (closes #15041, closes #15672) 2018-05-27 18:24:54 +07:00
Remita Amine
9c65c4a6cd [bellmedia] add support for bnnbloomberg.ca(#16560) 2018-05-27 12:11:53 +01:00
Remita Amine
c9e12a618c [9c9media] extract mpd formats and subtitles 2018-05-27 12:10:12 +01:00
Sergey M․
8882840ec5 [cammodels] Use geo verification headers 2018-05-26 22:22:58 +07:00
Sergey M․
2ce35d9f43 [cammodels] Add another error pattern 2018-05-26 22:22:58 +07:00
Sergey M․
f16f48779c [downloader/rtmp] Generalize download messages and report time elapsed on finish 2018-05-26 22:22:58 +07:00
Sergey M․
ddd8486a44 [downloader/rtmp] Gracefully handle live streams interrupted by user 2018-05-26 22:22:58 +07:00
Remita Amine
68217024e8 remove unnecessary assignment parenthesis 2018-05-26 16:13:54 +01:00
Remita Amine
ec2f3d2800 [ufctv] add support for authentication(closes #16542) 2018-05-26 16:13:54 +01:00
Sergey M․
8b1da46e8f [cammodels] Improve and simplify (closes #14499) 2018-05-26 21:25:30 +07:00
mars67857
2a49d01992 [cammodels] Add extractor 2018-05-26 21:25:21 +07:00
Remita Amine
261f47306c [utils] fix style id extraction for namespaced id attribute(closes #16551) 2018-05-26 14:38:24 +01:00
Remita Amine
c0fd20abca [soundcloud] detect format extension(closes #16549) 2018-05-26 14:38:24 +01:00
Parmjit Virk
986c0b0215 [cbc] Fix playlist title extraction (closes #16502) 2018-05-26 20:05:54 +07:00
Sergey M․
97b01144bd [tumblr] Detect and report sensitive media (closes #13829) 2018-05-26 20:00:00 +07:00
Sergey M․
56cd31f320 [tumblr] Improve authentication (closes #15133) 2018-05-26 19:59:35 +07:00
Zack Fernandes
c678192af3 [tumblr] Add support for authentication 2018-05-26 19:56:01 +07:00
Sergey M․
0934c9d4fa release 2018.05.26 2018-05-26 13:02:21 +07:00
Sergey M․
38e4e8ab80 [ChangeLog] Actualize
[ci skip]
2018-05-26 12:58:34 +07:00
Remita Amine
5a16c9d9d3 [utils] keep the original TV_PARENTAL_GUIDELINES dict 2018-05-25 23:12:50 +01:00
Petr Novák
bdbcc8eecb [dvtv] Remove dead test 2018-05-26 02:15:50 +07:00
rhhayward
9ef5cdb5cb [audiomack] Stringify video id (closes #15310) 2018-05-26 02:13:29 +07:00
Sergey M․
03fad17cb6 [izlesene] Improve extraction and fix issues (closes #16407, closes #16271) 2018-05-26 01:53:17 +07:00
Enes
f4d261b765 [izlesene] Fix extraction (closes #16233) 2018-05-26 01:53:11 +07:00
Sergey M․
aee36ca832 [indavideo] Add support for generic embeds (closes #11989) 2018-05-26 01:25:40 +07:00
Sergey M․
2a7c6befc1 [indavideo] Fix extraction (closes #11221) 2018-05-26 01:09:44 +07:00
András Veres-Szentkirályi
b39f42ee92 [indavideo] Sign download URLs 2018-05-26 00:46:05 +07:00
Sergey M․
6bd499e8ca [peertube] Add support for generic embeds 2018-05-26 00:28:30 +07:00
Sergey M․
f2fc63a5a8 [peertube] Add support for embed and API URLs 2018-05-26 00:15:38 +07:00
Sergey M․
c561b75c82 [peertube] Add extractor (closes #16301, closes #16329) 2018-05-26 00:09:15 +07:00
Jakub Wilk
3d2a643fdc [imgur] Fix extraction 2018-05-25 21:54:21 +08:00
Remita Amine
e8e58c2278 [hidive] add support for authentication(closes #16534) 2018-05-24 11:53:42 +01:00
Remita Amine
1139935db7 [nbc] add support for stream.nbcsports.com(closes #13911) 2018-05-24 02:51:47 +01:00
Remita Amine
ca0aef42d4 [viewlift] add support for hoichoi.tv(closes #16536) 2018-05-23 23:04:12 +01:00
Remita Amine
3bb3ff38a1 [test_utils] add tests for b836118724 2018-05-23 12:20:05 +01:00
Remita Amine
268e132dec [go90] extract age limit and detect drm protection(#10127) 2018-05-23 12:15:21 +01:00
Remita Amine
670dcba8c7 [viewlift] Remove rating format transformation 2018-05-23 12:13:44 +01:00
Remita Amine
b836118724 [utils] Relax TV Parental Guidelines matching 2018-05-23 12:12:20 +01:00
Remita Amine
57d6792024 [viewlift] fix extraction for snagfilms.com(closes #15766) 2018-05-23 11:27:36 +01:00
Remita Amine
b89ac53455 [globo] use compat_str 2018-05-21 17:46:52 +01:00
Remita Amine
d81ffc3aa0 [globo] Add entry for netrc authentication 2018-05-21 15:39:02 +01:00
Remita Amine
e518749300 [globo] handle login errors 2018-05-21 15:07:24 +01:00
Remita Amine
db2058f63e [globo] improve extraction(closes #4189)
- add support for authentication
- simplify url signing
- extract DASH and MSS formats
2018-05-21 14:55:50 +01:00
huichen90
5c766952dc Update leeco.py
Fixed this bug: youtube_dl.utils.ExtractorError: An extractor error has occurred. (caused by KeyError('location',))
2018-05-21 21:26:53 +08:00
Sergey M․
504f20dd30 Remove experimental mark for some options 2018-05-19 23:53:24 +07:00
Remita Amine
f2b1fa07ec [teamcoco] relax _VALID_URL regex and add a fallback for format extraction(fixes #16484) 2018-05-19 13:05:51 +01:00
Remita Amine
acd620c930 [teamcoco] improve _VALID_URL regex(#16484) 2018-05-19 12:19:05 +01:00
Remita Amine
27694fe7ad [imdb:list] fix _VALID_URL regex 2018-05-19 11:04:08 +01:00
Remita Amine
0167f0dbfe [imdb] improve extraction(fixes #4085)(fixes #14557) 2018-05-19 10:15:11 +01:00
Sergey M․
7550ea501a release 2018.05.18 2018-05-18 00:32:51 +07:00
Sergey M․
58197205d3 [ChangeLog] Actualize
[ci skip]
2018-05-18 00:30:41 +07:00
Sergey M․
361a965b5c [vimeo:likes] Relax _VALID_URL and fix single page likes extraction (closes #16475) 2018-05-17 23:21:40 +07:00
Remita Amine
a3f86160fa [pluralsight] fix clip id extraction(fixes #16460) 2018-05-17 13:46:05 +01:00
Remita Amine
1306f5ed72 [mychannels] add support for mychannels.com(closes #15334) 2018-05-16 19:11:48 +01:00
Remita Amine
58a68d8fda [moniker] Remove extractor(closes #15336) 2018-05-16 18:44:33 +01:00
Remita Amine
eea2fafcf5 [pbs] fix embed data extraction(fixes #16474) 2018-05-16 18:34:25 +01:00
Remita Amine
6843ac5b13 add support for paramountnetwork.com and bellator.com(fixes #15418) 2018-05-16 17:49:35 +01:00
Remita Amine
54fc90aabf [youtube] fix hd720 format position 2018-05-16 16:24:44 +01:00
Remita Amine
997530d9d4 [dailymotion] remove fragment part from m3u8 urls(closes #8915) 2018-05-16 12:04:24 +01:00
Remita Amine
fe3a60f040 [dreisat] improve extraction(closes #15350)
- extract all formats
- extract more format metadata
- improve format sorting
- use hls native downloader
- detect geo-restriction
- bypass geo-restriction
2018-05-16 11:30:29 +01:00
Remita Amine
7f34984e81 [dtube] Add new extractor(closes #15201) 2018-05-16 09:35:47 +01:00
Sergey M․
1e4fe5a7cc [options] Fix typo (closes #16450) 2018-05-14 23:42:33 +07:00
Sergey M․
c63ca0eef8 [youtube] Improve format filesize extraction (#16453) 2018-05-14 23:27:56 +07:00
Sergey M․
84a9fef899 [youtube] Make uploader extraction non fatal (#16444) 2018-05-13 22:49:01 +07:00
Remita Amine
4c76aa0666 [youtube] fix extraction for embed restricted live streams(fixes #16433) 2018-05-13 13:20:16 +01:00
Remita Amine
90b633f86b [nbc] improve info extraction(fixes #16440) 2018-05-13 11:31:41 +01:00
Sergey M․
07acdc5afc [twitch:clips] Sort formats 2018-05-12 12:08:54 +07:00
Sergey M․
49fa7de301 [twitch:clips] Fix extraction (closes #16429) 2018-05-11 23:21:02 +07:00
llyyr
dbd5c502ea [redditr] Relax _VALID_URL (closes #16426) 2018-05-10 23:17:23 +07:00
Sergey M․
bc5e4aa57e [mixcloud] Bypass throttling for HTTP formats (#12579, #16424) 2018-05-10 22:22:26 +07:00
Sergey M․
1344d3e169 [nickbr] Relax _VALID_URL (#13230) 2018-05-10 22:01:13 +07:00
Remita Amine
ff8889cd4d [teamcoco] fix extraction(closes #16374) 2018-05-10 08:19:56 +01:00
Sergey M․
9e18bb4c67 release 2018.05.09 2018-05-09 00:36:47 +07:00
Sergey M․
44277998ad [ChangeLog] Actualize
[ci skip]
2018-05-09 00:34:39 +07:00
Sergey M․
05108a496a [YoutubeDL] Ensure ext exists for automatic captions 2018-05-08 22:57:52 +07:00
Sergey M․
2fbd86352e [udemy] Extract asset captions 2018-05-08 22:57:01 +07:00
Sergey M․
0ce76801e8 [udemy] Extract stream URLs (closes #16372) 2018-05-08 22:33:35 +07:00
Sergey M․
789b7774a7 [businessinsider] Add extractor (closes #16387, closes #16388, closes #16389) 2018-05-06 21:58:55 +07:00
Sergey M․
660a230b2d [cloudflarestream] Add support for cloudflare streams (closes #16375) 2018-05-05 01:21:52 +07:00
Sergey M․
a90a6b54ee [watchbox] Fix extraction (closes #16356) 2018-05-02 20:43:34 +07:00
Remita Amine
3cc0d0b829 [discovery] extract Affiliate/Anonymous Auth Token from cookies(closes #14954) 2018-05-02 09:32:53 +01:00
Sergey M․
ea1f5e5dbd [itv:btcc] Add extractor (closes #16139) 2018-05-02 07:21:24 +07:00
Sergey M․
5f95927a62 Improve geo bypass mechanism
* Introduce geo bypass context
* Add ability to bypass based on IP blocks in CIDR notation
* Introduce --geo-bypass-ip-block
2018-05-02 07:20:59 +07:00
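As a sketch of what the new option enables via the embedding API (the option name matches the YoutubeDL docstring in this changeset; the CIDR block and URL are illustrative):

    from youtube_dl import YoutubeDL

    ydl_opts = {
        # Pick the faked X-Forwarded-For address from this CIDR block
        # (the block used here is only an example)
        'geo_bypass_ip_block': '198.51.100.0/24',
        'simulate': True,  # test extraction without downloading
    }
    with YoutubeDL(ydl_opts) as ydl:
        ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])

On the command line the equivalent is youtube-dl --geo-bypass-ip-block 198.51.100.0/24 <URL>.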
Sergey M․
a93ce61bd5 [tunein] Use live title for live streams (closes #16347) 2018-05-02 01:29:44 +07:00
Sergey M․
c18142da6e [itv] Improve extraction (closes #16253) 2018-05-01 22:48:08 +07:00
Sergey M․
cc42941390 release 2018.05.01 2018-05-01 03:38:57 +07:00
Sergey M․
cc5772c4f0 [ChangeLog] Actualize
[ci skip]
2018-05-01 03:30:23 +07:00
Sergey M․
c21692fa94 [kaltura] Improve iframe embeds detection (closes #16337) 2018-05-01 03:09:04 +07:00
Sergey M․
8513963468 [udemy] Extract outputs renditions (closes #16289, closes #16291, closes #16320, closes #16321, closes #16334, closes #16335) 2018-05-01 02:15:43 +07:00
Sergey M․
67ca1a8ef7 [zattoo] Improve and simplify (closes #14676) 2018-05-01 01:50:30 +07:00
Alex Seiler
4a73354586 [zattoo] Add extractor (closes #14668) 2018-05-01 01:50:07 +07:00
Sergey M․
796bf9de45 [yandexmusic] Convert release_year to int 2018-04-29 22:56:07 +07:00
Sergey M․
e5eadfa82f [udemy,xiami,yandexmusic] Override _download_webpage_handle instead of _download_webpage 2018-04-29 22:54:52 +07:00
Niklas Haas
30226342ab [youtube] Correctly disable polymer on all requests
Rather than just the ones that use the _download_webpage helper. The need
for this was made apparent by 0fe7783e, which refactored
_download_json in a way that completely avoids the use of
_download_webpage, thus breaking youtube.

Fixes #16323
2018-04-29 22:35:16 +07:00
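A minimal sketch of the resulting override pattern (the class and method names match youtube-dl's extractor code; the exact query mechanics here are an assumption for illustration):

    class YoutubeBaseInfoExtractor(InfoExtractor):
        def _download_webpage_handle(self, *args, **kwargs):
            # Every request, including those made through the new
            # _download_json_handle, funnels through this method,
            # so the flag is applied consistently.
            kwargs.setdefault('query', {})['disable_polymer'] = 'true'
            return super(YoutubeBaseInfoExtractor, self)._download_webpage_handle(
                *args, **kwargs)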
Bastian de Groot
01aec84880 [generic] Prefer enclosures over links in RSS feeds 2018-04-29 22:14:37 +07:00
Meneth32
12b0d4e0e1 [redditr] Add support for old.reddit.com URLs 2018-04-29 21:59:40 +07:00
Sergey M․
106c8c3edb [nrktv] Update API host (closes #16324) 2018-04-29 19:04:40 +07:00
Sergey M․
500a86a52e [downloader/fragment] Restart download if .ytdl file is corrupt (closes #16312) 2018-04-29 00:33:31 +07:00
Sergey M․
7dd6ab4a47 [imdb] Extract all formats (closes #16249) 2018-04-28 04:51:39 +07:00
Sergey M․
ae1c585cee [vimeo] Extract JSON LD (closes #16295) 2018-04-28 02:51:18 +07:00
Sergey M․
e7e4a6e0f9 [extractor/common] Extract interaction statistic 2018-04-28 02:48:03 +07:00
Sergey M․
6cc622327f [utils] Introduce merge_dicts 2018-04-28 02:47:17 +07:00
Sergey M․
0fe7783ece [extractor/common] Add _download_json_handle 2018-04-28 01:59:15 +07:00
Sergey M․
c84eae4f66 [funk:channel] Improve extraction (closes #16285) 2018-04-27 03:45:52 +07:00
Sergey M․
d3711b0050 [devscripts/gh-pages/generate-download.py] Use program checksum from versions.json 2018-04-25 02:14:27 +07:00
Sergey M․
b5802d69f5 release 2018.04.25 2018-04-25 01:12:40 +07:00
Sergey M․
e028d4f506 [ChangeLog] Actualize
[ci skip]
2018-04-25 01:07:37 +07:00
Sergey M․
ecb24f7c08 Credit @f2face for #16115 2018-04-25 01:07:32 +07:00
Sergey M․
95284bc281 Credit @TingPing for picarto (#15551) 2018-04-25 01:07:27 +07:00
Sergey M․
5d0fe6d23e Credit @Zopieux for #16250 2018-04-25 01:07:23 +07:00
Alexandre Macabies
76030543cd [openload] Recognize IPv6 stream URLs (closes #16137) 2018-04-25 00:49:30 +07:00
Sergey M․
0ff51adae6 [twitch] Extract is_live according to status (closes #16259) 2018-04-24 23:55:06 +07:00
Sergey M․
1cc47c6674 [utils] Fix match_str for boolean meta fields 2018-04-24 23:54:49 +07:00
Sergey M․
99036a1298 [pornflip] Relax _VALID_URL (closes #16258) 2018-04-23 04:03:11 +07:00
Sergey M․
171625469a [etonline] Remove extractor (closes #16256)
Covered by generic extractor
2018-04-23 03:17:34 +07:00
Sergey M․
af751350e8 [Makefile] Add support for pandoc 2 and disable smart extension (closes #16251)
The smart extension rewrites straight quotes as curly quotes, -- as en-dashes, and so on, which is unwanted behavior.
2018-04-23 02:50:11 +07:00
Sergey M․
2441c1aab1 [breakcom] Fix extraction (closes #16254) 2018-04-23 00:16:52 +07:00
Sergey M․
70d35d166c [youtube] Add ability to authenticate with cookies 2018-04-22 06:08:05 +07:00
Sergey M․
3853309fe2 [youtube:feed] Implement lazy playlist extraction (closes #10184) 2018-04-22 06:07:32 +07:00
Sergey M․
6cdaaf7031 [svt] Improve (closes #15809) 2018-04-22 05:34:03 +07:00
0x9fff00
488ff2dd3a [svt] Add support for TV channel live streams (Closes #15279) 2018-04-22 05:33:40 +07:00
Sergey M․
353f0bde78 [cbssports] PEP 8 2018-04-22 04:57:22 +07:00
Sergey M․
040c6296bb [ccma] Fix video extraction (closes #15931) 2018-04-22 04:55:35 +07:00
Sergey M․
a693386df1 [rentv] Improve extraction (closes #15227) 2018-04-21 23:22:30 +07:00
einstein95
4b8588fe02 [rentv] Fix extraction 2018-04-21 23:22:25 +07:00
Sergey M․
d65a48a0ef [nick] Add support for nickjr.nl (closes #16230) 2018-04-20 23:12:13 +07:00
Sergey M․
c194200277 [mofosex] Fix test 2018-04-19 22:38:31 +07:00
Sergey M․
d317973284 [extremetube] Fix metadata extraction 2018-04-19 22:36:33 +07:00
Parmjit Virk
1792bc3a06 [keezmovies] Add support for generic embeds (closes #16134) 2018-04-19 22:25:51 +07:00
Douglas Su
5a19d231ca [YoutubeDL] Fix typo in media extension compatibility checker 2018-04-19 22:21:50 +07:00
Remita Amine
d86c5167ae [nexx] extract new azure urls(closes #16223) 2018-04-19 15:48:03 +01:00
Remita Amine
b004d9bbf1 [cbssports] fix extraction(fixes #16217) 2018-04-19 15:08:17 +01:00
Sergey M․
9b3036bd2e [instagram:user] Fix extraction (closes #16119) 2018-04-18 10:12:24 +07:00
Sergey M․
e30991f920 [kaltura] Improve embeds detection (closes #16201) 2018-04-18 01:26:15 +07:00
Dan Salmon
518d5ba519 Fix some tests 2018-04-18 00:10:02 +07:00
Sergey M․
238d42cf5d [instagram:user] Fix extraction (closes #16119) 2018-04-17 22:37:50 +07:00
Remita Amine
522d6b5c96 [cbs] skip DRM asset types(fixes #16104) 2018-04-16 07:48:51 +01:00
Sergey M․
3c92fd1cd5 release 2018.04.16 2018-04-16 01:09:18 +07:00
Sergey M․
bdf7ba6f3a Set chmod 644 for all extractors 2018-04-16 01:07:21 +07:00
Sergey M․
0e6ccb3905 [ChangeLog] Actualize
[ci skip]
2018-04-16 00:56:05 +07:00
Sergey M․
c07cb68e79 [smotri:broadcast] Fix extraction (closes #16180) 2018-04-16 00:54:21 +07:00
Sergey M․
a42839e548 [picarto] Improve extraction (closes #6205, closes #12514, closes #15276, closes #15551) 2018-04-16 00:34:47 +07:00
Patrick Griffis
d6166a7602 [picarto] Add extractor 2018-04-16 00:32:15 +07:00
Sergey M․
8e41c9ad01 [vine:user] Improve extraction (closes #16190) 2018-04-15 22:46:43 +07:00
Timmy
9b5aead6aa [vine:user] Fix extraction (closes #15514) 2018-04-15 22:46:30 +07:00
Sergey M․
68ddba20ae [instagram:user] Remove User-Agent from signature (closes #16119) 2018-04-13 22:28:33 +07:00
Sergey M․
92ded33a05 [pornhub] Relax _VALID_URLs (closes #16165) 2018-04-12 04:53:45 +07:00
Sergey M․
64f03e5b4c [cbc:watch] Re-acquire device token when expired (closes #16160) 2018-04-11 23:30:19 +07:00
Ray Douglass
d783aee56a [fxnetworks] Add support for https theplatform URLs (closes #16125) 2018-04-11 20:11:24 +07:00
Sergey M․
315ab3d500 [instagram:user] Simplify signing (#16119) 2018-04-11 01:51:57 +07:00
Sergey M․
dd9aea8cbd [instagram:user] Add request signing (closes #16119) 2018-04-11 01:25:41 +07:00
Sergey M․
fce7962691 [twitch] Add support for mobile URLs (closes #16146) 2018-04-10 23:07:37 +07:00
Sergey M․
f7f9757efc release 2018.04.09 2018-04-09 01:19:27 +07:00
Sergey M․
880ed89d49 [ChangeLog] Actualize
[ci skip]
2018-04-09 01:14:47 +07:00
Sergey M․
94c3442e6a [YoutubeDL] Do not save/restore console title while simulate (closes #16103) 2018-04-09 01:04:22 +07:00
Sergey M․
069937151e [generic] Add support for tube8 embeds 2018-04-09 00:37:15 +07:00
Sergey M․
d3431dcb90 [generic] Restrict share-videos.se embeds regex to filter bogus URLs (#16115) 2018-04-09 00:25:44 +07:00
Surya Oktafendri
1fc37ca3f1 [generic] Add support for share-videos.se embeds (closes #16089) 2018-04-09 00:19:23 +07:00
Sergey M․
d04ca97616 [odnoklassniki] Improve _VALID_URL readability 2018-04-08 22:21:21 +07:00
GDR!
608c738c7d [odnoklassniki] Extend _VALID_URL (closes #16081) 2018-04-08 22:13:00 +07:00
aeph6Ee0
66b686727b [extractor/common] Relax JSON-LD context check (closes #16006) 2018-04-08 03:09:42 +07:00
Sergey M․
717ea4e14e [steam] Bypass mature content check (closes #16113) 2018-04-08 00:29:43 +07:00
Sergey M․
cae5d9705c [acast] Extract more metadata 2018-04-08 00:21:55 +07:00
Sergey M․
1c9b1a4494 [acast] Fix extraction (closes #16118) 2018-04-08 00:08:45 +07:00
Sergey M․
ff826177cc [instagram:user] Fix extraction (closes #16119) 2018-04-07 23:58:57 +07:00
Parmjit Virk
9d15be3a5b [drtuber] Fix title extraction (closes #16107) 2018-04-07 21:39:21 +07:00
Sergey M․
e2750e1437 [liveleak] Extend _VALID_URL (closes #16117) 2018-04-07 20:55:01 +07:00
Sergey M․
e944737c59 [openload] Add support for oload.xyz 2018-04-06 23:40:15 +07:00
Sergey M․
fdfb32a0dd [openload] Relax stream URL regex 2018-04-06 00:15:22 +07:00
Sergey M․
235d828b7b [openload] Fix extraction (closes #16099) 2018-04-05 23:49:15 +07:00
Sergey M․
1236ac6b0b [svtplay] Share svtplay regex 2018-04-05 00:29:13 +07:00
Sergey M․
df146eb282 [svtplay:series] Add support for season URLs 2018-04-05 00:29:08 +07:00
Sergey M․
b71bb3ba8b [svtplay:series] Improve extraction (closes #16059) 2018-04-05 00:29:02 +07:00
Mattias Wadman
fd97fa7bfc [svtplay:series] Add extractor
Related to #11130
2018-04-05 00:28:58 +07:00
Sergey M․
e8dfecb384 release 2018.04.03 2018-04-03 00:26:11 +07:00
Sergey M․
10f9caec04 [ChangeLog] Actualize
[ci skip]
2018-04-03 00:23:03 +07:00
Sergey M․
ea6679fbeb [tvnow] Fix issues, simplify and improve (closes #15837) 2018-04-03 00:08:22 +07:00
AndroKev
3acae1e031 [tvnow] Add support for shows 2018-04-03 00:06:47 +07:00
Sergey M․
8bd1df3c31 [dramafever] Fix authentication (closes #16067) 2018-04-02 22:19:42 +07:00
Sergey M․
86693c4930 [afreecatv] Use partial view only when necessary (closes #14450) 2018-04-02 00:00:45 +07:00
Sergey M․
d563fb32ba [afreecatv] Remove debug output 2018-04-01 23:07:54 +07:00
Sergey M․
e51762be19 [afreecatv] Add support for authentication (#14450) 2018-04-01 22:47:39 +07:00
kenavera
03fcde10ce [nationalgeographic] Add support for new URL schema (closes #16001) 2018-04-01 21:22:51 +07:00
Sergey M․
95a1322bc1 [bilibili] Remove debug from player params regexes 2018-04-01 02:06:58 +07:00
Parmjit Virk
0669f8fd8f [xvideos] Fix thumbnail extraction (closes #15978) 2018-03-31 23:46:08 +07:00
kenavera
0b4bbcdcb6 [medialaan] Fix vod id 2018-03-31 22:14:49 +07:00
Luca Steeb
3e78d23b57 [openload] Add support for oload.site 2018-03-30 23:25:43 +07:00
Sergey M․
190f6c936b [naver] Fix extraction (closes #16029) 2018-03-29 23:49:09 +07:00
Sergey M․
02f6ccbce3 [dramafever] Partially switch to API v5 (closes #16026) 2018-03-29 23:06:13 +07:00
Arend v. Reinersdorff
5d60b99717 [options] Mention comments support in --batch-file 2018-03-27 22:25:29 +07:00
xofe
9e6a418015 [abc:iview] Unescape title and series meta fields 2018-03-27 22:08:40 +07:00
Attila-Mihaly Balazs
99c3091850 [videa] Extend _VALID_URL 2018-03-27 22:02:04 +07:00
164 changed files with 5153 additions and 2616 deletions

.github/ISSUE_TEMPLATE.md

@@ -6,8 +6,8 @@
---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.03.26.1*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.03.26.1**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.06.14*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.06.14**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2018.03.26.1
+[debug] youtube-dl version 2018.06.14
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

.gitignore

@@ -47,3 +47,4 @@ youtube-dl.zsh
*.iml
tmp/
venv/

AUTHORS

@@ -236,3 +236,6 @@ Lei Wang
Petr Novák
Leonardo Taccari
Martin Weinelt
Surya Oktafendri
TingPing
Alexandre Macabies

ChangeLog

@@ -1,3 +1,270 @@
version 2018.06.14
Core
* [downloader/http] Fix retry on error when streaming to stdout (#16699)
Extractors
+ [discoverynetworks] Add support for disco-api videos (#16724)
+ [dailymotion] Add support for password protected videos (#9789)
+ [abc:iview] Add support for livestreams (#12354)
* [abc:iview] Fix extraction (#16704)
+ [crackle] Add support for sonycrackle.com (#16698)
+ [tvnet] Add support for tvnet.gov.vn (#15462)
* [nrk] Update API hosts and try all previously known ones (#16690)
* [wimp] Fix Youtube embeds extraction
version 2018.06.11
Extractors
* [npo] Extend URL regular expression and add support for npostart.nl (#16682)
+ [inc] Add support for another embed schema (#16666)
* [tv4] Fix format extraction (#16650)
+ [nexx] Add support for free cdn (#16538)
+ [pbs] Add another cove id pattern (#15373)
+ [rbmaradio] Add support for 192k format (#16631)
version 2018.06.04
Extractors
+ [camtube] Add support for camtube.co
+ [twitter:card] Extract guest token (#16609)
+ [chaturbate] Use geo verification headers
+ [bbc] Add support for bbcthree (#16612)
* [youtube] Move metadata extraction after video availability check
+ [youtube] Extract track and artist
+ [safari] Add support for new URL schema (#16614)
* [adn] Fix extraction
version 2018.06.02
Core
* [utils] Improve determine_ext
Extractors
+ [facebook] Add support for tahoe player videos (#15441, #16554)
* [cbc] Improve extraction (#16583, #16593)
* [openload] Improve ext extraction (#16595)
+ [twitter:card] Add support for another endpoint (#16586)
+ [openload] Add support for oload.win and oload.download (#16592)
* [audimedia] Fix extraction (#15309)
+ [francetv] Add support for sport.francetvinfo.fr (#15645)
* [mlb] Improve extraction (#16587)
- [nhl] Remove old extractors
* [rbmaradio] Check formats availability (#16585)
version 2018.05.30
Core
* [downloader/rtmp] Generalize download messages and report time elapsed
on finish
* [downloader/rtmp] Gracefully handle live streams interrupted by user
Extractors
* [teamcoco] Fix extraction for full episodes (#16573)
* [spiegel] Fix info extraction (#16538)
+ [apa] Add support for apa.at (#15041, #15672)
+ [bellmedia] Add support for bnnbloomberg.ca (#16560)
+ [9c9media] Extract MPD formats and subtitles
* [cammodels] Use geo verification headers
+ [ufctv] Add support for authentication (#16542)
+ [cammodels] Add support for cammodels.com (#14499)
* [utils] Fix style id extraction for namespaced id attribute in dfxp2srt
(#16551)
* [soundcloud] Detect format extension (#16549)
* [cbc] Fix playlist title extraction (#16502)
+ [tumblr] Detect and report sensitive media (#13829)
+ [tumblr] Add support for authentication (#15133)
version 2018.05.26
Core
* [utils] Improve parse_age_limit
Extractors
* [audiomack] Stringify video id (#15310)
* [izlesene] Fix extraction (#16233, #16271, #16407)
+ [indavideo] Add support for generic embeds (#11989)
* [indavideo] Fix extraction (#11221)
* [indavideo] Sign download URLs (#16174)
+ [peertube] Add support for PeerTube based sites (#16301, #16329)
* [imgur] Fix extraction (#16537)
+ [hidive] Add support for authentication (#16534)
+ [nbc] Add support for stream.nbcsports.com (#13911)
+ [viewlift] Add support for hoichoi.tv (#16536)
* [go90] Extract age limit and detect DRM protection (#10127)
* [viewlift] Fix extraction for snagfilms.com (#15766)
* [globo] Improve extraction (#4189)
* Add support for authentication
* Simplify URL signing
* Extract DASH and MSS formats
* [leeco] Fix extraction (#16464)
* [teamcoco] Add fallback for format extraction (#16484)
* [teamcoco] Improve URL regular expression (#16484)
* [imdb] Improve extraction (#4085, #14557)
version 2018.05.18
Extractors
* [vimeo:likes] Relax URL regular expression and fix single page likes
extraction (#16475)
* [pluralsight] Fix clip id extraction (#16460)
+ [mychannels] Add support for mychannels.com (#15334)
- [moniker] Remove extractor (#15336)
* [pbs] Fix embed data extraction (#16474)
+ [mtv] Add support for paramountnetwork.com and bellator.com (#15418)
* [youtube] Fix hd720 format position
* [dailymotion] Remove fragment part from m3u8 URLs (#8915)
* [3sat] Improve extraction (#15350)
* Extract all formats
* Extract more format metadata
* Improve format sorting
* Use hls native downloader
* Detect and bypass geo-restriction
+ [dtube] Add support for d.tube (#15201)
* [options] Fix typo (#16450)
* [youtube] Improve format filesize extraction (#16453)
* [youtube] Make uploader extraction non fatal (#16444)
* [youtube] Fix extraction for embed restricted live streams (#16433)
* [nbc] Improve info extraction (#16440)
* [twitch:clips] Fix extraction (#16429)
* [redditr] Relax URL regular expression (#16426, #16427)
* [mixcloud] Bypass throttling for HTTP formats (#12579, #16424)
+ [nick] Add support for nickjr.de (#13230)
* [teamcoco] Fix extraction (#16374)
version 2018.05.09
Core
* [YoutubeDL] Ensure ext exists for automatic captions
* Introduce --geo-bypass-ip-block
Extractors
+ [udemy] Extract asset captions
+ [udemy] Extract stream URLs (#16372)
+ [businessinsider] Add support for businessinsider.com (#16387, #16388, #16389)
+ [cloudflarestream] Add support for cloudflarestream.com (#16375)
* [watchbox] Fix extraction (#16356)
* [discovery] Extract Affiliate/Anonymous Auth Token from cookies (#14954)
+ [itv:btcc] Add support for itv.com/btcc (#16139)
* [tunein] Use live title for live streams (#16347)
* [itv] Improve extraction (#16253)
version 2018.05.01
Core
* [downloader/fragment] Restart download if .ytdl file is corrupt (#16312)
+ [extractor/common] Extract interaction statistic
+ [utils] Add merge_dicts
+ [extractor/common] Add _download_json_handle
Extractors
* [kaltura] Improve iframe embeds detection (#16337)
+ [udemy] Extract outputs renditions (#16289, #16291, #16320, #16321, #16334,
#16335)
+ [zattoo] Add support for zattoo.com and mobiltv.quickline.com (#14668, #14676)
* [yandexmusic] Convert release_year to int
* [udemy] Override _download_webpage_handle instead of _download_webpage
* [xiami] Override _download_webpage_handle instead of _download_webpage
* [yandexmusic] Override _download_webpage_handle instead of _download_webpage
* [youtube] Correctly disable polymer on all requests (#16323, #16326)
* [generic] Prefer enclosures over links in RSS feeds (#16189)
+ [redditr] Add support for old.reddit.com URLs (#16274)
* [nrktv] Update API host (#16324)
+ [imdb] Extract all formats (#16249)
+ [vimeo] Extract JSON-LD (#16295)
* [funk:channel] Improve extraction (#16285)
version 2018.04.25
Core
* [utils] Fix match_str for boolean meta fields
+ [Makefile] Add support for pandoc 2 and disable smart extension (#16251)
* [YoutubeDL] Fix typo in media extension compatibility checker (#16215)
Extractors
+ [openload] Recognize IPv6 stream URLs (#16136, #16137, #16205, #16246,
#16250)
+ [twitch] Extract is_live according to status (#16259)
* [pornflip] Relax URL regular expression (#16258)
- [etonline] Remove extractor (#16256)
* [breakcom] Fix extraction (#16254)
+ [youtube] Add ability to authenticate with cookies
* [youtube:feed] Implement lazy playlist extraction (#10184)
+ [svt] Add support for TV channel live streams (#15279, #15809)
* [ccma] Fix video extraction (#15931)
* [rentv] Fix extraction (#15227)
+ [nick] Add support for nickjr.nl (#16230)
* [extremetube] Fix metadata extraction
+ [keezmovies] Add support for generic embeds (#16134, #16154)
* [nexx] Extract new azure URLs (#16223)
* [cbssports] Fix extraction (#16217)
* [kaltura] Improve embeds detection (#16201)
* [instagram:user] Fix extraction (#16119)
* [cbs] Skip DRM asset types (#16104)
version 2018.04.16
Extractors
* [smotri:broadcast] Fix extraction (#16180)
+ [picarto] Add support for picarto.tv (#6205, #12514, #15276, #15551)
* [vine:user] Fix extraction (#15514, #16190)
* [pornhub] Relax URL regular expression (#16165)
* [cbc:watch] Re-acquire device token when expired (#16160)
+ [fxnetworks] Add support for https theplatform URLs (#16125, #16157)
+ [instagram:user] Add request signing (#16119)
+ [twitch] Add support for mobile URLs (#16146)
version 2018.04.09
Core
* [YoutubeDL] Do not save/restore console title while simulate (#16103)
* [extractor/common] Relax JSON-LD context check (#16006)
Extractors
+ [generic] Add support for tube8 embeds
+ [generic] Add support for share-videos.se embeds (#16089, #16115)
* [odnoklassniki] Extend URL regular expression (#16081)
* [steam] Bypass mature content check (#16113)
+ [acast] Extract more metadata
* [acast] Fix extraction (#16118)
* [instagram:user] Fix extraction (#16119)
* [drtuber] Fix title extraction (#16107, #16108)
* [liveleak] Extend URL regular expression (#16117)
+ [openload] Add support for oload.xyz
* [openload] Relax stream URL regular expression
* [openload] Fix extraction (#16099)
+ [svtplay:series] Add support for season URLs
+ [svtplay:series] Add support for series (#11130, #16059)
version 2018.04.03
Extractors
+ [tvnow] Add support for shows (#15837)
* [dramafever] Fix authentication (#16067)
* [afreecatv] Use partial view only when necessary (#14450)
+ [afreecatv] Add support for authentication (#14450)
+ [nationalgeographic] Add support for new URL schema (#16001, #16054)
* [xvideos] Fix thumbnail extraction (#15978, #15979)
* [medialaan] Fix vod id (#16038)
+ [openload] Add support for oload.site (#16039)
* [naver] Fix extraction (#16029)
* [dramafever] Partially switch to API v5 (#16026)
* [abc:iview] Unescape title and series meta fields (#15994)
* [videa] Extend URL regular expression (#16003)
version 2018.03.26.1
Core

Makefile

@@ -14,6 +14,9 @@ PYTHON ?= /usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)
# set markdown input format to "markdown-smart" for pandoc version 2 and to "markdown" for pandoc prior to version 2
MARKDOWN = $(shell if [ `pandoc -v | head -n1 | cut -d" " -f2 | head -c1` = "2" ]; then echo markdown-smart; else echo markdown; fi)
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
install -d $(DESTDIR)$(BINDIR)
install -m 755 youtube-dl $(DESTDIR)$(BINDIR)
@@ -82,11 +85,11 @@ supportedsites:
$(PYTHON) devscripts/make_supportedsites.py docs/supportedsites.md
README.txt: README.md
-pandoc -f markdown -t plain README.md -o README.txt
+pandoc -f $(MARKDOWN) -t plain README.md -o README.txt
youtube-dl.1: README.md
$(PYTHON) devscripts/prepare_manpage.py youtube-dl.1.temp.md
-pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
+pandoc -s -f $(MARKDOWN) -t man youtube-dl.1.temp.md -o youtube-dl.1
rm -f youtube-dl.1.temp.md
youtube-dl.bash-completion: youtube_dl/*.py youtube_dl/*/*.py devscripts/bash-completion.in

README.md

@@ -93,8 +93,8 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
## Network Options:
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy.
-To enable experimental SOCKS proxy, specify
-a proper scheme. For example
+To enable SOCKS proxy, specify a proper
+scheme. For example
socks5://127.0.0.1:1080/. Pass in an empty
string (--proxy "") for direct connection
--socket-timeout SECONDS Time to wait before giving up, in seconds
@@ -106,16 +106,18 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--geo-verification-proxy URL Use this proxy to verify the IP address for
some geo-restricted sites. The default
proxy specified by --proxy (or none, if the
-options is not present) is used for the
+option is not present) is used for the
actual downloading.
--geo-bypass Bypass geographic restriction via faking
-X-Forwarded-For HTTP header (experimental)
+X-Forwarded-For HTTP header
--no-geo-bypass Do not bypass geographic restriction via
faking X-Forwarded-For HTTP header
-(experimental)
--geo-bypass-country CODE Force bypass geographic restriction with
explicitly provided two-letter ISO 3166-2
-country code (experimental)
+country code
+--geo-bypass-ip-block IP_BLOCK Force bypass geographic restriction with
+explicitly provided IP block in CIDR
+notation
## Video Selection:
--playlist-start NUMBER Playlist video to start at (default is 1)
@@ -206,7 +208,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--playlist-reverse Download playlist videos in reverse order
--playlist-random Download playlist videos in random order
--xattr-set-filesize Set file xattribute ytdl.filesize with
-expected file size (experimental)
+expected file size
--hls-prefer-native Use the native HLS downloader instead of
ffmpeg
--hls-prefer-ffmpeg Use ffmpeg instead of the native HLS
@@ -223,7 +225,9 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
## Filesystem Options:
-a, --batch-file FILE File containing URLs to download ('-' for
-stdin)
+stdin), one URL per line. Lines starting
+with '#', ';' or ']' are considered as
+comments and ignored.
--id Use only video ID in file name
-o, --output TEMPLATE Output filename template, see the "OUTPUT
TEMPLATE" for all the info

devscripts/gh-pages/generate-download.py

@@ -1,27 +1,22 @@
#!/usr/bin/env python3
from __future__ import unicode_literals
-import hashlib
-import urllib.request
import json
versions_info = json.load(open('update/versions.json'))
version = versions_info['latest']
-URL = versions_info['versions'][version]['bin'][0]
-data = urllib.request.urlopen(URL).read()
+version_dict = versions_info['versions'][version]
# Read template page
with open('download.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read()
-sha256sum = hashlib.sha256(data).hexdigest()
template = template.replace('@PROGRAM_VERSION@', version)
-template = template.replace('@PROGRAM_URL@', URL)
-template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
-template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0])
-template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1])
-template = template.replace('@TAR_URL@', versions_info['versions'][version]['tar'][0])
-template = template.replace('@TAR_SHA256SUM@', versions_info['versions'][version]['tar'][1])
+template = template.replace('@PROGRAM_URL@', version_dict['bin'][0])
+template = template.replace('@PROGRAM_SHA256SUM@', version_dict['bin'][1])
+template = template.replace('@EXE_URL@', version_dict['exe'][0])
+template = template.replace('@EXE_SHA256SUM@', version_dict['exe'][1])
+template = template.replace('@TAR_URL@', version_dict['tar'][0])
+template = template.replace('@TAR_SHA256SUM@', version_dict['tar'][1])
with open('download.html', 'w', encoding='utf-8') as dlf:
dlf.write(template)
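From the indexing in the rewritten script ([0] is a download URL, [1] its SHA256), the relevant slice of update/versions.json must look roughly like this (a sketch; all values are placeholders):

    {
        "latest": "2018.06.14",
        "versions": {
            "2018.06.14": {
                "bin": ["https://yt-dl.org/.../youtube-dl", "<sha256>"],
                "exe": ["https://yt-dl.org/.../youtube-dl.exe", "<sha256>"],
                "tar": ["https://yt-dl.org/.../youtube-dl.tar.gz", "<sha256>"]
            }
        }
    }

The point of the change: the published checksums now come straight from this file instead of being recomputed from a freshly downloaded binary.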

devscripts/gh-pages/update-copyright.py

@@ -13,7 +13,7 @@ year = str(datetime.datetime.now().year)
for fn in glob.glob('*.html*'):
with io.open(fn, encoding='utf-8') as f:
content = f.read()
-newc = re.sub(r'(?P<copyright>Copyright © 2006-)(?P<year>[0-9]{4})', 'Copyright © 2006-' + year, content)
+newc = re.sub(r'(?P<copyright>Copyright © 2011-)(?P<year>[0-9]{4})', 'Copyright © 2011-' + year, content)
if content != newc:
tmpFn = fn + '.part'
with io.open(tmpFn, 'wt', encoding='utf-8') as outf:

docs/supportedsites.md

@@ -15,7 +15,6 @@
- **8tracks**
- **91porn**
- **9c9media**
- **9c9media:stack**
- **9gag**
- **9now.com.au**
- **abc.net.au**
@@ -48,6 +47,7 @@
- **anitube.se**
- **Anvato**
- **AnySex**
- **APA**
- **Aparat**
- **AppleConnect**
- **AppleDaily**: 臺灣蘋果日報
@@ -100,6 +100,7 @@
- **Beatport**
- **Beeg**
- **BehindKink**
- **Bellator**
- **BellMedia**
- **Bet**
- **Bigflix**
@@ -122,10 +123,13 @@
- **BRMediathek**: Bayerischer Rundfunk Mediathek
- **bt:article**: Bergens Tidende Articles
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BusinessInsider**
- **BuzzFeed**
- **BYUtv**
- **Camdemy**
- **CamdemyFolder**
- **CamModels**
- **CamTube**
- **CamWithHer**
- **canalc2.tv**
- **Canalplus**: mycanal.fr and piwiplus.fr
@@ -163,6 +167,7 @@
- **ClipRs**
- **Clipsyndicate**
- **CloserToTruth**
- **CloudflareStream**
- **cloudtime**: CloudTime
- **Cloudy**
- **Clubic**
@@ -232,6 +237,7 @@
- **DrTuber**
- **drtv**
- **drtv:live**
- **DTube**
- **Dumpert**
- **dvtv**: http://video.aktualne.cz/
- **dw**
@@ -257,7 +263,6 @@
- **ESPN**
- **ESPNArticle**
- **EsriVideo**
- **ETOnline**
- **Europa**
- **EveryonesMixtape**
- **ExpoTV**
@@ -362,7 +367,6 @@
- **ImgurAlbum**
- **Ina**
- **Inc**
- **Indavideo**
- **IndavideoEmbed**
- **InfoQ**
- **Instagram**
@@ -374,6 +378,7 @@
- **Ir90Tv**
- **ITTF**
- **ITV**
- **ITVBTCC**
- **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV
@@ -446,7 +451,6 @@
- **mailru**: Видео@Mail.Ru
- **mailru:music**: Музыка@Mail.Ru
- **mailru:music:search**: Музыка@Mail.Ru
- **MakersChannel**
- **MakerTV**
- **mangomolo:live**
- **mangomolo:video**
@@ -484,7 +488,6 @@
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **Mojvideo**
- **Moniker**: allmyvideos.net and vidspot.net
- **Morningstar**: morningstar.com
- **Motherless**
- **MotherlessGroup**
@@ -506,6 +509,7 @@
- **mva:course**: Microsoft Virtual Academy courses
- **Mwave**
- **MwaveMeetGreet**
- **MyChannels**
- **MySpace**
- **MySpace:album**
- **MySpass**
@@ -523,6 +527,7 @@
- **nbcolympics**
- **nbcolympics:stream**
- **NBCSports**
- **NBCSportsStream**
- **NBCSportsVPlayer**
- **ndr**: NDR.de - Norddeutscher Rundfunk
- **ndr:embed**
@@ -549,9 +554,6 @@
- **nfl.com**
- **NhkVod**
- **nhl.com**
- **nhl.com:news**: NHL news
- **nhl.com:videocenter**
- **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com**
- **nick.de**
- **nickelodeon:br**
@@ -616,11 +618,13 @@
- **PacktPubCourse**
- **PandaTV**: 熊猫TV
- **pandora.tv**: 판도라TV
- **ParamountNetwork**
- **parliamentlive.tv**: UK parliament videos
- **Patreon**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag**
- **PearVideo**
- **PeerTube**
- **People**
- **PerformGroup**
- **periscope**: Periscope
@@ -628,6 +632,8 @@
- **PhilharmonieDeParis**: Philharmonie de Paris
- **phoenix.de**
- **Photobucket**
- **Picarto**
- **PicartoVod**
- **Piksel**
- **Pinkbike**
- **Pladform**
@@ -666,6 +672,8 @@
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- **Quickline**
- **QuicklineLive**
- **R7**
- **R7Article**
- **radio.de**
@@ -783,7 +791,7 @@
- **Spiegel**
- **Spiegel:Article**: Articles on spiegel.de
- **Spiegeltv**
- **Spike**
- **sport.francetvinfo.fr**
- **Sport5**
- **SportBoxEmbed**
- **SportDeutschland**
@@ -804,6 +812,7 @@
- **SunPorno**
- **SVT**
- **SVTPlay**: SVT Play and Öppet arkiv
- **SVTSeries**
- **SWRMediathek**
- **Syfy**
- **SztvHu**
@@ -884,9 +893,11 @@
- **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com**
- **TVN24**
- **TVNet**
- **TVNoe**
- **TVNow**
- **TVNowList**
- **TVNowShow**
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series**
@@ -1089,6 +1100,8 @@
- **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
- **Zapiks**
- **Zaq1**
- **Zattoo**
- **ZattooLive**
- **ZDF**
- **ZDFChannel**
- **zingmp3**: mp3.zing.vn

setup.cfg

@@ -2,5 +2,5 @@
universal = True
[flake8]
-exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git
+exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git,venv
ignore = E402,E501,E731,E741

test/test_subtitles.py

@@ -232,7 +232,7 @@ class TestNPOSubtitles(BaseTestSubtitles):
class TestMTVSubtitles(BaseTestSubtitles):
-url = 'http://www.cc.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother'
+url = 'http://www.cc.com/video-clips/p63lk0/adam-devine-s-house-party-chasing-white-swans'
IE = ComedyCentralIE
def getInfoDict(self):
@@ -243,7 +243,7 @@ class TestMTVSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), set(['en']))
-self.assertEqual(md5(subtitles['en']), 'b9f6ca22a6acf597ec76f61749765e65')
+self.assertEqual(md5(subtitles['en']), '78206b8d8a0cfa9da64dc026eea48961')
class TestNRKSubtitles(BaseTestSubtitles):

test/test_utils.py

@@ -42,6 +42,7 @@ from youtube_dl.utils import (
is_html,
js_to_json,
limit_length,
merge_dicts,
mimetype2ext,
month_by_name,
multipart_encode,
@@ -360,6 +361,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(determine_ext('http://example.com/foo/bar.nonext/?download', None), None)
self.assertEqual(determine_ext('http://example.com/foo/bar/mp4?download', None), None)
self.assertEqual(determine_ext('http://example.com/foo/bar.m3u8//?download'), 'm3u8')
self.assertEqual(determine_ext('foobar', None), None)
def test_find_xpath_attr(self):
testxml = '''<root>
@@ -518,6 +520,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_age_limit('PG-13'), 13)
self.assertEqual(parse_age_limit('TV-14'), 14)
self.assertEqual(parse_age_limit('TV-MA'), 17)
self.assertEqual(parse_age_limit('TV14'), 14)
self.assertEqual(parse_age_limit('TV_G'), 0)
def test_parse_duration(self):
self.assertEqual(parse_duration(None), None)
@@ -669,6 +673,17 @@ class TestUtil(unittest.TestCase):
self.assertEqual(dict_get(d, ('b', 'c', key, )), None)
self.assertEqual(dict_get(d, ('b', 'c', key, ), skip_false_values=False), false_value)
def test_merge_dicts(self):
self.assertEqual(merge_dicts({'a': 1}, {'b': 2}), {'a': 1, 'b': 2})
self.assertEqual(merge_dicts({'a': 1}, {'a': 2}), {'a': 1})
self.assertEqual(merge_dicts({'a': 1}, {'a': None}), {'a': 1})
self.assertEqual(merge_dicts({'a': 1}, {'a': ''}), {'a': 1})
self.assertEqual(merge_dicts({'a': 1}, {}), {'a': 1})
self.assertEqual(merge_dicts({'a': None}, {'a': 1}), {'a': 1})
self.assertEqual(merge_dicts({'a': ''}, {'a': 1}), {'a': ''})
self.assertEqual(merge_dicts({'a': ''}, {'a': 'abc'}), {'a': 'abc'})
self.assertEqual(merge_dicts({'a': None}, {'a': ''}, {'a': 'abc'}), {'a': 'abc'})
def test_encode_compat_str(self):
self.assertEqual(encode_compat_str(b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82', 'utf-8'), 'тест')
self.assertEqual(encode_compat_str('тест', 'utf-8'), 'тест')
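The merge_dicts tests above pin down the helper's semantics: dicts merge left to right, None values never win, and an empty string is only displaced by a non-empty string. A quick usage sketch:

    from youtube_dl.utils import merge_dicts

    info = merge_dicts({'title': ''}, {'title': None}, {'title': 'fallback'})
    assert info == {'title': 'fallback'}  # first non-empty string wins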
@@ -1072,6 +1087,18 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
self.assertFalse(match_str(
'like_count > 100 & dislike_count <? 50 & description',
{'like_count': 190, 'dislike_count': 10}))
self.assertTrue(match_str('is_live', {'is_live': True}))
self.assertFalse(match_str('is_live', {'is_live': False}))
self.assertFalse(match_str('is_live', {'is_live': None}))
self.assertFalse(match_str('is_live', {}))
self.assertFalse(match_str('!is_live', {'is_live': True}))
self.assertTrue(match_str('!is_live', {'is_live': False}))
self.assertTrue(match_str('!is_live', {'is_live': None}))
self.assertTrue(match_str('!is_live', {}))
self.assertTrue(match_str('title', {'title': 'abc'}))
self.assertTrue(match_str('title', {'title': ''}))
self.assertFalse(match_str('!title', {'title': 'abc'}))
self.assertFalse(match_str('!title', {'title': ''}))
def test_parse_dfxp_time_expr(self):
self.assertEqual(parse_dfxp_time_expr(None), None)
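The match_str additions above pin down how boolean and presence-only terms behave; match_str is what --match-filter feeds into (via match_filter_func in utils.py). In short:

    from youtube_dl.utils import match_str

    # 'is_live' requires a true value; '!is_live' accepts False, None or absence
    assert match_str('is_live', {'is_live': True})
    assert not match_str('is_live', {})
    assert match_str('!is_live', {})
    # a bare field name tests presence only, so an empty title still matches
    assert match_str('title', {'title': ''})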

test/test_youtube_lists.py

@@ -61,7 +61,7 @@ class TestYoutubeLists(unittest.TestCase):
dl = FakeYDL()
dl.params['extract_flat'] = True
ie = YoutubePlaylistIE(dl)
-result = ie.extract('https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
+result = ie.extract('https://www.youtube.com/playlist?list=PL-KKIb8rvtMSrAO9YFbeM6UQrAqoFTUWv')
self.assertIsPlaylist(result)
for entry in result['entries']:
self.assertTrue(entry.get('title'))

youtube_dl/YoutubeDL.py

@@ -211,7 +211,7 @@ class YoutubeDL(object):
At the moment, this is only supported by YouTube.
proxy: URL of the proxy server to use
geo_verification_proxy: URL of the proxy to use for IP address verification
-on geo-restricted sites. (Experimental)
+on geo-restricted sites.
socket_timeout: Time to wait for unresponsive hosts, in seconds
bidi_workaround: Work around buggy terminals without bidirectional text
support, using fridibi
@@ -259,7 +259,7 @@ class YoutubeDL(object):
- "warn": only emit a warning
- "detect_or_warn": check whether we can do anything
about it, warn otherwise (default)
-source_address: (Experimental) Client-side IP address to bind to.
+source_address: Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the
youtube-dl servers for debugging.
sleep_interval: Number of seconds to sleep before each download when
@@ -281,11 +281,14 @@ class YoutubeDL(object):
match_filter_func in utils.py is one example for this.
no_color: Do not emit color codes in output.
geo_bypass: Bypass geographic restriction via faking X-Forwarded-For
-HTTP header (experimental)
+HTTP header
geo_bypass_country:
Two-letter ISO 3166-2 country code that will be used for
explicit geographic restriction bypassing via faking
-X-Forwarded-For HTTP header (experimental)
+X-Forwarded-For HTTP header
+geo_bypass_ip_block:
+IP range in CIDR notation that will be used similarly to
+geo_bypass_country
The following options determine which downloader is picked:
external_downloader: Executable of the external downloader to call.
@@ -532,6 +535,8 @@ class YoutubeDL(object):
def save_console_title(self):
if not self.params.get('consoletitle', False):
return
if self.params.get('simulate', False):
return
if compat_os_name != 'nt' and 'TERM' in os.environ:
# Save the title on stack
self._write_string('\033[22;0t', self._screen_file)
@@ -539,6 +544,8 @@ class YoutubeDL(object):
def restore_console_title(self):
if not self.params.get('consoletitle', False):
return
if self.params.get('simulate', False):
return
if compat_os_name != 'nt' and 'TERM' in os.environ:
# Restore the title from stack
self._write_string('\033[23;0t', self._screen_file)
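The escape sequences saved and restored here are xterm window-manipulation controls: CSI 22;0 t pushes the icon and window title onto the terminal's title stack, and CSI 23;0 t pops them back. In isolation:

    import sys

    sys.stdout.write('\033[22;0t')  # save the current title on the stack
    sys.stdout.write('\033[23;0t')  # restore it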
@@ -1475,23 +1482,28 @@ class YoutubeDL(object):
if info_dict.get('%s_number' % field) is not None and not info_dict.get(field):
info_dict[field] = '%s %d' % (field.capitalize(), info_dict['%s_number' % field])
+for cc_kind in ('subtitles', 'automatic_captions'):
+cc = info_dict.get(cc_kind)
+if cc:
+for _, subtitle in cc.items():
+for subtitle_format in subtitle:
+if subtitle_format.get('url'):
+subtitle_format['url'] = sanitize_url(subtitle_format['url'])
+if subtitle_format.get('ext') is None:
+subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
+automatic_captions = info_dict.get('automatic_captions')
subtitles = info_dict.get('subtitles')
-if subtitles:
-for _, subtitle in subtitles.items():
-for subtitle_format in subtitle:
-if subtitle_format.get('url'):
-subtitle_format['url'] = sanitize_url(subtitle_format['url'])
-if subtitle_format.get('ext') is None:
-subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
if self.params.get('listsubtitles', False):
if 'automatic_captions' in info_dict:
-self.list_subtitles(info_dict['id'], info_dict.get('automatic_captions'), 'automatic captions')
+self.list_subtitles(
+info_dict['id'], automatic_captions, 'automatic captions')
self.list_subtitles(info_dict['id'], subtitles, 'subtitles')
return
info_dict['requested_subtitles'] = self.process_subtitles(
-info_dict['id'], subtitles,
-info_dict.get('automatic_captions'))
+info_dict['id'], subtitles, automatic_captions)
# We now pick which formats have to be downloaded
if info_dict.get('formats') is None:
@@ -1849,7 +1861,7 @@ class YoutubeDL(object):
def compatible_formats(formats):
video, audio = formats
# Check extension
video_ext, audio_ext = audio.get('ext'), video.get('ext')
video_ext, audio_ext = video.get('ext'), audio.get('ext')
if video_ext and audio_ext:
COMPATIBLE_EXTS = (
('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'ismv', 'isma'),

View File

@@ -430,6 +430,7 @@ def _real_main(argv=None):
'config_location': opts.config_location,
'geo_bypass': opts.geo_bypass,
'geo_bypass_country': opts.geo_bypass_country,
'geo_bypass_ip_block': opts.geo_bypass_ip_block,
# just for deprecation check
'autonumber': opts.autonumber if opts.autonumber is True else None,
'usetitle': opts.usetitle if opts.usetitle is True else None,

View File

@@ -45,7 +45,6 @@ class FileDownloader(object):
min_filesize: Skip files smaller than this size
max_filesize: Skip files larger than this size
xattr_set_filesize: Set ytdl.filesize user xattribute with expected size.
(experimental)
external_downloader_args: A list of additional command-line arguments for the
external downloader.
hls_use_mpegts: Use the mpegts container for HLS videos.

View File

@@ -74,9 +74,14 @@ class FragmentFD(FileDownloader):
return not ctx['live'] and not ctx['tmpfilename'] == '-'
def _read_ytdl_file(self, ctx):
assert 'ytdl_corrupt' not in ctx
stream, _ = sanitize_open(self.ytdl_filename(ctx['filename']), 'r')
ctx['fragment_index'] = json.loads(stream.read())['downloader']['current_fragment']['index']
stream.close()
try:
ctx['fragment_index'] = json.loads(stream.read())['downloader']['current_fragment']['index']
except Exception:
ctx['ytdl_corrupt'] = True
finally:
stream.close()
def _write_ytdl_file(self, ctx):
frag_index_stream, _ = sanitize_open(self.ytdl_filename(ctx['filename']), 'w')
@@ -158,11 +163,17 @@ class FragmentFD(FileDownloader):
if self.__do_ytdl_file(ctx):
if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))):
self._read_ytdl_file(ctx)
if ctx['fragment_index'] > 0 and resume_len == 0:
is_corrupt = ctx.get('ytdl_corrupt') is True
is_inconsistent = ctx['fragment_index'] > 0 and resume_len == 0
if is_corrupt or is_inconsistent:
message = (
'.ytdl file is corrupt' if is_corrupt else
'Inconsistent state of incomplete fragment download')
self.report_warning(
'Inconsistent state of incomplete fragment download. '
'Restarting from the beginning...')
'%s. Restarting from the beginning...' % message)
ctx['fragment_index'] = resume_len = 0
if 'ytdl_corrupt' in ctx:
del ctx['ytdl_corrupt']
self._write_ytdl_file(ctx)
else:
self._write_ytdl_file(ctx)
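
For context, the .ytdl side file holds resume state as JSON; a sketch of the shape _read_ytdl_file expects, and of what now happens when the file is mangled (file contents here are hypothetical):

import json

good = '{"downloader": {"current_fragment": {"index": 7}}}'
json.loads(good)['downloader']['current_fragment']['index']  # 7

bad = '{"downloader": {'  # e.g. a write truncated by a crash
try:
    json.loads(bad)['downloader']['current_fragment']['index']
except Exception:
    # ctx['ytdl_corrupt'] = True -> warning, then restart from fragment 0
    pass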

View File

@@ -217,10 +217,11 @@ class HttpFD(FileDownloader):
before = start # start measuring
def retry(e):
if ctx.tmpfilename != '-':
to_stdout = ctx.tmpfilename == '-'
if not to_stdout:
ctx.stream.close()
ctx.stream = None
ctx.resume_len = os.path.getsize(encodeFilename(ctx.tmpfilename))
ctx.resume_len = byte_counter if to_stdout else os.path.getsize(encodeFilename(ctx.tmpfilename))
raise RetryDownload(e)
while True:
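
The point of the change: '-' is the stdout sentinel, not a real file, so os.path.getsize cannot supply a resume offset for it and the running byte counter is used instead. A one-line illustration (assuming no file literally named '-' exists in the working directory):

import os
os.path.getsize('-')  # raises OSError: nothing to stat when writing to stdout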

View File

@@ -24,71 +24,78 @@ class RtmpFD(FileDownloader):
def real_download(self, filename, info_dict):
def run_rtmpdump(args):
start = time.time()
resume_percent = None
resume_downloaded_data_len = None
proc = subprocess.Popen(args, stderr=subprocess.PIPE)
cursor_in_new_line = True
proc_stderr_closed = False
while not proc_stderr_closed:
# read line from stderr
line = ''
while True:
char = proc.stderr.read(1)
if not char:
proc_stderr_closed = True
break
if char in [b'\r', b'\n']:
break
line += char.decode('ascii', 'replace')
if not line:
# proc_stderr_closed is True
continue
mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec \(([0-9]{1,2}\.[0-9])%\)', line)
if mobj:
downloaded_data_len = int(float(mobj.group(1)) * 1024)
percent = float(mobj.group(2))
if not resume_percent:
resume_percent = percent
resume_downloaded_data_len = downloaded_data_len
time_now = time.time()
eta = self.calc_eta(start, time_now, 100 - resume_percent, percent - resume_percent)
speed = self.calc_speed(start, time_now, downloaded_data_len - resume_downloaded_data_len)
data_len = None
if percent > 0:
data_len = int(downloaded_data_len * 100 / percent)
self._hook_progress({
'status': 'downloading',
'downloaded_bytes': downloaded_data_len,
'total_bytes_estimate': data_len,
'tmpfilename': tmpfilename,
'filename': filename,
'eta': eta,
'elapsed': time_now - start,
'speed': speed,
})
cursor_in_new_line = False
else:
# no percent for live streams
mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec', line)
def dl():
resume_percent = None
resume_downloaded_data_len = None
proc_stderr_closed = False
while not proc_stderr_closed:
# read line from stderr
line = ''
while True:
char = proc.stderr.read(1)
if not char:
proc_stderr_closed = True
break
if char in [b'\r', b'\n']:
break
line += char.decode('ascii', 'replace')
if not line:
# proc_stderr_closed is True
continue
mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec \(([0-9]{1,2}\.[0-9])%\)', line)
if mobj:
downloaded_data_len = int(float(mobj.group(1)) * 1024)
percent = float(mobj.group(2))
if not resume_percent:
resume_percent = percent
resume_downloaded_data_len = downloaded_data_len
time_now = time.time()
speed = self.calc_speed(start, time_now, downloaded_data_len)
eta = self.calc_eta(start, time_now, 100 - resume_percent, percent - resume_percent)
speed = self.calc_speed(start, time_now, downloaded_data_len - resume_downloaded_data_len)
data_len = None
if percent > 0:
data_len = int(downloaded_data_len * 100 / percent)
self._hook_progress({
'status': 'downloading',
'downloaded_bytes': downloaded_data_len,
'total_bytes_estimate': data_len,
'tmpfilename': tmpfilename,
'filename': filename,
'status': 'downloading',
'eta': eta,
'elapsed': time_now - start,
'speed': speed,
})
cursor_in_new_line = False
elif self.params.get('verbose', False):
if not cursor_in_new_line:
self.to_screen('')
cursor_in_new_line = True
self.to_screen('[rtmpdump] ' + line)
proc.wait()
else:
# no percent for live streams
mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec', line)
if mobj:
downloaded_data_len = int(float(mobj.group(1)) * 1024)
time_now = time.time()
speed = self.calc_speed(start, time_now, downloaded_data_len)
self._hook_progress({
'downloaded_bytes': downloaded_data_len,
'tmpfilename': tmpfilename,
'filename': filename,
'status': 'downloading',
'elapsed': time_now - start,
'speed': speed,
})
cursor_in_new_line = False
elif self.params.get('verbose', False):
if not cursor_in_new_line:
self.to_screen('')
cursor_in_new_line = True
self.to_screen('[rtmpdump] ' + line)
try:
dl()
finally:
proc.wait()
if not cursor_in_new_line:
self.to_screen('')
return proc.returncode
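
For reference, the progress regex above applied to a typical rtmpdump stderr line (the sample line is illustrative):

import re

line = '3274.554 kB / 31.40 sec (29.2%)'
mobj = re.search(
    r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec \(([0-9]{1,2}\.[0-9])%\)', line)
downloaded_data_len = int(float(mobj.group(1)) * 1024)  # 3353143 bytes
percent = float(mobj.group(2))                          # 29.2
data_len = int(downloaded_data_len * 100 / percent)     # 11483366, total estimate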
@@ -163,7 +170,15 @@ class RtmpFD(FileDownloader):
RD_INCOMPLETE = 2
RD_NO_CONNECT = 3
retval = run_rtmpdump(args)
started = time.time()
try:
retval = run_rtmpdump(args)
except KeyboardInterrupt:
if not info_dict.get('is_live'):
raise
retval = RD_SUCCESS
self.to_screen('\n[rtmpdump] Interrupted by user')
if retval == RD_NO_CONNECT:
self.report_error('[rtmpdump] Could not connect to RTMP server.')
@@ -171,7 +186,7 @@ class RtmpFD(FileDownloader):
while retval in (RD_INCOMPLETE, RD_FAILED) and not test and not live:
prevsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen('[rtmpdump] %s bytes' % prevsize)
self.to_screen('[rtmpdump] Downloaded %s bytes' % prevsize)
time.sleep(5.0) # This seems to be needed
args = basic_args + ['--resume']
if retval == RD_FAILED:
@@ -188,13 +203,14 @@ class RtmpFD(FileDownloader):
break
if retval == RD_SUCCESS or (test and retval == RD_INCOMPLETE):
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen('[rtmpdump] %s bytes' % fsize)
self.to_screen('[rtmpdump] Downloaded %s bytes' % fsize)
self.try_rename(tmpfilename, filename)
self._hook_progress({
'downloaded_bytes': fsize,
'total_bytes': fsize,
'filename': filename,
'status': 'finished',
'elapsed': time.time() - started,
})
return True
else:

View File

@@ -13,6 +13,7 @@ from ..utils import (
int_or_none,
parse_iso8601,
try_get,
unescapeHTML,
update_url_query,
)
@@ -104,21 +105,22 @@ class ABCIE(InfoExtractor):
class ABCIViewIE(InfoExtractor):
IE_NAME = 'abc.net.au:iview'
_VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://iview\.abc\.net\.au/(?:[^/]+/)*video/(?P<id>[^/?#]+)'
_GEO_COUNTRIES = ['AU']
# ABC iview programs are normally available for 14 days only.
_TESTS = [{
'url': 'http://iview.abc.net.au/programs/call-the-midwife/ZW0898A003S00',
'url': 'https://iview.abc.net.au/show/ben-and-hollys-little-kingdom/series/0/video/ZX9371A050S00',
'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
'info_dict': {
'id': 'ZW0898A003S00',
'id': 'ZX9371A050S00',
'ext': 'mp4',
'title': 'Series 5 Ep 3',
'description': 'md5:e0ef7d4f92055b86c4f33611f180ed79',
'upload_date': '20171228',
'uploader_id': 'abc1',
'timestamp': 1514499187,
'title': "Gaston's Birthday",
'series': "Ben And Holly's Little Kingdom",
'description': 'md5:f9de914d02f226968f598ac76f105bcf',
'upload_date': '20180604',
'uploader_id': 'abc4kids',
'timestamp': 1528140219,
},
'params': {
'skip_download': True,
@@ -127,17 +129,16 @@ class ABCIViewIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_params = self._parse_json(self._search_regex(
r'videoParams\s*=\s*({.+?});', webpage, 'video params'), video_id)
title = video_params.get('title') or video_params['seriesTitle']
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
video_params = self._download_json(
'https://iview.abc.net.au/api/programs/' + video_id, video_id)
title = unescapeHTML(video_params.get('title') or video_params['seriesTitle'])
stream = next(s for s in video_params['playlist'] if s.get('type') in ('program', 'livestream'))
house_number = video_params.get('episodeHouseNumber')
path = '/auth/hls/sign?ts={0}&hn={1}&d=android-mobile'.format(
house_number = video_params.get('episodeHouseNumber') or video_id
path = '/auth/hls/sign?ts={0}&hn={1}&d=android-tablet'.format(
int(time.time()), house_number)
sig = hmac.new(
'android.content.res.Resources'.encode('utf-8'),
b'android.content.res.Resources',
path.encode('utf-8'), hashlib.sha256).hexdigest()
token = self._download_webpage(
'http://iview.abc.net.au{0}&sig={1}'.format(path, sig), video_id)
@@ -167,18 +168,26 @@ class ABCIViewIE(InfoExtractor):
'ext': 'vtt',
}]
is_live = video_params.get('livestream') == '1'
if is_live:
title = self._live_title(title)
return {
'id': video_id,
'title': title,
'description': self._html_search_meta(['og:description', 'twitter:description'], webpage),
'thumbnail': self._html_search_meta(['og:image', 'twitter:image:src'], webpage),
'description': video_params.get('description'),
'thumbnail': video_params.get('thumbnail'),
'duration': int_or_none(video_params.get('eventDuration')),
'timestamp': parse_iso8601(video_params.get('pubDate'), ' '),
'series': video_params.get('seriesTitle'),
'series': unescapeHTML(video_params.get('seriesTitle')),
'series_id': video_params.get('seriesHouseNumber') or video_id[:7],
'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage, default=None)),
'episode': self._html_search_meta('episode_title', webpage, default=None),
'season_number': int_or_none(self._search_regex(
r'\bSeries\s+(\d+)\b', title, 'season number', default=None)),
'episode_number': int_or_none(self._search_regex(
r'\bEp\s+(\d+)\b', title, 'episode number', default=None)),
'episode_id': house_number,
'uploader_id': video_params.get('channel'),
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
}
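
The token request signing used above, isolated into a standalone sketch (the house number is taken from the test case in this diff):

import hashlib
import hmac
import time

house_number = 'ZX9371A050S00'
path = '/auth/hls/sign?ts={0}&hn={1}&d=android-tablet'.format(
    int(time.time()), house_number)
sig = hmac.new(
    b'android.content.res.Resources',
    path.encode('utf-8'), hashlib.sha256).hexdigest()
token_url = 'http://iview.abc.net.au{0}&sig={1}'.format(path, sig)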

View File

@@ -7,7 +7,9 @@ import functools
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
float_or_none,
int_or_none,
try_get,
unified_timestamp,
OnDemandPagedList,
)
@@ -24,40 +26,58 @@ class ACastIE(InfoExtractor):
'id': '57de3baa-4bb0-487e-9418-2692c1277a34',
'ext': 'mp3',
'title': '"Where Are You?": Taipei 101, Taiwan',
'description': 'md5:a0b4ef3634e63866b542e5b1199a1a0e',
'timestamp': 1196172000,
'upload_date': '20071127',
'description': 'md5:a0b4ef3634e63866b542e5b1199a1a0e',
'duration': 211,
'creator': 'Concierge',
'series': 'Condé Nast Traveler Podcast',
'episode': '"Where Are You?": Taipei 101, Taiwan',
}
}, {
# test with multiple blings
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'md5': 'e87d5b8516cd04c0d81b6ee1caca28d0',
'md5': 'a02393c74f3bdb1801c3ec2695577ce0',
'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3',
'title': '2. Raggarmordet - Röster ur det förflutna',
'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4',
'timestamp': 1477346700,
'upload_date': '20161024',
'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4',
'duration': 2766,
'duration': 2766.602563,
'creator': 'Anton Berg & Martin Johnson',
'series': 'Spår',
'episode': '2. Raggarmordet - Röster ur det förflutna',
}
}]
def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups()
s = self._download_json(
'https://play-api.acast.com/stitch/%s/%s' % (channel, display_id),
display_id)['result']
media_url = s['url']
cast_data = self._download_json(
'https://play-api.acast.com/splash/%s/%s' % (channel, display_id), display_id)
e = cast_data['result']['episode']
'https://play-api.acast.com/splash/%s/%s' % (channel, display_id),
display_id)['result']
e = cast_data['episode']
title = e['name']
return {
'id': compat_str(e['id']),
'display_id': display_id,
'url': e['mediaUrl'],
'title': e['name'],
'description': e.get('description'),
'url': media_url,
'title': title,
'description': e.get('description') or e.get('summary'),
'thumbnail': e.get('image'),
'timestamp': unified_timestamp(e.get('publishingDate')),
'duration': int_or_none(e.get('duration')),
'duration': float_or_none(s.get('duration') or e.get('duration')),
'filesize': int_or_none(e.get('contentLength')),
'creator': try_get(cast_data, lambda x: x['show']['author'], compat_str),
'series': try_get(cast_data, lambda x: x['show']['name'], compat_str),
'season_number': int_or_none(e.get('seasonNumber')),
'episode': title,
'episode_number': int_or_none(e.get('episodeNumber')),
}
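
The two play-api endpoints the rewritten extractor touches, with the fields it reads from each; the response shapes below are inferred from the code, not from any official API documentation:

# https://play-api.acast.com/stitch/<channel>/<display_id>
# -> {'result': {'url': <media url>, 'duration': <float seconds>}}
#
# https://play-api.acast.com/splash/<channel>/<display_id>
# -> {'result': {'episode': {'id': ..., 'name': ..., 'description': ...,
#                            'publishingDate': ..., 'contentLength': ...,
#                            'seasonNumber': ..., 'episodeNumber': ...},
#                'show': {'author': ..., 'name': ...}}}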

View File

@@ -1,8 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import binascii
import json
import os
import random
from .common import InfoExtractor
from ..aes import aes_cbc_decrypt
@@ -12,9 +15,12 @@ from ..compat import (
)
from ..utils import (
bytes_to_intlist,
bytes_to_long,
ExtractorError,
float_or_none,
intlist_to_bytes,
long_to_bytes,
pkcs1pad,
srt_subtitles_timecode,
strip_or_none,
urljoin,
@@ -35,6 +41,7 @@ class ADNIE(InfoExtractor):
}
}
_BASE_URL = 'http://animedigitalnetwork.fr'
_RSA_KEY = (0xc35ae1e4356b65a73b551493da94b8cb443491c0aa092a357a5aee57ffc14dda85326f42d716e539a34542a0d3f363adf16c5ec222d713d5997194030ee2e4f0d1fb328c01a81cf6868c090d50de8e169c6b13d1675b9eeed1cbc51e1fffca9b38af07f37abd790924cd3bee59d0257cfda4fe5f3f0534877e21ce5821447d1b, 65537)
def _get_subtitles(self, sub_path, video_id):
if not sub_path:
@@ -42,16 +49,14 @@ class ADNIE(InfoExtractor):
enc_subtitles = self._download_webpage(
urljoin(self._BASE_URL, sub_path),
video_id, fatal=False, headers={
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0',
})
video_id, fatal=False)
if not enc_subtitles:
return None
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
bytes_to_intlist(b'\xc8\x6e\x06\xbc\xbe\xc6\x49\xf5\x88\x0d\xc8\x47\xc4\x27\x0c\x60'),
bytes_to_intlist(binascii.unhexlify(self._K + '9032ad7083106400')),
bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
))
subtitles_json = self._parse_json(
@@ -112,11 +117,24 @@ class ADNIE(InfoExtractor):
error = None
if not links:
links_url = player_config.get('linksurl') or options['videoUrl']
links_data = self._download_json(urljoin(
self._BASE_URL, links_url), video_id)
token = options['token']
self._K = ''.join([random.choice('0123456789abcdef') for _ in range(16)])
message = bytes_to_intlist(json.dumps({
'k': self._K,
'e': 60,
't': token,
}))
padded_message = intlist_to_bytes(pkcs1pad(message, 128))
n, e = self._RSA_KEY
encrypted_message = long_to_bytes(pow(bytes_to_long(padded_message), e, n))
authorization = base64.b64encode(encrypted_message).decode()
links_data = self._download_json(
urljoin(self._BASE_URL, links_url), video_id, headers={
'Authorization': 'Bearer ' + authorization,
})
links = links_data.get('links') or {}
metas = metas or links_data.get('meta') or {}
sub_path = sub_path or links_data.get('subtitles')
sub_path = (sub_path or links_data.get('subtitles')) + '&token=' + token
error = links_data.get('error')
title = metas.get('title') or video_info['title']
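
The new handshake above, condensed: the extractor invents a 16-hex-digit AES key, RSA-encrypts it with PKCS#1 padding together with the player token, and sends the result as a Bearer header; the same key, extended with a fixed suffix, later decrypts the subtitles. A sketch using youtube-dl's own helpers (the token placeholder stands in for whatever options['token'] yielded):

import base64
import json
import random

from youtube_dl.extractor.adn import ADNIE
from youtube_dl.utils import (
    bytes_to_intlist, bytes_to_long, intlist_to_bytes, long_to_bytes, pkcs1pad)

token = 'TOKEN-FROM-PLAYER-CONFIG'  # placeholder
K = ''.join(random.choice('0123456789abcdef') for _ in range(16))
message = bytes_to_intlist(json.dumps({'k': K, 'e': 60, 't': token}))
padded_message = intlist_to_bytes(pkcs1pad(message, 128))
n, e = ADNIE._RSA_KEY  # 1024-bit modulus, exponent 65537
authorization = base64.b64encode(
    long_to_bytes(pow(bytes_to_long(padded_message), e, n))).decode()
# sent as 'Authorization: Bearer <authorization>'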

View File

@@ -9,6 +9,7 @@ from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
urlencode_postdata,
xpath_text,
)
@@ -28,6 +29,7 @@ class AfreecaTVIE(InfoExtractor):
)
(?P<id>\d+)
'''
_NETRC_MACHINE = 'afreecatv'
_TESTS = [{
'url': 'http://live.afreecatv.com:8079/app/index.cgi?szType=read_ucc_bbs&szBjId=dailyapril&nStationNo=16711924&nBbsNo=18605867&nTitleNo=36164052&szSkin=',
'md5': 'f72c89fe7ecc14c1b5ce506c4996046e',
@@ -139,22 +141,22 @@ class AfreecaTVIE(InfoExtractor):
'skip_download': True,
},
}, {
# adult video
'url': 'http://vod.afreecatv.com/PLAYER/STATION/26542731',
# PARTIAL_ADULT
'url': 'http://vod.afreecatv.com/PLAYER/STATION/32028439',
'info_dict': {
'id': '20171001_F1AE1711_196617479_1',
'id': '20180327_27901457_202289533_1',
'ext': 'mp4',
'title': '[생]서아 초심 찾기 방송 (part 1)',
'title': '[생]빨개요♥ (part 1)',
'thumbnail': 're:^https?://(?:video|st)img.afreecatv.com/.*$',
'uploader': 'BJ서아',
'uploader': '[SA]서아',
'uploader_id': 'bjdyrksu',
'upload_date': '20171001',
'duration': 3600,
'age_limit': 18,
'upload_date': '20180327',
'duration': 3601,
},
'params': {
'skip_download': True,
},
'expected_warnings': ['adult content'],
}, {
'url': 'http://www.afreecatv.com/player/Player.swf?szType=szBjId=djleegoon&nStationNo=11273158&nBbsNo=13161095&nTitleNo=36327652',
'only_matching': True,
@@ -172,6 +174,51 @@ class AfreecaTVIE(InfoExtractor):
video_key['part'] = int(m.group('part'))
return video_key
def _real_initialize(self):
self._login()
def _login(self):
username, password = self._get_login_info()
if username is None:
return
login_form = {
'szWork': 'login',
'szType': 'json',
'szUid': username,
'szPassword': password,
'isSaveId': 'false',
'szScriptVar': 'oLoginRet',
'szAction': '',
}
response = self._download_json(
'https://login.afreecatv.com/app/LoginAction.php', None,
'Logging in', data=urlencode_postdata(login_form))
_ERRORS = {
-4: 'Your account has been suspended due to a violation of our terms and policies.',
-5: 'https://member.afreecatv.com/app/user_delete_progress.php',
-6: 'https://login.afreecatv.com/membership/changeMember.php',
-8: "Hello! AfreecaTV here.\nThe username you have entered belongs to \n an account that requires a legal guardian's consent. \nIf you wish to use our services without restriction, \nplease make sure to go through the necessary verification process.",
-9: 'https://member.afreecatv.com/app/pop_login_block.php',
-11: 'https://login.afreecatv.com/afreeca/second_login.php',
-12: 'https://member.afreecatv.com/app/user_security.php',
0: 'The username does not exist or you have entered the wrong password.',
-1: 'The username does not exist or you have entered the wrong password.',
-3: 'You have entered your username/password incorrectly.',
-7: 'You cannot use your Global AfreecaTV account to access Korean AfreecaTV.',
-10: 'Sorry for the inconvenience. \nYour account has been blocked due to an unauthorized access. \nPlease contact our Help Center for assistance.',
-32008: 'You have failed to log in. Please contact our Help Center.',
}
result = int_or_none(response.get('RESULT'))
if result != 1:
error = _ERRORS.get(result, 'You have failed to log in.')
raise ExtractorError(
'Unable to login: %s said: %s' % (self.IE_NAME, error),
expected=True)
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -187,22 +234,42 @@ class AfreecaTVIE(InfoExtractor):
r'nBbsNo\s*=\s*(\d+)', webpage, 'bbs')
video_id = self._search_regex(
r'nTitleNo\s*=\s*(\d+)', webpage, 'title', default=video_id)
video_xml = self._download_xml(
'http://afbbs.afreecatv.com:8080/api/video/get_video_info.php',
video_id, headers={
'Referer': url,
}, query={
partial_view = False
for _ in range(2):
query = {
'nTitleNo': video_id,
'nStationNo': station_id,
'nBbsNo': bbs_id,
'partialView': 'SKIP_ADULT',
})
}
if partial_view:
query['partialView'] = 'SKIP_ADULT'
video_xml = self._download_xml(
'http://afbbs.afreecatv.com:8080/api/video/get_video_info.php',
video_id, 'Downloading video info XML%s'
% (' (skipping adult)' if partial_view else ''),
video_id, headers={
'Referer': url,
}, query=query)
flag = xpath_text(video_xml, './track/flag', 'flag', default=None)
if flag and flag != 'SUCCEED':
flag = xpath_text(video_xml, './track/flag', 'flag', default=None)
if flag and flag == 'SUCCEED':
break
if flag == 'PARTIAL_ADULT':
self._downloader.report_warning(
'In accordance with local laws and regulations, underage users are restricted from watching adult content. '
'Only content suitable for all ages will be downloaded. '
'Provide account credentials if you wish to download restricted content.')
partial_view = True
continue
elif flag == 'ADULT':
error = 'Only users older than 19 are able to watch this video. Provide account credentials to download this content.'
else:
error = flag
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, flag), expected=True)
'%s said: %s' % (self.IE_NAME, error), expected=True)
else:
raise ExtractorError('Unable to download video info')
video_element = video_xml.findall(compat_xpath('./track/video'))[-1]
if video_element is None or video_element.text is None:
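
With login support in place, credentials for restricted (ADULT) videos can be passed on the command line or read from ~/.netrc via --netrc, using the machine name declared in _NETRC_MACHINE above:

youtube-dl --username YOUR_ID --password YOUR_PASSWORD http://vod.afreecatv.com/PLAYER/STATION/32028439

# or, equivalently, one line in ~/.netrc:
machine afreecatv login YOUR_ID password YOUR_PASSWORD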

youtube_dl/extractor/americastestkitchen.py Executable file → Normal file
View File

View File

@@ -52,7 +52,7 @@ class AnimeOnDemandIE(InfoExtractor):
}]
def _login(self):
(username, password) = self._get_login_info()
username, password = self._get_login_info()
if username is None:
return

View File

@@ -277,7 +277,9 @@ class AnvatoIE(InfoExtractor):
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
mobj = re.match(self._VALID_URL, url)
access_key, video_id = mobj.group('access_key_or_mcp', 'id')

View File

@@ -0,0 +1,94 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
determine_ext,
js_to_json,
)
class APAIE(InfoExtractor):
_VALID_URL = r'https?://[^/]+\.apa\.at/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
'url': 'http://uvp.apa.at/embed/293f6d17-692a-44e3-9fd5-7b178f3a1029',
'md5': '2b12292faeb0a7d930c778c7a5b4759b',
'info_dict': {
'id': 'jjv85FdZ',
'ext': 'mp4',
'title': '"Blau ist mysteriös": Die Blue Man Group im Interview',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 254,
'timestamp': 1519211149,
'upload_date': '20180221',
},
}, {
'url': 'https://uvp-apapublisher.sf.apa.at/embed/2f94e9e6-d945-4db2-9548-f9a41ebf7b78',
'only_matching': True,
}, {
'url': 'http://uvp-rma.sf.apa.at/embed/70404cca-2f47-4855-bbb8-20b1fae58f76',
'only_matching': True,
}, {
'url': 'http://uvp-kleinezeitung.sf.apa.at/embed/f1c44979-dba2-4ebf-b021-e4cf2cac3c81',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
for mobj in re.finditer(
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//[^/]+\.apa\.at/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}.*?)\1',
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
jwplatform_id = self._search_regex(
r'media[iI]d\s*:\s*["\'](?P<id>[a-zA-Z0-9]{8})', webpage,
'jwplatform id', default=None)
if jwplatform_id:
return self.url_result(
'jwplatform:' + jwplatform_id, ie='JWPlatform',
video_id=video_id)
sources = self._parse_json(
self._search_regex(
r'sources\s*=\s*(\[.+?\])\s*;', webpage, 'sources'),
video_id, transform_source=js_to_json)
formats = []
for source in sources:
if not isinstance(source, dict):
continue
source_url = source.get('file')
if not source_url or not isinstance(source_url, compat_str):
continue
ext = determine_ext(source_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'url': source_url,
})
self._sort_formats(formats)
thumbnail = self._search_regex(
r'image\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='url')
return {
'id': video_id,
'title': video_id,
'thumbnail': thumbnail,
'formats': formats,
}
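
A quick check of the embed-detection regex in _extract_urls against a minimal page (the markup is hypothetical; the embed URL is the one from the tests):

from youtube_dl.extractor.apa import APAIE

webpage = ('<iframe src="https://uvp.apa.at/embed/'
           '293f6d17-692a-44e3-9fd5-7b178f3a1029"></iframe>')
APAIE._extract_urls(webpage)
# ['https://uvp.apa.at/embed/293f6d17-692a-44e3-9fd5-7b178f3a1029']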

View File

@@ -74,7 +74,7 @@ class AtresPlayerIE(InfoExtractor):
self._login()
def _login(self):
(username, password) = self._get_login_info()
username, password = self._get_login_info()
if username is None:
return

View File

@@ -5,13 +5,12 @@ from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
sanitized_Request,
)
class AudiMediaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audi-mediacenter\.com/(?:en|de)/audimediatv/(?P<id>[^/?#]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?audi-mediacenter\.com/(?:en|de)/audimediatv/(?:video/)?(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.audi-mediacenter.com/en/audimediatv/60-seconds-of-audi-sport-104-2015-wec-bahrain-rookie-test-1467',
'md5': '79a8b71c46d49042609795ab59779b66',
'info_dict': {
@@ -24,41 +23,46 @@ class AudiMediaIE(InfoExtractor):
'duration': 74022,
'view_count': int,
}
}
# extracted from https://audimedia.tv/assets/embed/embedded-player.js (dataSourceAuthToken)
_AUTH_TOKEN = 'e25b42847dba18c6c8816d5d8ce94c326e06823ebf0859ed164b3ba169be97f2'
}, {
'url': 'https://www.audi-mediacenter.com/en/audimediatv/video/60-seconds-of-audi-sport-104-2015-wec-bahrain-rookie-test-2991',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
raw_payload = self._search_regex([
r'class="amtv-embed"[^>]+id="([^"]+)"',
r'class=\\"amtv-embed\\"[^>]+id=\\"([^"]+)\\"',
r'class="amtv-embed"[^>]+id="([0-9a-z-]+)"',
r'id="([0-9a-z-]+)"[^>]+class="amtv-embed"',
r'class=\\"amtv-embed\\"[^>]+id=\\"([0-9a-z-]+)\\"',
r'id=\\"([0-9a-z-]+)\\"[^>]+class=\\"amtv-embed\\"',
r'id=(?:\\)?"(amtve-[a-z]-\d+-[a-z]{2})',
], webpage, 'raw payload')
_, stage_mode, video_id, lang = raw_payload.split('-')
_, stage_mode, video_id, _ = raw_payload.split('-')
# TODO: handle s and e stage_mode (live streams and ended live streams)
if stage_mode not in ('s', 'e'):
request = sanitized_Request(
'https://audimedia.tv/api/video/v1/videos/%s?embed[]=video_versions&embed[]=thumbnail_image&where[content_language_iso]=%s' % (video_id, lang),
headers={'X-Auth-Token': self._AUTH_TOKEN})
json_data = self._download_json(request, video_id)['results']
video_data = self._download_json(
'https://www.audimedia.tv/api/video/v1/videos/' + video_id,
video_id, query={
'embed[]': ['video_versions', 'thumbnail_image'],
})['results']
formats = []
stream_url_hls = json_data.get('stream_url_hls')
stream_url_hls = video_data.get('stream_url_hls')
if stream_url_hls:
formats.extend(self._extract_m3u8_formats(
stream_url_hls, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
stream_url_hds = json_data.get('stream_url_hds')
stream_url_hds = video_data.get('stream_url_hds')
if stream_url_hds:
formats.extend(self._extract_f4m_formats(
stream_url_hds + '?hdcore=3.4.0',
video_id, f4m_id='hds', fatal=False))
for video_version in json_data.get('video_versions'):
for video_version in video_data.get('video_versions', []):
video_version_url = video_version.get('download_url') or video_version.get('stream_url')
if not video_version_url:
continue
@@ -79,11 +83,11 @@ class AudiMediaIE(InfoExtractor):
return {
'id': video_id,
'title': json_data['title'],
'description': json_data.get('subtitle'),
'thumbnail': json_data.get('thumbnail_image', {}).get('file'),
'timestamp': parse_iso8601(json_data.get('publication_date')),
'duration': int_or_none(json_data.get('duration')),
'view_count': int_or_none(json_data.get('view_count')),
'title': video_data['title'],
'description': video_data.get('subtitle'),
'thumbnail': video_data.get('thumbnail_image', {}).get('file'),
'timestamp': parse_iso8601(video_data.get('publication_date')),
'duration': int_or_none(video_data.get('duration')),
'view_count': int_or_none(video_data.get('view_count')),
'formats': formats,
}

View File

@@ -65,7 +65,7 @@ class AudiomackIE(InfoExtractor):
return {'_type': 'url', 'url': api_response['url'], 'ie_key': 'Soundcloud'}
return {
'id': api_response.get('id', album_url_tag),
'id': compat_str(api_response.get('id', album_url_tag)),
'uploader': api_response.get('artist'),
'title': api_response.get('title'),
'url': api_response['url'],

View File

@@ -44,7 +44,7 @@ class BambuserIE(InfoExtractor):
}
def _login(self):
(username, password) = self._get_login_info()
username, password = self._get_login_info()
if username is None:
return

View File

@@ -12,6 +12,7 @@ from ..utils import (
float_or_none,
get_element_by_class,
int_or_none,
js_to_json,
parse_duration,
parse_iso8601,
try_get,
@@ -772,6 +773,17 @@ class BBCIE(BBCCoUkIE):
# single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
'only_matching': True,
}, {
'url': 'https://www.bbc.co.uk/bbcthree/clip/73d0bbd0-abc3-4cea-b3c0-cdae21905eb1',
'info_dict': {
'id': 'p06556y7',
'ext': 'mp4',
'title': 'Transfers: Cristiano Ronaldo to Man Utd, Arsenal to spend?',
'description': 'md5:4b7dfd063d5a789a1512e99662be3ddd',
},
'params': {
'skip_download': True,
}
}]
@classmethod
@@ -994,6 +1006,36 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles,
}
bbc3_config = self._parse_json(
self._search_regex(
r'(?s)bbcthreeConfig\s*=\s*({.+?})\s*;\s*<', webpage,
'bbcthree config', default='{}'),
playlist_id, transform_source=js_to_json, fatal=False)
if bbc3_config:
bbc3_playlist = try_get(
bbc3_config, lambda x: x['payload']['content']['bbcMedia']['playlist'],
dict)
if bbc3_playlist:
playlist_title = bbc3_playlist.get('title') or playlist_title
thumbnail = bbc3_playlist.get('holdingImageURL')
entries = []
for bbc3_item in bbc3_playlist['items']:
programme_id = bbc3_item.get('versionID')
if not programme_id:
continue
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
entries.append({
'id': programme_id,
'title': playlist_title,
'thumbnail': thumbnail,
'timestamp': timestamp,
'formats': formats,
'subtitles': subtitles,
})
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)
def extract_all(pattern):
return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False),

View File

@@ -12,7 +12,7 @@ class BellMediaIE(InfoExtractor):
(?:
ctv|
tsn|
bnn|
bnn(?:bloomberg)?|
thecomedynetwork|
discovery|
discoveryvelocity|
@@ -27,17 +27,16 @@ class BellMediaIE(InfoExtractor):
much\.com
)/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
'md5': '36d3ef559cfe8af8efe15922cd3ce950',
'info_dict': {
'id': '706966',
'ext': 'mp4',
'title': 'Larry Day and Richard Jutras on the TIFF red carpet of \'Stonewall\'',
'description': 'etalk catches up with Larry Day and Richard Jutras on the TIFF red carpet of "Stonewall”.',
'upload_date': '20150919',
'timestamp': 1442624700,
'id': '1403070',
'ext': 'flv',
'title': 'David Cockfield\'s Top Picks',
'description': 'md5:810f7f8c6a83ad5b48677c3f8e5bb2c3',
'upload_date': '20180525',
'timestamp': 1527288600,
},
'expected_warnings': ['HTTP Error 404'],
}, {
'url': 'http://www.thecomedynetwork.ca/video/player?vid=923582',
'only_matching': True,
@@ -70,6 +69,7 @@ class BellMediaIE(InfoExtractor):
'investigationdiscovery': 'invdisc',
'animalplanet': 'aniplan',
'etalk': 'ctv',
'bnnbloomberg': 'bnn',
}
def _real_extract(self, url):

View File

@@ -117,9 +117,9 @@ class BiliBiliIE(InfoExtractor):
r'cid(?:["\']:|=)(\d+)', webpage, 'cid',
default=None
) or compat_parse_qs(self._search_regex(
[r'1EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
r'1EmbedPlayer\([^)]+,\s*\\"([^"]+)\\"\)',
r'1<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
r'EmbedPlayer\([^)]+,\s*\\"([^"]+)\\"\)',
r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
webpage, 'player parameters'))['cid'][0]
else:
if 'no_bangumi_tip' not in smuggled_data:

View File

@@ -3,15 +3,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import compat_str
from ..utils import (
int_or_none,
parse_age_limit,
)
from ..utils import int_or_none
class BreakIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<site>break|screenjunkies)\.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
_VALID_URL = r'https?://(?:www\.)?break\.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
_TESTS = [{
'url': 'http://www.break.com/video/when-girls-act-like-guys-2468056',
'info_dict': {
@@ -19,125 +17,73 @@ class BreakIE(InfoExtractor):
'ext': 'mp4',
'title': 'When Girls Act Like D-Bags',
'age_limit': 13,
},
}, {
# youtube embed
'url': 'http://www.break.com/video/someone-forgot-boat-brakes-work',
'info_dict': {
'id': 'RrrDLdeL2HQ',
'ext': 'mp4',
'title': 'Whale Watching Boat Crashing Into San Diego Dock',
'description': 'md5:afc1b2772f0a8468be51dd80eb021069',
'upload_date': '20160331',
'uploader': 'Steve Holden',
'uploader_id': 'sdholden07',
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://www.screenjunkies.com/video/best-quentin-tarantino-movie-2841915',
'md5': '5c2b686bec3d43de42bde9ec047536b0',
'info_dict': {
'id': '2841915',
'display_id': 'best-quentin-tarantino-movie',
'ext': 'mp4',
'title': 'Best Quentin Tarantino Movie',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 3671,
'age_limit': 13,
'tags': list,
},
}, {
'url': 'http://www.screenjunkies.com/video/honest-trailers-the-dark-knight',
'info_dict': {
'id': '2348808',
'display_id': 'honest-trailers-the-dark-knight',
'ext': 'mp4',
'title': 'Honest Trailers - The Dark Knight',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'age_limit': 10,
'tags': list,
},
}, {
# requires subscription but worked around
'url': 'http://www.screenjunkies.com/video/knocking-dead-ep-1-the-show-so-far-3003285',
'info_dict': {
'id': '3003285',
'display_id': 'knocking-dead-ep-1-the-show-so-far',
'ext': 'mp4',
'title': 'State of The Dead Recap: Knocking Dead Pilot',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 3307,
'age_limit': 13,
'tags': list,
},
}, {
'url': 'http://www.break.com/video/ugc/baby-flex-2773063',
'only_matching': True,
}]
_DEFAULT_BITRATES = (48, 150, 320, 496, 864, 2240, 3264)
def _real_extract(self, url):
site, display_id, video_id = re.match(self._VALID_URL, url).groups()
display_id, video_id = re.match(self._VALID_URL, url).groups()
if not video_id:
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
(r'src=["\']/embed/(\d+)', r'data-video-content-id=["\'](\d+)'),
webpage, 'video id')
webpage = self._download_webpage(url, display_id)
webpage = self._download_webpage(
'http://www.%s.com/embed/%s' % (site, video_id),
display_id, 'Downloading video embed page')
embed_vars = self._parse_json(
youtube_url = YoutubeIE._extract_url(webpage)
if youtube_url:
return self.url_result(youtube_url, ie=YoutubeIE.ie_key())
content = self._parse_json(
self._search_regex(
r'(?s)embedVars\s*=\s*({.+?})\s*</script>', webpage, 'embed vars'),
r'(?s)content["\']\s*:\s*(\[.+?\])\s*[,\n]', webpage,
'content'),
display_id)
youtube_id = embed_vars.get('youtubeId')
if youtube_id:
return self.url_result(youtube_id, 'Youtube')
title = embed_vars['contentName']
formats = []
bitrates = []
for f in embed_vars.get('media', []):
if not f.get('uri') or f.get('mediaPurpose') != 'play':
for video in content:
video_url = video.get('url')
if not video_url or not isinstance(video_url, compat_str):
continue
bitrate = int_or_none(f.get('bitRate'))
if bitrate:
bitrates.append(bitrate)
bitrate = int_or_none(self._search_regex(
r'(\d+)_kbps', video_url, 'tbr', default=None))
formats.append({
'url': f['uri'],
'url': video_url,
'format_id': 'http-%d' % bitrate if bitrate else 'http',
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
'tbr': bitrate,
'format': 'mp4',
})
if not bitrates:
# When subscriptionLevel > 0, i.e. plus subscription is required
# media list will be empty. However, hds and hls uris are still
# available. We can grab them assuming bitrates to be default.
bitrates = self._DEFAULT_BITRATES
auth_token = embed_vars.get('AuthToken')
def construct_manifest_url(base_url, ext):
pieces = [base_url]
pieces.extend([compat_str(b) for b in bitrates])
pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token))
return ','.join(pieces)
if bitrates and auth_token:
hds_url = embed_vars.get('hdsUri')
if hds_url:
formats.extend(self._extract_f4m_formats(
construct_manifest_url(hds_url, 'f4m'),
display_id, f4m_id='hds', fatal=False))
hls_url = embed_vars.get('hlsUri')
if hls_url:
formats.extend(self._extract_m3u8_formats(
construct_manifest_url(hls_url, 'm3u8'),
display_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats)
title = self._search_regex(
(r'title["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
r'<h1[^>]*>(?P<value>[^<]+)'), webpage, 'title', group='value')
def get(key, name):
return int_or_none(self._search_regex(
r'%s["\']\s*:\s*["\'](\d+)' % key, webpage, name,
default=None))
age_limit = get('ratings', 'age limit')
video_id = video_id or get('pid', 'video id') or display_id
return {
'id': video_id,
'display_id': display_id,
'title': title,
'thumbnail': embed_vars.get('thumbUri'),
'duration': int_or_none(embed_vars.get('videoLengthInSeconds')) or None,
'age_limit': parse_age_limit(embed_vars.get('audienceRating')),
'tags': embed_vars.get('tags', '').split(','),
'thumbnail': self._og_search_thumbnail(webpage),
'age_limit': age_limit,
'formats': formats,
}
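
The bitrate is now read straight out of the media URL; a small check of that pattern (the URL is hypothetical but follows the <n>_kbps convention the regex expects):

import re
from youtube_dl.utils import int_or_none

video_url = 'http://video.break.com/some/path/640_kbps.mp4'
tbr = int_or_none(re.search(r'(\d+)_kbps', video_url).group(1))  # 640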

View File

@@ -669,7 +669,10 @@ class BrightcoveNewIE(AdobePassIE):
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
'ip_blocks': smuggled_data.get('geo_ip_blocks'),
})
account_id, player_id, embed, video_id = re.match(self._VALID_URL, url).groups()

View File

@@ -0,0 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .jwplatform import JWPlatformIE
class BusinessInsiderIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6',
'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
'info_dict': {
'id': 'hZRllCfw',
'ext': 'mp4',
'title': "Here's how much radiation you're exposed to in everyday life",
'description': 'md5:9a0d6e2c279948aadaa5e84d6d9b99bd',
'upload_date': '20170709',
'timestamp': 1499606400,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/',
'only_matching': True,
}, {
'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
jwplatform_id = self._search_regex(
(r'data-media-id=["\']([a-zA-Z0-9]{8})',
r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})'),
webpage, 'jwplatform id')
return self.url_result(
'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),
video_id=video_id)

View File

@@ -0,0 +1,96 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
)
class CamModelsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cammodels\.com/cam/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.cammodels.com/cam/AutumnKnight/',
'only_matching': True,
}]
def _real_extract(self, url):
user_id = self._match_id(url)
webpage = self._download_webpage(
url, user_id, headers=self.geo_verification_headers())
manifest_root = self._html_search_regex(
r'manifestUrlRoot=([^&\']+)', webpage, 'manifest', default=None)
if not manifest_root:
ERRORS = (
("I'm offline, but let's stay connected", 'This user is currently offline'),
('in a private show', 'This user is in a private show'),
('is currently performing LIVE', 'This model is currently performing live'),
)
for pattern, message in ERRORS:
if pattern in webpage:
error = message
expected = True
break
else:
error = 'Unable to find manifest URL root'
expected = False
raise ExtractorError(error, expected=expected)
manifest = self._download_json(
'%s%s.json' % (manifest_root, user_id), user_id)
formats = []
for format_id, format_dict in manifest['formats'].items():
if not isinstance(format_dict, dict):
continue
encodings = format_dict.get('encodings')
if not isinstance(encodings, list):
continue
vcodec = format_dict.get('videoCodec')
acodec = format_dict.get('audioCodec')
for media in encodings:
if not isinstance(media, dict):
continue
media_url = media.get('location')
if not media_url or not isinstance(media_url, compat_str):
continue
format_id_list = [format_id]
height = int_or_none(media.get('videoHeight'))
if height is not None:
format_id_list.append('%dp' % height)
f = {
'url': media_url,
'format_id': '-'.join(format_id_list),
'width': int_or_none(media.get('videoWidth')),
'height': height,
'vbr': int_or_none(media.get('videoKbps')),
'abr': int_or_none(media.get('audioKbps')),
'fps': int_or_none(media.get('fps')),
'vcodec': vcodec,
'acodec': acodec,
}
if 'rtmp' in format_id:
f['ext'] = 'flv'
elif 'hls' in format_id:
f.update({
'ext': 'mp4',
# hls skips fragments, preferring rtmp
'preference': -1,
})
else:
continue
formats.append(f)
self._sort_formats(formats)
return {
'id': user_id,
'title': self._live_title(user_id),
'is_live': True,
'formats': formats,
}

View File

@@ -0,0 +1,69 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_timestamp,
)
class CamTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|api)\.)?camtube\.co/recordings?/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://camtube.co/recording/minafay-030618-1136-chaturbate-female',
'info_dict': {
'id': '42ad3956-dd5b-445a-8313-803ea6079fac',
'display_id': 'minafay-030618-1136-chaturbate-female',
'ext': 'mp4',
'title': 'minafay-030618-1136-chaturbate-female',
'duration': 1274,
'timestamp': 1528018608,
'upload_date': '20180603',
},
'params': {
'skip_download': True,
},
}]
_API_BASE = 'https://api.camtube.co'
def _real_extract(self, url):
display_id = self._match_id(url)
token = self._download_json(
'%s/rpc/session/new' % self._API_BASE, display_id,
'Downloading session token')['token']
self._set_cookie('api.camtube.co', 'session', token)
video = self._download_json(
'%s/recordings/%s' % (self._API_BASE, display_id), display_id,
headers={'Referer': url})
video_id = video['uuid']
timestamp = unified_timestamp(video.get('createdAt'))
duration = int_or_none(video.get('duration'))
view_count = int_or_none(video.get('viewCount'))
like_count = int_or_none(video.get('likeCount'))
creator = video.get('stageName')
formats = [{
'url': '%s/recordings/%s/manifest.m3u8'
% (self._API_BASE, video_id),
'format_id': 'hls',
'ext': 'mp4',
'protocol': 'm3u8_native',
}]
return {
'id': video_id,
'display_id': display_id,
'title': display_id,
'timestamp': timestamp,
'duration': duration,
'view_count': view_count,
'like_count': like_count,
'creator': creator,
'formats': formats,
}

View File

@@ -5,7 +5,10 @@ import json
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import (
compat_str,
compat_HTTPError,
)
from ..utils import (
js_to_json,
smuggle_url,
@@ -14,9 +17,11 @@ from ..utils import (
xpath_element,
xpath_with_ns,
find_xpath_attr,
orderedSet,
parse_duration,
parse_iso8601,
parse_age_limit,
strip_or_none,
int_or_none,
ExtractorError,
)
@@ -126,15 +131,23 @@ class CBCIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = self._og_search_title(webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title', default=None) or self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
entries = [
self._extract_player_init(player_init, display_id)
for player_init in re.findall(r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage)]
media_ids = []
for media_id_re in (
r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"',
r'<div[^>]+\bid=["\']player-(\d+)',
r'guid["\']\s*:\s*["\'](\d+)'):
media_ids.extend(re.findall(media_id_re, webpage))
entries.extend([
self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)])
for media_id in orderedSet(media_ids)])
return self.playlist_result(
entries, display_id,
self._og_search_title(webpage, fatal=False),
entries, display_id, strip_or_none(title),
self._og_search_description(webpage))
@@ -206,30 +219,48 @@ class CBCWatchBaseIE(InfoExtractor):
def _call_api(self, path, video_id):
url = path if path.startswith('http') else self._API_BASE_URL + path
result = self._download_xml(url, video_id, headers={
'X-Clearleap-DeviceId': self._device_id,
'X-Clearleap-DeviceToken': self._device_token,
})
for _ in range(2):
try:
result = self._download_xml(url, video_id, headers={
'X-Clearleap-DeviceId': self._device_id,
'X-Clearleap-DeviceToken': self._device_token,
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
# Device token has expired, re-acquiring device token
self._register_device()
continue
raise
error_message = xpath_text(result, 'userMessage') or xpath_text(result, 'systemMessage')
if error_message:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message))
return result
def _real_initialize(self):
if not self._device_id or not self._device_token:
device = self._downloader.cache.load('cbcwatch', 'device') or {}
self._device_id, self._device_token = device.get('id'), device.get('token')
if not self._device_id or not self._device_token:
result = self._download_xml(
self._API_BASE_URL + 'device/register',
None, data=b'<device><type>web</type></device>')
self._device_id = xpath_text(result, 'deviceId', fatal=True)
self._device_token = xpath_text(result, 'deviceToken', fatal=True)
self._downloader.cache.store(
'cbcwatch', 'device', {
'id': self._device_id,
'token': self._device_token,
})
if self._valid_device_token():
return
device = self._downloader.cache.load('cbcwatch', 'device') or {}
self._device_id, self._device_token = device.get('id'), device.get('token')
if self._valid_device_token():
return
self._register_device()
def _valid_device_token(self):
return self._device_id and self._device_token
def _register_device(self):
self._device_id = self._device_token = None
result = self._download_xml(
self._API_BASE_URL + 'device/register',
None, 'Acquiring device token',
data=b'<device><type>web</type></device>')
self._device_id = xpath_text(result, 'deviceId', fatal=True)
self._device_token = xpath_text(result, 'deviceToken', fatal=True)
self._downloader.cache.store(
'cbcwatch', 'device', {
'id': self._device_id,
'token': self._device_token,
})
def _parse_rss_feed(self, rss):
channel = xpath_element(rss, 'channel', fatal=True)

View File

@@ -65,7 +65,7 @@ class CBSIE(CBSBaseIE):
last_e = None
for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType')
if not asset_type or asset_type in asset_types:
if not asset_type or asset_type in asset_types or asset_type in ('HLS_FPS', 'DASH_CENC'):
continue
asset_types.append(asset_type)
query = {

View File

@@ -4,28 +4,35 @@ from .cbs import CBSBaseIE
class CBSSportsIE(CBSBaseIE):
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/(?:video|news)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',
'url': 'https://www.cbssports.com/nba/video/donovan-mitchell-flashes-star-potential-in-game-2-victory-over-thunder/',
'info_dict': {
'id': '708337219968',
'id': '1214315075735',
'ext': 'mp4',
'title': 'Ben Simmons the next LeBron? Not so fast',
'description': 'md5:854294f627921baba1f4b9a990d87197',
'timestamp': 1466293740,
'upload_date': '20160618',
'title': 'Donovan Mitchell flashes star potential in Game 2 victory over Thunder',
'description': 'md5:df6f48622612c2d6bd2e295ddef58def',
'timestamp': 1524111457,
'upload_date': '20180419',
'uploader': 'CBSI-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'https://www.cbssports.com/nba/news/nba-playoffs-2018-watch-76ers-vs-heat-game-3-series-schedule-tv-channel-online-stream/',
'only_matching': True,
}]
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
[r'(?:=|%26)pcid%3D(\d+)', r'embedVideo(?:Container)?_(\d+)'],
webpage, 'video id')
return self._extract_video_info('byId=%s' % video_id, video_id)
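
The video id is now scraped from the article page rather than taken from the URL; a check of the second pattern against sample markup (the markup is hypothetical):

import re

webpage = '<div id="embedVideoContainer_1214315075735"></div>'
re.search(r'embedVideo(?:Container)?_(\d+)', webpage).group(1)
# '1214315075735'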

View File

@@ -4,11 +4,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
int_or_none,
parse_duration,
parse_iso8601,
clean_html,
parse_resolution,
)
@@ -40,34 +42,42 @@ class CCMAIE(InfoExtractor):
def _real_extract(self, url):
media_type, media_id = re.match(self._VALID_URL, url).groups()
media_data = {}
formats = []
profiles = ['pc'] if media_type == 'audio' else ['mobil', 'pc']
for i, profile in enumerate(profiles):
md = self._download_json('http://dinamics.ccma.cat/pvideo/media.jsp', media_id, query={
media = self._download_json(
'http://dinamics.ccma.cat/pvideo/media.jsp', media_id, query={
'media': media_type,
'idint': media_id,
'profile': profile,
}, fatal=False)
if md:
media_data = md
media_url = media_data.get('media', {}).get('url')
if media_url:
formats.append({
'format_id': profile,
'url': media_url,
'quality': i,
})
})
formats = []
media_url = media['media']['url']
if isinstance(media_url, list):
for format_ in media_url:
format_url = format_.get('file')
if not format_url or not isinstance(format_url, compat_str):
continue
label = format_.get('label')
f = parse_resolution(label)
f.update({
'url': format_url,
'format_id': label,
})
formats.append(f)
else:
formats.append({
'url': media_url,
'vcodec': 'none' if media_type == 'audio' else None,
})
self._sort_formats(formats)
informacio = media_data['informacio']
informacio = media['informacio']
title = informacio['titol']
durada = informacio.get('durada', {})
duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
timestamp = parse_iso8601(informacio.get('data_emissio', {}).get('utc'))
subtitles = {}
subtitols = media_data.get('subtitols', {})
subtitols = media.get('subtitols', {})
if subtitols:
sub_url = subtitols.get('url')
if sub_url:
@@ -77,7 +87,7 @@ class CCMAIE(InfoExtractor):
})
thumbnails = []
imatges = media_data.get('imatges', {})
imatges = media.get('imatges', {})
if imatges:
thumbnail_url = imatges.get('url')
if thumbnail_url:

youtube_dl/extractor/cda.py Executable file → Normal file
View File

View File

@@ -31,7 +31,8 @@ class ChaturbateIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(
url, video_id, headers=self.geo_verification_headers())
m3u8_urls = []

View File

@@ -0,0 +1,60 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class CloudflareStreamIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:
(?:watch\.)?cloudflarestream\.com/|
embed\.cloudflarestream\.com/embed/[^/]+\.js\?.*?\bvideo=
)
(?P<id>[\da-f]+)
'''
_TESTS = [{
'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717',
'info_dict': {
'id': '31c9291ab41fac05471db4e73aa11717',
'ext': 'mp4',
'title': '31c9291ab41fac05471db4e73aa11717',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://watch.cloudflarestream.com/9df17203414fd1db3e3ed74abbe936c1',
'only_matching': True,
}, {
'url': 'https://cloudflarestream.com/31c9291ab41fac05471db4e73aa11717/manifest/video.mpd',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
for mobj in re.finditer(
r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.cloudflarestream\.com/embed/[^/]+\.js\?.*?\bvideo=[\da-f]+?.*?)\1',
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
formats = self._extract_m3u8_formats(
'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id,
video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False)
formats.extend(self._extract_mpd_formats(
'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id,
video_id, mpd_id='dash', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'title': video_id,
'formats': formats,
}
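
Both manifests are derived directly from the video id, so for the id in the first test above the extractor probes:

video_id = '31c9291ab41fac05471db4e73aa11717'
hls_url = 'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id
dash_url = 'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id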

View File

@@ -339,15 +339,17 @@ class InfoExtractor(object):
_GEO_BYPASS attribute may be set to False in order to disable
geo restriction bypass mechanisms for a particular extractor.
Though it won't disable explicit geo restriction bypass based on
country code provided with geo_bypass_country. (experimental)
country code provided with geo_bypass_country.
_GEO_COUNTRIES attribute may contain a list of presumably geo unrestricted
countries for this extractor. One of these countries will be used by
geo restriction bypass mechanism right away in order to bypass
geo restriction, of course, if the mechanism is not disabled. (experimental)
geo restriction, of course, if the mechanism is not disabled.
NB: both these geo attributes are experimental and may change in future
or be completely removed.
_GEO_IP_BLOCKS attribute may contain a list of presumably geo unrestricted
IP blocks in CIDR notation for this extractor. One of these IP blocks
will be used by geo restriction bypass mechanism similarly
to _GEO_COUNTRIES.
Finally, the _WORKING attribute should be set to False for broken IEs
in order to warn the users and skip the tests.
@@ -358,6 +360,7 @@ class InfoExtractor(object):
_x_forwarded_for_ip = None
_GEO_BYPASS = True
_GEO_COUNTRIES = None
_GEO_IP_BLOCKS = None
_WORKING = True
def __init__(self, downloader=None):
@@ -392,12 +395,15 @@ class InfoExtractor(object):
def initialize(self):
"""Initializes an instance (authentication, etc)."""
self._initialize_geo_bypass(self._GEO_COUNTRIES)
self._initialize_geo_bypass({
'countries': self._GEO_COUNTRIES,
'ip_blocks': self._GEO_IP_BLOCKS,
})
if not self._ready:
self._real_initialize()
self._ready = True
def _initialize_geo_bypass(self, countries):
def _initialize_geo_bypass(self, geo_bypass_context):
"""
Initialize geo restriction bypass mechanism.
@@ -408,28 +414,82 @@ class InfoExtractor(object):
HTTP requests.
This method will be used for initial geo bypass mechanism initialization
during the instance initialization with _GEO_COUNTRIES.
during the instance initialization with _GEO_COUNTRIES and
_GEO_IP_BLOCKS.
You may also manually call it from extractor's code if geo countries
You may also manually call it from extractor's code if geo bypass
information is not available beforehand (e.g. obtained during
extraction) or due to some another reason.
extraction) or due to some other reason. In this case you should pass
this information in the geo bypass context passed as the first argument. It
may contain the following fields:
countries: List of geo unrestricted countries (similar
to _GEO_COUNTRIES)
ip_blocks: List of geo unrestricted IP blocks in CIDR notation
(similar to _GEO_IP_BLOCKS)
"""
if not self._x_forwarded_for_ip:
country_code = self._downloader.params.get('geo_bypass_country', None)
# If there is no explicit country for geo bypass specified and
# the extractor is known to be geo restricted let's fake IP
# as X-Forwarded-For right away.
if (not country_code and
self._GEO_BYPASS and
self._downloader.params.get('geo_bypass', True) and
countries):
country_code = random.choice(countries)
if country_code:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country_code)
# Geo bypass mechanism is explicitly disabled by user
if not self._downloader.params.get('geo_bypass', True):
return
if not geo_bypass_context:
geo_bypass_context = {}
# Backward compatibility: previously _initialize_geo_bypass
# expected a list of countries; some 3rd party code may still use
# it this way
if isinstance(geo_bypass_context, (list, tuple)):
geo_bypass_context = {
'countries': geo_bypass_context,
}
# The whole point of the geo bypass mechanism is to fake the IP
# sent in the X-Forwarded-For HTTP header based on some IP block or
# country code.
# Path 1: bypassing based on IP block in CIDR notation
# An IP block explicitly specified by the user is used right away,
# regardless of whether the extractor is geo bypassable
ip_block = self._downloader.params.get('geo_bypass_ip_block', None)
# Otherwise use a random IP block from the geo bypass context, but
# only if the extractor is known to be geo bypassable
if not ip_block:
ip_blocks = geo_bypass_context.get('ip_blocks')
if self._GEO_BYPASS and ip_blocks:
ip_block = random.choice(ip_blocks)
if ip_block:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(ip_block)
if self._downloader.params.get('verbose', False):
self._downloader.to_screen(
'[debug] Using fake IP %s as X-Forwarded-For.'
% self._x_forwarded_for_ip)
return
# Path 2: bypassing based on country code
# A country code explicitly specified by the user is used right away,
# regardless of whether the extractor is geo bypassable
country = self._downloader.params.get('geo_bypass_country', None)
# Otherwise use a random country code from the geo bypass context,
# but only if the extractor is known to be geo bypassable
if not country:
countries = geo_bypass_context.get('countries')
if self._GEO_BYPASS and countries:
country = random.choice(countries)
if country:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country)
if self._downloader.params.get('verbose', False):
self._downloader.to_screen(
'[debug] Using fake IP %s (%s) as X-Forwarded-For.'
% (self._x_forwarded_for_ip, country_code.upper()))
% (self._x_forwarded_for_ip, country.upper()))
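For illustration, the new dict-based call site (country codes and the CIDR block are invented, not mandated by any extractor):

    self._initialize_geo_bypass({
        'countries': ['DE', 'AT'],
        'ip_blocks': ['91.198.174.0/24'],
    })

Legacy callers that still pass a plain list of countries keep working through the backward-compatibility shim above.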
def extract(self, url):
"""Extracts URL information and returns it in list of dicts."""
@@ -682,18 +742,30 @@ class InfoExtractor(object):
else:
self.report_warning(errmsg + str(ve))
def _download_json(self, url_or_request, video_id,
note='Downloading JSON metadata',
errnote='Unable to download JSON metadata',
transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}):
json_string = self._download_webpage(
def _download_json_handle(
self, url_or_request, video_id, note='Downloading JSON metadata',
errnote='Unable to download JSON metadata', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}):
"""Return a tuple (JSON object, URL handle)"""
res = self._download_webpage_handle(
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query)
if (not fatal) and json_string is False:
return None
if res is False:
return res
json_string, urlh = res
return self._parse_json(
json_string, video_id, transform_source=transform_source, fatal=fatal)
json_string, video_id, transform_source=transform_source,
fatal=fatal), urlh
def _download_json(
self, url_or_request, video_id, note='Downloading JSON metadata',
errnote='Unable to download JSON metadata', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}):
res = self._download_json_handle(
url_or_request, video_id, note=note, errnote=errnote,
transform_source=transform_source, fatal=fatal, encoding=encoding,
data=data, headers=headers, query=query)
return res if res is False else res[0]
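A hedged sketch of why the _handle variant exists (the URL is hypothetical): callers that need the final URL, for instance to resolve API redirects, can now obtain it without a second request:

    data, urlh = self._download_json_handle(
        'https://example.com/api/video.json', video_id)
    api_base = urlh.geturl()  # URL after any redirects

With fatal=False the helper returns False instead of a tuple, which is why _download_json tests for False before unpacking.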
def _parse_json(self, json_string, video_id, transform_source=None, fatal=True):
if transform_source:
@@ -1008,6 +1080,40 @@ class InfoExtractor(object):
if isinstance(json_ld, dict):
json_ld = [json_ld]
INTERACTION_TYPE_MAP = {
'CommentAction': 'comment',
'AgreeAction': 'like',
'DisagreeAction': 'dislike',
'LikeAction': 'like',
'DislikeAction': 'dislike',
'ListenAction': 'view',
'WatchAction': 'view',
'ViewAction': 'view',
}
def extract_interaction_statistic(e):
interaction_statistic = e.get('interactionStatistic')
if not isinstance(interaction_statistic, list):
return
for is_e in interaction_statistic:
if not isinstance(is_e, dict):
continue
if is_e.get('@type') != 'InteractionCounter':
continue
interaction_type = is_e.get('interactionType')
if not isinstance(interaction_type, compat_str):
continue
interaction_count = int_or_none(is_e.get('userInteractionCount'))
if interaction_count is None:
continue
count_kind = INTERACTION_TYPE_MAP.get(interaction_type.split('/')[-1])
if not count_kind:
continue
count_key = '%s_count' % count_kind
if info.get(count_key) is not None:
continue
info[count_key] = interaction_count
def extract_video_object(e):
assert e['@type'] == 'VideoObject'
info.update({
@@ -1023,9 +1129,10 @@ class InfoExtractor(object):
'height': int_or_none(e.get('height')),
'view_count': int_or_none(e.get('interactionCount')),
})
extract_interaction_statistic(e)
for e in json_ld:
if e.get('@context') == 'http://schema.org':
if isinstance(e.get('@context'), compat_str) and re.match(r'^https?://schema.org/?$', e.get('@context')):
item_type = e.get('@type')
if expected_type is not None and expected_type != item_type:
return info
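For reference, a minimal JSON-LD shape that extract_interaction_statistic consumes (an invented snippet following schema.org's InteractionCounter vocabulary, not taken from any site):

    json_ld = {
        '@context': 'http://schema.org',
        '@type': 'VideoObject',
        'interactionStatistic': [{
            '@type': 'InteractionCounter',
            'interactionType': 'http://schema.org/WatchAction',
            'userInteractionCount': 4711,
        }],
    }

WatchAction maps to 'view' in INTERACTION_TYPE_MAP, so this example would set info['view_count'] = 4711, provided no view count was extracted earlier.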

View File

@@ -19,8 +19,8 @@ from ..utils import (
class CrackleIE(InfoExtractor):
_VALID_URL = r'(?:crackle:|https?://(?:(?:www|m)\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
_TEST = {
_VALID_URL = r'(?:crackle:|https?://(?:(?:www|m)\.)?(?:sony)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
_TESTS = [{
# geo restricted to CA
'url': 'https://www.crackle.com/andromeda/2502343',
'info_dict': {
@@ -45,7 +45,10 @@ class CrackleIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}
}, {
'url': 'https://www.sonycrackle.com/andromeda/2502343',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@@ -49,7 +49,7 @@ class CrunchyrollBaseIE(InfoExtractor):
})
def _login(self):
(username, password) = self._get_login_info()
username, password = self._get_login_info()
if username is None:
return

View File

@@ -11,10 +11,10 @@ class CTVNewsIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'
_TESTS = [{
'url': 'http://www.ctvnews.ca/video?clipId=901995',
'md5': '10deb320dc0ccb8d01d34d12fc2ea672',
'md5': '9b8624ba66351a23e0b6e1391971f9af',
'info_dict': {
'id': '901995',
'ext': 'mp4',
'ext': 'flv',
'title': 'Extended: \'That person cannot be me\' Johnson says',
'description': 'md5:958dd3b4f5bbbf0ed4d045c790d89285',
'timestamp': 1467286284,

View File

@@ -35,7 +35,7 @@ class CuriosityStreamBaseIE(InfoExtractor):
return result['data']
def _real_initialize(self):
(email, password) = self._get_login_info()
email, password = self._get_login_info()
if email is None:
return
result = self._download_json(

View File

@@ -1,12 +1,16 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import json
import base64
import hashlib
import itertools
import json
import random
import re
import string
from .common import InfoExtractor
from ..compat import compat_struct_pack
from ..utils import (
determine_ext,
error_to_compat_str,
@@ -64,7 +68,6 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'uploader': 'Deadline',
'uploader_id': 'x1xm8ri',
'age_limit': 0,
'view_count': int,
},
}, {
'url': 'https://www.dailymotion.com/video/x2iuewm_steam-machine-models-pricing-listed-on-steam-store-ign-news_videogames',
@@ -167,6 +170,17 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
player = self._parse_json(player_v5, video_id)
metadata = player['metadata']
if metadata.get('error', {}).get('type') == 'password_protected':
password = self._downloader.params.get('videopassword')
if password:
r = int(metadata['id'][1:], 36)
us64e = lambda x: base64.urlsafe_b64encode(x).decode().strip('=')
t = ''.join(random.choice(string.ascii_letters) for i in range(10))
n = us64e(compat_struct_pack('I', r))
i = us64e(hashlib.md5(('%s%d%s' % (password, r, t)).encode()).digest())
metadata = self._download_json(
'http://www.dailymotion.com/player/metadata/video/p' + i + t + n, video_id)
self._check_error(metadata)
formats = []
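A worked sketch of the password-token construction above, assuming an invented video id 'x123abc' and the password 'hunter2' (struct stands in for compat_struct_pack; only the recipe itself comes from the code):

    import base64, hashlib, random, string, struct

    def us64e(x):
        return base64.urlsafe_b64encode(x).decode().strip('=')

    r = int('123abc', 36)           # metadata['id'] minus its leading character
    t = ''.join(random.choice(string.ascii_letters) for _ in range(10))
    n = us64e(struct.pack('I', r))  # the numeric id packed as an unsigned int
    i = us64e(hashlib.md5(('%s%d%s' % ('hunter2', r, t)).encode()).digest())
    # the protected metadata then lives at .../player/metadata/video/p<i><t><n>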
@@ -180,9 +194,12 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
continue
ext = mimetype2ext(type_) or determine_ext(media_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
media_url, video_id, 'mp4', preference=-1,
m3u8_id='hls', fatal=False))
m3u8_id='hls', fatal=False)
for f in m3u8_formats:
f['url'] = f['url'].split('#')[0]
formats.append(f)
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
@@ -299,8 +316,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
def _check_error(self, info):
error = info.get('error')
if info.get('error') is not None:
title = error['title']
if error:
title = error.get('title') or error['message']
# See https://developer.dailymotion.com/api#access-error
if error.get('code') == 'DM007':
self.raise_geo_restricted(msg=title)

View File

@@ -5,7 +5,10 @@ import re
import string
from .discoverygo import DiscoveryGoBaseIE
from ..compat import compat_str
from ..compat import (
compat_str,
compat_urllib_parse_unquote,
)
from ..utils import (
ExtractorError,
try_get,
@@ -55,15 +58,27 @@ class DiscoveryIE(DiscoveryGoBaseIE):
video = next(cb for cb in content_blocks if cb.get('type') == 'video')['content']['items'][0]
video_id = video['id']
access_token = self._download_json(
'https://www.%s.com/anonymous' % site, display_id, query={
'authRel': 'authorization',
'client_id': try_get(
react_data, lambda x: x['application']['apiClientId'],
compat_str) or '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
})['access_token']
access_token = None
cookies = self._get_cookies(url)
# prefer Affiliate Auth Token over Anonymous Auth Token
auth_storage_cookie = cookies.get('eosAf') or cookies.get('eosAn')
if auth_storage_cookie and auth_storage_cookie.value:
auth_storage = self._parse_json(compat_urllib_parse_unquote(
compat_urllib_parse_unquote(auth_storage_cookie.value)),
video_id, fatal=False) or {}
access_token = auth_storage.get('a') or auth_storage.get('access_token')
if not access_token:
access_token = self._download_json(
'https://www.%s.com/anonymous' % site, display_id, query={
'authRel': 'authorization',
'client_id': try_get(
react_data, lambda x: x['application']['apiClientId'],
compat_str) or '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
})['access_token']
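# Hedged illustration of the cookie decoding above (the value is fabricated):
# the eosAf/eosAn cookie carries a JSON blob that has been percent-encoded
# twice, hence the double unquote before parsing:
#     raw = '%257B%2522a%2522%253A%2522TOKEN%2522%257D'
#     json.loads(unquote(unquote(raw)))  # -> {'a': 'TOKEN'}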
try:
stream = self._download_json(
@@ -72,7 +87,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
'Authorization': 'Bearer ' + access_token,
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
e_description = self._parse_json(
e.cause.read().decode(), display_id)['description']
if 'resource not available for country' in e_description:

View File

@@ -3,8 +3,8 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .brightcove import BrightcoveLegacyIE
from .dplay import DPlayIE
from ..compat import (
compat_parse_qs,
compat_urlparse,
@@ -12,8 +12,13 @@ from ..compat import (
from ..utils import smuggle_url
class DiscoveryNetworksDeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:discovery|tlc|animalplanet|dmax)\.de/(?:.*#(?P<id>\d+)|(?:[^/]+/)*videos/(?P<title>[^/?#]+))'
class DiscoveryNetworksDeIE(DPlayIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?(?P<site>discovery|tlc|animalplanet|dmax)\.de/
(?:
.*\#(?P<id>\d+)|
(?:[^/]+/)*videos/(?P<display_id>[^/?#]+)|
programme/(?P<programme>[^/]+)/video/(?P<alternate_id>[^/]+)
)'''
_TESTS = [{
'url': 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001',
@@ -40,6 +45,14 @@ class DiscoveryNetworksDeIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
alternate_id = mobj.group('alternate_id')
if alternate_id:
self._initialize_geo_bypass({
'countries': ['DE'],
})
return self._get_disco_api_info(
url, '%s/%s' % (mobj.group('programme'), alternate_id),
'sonic-eu1-prod.disco-api.com', mobj.group('site') + 'de')
brightcove_id = mobj.group('id')
if not brightcove_id:
title = mobj.group('title')

View File

@@ -97,12 +97,83 @@ class DPlayIE(InfoExtractor):
'only_matching': True,
}]
def _get_disco_api_info(self, url, display_id, disco_host, realm):
disco_base = 'https://' + disco_host
token = self._download_json(
'%s/token' % disco_base, display_id, 'Downloading token',
query={
'realm': realm,
})['data']['attributes']['token']
headers = {
'Referer': url,
'Authorization': 'Bearer ' + token,
}
video = self._download_json(
'%s/content/videos/%s' % (disco_base, display_id), display_id,
headers=headers, query={
'include': 'show'
})
video_id = video['data']['id']
info = video['data']['attributes']
title = info['name']
formats = []
for format_id, format_dict in self._download_json(
'%s/playback/videoPlaybackInfo/%s' % (disco_base, video_id),
display_id, headers=headers)['data']['attributes']['streaming'].items():
if not isinstance(format_dict, dict):
continue
format_url = format_dict.get('url')
if not format_url:
continue
ext = determine_ext(format_url)
if format_id == 'dash' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, display_id, mpd_id='dash', fatal=False))
elif format_id == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
else:
formats.append({
'url': format_url,
'format_id': format_id,
})
self._sort_formats(formats)
series = None
try:
included = video.get('included')
if isinstance(included, list):
show = next(e for e in included if e.get('type') == 'show')
series = try_get(
show, lambda x: x['attributes']['name'], compat_str)
except StopIteration:
pass
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': info.get('description'),
'duration': float_or_none(
info.get('videoDuration'), scale=1000),
'timestamp': unified_timestamp(info.get('publishStart')),
'series': series,
'season_number': int_or_none(info.get('seasonNumber')),
'episode_number': int_or_none(info.get('episodeNumber')),
'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
domain = mobj.group('domain')
self._initialize_geo_bypass([mobj.group('country').upper()])
self._initialize_geo_bypass({
'countries': [mobj.group('country').upper()],
})
webpage = self._download_webpage(url, display_id)
@@ -111,72 +182,8 @@ class DPlayIE(InfoExtractor):
if not video_id:
host = mobj.group('host')
disco_base = 'https://disco-api.%s' % host
self._download_json(
'%s/token' % disco_base, display_id, 'Downloading token',
query={
'realm': host.replace('.', ''),
})
video = self._download_json(
'%s/content/videos/%s' % (disco_base, display_id), display_id,
headers={
'Referer': url,
'x-disco-client': 'WEB:UNKNOWN:dplay-client:0.0.1',
}, query={
'include': 'show'
})
video_id = video['data']['id']
info = video['data']['attributes']
title = info['name']
formats = []
for format_id, format_dict in self._download_json(
'%s/playback/videoPlaybackInfo/%s' % (disco_base, video_id),
display_id)['data']['attributes']['streaming'].items():
if not isinstance(format_dict, dict):
continue
format_url = format_dict.get('url')
if not format_url:
continue
ext = determine_ext(format_url)
if format_id == 'dash' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, display_id, mpd_id='dash', fatal=False))
elif format_id == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
else:
formats.append({
'url': format_url,
'format_id': format_id,
})
self._sort_formats(formats)
series = None
try:
included = video.get('included')
if isinstance(included, list):
show = next(e for e in included if e.get('type') == 'show')
series = try_get(
show, lambda x: x['attributes']['name'], compat_str)
except StopIteration:
pass
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': info.get('description'),
'duration': float_or_none(
info.get('videoDuration'), scale=1000),
'timestamp': unified_timestamp(info.get('publishStart')),
'series': series,
'season_number': int_or_none(info.get('seasonNumber')),
'episode_number': int_or_none(info.get('episodeNumber')),
'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats,
}
return self._get_disco_api_info(
url, display_id, 'disco-api.' + host, host.replace('.', ''))
info = self._download_json(
'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id),

View File

@@ -2,26 +2,26 @@
from __future__ import unicode_literals
import itertools
import json
from .amp import AMPIE
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urlparse,
)
from ..utils import (
ExtractorError,
clean_html,
ExtractorError,
int_or_none,
remove_end,
sanitized_Request,
urlencode_postdata
parse_age_limit,
parse_duration,
unified_timestamp,
)
class DramaFeverBaseIE(AMPIE):
_LOGIN_URL = 'https://www.dramafever.com/accounts/login/'
class DramaFeverBaseIE(InfoExtractor):
_NETRC_MACHINE = 'dramafever'
_GEO_COUNTRIES = ['US', 'CA']
_CONSUMER_SECRET = 'DA59dtVXYLxajktV'
@@ -38,11 +38,11 @@ class DramaFeverBaseIE(AMPIE):
'consumer secret', default=self._CONSUMER_SECRET)
def _real_initialize(self):
self._login()
self._consumer_secret = self._get_consumer_secret()
self._login()
def _login(self):
(username, password) = self._get_login_info()
username, password = self._get_login_info()
if username is None:
return
@@ -51,37 +51,49 @@ class DramaFeverBaseIE(AMPIE):
'password': password,
}
request = sanitized_Request(
self._LOGIN_URL, urlencode_postdata(login_form))
response = self._download_webpage(
request, None, 'Logging in')
try:
response = self._download_json(
'https://www.dramafever.com/api/users/login', None, 'Logging in',
data=json.dumps(login_form).encode('utf-8'), headers={
'x-consumer-key': self._consumer_secret,
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (403, 404):
response = self._parse_json(
e.cause.read().decode('utf-8'), None)
else:
raise
if all(logout_pattern not in response
for logout_pattern in ['href="/accounts/logout/"', '>Log out<']):
error = self._html_search_regex(
r'(?s)<h\d[^>]+\bclass="hidden-xs prompt"[^>]*>(.+?)</h\d',
response, 'error message', default=None)
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in')
# Successful login
if response.get('result') or response.get('guid') or response.get('user_guid'):
return
errors = response.get('errors')
if errors and isinstance(errors, list):
error = errors[0]
message = error.get('message') or error['reason']
raise ExtractorError('Unable to login: %s' % message, expected=True)
raise ExtractorError('Unable to log in')
class DramaFeverIE(DramaFeverBaseIE):
IE_NAME = 'dramafever'
_VALID_URL = r'https?://(?:www\.)?dramafever\.com/(?:[^/]+/)?drama/(?P<id>[0-9]+/[0-9]+)(?:/|$)'
_TESTS = [{
'url': 'http://www.dramafever.com/drama/4512/1/Cooking_with_Shin/',
'url': 'https://www.dramafever.com/drama/4274/1/Heirs/',
'info_dict': {
'id': '4512.1',
'ext': 'flv',
'title': 'Cooking with Shin',
'description': 'md5:a8eec7942e1664a6896fcd5e1287bfd0',
'id': '4274.1',
'ext': 'wvm',
'title': 'Heirs - Episode 1',
'description': 'md5:362a24ba18209f6276e032a651c50bc2',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 3783,
'timestamp': 1381354993,
'upload_date': '20131009',
'series': 'Heirs',
'season_number': 1,
'episode': 'Episode 1',
'episode_number': 1,
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1404336058,
'upload_date': '20140702',
'duration': 344,
},
'params': {
# m3u8 download
@@ -110,50 +122,95 @@ class DramaFeverIE(DramaFeverBaseIE):
'only_matching': True,
}]
def _call_api(self, path, video_id, note, fatal=False):
return self._download_json(
'https://www.dramafever.com/api/5/' + path,
video_id, note=note, headers={
'x-consumer-key': self._consumer_secret,
}, fatal=fatal)
def _get_subtitles(self, video_id):
subtitles = {}
subs = self._call_api(
'video/%s/subtitles/webvtt/' % video_id, video_id,
'Downloading subtitles JSON', fatal=False)
if not subs or not isinstance(subs, list):
return subtitles
for sub in subs:
if not isinstance(sub, dict):
continue
sub_url = sub.get('url')
if not sub_url or not isinstance(sub_url, compat_str):
continue
subtitles.setdefault(
sub.get('code') or sub.get('language') or 'en', []).append({
'url': sub_url
})
return subtitles
def _real_extract(self, url):
video_id = self._match_id(url).replace('/', '.')
try:
info = self._extract_feed_info(
'http://www.dramafever.com/amp/episode/feed.json?guid=%s' % video_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError):
self.raise_geo_restricted(
msg='Currently unavailable in your country',
countries=self._GEO_COUNTRIES)
raise
# title is postfixed with video id for some reason, removing
if info.get('title'):
info['title'] = remove_end(info['title'], video_id).strip()
series_id, episode_number = video_id.split('.')
episode_info = self._download_json(
# We only need a single episode info, so restricting page size to one episode
# and dealing with page number as with episode number
r'http://www.dramafever.com/api/4/episode/series/?cs=%s&series_id=%s&page_number=%s&page_size=1'
% (self._consumer_secret, series_id, episode_number),
video_id, 'Downloading episode info JSON', fatal=False)
if episode_info:
value = episode_info.get('value')
if isinstance(value, list):
for v in value:
if v.get('type') == 'Episode':
subfile = v.get('subfile') or v.get('new_subfile')
if subfile and subfile != 'http://www.dramafever.com/st/':
info.setdefault('subtitles', {}).setdefault('English', []).append({
'ext': 'srt',
'url': subfile,
})
episode_number = int_or_none(v.get('number'))
episode_fallback = 'Episode'
if episode_number:
episode_fallback += ' %d' % episode_number
info['episode'] = v.get('title') or episode_fallback
info['episode_number'] = episode_number
break
return info
video = self._call_api(
'series/%s/episodes/%s/' % (series_id, episode_number), video_id,
'Downloading video JSON')
formats = []
download_assets = video.get('download_assets')
if download_assets and isinstance(download_assets, dict):
for format_id, format_dict in download_assets.items():
if not isinstance(format_dict, dict):
continue
format_url = format_dict.get('url')
if not format_url or not isinstance(format_url, compat_str):
continue
formats.append({
'url': format_url,
'format_id': format_id,
'filesize': int_or_none(video.get('filesize')),
})
stream = self._call_api(
'video/%s/stream/' % video_id, video_id, 'Downloading stream JSON',
fatal=False)
if stream:
stream_url = stream.get('stream_url')
if stream_url:
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(formats)
title = video.get('title') or 'Episode %s' % episode_number
description = video.get('description')
thumbnail = video.get('thumbnail')
timestamp = unified_timestamp(video.get('release_date'))
duration = parse_duration(video.get('duration'))
age_limit = parse_age_limit(video.get('tv_rating'))
series = video.get('series_title')
season_number = int_or_none(video.get('season'))
if series:
title = '%s - %s' % (series, title)
subtitles = self.extract_subtitles(video_id)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'age_limit': age_limit,
'series': series,
'season_number': season_number,
'episode_number': int_or_none(episode_number),
'formats': formats,
'subtitles': subtitles,
}
class DramaFeverSeriesIE(DramaFeverBaseIE):

View File

@@ -8,7 +8,6 @@ from ..utils import (
unified_strdate,
xpath_text,
determine_ext,
qualities,
float_or_none,
ExtractorError,
)
@@ -16,7 +15,8 @@ from ..utils import (
class DreiSatIE(InfoExtractor):
IE_NAME = '3sat'
_VALID_URL = r'(?:https?://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_GEO_COUNTRIES = ['DE']
_VALID_URL = r'https?://(?:www\.)?3sat\.de/mediathek/(?:(?:index|mediathek)\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)'
_TESTS = [
{
'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
@@ -43,7 +43,8 @@ class DreiSatIE(InfoExtractor):
def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
param_groups = {}
for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
group_id = param_group.attrib.get(self._xpath_ns('id', 'http://www.w3.org/XML/1998/namespace'))
group_id = param_group.get(self._xpath_ns(
'id', 'http://www.w3.org/XML/1998/namespace'))
params = {}
for param in param_group:
params[param.get('name')] = param.get('value')
@@ -54,7 +55,7 @@ class DreiSatIE(InfoExtractor):
src = video.get('src')
if not src:
continue
bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
bitrate = int_or_none(self._search_regex(r'_(\d+)k', src, 'bitrate', None)) or float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
group_id = video.get('paramGroup')
param_group = param_groups[group_id]
for proto in param_group['protocols'].split(','):
@@ -75,66 +76,36 @@ class DreiSatIE(InfoExtractor):
note='Downloading video info',
errnote='Failed to download video info')
status_code = doc.find('./status/statuscode')
if status_code is not None and status_code.text != 'ok':
code = status_code.text
if code == 'notVisibleAnymore':
status_code = xpath_text(doc, './status/statuscode')
if status_code and status_code != 'ok':
if status_code == 'notVisibleAnymore':
message = 'Video %s is not available' % video_id
else:
message = '%s returned error: %s' % (self.IE_NAME, code)
message = '%s returned error: %s' % (self.IE_NAME, status_code)
raise ExtractorError(message, expected=True)
title = doc.find('.//information/title').text
description = xpath_text(doc, './/information/detail', 'description')
duration = int_or_none(xpath_text(doc, './/details/lengthSec', 'duration'))
uploader = xpath_text(doc, './/details/originChannelTitle', 'uploader')
uploader_id = xpath_text(doc, './/details/originChannelId', 'uploader id')
upload_date = unified_strdate(xpath_text(doc, './/details/airtime', 'upload date'))
title = xpath_text(doc, './/information/title', 'title', True)
def xml_to_thumbnails(fnode):
thumbnails = []
for node in fnode:
thumbnail_url = node.text
if not thumbnail_url:
continue
thumbnail = {
'url': thumbnail_url,
}
if 'key' in node.attrib:
m = re.match('^([0-9]+)x([0-9]+)$', node.attrib['key'])
if m:
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail)
return thumbnails
thumbnails = xml_to_thumbnails(doc.findall('.//teaserimages/teaserimage'))
format_nodes = doc.findall('.//formitaeten/formitaet')
quality = qualities(['veryhigh', 'high', 'med', 'low'])
def get_quality(elem):
return quality(xpath_text(elem, 'quality'))
format_nodes.sort(key=get_quality)
format_ids = []
urls = []
formats = []
for fnode in format_nodes:
video_url = fnode.find('url').text
is_available = 'http://www.metafilegenerator' not in video_url
if not is_available:
for fnode in doc.findall('.//formitaeten/formitaet'):
video_url = xpath_text(fnode, 'url')
if not video_url or video_url in urls:
continue
urls.append(video_url)
is_available = 'http://www.metafilegenerator' not in video_url
geoloced = 'static_geoloced_online' in video_url
if not is_available or geoloced:
continue
format_id = fnode.attrib['basetype']
quality = xpath_text(fnode, './quality', 'quality')
format_m = re.match(r'''(?x)
(?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
(?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
''', format_id)
ext = determine_ext(video_url, None) or format_m.group('container')
if ext not in ('smil', 'f4m', 'm3u8'):
format_id = format_id + '-' + quality
if format_id in format_ids:
continue
if ext == 'meta':
continue
@@ -147,24 +118,23 @@ class DreiSatIE(InfoExtractor):
if video_url.startswith('https://'):
continue
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', m3u8_id=format_id, fatal=False))
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id=format_id, fatal=False))
else:
proto = format_m.group('proto').lower()
quality = xpath_text(fnode, './quality')
if quality:
format_id += '-' + quality
abr = int_or_none(xpath_text(fnode, './audioBitrate', 'abr'), 1000)
vbr = int_or_none(xpath_text(fnode, './videoBitrate', 'vbr'), 1000)
abr = int_or_none(xpath_text(fnode, './audioBitrate'), 1000)
vbr = int_or_none(xpath_text(fnode, './videoBitrate'), 1000)
width = int_or_none(xpath_text(fnode, './width', 'width'))
height = int_or_none(xpath_text(fnode, './height', 'height'))
filesize = int_or_none(xpath_text(fnode, './filesize', 'filesize'))
format_note = ''
if not format_note:
format_note = None
tbr = int_or_none(self._search_regex(
r'_(\d+)k', video_url, 'bitrate', None))
if tbr and vbr and not abr:
abr = tbr - vbr
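# Worked example of the heuristic above (numbers invented): a URL ending in
# ..._1496k.mp4 gives tbr = 1496; with vbr = 1400 from <videoBitrate>, the
# audio bitrate is inferred as abr = 1496 - 1400 = 96 kbit/s.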
formats.append({
'format_id': format_id,
@@ -174,31 +144,50 @@ class DreiSatIE(InfoExtractor):
'vcodec': format_m.group('vcodec'),
'abr': abr,
'vbr': vbr,
'width': width,
'height': height,
'filesize': filesize,
'format_note': format_note,
'protocol': proto,
'_available': is_available,
'tbr': tbr,
'width': int_or_none(xpath_text(fnode, './width')),
'height': int_or_none(xpath_text(fnode, './height')),
'filesize': int_or_none(xpath_text(fnode, './filesize')),
'protocol': format_m.group('proto').lower(),
})
format_ids.append(format_id)
geolocation = xpath_text(doc, './/details/geolocation')
if not formats and geolocation and geolocation != 'none':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
thumbnails = []
for node in doc.findall('.//teaserimages/teaserimage'):
thumbnail_url = node.text
if not thumbnail_url:
continue
thumbnail = {
'url': thumbnail_url,
}
thumbnail_key = node.get('key')
if thumbnail_key:
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
if m:
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail)
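# e.g. a teaserimage node with key="940x530" (value invented) yields
# {'url': <node text>, 'width': 940, 'height': 530}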
upload_date = unified_strdate(xpath_text(doc, './/details/airtime'))
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'description': xpath_text(doc, './/information/detail'),
'duration': int_or_none(xpath_text(doc, './/details/lengthSec')),
'thumbnails': thumbnails,
'uploader': uploader,
'uploader_id': uploader_id,
'uploader': xpath_text(doc, './/details/originChannelTitle'),
'uploader_id': xpath_text(doc, './/details/originChannelId'),
'upload_date': upload_date,
'formats': formats,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
video_id = self._match_id(url)
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?id=%s' % video_id
return self.extract_from_xml_url(video_id, details_url)

View File

@@ -66,7 +66,9 @@ class DrTuberIE(InfoExtractor):
self._sort_formats(formats)
title = self._html_search_regex(
(r'class="title_watch"[^>]*><(?:p|h\d+)[^>]*>([^<]+)<',
(r'<h1[^>]+class=["\']title[^>]+>([^<]+)',
r'<title>([^<]+)\s*@\s+DrTuber',
r'class="title_watch"[^>]*><(?:p|h\d+)[^>]*>([^<]+)<',
r'<p[^>]+class="title_substrate">([^<]+)</p>',
r'<title>([^<]+) - \d+'),
webpage, 'title')

View File

@@ -0,0 +1,83 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from socket import timeout
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
)
class DTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?d\.tube/(?:#!/)?v/(?P<uploader_id>[0-9a-z.-]+)/(?P<id>[0-9a-z]{8})'
_TEST = {
'url': 'https://d.tube/#!/v/benswann/zqd630em',
'md5': 'a03eaa186618ffa7a3145945543a251e',
'info_dict': {
'id': 'zqd630em',
'ext': 'mp4',
'title': 'Reality Check: FDA\'s Disinformation Campaign on Kratom',
'description': 'md5:700d164e066b87f9eac057949e4227c2',
'uploader_id': 'benswann',
'upload_date': '20180222',
'timestamp': 1519328958,
},
'params': {
'format': '480p',
},
}
def _real_extract(self, url):
uploader_id, video_id = re.match(self._VALID_URL, url).groups()
result = self._download_json('https://api.steemit.com/', video_id, data=json.dumps({
'jsonrpc': '2.0',
'method': 'get_content',
'params': [uploader_id, video_id],
}).encode())['result']
metadata = json.loads(result['json_metadata'])
video = metadata['video']
content = video['content']
info = video.get('info', {})
title = info.get('title') or result['title']
def canonical_url(h):
if not h:
return None
return 'https://ipfs.io/ipfs/' + h
formats = []
for q in ('240', '480', '720', '1080', ''):
video_url = canonical_url(content.get('video%shash' % q))
if not video_url:
continue
format_id = (q + 'p') if q else 'Source'
try:
self.to_screen('%s: Checking %s video format URL' % (video_id, format_id))
self._downloader._opener.open(video_url, timeout=5).close()
except timeout as e:
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, format_id))
continue
formats.append({
'format_id': format_id,
'url': video_url,
'height': int_or_none(q),
'ext': 'mp4',
})
return {
'id': video_id,
'title': title,
'description': content.get('description'),
'thumbnail': canonical_url(info.get('snaphash')),
'tags': content.get('tags') or metadata.get('tags'),
'duration': info.get('duration'),
'formats': formats,
'timestamp': parse_iso8601(result.get('created')),
'uploader_id': uploader_id,
}
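For orientation, a hedged walk-through of the format loop above with an invented hash:

    content = {'video480hash': 'QmExampleHash'}
    canonical_url(content['video480hash'])
    # -> 'https://ipfs.io/ipfs/QmExampleHash', which is probed with a 5 second
    # timeout and, if the IPFS gateway answers, offered as format_id '480p';
    # the bare 'videohash' key would become the 'Source' format.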

View File

@@ -91,17 +91,6 @@ class DVTVIE(InfoExtractor):
}, {
'url': 'http://video.aktualne.cz/v-cechach-poprve-zazni-zelenkova-zrestaurovana-mse/r~45b4b00483ec11e4883b002590604f2e/',
'only_matching': True,
}, {
'url': 'https://video.aktualne.cz/dvtv/babis-a-zeman-nesou-vinu-za-to-ze-nemame-jasno-v-tom-kdo-bud/r~026afb54fad711e79704ac1f6b220ee8/',
'md5': '87defe16681b1429c91f7a74809823c6',
'info_dict': {
'id': 'f5ae72f6fad611e794dbac1f6b220ee8',
'ext': 'mp4',
'title': 'Babiš a Zeman nesou vinu za to, že nemáme jasno v tom, kdo bude vládnout, říká Pekarová Adamová',
},
'params': {
'skip_download': True,
},
}]
def _parse_video_metadata(self, js, video_id, live_js=None):

View File

@@ -1,39 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class ETOnlineIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?etonline\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.etonline.com/tv/211130_dove_cameron_liv_and_maddie_emotional_episode_series_finale/',
'info_dict': {
'id': '211130_dove_cameron_liv_and_maddie_emotional_episode_series_finale',
'title': 'md5:a21ec7d3872ed98335cbd2a046f34ee6',
'description': 'md5:8b94484063f463cca709617c79618ccd',
},
'playlist_count': 2,
}, {
'url': 'http://www.etonline.com/media/video/here_are_the_stars_who_love_bringing_their_moms_as_dates_to_the_oscars-211359/',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1242911076001/default_default/index.html?videoId=ref:%s'
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % video_id, 'BrightcoveNew', video_id)
for video_id in re.findall(
r'site\.brightcove\s*\([^,]+,\s*["\'](title_\d+)', webpage)]
return self.playlist_result(
entries, playlist_id,
self._og_search_title(webpage, fatal=False),
self._og_search_description(webpage))

View File

@@ -44,6 +44,7 @@ from .anysex import AnySexIE
from .aol import AolIE
from .allocine import AllocineIE
from .aliexpress import AliExpressLiveIE
from .apa import APAIE
from .aparat import AparatIE
from .appleconnect import AppleConnectIE
from .appletrailers import (
@@ -137,6 +138,7 @@ from .brightcove import (
BrightcoveLegacyIE,
BrightcoveNewIE,
)
from .businessinsider import BusinessInsiderIE
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .c56 import C56IE
@@ -144,6 +146,8 @@ from .camdemy import (
CamdemyIE,
CamdemyFolderIE
)
from .cammodels import CamModelsIE
from .camtube import CamTubeIE
from .camwithher import CamWithHerIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
@@ -195,6 +199,7 @@ from .clippit import ClippitIE
from .cliprs import ClipRsIE
from .clipsyndicate import ClipsyndicateIE
from .closertotruth import CloserToTruthIE
from .cloudflarestream import CloudflareStreamIE
from .cloudy import CloudyIE
from .clubic import ClubicIE
from .clyp import ClypIE
@@ -281,6 +286,7 @@ from .drtv import (
DRTVIE,
DRTVLiveIE,
)
from .dtube import DTubeIE
from .dvtv import DVTVIE
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
@@ -326,7 +332,6 @@ from .espn import (
FiveThirtyEightIE,
)
from .esri import EsriVideoIE
from .etonline import ETOnlineIE
from .europa import EuropaIE
from .everyonesmixtape import EveryonesMixtapeIE
from .expotv import ExpoTVIE
@@ -377,6 +382,7 @@ from .francetv import (
FranceTVSiteIE,
FranceTVEmbedIE,
FranceTVInfoIE,
FranceTVInfoSportIE,
FranceTVJeunesseIE,
GenerationWhatIE,
CultureboxIE,
@@ -467,10 +473,7 @@ from .imgur import (
)
from .ina import InaIE
from .inc import IncIE
from .indavideo import (
IndavideoIE,
IndavideoEmbedIE,
)
from .indavideo import IndavideoEmbedIE
from .infoq import InfoQIE
from .instagram import InstagramIE, InstagramUserIE
from .internazionale import InternazionaleIE
@@ -478,7 +481,10 @@ from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE
from .iqiyi import IqiyiIE
from .ir90tv import Ir90TvIE
from .itv import ITVIE
from .itv import (
ITVIE,
ITVBTCCIE,
)
from .ivi import (
IviIE,
IviCompilationIE
@@ -577,7 +583,6 @@ from .mailru import (
MailRuMusicIE,
MailRuMusicSearchIE,
)
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .mangomolo import (
MangomoloVideoIE,
@@ -620,7 +625,6 @@ from .mnet import MnetIE
from .moevideo import MoeVideoIE
from .mofosex import MofosexIE
from .mojvideo import MojvideoIE
from .moniker import MonikerIE
from .morningstar import MorningstarIE
from .motherless import (
MotherlessIE,
@@ -641,6 +645,7 @@ from .mtv import (
from .muenchentv import MuenchenTVIE
from .musicplayon import MusicPlayOnIE
from .mwave import MwaveIE, MwaveMeetGreetIE
from .mychannels import MyChannelsIE
from .myspace import MySpaceIE, MySpaceAlbumIE
from .myspass import MySpassIE
from .myvi import (
@@ -662,6 +667,7 @@ from .nbc import (
NBCOlympicsIE,
NBCOlympicsStreamIE,
NBCSportsIE,
NBCSportsStreamIE,
NBCSportsVPlayerIE,
)
from .ndr import (
@@ -701,12 +707,7 @@ from .nexx import (
from .nfb import NFBIE
from .nfl import NFLIE
from .nhk import NhkVodIE
from .nhl import (
NHLVideocenterIE,
NHLNewsIE,
NHLVideocenterCategoryIE,
NHLIE,
)
from .nhl import NHLIE
from .nick import (
NickIE,
NickBrIE,
@@ -715,10 +716,7 @@ from .nick import (
NickRuIE,
)
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninecninemedia import (
NineCNineMediaStackIE,
NineCNineMediaIE,
)
from .ninecninemedia import NineCNineMediaIE
from .ninegag import NineGagIE
from .ninenow import NineNowIE
from .nintendo import NintendoIE
@@ -806,6 +804,7 @@ from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
from .pearvideo import PearVideoIE
from .peertube import PeerTubeIE
from .people import PeopleIE
from .performgroup import PerformGroupIE
from .periscope import (
@@ -815,6 +814,10 @@ from .periscope import (
from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE
from .picarto import (
PicartoIE,
PicartoVodIE,
)
from .piksel import PikselIE
from .pinkbike import PinkbikeIE
from .pladform import PladformIE
@@ -1007,7 +1010,10 @@ from .spankbang import SpankBangIE
from .spankwire import SpankwireIE
from .spiegel import SpiegelIE, SpiegelArticleIE
from .spiegeltv import SpiegeltvIE
from .spike import SpikeIE
from .spike import (
BellatorIE,
ParamountNetworkIE,
)
from .stitcher import StitcherIE
from .sport5 import Sport5IE
from .sportbox import SportBoxEmbedIE
@@ -1031,6 +1037,7 @@ from .sunporno import SunPornoIE
from .svt import (
SVTIE,
SVTPlayIE,
SVTSeriesIE,
)
from .swrmediathek import SWRMediathekIE
from .syfy import SyfyIE
@@ -1132,10 +1139,12 @@ from .tvc import (
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvn24 import TVN24IE
from .tvnet import TVNetIE
from .tvnoe import TVNoeIE
from .tvnow import (
TVNowIE,
TVNowListIE,
TVNowShowIE,
)
from .tvp import (
TVPEmbedIE,
@@ -1413,5 +1422,11 @@ from .youtube import (
)
from .zapiks import ZapiksIE
from .zaq1 import Zaq1IE
from .zattoo import (
QuicklineIE,
QuicklineLiveIE,
ZattooIE,
ZattooLiveIE,
)
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import ZingMp3IE

View File

@@ -8,12 +8,12 @@ class ExtremeTubeIE(KeezMoviesIE):
_VALID_URL = r'https?://(?:www\.)?extremetube\.com/(?:[^/]+/)?video/(?P<id>[^/#?&]+)'
_TESTS = [{
'url': 'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
'md5': '1fb9228f5e3332ec8c057d6ac36f33e0',
'md5': '92feaafa4b58e82f261e5419f39c60cb',
'info_dict': {
'id': 'music-video-14-british-euro-brit-european-cumshots-swallow-652431',
'ext': 'mp4',
'title': 'Music Video 14 british euro brit european cumshots swallow',
'uploader': 'unknown',
'uploader': 'anonim',
'view_count': int,
'age_limit': 18,
}
@@ -36,10 +36,10 @@ class ExtremeTubeIE(KeezMoviesIE):
r'<h1[^>]+title="([^"]+)"[^>]*>', webpage, 'title')
uploader = self._html_search_regex(
r'Uploaded by:\s*</strong>\s*(.+?)\s*</div>',
r'Uploaded by:\s*</[^>]+>\s*<a[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False)
view_count = str_to_int(self._search_regex(
r'Views:\s*</strong>\s*<span>([\d,\.]+)</span>',
r'Views:\s*</[^>]+>\s*<[^>]+>([\d,\.]+)</',
webpage, 'view count', fatal=False))
info.update({

View File

@@ -56,6 +56,7 @@ class FacebookIE(InfoExtractor):
_CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
_VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s'
_VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true'
_TESTS = [{
'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf',
@@ -208,6 +209,17 @@ class FacebookIE(InfoExtractor):
# no title
'url': 'https://www.facebook.com/onlycleverentertainment/videos/1947995502095005/',
'only_matching': True,
}, {
'url': 'https://www.facebook.com/WatchESLOne/videos/359649331226507/',
'info_dict': {
'id': '359649331226507',
'ext': 'mp4',
'title': '#ESLOne VoD - Birmingham Finals Day#1 Fnatic vs. @Evil Geniuses',
'uploader': 'ESL One Dota 2',
},
'params': {
'skip_download': True,
},
}]
@staticmethod
@@ -226,7 +238,7 @@ class FacebookIE(InfoExtractor):
return urls
def _login(self):
(useremail, password) = self._get_login_info()
useremail, password = self._get_login_info()
if useremail is None:
return
@@ -312,16 +324,18 @@ class FacebookIE(InfoExtractor):
if server_js_data:
video_data = extract_video_data(server_js_data.get('instances', []))
def extract_from_jsmods_instances(js_data):
if js_data:
return extract_video_data(try_get(
js_data, lambda x: x['jsmods']['instances'], list) or [])
if not video_data:
server_js_data = self._parse_json(
self._search_regex(
r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:stream_pagelet|pagelet_group_mall|permalink_video_pagelet)',
webpage, 'js data', default='{}'),
video_id, transform_source=js_to_json, fatal=False)
if server_js_data:
video_data = extract_video_data(try_get(
server_js_data, lambda x: x['jsmods']['instances'],
list) or [])
video_data = extract_from_jsmods_instances(server_js_data)
if not video_data:
if not fatal_if_no_video:
@@ -333,8 +347,33 @@ class FacebookIE(InfoExtractor):
expected=True)
elif '>You must log in to continue' in webpage:
self.raise_login_required()
else:
raise ExtractorError('Cannot parse data')
# Video info not in first request, do a secondary request using
# tahoe player specific URL
tahoe_data = self._download_webpage(
self._VIDEO_PAGE_TAHOE_TEMPLATE % video_id, video_id,
data=urlencode_postdata({
'__user': 0,
'__a': 1,
'__pc': self._search_regex(
r'pkg_cohort["\']\s*:\s*["\'](.+?)["\']', webpage,
'pkg cohort', default='PHASED:DEFAULT'),
'__rev': self._search_regex(
r'client_revision["\']\s*:\s*(\d+),', webpage,
'client revision', default='3944515'),
}),
headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
tahoe_js_data = self._parse_json(
self._search_regex(
r'for\s+\(\s*;\s*;\s*\)\s*;(.+)', tahoe_data,
'tahoe js data', default='{}'),
video_id, fatal=False)
video_data = extract_from_jsmods_instances(tahoe_js_data)
if not video_data:
raise ExtractorError('Cannot parse data')
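# Shape of the fallback above (video id invented): for id 12345 this POSTs to
# https://www.facebook.com/video/tahoe/async/12345/?chain=true&isvideo=true
# with __user/__a/__pc/__rev form fields scraped from the original page, then
# strips the anti-hijacking prefix 'for (;;);' before parsing the payload.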
formats = []
for f in video_data:
@@ -380,7 +419,8 @@ class FacebookIE(InfoExtractor):
video_title = 'Facebook video #%s' % video_id
uploader = clean_html(get_element_by_id(
'fbPhotoPageAuthorName', webpage)) or self._search_regex(
r'ownerName\s*:\s*"([^"]+)"', webpage, 'uploader', fatal=False)
r'ownerName\s*:\s*"([^"]+)"', webpage, 'uploader',
fatal=False) or self._og_search_title(webpage, fatal=False)
timestamp = int_or_none(self._search_regex(
r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
'timestamp', default=None))

View File

@@ -46,7 +46,7 @@ class FC2IE(InfoExtractor):
}]
def _login(self):
(username, password) = self._get_login_info()
username, password = self._get_login_info()
if username is None or password is None:
return False

View File

@@ -379,6 +379,31 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
return self._make_url_result(video_id, catalogue)
class FranceTVInfoSportIE(FranceTVBaseInfoExtractor):
IE_NAME = 'sport.francetvinfo.fr'
_VALID_URL = r'https?://sport\.francetvinfo\.fr/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://sport.francetvinfo.fr/les-jeux-olympiques/retour-sur-les-meilleurs-moments-de-pyeongchang-2018',
'info_dict': {
'id': '6e49080e-3f45-11e8-b459-000d3a2439ea',
'ext': 'mp4',
'title': 'Retour sur les meilleurs moments de Pyeongchang 2018',
'timestamp': 1523639962,
'upload_date': '20180413',
},
'params': {
'skip_download': True,
},
'add_ie': [FranceTVIE.ie_key()],
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(r'data-video="([^"]+)"', webpage, 'video_id')
return self._make_url_result(video_id, 'Sport-web')
class GenerationWhatIE(InfoExtractor):
IE_NAME = 'france2.fr:generation-what'
_VALID_URL = r'https?://generation-what\.francetv\.fr/[^/]+/video/(?P<id>[^/?#&]+)'

View File

@@ -51,7 +51,7 @@ class FunimationIE(InfoExtractor):
}]
def _login(self):
(username, password) = self._get_login_info()
username, password = self._get_login_info()
if username is None:
return
try:

View File

@@ -5,7 +5,10 @@ import re
from .common import InfoExtractor
from .nexx import NexxIE
from ..utils import int_or_none
from ..utils import (
int_or_none,
try_get,
)
class FunkBaseIE(InfoExtractor):
@@ -77,6 +80,20 @@ class FunkChannelIE(FunkBaseIE):
'params': {
'skip_download': True,
},
}, {
# only available via byIdList API
'url': 'https://www.funk.net/channel/informr/martin-sonneborn-erklaert-die-eu',
'info_dict': {
'id': '205067',
'ext': 'mp4',
'title': 'Martin Sonneborn erklärt die EU',
'description': 'md5:050f74626e4ed87edf4626d2024210c0',
'timestamp': 1494424042,
'upload_date': '20170510',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/mein-erster-job-lovemilla-folge-1/lovemilla/',
'only_matching': True,
@@ -87,16 +104,28 @@ class FunkChannelIE(FunkBaseIE):
channel_id = mobj.group('id')
alias = mobj.group('alias')
results = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/filter', channel_id,
headers={
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbCIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxzZWFyY2gtYXBpIn0.q4Y2xZG8PFHai24-4Pjx2gym9RmJejtmK6lMXP5wAgc',
'Referer': url,
}, query={
'channelId': channel_id,
'size': 100,
})['result']
headers = {
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbCIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxzZWFyY2gtYXBpIn0.q4Y2xZG8PFHai24-4Pjx2gym9RmJejtmK6lMXP5wAgc',
'Referer': url,
}
video = next(r for r in results if r.get('alias') == alias)
video = None
by_id_list = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/byIdList', channel_id,
headers=headers, query={
'ids': alias,
}, fatal=False)
if by_id_list:
video = try_get(by_id_list, lambda x: x['result'][0], dict)
if not video:
results = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/filter', channel_id,
headers=headers, query={
'channelId': channel_id,
'size': 100,
})['result']
video = next(r for r in results if r.get('alias') == alias)
return self._make_url_result(video)

View File

@@ -41,7 +41,7 @@ class FXNetworksIE(AdobePassIE):
if 'The content you are trying to access is not available in your region.' in webpage:
self.raise_geo_restricted()
video_data = extract_attributes(self._search_regex(
r'(<a.+?rel="http://link\.theplatform\.com/s/.+?</a>)', webpage, 'video data'))
r'(<a.+?rel="https?://link\.theplatform\.com/s/.+?</a>)', webpage, 'video data'))
player_type = self._search_regex(r'playerType\s*=\s*[\'"]([^\'"]+)', webpage, 'player type', default=None)
release_url = video_data['rel']
title = video_data['data-title']

View File

@@ -91,7 +91,7 @@ class GDCVaultIE(InfoExtractor):
]
def _login(self, webpage_url, display_id):
(username, password) = self._get_login_info()
username, password = self._get_login_info()
if username is None or password is None:
self.report_warning('It looks like ' + webpage_url + ' requires a login. Try specifying a username and password and try again.')
return None

View File

@@ -23,6 +23,7 @@ from ..utils import (
is_html,
js_to_json,
KNOWN_EXTENSIONS,
merge_dicts,
mimetype2ext,
orderedSet,
sanitized_Request,
@@ -58,6 +59,7 @@ from .xhamster import XHamsterEmbedIE
from .tnaflix import TNAFlixNetworkEmbedIE
from .drtuber import DrTuberIE
from .redtube import RedTubeIE
from .tube8 import Tube8IE
from .vimeo import VimeoIE
from .dailymotion import DailymotionIE
from .dailymail import DailyMailIE
@@ -105,6 +107,10 @@ from .springboardplatform import SpringboardPlatformIE
from .yapfiles import YapFilesIE
from .vice import ViceIE
from .xfileshare import XFileShareIE
from .cloudflarestream import CloudflareStreamIE
from .peertube import PeerTubeIE
from .indavideo import IndavideoEmbedIE
from .apa import APAIE
class GenericIE(InfoExtractor):
@@ -189,6 +195,16 @@ class GenericIE(InfoExtractor):
'title': 'pdv_maddow_netcast_m4v-02-27-2015-201624',
}
},
# RSS feed with enclosures and unsupported link URLs
{
'url': 'http://www.hellointernet.fm/podcast?format=rss',
'info_dict': {
'id': 'http://www.hellointernet.fm/podcast?format=rss',
'description': 'CGP Grey and Brady Haran talk about YouTube, life, work, whatever.',
'title': 'Hello Internet',
},
'playlist_mincount': 100,
},
# SMIL from http://videolectures.net/promogram_igor_mekjavic_eng
{
'url': 'http://videolectures.net/promogram_igor_mekjavic_eng/video/1/smil.xml',
@@ -1219,7 +1235,7 @@ class GenericIE(InfoExtractor):
'title': '35871',
'timestamp': 1355743100,
'upload_date': '20121217',
'uploader_id': 'batchUser',
'uploader_id': 'cplapp@learn360.com',
},
'add_ie': ['Kaltura'],
},
@@ -1270,6 +1286,39 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Kaltura'],
},
{
# Kaltura iframe embed, more sophisticated
'url': 'http://www.cns.nyu.edu/~eero/math-tools/Videos/lecture-05sep2017.html',
'info_dict': {
'id': '1_9gzouybz',
'ext': 'mp4',
'title': 'lecture-05sep2017',
'description': 'md5:40f347d91fd4ba047e511c5321064b49',
'upload_date': '20170913',
'uploader_id': 'eps2',
'timestamp': 1505340777,
},
'params': {
'skip_download': True,
},
'add_ie': ['Kaltura'],
},
{
# meta twitter:player
'url': 'http://thechive.com/2017/12/08/all-i-want-for-christmas-is-more-twerk/',
'info_dict': {
'id': '0_01b42zps',
'ext': 'mp4',
'title': 'Main Twerk (Video)',
'upload_date': '20171208',
'uploader_id': 'sebastian.salinas@thechive.com',
'timestamp': 1512713057,
},
'params': {
'skip_download': True,
},
'add_ie': ['Kaltura'],
},
# referrer protected EaglePlatform embed
{
'url': 'https://tvrain.ru/lite/teleshow/kak_vse_nachinalos/namin-418921/',
@@ -1426,21 +1475,6 @@ class GenericIE(InfoExtractor):
},
'expected_warnings': ['Failed to parse JSON Expecting value'],
},
# Ooyala embed
{
'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
'info_dict': {
'id': '50YnY4czr4ms1vJ7yz3xzq0excz_pUMs',
'ext': 'mp4',
'description': 'Index/Match versus VLOOKUP.',
'title': 'This is what separates the Excel masters from the wannabes',
'duration': 191.933,
},
'params': {
# m3u8 downloads
'skip_download': True,
}
},
# Brightcove URL in single quotes
{
'url': 'http://www.sportsnet.ca/baseball/mlb/sn-presents-russell-martin-world-citizen/',
@@ -1967,7 +2001,74 @@ class GenericIE(InfoExtractor):
'params': {
'skip_download': True,
},
}
},
{
# CloudflareStream embed
'url': 'https://www.cloudflare.com/products/cloudflare-stream/',
'info_dict': {
'id': '31c9291ab41fac05471db4e73aa11717',
'ext': 'mp4',
'title': '31c9291ab41fac05471db4e73aa11717',
},
'add_ie': [CloudflareStreamIE.ie_key()],
'params': {
'skip_download': True,
},
},
{
# PeerTube embed
'url': 'https://joinpeertube.org/fr/home/',
'info_dict': {
'id': 'home',
'title': 'Reprenez le contrôle de vos vidéos ! #JoinPeertube',
},
'playlist_count': 2,
},
{
# Indavideo embed
'url': 'https://streetkitchen.hu/receptek/igy_kell_otthon_hamburgert_sutni/',
'info_dict': {
'id': '1693903',
'ext': 'mp4',
'title': 'Így kell otthon hamburgert sütni',
'description': 'md5:f5a730ecf900a5c852e1e00540bbb0f7',
'timestamp': 1426330212,
'upload_date': '20150314',
'uploader': 'StreetKitchen',
'uploader_id': '546363',
},
'add_ie': [IndavideoEmbedIE.ie_key()],
'params': {
'skip_download': True,
},
},
{
# APA embed via JWPlatform embed
'url': 'http://www.vol.at/blue-man-group/5593454',
'info_dict': {
'id': 'jjv85FdZ',
'ext': 'mp4',
'title': '"Blau ist mysteriös": Die Blue Man Group im Interview',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 254,
'timestamp': 1519211149,
'upload_date': '20180221',
},
'params': {
'skip_download': True,
},
},
{
'url': 'http://share-videos.se/auto/video/83645793?uid=13',
'md5': 'b68d276de422ab07ee1d49388103f457',
'info_dict': {
'id': '83645793',
'title': 'Lock up and get excited',
'ext': 'mp4'
},
'skip': 'TODO: fix nested playlists processing in tests',
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
@@ -1998,13 +2099,15 @@ class GenericIE(InfoExtractor):
entries = []
for it in doc.findall('./channel/item'):
next_url = xpath_text(it, 'link', fatal=False)
next_url = None
enclosure_nodes = it.findall('./enclosure')
for e in enclosure_nodes:
next_url = e.attrib.get('url')
if next_url:
break
if not next_url:
enclosure_nodes = it.findall('./enclosure')
for e in enclosure_nodes:
next_url = e.attrib.get('url')
if next_url:
break
next_url = xpath_text(it, 'link', fatal=False)
if not next_url:
continue
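# Standalone illustration of the enclosure-first lookup above, run against
# an invented feed item (not part of the extractor):
import xml.etree.ElementTree as etree
sample_item = etree.fromstring(
    '<item><link>https://example.com/page</link>'
    '<enclosure url="https://example.com/episode.mp3"/></item>')
sample_url = None
for e in sample_item.findall('./enclosure'):
    sample_url = e.attrib.get('url')
    if sample_url:
        break
if not sample_url:
    sample_url = sample_item.findtext('link')
assert sample_url == 'https://example.com/episode.mp3'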
@@ -2546,6 +2649,11 @@ class GenericIE(InfoExtractor):
if redtube_urls:
return self.playlist_from_matches(redtube_urls, video_id, video_title, ie=RedTubeIE.ie_key())
# Look for embedded Tube8 player
tube8_urls = Tube8IE._extract_urls(webpage)
if tube8_urls:
return self.playlist_from_matches(tube8_urls, video_id, video_title, ie=Tube8IE.ie_key())
# Look for embedded Tvigle player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@@ -2963,20 +3071,32 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
xfileshare_urls, video_id, video_title, ie=XFileShareIE.ie_key())
def merge_dicts(dict1, dict2):
merged = {}
for k, v in dict1.items():
if v is not None:
merged[k] = v
for k, v in dict2.items():
if v is None:
continue
if (k not in merged or
(isinstance(v, compat_str) and v and
isinstance(merged[k], compat_str) and
not merged[k])):
merged[k] = v
return merged
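# Behavior of merge_dicts in miniature (sample dicts invented for
# illustration): values from the first dict win, except that a non-empty
# string in the second dict fills in an empty string from the first:
assert merge_dicts(
    {'title': '', 'id': '1'},
    {'title': 'Real title', 'id': '2'}) == {'title': 'Real title', 'id': '1'}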
cloudflarestream_urls = CloudflareStreamIE._extract_urls(webpage)
if cloudflarestream_urls:
return self.playlist_from_matches(
cloudflarestream_urls, video_id, video_title, ie=CloudflareStreamIE.ie_key())
peertube_urls = PeerTubeIE._extract_urls(webpage)
if peertube_urls:
return self.playlist_from_matches(
peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())
indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
if indavideo_urls:
return self.playlist_from_matches(
indavideo_urls, video_id, video_title, ie=IndavideoEmbedIE.ie_key())
apa_urls = APAIE._extract_urls(webpage)
if apa_urls:
return self.playlist_from_matches(
apa_urls, video_id, video_title, ie=APAIE.ie_key())
sharevideos_urls = [mobj.group('url') for mobj in re.finditer(
r'<iframe[^>]+?\bsrc\s*=\s*(["\'])(?P<url>(?:https?:)?//embed\.share-videos\.se/auto/embed/\d+\?.*?\buid=\d+.*?)\1',
webpage)]
if sharevideos_urls:
return self.playlist_from_matches(
sharevideos_urls, video_id, video_title)
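# Each of the embed hooks added above follows the same dispatch shape; a toy
# reduction, with ExampleIE and the markup invented for illustration:
#
#   class ExampleIE(InfoExtractor):
#       @staticmethod
#       def _extract_urls(webpage):
#           return re.findall(
#               r'<iframe[^>]+src=["\'](https?://player\.example\.com/[^"\']+)',
#               webpage)
#
#   example_urls = ExampleIE._extract_urls(webpage)
#   if example_urls:
#       return self.playlist_from_matches(
#           example_urls, video_id, video_title, ie=ExampleIE.ie_key())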
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')

View File

@@ -1,15 +1,16 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import hashlib
import json
import random
import re
import math
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_chr,
compat_ord,
)
from ..utils import (
ExtractorError,
@@ -22,12 +23,7 @@ from ..utils import (
class GloboIE(InfoExtractor):
_VALID_URL = r'(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'
_API_URL_TEMPLATE = 'http://api.globovideos.com/videos/%s/playlist'
_SECURITY_URL_TEMPLATE = 'http://security.video.globo.com/videos/%s/hash?player=flash&version=17.0.0.132&resource_id=%s'
_RESIGN_EXPIRATION = 86400
_NETRC_MACHINE = 'globo'
_TESTS = [{
'url': 'http://g1.globo.com/carros/autoesporte/videos/t/exclusivos-do-g1/v/mercedes-benz-gla-passa-por-teste-de-colisao-na-europa/3607726/',
'md5': 'b3ccc801f75cd04a914d51dadb83a78d',
@@ -70,287 +66,51 @@ class GloboIE(InfoExtractor):
'only_matching': True,
}]
class MD5(object):
HEX_FORMAT_LOWERCASE = 0
HEX_FORMAT_UPPERCASE = 1
BASE64_PAD_CHARACTER_DEFAULT_COMPLIANCE = ''
BASE64_PAD_CHARACTER_RFC_COMPLIANCE = '='
PADDING = '=0xFF01DD'
hexcase = 0
b64pad = ''
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
return
def __init__(self):
pass
class JSArray(list):
def __getitem__(self, y):
try:
return list.__getitem__(self, y)
except IndexError:
return 0
def __setitem__(self, i, y):
try:
return list.__setitem__(self, i, y)
except IndexError:
self.extend([0] * (i - len(self) + 1))
self[-1] = y
@classmethod
def hex_md5(cls, param1):
return cls.rstr2hex(cls.rstr_md5(cls.str2rstr_utf8(param1)))
@classmethod
def b64_md5(cls, param1, param2=None):
return cls.rstr2b64(cls.rstr_md5(cls.str2rstr_utf8(param1, param2)))
@classmethod
def any_md5(cls, param1, param2):
return cls.rstr2any(cls.rstr_md5(cls.str2rstr_utf8(param1)), param2)
@classmethod
def rstr_md5(cls, param1):
return cls.binl2rstr(cls.binl_md5(cls.rstr2binl(param1), len(param1) * 8))
@classmethod
def rstr2hex(cls, param1):
_loc_2 = '0123456789ABCDEF' if cls.hexcase else '0123456789abcdef'
_loc_3 = ''
for _loc_5 in range(0, len(param1)):
_loc_4 = compat_ord(param1[_loc_5])
_loc_3 += _loc_2[_loc_4 >> 4 & 15] + _loc_2[_loc_4 & 15]
return _loc_3
@classmethod
def rstr2b64(cls, param1):
_loc_2 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_'
_loc_3 = ''
_loc_4 = len(param1)
for _loc_5 in range(0, _loc_4, 3):
_loc_6_1 = compat_ord(param1[_loc_5]) << 16
_loc_6_2 = compat_ord(param1[_loc_5 + 1]) << 8 if _loc_5 + 1 < _loc_4 else 0
_loc_6_3 = compat_ord(param1[_loc_5 + 2]) if _loc_5 + 2 < _loc_4 else 0
_loc_6 = _loc_6_1 | _loc_6_2 | _loc_6_3
for _loc_7 in range(0, 4):
if _loc_5 * 8 + _loc_7 * 6 > len(param1) * 8:
_loc_3 += cls.b64pad
else:
_loc_3 += _loc_2[_loc_6 >> 6 * (3 - _loc_7) & 63]
return _loc_3
@staticmethod
def rstr2any(param1, param2):
_loc_3 = len(param2)
_loc_4 = []
_loc_9 = [0] * ((len(param1) >> 2) + 1)
for _loc_5 in range(0, len(_loc_9)):
_loc_9[_loc_5] = compat_ord(param1[_loc_5 * 2]) << 8 | compat_ord(param1[_loc_5 * 2 + 1])
while len(_loc_9) > 0:
_loc_8 = []
_loc_7 = 0
for _loc_5 in range(0, len(_loc_9)):
_loc_7 = (_loc_7 << 16) + _loc_9[_loc_5]
_loc_6 = math.floor(_loc_7 / _loc_3)
_loc_7 -= _loc_6 * _loc_3
if len(_loc_8) > 0 or _loc_6 > 0:
_loc_8[len(_loc_8)] = _loc_6
_loc_4[len(_loc_4)] = _loc_7
_loc_9 = _loc_8
_loc_10 = ''
_loc_5 = len(_loc_4) - 1
while _loc_5 >= 0:
_loc_10 += param2[_loc_4[_loc_5]]
_loc_5 -= 1
return _loc_10
@classmethod
def str2rstr_utf8(cls, param1, param2=None):
_loc_3 = ''
_loc_4 = -1
if not param2:
param2 = cls.PADDING
param1 = param1 + param2[1:9]
while True:
_loc_4 += 1
if _loc_4 >= len(param1):
break
_loc_5 = compat_ord(param1[_loc_4])
_loc_6 = compat_ord(param1[_loc_4 + 1]) if _loc_4 + 1 < len(param1) else 0
if 55296 <= _loc_5 <= 56319 and 56320 <= _loc_6 <= 57343:
_loc_5 = 65536 + ((_loc_5 & 1023) << 10) + (_loc_6 & 1023)
_loc_4 += 1
if _loc_5 <= 127:
_loc_3 += compat_chr(_loc_5)
continue
if _loc_5 <= 2047:
_loc_3 += compat_chr(192 | _loc_5 >> 6 & 31) + compat_chr(128 | _loc_5 & 63)
continue
if _loc_5 <= 65535:
_loc_3 += compat_chr(224 | _loc_5 >> 12 & 15) + compat_chr(128 | _loc_5 >> 6 & 63) + compat_chr(
128 | _loc_5 & 63)
continue
if _loc_5 <= 2097151:
_loc_3 += compat_chr(240 | _loc_5 >> 18 & 7) + compat_chr(128 | _loc_5 >> 12 & 63) + compat_chr(
128 | _loc_5 >> 6 & 63) + compat_chr(128 | _loc_5 & 63)
return _loc_3
@staticmethod
def rstr2binl(param1):
_loc_2 = [0] * ((len(param1) >> 2) + 1)
for _loc_3 in range(0, len(_loc_2)):
_loc_2[_loc_3] = 0
for _loc_3 in range(0, len(param1) * 8, 8):
_loc_2[_loc_3 >> 5] |= (compat_ord(param1[_loc_3 // 8]) & 255) << _loc_3 % 32
return _loc_2
@staticmethod
def binl2rstr(param1):
_loc_2 = ''
for _loc_3 in range(0, len(param1) * 32, 8):
_loc_2 += compat_chr(param1[_loc_3 >> 5] >> _loc_3 % 32 & 255)
return _loc_2
@classmethod
def binl_md5(cls, param1, param2):
param1 = cls.JSArray(param1)
param1[param2 >> 5] |= 128 << param2 % 32
param1[(param2 + 64 >> 9 << 4) + 14] = param2
_loc_3 = 1732584193
_loc_4 = -271733879
_loc_5 = -1732584194
_loc_6 = 271733878
for _loc_7 in range(0, len(param1), 16):
_loc_8 = _loc_3
_loc_9 = _loc_4
_loc_10 = _loc_5
_loc_11 = _loc_6
_loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 0], 7, -680876936)
_loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 1], 12, -389564586)
_loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 2], 17, 606105819)
_loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 3], 22, -1044525330)
_loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 4], 7, -176418897)
_loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 5], 12, 1200080426)
_loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 6], 17, -1473231341)
_loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 7], 22, -45705983)
_loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 8], 7, 1770035416)
_loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 9], 12, -1958414417)
_loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 10], 17, -42063)
_loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 11], 22, -1990404162)
_loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 12], 7, 1804603682)
_loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 13], 12, -40341101)
_loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 14], 17, -1502002290)
_loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 15], 22, 1236535329)
_loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 1], 5, -165796510)
_loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 6], 9, -1069501632)
_loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 11], 14, 643717713)
_loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 0], 20, -373897302)
_loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 5], 5, -701558691)
_loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 10], 9, 38016083)
_loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 15], 14, -660478335)
_loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 4], 20, -405537848)
_loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 9], 5, 568446438)
_loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 14], 9, -1019803690)
_loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 3], 14, -187363961)
_loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 8], 20, 1163531501)
_loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 13], 5, -1444681467)
_loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 2], 9, -51403784)
_loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 7], 14, 1735328473)
_loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 12], 20, -1926607734)
_loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 5], 4, -378558)
_loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 8], 11, -2022574463)
_loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 11], 16, 1839030562)
_loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 14], 23, -35309556)
_loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 1], 4, -1530992060)
_loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 4], 11, 1272893353)
_loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 7], 16, -155497632)
_loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 10], 23, -1094730640)
_loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 13], 4, 681279174)
_loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 0], 11, -358537222)
_loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 3], 16, -722521979)
_loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 6], 23, 76029189)
_loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 9], 4, -640364487)
_loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 12], 11, -421815835)
_loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 15], 16, 530742520)
_loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 2], 23, -995338651)
_loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 0], 6, -198630844)
_loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 7], 10, 1126891415)
_loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 14], 15, -1416354905)
_loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 5], 21, -57434055)
_loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 12], 6, 1700485571)
_loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 3], 10, -1894986606)
_loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 10], 15, -1051523)
_loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 1], 21, -2054922799)
_loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 8], 6, 1873313359)
_loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 15], 10, -30611744)
_loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 6], 15, -1560198380)
_loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 13], 21, 1309151649)
_loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 4], 6, -145523070)
_loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 11], 10, -1120210379)
_loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 2], 15, 718787259)
_loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 9], 21, -343485551)
_loc_3 = cls.safe_add(_loc_3, _loc_8)
_loc_4 = cls.safe_add(_loc_4, _loc_9)
_loc_5 = cls.safe_add(_loc_5, _loc_10)
_loc_6 = cls.safe_add(_loc_6, _loc_11)
return [_loc_3, _loc_4, _loc_5, _loc_6]
@classmethod
def md5_cmn(cls, param1, param2, param3, param4, param5, param6):
return cls.safe_add(
cls.bit_rol(cls.safe_add(cls.safe_add(param2, param1), cls.safe_add(param4, param6)), param5), param3)
@classmethod
def md5_ff(cls, param1, param2, param3, param4, param5, param6, param7):
return cls.md5_cmn(param2 & param3 | ~param2 & param4, param1, param2, param5, param6, param7)
@classmethod
def md5_gg(cls, param1, param2, param3, param4, param5, param6, param7):
return cls.md5_cmn(param2 & param4 | param3 & ~param4, param1, param2, param5, param6, param7)
@classmethod
def md5_hh(cls, param1, param2, param3, param4, param5, param6, param7):
return cls.md5_cmn(param2 ^ param3 ^ param4, param1, param2, param5, param6, param7)
@classmethod
def md5_ii(cls, param1, param2, param3, param4, param5, param6, param7):
return cls.md5_cmn(param3 ^ (param2 | ~param4), param1, param2, param5, param6, param7)
@classmethod
def safe_add(cls, param1, param2):
_loc_3 = (param1 & 65535) + (param2 & 65535)
_loc_4 = (param1 >> 16) + (param2 >> 16) + (_loc_3 >> 16)
return cls.lshift(_loc_4, 16) | _loc_3 & 65535
@classmethod
def bit_rol(cls, param1, param2):
return cls.lshift(param1, param2) | (param1 & 0xFFFFFFFF) >> (32 - param2)
@staticmethod
def lshift(value, count):
r = (0xFFFFFFFF & value) << count
return -(~(r - 1) & 0xFFFFFFFF) if r > 0x7FFFFFFF else r
try:
self._download_json(
'https://login.globo.com/api/authentication', None, data=json.dumps({
'payload': {
'email': email,
'password': password,
'serviceId': 4654,
},
}).encode(), headers={
'Content-Type': 'application/json; charset=utf-8',
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
resp = self._parse_json(e.cause.read(), None)
raise ExtractorError(resp.get('userMessage') or resp['id'], expected=True)
raise
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
self._API_URL_TEMPLATE % video_id, video_id)['videos'][0]
'http://api.globovideos.com/videos/%s/playlist' % video_id,
video_id)['videos'][0]
title = video['title']
formats = []
for resource in video['resources']:
resource_id = resource.get('_id')
if not resource_id or resource_id.endswith('manifest'):
resource_url = resource.get('url')
if not resource_id or not resource_url:
continue
security = self._download_json(
self._SECURITY_URL_TEMPLATE % (video_id, resource_id),
video_id, 'Downloading security hash for %s' % resource_id)
'http://security.video.globo.com/videos/%s/hash' % video_id,
video_id, 'Downloading security hash for %s' % resource_id, query={
'player': 'flash',
'version': '17.0.0.132',
'resource_id': resource_id,
})
security_hash = security.get('hash')
if not security_hash:
@@ -361,22 +121,28 @@ class GloboIE(InfoExtractor):
continue
hash_code = security_hash[:2]
received_time = int(security_hash[2:12])
received_time = security_hash[2:12]
received_random = security_hash[12:22]
received_md5 = security_hash[22:]
sign_time = received_time + self._RESIGN_EXPIRATION
sign_time = compat_str(int(received_time) + 86400)
padding = '%010d' % random.randint(1, 10000000000)
signed_md5 = self.MD5.b64_md5(received_md5 + compat_str(sign_time) + padding)
signed_hash = hash_code + compat_str(received_time) + received_random + compat_str(sign_time) + padding + signed_md5
md5_data = (received_md5 + sign_time + padding + '0xFF01DD').encode()
signed_md5 = base64.urlsafe_b64encode(hashlib.md5(md5_data).digest()).decode().strip('=')
signed_hash = hash_code + received_time + received_random + sign_time + padding + signed_md5
resource_url = resource['url']
signed_url = '%s?h=%s&k=%s' % (resource_url, signed_hash, 'flash')
if resource_id.endswith('m3u8') or resource_url.endswith('.m3u8'):
formats.extend(self._extract_m3u8_formats(
signed_url, resource_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif resource_id.endswith('mpd') or resource_url.endswith('.mpd'):
formats.extend(self._extract_mpd_formats(
signed_url, resource_id, mpd_id='dash', fatal=False))
elif resource_id.endswith('manifest') or resource_url.endswith('/manifest'):
formats.extend(self._extract_ism_formats(
signed_url, resource_id, ism_id='mss', fatal=False))
else:
formats.append({
'url': signed_url,

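The replacement signing logic can be exercised on its own; note how it inlines the '0xFF01DD' suffix that the deleted MD5 class used to append via PADDING[1:9]. A minimal standalone sketch, with an invented security hash:

import base64
import hashlib
import random

def sign_security_hash(security_hash):
    hash_code = security_hash[:2]
    received_time = security_hash[2:12]
    received_random = security_hash[12:22]
    received_md5 = security_hash[22:]
    sign_time = str(int(received_time) + 86400)  # re-sign 24 hours ahead
    padding = '%010d' % random.randint(1, 10000000000)
    md5_data = (received_md5 + sign_time + padding + '0xFF01DD').encode()
    signed_md5 = base64.urlsafe_b64encode(
        hashlib.md5(md5_data).digest()).decode().strip('=')
    return hash_code + received_time + received_random + sign_time + padding + signed_md5

print(sign_security_hash('03' + '1528848000' + 'a1b2c3d4e5' + 'deadbeef'))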
View File

@@ -123,7 +123,7 @@ class GoIE(AdobePassIE):
'adobe_requestor_id': requestor_id,
})
else:
self._initialize_geo_bypass(['US'])
self._initialize_geo_bypass({'countries': ['US']})
entitlement = self._download_json(
'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json',
video_id, data=urlencode_postdata(data))

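For reference, _initialize_geo_bypass stopped taking a bare country list in a 2018 refactor; a sketch of the new call shape (the ip_blocks value is illustrative):

self._initialize_geo_bypass({
    'countries': ['US'],  # used to fake an X-Forwarded-For address
    'ip_blocks': ['199.87.128.0/18'],  # optional explicit CIDR ranges (illustrative)
})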
View File

@@ -6,7 +6,9 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
parse_age_limit,
parse_iso8601,
)
@@ -23,6 +25,7 @@ class Go90IE(InfoExtractor):
'description': 'VICE\'s Karley Sciortino meets with activists who discuss the state\'s strong anti-porn stance. Then, VICE Sports explains NFL contracts.',
'timestamp': 1491868800,
'upload_date': '20170411',
'age_limit': 14,
}
}
@@ -33,6 +36,8 @@ class Go90IE(InfoExtractor):
video_id, headers={
'Content-Type': 'application/json; charset=utf-8',
}, data=b'{"client":"web","device_type":"pc"}')
if video_data.get('requires_drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
main_video_asset = video_data['main_video_asset']
episode_number = int_or_none(video_data.get('episode_number'))
@@ -123,4 +128,5 @@ class Go90IE(InfoExtractor):
'season_number': season_number,
'episode_number': episode_number,
'subtitles': subtitles,
'age_limit': parse_age_limit(video_data.get('rating')),
}

View File

@@ -17,6 +17,8 @@ class HiDiveIE(InfoExtractor):
# Using X-Forwarded-For results in 403 HTTP error for HLS fragments,
# so disabling geo bypass completely
_GEO_BYPASS = False
_NETRC_MACHINE = 'hidive'
_LOGIN_URL = 'https://www.hidive.com/account/login'
_TESTS = [{
'url': 'https://www.hidive.com/stream/the-comic-artist-and-his-assistants/s01e001',
@@ -31,8 +33,26 @@ class HiDiveIE(InfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'Requires Authentication',
}]
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
return
webpage = self._download_webpage(self._LOGIN_URL, None)
form = self._search_regex(
r'(?s)<form[^>]+action="/account/login"[^>]*>(.+?)</form>',
webpage, 'login form')
data = self._hidden_inputs(form)
data.update({
'Email': email,
'Password': password,
})
self._download_webpage(
self._LOGIN_URL, None, 'Logging in', data=urlencode_postdata(data))
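# _hidden_inputs (from common.py) carries over the form's pre-filled fields,
# e.g. an anti-forgery token. A simplified stand-in to show the idea (the
# real helper is more thorough; the sample markup is invented):
def _toy_hidden_inputs(form_html):
    inputs = {}
    for mobj in re.finditer(r'<input[^>]+type=(["\'])hidden\1[^>]*>', form_html):
        tag = mobj.group(0)
        name = re.search(r'name=(["\'])(?P<v>.+?)\1', tag)
        value = re.search(r'value=(["\'])(?P<v>.*?)\1', tag)
        if name and value:
            inputs[name.group('v')] = value.group('v')
    return inputs
# _toy_hidden_inputs('<input type="hidden" name="__RequestVerificationToken" value="abc">')
# -> {'__RequestVerificationToken': 'abc'}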
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title, key = mobj.group('title', 'key')
@@ -43,6 +63,7 @@ class HiDiveIE(InfoExtractor):
data=urlencode_postdata({
'Title': title,
'Key': key,
'PlayerId': 'f4f895ce1ca713ba263b91caeb1daa2d08904783',
}))
restriction = settings.get('restrictionReason')
@@ -79,6 +100,7 @@ class HiDiveIE(InfoExtractor):
subtitles.setdefault(cc_lang, []).append({
'url': cc_url,
})
self._sort_formats(formats)
season_number = int_or_none(self._search_regex(
r's(\d+)', key, 'season number', default=None))

View File

@@ -66,7 +66,7 @@ class HRTiBaseIE(InfoExtractor):
self._logout_url = modules['user']['resources']['logout']['uri']
def _login(self):
(username, password) = self._get_login_info()
username, password = self._get_login_info()
# TODO: figure out authentication with cookies
if username is None or password is None:
self.raise_login_required()

View File

@@ -3,25 +3,27 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
determine_ext,
mimetype2ext,
parse_duration,
qualities,
remove_end,
)
class ImdbIE(InfoExtractor):
IE_NAME = 'imdb'
IE_DESC = 'Internet Movie Database trailers'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title).+?[/-]vi(?P<id>\d+)'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).+?[/-]vi(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.imdb.com/video/imdb/vi2524815897',
'info_dict': {
'id': '2524815897',
'ext': 'mp4',
'title': 'Ice Age: Continental Drift Trailer (No. 2)',
'description': 'md5:9061c2219254e5d14e03c25c98e96a81',
'title': 'No. 2 from Ice Age: Continental Drift (2012)',
'description': 'md5:87bd0bdc61e351f21f20d2d7441cb4e7',
}
}, {
'url': 'http://www.imdb.com/video/_/vi2524815897',
@@ -38,76 +40,67 @@ class ImdbIE(InfoExtractor):
}, {
'url': 'http://www.imdb.com/title/tt4218696/videoplayer/vi2608641561',
'only_matching': True,
}, {
'url': 'https://www.imdb.com/list/ls009921623/videoplayer/vi260482329',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage('http://www.imdb.com/video/imdb/vi%s' % video_id, video_id)
descr = self._html_search_regex(
r'(?s)<span itemprop="description">(.*?)</span>',
webpage, 'description', fatal=False)
player_url = 'http://www.imdb.com/video/imdb/vi%s/imdb/single' % video_id
player_page = self._download_webpage(
player_url, video_id, 'Downloading player page')
# the player page contains the info for the default format, we have to
# fetch other pages for the rest of the formats
extra_formats = re.findall(r'href="(?P<url>%s.*?)".*?>(?P<name>.*?)<' % re.escape(player_url), player_page)
format_pages = [
self._download_webpage(
f_url, video_id, 'Downloading info for %s format' % f_name)
for f_url, f_name in extra_formats]
format_pages.append(player_page)
webpage = self._download_webpage(
'https://www.imdb.com/videoplayer/vi' + video_id, video_id)
video_metadata = self._parse_json(self._search_regex(
r'window\.IMDbReactInitialState\.push\(({.+?})\);', webpage,
'video metadata'), video_id)['videos']['videoMetadata']['vi' + video_id]
title = self._html_search_meta(
['og:title', 'twitter:title'], webpage) or self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'title', fatal=False) or video_metadata['title']
quality = qualities(('SD', '480p', '720p', '1080p'))
formats = []
for format_page in format_pages:
json_data = self._search_regex(
r'<script[^>]+class="imdb-player-data"[^>]*?>(.*?)</script>',
format_page, 'json data', flags=re.DOTALL)
info = self._parse_json(json_data, video_id, fatal=False)
if not info:
for encoding in video_metadata.get('encodings', []):
if not encoding or not isinstance(encoding, dict):
continue
format_info = info.get('videoPlayerObject', {}).get('video', {})
if not format_info:
video_url = encoding.get('videoUrl')
if not video_url or not isinstance(video_url, compat_str):
continue
video_info_list = format_info.get('videoInfoList')
if not video_info_list or not isinstance(video_info_list, list):
ext = determine_ext(video_url, mimetype2ext(encoding.get('mimeType')))
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
video_info = video_info_list[0]
if not video_info or not isinstance(video_info, dict):
continue
video_url = video_info.get('videoUrl')
if not video_url:
continue
format_id = format_info.get('ffname')
format_id = encoding.get('definition')
formats.append({
'format_id': format_id,
'url': video_url,
'ext': mimetype2ext(video_info.get('videoMimeType')),
'ext': ext,
'quality': quality(format_id),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': remove_end(self._og_search_title(webpage), ' - IMDb'),
'title': title,
'formats': formats,
'description': descr,
'thumbnail': format_info.get('slate'),
'description': video_metadata.get('description'),
'thumbnail': video_metadata.get('slate', {}).get('url'),
'duration': parse_duration(video_metadata.get('duration')),
}
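# The metadata lookup above, exercised standalone (the page snippet is
# invented to mirror the window.IMDbReactInitialState shape; json is the
# stdlib module):
#
#   sample_page = ('window.IMDbReactInitialState.push({"videos":'
#                  '{"videoMetadata":{"vi123":{"title":"Trailer",'
#                  '"duration":"PT2M10S"}}}});')
#   blob = re.search(
#       r'window\.IMDbReactInitialState\.push\(({.+?})\);', sample_page).group(1)
#   meta = json.loads(blob)['videos']['videoMetadata']['vi123']
#   meta['title']                     # -> 'Trailer'
#   parse_duration(meta['duration'])  # -> 130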
class ImdbListIE(InfoExtractor):
IE_NAME = 'imdb:list'
IE_DESC = 'Internet Movie Database lists'
_VALID_URL = r'https?://(?:www\.)?imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
_VALID_URL = r'https?://(?:www\.)?imdb\.com/list/ls(?P<id>\d{9})(?!/videoplayer/vi\d+)'
_TEST = {
'url': 'http://www.imdb.com/list/JFs9NWw6XI0',
'url': 'https://www.imdb.com/list/ls009921623/',
'info_dict': {
'id': 'JFs9NWw6XI0',
'title': 'March 23, 2012 Releases',
'id': '009921623',
'title': 'The Bourne Legacy',
'description': 'A list of trailers, clips, and more from The Bourne Legacy, starring Jeremy Renner and Rachel Weisz.',
},
'playlist_count': 7,
'playlist_count': 8,
}
def _real_extract(self, url):
@@ -115,9 +108,13 @@ class ImdbListIE(InfoExtractor):
webpage = self._download_webpage(url, list_id)
entries = [
self.url_result('http://www.imdb.com' + m, 'Imdb')
for m in re.findall(r'href="(/video/imdb/vi[^"]+)"\s+data-type="playlist"', webpage)]
for m in re.findall(r'href="(/list/ls%s/videoplayer/vi[^"]+)"' % list_id, webpage)]
list_title = self._html_search_regex(
r'<h1 class="header">(.*?)</h1>', webpage, 'list title')
r'<h1[^>]+class="[^"]*header[^"]*"[^>]*>(.*?)</h1>',
webpage, 'list title')
list_description = self._html_search_regex(
r'<div[^>]+class="[^"]*list-description[^"]*"[^>]*><p>(.*?)</p>',
webpage, 'list description')
return self.playlist_result(entries, list_id, list_title)
return self.playlist_result(entries, list_id, list_title, list_description)

View File

@@ -3,7 +3,6 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
int_or_none,
js_to_json,
@@ -21,7 +20,7 @@ class ImgurIE(InfoExtractor):
'id': 'A61SaA1',
'ext': 'mp4',
'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
'description': 'Imgur: The most awesome images on the Internet.',
'description': 'Imgur: The magic of the Internet',
},
}, {
'url': 'https://imgur.com/A61SaA1',
@@ -29,7 +28,7 @@ class ImgurIE(InfoExtractor):
'id': 'A61SaA1',
'ext': 'mp4',
'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
'description': 'Imgur: The most awesome images on the Internet.',
'description': 'Imgur: The magic of the Internet',
},
}, {
'url': 'https://imgur.com/gallery/YcAQlkx',
@@ -37,8 +36,6 @@ class ImgurIE(InfoExtractor):
'id': 'YcAQlkx',
'ext': 'mp4',
'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
'description': 'Imgur: The most awesome images on the Internet.'
}
}, {
'url': 'http://imgur.com/topic/Funny/N8rOudd',
@@ -50,8 +47,8 @@ class ImgurIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
compat_urlparse.urljoin(url, video_id), video_id)
gifv_url = 'https://i.imgur.com/{id}.gifv'.format(id=video_id)
webpage = self._download_webpage(gifv_url, video_id)
width = int_or_none(self._og_search_property(
'video:width', webpage, default=None))
@@ -107,7 +104,7 @@ class ImgurIE(InfoExtractor):
return {
'id': video_id,
'formats': formats,
'description': self._og_search_description(webpage),
'description': self._og_search_description(webpage, default=None),
'title': self._og_search_title(webpage),
}

View File

@@ -21,6 +21,21 @@ class IncIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
# div with id=kaltura_player_1_kqs38cgm
'url': 'https://www.inc.com/oscar-raymundo/richard-branson-young-entrepeneurs.html',
'info_dict': {
'id': '1_kqs38cgm',
'ext': 'mp4',
'title': 'Branson: "In the end, you have to say, Screw it. Just do it."',
'description': 'md5:21b832d034f9af5191ca5959da5e9cb6',
'timestamp': 1364403232,
'upload_date': '20130327',
'uploader_id': 'incdigital@inc.com',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.inc.com/video/david-whitford/founders-forum-tripadvisor-steve-kaufer-most-enjoyable-moment-for-entrepreneur.html',
'only_matching': True,
@@ -31,10 +46,13 @@ class IncIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
partner_id = self._search_regex(
r'var\s+_?bizo_data_partner_id\s*=\s*["\'](\d+)', webpage, 'partner id')
r'var\s+_?bizo_data_partner_id\s*=\s*["\'](\d+)', webpage,
'partner id', default='1034971')
kaltura_id = self._parse_json(self._search_regex(
r'pageInfo\.videos\s*=\s*\[(.+)\];', webpage, 'kaltura id'),
kaltura_id = self._search_regex(
r'id=(["\'])kaltura_player_(?P<id>.+?)\1', webpage, 'kaltura id',
default=None, group='id') or self._parse_json(self._search_regex(
r'pageInfo\.videos\s*=\s*\[(.+)\];', webpage, 'kaltura id'),
display_id)['vid_kaltura_id']
return self.url_result(

View File

@@ -1,11 +1,15 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
parse_age_limit,
parse_iso8601,
update_url_query,
)
@@ -13,7 +17,7 @@ class IndavideoEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:embed\.)?indavideo\.hu/player/video/|assets\.indavideo\.hu/swf/player\.swf\?.*\b(?:v(?:ID|id))=)(?P<id>[\da-f]+)'
_TESTS = [{
'url': 'http://indavideo.hu/player/video/1bdc3c6d80/',
'md5': 'f79b009c66194acacd40712a6778acfa',
'md5': 'c8a507a1c7410685f83a06eaeeaafeab',
'info_dict': {
'id': '1837039',
'ext': 'mp4',
@@ -36,6 +40,20 @@ class IndavideoEmbedIE(InfoExtractor):
'only_matching': True,
}]
# Some example URLs covered by generic extractor:
# http://indavideo.hu/video/Vicces_cica_1
# http://index.indavideo.hu/video/2015_0728_beregszasz
# http://auto.indavideo.hu/video/Sajat_utanfutoban_a_kis_tacsko
# http://erotika.indavideo.hu/video/Amator_tini_punci
# http://film.indavideo.hu/video/f_hrom_nagymamm_volt
# http://palyazat.indavideo.hu/video/Embertelen_dal_Dodgem_egyuttes
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+\bsrc=["\'](?P<url>(?:https?:)?//embed\.indavideo\.hu/player/video/[\da-f]+)',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -45,7 +63,14 @@ class IndavideoEmbedIE(InfoExtractor):
title = video['title']
video_urls = video.get('video_files', [])
video_urls = []
video_files = video.get('video_files')
if isinstance(video_files, list):
video_urls.extend(video_files)
elif isinstance(video_files, dict):
video_urls.extend(video_files.values())
video_file = video.get('video_file')
if video_file:
video_urls.append(video_file)
@@ -58,11 +83,23 @@ class IndavideoEmbedIE(InfoExtractor):
if flv_url not in video_urls:
video_urls.append(flv_url)
formats = [{
'url': video_url,
'height': int_or_none(self._search_regex(
r'\.(\d{3,4})\.mp4(?:\?|$)', video_url, 'height', default=None)),
} for video_url in video_urls]
filesh = video.get('filesh')
formats = []
for video_url in video_urls:
height = int_or_none(self._search_regex(
r'\.(\d{3,4})\.mp4(?:\?|$)', video_url, 'height', default=None))
if filesh:
if not height:
continue
token = filesh.get(compat_str(height))
if token is None:
continue
video_url = update_url_query(video_url, {'token': token})
formats.append({
'url': video_url,
'height': height,
})
self._sort_formats(formats)
timestamp = video.get('date')
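# Worked example of the token lookup above, with invented API data:
# with filesh = {'360': 'tok360', '720': 'tok720'} and a 720p URL,
#   update_url_query('https://example.indavideo.hu/video.720.mp4',
#                    {'token': 'tok720'})
# -> 'https://example.indavideo.hu/video.720.mp4?token=tok720'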
@@ -89,55 +126,3 @@ class IndavideoEmbedIE(InfoExtractor):
'tags': tags,
'formats': formats,
}
class IndavideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?indavideo\.hu/video/(?P<id>[^/#?]+)'
_TESTS = [{
'url': 'http://indavideo.hu/video/Vicces_cica_1',
'md5': '8c82244ba85d2a2310275b318eb51eac',
'info_dict': {
'id': '1335611',
'display_id': 'Vicces_cica_1',
'ext': 'mp4',
'title': 'Vicces cica',
'description': 'Játszik a tablettel. :D',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Jet_Pack',
'uploader_id': '491217',
'timestamp': 1390821212,
'upload_date': '20140127',
'duration': 7,
'age_limit': 0,
'tags': ['vicces', 'macska', 'cica', 'ügyes', 'nevetés', 'játszik', 'Cukiság', 'Jet_Pack'],
},
}, {
'url': 'http://index.indavideo.hu/video/2015_0728_beregszasz',
'only_matching': True,
}, {
'url': 'http://auto.indavideo.hu/video/Sajat_utanfutoban_a_kis_tacsko',
'only_matching': True,
}, {
'url': 'http://erotika.indavideo.hu/video/Amator_tini_punci',
'only_matching': True,
}, {
'url': 'http://film.indavideo.hu/video/f_hrom_nagymamm_volt',
'only_matching': True,
}, {
'url': 'http://palyazat.indavideo.hu/video/Embertelen_dal_Dodgem_egyuttes',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
embed_url = self._search_regex(
r'<link[^>]+rel="video_src"[^>]+href="(.+?)"', webpage, 'embed url')
return {
'_type': 'url_transparent',
'ie_key': 'IndavideoEmbed',
'url': embed_url,
'display_id': display_id,
}

View File

@@ -1,15 +1,21 @@
from __future__ import unicode_literals
import itertools
import hashlib
import json
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import (
compat_str,
compat_HTTPError,
)
from ..utils import (
ExtractorError,
get_element_by_attribute,
int_or_none,
lowercase_escape,
std_headers,
try_get,
)
@@ -238,23 +244,56 @@ class InstagramUserIE(InfoExtractor):
}
}
def _entries(self, uploader_id):
_gis_tmpl = None
def _entries(self, data):
def get_count(suffix):
return int_or_none(try_get(
node, lambda x: x['edge_media_' + suffix]['count']))
uploader_id = data['entry_data']['ProfilePage'][0]['graphql']['user']['id']
csrf_token = data['config']['csrf_token']
rhx_gis = data.get('rhx_gis') or '3c7ca9dcefcf966d11dacf1f151335e8'
self._set_cookie('instagram.com', 'ig_pr', '1')
cursor = ''
for page_num in itertools.count(1):
media = self._download_json(
'https://www.instagram.com/graphql/query/', uploader_id,
'Downloading JSON page %d' % page_num, query={
'query_hash': '472f257a40c653c64c666ce877d59d2b',
'variables': json.dumps({
'id': uploader_id,
'first': 100,
'after': cursor,
})
})['data']['user']['edge_owner_to_timeline_media']
variables = json.dumps({
'id': uploader_id,
'first': 12,
'after': cursor,
})
if self._gis_tmpl:
gis_tmpls = [self._gis_tmpl]
else:
gis_tmpls = [
'%s' % rhx_gis,
'',
'%s:%s' % (rhx_gis, csrf_token),
'%s:%s:%s' % (rhx_gis, csrf_token, std_headers['User-Agent']),
]
for gis_tmpl in gis_tmpls:
try:
media = self._download_json(
'https://www.instagram.com/graphql/query/', uploader_id,
'Downloading JSON page %d' % page_num, headers={
'X-Requested-With': 'XMLHttpRequest',
'X-Instagram-GIS': hashlib.md5(
('%s:%s' % (gis_tmpl, variables)).encode('utf-8')).hexdigest(),
}, query={
'query_hash': '42323d64886122307be10013ad2dcc44',
'variables': variables,
})['data']['user']['edge_owner_to_timeline_media']
self._gis_tmpl = gis_tmpl
break
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
if gis_tmpl != gis_tmpls[-1]:
continue
raise
edges = media.get('edges')
if not edges or not isinstance(edges, list):
@@ -309,9 +348,13 @@ class InstagramUserIE(InfoExtractor):
def _real_extract(self, url):
username = self._match_id(url)
uploader_id = self._download_json(
'https://instagram.com/%s/' % username, username, query={
'__a': 1,
})['graphql']['user']['id']
webpage = self._download_webpage(url, username)
data = self._parse_json(
self._search_regex(
r'sharedData\s*=\s*({.+?})\s*;\s*[<\n]', webpage, 'data'),
username)
return self.playlist_result(
self._entries(uploader_id), username, username)
self._entries(data), username, username)

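A minimal sketch of the X-Instagram-GIS computation the retry loop settles on, assuming the plain rhx_gis template is the one accepted (ids and values invented):

import hashlib
import json

def instagram_gis(gis_tmpl, variables):
    return hashlib.md5(('%s:%s' % (gis_tmpl, variables)).encode('utf-8')).hexdigest()

variables = json.dumps({'id': '1234567', 'first': 12, 'after': ''})
print(instagram_gis('3c7ca9dcefcf966d11dacf1f151335e8', variables))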
View File

@@ -239,7 +239,7 @@ class IqiyiIE(InfoExtractor):
return ohdave_rsa_encrypt(data, e, N)
def _login(self):
(username, password) = self._get_login_info()
username, password = self._get_login_info()
# No authentication to be performed
if not username:

View File

@@ -7,6 +7,7 @@ import json
import re
from .common import InfoExtractor
from .brightcove import BrightcoveNewIE
from ..compat import (
compat_str,
compat_etree_register_namespace,
@@ -18,6 +19,7 @@ from ..utils import (
xpath_text,
int_or_none,
parse_duration,
smuggle_url,
ExtractorError,
determine_ext,
)
@@ -41,6 +43,14 @@ class ITVIE(InfoExtractor):
# unavailable via data-playlist-url
'url': 'https://www.itv.com/hub/through-the-keyhole/2a2271a0033',
'only_matching': True,
}, {
# InvalidVodcrid
'url': 'https://www.itv.com/hub/james-martins-saturday-morning/2a5159a0034',
'only_matching': True,
}, {
# ContentUnavailable
'url': 'https://www.itv.com/hub/whos-doing-the-dishes/2a2898a0024',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -127,7 +137,8 @@ class ITVIE(InfoExtractor):
if fault_code == 'InvalidGeoRegion':
self.raise_geo_restricted(
msg=fault_string, countries=self._GEO_COUNTRIES)
elif fault_code != 'InvalidEntity':
elif fault_code not in (
'InvalidEntity', 'InvalidVodcrid', 'ContentUnavailable'):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, fault_string), expected=True)
info.update({
@@ -251,3 +262,38 @@ class ITVIE(InfoExtractor):
'subtitles': subtitles,
})
return info
class ITVBTCCIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?itv\.com/btcc/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.itv.com/btcc/races/btcc-2018-all-the-action-from-brands-hatch',
'info_dict': {
'id': 'btcc-2018-all-the-action-from-brands-hatch',
'title': 'BTCC 2018: All the action from Brands Hatch',
},
'playlist_mincount': 9,
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1582188683001/HkiHLnNRx_default/index.html?videoId=%s'
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result(
smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % video_id, {
# ITV does not like some GB IP ranges, so here are some
# IP blocks it accepts
'geo_ip_blocks': [
'193.113.0.0/16', '54.36.162.0/23', '159.65.16.0/21'
],
'referrer': url,
}),
ie=BrightcoveNewIE.ie_key(), video_id=video_id)
for video_id in re.findall(r'data-video-id=["\'](\d+)', webpage)]
title = self._og_search_title(webpage, fatal=False)
return self.playlist_result(entries, playlist_id, title)
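The geo hints ride along inside the smuggled URL; a toy version of the smuggle_url/unsmuggle_url pair (the real helpers in youtube_dl/utils.py URL-encode the payload slightly differently):

import json
from urllib.parse import quote, unquote

def toy_smuggle_url(url, data):
    return url + '#__youtubedl_smuggle=' + quote(json.dumps(data))

def toy_unsmuggle_url(url, default=None):
    if '#__youtubedl_smuggle=' not in url:
        return url, default
    url, _, payload = url.partition('#__youtubedl_smuggle=')
    return url, json.loads(unquote(payload))

u = toy_smuggle_url(
    'http://players.brightcove.net/1582188683001/HkiHLnNRx_default/index.html?videoId=1',
    {'geo_ip_blocks': ['193.113.0.0/16'], 'referrer': 'http://www.itv.com/btcc/'})
print(toy_unsmuggle_url(u))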

View File

@@ -1,10 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..compat import (
compat_str,
compat_urllib_parse_unquote,
)
from ..utils import (
determine_ext,
float_or_none,
@@ -57,12 +58,33 @@ class IzleseneIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
url = 'http://www.izlesene.com/video/%s' % video_id
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage('http://www.izlesene.com/video/%s' % video_id, video_id)
video = self._parse_json(
self._search_regex(
r'videoObj\s*=\s*({.+?})\s*;\s*\n', webpage, 'streams'),
video_id)
title = video.get('videoTitle') or self._og_search_title(webpage)
formats = []
for stream in video['media']['level']:
source_url = stream.get('source')
if not source_url or not isinstance(source_url, compat_str):
continue
ext = determine_ext(source_url, 'mp4')
quality = stream.get('value')
height = int_or_none(quality)
formats.append({
'format_id': '%sp' % quality if quality else 'sd',
'url': compat_urllib_parse_unquote(source_url),
'ext': ext,
'height': height,
})
self._sort_formats(formats)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage, default=None)
thumbnail = self._proto_relative_url(
thumbnail = video.get('posterURL') or self._proto_relative_url(
self._og_search_thumbnail(webpage), scheme='http:')
uploader = self._html_search_regex(
@@ -71,41 +93,15 @@ class IzleseneIE(InfoExtractor):
timestamp = parse_iso8601(self._html_search_meta(
'uploadDate', webpage, 'upload date'))
duration = float_or_none(self._html_search_regex(
r'"videoduration"\s*:\s*"([^"]+)"',
webpage, 'duration', fatal=False), scale=1000)
duration = float_or_none(video.get('duration') or self._html_search_regex(
r'videoduration["\']?\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'duration', fatal=False, group='value'), scale=1000)
view_count = str_to_int(get_element_by_id('videoViewCount', webpage))
comment_count = self._html_search_regex(
r'comment_count\s*=\s*\'([^\']+)\';',
webpage, 'comment_count', fatal=False)
content_url = self._html_search_meta(
'contentURL', webpage, 'content URL', fatal=False)
ext = determine_ext(content_url, 'mp4')
# Might be empty for some videos.
streams = self._html_search_regex(
r'"qualitylevel"\s*:\s*"([^"]+)"', webpage, 'streams', default='')
formats = []
if streams:
for stream in streams.split('|'):
quality, url = re.search(r'\[(\w+)\](.+)', stream).groups()
formats.append({
'format_id': '%sp' % quality if quality else 'sd',
'url': compat_urllib_parse_unquote(url),
'ext': ext,
})
else:
stream_url = self._search_regex(
r'"streamurl"\s*:\s*"([^"]+)"', webpage, 'stream URL')
formats.append({
'format_id': 'sd',
'url': compat_urllib_parse_unquote(stream_url),
'ext': ext,
})
return {
'id': video_id,
'title': title,

youtube_dl/extractor/joj.py Executable file → Normal file
View File

View File

@@ -135,10 +135,11 @@ class KalturaIE(InfoExtractor):
''', webpage) or
re.search(
r'''(?xs)
<iframe[^>]+src=(?P<q1>["'])
(?:https?:)?//(?:www\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
(?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
(?:(?!(?P=q1)).)*
[?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
(?:(?!(?P=q1)).)*
[?&]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
(?P=q1)
''', webpage)
)

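The widened pattern can be sanity-checked against a cdnapisec-hosted iframe (the sample markup is invented):

import re

KALTURA_EMBED_RE = r'''(?xs)
    <(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
    (?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
    (?:(?!(?P=q1)).)*
    [?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
    (?P=q1)
'''
sample = ('<iframe src="https://cdnapisec.kaltura.com/p/1750922/sp/175092200/'
          'embedIframeJs/uiconf_id/29028199?entry_id=1_9gzouybz"></iframe>')
m = re.search(KALTURA_EMBED_RE, sample)
print(m.group('partner_id'), m.group('id'))  # 1750922 1_9gzouybz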
View File

@@ -20,23 +20,23 @@ from ..utils import (
class KeezMoviesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?keezmovies\.com/video/(?:(?P<display_id>[^/]+)-)?(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
'md5': '1c1e75d22ffa53320f45eeb07bc4cdc0',
'url': 'https://www.keezmovies.com/video/arab-wife-want-it-so-bad-i-see-she-thirsty-and-has-tiny-money-18070681',
'md5': '2ac69cdb882055f71d82db4311732a1a',
'info_dict': {
'id': '1214711',
'display_id': 'petite-asian-lady-mai-playing-in-bathtub',
'id': '18070681',
'display_id': 'arab-wife-want-it-so-bad-i-see-she-thirsty-and-has-tiny-money',
'ext': 'mp4',
'title': 'Petite Asian Lady Mai Playing In Bathtub',
'thumbnail': r're:^https?://.*\.jpg$',
'title': 'Arab wife want it so bad I see she thirsty and has tiny money.',
'thumbnail': None,
'view_count': int,
'age_limit': 18,
}
}, {
'url': 'http://www.keezmovies.com/video/1214711',
'url': 'http://www.keezmovies.com/video/18070681',
'only_matching': True,
}]
def _extract_info(self, url):
def _extract_info(self, url, fatal=True):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = (mobj.group('display_id')
@@ -55,7 +55,7 @@ class KeezMoviesIE(InfoExtractor):
encrypted = False
def extract_format(format_url, height=None):
if not isinstance(format_url, compat_str) or not format_url.startswith('http'):
if not isinstance(format_url, compat_str) or not format_url.startswith(('http', '//')):
return
if format_url in format_urls:
return
@@ -105,7 +105,11 @@ class KeezMoviesIE(InfoExtractor):
raise ExtractorError(
'Video %s is no longer available' % video_id, expected=True)
self._sort_formats(formats)
try:
self._sort_formats(formats)
except ExtractorError:
if fatal:
raise
if not title:
title = self._html_search_regex(
@@ -122,7 +126,9 @@ class KeezMoviesIE(InfoExtractor):
}
def _real_extract(self, url):
webpage, info = self._extract_info(url)
webpage, info = self._extract_info(url, fatal=False)
if not info['formats']:
return self.url_result(url, 'Generic')
info['view_count'] = str_to_int(self._search_regex(
r'<b>([\d,.]+)</b> Views?', webpage, 'view count', fatal=False))
return info

View File

@@ -130,7 +130,7 @@ class LeIE(InfoExtractor):
media_id, 'Downloading flash playJson data', query={
'id': media_id,
'platid': 1,
'splatid': 101,
'splatid': 105,
'format': 1,
'source': 1000,
'tkey': self.calc_time_key(int(time.time())),

View File

@@ -282,7 +282,9 @@ class LimelightMediaIE(LimelightBaseIE):
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
pc, mobile, metadata = self._extract(
video_id, 'getPlaylistByMediaId',

View File

@@ -7,7 +7,7 @@ from ..utils import int_or_none
class LiveLeakIE(InfoExtractor):
_VALID_URL = r'https?://(?:\w+\.)?liveleak\.com/view\?(?:.*?)i=(?P<id>[\w_]+)(?:.*)'
_VALID_URL = r'https?://(?:\w+\.)?liveleak\.com/view\?.*?\b[it]=(?P<id>[\w_]+)'
_TESTS = [{
'url': 'http://www.liveleak.com/view?i=757_1364311680',
'md5': '0813c2430bea7a46bf13acf3406992f4',
@@ -79,6 +79,9 @@ class LiveLeakIE(InfoExtractor):
'title': 'Fuel Depot in China Explosion caught on video',
},
'playlist_count': 3,
}, {
'url': 'https://www.liveleak.com/view?t=HvHi_1523016227',
'only_matching': True,
}]
@staticmethod

View File

@@ -141,6 +141,7 @@ class MedialaanIE(GigyaBaseIE):
vod_id = config.get('vodId') or self._search_regex(
(r'\\"vodId\\"\s*:\s*\\"(.+?)\\"',
r'"vodId"\s*:\s*"(.+?)"',
r'<[^>]+id=["\']vod-(\d+)'),
webpage, 'video_id', default=None)

View File

@@ -4,7 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
int_or_none,
parse_codecs,
)
class MinotoIE(InfoExtractor):
@@ -26,7 +29,7 @@ class MinotoIE(InfoExtractor):
formats.extend(self._extract_m3u8_formats(fmt_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
fmt_profile = fmt.get('profile') or {}
f = {
formats.append({
'format_id': fmt_profile.get('name-short'),
'format_note': fmt_profile.get('name'),
'url': fmt_url,
@@ -35,16 +38,8 @@ class MinotoIE(InfoExtractor):
'filesize': int_or_none(fmt.get('filesize')),
'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')),
}
codecs = fmt.get('codecs')
if codecs:
codecs = codecs.split(',')
if len(codecs) == 2:
f.update({
'vcodec': codecs[0],
'acodec': codecs[1],
})
formats.append(f)
'codecs': parse_codecs(fmt.get('codecs')),
})
self._sort_formats(formats)
return {

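For reference, parse_codecs from youtube_dl/utils.py splits a codecs string into video and audio codec fields; its expected behavior on an illustrative value:

from youtube_dl.utils import parse_codecs

print(parse_codecs('avc1.64001f,mp4a.40.2'))
# {'vcodec': 'avc1.64001f', 'acodec': 'mp4a.40.2'}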
View File

@@ -179,6 +179,10 @@ class MixcloudIE(InfoExtractor):
formats.append({
'format_id': 'http',
'url': decrypted,
'downloader_options': {
# Mixcloud starts throttling at >~5M
'http_chunk_size': 5242880,
},
})
self._sort_formats(formats)

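The downloader honors http_chunk_size by issuing ranged requests; a self-contained sketch of the idea, assuming a server that supports Range (the URL would be the decrypted format URL):

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

def fetch_in_chunks(url, chunk_size=5242880):
    data, start = b'', 0
    while True:
        req = Request(url, headers={
            'Range': 'bytes=%d-%d' % (start, start + chunk_size - 1)})
        chunk = urlopen(req).read()
        data += chunk
        if len(chunk) < chunk_size:  # short read: end of file reached
            return data
        start += chunk_size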
View File

@@ -1,96 +1,90 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
parse_iso8601,
)
from .nhl import NHLBaseIE
class MLBIE(InfoExtractor):
class MLBIE(NHLBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:[\da-z_-]+\.)*mlb\.com/
(?:[\da-z_-]+\.)*(?P<site>mlb)\.com/
(?:
(?:
(?:.*?/)?video/(?:topic/[\da-z_-]+/)?(?:v|.*?/c-)|
(?:[^/]+/)*c-|
(?:
shared/video/embed/(?:embed|m-internal-embed)\.html|
(?:[^/]+/)+(?:play|index)\.jsp|
)\?.*?\bcontent_id=
)
(?P<id>n?\d+)|
(?:[^/]+/)*(?P<path>[^/]+)
(?P<id>\d+)
)
'''
_CONTENT_DOMAIN = 'content.mlb.com'
_TESTS = [
{
'url': 'http://m.mlb.com/sea/video/topic/51231442/v34698933/nymsea-ackley-robs-a-home-run-with-an-amazing-catch/?c_id=sea',
'md5': 'ff56a598c2cf411a9a38a69709e97079',
'url': 'https://www.mlb.com/mariners/video/ackleys-spectacular-catch/c-34698933',
'md5': '632358dacfceec06bad823b83d21df2d',
'info_dict': {
'id': '34698933',
'ext': 'mp4',
'title': "Ackley's spectacular catch",
'description': 'md5:7f5a981eb4f3cbc8daf2aeffa2215bf0',
'duration': 66,
'timestamp': 1405980600,
'upload_date': '20140721',
'timestamp': 1405995000,
'upload_date': '20140722',
'thumbnail': r're:^https?://.*\.jpg$',
},
},
{
'url': 'http://m.mlb.com/video/topic/81536970/v34496663/mianym-stanton-practices-for-the-home-run-derby',
'md5': 'd9c022c10d21f849f49c05ae12a8a7e9',
'url': 'https://www.mlb.com/video/stanton-prepares-for-derby/c-34496663',
'md5': 'bf2619bf9cacc0a564fc35e6aeb9219f',
'info_dict': {
'id': '34496663',
'ext': 'mp4',
'title': 'Stanton prepares for Derby',
'description': 'md5:d00ce1e5fd9c9069e9c13ab4faedfa57',
'duration': 46,
'timestamp': 1405105800,
'timestamp': 1405120200,
'upload_date': '20140711',
'thumbnail': r're:^https?://.*\.jpg$',
},
},
{
'url': 'http://m.mlb.com/video/topic/vtp_hrd_sponsor/v34578115/hrd-cespedes-wins-2014-gillette-home-run-derby',
'md5': '0e6e73d509321e142409b695eadd541f',
'url': 'https://www.mlb.com/video/cespedes-repeats-as-derby-champ/c-34578115',
'md5': '99bb9176531adc600b90880fb8be9328',
'info_dict': {
'id': '34578115',
'ext': 'mp4',
'title': 'Cespedes repeats as Derby champ',
'description': 'md5:08df253ce265d4cf6fb09f581fafad07',
'duration': 488,
'timestamp': 1405399936,
'timestamp': 1405414336,
'upload_date': '20140715',
'thumbnail': r're:^https?://.*\.jpg$',
},
},
{
'url': 'http://m.mlb.com/video/v34577915/bautista-on-derby-captaining-duties-his-performance',
'md5': 'b8fd237347b844365d74ea61d4245967',
'url': 'https://www.mlb.com/video/bautista-on-home-run-derby/c-34577915',
'md5': 'da8b57a12b060e7663ee1eebd6f330ec',
'info_dict': {
'id': '34577915',
'ext': 'mp4',
'title': 'Bautista on Home Run Derby',
'description': 'md5:b80b34031143d0986dddc64a8839f0fb',
'duration': 52,
'timestamp': 1405390722,
'timestamp': 1405405122,
'upload_date': '20140715',
'thumbnail': r're:^https?://.*\.jpg$',
},
},
{
'url': 'http://m.mlb.com/news/article/118550098/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer',
'md5': 'aafaf5b0186fee8f32f20508092f8111',
'url': 'https://www.mlb.com/news/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer/c-118550098',
'md5': 'e09e37b552351fddbf4d9e699c924d68',
'info_dict': {
'id': '75609783',
'ext': 'mp4',
'title': 'Must C: Pillar climbs for catch',
'description': '4/15/15: Blue Jays outfielder Kevin Pillar continues his defensive dominance by climbing the wall in left to rob Tim Beckham of a home run',
'timestamp': 1429124820,
'timestamp': 1429139220,
'upload_date': '20150415',
}
},
@@ -111,7 +105,7 @@ class MLBIE(InfoExtractor):
'only_matching': True,
},
{
'url': 'http://m.cardinals.mlb.com/stl/video/v51175783/atlstl-piscotty-makes-great-sliding-catch-on-line/?partnerId=as_mlb_20150321_42500876&adbid=579409712979910656&adbpl=tw&adbpr=52847728',
'url': 'https://www.mlb.com/cardinals/video/piscottys-great-sliding-catch/c-51175783',
'only_matching': True,
},
{
@@ -120,58 +114,7 @@ class MLBIE(InfoExtractor):
'only_matching': True,
},
{
'url': 'http://washington.nationals.mlb.com/mlb/gameday/index.jsp?c_id=was&gid=2015_05_09_atlmlb_wasmlb_1&lang=en&content_id=108309983&mode=video#',
'url': 'https://www.mlb.com/cut4/carlos-gomez-borrowed-sunglasses-from-an-as-fan/c-278912842',
'only_matching': True,
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
if not video_id:
video_path = mobj.group('path')
webpage = self._download_webpage(url, video_path)
video_id = self._search_regex(
[r'data-video-?id="(\d+)"', r'content_id=(\d+)'], webpage, 'video id')
detail = self._download_xml(
'http://m.mlb.com/gen/multimedia/detail/%s/%s/%s/%s.xml'
% (video_id[-3], video_id[-2], video_id[-1], video_id), video_id)
title = detail.find('./headline').text
description = detail.find('./big-blurb').text
duration = parse_duration(detail.find('./duration').text)
timestamp = parse_iso8601(detail.attrib['date'][:-5])
thumbnails = [{
'url': thumbnail.text,
} for thumbnail in detail.findall('./thumbnailScenarios/thumbnailScenario')]
formats = []
for media_url in detail.findall('./url'):
playback_scenario = media_url.attrib['playback_scenario']
fmt = {
'url': media_url.text,
'format_id': playback_scenario,
}
m = re.search(r'(?P<vbr>\d+)K_(?P<width>\d+)X(?P<height>\d+)', playback_scenario)
if m:
fmt.update({
'vbr': int(m.group('vbr')) * 1000,
'width': int(m.group('width')),
'height': int(m.group('height')),
})
formats.append(fmt)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
'thumbnails': thumbnails,
}

View File

@@ -12,7 +12,7 @@ class MofosexIE(KeezMoviesIE):
_VALID_URL = r'https?://(?:www\.)?mofosex\.com/videos/(?P<id>\d+)/(?P<display_id>[^/?#&.]+)\.html'
_TESTS = [{
'url': 'http://www.mofosex.com/videos/318131/amateur-teen-playing-and-masturbating-318131.html',
'md5': '39a15853632b7b2e5679f92f69b78e91',
'md5': '558fcdafbb63a87c019218d6e49daf8a',
'info_dict': {
'id': '318131',
'display_id': 'amateur-teen-playing-and-masturbating-318131',

View File

@@ -1,116 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import os.path
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
remove_start,
sanitized_Request,
urlencode_postdata,
)
class MonikerIE(InfoExtractor):
IE_DESC = 'allmyvideos.net and vidspot.net'
_VALID_URL = r'https?://(?:www\.)?(?:allmyvideos|vidspot)\.net/(?:(?:2|v)/v-)?(?P<id>[a-zA-Z0-9_-]+)'
_TESTS = [{
'url': 'http://allmyvideos.net/jih3nce3x6wn',
'md5': '710883dee1bfc370ecf9fa6a89307c88',
'info_dict': {
'id': 'jih3nce3x6wn',
'ext': 'mp4',
'title': 'youtube-dl test video',
},
}, {
'url': 'http://allmyvideos.net/embed-jih3nce3x6wn',
'md5': '710883dee1bfc370ecf9fa6a89307c88',
'info_dict': {
'id': 'jih3nce3x6wn',
'ext': 'mp4',
'title': 'youtube-dl test video',
},
}, {
'url': 'http://vidspot.net/l2ngsmhs8ci5',
'md5': '710883dee1bfc370ecf9fa6a89307c88',
'info_dict': {
'id': 'l2ngsmhs8ci5',
'ext': 'mp4',
'title': 'youtube-dl test video',
},
}, {
'url': 'https://www.vidspot.net/l2ngsmhs8ci5',
'only_matching': True,
}, {
'url': 'http://vidspot.net/2/v-ywDf99',
'md5': '5f8254ce12df30479428b0152fb8e7ba',
'info_dict': {
'id': 'ywDf99',
'ext': 'mp4',
'title': 'IL FAIT LE MALIN EN PORSHE CAYENNE ( mais pas pour longtemps)',
'description': 'IL FAIT LE MALIN EN PORSHE CAYENNE.',
},
}, {
'url': 'http://allmyvideos.net/v/v-HXZm5t',
'only_matching': True,
}]
def _real_extract(self, url):
orig_video_id = self._match_id(url)
video_id = remove_start(orig_video_id, 'embed-')
url = url.replace(orig_video_id, video_id)
assert re.match(self._VALID_URL, url) is not None
orig_webpage = self._download_webpage(url, video_id)
if '>File Not Found<' in orig_webpage:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
error = self._search_regex(
r'class="err">([^<]+)<', orig_webpage, 'error', default=None)
if error:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error), expected=True)
builtin_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+?/builtin-.+?)\1',
orig_webpage, 'builtin URL', default=None, group='url')
if builtin_url:
req = sanitized_Request(builtin_url)
req.add_header('Referer', url)
webpage = self._download_webpage(req, video_id, 'Downloading builtin page')
title = self._og_search_title(orig_webpage).strip()
description = self._og_search_description(orig_webpage).strip()
else:
fields = re.findall(r'type="hidden" name="(.+?)"\s* value="?(.+?)">', orig_webpage)
data = dict(fields)
post = urlencode_postdata(data)
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
}
req = sanitized_Request(url, post, headers)
webpage = self._download_webpage(
req, video_id, note='Downloading video page ...')
title = os.path.splitext(data['fname'])[0]
description = None
# Could be several links with different quality
links = re.findall(r'"file" : "?(.+?)",', webpage)
# Assume the links are ordered in quality
formats = [{
'url': l,
'quality': i,
} for i, l in enumerate(links)]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'formats': formats,
}

View File

@@ -6,17 +6,17 @@ import re
 from .common import InfoExtractor


-class MakersChannelIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?makerschannel\.com/.*(?P<id_type>video|production)_id=(?P<id>[0-9]+)'
+class MyChannelsIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?mychannels\.com/.*(?P<id_type>video|production)_id=(?P<id>[0-9]+)'
     _TEST = {
-        'url': 'http://makerschannel.com/en/zoomin/community-highlights?video_id=849',
-        'md5': '624a512c6969236b5967bf9286345ad1',
+        'url': 'https://mychannels.com/missholland/miss-holland?production_id=3416',
+        'md5': 'b8993daad4262dd68d89d651c0c52c45',
         'info_dict': {
-            'id': '849',
+            'id': 'wUUDZZep6vQD',
             'ext': 'mp4',
-            'title': 'Landing a bus on a plane is an epic win',
-            'uploader': 'ZoomIn',
-            'description': 'md5:cd9cca2ea7b69b78be81d07020c97139',
+            'title': 'Miss Holland joins VOTE LEAVE',
+            'description': 'Miss Holland | #13 Not a potato',
+            'uploader': 'Miss Holland',
         }
     }

@@ -27,12 +27,12 @@ class MakersChannelIE(InfoExtractor):
         def extract_data_val(attr, fatal=False):
             return self._html_search_regex(r'data-%s\s*=\s*"([^"]+)"' % attr, video_data, attr, fatal=fatal)
-        minoto_id = self._search_regex(r'/id/([a-zA-Z0-9]+)', extract_data_val('video-src', True), 'minoto id')
+        minoto_id = extract_data_val('minoto-id') or self._search_regex(r'/id/([a-zA-Z0-9]+)', extract_data_val('video-src', True), 'minoto id')

         return {
             '_type': 'url_transparent',
             'url': 'minoto:%s' % minoto_id,
-            'id': extract_data_val('video-id', True),
+            'id': url_id,
             'title': extract_data_val('title', True),
             'description': extract_data_val('description'),
             'thumbnail': extract_data_val('image'),
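The new fallback chain — prefer an explicit data-minoto-id attribute, otherwise slice the id out of the player src — can be exercised in isolation; the markup below is hypothetical but carries the attributes the code reads:

import re


def extract_data_val(video_data, attr):
    # Same data-* attribute lookup the extractor performs via _html_search_regex.
    m = re.search(r'data-%s\s*=\s*"([^"]+)"' % attr, video_data)
    return m.group(1) if m else None


video_data = ('data-video-id="3416" data-minoto-id="wUUDZZep6vQD" '
              'data-video-src="//player.example.com/id/wUUDZZep6vQD"')
minoto_id = extract_data_val(video_data, 'minoto-id') or re.search(
    r'/id/([a-zA-Z0-9]+)', extract_data_val(video_data, 'video-src')).group(1)
print(minoto_id)  # -> wUUDZZep6vQD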


@@ -68,11 +68,11 @@ class NationalGeographicVideoIE(InfoExtractor):
 class NationalGeographicIE(ThePlatformIE, AdobePassIE):
     IE_NAME = 'natgeo'
-    _VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:(?:wild/)?[^/]+/)?(?:videos|episodes)/(?P<id>[^/?]+)'
+    _VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:(?:(?:wild/)?[^/]+/)?(?:videos|episodes)|u)/(?P<id>[^/?]+)'

     _TESTS = [
         {
-            'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/',
+            'url': 'http://channel.nationalgeographic.com/u/kdi9Ld0PN2molUUIMSBGxoeDhD729KRjQcnxtetilWPMevo8ZwUBIDuPR0Q3D2LVaTsk0MPRkRWDB8ZhqWVeyoxfsZZm36yRp1j-zPfsHEyI_EgAeFY/',
             'md5': '518c9aa655686cf81493af5cc21e2a04',
             'info_dict': {
                 'id': 'vKInpacll2pC',
@@ -86,7 +86,7 @@ class NationalGeographicIE(ThePlatformIE, AdobePassIE):
             'add_ie': ['ThePlatform'],
         },
         {
-            'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
+            'url': 'http://channel.nationalgeographic.com/u/kdvOstqYaBY-vSBPyYgAZRUL4sWUJ5XUUPEhc7ISyBHqoIO4_dzfY3K6EjHIC0hmFXoQ7Cpzm6RkET7S3oMlm6CFnrQwSUwo/',
             'md5': 'c4912f656b4cbe58f3e000c489360989',
             'info_dict': {
                 'id': 'Pok5lWCkiEFA',
@@ -106,6 +106,14 @@ class NationalGeographicIE(ThePlatformIE, AdobePassIE):
         {
             'url': 'http://channel.nationalgeographic.com/videos/treasures-rediscovered/',
             'only_matching': True,
         },
+        {
+            'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/',
+            'only_matching': True,
+        },
+        {
+            'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
+            'only_matching': True,
+        }
     ]
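The reworked _VALID_URL adds a /u/<id> branch for the site's tokenized share URLs while keeping the old /videos/ and /episodes/ layouts matching; a quick standalone check (the /u/ token below is a shortened, made-up stand-in for the long ones above):

import re

_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:(?:(?:wild/)?[^/]+/)?(?:videos|episodes)|u)/(?P<id>[^/?]+)'

for url in (
    'http://channel.nationalgeographic.com/u/kdi9Ld0PN2mol/',
    'http://channel.nationalgeographic.com/videos/treasures-rediscovered/',
    'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
):
    # Each URL style should match and yield its trailing id segment.
    print(re.match(_VALID_URL, url).group('id'))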


@@ -1,8 +1,6 @@
 # coding: utf-8
 from __future__ import unicode_literals

-import re
-
 from .common import InfoExtractor
 from ..utils import (
     ExtractorError,
@@ -43,9 +41,14 @@ class NaverIE(InfoExtractor):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)

-        m_id = re.search(r'var rmcPlayer = new nhn\.rmcnmv\.RMCVideoPlayer\("(.+?)", "(.+?)"',
-            webpage)
-        if m_id is None:
+        vid = self._search_regex(
+            r'videoId["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
+            'video id', fatal=None, group='value')
+        in_key = self._search_regex(
+            r'inKey["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
+            'key', default=None, group='value')
+
+        if not vid or not in_key:
             error = self._html_search_regex(
                 r'(?s)<div class="(?:nation_error|nation_box|error_box)">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
                 webpage, 'error', default=None)
@@ -53,9 +56,9 @@ class NaverIE(InfoExtractor):
                 raise ExtractorError(error, expected=True)
             raise ExtractorError('couldn\'t extract vid and key')
         video_data = self._download_json(
-            'http://play.rmcnmv.naver.com/vod/play/v2.0/' + m_id.group(1),
+            'http://play.rmcnmv.naver.com/vod/play/v2.0/' + vid,
             video_id, query={
-                'key': m_id.group(2),
+                'key': in_key,
             })
         meta = video_data['meta']
         title = meta['subject']
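The new lookups swap the brittle RMCVideoPlayer constructor regex for quote-agnostic searches: (["\']) captures whichever quote opens the value, (?:(?!\1).)+ consumes everything up to that same quote, and \1 demands the matching closer. A standalone check against hypothetical page fragments (the ids are invented):

import re

VIDEO_ID_RE = r'videoId["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1'

# Matches whether the page quotes the value with " or ', because \1
# backreferences the exact quote character that opened it.
for snippet in ('"videoId": "093EC92D2F2A55A23AA2"',
                "'videoId' : 'FD2E6786AEE0FBBDF7V3'"):
    print(re.search(VIDEO_ID_RE, snippet).group('value'))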


@@ -1,7 +1,8 @@
 from __future__ import unicode_literals

-import re
 import base64
+import json
+import re

 from .common import InfoExtractor
 from .theplatform import ThePlatformIE
@@ -9,6 +10,7 @@ from .adobepass import AdobePassIE
 from ..utils import (
     find_xpath_attr,
     smuggle_url,
+    try_get,
     unescapeHTML,
     update_url_query,
     int_or_none,
@@ -78,10 +80,14 @@ class NBCIE(AdobePassIE):
     def _real_extract(self, url):
         permalink, video_id = re.match(self._VALID_URL, url).groups()
         permalink = 'http' + permalink
-        video_data = self._download_json(
+        response = self._download_json(
             'https://api.nbc.com/v3/videos', video_id, query={
                 'filter[permalink]': permalink,
-            })['data'][0]['attributes']
+                'fields[videos]': 'description,entitlement,episodeNumber,guid,keywords,seasonNumber,title,vChipRating',
+                'fields[shows]': 'shortTitle',
+                'include': 'show.shortTitle',
+            })
+        video_data = response['data'][0]['attributes']
         query = {
             'mbr': 'true',
             'manifest': 'm3u',
@@ -103,10 +109,11 @@ class NBCIE(AdobePassIE):
             'title': title,
             'url': theplatform_url,
             'description': video_data.get('description'),
-            'keywords': video_data.get('keywords'),
+            'tags': video_data.get('keywords'),
             'season_number': int_or_none(video_data.get('seasonNumber')),
             'episode_number': int_or_none(video_data.get('episodeNumber')),
-            'series': video_data.get('showName'),
+            'episode': title,
+            'series': try_get(response, lambda x: x['included'][0]['attributes']['shortTitle']),
             'ie_key': 'ThePlatform',
         }
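The updated NBCIE request uses JSON:API-style sparse fieldsets plus an include so the show's shortTitle arrives in the same round trip as the video attributes. A sketch of the equivalent call — requests stands in for the extractor's _download_json, and the permalink value is hypothetical:

import requests

response = requests.get('https://api.nbc.com/v3/videos', params={
    'filter[permalink]': 'https://www.nbc.com/some-show/video/some-episode/123456',
    'fields[videos]': 'description,entitlement,episodeNumber,guid,keywords,seasonNumber,title,vChipRating',
    'fields[shows]': 'shortTitle',
    'include': 'show.shortTitle',
}).json()

video_data = response['data'][0]['attributes']
# try_get-style defensive traversal of the included show resource:
included = response.get('included') or [{}]
series = included[0].get('attributes', {}).get('shortTitle')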
@@ -169,6 +176,65 @@ class NBCSportsIE(InfoExtractor):
             NBCSportsVPlayerIE._extract_url(webpage), 'NBCSportsVPlayer')


+class NBCSportsStreamIE(AdobePassIE):
+    _VALID_URL = r'https?://stream\.nbcsports\.com/.+?\bpid=(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://stream.nbcsports.com/nbcsn/generic?pid=206559',
+        'info_dict': {
+            'id': '206559',
+            'ext': 'mp4',
+            'title': 'Amgen Tour of California Women\'s Recap',
+            'description': 'md5:66520066b3b5281ada7698d0ea2aa894',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'skip': 'Requires Adobe Pass Authentication',
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        live_source = self._download_json(
+            'http://stream.nbcsports.com/data/live_sources_%s.json' % video_id,
+            video_id)
+        video_source = live_source['videoSources'][0]
+        title = video_source['title']
+        source_url = None
+        for k in ('source', 'msl4source', 'iossource', 'hlsv4'):
+            sk = k + 'Url'
+            source_url = video_source.get(sk) or video_source.get(sk + 'Alt')
+            if source_url:
+                break
+        else:
+            source_url = video_source['ottStreamUrl']
+        is_live = video_source.get('type') == 'live' or video_source.get('status') == 'Live'
+        resource = self._get_mvpd_resource('nbcsports', title, video_id, '')
+        token = self._extract_mvpd_auth(url, video_id, 'nbcsports', resource)
+        tokenized_url = self._download_json(
+            'https://token.playmakerservices.com/cdn',
+            video_id, data=json.dumps({
+                'requestorId': 'nbcsports',
+                'pid': video_id,
+                'application': 'NBCSports',
+                'version': 'v1',
+                'platform': 'desktop',
+                'cdn': 'akamai',
+                'url': video_source['sourceUrl'],
+                'token': base64.b64encode(token.encode()).decode(),
+                'resourceId': base64.b64encode(resource.encode()).decode(),
+            }).encode())['tokenizedUrl']
+        formats = self._extract_m3u8_formats(tokenized_url, video_id, 'mp4')
+        self._sort_formats(formats)
+        return {
+            'id': video_id,
+            'title': self._live_title(title) if is_live else title,
+            'description': live_source.get('description'),
+            'formats': formats,
+            'is_live': is_live,
+        }
+
+
 class CSNNEIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?csnne\.com/video/(?P<id>[0-9a-z-]+)'
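The new NBCSportsStreamIE trades the Adobe Pass token for a tokenized stream URL: both the token and the MVPD resource string travel base64-encoded inside the JSON body POSTed to token.playmakerservices.com. The helper below restates that request body from the class above for readability; its argument values are placeholders supplied by the caller:

import base64
import json


def build_tokenize_body(pid, source_url, token, resource):
    # Mirrors the payload NBCSportsStreamIE sends to
    # https://token.playmakerservices.com/cdn to obtain 'tokenizedUrl'.
    return json.dumps({
        'requestorId': 'nbcsports',
        'pid': pid,
        'application': 'NBCSports',
        'version': 'v1',
        'platform': 'desktop',
        'cdn': 'akamai',
        'url': source_url,
        'token': base64.b64encode(token.encode()).decode(),
        'resourceId': base64.b64encode(resource.encode()).decode(),
    }).encode()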

Some files were not shown because too many files have changed in this diff.