Compare commits

..

2 Commits

Author SHA1 Message Date
97bc05116e Merge branch 'master' into totalwebcasting 2018-01-07 15:03:28 +01:00
7608a91ee7 [totalwebcasting] Add new extractor 2017-01-11 18:51:25 -05:00
157 changed files with 1759 additions and 5062 deletions

View File

@ -6,13 +6,12 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.03.26.1*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.03.26.1**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.12.31*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.12.31**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
- [ ] Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser
### What is the purpose of your *issue*?
- [ ] Bug report (encountered problems with youtube-dl)
@ -36,7 +35,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2018.03.26.1
[debug] youtube-dl version 2017.12.31
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -12,7 +12,6 @@
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
- [ ] Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser
### What is the purpose of your *issue*?
- [ ] Bug report (encountered problems with youtube-dl)

View File

@ -231,8 +231,3 @@ John Dong
Tatsuyuki Ishi
Daniel Weber
Kay Bouché
Yang Hongbo
Lei Wang
Petr Novák
Leonardo Taccari
Martin Weinelt

290
ChangeLog
View File

@ -1,297 +1,9 @@
version 2018.03.26.1
Core
+ [downloader/external] Add elapsed time to progress hook (#10876)
* [downloader/external,fragment] Fix download finalization when writing file
to stdout (#10809, #10876, #15799)
version <unreleased>
Extractors
* [vrv] Fix extraction on python2 (#15928)
* [afreecatv] Update referrer (#15947)
+ [24video] Add support for 24video.sexy (#15973)
* [crackle] Bypass geo restriction
* [crackle] Fix extraction (#15969)
+ [lenta] Add support for lenta.ru (#15953)
+ [instagram:user] Add pagination (#15934)
* [youku] Update ccode (#15939)
* [libsyn] Adapt to new page structure
version 2018.03.20
Core
* [extractor/common] Improve thumbnail extraction for HTML5 entries
* Generalize XML manifest processing code and improve XSPF parsing
+ [extractor/common] Add _download_xml_handle
+ [extractor/common] Add support for relative URIs in _parse_xspf (#15794)
Extractors
+ [7plus] Extract series metadata (#15862, #15906)
* [9now] Bypass geo restriction (#15920)
* [cbs] Skip unavailable assets (#13490, #13506, #15776)
+ [canalc2] Add support for HTML5 videos (#15916, #15919)
+ [ceskatelevize] Add support for iframe embeds (#15918)
+ [prosiebensat1] Add support for galileo.tv (#15894)
+ [generic] Add support for xfileshare embeds (#15879)
* [bilibili] Switch to v2 playurl API
* [bilibili] Fix and improve extraction (#15048, #15430, #15622, #15863)
* [heise] Improve extraction (#15496, #15784, #15026)
* [instagram] Fix user videos extraction (#15858)
version 2018.03.14
Extractors
* [soundcloud] Update client id (#15866)
+ [tennistv] Add support for tennistv.com
+ [line] Add support for tv.line.me (#9427)
* [xnxx] Fix extraction (#15817)
* [njpwworld] Fix authentication (#15815)
version 2018.03.10
Core
* [downloader/hls] Skip uplynk ad fragments (#15748)
Extractors
* [pornhub] Don't override session cookies (#15697)
+ [raywenderlich] Add support for videos.raywenderlich.com (#15251)
* [funk] Fix extraction and rework extractors (#15792)
* [nexx] Restore reverse engineered approach
+ [heise] Add support for kaltura embeds (#14961, #15728)
+ [tvnow] Extract series metadata (#15774)
* [ruutu] Continue formats extraction on NOT-USED URLs (#15775)
* [vrtnu] Use redirect URL for building video JSON URL (#15767, #15769)
* [vimeo] Modernize login code and improve error messaging
* [archiveorg] Fix extraction (#15770, #15772)
+ [hidive] Add support for hidive.com (#15494)
* [afreecatv] Detect deleted videos
* [afreecatv] Fix extraction (#15755)
* [vice] Fix extraction and rework extractors (#11101, #13019, #13622, #13778)
+ [vidzi] Add support for vidzi.si (#15751)
* [npo] Fix typo
version 2018.03.03
Core
+ [utils] Add parse_resolution
Revert respect --prefer-insecure while updating
Extractors
+ [yapfiles] Add support for yapfiles.ru (#15726, #11085)
* [spankbang] Fix formats extraction (#15727)
* [adn] Fix extraction (#15716)
+ [toggle] Extract DASH and ISM formats (#15721)
+ [nickelodeon] Add support for nickelodeon.com.tr (#15706)
* [npo] Validate and filter format URLs (#15709)
version 2018.02.26
Extractors
* [udemy] Use custom User-Agent (#15571)
version 2018.02.25
Core
* [postprocessor/embedthumbnail] Skip embedding when there aren't any
thumbnails (#12573)
* [extractor/common] Improve jwplayer subtitles extraction (#15695)
Extractors
+ [vidlii] Add support for vidlii.com (#14472, #14512, #14779)
+ [streamango] Capture and output error messages
* [streamango] Fix extraction (#14160, #14256)
+ [telequebec] Add support for emissions (#14649, #14655)
+ [telequebec:live] Add support for live streams (#15688)
+ [mailru:music] Add support for mail.ru/music (#15618)
* [aenetworks] Switch to akamai HLS formats (#15612)
* [ytsearch] Fix flat title extraction (#11260, #15681)
version 2018.02.22
Core
+ [utils] Fixup some common URL typos in sanitize_url (#15649)
* Respect --prefer-insecure while updating (#15497)
Extractors
* [vidio] Fix HLS URL extraction (#15675)
+ [nexx] Add support for arc.nexx.cloud URLs
* [nexx] Switch to arc API (#15652)
* [redtube] Fix duration extraction (#15659)
+ [sonyliv] Respect referrer (#15648)
+ [brightcove:new] Use referrer for formats' HTTP headers
+ [cbc] Add support for olympics.cbc.ca (#15535)
+ [fusion] Add support for fusion.tv (#15628)
* [npo] Improve quality metadata extraction
* [npo] Relax URL regular expression (#14987, #14994)
+ [npo] Capture and output error message
+ [pornhub] Add support for channels (#15613)
* [youtube] Handle shared URLs with generic extractor (#14303)
version 2018.02.11
Core
+ [YoutubeDL] Add support for filesize_approx in format selector (#15550)
Extractors
+ [francetv] Add support for live streams (#13689)
+ [francetv] Add support for zouzous.fr and ludo.fr (#10454, #13087, #13103,
#15012)
* [francetv] Separate main extractor and rework others to delegate to it
* [francetv] Improve manifest URL signing (#15536)
+ [francetv] Sign m3u8 manifest URLs (#15565)
+ [veoh] Add support for embed URLs (#15561)
* [afreecatv] Fix extraction (#15556)
* [periscope] Use accessVideoPublic endpoint (#15554)
* [discovery] Fix auth request (#15542)
+ [6play] Extract subtitles (#15541)
* [newgrounds] Fix metadata extraction (#15531)
+ [nbc] Add support for stream.nbcolympics.com (#10295)
* [dvtv] Fix live streams extraction (#15442)
version 2018.02.08
Extractors
+ [myvi] Extend URL regular expression
+ [myvi:embed] Add support for myvi.tv embeds (#15521)
+ [prosiebensat1] Extend URL regular expression (#15520)
* [pokemon] Relax URL regular expression and extend title extraction (#15518)
+ [gameinformer] Use geo verification headers
* [la7] Fix extraction (#15501, #15502)
* [gameinformer] Fix brightcove id extraction (#15416)
+ [afreecatv] Pass referrer to video info request (#15507)
+ [telebruxelles] Add support for live streams
* [telebruxelles] Relax URL regular expression
* [telebruxelles] Fix extraction (#15504)
* [extractor/common] Respect secure schemes in _extract_wowza_formats
version 2018.02.04
Core
* [downloader/http] Randomize HTTP chunk size
+ [downloader/http] Add ability to pass downloader options via info dict
* [downloader/http] Fix 302 infinite loops by not reusing requests
+ Document http_chunk_size
Extractors
+ [brightcove] Pass embed page URL as referrer (#15486)
+ [youtube] Enforce using chunked HTTP downloading for DASH formats
version 2018.02.03
Core
+ Introduce --http-chunk-size for chunk-based HTTP downloading
+ Add support for IronPython
* [downloader/ism] Fix Python 3.2 support
Extractors
* [redbulltv] Fix extraction (#15481)
* [redtube] Fix metadata extraction (#15472)
* [pladform] Respect platform id and extract HLS formats (#15468)
- [rtlnl] Remove progressive formats (#15459)
* [6play] Do no modify asset URLs with a token (#15248)
* [nationalgeographic] Relax URL regular expression
* [dplay] Relax URL regular expression (#15458)
* [cbsinteractive] Fix data extraction (#15451)
+ [amcnetworks] Add support for sundancetv.com (#9260)
version 2018.01.27
Core
* [extractor/common] Improve _json_ld for articles
* Switch codebase to use compat_b64decode
+ [compat] Add compat_b64decode
Extractors
+ [seznamzpravy] Add support for seznam.cz and seznamzpravy.cz (#14102, #14616)
* [dplay] Bypass geo restriction
+ [dplay] Add support for disco-api videos (#15396)
* [youtube] Extract precise error messages (#15284)
* [teachertube] Capture and output error message
* [teachertube] Fix and relax thumbnail extraction (#15403)
+ [prosiebensat1] Add another clip id regular expression (#15378)
* [tbs] Update tokenizer url (#15395)
* [mixcloud] Use compat_b64decode (#15394)
- [thesixtyone] Remove extractor (#15341)
version 2018.01.21
Core
* [extractor/common] Improve jwplayer DASH formats extraction (#9242, #15187)
* [utils] Improve scientific notation handling in js_to_json (#14789)
Extractors
+ [southparkdk] Add support for southparkstudios.nu
+ [southpark] Add support for collections (#14803)
* [franceinter] Fix upload date extraction (#14996)
+ [rtvs] Add support for rtvs.sk (#9242, #15187)
* [restudy] Fix extraction and extend URL regular expression (#15347)
* [youtube:live] Improve live detection (#15365)
+ [springboardplatform] Add support for springboardplatform.com
* [prosiebensat1] Add another clip id regular expression (#15290)
- [ringtv] Remove extractor (#15345)
version 2018.01.18
Extractors
* [soundcloud] Update client id (#15306)
- [kamcord] Remove extractor (#15322)
+ [spiegel] Add support for nexx videos (#15285)
* [twitch] Fix authentication and error capture (#14090, #15264)
* [vk] Detect more errors due to copyright complaints (#15259)
version 2018.01.14
Extractors
* [youtube] Fix live streams extraction (#15202)
* [wdr] Bypass geo restriction
* [wdr] Rework extractors (#14598)
+ [wdr] Add support for wdrmaus.de/elefantenseite (#14598)
+ [gamestar] Add support for gamepro.de (#3384)
* [viafree] Skip rtmp formats (#15232)
+ [pandoratv] Add support for mobile URLs (#12441)
+ [pandoratv] Add support for new URL format (#15131)
+ [ximalaya] Add support for ximalaya.com (#14687)
+ [digg] Add support for digg.com (#15214)
* [limelight] Tolerate empty pc formats (#15150, #15151, #15207)
* [ndr:embed:base] Make separate formats extraction non fatal (#15203)
+ [weibo] Add extractor (#15079)
+ [ok] Add support for live streams
* [canalplus] Fix extraction (#15072)
* [bilibili] Fix extraction (#15188)
version 2018.01.07
Core
* [utils] Fix youtube-dl under PyPy3 on Windows
* [YoutubeDL] Output python implementation in debug header
Extractors
+ [jwplatform] Add support for multiple embeds (#15192)
* [mitele] Fix extraction (#15186)
+ [motherless] Add support for groups (#15124)
* [lynda] Relax URL regular expression (#15185)
* [soundcloud] Fallback to avatar picture for thumbnail (#12878)
* [youku] Fix list extraction (#15135)
* [openload] Fix extraction (#15166)
* [lynda] Skip invalid subtitles (#15159)
* [twitch] Pass video id to url_result when extracting playlist (#15139)
* [rtve.es:alacarta] Fix extraction of some new URLs
* [acast] Fix extraction (#15147)
version 2017.12.31

View File

@ -46,7 +46,7 @@ Or with [MacPorts](https://www.macports.org/):
Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
# DESCRIPTION
**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on macOS. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on Mac OS X. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
youtube-dl [OPTIONS] URL [URL...]
@ -198,11 +198,6 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
size. By default, the buffer size is
automatically resized from an initial value
of SIZE.
--http-chunk-size SIZE Size of a chunk for chunk-based HTTP
downloading (e.g. 10485760 or 10M) (default
is disabled). May be useful for bypassing
bandwidth throttling imposed by a webserver
(experimental)
--playlist-reverse Download playlist videos in reverse order
--playlist-random Download playlist videos in random order
--xattr-set-filesize Set file xattribute ytdl.filesize with
@ -868,7 +863,7 @@ Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, Mac OS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).

View File

@ -128,14 +128,13 @@
- **CamdemyFolder**
- **CamWithHer**
- **canalc2.tv**
- **Canalplus**: mycanal.fr and piwiplus.fr
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**
- **CanvasEen**: canvas.be and een.be
- **CarambaTV**
- **CarambaTVPage**
- **CartoonNetwork**
- **cbc.ca**
- **cbc.ca:olympics**
- **cbc.ca:player**
- **cbc.ca:watch**
- **cbc.ca:watch:video**
@ -190,7 +189,7 @@
- **CSpan**: C-SPAN
- **CtsNews**: 華視新聞
- **CTVNews**
- **Culturebox**
- **culturebox.francetvinfo.fr**
- **CultureUnplugged**
- **curiositystream**
- **curiositystream:collection**
@ -211,7 +210,6 @@
- **defense.gouv.fr**
- **democracynow**
- **DHM**: Filmarchiv - Deutsches Historisches Museum
- **Digg**
- **DigitallySpeaking**
- **Digiteka**
- **Discovery**
@ -292,14 +290,11 @@
- **FranceTV**
- **FranceTVEmbed**
- **francetvinfo.fr**
- **FranceTVJeunesse**
- **FranceTVSite**
- **Freesound**
- **freespeech.org**
- **FreshLive**
- **Funimation**
- **FunkChannel**
- **FunkMix**
- **Funk**
- **FunnyOrDie**
- **Fusion**
- **Fux**
@ -337,7 +332,6 @@
- **HentaiStigma**
- **hetklokhuis**
- **hgtv.com:show**
- **HiDive**
- **HistoricFilms**
- **history:topic**: History.com Topic
- **hitbox**
@ -388,6 +382,7 @@
- **JWPlatform**
- **Kakao**
- **Kaltura**
- **Kamcord**
- **KanalPlay**: Kanal 5/9/11 Play
- **Kankan**
- **Karaoketv**
@ -419,7 +414,6 @@
- **Lecture2Go**
- **LEGO**
- **Lemonde**
- **Lenta**
- **LePlaylist**
- **LetvCloud**: 乐视云
- **Libsyn**
@ -428,7 +422,6 @@
- **limelight**
- **limelight:channel**
- **limelight:channel_list**
- **LineTV**
- **LiTV**
- **LiveLeak**
- **LiveLeakEmbed**
@ -444,8 +437,6 @@
- **m6**
- **macgamestore**: MacGameStore trailers
- **mailru**: Видео@Mail.Ru
- **mailru:music**: Музыка@Mail.Ru
- **mailru:music:search**: Музыка@Mail.Ru
- **MakersChannel**
- **MakerTV**
- **mangomolo:live**
@ -487,7 +478,6 @@
- **Moniker**: allmyvideos.net and vidspot.net
- **Morningstar**: morningstar.com
- **Motherless**
- **MotherlessGroup**
- **Motorsport**: motorsport.com
- **MovieClips**
- **MovieFap**
@ -511,7 +501,6 @@
- **MySpass**
- **Myvi**
- **MyVidster**
- **MyviEmbed**
- **n-tv.de**
- **natgeo**
- **natgeo:episodeguide**
@ -520,8 +509,7 @@
- **NBA**
- **NBC**
- **NBCNews**
- **nbcolympics**
- **nbcolympics:stream**
- **NBCOlympics**
- **NBCSports**
- **NBCSportsVPlayer**
- **ndr**: NDR.de - Norddeutscher Rundfunk
@ -678,7 +666,6 @@
- **RaiPlay**
- **RaiPlayLive**
- **RaiPlayPlaylist**
- **RayWenderlich**
- **RBMARadio**
- **RDS**: RDS.ca
- **RedBullTV**
@ -694,6 +681,7 @@
- **revision**
- **revision3:embed**
- **RICE**
- **RingTV**
- **RMCDecouverte**
- **RockstarGames**
- **RoosterTeeth**
@ -714,7 +702,6 @@
- **rtve.es:live**: RTVE.es live streams
- **rtve.es:television**
- **RTVNH**
- **RTVS**
- **Rudo**
- **RUHD**
- **RulePorn**
@ -744,8 +731,6 @@
- **ServingSys**
- **Servus**
- **Sexu**
- **SeznamZpravy**
- **SeznamZpravyArticle**
- **Shahid**
- **ShahidShow**
- **Shared**: shared.sx
@ -787,7 +772,7 @@
- **Sport5**
- **SportBoxEmbed**
- **SportDeutschland**
- **SpringboardPlatform**
- **Sportschau**
- **Sprout**
- **sr:mediathek**: Saarländischer Rundfunk
- **SRGSSR**
@ -827,11 +812,8 @@
- **Telegraaf**
- **TeleMB**
- **TeleQuebec**
- **TeleQuebecEmission**
- **TeleQuebecLive**
- **TeleTask**
- **Telewebion**
- **TennisTV**
- **TF1**
- **TFO**
- **TheIntercept**
@ -839,6 +821,7 @@
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
- **TheSixtyOne**
- **TheStar**
- **TheSun**
- **TheWeatherChannel**
@ -940,6 +923,7 @@
- **vice**
- **vice:article**
- **vice:show**
- **Viceland**
- **Vidbit**
- **Viddler**
- **Videa**
@ -955,7 +939,6 @@
- **VideoPress**
- **videoweed**: VideoWeed
- **Vidio**
- **VidLii**
- **vidme**
- **vidme:user**
- **vidme:user:likes**
@ -1018,14 +1001,10 @@
- **WatchIndianPorn**: Watch Indian Porn
- **WDR**
- **wdr:mobile**
- **WDRElefant**
- **WDRPage**
- **Webcaster**
- **WebcasterFeed**
- **WebOfStories**
- **WebOfStoriesPlaylist**
- **Weibo**
- **WeiboMobile**
- **WeiqiTV**: WQTV
- **wholecloud**: WholeCloud
- **Wimp**
@ -1045,8 +1024,6 @@
- **xiami:artist**: 虾米音乐 - 歌手
- **xiami:collection**: 虾米音乐 - 精选集
- **xiami:song**: 虾米音乐
- **ximalaya**: 喜马拉雅FM
- **ximalaya:album**: 喜马拉雅FM 专辑
- **XMinus**
- **XNXX**
- **Xstream**
@ -1060,7 +1037,6 @@
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек
- **YapFiles**
- **YesJapan**
- **yinyuetai:video**: 音悦Tai
- **Ynet**

View File

@ -3,4 +3,4 @@ universal = True
[flake8]
exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git
ignore = E402,E501,E731,E741
ignore = E402,E501,E731

View File

@ -694,55 +694,6 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
self.ie._sort_formats(formats)
expect_value(self, formats, expected_formats, None)
def test_parse_xspf(self):
_TEST_CASES = [
(
'foo_xspf',
'https://example.org/src/foo_xspf.xspf',
[{
'id': 'foo_xspf',
'title': 'Pandemonium',
'description': 'Visit http://bigbrother404.bandcamp.com',
'duration': 202.416,
'formats': [{
'manifest_url': 'https://example.org/src/foo_xspf.xspf',
'url': 'https://example.org/src/cd1/track%201.mp3',
}],
}, {
'id': 'foo_xspf',
'title': 'Final Cartridge (Nichico Twelve Remix)',
'description': 'Visit http://bigbrother404.bandcamp.com',
'duration': 255.857,
'formats': [{
'manifest_url': 'https://example.org/src/foo_xspf.xspf',
'url': 'https://example.org/%E3%83%88%E3%83%A9%E3%83%83%E3%82%AF%E3%80%80%EF%BC%92.mp3',
}],
}, {
'id': 'foo_xspf',
'title': 'Rebuilding Nightingale',
'description': 'Visit http://bigbrother404.bandcamp.com',
'duration': 287.915,
'formats': [{
'manifest_url': 'https://example.org/src/foo_xspf.xspf',
'url': 'https://example.org/src/track3.mp3',
}, {
'manifest_url': 'https://example.org/src/foo_xspf.xspf',
'url': 'https://example.com/track3.mp3',
}]
}]
),
]
for xspf_file, xspf_url, expected_entries in _TEST_CASES:
with io.open('./test/testdata/xspf/%s.xspf' % xspf_file,
mode='r', encoding='utf-8') as f:
entries = self.ie._parse_xspf(
compat_etree_fromstring(f.read().encode('utf-8')),
xspf_file, xspf_url=xspf_url, xspf_base_url=xspf_url)
expect_value(self, entries, expected_entries, None)
for i in range(len(entries)):
expect_dict(self, entries[i], expected_entries[i])
if __name__ == '__main__':
unittest.main()

View File

@ -92,8 +92,8 @@ class TestDownload(unittest.TestCase):
def generator(test_case, tname):
def test_template(self):
ie = youtube_dl.extractor.get_info_extractor(test_case['name'])()
other_ies = [get_info_extractor(ie_key)() for ie_key in test_case.get('add_ie', [])]
ie = youtube_dl.extractor.get_info_extractor(test_case['name'])
other_ies = [get_info_extractor(ie_key) for ie_key in test_case.get('add_ie', [])]
is_playlist = any(k.startswith('playlist') for k in test_case)
test_cases = test_case.get(
'playlist', [] if is_playlist else [test_case])

View File

@ -1,125 +0,0 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
import os
import re
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import try_rm
from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_http_server
from youtube_dl.downloader.http import HttpFD
from youtube_dl.utils import encodeFilename
import ssl
import threading
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
def http_server_port(httpd):
if os.name == 'java' and isinstance(httpd.socket, ssl.SSLSocket):
# In Jython SSLSocket is not a subclass of socket.socket
sock = httpd.socket.sock
else:
sock = httpd.socket
return sock.getsockname()[1]
TEST_SIZE = 10 * 1024
class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
def log_message(self, format, *args):
pass
def send_content_range(self, total=None):
range_header = self.headers.get('Range')
start = end = None
if range_header:
mobj = re.search(r'^bytes=(\d+)-(\d+)', range_header)
if mobj:
start = int(mobj.group(1))
end = int(mobj.group(2))
valid_range = start is not None and end is not None
if valid_range:
content_range = 'bytes %d-%d' % (start, end)
if total:
content_range += '/%d' % total
self.send_header('Content-Range', content_range)
return (end - start + 1) if valid_range else total
def serve(self, range=True, content_length=True):
self.send_response(200)
self.send_header('Content-Type', 'video/mp4')
size = TEST_SIZE
if range:
size = self.send_content_range(TEST_SIZE)
if content_length:
self.send_header('Content-Length', size)
self.end_headers()
self.wfile.write(b'#' * size)
def do_GET(self):
if self.path == '/regular':
self.serve()
elif self.path == '/no-content-length':
self.serve(content_length=False)
elif self.path == '/no-range':
self.serve(range=False)
elif self.path == '/no-range-no-content-length':
self.serve(range=False, content_length=False)
else:
assert False
class FakeLogger(object):
def debug(self, msg):
pass
def warning(self, msg):
pass
def error(self, msg):
pass
class TestHttpFD(unittest.TestCase):
def setUp(self):
self.httpd = compat_http_server.HTTPServer(
('127.0.0.1', 0), HTTPTestRequestHandler)
self.port = http_server_port(self.httpd)
self.server_thread = threading.Thread(target=self.httpd.serve_forever)
self.server_thread.daemon = True
self.server_thread.start()
def download(self, params, ep):
params['logger'] = FakeLogger()
ydl = YoutubeDL(params)
downloader = HttpFD(ydl, params)
filename = 'testfile.mp4'
try_rm(encodeFilename(filename))
self.assertTrue(downloader.real_download(filename, {
'url': 'http://127.0.0.1:%d/%s' % (self.port, ep),
}))
self.assertEqual(os.path.getsize(encodeFilename(filename)), TEST_SIZE)
try_rm(encodeFilename(filename))
def download_all(self, params):
for ep in ('regular', 'no-content-length', 'no-range', 'no-range-no-content-length'):
self.download(params, ep)
def test_regular(self):
self.download_all({})
def test_chunked(self):
self.download_all({
'http_chunk_size': 1000,
})
if __name__ == '__main__':
unittest.main()

View File

@ -47,7 +47,7 @@ class HTTPTestRequestHandler(compat_http_server.BaseHTTPRequestHandler):
self.end_headers()
return
new_url = 'http://127.0.0.1:%d/中文.html' % http_server_port(self.server)
new_url = 'http://localhost:%d/中文.html' % http_server_port(self.server)
self.send_response(302)
self.send_header(b'Location', new_url.encode('utf-8'))
self.end_headers()
@ -74,7 +74,7 @@ class FakeLogger(object):
class TestHTTP(unittest.TestCase):
def setUp(self):
self.httpd = compat_http_server.HTTPServer(
('127.0.0.1', 0), HTTPTestRequestHandler)
('localhost', 0), HTTPTestRequestHandler)
self.port = http_server_port(self.httpd)
self.server_thread = threading.Thread(target=self.httpd.serve_forever)
self.server_thread.daemon = True
@ -86,15 +86,15 @@ class TestHTTP(unittest.TestCase):
return
ydl = YoutubeDL({'logger': FakeLogger()})
r = ydl.extract_info('http://127.0.0.1:%d/302' % self.port)
self.assertEqual(r['entries'][0]['url'], 'http://127.0.0.1:%d/vid.mp4' % self.port)
r = ydl.extract_info('http://localhost:%d/302' % self.port)
self.assertEqual(r['entries'][0]['url'], 'http://localhost:%d/vid.mp4' % self.port)
class TestHTTPS(unittest.TestCase):
def setUp(self):
certfn = os.path.join(TEST_DIR, 'testcert.pem')
self.httpd = compat_http_server.HTTPServer(
('127.0.0.1', 0), HTTPTestRequestHandler)
('localhost', 0), HTTPTestRequestHandler)
self.httpd.socket = ssl.wrap_socket(
self.httpd.socket, certfile=certfn, server_side=True)
self.port = http_server_port(self.httpd)
@ -107,11 +107,11 @@ class TestHTTPS(unittest.TestCase):
ydl = YoutubeDL({'logger': FakeLogger()})
self.assertRaises(
Exception,
ydl.extract_info, 'https://127.0.0.1:%d/video.html' % self.port)
ydl.extract_info, 'https://localhost:%d/video.html' % self.port)
ydl = YoutubeDL({'logger': FakeLogger(), 'nocheckcertificate': True})
r = ydl.extract_info('https://127.0.0.1:%d/video.html' % self.port)
self.assertEqual(r['entries'][0]['url'], 'https://127.0.0.1:%d/vid.mp4' % self.port)
r = ydl.extract_info('https://localhost:%d/video.html' % self.port)
self.assertEqual(r['entries'][0]['url'], 'https://localhost:%d/vid.mp4' % self.port)
def _build_proxy_handler(name):
@ -132,23 +132,23 @@ def _build_proxy_handler(name):
class TestProxy(unittest.TestCase):
def setUp(self):
self.proxy = compat_http_server.HTTPServer(
('127.0.0.1', 0), _build_proxy_handler('normal'))
('localhost', 0), _build_proxy_handler('normal'))
self.port = http_server_port(self.proxy)
self.proxy_thread = threading.Thread(target=self.proxy.serve_forever)
self.proxy_thread.daemon = True
self.proxy_thread.start()
self.geo_proxy = compat_http_server.HTTPServer(
('127.0.0.1', 0), _build_proxy_handler('geo'))
('localhost', 0), _build_proxy_handler('geo'))
self.geo_port = http_server_port(self.geo_proxy)
self.geo_proxy_thread = threading.Thread(target=self.geo_proxy.serve_forever)
self.geo_proxy_thread.daemon = True
self.geo_proxy_thread.start()
def test_proxy(self):
geo_proxy = '127.0.0.1:{0}'.format(self.geo_port)
geo_proxy = 'localhost:{0}'.format(self.geo_port)
ydl = YoutubeDL({
'proxy': '127.0.0.1:{0}'.format(self.port),
'proxy': 'localhost:{0}'.format(self.port),
'geo_verification_proxy': geo_proxy,
})
url = 'http://foo.com/bar'
@ -162,7 +162,7 @@ class TestProxy(unittest.TestCase):
def test_proxy_with_idn(self):
ydl = YoutubeDL({
'proxy': '127.0.0.1:{0}'.format(self.port),
'proxy': 'localhost:{0}'.format(self.port),
})
url = 'http://中文.tw/'
response = ydl.urlopen(url).read().decode('utf-8')

View File

@ -53,12 +53,10 @@ from youtube_dl.utils import (
parse_filesize,
parse_count,
parse_iso8601,
parse_resolution,
pkcs1pad,
read_batch_urls,
sanitize_filename,
sanitize_path,
sanitize_url,
expand_path,
prepend_extension,
replace_extension,
@ -221,12 +219,6 @@ class TestUtil(unittest.TestCase):
self.assertEqual(sanitize_path('./abc'), 'abc')
self.assertEqual(sanitize_path('./../abc'), '..\\abc')
def test_sanitize_url(self):
self.assertEqual(sanitize_url('//foo.bar'), 'http://foo.bar')
self.assertEqual(sanitize_url('httpss://foo.bar'), 'https://foo.bar')
self.assertEqual(sanitize_url('rmtps://foo.bar'), 'rtmps://foo.bar')
self.assertEqual(sanitize_url('https://foo.bar'), 'https://foo.bar')
def test_expand_path(self):
def env(var):
return '%{0}%'.format(var) if sys.platform == 'win32' else '${0}'.format(var)
@ -352,7 +344,6 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('2017-03-30T17:52:41Q'), 1490896361)
self.assertEqual(unified_timestamp('Sep 11, 2013 | 5:49 AM'), 1378878540)
self.assertEqual(unified_timestamp('December 15, 2017 at 7:49 am'), 1513324140)
self.assertEqual(unified_timestamp('2018-03-14T08:32:43.1493874+00:00'), 1521016363)
def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@ -823,9 +814,6 @@ class TestUtil(unittest.TestCase):
inp = '''{"duration": "00:01:07"}'''
self.assertEqual(js_to_json(inp), '''{"duration": "00:01:07"}''')
inp = '''{segments: [{"offset":-3.885780586188048e-16,"duration":39.75000000000001}]}'''
self.assertEqual(js_to_json(inp), '''{"segments": [{"offset":-3.885780586188048e-16,"duration":39.75000000000001}]}''')
def test_js_to_json_edgecases(self):
on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
@ -897,13 +885,6 @@ class TestUtil(unittest.TestCase):
on = js_to_json('{/*comment\n*/42/*comment\n*/:/*comment\n*/42/*comment\n*/}')
self.assertEqual(json.loads(on), {'42': 42})
on = js_to_json('{42:4.2e1}')
self.assertEqual(json.loads(on), {'42': 42.0})
def test_js_to_json_malformed(self):
self.assertEqual(js_to_json('42a1'), '42"a1"')
self.assertEqual(js_to_json('42a-1'), '42"a"-1')
def test_extract_attributes(self):
self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
@ -984,16 +965,6 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_count('1.1kk '), 1100000)
self.assertEqual(parse_count('1.1kk views'), 1100000)
def test_parse_resolution(self):
self.assertEqual(parse_resolution(None), {})
self.assertEqual(parse_resolution(''), {})
self.assertEqual(parse_resolution('1920x1080'), {'width': 1920, 'height': 1080})
self.assertEqual(parse_resolution('1920×1080'), {'width': 1920, 'height': 1080})
self.assertEqual(parse_resolution('1920 x 1080'), {'width': 1920, 'height': 1080})
self.assertEqual(parse_resolution('720p'), {'height': 720})
self.assertEqual(parse_resolution('4k'), {'height': 2160})
self.assertEqual(parse_resolution('8K'), {'height': 4320})
def test_version_tuple(self):
self.assertEqual(version_tuple('1'), (1,))
self.assertEqual(version_tuple('10.23.344'), (10, 23, 344))

View File

@ -1,34 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<playlist version="1" xmlns="http://xspf.org/ns/0/">
<date>2018-03-09T18:01:43Z</date>
<trackList>
<track>
<location>cd1/track%201.mp3</location>
<title>Pandemonium</title>
<creator>Foilverb</creator>
<annotation>Visit http://bigbrother404.bandcamp.com</annotation>
<album>Pandemonium EP</album>
<trackNum>1</trackNum>
<duration>202416</duration>
</track>
<track>
<location>../%E3%83%88%E3%83%A9%E3%83%83%E3%82%AF%E3%80%80%EF%BC%92.mp3</location>
<title>Final Cartridge (Nichico Twelve Remix)</title>
<annotation>Visit http://bigbrother404.bandcamp.com</annotation>
<creator>Foilverb</creator>
<album>Pandemonium EP</album>
<trackNum>2</trackNum>
<duration>255857</duration>
</track>
<track>
<location>track3.mp3</location>
<location>https://example.com/track3.mp3</location>
<title>Rebuilding Nightingale</title>
<annotation>Visit http://bigbrother404.bandcamp.com</annotation>
<creator>Foilverb</creator>
<album>Pandemonium EP</album>
<trackNum>3</trackNum>
<duration>287915</duration>
</track>
</trackList>
</playlist>

View File

@ -298,8 +298,7 @@ class YoutubeDL(object):
the downloader (see youtube_dl/downloader/common.py):
nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
noresizebuffer, retries, continuedl, noprogress, consoletitle,
xattr_set_filesize, external_downloader_args, hls_use_mpegts,
http_chunk_size.
xattr_set_filesize, external_downloader_args, hls_use_mpegts.
The following options are used by the post processors:
prefer_ffmpeg: If True, use ffmpeg instead of avconv if both are available,
@ -1033,7 +1032,7 @@ class YoutubeDL(object):
'!=': operator.ne,
}
operator_rex = re.compile(r'''(?x)\s*
(?P<key>width|height|tbr|abr|vbr|asr|filesize|filesize_approx|fps)
(?P<key>width|height|tbr|abr|vbr|asr|filesize|fps)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?P<value>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)
$

View File

@ -191,11 +191,6 @@ def _real_main(argv=None):
if numeric_buffersize is None:
parser.error('invalid buffer size specified')
opts.buffersize = numeric_buffersize
if opts.http_chunk_size is not None:
numeric_chunksize = FileDownloader.parse_bytes(opts.http_chunk_size)
if not numeric_chunksize:
parser.error('invalid http chunk size specified')
opts.http_chunk_size = numeric_chunksize
if opts.playliststart <= 0:
raise ValueError('Playlist start must be positive')
if opts.playlistend not in (-1, None) and opts.playlistend < opts.playliststart:
@ -351,7 +346,6 @@ def _real_main(argv=None):
'keep_fragments': opts.keep_fragments,
'buffersize': opts.buffersize,
'noresizebuffer': opts.noresizebuffer,
'http_chunk_size': opts.http_chunk_size,
'continuedl': opts.continue_dl,
'noprogress': opts.noprogress,
'progress_with_newline': opts.progress_with_newline,

View File

@ -1,8 +1,8 @@
from __future__ import unicode_literals
import base64
from math import ceil
from .compat import compat_b64decode
from .utils import bytes_to_intlist, intlist_to_bytes
BLOCK_SIZE_BYTES = 16
@ -180,7 +180,7 @@ def aes_decrypt_text(data, password, key_size_bytes):
"""
NONCE_LENGTH_BYTES = 8
data = bytes_to_intlist(compat_b64decode(data))
data = bytes_to_intlist(base64.b64decode(data.encode('utf-8')))
password = bytes_to_intlist(password.encode('utf-8'))
key = password[:key_size_bytes] + [0] * (key_size_bytes - len(password))

View File

@ -1,7 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import binascii
import collections
import ctypes
@ -2897,24 +2896,9 @@ except TypeError:
if isinstance(spec, compat_str):
spec = spec.encode('ascii')
return struct.unpack(spec, *args)
class compat_Struct(struct.Struct):
def __init__(self, fmt):
if isinstance(fmt, compat_str):
fmt = fmt.encode('ascii')
super(compat_Struct, self).__init__(fmt)
else:
compat_struct_pack = struct.pack
compat_struct_unpack = struct.unpack
if platform.python_implementation() == 'IronPython' and sys.version_info < (2, 7, 8):
class compat_Struct(struct.Struct):
def unpack(self, string):
if not isinstance(string, buffer): # noqa: F821
string = buffer(string) # noqa: F821
return super(compat_Struct, self).unpack(string)
else:
compat_Struct = struct.Struct
try:
from future_builtins import zip as compat_zip
@ -2924,16 +2908,6 @@ except ImportError: # not 2.6+ or is 3.x
except ImportError:
compat_zip = zip
if sys.version_info < (3, 3):
def compat_b64decode(s, *args, **kwargs):
if isinstance(s, compat_str):
s = s.encode('ascii')
return base64.b64decode(s, *args, **kwargs)
else:
compat_b64decode = base64.b64decode
if platform.python_implementation() == 'PyPy' and sys.pypy_version_info < (5, 4, 0):
# PyPy2 prior to version 5.4.0 expects byte strings as Windows function
# names, see the original PyPy issue [1] and the youtube-dl one [2].
@ -2956,8 +2930,6 @@ __all__ = [
'compat_HTMLParseError',
'compat_HTMLParser',
'compat_HTTPError',
'compat_Struct',
'compat_b64decode',
'compat_basestring',
'compat_chr',
'compat_cookiejar',

View File

@ -49,9 +49,6 @@ class FileDownloader(object):
external_downloader_args: A list of additional command-line arguments for the
external downloader.
hls_use_mpegts: Use the mpegts container for HLS videos.
http_chunk_size: Size of a chunk for chunk-based HTTP downloading. May be
useful for bypassing bandwidth throttling imposed by
a webserver (experimental)
Subclasses of this one must re-define the real_download method.
"""
@ -249,13 +246,12 @@ class FileDownloader(object):
if self.params.get('noprogress', False):
self.to_screen('[download] Download completed')
else:
msg_template = '100%%'
if s.get('total_bytes') is not None:
s['_total_bytes_str'] = format_bytes(s['total_bytes'])
msg_template += ' of %(_total_bytes_str)s'
if s.get('elapsed') is not None:
s['_elapsed_str'] = self.format_seconds(s['elapsed'])
msg_template += ' in %(_elapsed_str)s'
msg_template = '100%% of %(_total_bytes_str)s in %(_elapsed_str)s'
else:
msg_template = '100%% of %(_total_bytes_str)s'
self._report_progress_status(
msg_template % s, is_last_line=True)

View File

@ -1,10 +1,9 @@
from __future__ import unicode_literals
import os.path
import re
import subprocess
import sys
import time
import re
from .common import FileDownloader
from ..compat import (
@ -31,7 +30,6 @@ class ExternalFD(FileDownloader):
tmpfilename = self.temp_name(filename)
try:
started = time.time()
retval = self._call_downloader(tmpfilename, info_dict)
except KeyboardInterrupt:
if not info_dict.get('is_live'):
@ -43,20 +41,15 @@ class ExternalFD(FileDownloader):
self.to_screen('[%s] Interrupted by user' % self.get_basename())
if retval == 0:
status = {
'filename': filename,
'status': 'finished',
'elapsed': time.time() - started,
}
if filename != '-':
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen('\r[%s] Downloaded %s bytes' % (self.get_basename(), fsize))
self.try_rename(tmpfilename, filename)
status.update({
self._hook_progress({
'downloaded_bytes': fsize,
'total_bytes': fsize,
'filename': filename,
'status': 'finished',
})
self._hook_progress(status)
return True
else:
self.to_stderr('\n')

View File

@ -1,12 +1,12 @@
from __future__ import division, unicode_literals
import base64
import io
import itertools
import time
from .fragment import FragmentFD
from ..compat import (
compat_b64decode,
compat_etree_fromstring,
compat_urlparse,
compat_urllib_error,
@ -312,7 +312,7 @@ class F4mFD(FragmentFD):
boot_info = self._get_bootstrap_from_url(bootstrap_url)
else:
bootstrap_url = None
bootstrap = compat_b64decode(node.text)
bootstrap = base64.b64decode(node.text.encode('ascii'))
boot_info = read_bootstrap_info(bootstrap)
return boot_info, bootstrap_url
@ -349,7 +349,7 @@ class F4mFD(FragmentFD):
live = boot_info['live']
metadata_node = media.find(_add_ns('metadata'))
if metadata_node is not None:
metadata = compat_b64decode(metadata_node.text)
metadata = base64.b64decode(metadata_node.text.encode('ascii'))
else:
metadata = None

View File

@ -241,16 +241,12 @@ class FragmentFD(FileDownloader):
if os.path.isfile(ytdl_filename):
os.remove(ytdl_filename)
elapsed = time.time() - ctx['started']
if ctx['tmpfilename'] == '-':
downloaded_bytes = ctx['complete_frags_downloaded_bytes']
else:
self.try_rename(ctx['tmpfilename'], ctx['filename'])
downloaded_bytes = os.path.getsize(encodeFilename(ctx['filename']))
fsize = os.path.getsize(encodeFilename(ctx['filename']))
self._hook_progress({
'downloaded_bytes': downloaded_bytes,
'total_bytes': downloaded_bytes,
'downloaded_bytes': fsize,
'total_bytes': fsize,
'filename': ctx['filename'],
'status': 'finished',
'elapsed': elapsed,

View File

@ -75,9 +75,8 @@ class HlsFD(FragmentFD):
fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict)
def is_ad_fragment(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s or
s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad'))
def anvato_ad(s):
return s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s
media_frags = 0
ad_frags = 0
@ -87,7 +86,7 @@ class HlsFD(FragmentFD):
if not line:
continue
if line.startswith('#'):
if is_ad_fragment(line):
if anvato_ad(line):
ad_frags += 1
ad_frag_next = True
continue
@ -196,7 +195,7 @@ class HlsFD(FragmentFD):
'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]),
}
elif is_ad_fragment(line):
elif anvato_ad(line):
ad_frag_next = True
self._finish_frag_download(ctx)

View File

@ -4,18 +4,13 @@ import errno
import os
import socket
import time
import random
import re
from .common import FileDownloader
from ..compat import (
compat_str,
compat_urllib_error,
)
from ..compat import compat_urllib_error
from ..utils import (
ContentTooShortError,
encodeFilename,
int_or_none,
sanitize_open,
sanitized_Request,
write_xattr,
@ -43,26 +38,21 @@ class HttpFD(FileDownloader):
add_headers = info_dict.get('http_headers')
if add_headers:
headers.update(add_headers)
basic_request = sanitized_Request(url, None, headers)
request = sanitized_Request(url, None, headers)
is_test = self.params.get('test', False)
chunk_size = self._TEST_FILE_SIZE if is_test else (
info_dict.get('downloader_options', {}).get('http_chunk_size') or
self.params.get('http_chunk_size') or 0)
if is_test:
request.add_header('Range', 'bytes=0-%s' % str(self._TEST_FILE_SIZE - 1))
ctx.open_mode = 'wb'
ctx.resume_len = 0
ctx.data_len = None
ctx.block_size = self.params.get('buffersize', 1024)
ctx.start_time = time.time()
ctx.chunk_size = None
if self.params.get('continuedl', True):
# Establish possible resume length
if os.path.isfile(encodeFilename(ctx.tmpfilename)):
ctx.resume_len = os.path.getsize(
encodeFilename(ctx.tmpfilename))
ctx.is_resume = ctx.resume_len > 0
ctx.resume_len = os.path.getsize(encodeFilename(ctx.tmpfilename))
count = 0
retries = self.params.get('retries', 0)
@ -74,36 +64,11 @@ class HttpFD(FileDownloader):
def __init__(self, source_error):
self.source_error = source_error
class NextFragment(Exception):
pass
def set_range(req, start, end):
range_header = 'bytes=%d-' % start
if end:
range_header += compat_str(end)
req.add_header('Range', range_header)
def establish_connection():
ctx.chunk_size = (random.randint(int(chunk_size * 0.95), chunk_size)
if not is_test and chunk_size else chunk_size)
if ctx.resume_len > 0:
range_start = ctx.resume_len
if ctx.is_resume:
if ctx.resume_len != 0:
self.report_resuming_byte(ctx.resume_len)
request.add_header('Range', 'bytes=%d-' % ctx.resume_len)
ctx.open_mode = 'ab'
elif ctx.chunk_size > 0:
range_start = 0
else:
range_start = None
ctx.is_resume = False
range_end = range_start + ctx.chunk_size - 1 if ctx.chunk_size else None
if range_end and ctx.data_len is not None and range_end >= ctx.data_len:
range_end = ctx.data_len - 1
has_range = range_start is not None
ctx.has_range = has_range
request = sanitized_Request(url, None, headers)
if has_range:
set_range(request, range_start, range_end)
# Establish connection
try:
ctx.data = self.ydl.urlopen(request)
@ -112,24 +77,12 @@ class HttpFD(FileDownloader):
# that don't support resuming and serve a whole file with no Content-Range
# set in response despite of requested Range (see
# https://github.com/rg3/youtube-dl/issues/6057#issuecomment-126129799)
if has_range:
if ctx.resume_len > 0:
content_range = ctx.data.headers.get('Content-Range')
if content_range:
content_range_m = re.search(r'bytes (\d+)-(\d+)?(?:/(\d+))?', content_range)
content_range_m = re.search(r'bytes (\d+)-', content_range)
# Content-Range is present and matches requested Range, resume is possible
if content_range_m:
if range_start == int(content_range_m.group(1)):
content_range_end = int_or_none(content_range_m.group(2))
content_len = int_or_none(content_range_m.group(3))
accept_content_len = (
# Non-chunked download
not ctx.chunk_size or
# Chunked download and requested piece or
# its part is promised to be served
content_range_end == range_end or
content_len < range_end)
if accept_content_len:
ctx.data_len = content_len
if content_range_m and ctx.resume_len == int(content_range_m.group(1)):
return
# Content-Range is either not present or invalid. Assuming remote webserver is
# trying to send the whole file, resume is not possible, so wiping the local file
@ -137,15 +90,16 @@ class HttpFD(FileDownloader):
self.report_unable_to_resume()
ctx.resume_len = 0
ctx.open_mode = 'wb'
ctx.data_len = int_or_none(ctx.data.info().get('Content-length', None))
return
except (compat_urllib_error.HTTPError, ) as err:
if err.code == 416:
if (err.code < 500 or err.code >= 600) and err.code != 416:
# Unexpected HTTP error
raise
elif err.code == 416:
# Unable to resume (requested range not satisfiable)
try:
# Open the connection again without the range header
ctx.data = self.ydl.urlopen(
sanitized_Request(url, None, headers))
ctx.data = self.ydl.urlopen(basic_request)
content_length = ctx.data.info()['Content-Length']
except (compat_urllib_error.HTTPError, ) as err:
if err.code < 500 or err.code >= 600:
@ -176,9 +130,6 @@ class HttpFD(FileDownloader):
ctx.resume_len = 0
ctx.open_mode = 'wb'
return
elif err.code < 500 or err.code >= 600:
# Unexpected HTTP error
raise
raise RetryDownload(err)
except socket.error as err:
if err.errno != errno.ECONNRESET:
@ -209,7 +160,7 @@ class HttpFD(FileDownloader):
return False
byte_counter = 0 + ctx.resume_len
block_size = ctx.block_size
block_size = self.params.get('buffersize', 1024)
start = time.time()
# measure time over whole while-loop, so slow_down() and best_block_size() work together properly
@ -282,30 +233,25 @@ class HttpFD(FileDownloader):
# Progress message
speed = self.calc_speed(start, now, byte_counter - ctx.resume_len)
if ctx.data_len is None:
if data_len is None:
eta = None
else:
eta = self.calc_eta(start, time.time(), ctx.data_len - ctx.resume_len, byte_counter - ctx.resume_len)
eta = self.calc_eta(start, time.time(), data_len - ctx.resume_len, byte_counter - ctx.resume_len)
self._hook_progress({
'status': 'downloading',
'downloaded_bytes': byte_counter,
'total_bytes': ctx.data_len,
'total_bytes': data_len,
'tmpfilename': ctx.tmpfilename,
'filename': ctx.filename,
'eta': eta,
'speed': speed,
'elapsed': now - ctx.start_time,
'elapsed': now - start,
})
if is_test and byte_counter == data_len:
break
if not is_test and ctx.chunk_size and ctx.data_len is not None and byte_counter < ctx.data_len:
ctx.resume_len = byte_counter
# ctx.block_size = block_size
raise NextFragment()
if ctx.stream is None:
self.to_stderr('\n')
self.report_error('Did not get any data blocks')
@ -330,7 +276,7 @@ class HttpFD(FileDownloader):
'total_bytes': byte_counter,
'filename': ctx.filename,
'status': 'finished',
'elapsed': time.time() - ctx.start_time,
'elapsed': time.time() - start,
})
return True
@ -344,8 +290,6 @@ class HttpFD(FileDownloader):
if count <= retries:
self.report_retry(e.source_error, count, retries)
continue
except NextFragment:
continue
except SucceedDownload:
return True

View File

@ -1,27 +1,25 @@
from __future__ import unicode_literals
import time
import struct
import binascii
import io
from .fragment import FragmentFD
from ..compat import (
compat_Struct,
compat_urllib_error,
)
from ..compat import compat_urllib_error
u8 = compat_Struct('>B')
u88 = compat_Struct('>Bx')
u16 = compat_Struct('>H')
u1616 = compat_Struct('>Hxx')
u32 = compat_Struct('>I')
u64 = compat_Struct('>Q')
u8 = struct.Struct(b'>B')
u88 = struct.Struct(b'>Bx')
u16 = struct.Struct(b'>H')
u1616 = struct.Struct(b'>Hxx')
u32 = struct.Struct(b'>I')
u64 = struct.Struct(b'>Q')
s88 = compat_Struct('>bx')
s16 = compat_Struct('>h')
s1616 = compat_Struct('>hxx')
s32 = compat_Struct('>i')
s88 = struct.Struct(b'>bx')
s16 = struct.Struct(b'>h')
s1616 = struct.Struct(b'>hxx')
s32 = struct.Struct(b'>i')
unity_matrix = (s32.pack(0x10000) + s32.pack(0) * 3) * 2 + s32.pack(0x40000000)
@ -141,7 +139,7 @@ def write_piff_header(stream, params):
sample_entry_payload += u16.pack(0x18) # depth
sample_entry_payload += s16.pack(-1) # pre defined
codec_private_data = binascii.unhexlify(params['codec_private_data'].encode('utf-8'))
codec_private_data = binascii.unhexlify(params['codec_private_data'])
if fourcc in ('H264', 'AVC1'):
sps, pps = codec_private_data.split(u32.pack(1))[1:]
avcc_payload = u8.pack(1) # configuration version

View File

@ -66,7 +66,7 @@ class AbcNewsIE(InfoExtractor):
_TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',
'info_dict': {
'id': '10505354',
'id': '10498713',
'ext': 'flv',
'display_id': 'dramatic-video-rare-death-job-america',
'title': 'Occupational Hazards',
@ -79,7 +79,7 @@ class AbcNewsIE(InfoExtractor):
}, {
'url': 'http://abcnews.go.com/Entertainment/justin-timberlake-performs-stop-feeling-eurovision-2016/story?id=39125818',
'info_dict': {
'id': '38897857',
'id': '39125818',
'ext': 'mp4',
'display_id': 'justin-timberlake-performs-stop-feeling-eurovision-2016',
'title': 'Justin Timberlake Drops Hints For Secret Single',

View File

@ -1,15 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import json
import os
from .common import InfoExtractor
from ..aes import aes_cbc_decrypt
from ..compat import (
compat_b64decode,
compat_ord,
)
from ..compat import compat_ord
from ..utils import (
bytes_to_intlist,
ExtractorError,
@ -50,9 +48,9 @@ class ADNIE(InfoExtractor):
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
bytes_to_intlist(b'\xc8\x6e\x06\xbc\xbe\xc6\x49\xf5\x88\x0d\xc8\x47\xc4\x27\x0c\x60'),
bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
bytes_to_intlist(base64.b64decode(enc_subtitles[24:])),
bytes_to_intlist(b'\x1b\xe0\x29\x61\x38\x94\x24\x00\x12\xbd\xc5\x80\xac\xce\xbe\xb0'),
bytes_to_intlist(base64.b64decode(enc_subtitles[:24]))
))
subtitles_json = self._parse_json(
dec_subtitles[:-compat_ord(dec_subtitles[-1])].decode(),
@ -107,18 +105,15 @@ class ADNIE(InfoExtractor):
options = player_config.get('options') or {}
metas = options.get('metas') or {}
title = metas.get('title') or video_info['title']
links = player_config.get('links') or {}
sub_path = player_config.get('subtitles')
error = None
if not links:
links_url = player_config.get('linksurl') or options['videoUrl']
links_url = player_config['linksurl']
links_data = self._download_json(urljoin(
self._BASE_URL, links_url), video_id)
links = links_data.get('links') or {}
metas = metas or links_data.get('meta') or {}
sub_path = sub_path or links_data.get('subtitles')
error = links_data.get('error')
title = metas.get('title') or video_info['title']
formats = []
for format_id, qualities in links.items():
@ -149,7 +144,7 @@ class ADNIE(InfoExtractor):
'description': strip_or_none(metas.get('summary') or video_info.get('resume')),
'thumbnail': video_info.get('image'),
'formats': formats,
'subtitles': self.extract_subtitles(sub_path, video_id),
'subtitles': self.extract_subtitles(player_config.get('subtitles'), video_id),
'episode': metas.get('subtitle') or video_info.get('videoTitle'),
'series': video_info.get('playlistTitle'),
}

View File

@ -122,8 +122,7 @@ class AENetworksIE(AENetworksBaseIE):
query = {
'mbr': 'true',
'assetTypes': 'high_video_ak',
'switch': 'hls_high_ak',
'assetTypes': 'high_video_s3'
}
video_id = self._html_search_meta('aetn:VideoID', webpage)
media_url = self._search_regex(

View File

@ -175,27 +175,10 @@ class AfreecaTVIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
if re.search(r'alert\(["\']This video has been deleted', webpage):
raise ExtractorError(
'Video %s has been deleted' % video_id, expected=True)
station_id = self._search_regex(
r'nStationNo\s*=\s*(\d+)', webpage, 'station')
bbs_id = self._search_regex(
r'nBbsNo\s*=\s*(\d+)', webpage, 'bbs')
video_id = self._search_regex(
r'nTitleNo\s*=\s*(\d+)', webpage, 'title', default=video_id)
print(video_id, station_id, bbs_id)
video_xml = self._download_xml(
'http://afbbs.afreecatv.com:8080/api/video/get_video_info.php',
video_id, headers={
'Referer': url,
}, query={
video_id, query={
'nTitleNo': video_id,
'nStationNo': station_id,
'nBbsNo': bbs_id,
'partialView': 'SKIP_ADULT',
})
@ -204,10 +187,10 @@ class AfreecaTVIE(InfoExtractor):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, flag), expected=True)
video_element = video_xml.findall(compat_xpath('./track/video'))[-1]
video_element = video_xml.findall(compat_xpath('./track/video'))[1]
if video_element is None or video_element.text is None:
raise ExtractorError(
'Video %s video does not exist' % video_id, expected=True)
raise ExtractorError('Specified AfreecaTV video does not exist',
expected=True)
video_url = video_element.text.strip()

View File

@ -11,7 +11,7 @@ from ..utils import (
class AMCNetworksIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?:movies|shows(?:/[^/]+)+)/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies|shows(?:/[^/]+)+)/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
'md5': '',
@ -51,9 +51,6 @@ class AMCNetworksIE(ThePlatformIE):
}, {
'url': 'http://www.wetv.com/shows/la-hair/videos/season-05/episode-09-episode-9-2/episode-9-sneak-peek-3',
'only_matching': True,
}, {
'url': 'https://www.sundancetv.com/shows/riviera/full-episodes/season-1/episode-01-episode-1',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -41,7 +41,7 @@ class ArchiveOrgIE(InfoExtractor):
webpage = self._download_webpage(
'http://archive.org/embed/' + video_id, video_id)
jwplayer_playlist = self._parse_json(self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\)",
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\);",
webpage, 'jwplayer playlist'), video_id)
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)

View File

@ -24,30 +24,57 @@ class ARDMediathekIE(InfoExtractor):
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{
# available till 26.07.2022
'url': 'http://www.ardmediathek.de/tv/S%C3%9CDLICHT/Was-ist-die-Kunst-der-Zukunft-liebe-Ann/BR-Fernsehen/Video?bcastId=34633636&documentId=44726822',
'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114',
'info_dict': {
'id': '44726822',
'id': '29582122',
'ext': 'mp4',
'title': 'Was ist die Kunst der Zukunft, liebe Anna McCarthy?',
'description': 'md5:4ada28b3e3b5df01647310e41f3a62f5',
'duration': 1740,
'title': 'Ich liebe das Leben trotzdem',
'description': 'md5:45e4c225c72b27993314b31a84a5261c',
'duration': 4557,
},
'params': {
# m3u8 download
'skip_download': True,
}
},
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
'info_dict': {
'id': '29522730',
'ext': 'mp4',
'title': 'Tatort: Scheinwelten - Hörfassung (Video tgl. ab 20 Uhr)',
'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
'duration': 5252,
},
'skip': 'HTTP Error 404: Not Found',
}, {
# audio
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
'only_matching': True,
'md5': '219d94d8980b4f538c7fcb0865eb7f2c',
'info_dict': {
'id': '28488308',
'ext': 'mp3',
'title': 'Tod eines Fußballers',
'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
'duration': 3240,
},
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
'only_matching': True,
}, {
# audio
'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
'only_matching': True,
'md5': '4e8f00631aac0395fee17368ac0e9867',
'info_dict': {
'id': '30796318',
'ext': 'mp3',
'title': 'Vor dem Fest',
'description': 'md5:c0c1c8048514deaed2a73b3a60eecacb',
'duration': 3287,
},
'skip': 'Video is no longer available',
}]
def _extract_media_info(self, media_info_url, webpage, video_id):
@ -225,23 +252,20 @@ class ARDMediathekIE(InfoExtractor):
class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TESTS = [{
# available till 14.02.2019
'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
'md5': '8e4ec85f31be7c7fc08a26cdbc5a1f49',
_TEST = {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'md5': 'd216c3a86493f9322545e045ddc3eb35',
'info_dict': {
'display_id': 'das-groko-drama-zerlegen-sich-die-volksparteien-video',
'id': '102',
'display_id': 'die-story-im-ersten-mission-unter-falscher-flagge',
'id': '100',
'ext': 'mp4',
'duration': 4435.0,
'title': 'Das GroKo-Drama: Zerlegen sich die Volksparteien?',
'upload_date': '20180214',
'duration': 2600,
'title': 'Die Story im Ersten: Mission unter falscher Flagge',
'upload_date': '20140804',
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True,
}]
'skip': 'HTTP Error 404: Not Found',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

View File

@ -1,13 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import re
from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_urllib_parse_unquote,
)
from ..compat import compat_urllib_parse_unquote
class BigflixIE(InfoExtractor):
@ -41,8 +39,8 @@ class BigflixIE(InfoExtractor):
webpage, 'title')
def decode_url(quoted_b64_url):
return compat_b64decode(compat_urllib_parse_unquote(
quoted_b64_url)).decode('utf-8')
return base64.b64decode(compat_urllib_parse_unquote(
quoted_b64_url).encode('ascii')).decode('utf-8')
formats = []
for height, encoded_url in re.findall(

View File

@ -27,14 +27,14 @@ class BiliBiliIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
'md5': '5f7d29e1a2872f3df0cf76b1f87d3788',
'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
'info_dict': {
'id': '1074402',
'ext': 'flv',
'ext': 'mp4',
'title': '【金坷垃】金泡沫',
'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
'duration': 308.067,
'timestamp': 1398012678,
'duration': 308.315,
'timestamp': 1398012660,
'upload_date': '20140420',
'thumbnail': r're:^https?://.+\.jpg',
'uploader': '菊子桑',
@ -59,38 +59,17 @@ class BiliBiliIE(InfoExtractor):
'url': 'http://www.bilibili.com/video/av8903802/',
'info_dict': {
'id': '8903802',
'ext': 'mp4',
'title': '阿滴英文|英文歌分享#6 "Closer',
'description': '滴妹今天唱Closer給你聽! 有史以来,被推最多次也是最久的歌曲,其实歌词跟我原本想像差蛮多的,不过还是好听! 微博@阿滴英文',
},
'playlist': [{
'info_dict': {
'id': '8903802_part1',
'ext': 'flv',
'title': '阿滴英文|英文歌分享#6 "Closer',
'description': 'md5:3b1b9e25b78da4ef87e9b548b88ee76a',
'uploader': '阿滴英文',
'uploader_id': '65880958',
'timestamp': 1488382634,
'timestamp': 1488382620,
'upload_date': '20170301',
},
'params': {
'skip_download': True, # Test metadata only
},
}, {
'info_dict': {
'id': '8903802_part2',
'ext': 'flv',
'title': '阿滴英文|英文歌分享#6 "Closer',
'description': 'md5:3b1b9e25b78da4ef87e9b548b88ee76a',
'uploader': '阿滴英文',
'uploader_id': '65880958',
'timestamp': 1488382634,
'upload_date': '20170301',
},
'params': {
'skip_download': True, # Test metadata only
},
}]
}]
_APP_KEY = '84956560bc028eb7'
@ -113,13 +92,9 @@ class BiliBiliIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
if 'anime/' not in url:
cid = self._search_regex(
r'cid(?:["\']:|=)(\d+)', webpage, 'cid',
default=None
) or compat_parse_qs(self._search_regex(
[r'1EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
r'1EmbedPlayer\([^)]+,\s*\\"([^"]+)\\"\)',
r'1<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
cid = compat_parse_qs(self._search_regex(
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
webpage, 'player parameters'))['cid'][0]
else:
if 'no_bangumi_tip' not in smuggled_data:
@ -127,7 +102,6 @@ class BiliBiliIE(InfoExtractor):
video_id, anime_id, compat_urlparse.urljoin(url, '//bangumi.bilibili.com/anime/%s' % anime_id)))
headers = {
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Referer': url
}
headers.update(self.geo_verification_headers())
@ -139,31 +113,19 @@ class BiliBiliIE(InfoExtractor):
self._report_error(js)
cid = js['result']['cid']
headers = {
'Referer': url
}
headers.update(self.geo_verification_headers())
entries = []
RENDITIONS = ('qn=80&quality=80&type=', 'quality=2&type=mp4')
for num, rendition in enumerate(RENDITIONS, start=1):
payload = 'appkey=%s&cid=%s&otype=json&%s' % (self._APP_KEY, cid, rendition)
payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
video_info = self._download_json(
'http://interface.bilibili.com/v2/playurl?%s&sign=%s' % (payload, sign),
'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
video_id, note='Downloading video info page',
headers=headers, fatal=num == len(RENDITIONS))
if not video_info:
continue
headers=self.geo_verification_headers())
if 'durl' not in video_info:
if num < len(RENDITIONS):
continue
self._report_error(video_info)
entries = []
for idx, durl in enumerate(video_info['durl']):
formats = [{
'url': durl['url'],
@ -188,17 +150,11 @@ class BiliBiliIE(InfoExtractor):
'duration': float_or_none(durl.get('length'), 1000),
'formats': formats,
})
break
title = self._html_search_regex(
('<h1[^>]+\btitle=(["\'])(?P<title>(?:(?!\1).)+)\1',
'(?s)<h1[^>]*>(?P<title>.+?)</h1>'), webpage, 'title',
group='title')
title = self._html_search_regex('<h1[^>]*>([^<]+)</h1>', webpage, 'title')
description = self._html_search_meta('description', webpage)
timestamp = unified_timestamp(self._html_search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time',
default=None) or self._html_search_meta(
'uploadDate', webpage, 'timestamp', default=None))
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', default=None))
thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage)
# TODO 'view_count' requires deobfuscating Javascript
@ -212,16 +168,13 @@ class BiliBiliIE(InfoExtractor):
}
uploader_mobj = re.search(
r'<a[^>]+href="(?:https?:)?//space\.bilibili\.com/(?P<id>\d+)"[^>]*>(?P<name>[^<]+)',
r'<a[^>]+href="(?:https?:)?//space\.bilibili\.com/(?P<id>\d+)"[^>]+title="(?P<name>[^"]+)"',
webpage)
if uploader_mobj:
info.update({
'uploader': uploader_mobj.group('name'),
'uploader_id': uploader_mobj.group('id'),
})
if not info.get('uploader'):
info['uploader'] = self._html_search_meta(
'author', webpage, 'uploader', default=None)
for entry in entries:
entry.update(info)

View File

@ -564,7 +564,7 @@ class BrightcoveNewIE(AdobePassIE):
return entries
def _parse_brightcove_metadata(self, json_data, video_id, headers={}):
def _parse_brightcove_metadata(self, json_data, video_id):
title = json_data['name'].strip()
formats = []
@ -638,9 +638,6 @@ class BrightcoveNewIE(AdobePassIE):
self._sort_formats(formats)
for f in formats:
f.setdefault('http_headers', {}).update(headers)
subtitles = {}
for text_track in json_data.get('text_tracks', []):
if text_track.get('src'):
@ -693,17 +690,10 @@ class BrightcoveNewIE(AdobePassIE):
webpage, 'policy key', group='pk')
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s' % (account_id, video_id)
headers = {
'Accept': 'application/json;pk=%s' % policy_key,
}
referrer = smuggled_data.get('referrer')
if referrer:
headers.update({
'Referer': referrer,
'Origin': re.search(r'https?://[^/]+', referrer).group(0),
})
try:
json_data = self._download_json(api_url, video_id, headers=headers)
json_data = self._download_json(api_url, video_id, headers={
'Accept': 'application/json;pk=%s' % policy_key
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
@ -727,5 +717,4 @@ class BrightcoveNewIE(AdobePassIE):
'tveToken': tve_token,
})
return self._parse_brightcove_metadata(
json_data, video_id, headers=headers)
return self._parse_brightcove_metadata(json_data, video_id)

View File

@ -31,10 +31,6 @@ class Canalc2IE(InfoExtractor):
webpage = self._download_webpage(
'http://www.canalc2.tv/video/%s' % video_id, video_id)
title = self._html_search_regex(
r'(?s)class="[^"]*col_description[^"]*">.*?<h3>(.+?)</h3>',
webpage, 'title')
formats = []
for _, video_url in re.findall(r'file\s*=\s*(["\'])(.+?)\1', webpage):
if video_url.startswith('rtmp://'):
@ -53,21 +49,17 @@ class Canalc2IE(InfoExtractor):
'url': video_url,
'format_id': 'http',
})
self._sort_formats(formats)
if formats:
info = {
'formats': formats,
}
else:
info = self._parse_html5_media_entries(url, webpage, url)[0]
title = self._html_search_regex(
r'(?s)class="[^"]*col_description[^"]*">.*?<h3>(.*?)</h3>', webpage, 'title')
duration = parse_duration(self._search_regex(
r'id=["\']video_duree["\'][^>]*>([^<]+)',
webpage, 'duration', fatal=False))
self._sort_formats(info['formats'])
info.update({
return {
'id': video_id,
'title': title,
'duration': parse_duration(self._search_regex(
r'id=["\']video_duree["\'][^>]*>([^<]+)',
webpage, 'duration', fatal=False)),
})
return info
'duration': duration,
'formats': formats,
}

View File

@ -4,36 +4,59 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse
from ..utils import (
dict_get,
# ExtractorError,
# HEADRequest,
int_or_none,
qualities,
remove_end,
unified_strdate,
)
class CanalplusIE(InfoExtractor):
IE_DESC = 'mycanal.fr and piwiplus.fr'
_VALID_URL = r'https?://(?:www\.)?(?P<site>mycanal|piwiplus)\.fr/(?:[^/]+/)*(?P<display_id>[^?/]+)(?:\.html\?.*\bvid=|/p/)(?P<id>\d+)'
IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
_VALID_URL = r'''(?x)
https?://
(?:
(?:
(?:(?:www|m)\.)?canalplus\.fr|
(?:www\.)?piwiplus\.fr|
(?:www\.)?d8\.tv|
(?:www\.)?c8\.fr|
(?:www\.)?d17\.tv|
(?:(?:football|www)\.)?cstar\.fr|
(?:www\.)?itele\.fr
)/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
player\.canalplus\.fr/#/(?P<id>\d+)
)
'''
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
_SITE_ID_MAP = {
'mycanal': 'cplus',
'canalplus': 'cplus',
'piwiplus': 'teletoon',
'd8': 'd8',
'c8': 'd8',
'd17': 'd17',
'cstar': 'd17',
'itele': 'itele',
}
# Only works for direct mp4 URLs
_GEO_COUNTRIES = ['FR']
_TESTS = [{
'url': 'https://www.mycanal.fr/d17-emissions/lolywood/p/1397061',
'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
'info_dict': {
'id': '1397061',
'display_id': 'lolywood',
'id': '1405510',
'display_id': 'pid1830-c-zapping',
'ext': 'mp4',
'title': 'Euro 2016 : Je préfère te prévenir - Lolywood - Episode 34',
'description': 'md5:7d97039d455cb29cdba0d652a0efaa5e',
'upload_date': '20160602',
'title': 'Zapping - 02/07/2016',
'description': 'Le meilleur de toutes les chaînes, tous les jours',
'upload_date': '20160702',
},
}, {
# geo restricted, bypassed
@ -47,12 +70,64 @@ class CanalplusIE(InfoExtractor):
'upload_date': '20140724',
},
'expected_warnings': ['HTTP Error 403: Forbidden'],
}, {
# geo restricted, bypassed
'url': 'http://www.c8.fr/c8-divertissement/ms-touche-pas-a-mon-poste/pid6318-videos-integrales.html?vid=1443684',
'md5': 'bb6f9f343296ab7ebd88c97b660ecf8d',
'info_dict': {
'id': '1443684',
'display_id': 'pid6318-videos-integrales',
'ext': 'mp4',
'title': 'Guess my iep ! - TPMP - 07/04/2017',
'description': 'md5:6f005933f6e06760a9236d9b3b5f17fa',
'upload_date': '20170407',
},
'expected_warnings': ['HTTP Error 403: Forbidden'],
}, {
'url': 'http://www.itele.fr/chroniques/invite-michael-darmon/rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
'info_dict': {
'id': '1420176',
'display_id': 'rachida-dati-nicolas-sarkozy-est-le-plus-en-phase-avec-les-inquietudes-des-francais-171510',
'ext': 'mp4',
'title': 'L\'invité de Michaël Darmon du 14/10/2016 - ',
'description': 'Chaque matin du lundi au vendredi, Michaël Darmon reçoit un invité politique à 8h25.',
'upload_date': '20161014',
},
}, {
'url': 'http://football.cstar.fr/cstar-minisite-foot/pid7566-feminines-videos.html?vid=1416769',
'info_dict': {
'id': '1416769',
'display_id': 'pid7566-feminines-videos',
'ext': 'mp4',
'title': 'France - Albanie : les temps forts de la soirée - 20/09/2016',
'description': 'md5:c3f30f2aaac294c1c969b3294de6904e',
'upload_date': '20160921',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://m.canalplus.fr/?vid=1398231',
'only_matching': True,
}, {
'url': 'http://www.d17.tv/emissions/pid8303-lolywood.html?vid=1397061',
'only_matching': True,
}]
def _real_extract(self, url):
site, display_id, video_id = re.match(self._VALID_URL, url).groups()
mobj = re.match(self._VALID_URL, url)
site_id = self._SITE_ID_MAP[site]
site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
# Beware, some subclasses do not define an id group
display_id = remove_end(dict_get(mobj.groupdict(), ('display_id', 'id', 'vid')), '.html')
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
[r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
r'id=["\']canal_video_player(?P<id>\d+)',
r'data-video=["\'](?P<id>\d+)'],
webpage, 'video id', default=mobj.group('vid'), group='id')
info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
video_data = self._download_json(info_url, video_id, 'Downloading video JSON')
@ -86,7 +161,7 @@ class CanalplusIE(InfoExtractor):
format_url + '?hdcore=2.11.3', video_id, f4m_id=format_id, fatal=False))
else:
formats.append({
# the secret extracted from ya function in http://player.canalplus.fr/common/js/canalPlayer.js
# the secret extracted ya function in http://player.canalplus.fr/common/js/canalPlayer.js
'url': format_url + '?secret=pqzerjlsmdkjfoiuerhsdlfknaes',
'format_id': format_id,
'preference': preference(format_id),

View File

@ -246,7 +246,7 @@ class VrtNUIE(GigyaBaseIE):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(url, display_id)
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'(?ms)<h1 class="content__heading">(.+?)</h1>',
@ -276,7 +276,7 @@ class VrtNUIE(GigyaBaseIE):
webpage, 'release_date', default=None))
# If there's a ? or a # in the URL, remove them and everything after
clean_url = urlh.geturl().split('?')[0].split('#')[0].strip('/')
clean_url = url.split('?')[0].split('#')[0].strip('/')
securevideo_url = clean_url + '.mssecurevideo.json'
try:

View File

@ -1,7 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@ -14,7 +13,6 @@ from ..utils import (
xpath_element,
xpath_with_ns,
find_xpath_attr,
parse_duration,
parse_iso8601,
parse_age_limit,
int_or_none,
@ -361,63 +359,3 @@ class CBCWatchIE(CBCWatchBaseIE):
video_id = self._match_id(url)
rss = self._call_api('web/browse/' + video_id, video_id)
return self._parse_rss_feed(rss)
class CBCOlympicsIE(InfoExtractor):
IE_NAME = 'cbc.ca:olympics'
_VALID_URL = r'https?://olympics\.cbc\.ca/video/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://olympics.cbc.ca/video/whats-on-tv/olympic-morning-featuring-the-opening-ceremony/',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._hidden_inputs(webpage)['videoId']
video_doc = self._download_xml(
'https://olympics.cbc.ca/videodata/%s.xml' % video_id, video_id)
title = xpath_text(video_doc, 'title', fatal=True)
is_live = xpath_text(video_doc, 'kind') == 'Live'
if is_live:
title = self._live_title(title)
formats = []
for video_source in video_doc.findall('videoSources/videoSource'):
uri = xpath_text(video_source, 'uri')
if not uri:
continue
tokenize = self._download_json(
'https://olympics.cbc.ca/api/api-akamai/tokenize',
video_id, data=json.dumps({
'VideoSource': uri,
}).encode(), headers={
'Content-Type': 'application/json',
'Referer': url,
# d3.VideoPlayer._init in https://olympics.cbc.ca/components/script/base.js
'Cookie': '_dvp=TK:C0ObxjerU', # AKAMAI CDN cookie
}, fatal=False)
if not tokenize:
continue
content_url = tokenize['ContentUrl']
video_source_format = video_source.get('format')
if video_source_format == 'IIS':
formats.extend(self._extract_ism_formats(
content_url, video_id, ism_id=video_source_format, fatal=False))
else:
formats.extend(self._extract_m3u8_formats(
content_url, video_id, 'mp4',
'm3u8' if is_live else 'm3u8_native',
m3u8_id=video_source_format, fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': xpath_text(video_doc, 'description'),
'thumbnail': xpath_text(video_doc, 'thumbnailUrl'),
'duration': parse_duration(xpath_text(video_doc, 'duration')),
'formats': formats,
'is_live': is_live,
}

View File

@ -2,7 +2,6 @@ from __future__ import unicode_literals
from .theplatform import ThePlatformFeedIE
from ..utils import (
ExtractorError,
int_or_none,
find_xpath_attr,
xpath_element,
@ -62,7 +61,6 @@ class CBSIE(CBSBaseIE):
asset_types = []
subtitles = {}
formats = []
last_e = None
for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType')
if not asset_type or asset_type in asset_types:
@ -76,17 +74,11 @@ class CBSIE(CBSBaseIE):
query['formats'] = 'MPEG4,M3U'
elif asset_type in ('RTMP', 'WIFI', '3G'):
query['formats'] = 'MPEG4,FLV'
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(tp_release_url, query), content_id,
'Downloading %s SMIL data' % asset_type)
except ExtractorError as e:
last_e = e
continue
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
if last_e and not formats:
raise last_e
self._sort_formats(formats)
info = self._extract_theplatform_metadata(tp_path, content_id)

View File

@ -75,10 +75,10 @@ class CBSInteractiveIE(CBSIE):
webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex(
r"data(?:-(?:cnet|zdnet))?-video(?:-(?:uvp(?:js)?|player))?-options='([^']+)'",
r"data-(?:cnet|zdnet)-video(?:-uvp(?:js)?)?-options='([^']+)'",
webpage, 'data json')
data = self._parse_json(data_json, display_id)
vdata = data.get('video') or (data.get('videos') or data.get('playlist'))[0]
vdata = data.get('video') or data['videos'][0]
video_id = vdata['mpxRefId']

View File

@ -13,7 +13,6 @@ from ..utils import (
float_or_none,
sanitized_Request,
unescapeHTML,
update_url_query,
urlencode_postdata,
USER_AGENTS,
)
@ -266,10 +265,6 @@ class CeskaTelevizePoradyIE(InfoExtractor):
# m3u8 download
'skip_download': True,
},
}, {
# iframe embed
'url': 'http://www.ceskatelevize.cz/porady/10614999031-neviditelni/21251212048/',
'only_matching': True,
}]
def _real_extract(self, url):
@ -277,11 +272,8 @@ class CeskaTelevizePoradyIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
data_url = update_url_query(unescapeHTML(self._search_regex(
(r'<span[^>]*\bdata-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?ceskatelevize\.cz/ivysilani/embed/iFramePlayer\.php.*?)\1'),
webpage, 'iframe player url', group='url')), query={
'autoStart': 'true',
})
data_url = unescapeHTML(self._search_regex(
r'<span[^>]*\bdata-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'iframe player url', group='url'))
return self.url_result(data_url, ie=CeskaTelevizeIE.ie_key())

View File

@ -1,11 +1,11 @@
from __future__ import unicode_literals
import re
import base64
import json
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import compat_b64decode
from ..utils import (
clean_html,
ExtractorError
@ -58,7 +58,7 @@ class ChilloutzoneIE(InfoExtractor):
base64_video_info = self._html_search_regex(
r'var cozVidData = "(.+?)";', webpage, 'video data')
decoded_video_info = compat_b64decode(base64_video_info).decode('utf-8')
decoded_video_info = base64.b64decode(base64_video_info.encode('utf-8')).decode('utf-8')
video_info_dict = json.loads(decoded_video_info)
# get video information from dict

View File

@ -1,10 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import re
from .common import InfoExtractor
from ..compat import compat_b64decode
from ..utils import parse_duration
@ -44,7 +44,8 @@ class ChirbitIE(InfoExtractor):
# Reverse engineered from https://chirb.it/js/chirbit.player.js (look
# for soundURL)
audio_url = compat_b64decode(data_fd[::-1]).decode('utf-8')
audio_url = base64.b64decode(
data_fd[::-1].encode('ascii')).decode('utf-8')
title = self._search_regex(
r'class=["\']chirbit-title["\'][^>]*>([^<]+)', webpage, 'title')

View File

@ -174,8 +174,6 @@ class InfoExtractor(object):
width : height ratio as float.
* no_resume The server does not support resuming the
(HTTP or RTMP) download. Boolean.
* downloader_options A dictionary of downloader options as
described in FileDownloader
url: Final video URL.
ext: Video filename extension.
@ -644,31 +642,19 @@ class InfoExtractor(object):
content, _ = res
return content
def _download_xml_handle(
self, url_or_request, video_id, note='Downloading XML',
errnote='Unable to download XML', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}):
"""Return a tuple (xml as an xml.etree.ElementTree.Element, URL handle)"""
res = self._download_webpage_handle(
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query)
if res is False:
return res
xml_string, urlh = res
return self._parse_xml(
xml_string, video_id, transform_source=transform_source,
fatal=fatal), urlh
def _download_xml(self, url_or_request, video_id,
note='Downloading XML', errnote='Unable to download XML',
transform_source=None, fatal=True, encoding=None,
data=None, headers={}, query={}):
"""Return the xml as an xml.etree.ElementTree.Element"""
res = self._download_xml_handle(
url_or_request, video_id, note=note, errnote=errnote,
transform_source=transform_source, fatal=fatal, encoding=encoding,
data=data, headers=headers, query=query)
return res if res is False else res[0]
xml_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query)
if xml_string is False:
return xml_string
return self._parse_xml(
xml_string, video_id, transform_source=transform_source,
fatal=fatal)
def _parse_xml(self, xml_string, video_id, transform_source=None, fatal=True):
if transform_source:
@ -1041,7 +1027,7 @@ class InfoExtractor(object):
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type in ('Article', 'NewsArticle'):
elif item_type == 'Article':
info.update({
'timestamp': parse_iso8601(e.get('datePublished')),
'title': unescapeHTML(e.get('headline')),
@ -1706,24 +1692,22 @@ class InfoExtractor(object):
})
return subtitles
def _extract_xspf_playlist(self, xspf_url, playlist_id, fatal=True):
def _extract_xspf_playlist(self, playlist_url, playlist_id, fatal=True):
xspf = self._download_xml(
xspf_url, playlist_id, 'Downloading xpsf playlist',
playlist_url, playlist_id, 'Downloading xpsf playlist',
'Unable to download xspf manifest', fatal=fatal)
if xspf is False:
return []
return self._parse_xspf(
xspf, playlist_id, xspf_url=xspf_url,
xspf_base_url=base_url(xspf_url))
return self._parse_xspf(xspf, playlist_id)
def _parse_xspf(self, xspf_doc, playlist_id, xspf_url=None, xspf_base_url=None):
def _parse_xspf(self, playlist, playlist_id):
NS_MAP = {
'xspf': 'http://xspf.org/ns/0/',
's1': 'http://static.streamone.nl/player/ns/0',
}
entries = []
for track in xspf_doc.findall(xpath_with_ns('./xspf:trackList/xspf:track', NS_MAP)):
for track in playlist.findall(xpath_with_ns('./xspf:trackList/xspf:track', NS_MAP)):
title = xpath_text(
track, xpath_with_ns('./xspf:title', NS_MAP), 'title', default=playlist_id)
description = xpath_text(
@ -1733,18 +1717,12 @@ class InfoExtractor(object):
duration = float_or_none(
xpath_text(track, xpath_with_ns('./xspf:duration', NS_MAP), 'duration'), 1000)
formats = []
for location in track.findall(xpath_with_ns('./xspf:location', NS_MAP)):
format_url = urljoin(xspf_base_url, location.text)
if not format_url:
continue
formats.append({
'url': format_url,
'manifest_url': xspf_url,
formats = [{
'url': location.text,
'format_id': location.get(xpath_with_ns('s1:label', NS_MAP)),
'width': int_or_none(location.get(xpath_with_ns('s1:width', NS_MAP))),
'height': int_or_none(location.get(xpath_with_ns('s1:height', NS_MAP))),
})
} for location in track.findall(xpath_with_ns('./xspf:location', NS_MAP))]
self._sort_formats(formats)
entries.append({
@ -1758,18 +1736,18 @@ class InfoExtractor(object):
return entries
def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}):
res = self._download_xml_handle(
res = self._download_webpage_handle(
mpd_url, video_id,
note=note or 'Downloading MPD manifest',
errnote=errnote or 'Failed to download MPD manifest',
fatal=fatal)
if res is False:
return []
mpd_doc, urlh = res
mpd, urlh = res
mpd_base_url = base_url(urlh.geturl())
return self._parse_mpd_formats(
mpd_doc, mpd_id=mpd_id, mpd_base_url=mpd_base_url,
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url,
formats_dict=formats_dict, mpd_url=mpd_url)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
@ -2043,16 +2021,17 @@ class InfoExtractor(object):
return formats
def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True):
res = self._download_xml_handle(
res = self._download_webpage_handle(
ism_url, video_id,
note=note or 'Downloading ISM manifest',
errnote=errnote or 'Failed to download ISM manifest',
fatal=fatal)
if res is False:
return []
ism_doc, urlh = res
ism, urlh = res
return self._parse_ism_formats(ism_doc, urlh.geturl(), ism_id)
return self._parse_ism_formats(
compat_etree_fromstring(ism.encode('utf-8')), urlh.geturl(), ism_id)
def _parse_ism_formats(self, ism_doc, ism_url, ism_id=None):
"""
@ -2150,8 +2129,8 @@ class InfoExtractor(object):
return formats
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8', mpd_id=None, preference=None):
def absolute_url(item_url):
return urljoin(base_url, item_url)
def absolute_url(video_url):
return compat_urlparse.urljoin(base_url, video_url)
def parse_content_type(content_type):
if not content_type:
@ -2208,7 +2187,7 @@ class InfoExtractor(object):
if src:
_, formats = _media_formats(src, media_type)
media_info['formats'].extend(formats)
media_info['thumbnail'] = absolute_url(media_attributes.get('poster'))
media_info['thumbnail'] = media_attributes.get('poster')
if media_content:
for source_tag in re.findall(r'<source[^>]+>', media_content):
source_attributes = extract_attributes(source_tag)
@ -2269,10 +2248,9 @@ class InfoExtractor(object):
def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
query = compat_urlparse.urlparse(url).query
url = re.sub(r'/(?:manifest|playlist|jwplayer)\.(?:m3u8|f4m|mpd|smil)', '', url)
mobj = re.search(
r'(?:(?:http|rtmp|rtsp)(?P<s>s)?:)?(?P<url>//[^?]+)', url)
url_base = mobj.group('url')
http_base_url = '%s%s:%s' % ('http', mobj.group('s') or '', url_base)
url_base = self._search_regex(
r'(?:(?:https?|rtmp|rtsp):)?(//[^?]+)', url, 'format url')
http_base_url = '%s:%s' % ('http', url_base)
formats = []
def manifest_url(manifest):
@ -2372,10 +2350,7 @@ class InfoExtractor(object):
for track in tracks:
if not isinstance(track, dict):
continue
track_kind = track.get('kind')
if not track_kind or not isinstance(track_kind, compat_str):
continue
if track_kind.lower() not in ('captions', 'subtitles'):
if track.get('kind') != 'captions':
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
@ -2429,7 +2404,7 @@ class InfoExtractor(object):
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=m3u8_id, fatal=False))
elif source_type == 'dash' or ext == 'mpd':
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, video_id, mpd_id=mpd_id, fatal=False))
elif ext == 'smil':

View File

@ -1,45 +1,31 @@
# coding: utf-8
from __future__ import unicode_literals, division
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_HTTPError,
)
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
parse_age_limit,
parse_duration,
ExtractorError
)
from ..utils import int_or_none
class CrackleIE(InfoExtractor):
_GEO_COUNTRIES = ['US']
_VALID_URL = r'(?:crackle:|https?://(?:(?:www|m)\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
_TEST = {
# geo restricted to CA
'url': 'https://www.crackle.com/andromeda/2502343',
'url': 'http://www.crackle.com/comedians-in-cars-getting-coffee/2498934',
'info_dict': {
'id': '2502343',
'id': '2498934',
'ext': 'mp4',
'title': 'Under The Night',
'description': 'md5:d2b8ca816579ae8a7bf28bfff8cefc8a',
'duration': 2583,
'view_count': int,
'average_rating': 0,
'age_limit': 14,
'genre': 'Action, Sci-Fi',
'creator': 'Allan Kroeker',
'artist': 'Keith Hamilton Cobb, Kevin Sorbo, Lisa Ryder, Lexa Doig, Robert Hewitt Wolfe',
'release_year': 2000,
'series': 'Andromeda',
'episode': 'Under The Night',
'season_number': 1,
'episode_number': 1,
'title': 'Everybody Respects A Bloody Nose',
'description': 'Jerry is kaffeeklatsching in L.A. with funnyman J.B. Smoove (Saturday Night Live, Real Husbands of Hollywood). Theyre headed for brew at 10 Speed Coffee in a 1964 Studebaker Avanti.',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 906,
'series': 'Comedians In Cars Getting Coffee',
'season_number': 8,
'episode_number': 4,
'subtitles': {
'en-US': [
{'ext': 'vtt'},
{'ext': 'tt'},
]
},
},
'params': {
# m3u8 download
@ -47,118 +33,109 @@ class CrackleIE(InfoExtractor):
}
}
_THUMBNAIL_RES = [
(120, 90),
(208, 156),
(220, 124),
(220, 220),
(240, 180),
(250, 141),
(315, 236),
(320, 180),
(360, 203),
(400, 300),
(421, 316),
(460, 330),
(460, 460),
(462, 260),
(480, 270),
(587, 330),
(640, 480),
(700, 330),
(700, 394),
(854, 480),
(1024, 1024),
(1920, 1080),
]
# extracted from http://legacyweb-us.crackle.com/flash/ReferrerRedirect.ashx
_MEDIA_FILE_SLOTS = {
'c544.flv': {
'width': 544,
'height': 306,
},
'360p.mp4': {
'width': 640,
'height': 360,
},
'480p.mp4': {
'width': 852,
'height': 478,
},
'480p_1mbps.mp4': {
'width': 852,
'height': 478,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
country_code = self._downloader.params.get('geo_bypass_country', None)
countries = [country_code] if country_code else (
'US', 'AU', 'CA', 'AS', 'FM', 'GU', 'MP', 'PR', 'PW', 'MH', 'VI')
config_doc = self._download_xml(
'http://legacyweb-us.crackle.com/flash/QueryReferrer.ashx?site=16',
video_id, 'Downloading config')
last_e = None
for country in countries:
try:
media = self._download_json(
'https://web-api-us.crackle.com/Service.svc/details/media/%s/%s'
% (video_id, country), video_id,
'Downloading media JSON as %s' % country,
'Unable to download media JSON', query={
'disableProtocols': 'true',
'format': 'json'
})
except ExtractorError as e:
# 401 means geo restriction, trying next country
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
last_e = e
continue
raise
media_urls = media.get('MediaURLs')
if not media_urls or not isinstance(media_urls, list):
continue
title = media['Title']
formats = []
for e in media['MediaURLs']:
if e.get('UseDRM') is True:
continue
format_url = e.get('Path')
if not format_url or not isinstance(format_url, compat_str):
continue
ext = determine_ext(format_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
self._sort_formats(formats)
description = media.get('Description')
duration = int_or_none(media.get(
'DurationInSeconds')) or parse_duration(media.get('Duration'))
view_count = int_or_none(media.get('CountViews'))
average_rating = float_or_none(media.get('UserRating'))
age_limit = parse_age_limit(media.get('Rating'))
genre = media.get('Genre')
release_year = int_or_none(media.get('ReleaseYear'))
creator = media.get('Directors')
artist = media.get('Cast')
if media.get('MediaTypeDisplayValue') == 'Full Episode':
series = media.get('ShowName')
episode = title
season_number = int_or_none(media.get('Season'))
episode_number = int_or_none(media.get('Episode'))
else:
series = episode = season_number = episode_number = None
item = self._download_xml(
'http://legacyweb-us.crackle.com/app/revamp/vidwallcache.aspx?flags=-1&fm=%s' % video_id,
video_id, headers=self.geo_verification_headers()).find('i')
title = item.attrib['t']
subtitles = {}
cc_files = media.get('ClosedCaptionFiles')
if isinstance(cc_files, list):
for cc_file in cc_files:
if not isinstance(cc_file, dict):
continue
cc_url = cc_file.get('Path')
if not cc_url or not isinstance(cc_url, compat_str):
continue
lang = cc_file.get('Locale') or 'en'
subtitles.setdefault(lang, []).append({'url': cc_url})
formats = self._extract_m3u8_formats(
'http://content.uplynk.com/ext/%s/%s.m3u8' % (config_doc.attrib['strUplynkOwnerId'], video_id),
video_id, 'mp4', m3u8_id='hls', fatal=None)
thumbnails = []
images = media.get('Images')
if isinstance(images, list):
for image_key, image_url in images.items():
mobj = re.search(r'Img_(\d+)[xX](\d+)', image_key)
if not mobj:
continue
path = item.attrib.get('p')
if path:
for width, height in self._THUMBNAIL_RES:
res = '%dx%d' % (width, height)
thumbnails.append({
'url': image_url,
'width': int(mobj.group(1)),
'height': int(mobj.group(2)),
'id': res,
'url': 'http://images-us-am.crackle.com/%stnl_%s.jpg' % (path, res),
'width': width,
'height': height,
'resolution': res,
})
http_base_url = 'http://ahttp.crackle.com/' + path
for mfs_path, mfs_info in self._MEDIA_FILE_SLOTS.items():
formats.append({
'url': http_base_url + mfs_path,
'format_id': 'http-' + mfs_path.split('.')[0],
'width': mfs_info['width'],
'height': mfs_info['height'],
})
for cc in item.findall('cc'):
locale = cc.attrib.get('l')
v = cc.attrib.get('v')
if locale and v:
if locale not in subtitles:
subtitles[locale] = []
for url_ext, ext in (('vtt', 'vtt'), ('xml', 'tt')):
subtitles.setdefault(locale, []).append({
'url': '%s/%s%s_%s.%s' % (config_doc.attrib['strSubtitleServer'], path, locale, v, url_ext),
'ext': ext,
})
self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'view_count': view_count,
'average_rating': average_rating,
'age_limit': age_limit,
'genre': genre,
'creator': creator,
'artist': artist,
'release_year': release_year,
'series': series,
'episode': episode,
'season_number': season_number,
'episode_number': episode_number,
'description': item.attrib.get('d'),
'duration': int(item.attrib.get('r'), 16) / 1000 if item.attrib.get('r') else None,
'series': item.attrib.get('sn'),
'season_number': int_or_none(item.attrib.get('se')),
'episode_number': int_or_none(item.attrib.get('ep')),
'thumbnails': thumbnails,
'subtitles': subtitles,
'formats': formats,
}
raise last_e

View File

@ -3,13 +3,13 @@ from __future__ import unicode_literals
import re
import json
import base64
import zlib
from hashlib import sha1
from math import pow, sqrt, floor
from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_etree_fromstring,
compat_urllib_parse_urlencode,
compat_urllib_request,
@ -272,8 +272,8 @@ class CrunchyrollIE(CrunchyrollBaseIE):
}
def _decrypt_subtitles(self, data, iv, id):
data = bytes_to_intlist(compat_b64decode(data))
iv = bytes_to_intlist(compat_b64decode(iv))
data = bytes_to_intlist(base64.b64decode(data.encode('utf-8')))
iv = bytes_to_intlist(base64.b64decode(iv.encode('utf-8')))
id = int(id)
def obfuscate_key_aux(count, modulo, start):

View File

@ -10,7 +10,6 @@ from ..aes import (
aes_cbc_decrypt,
aes_cbc_encrypt,
)
from ..compat import compat_b64decode
from ..utils import (
bytes_to_intlist,
bytes_to_long,
@ -94,7 +93,7 @@ class DaisukiMottoIE(InfoExtractor):
rtn = self._parse_json(
intlist_to_bytes(aes_cbc_decrypt(bytes_to_intlist(
compat_b64decode(encrypted_rtn)),
base64.b64decode(encrypted_rtn)),
aes_key, iv)).decode('utf-8').rstrip('\0'),
video_id)

View File

@ -1,56 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import js_to_json
class DiggIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?digg\.com/video/(?P<id>[^/?#&]+)'
_TESTS = [{
# JWPlatform via provider
'url': 'http://digg.com/video/sci-fi-short-jonah-daniel-kaluuya-get-out',
'info_dict': {
'id': 'LcqvmS0b',
'ext': 'mp4',
'title': "'Get Out' Star Daniel Kaluuya Goes On 'Moby Dick'-Like Journey In Sci-Fi Short 'Jonah'",
'description': 'md5:541bb847648b6ee3d6514bc84b82efda',
'upload_date': '20180109',
'timestamp': 1515530551,
},
'params': {
'skip_download': True,
},
}, {
# Youtube via provider
'url': 'http://digg.com/video/dog-boat-seal-play',
'only_matching': True,
}, {
# vimeo as regular embed
'url': 'http://digg.com/video/dream-girl-short-film',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
info = self._parse_json(
self._search_regex(
r'(?s)video_info\s*=\s*({.+?});\n', webpage, 'video info',
default='{}'), display_id, transform_source=js_to_json,
fatal=False)
video_id = info.get('video_id')
if video_id:
provider = info.get('provider_name')
if provider == 'youtube':
return self.url_result(
video_id, ie='Youtube', video_id=video_id)
elif provider == 'jwplayer':
return self.url_result(
'jwplatform:%s' % video_id, ie='JWPlatform',
video_id=video_id)
return self.url_result(url, 'Generic')

View File

@ -5,16 +5,15 @@ import re
import string
from .discoverygo import DiscoveryGoBaseIE
from ..compat import compat_str
from ..utils import (
ExtractorError,
try_get,
update_url_query,
)
from ..compat import compat_HTTPError
class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?(?P<site>
_VALID_URL = r'''(?x)https?://(?:www\.)?(?:
discovery|
investigationdiscovery|
discoverylife|
@ -45,7 +44,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
_GEO_BYPASS = False
def _real_extract(self, url):
site, path, display_id = re.match(self._VALID_URL, url).groups()
path, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
react_data = self._parse_json(self._search_regex(
@ -56,13 +55,14 @@ class DiscoveryIE(DiscoveryGoBaseIE):
video_id = video['id']
access_token = self._download_json(
'https://www.%s.com/anonymous' % site, display_id, query={
'authRel': 'authorization',
'client_id': try_get(
react_data, lambda x: x['application']['apiClientId'],
compat_str) or '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
'https://www.discovery.com/anonymous', display_id, query={
'authLink': update_url_query(
'https://login.discovery.com/v1/oauth2/authorize', {
'client_id': react_data['application']['apiClientId'],
'redirect_uri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html',
'response_type': 'anonymous',
'state': 'nonce,' + ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
})
})['access_token']
try:

View File

@ -12,28 +12,25 @@ from ..compat import (
compat_urlparse,
)
from ..utils import (
determine_ext,
ExtractorError,
float_or_none,
int_or_none,
remove_end,
try_get,
unified_strdate,
unified_timestamp,
update_url_query,
USER_AGENTS,
)
class DPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?P<domain>www\.(?P<host>dplay\.(?P<country>dk|se|no)))/(?:video(?:er|s)/)?(?P<id>[^/]+/[^/?#]+)'
_VALID_URL = r'https?://(?P<domain>www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{
# non geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
'info_dict': {
'id': '3172',
'display_id': 'nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet',
'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
'ext': 'mp4',
'title': 'Svensken lär sig njuta av livet',
'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
@ -51,7 +48,7 @@ class DPlayIE(InfoExtractor):
'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
'info_dict': {
'id': '70816',
'display_id': 'mig-og-min-mor/season-6-episode-12',
'display_id': 'season-6-episode-12',
'ext': 'mp4',
'title': 'Episode 12',
'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
@ -68,33 +65,6 @@ class DPlayIE(InfoExtractor):
# geo restricted, via direct unsigned hls URL
'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
'only_matching': True,
}, {
# disco-api
'url': 'https://www.dplay.no/videoer/i-kongens-klr/sesong-1-episode-7',
'info_dict': {
'id': '40206',
'display_id': 'i-kongens-klr/sesong-1-episode-7',
'ext': 'mp4',
'title': 'Episode 7',
'description': 'md5:e3e1411b2b9aebeea36a6ec5d50c60cf',
'duration': 2611.16,
'timestamp': 1516726800,
'upload_date': '20180123',
'series': 'I kongens klær',
'season_number': 1,
'episode_number': 7,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
'url': 'https://www.dplay.dk/videoer/singleliv/season-5-episode-3',
'only_matching': True,
}, {
'url': 'https://www.dplay.se/videos/sofias-anglar/sofias-anglar-1001',
'only_matching': True,
}]
def _real_extract(self, url):
@ -102,81 +72,10 @@ class DPlayIE(InfoExtractor):
display_id = mobj.group('id')
domain = mobj.group('domain')
self._initialize_geo_bypass([mobj.group('country').upper()])
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'data-video-id=["\'](\d+)', webpage, 'video id', default=None)
if not video_id:
host = mobj.group('host')
disco_base = 'https://disco-api.%s' % host
self._download_json(
'%s/token' % disco_base, display_id, 'Downloading token',
query={
'realm': host.replace('.', ''),
})
video = self._download_json(
'%s/content/videos/%s' % (disco_base, display_id), display_id,
headers={
'Referer': url,
'x-disco-client': 'WEB:UNKNOWN:dplay-client:0.0.1',
}, query={
'include': 'show'
})
video_id = video['data']['id']
info = video['data']['attributes']
title = info['name']
formats = []
for format_id, format_dict in self._download_json(
'%s/playback/videoPlaybackInfo/%s' % (disco_base, video_id),
display_id)['data']['attributes']['streaming'].items():
if not isinstance(format_dict, dict):
continue
format_url = format_dict.get('url')
if not format_url:
continue
ext = determine_ext(format_url)
if format_id == 'dash' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, display_id, mpd_id='dash', fatal=False))
elif format_id == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
else:
formats.append({
'url': format_url,
'format_id': format_id,
})
self._sort_formats(formats)
series = None
try:
included = video.get('included')
if isinstance(included, list):
show = next(e for e in included if e.get('type') == 'show')
series = try_get(
show, lambda x: x['attributes']['name'], compat_str)
except StopIteration:
pass
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': info.get('description'),
'duration': float_or_none(
info.get('videoDuration'), scale=1000),
'timestamp': unified_timestamp(info.get('publishStart')),
'series': series,
'season_number': int_or_none(info.get('seasonNumber')),
'episode_number': int_or_none(info.get('episodeNumber')),
'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats,
}
r'data-video-id=["\'](\d+)', webpage, 'video id')
info = self._download_json(
'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id),

View File

@ -1,10 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import re
from .common import InfoExtractor
from ..compat import compat_b64decode
from ..utils import (
qualities,
sanitized_Request,
@ -42,7 +42,7 @@ class DumpertIE(InfoExtractor):
r'data-files="([^"]+)"', webpage, 'data files')
files = self._parse_json(
compat_b64decode(files_base64).decode('utf-8'),
base64.b64decode(files_base64.encode('utf-8')).decode('utf-8'),
video_id)
quality = qualities(['flv', 'mobile', 'tablet', '720p'])

View File

@ -32,7 +32,7 @@ class DVTVIE(InfoExtractor):
}, {
'url': 'http://video.aktualne.cz/dvtv/dvtv-16-12-2014-utok-talibanu-boj-o-kliniku-uprchlici/r~973eb3bc854e11e498be002590604f2e/',
'info_dict': {
'title': r're:^DVTV 16\. 12\. 2014: útok Talibanu, boj o kliniku, uprchlíci',
'title': 'DVTV 16. 12. 2014: útok Talibanu, boj o kliniku, uprchlíci',
'id': '973eb3bc854e11e498be002590604f2e',
},
'playlist': [{
@ -91,24 +91,10 @@ class DVTVIE(InfoExtractor):
}, {
'url': 'http://video.aktualne.cz/v-cechach-poprve-zazni-zelenkova-zrestaurovana-mse/r~45b4b00483ec11e4883b002590604f2e/',
'only_matching': True,
}, {
'url': 'https://video.aktualne.cz/dvtv/babis-a-zeman-nesou-vinu-za-to-ze-nemame-jasno-v-tom-kdo-bud/r~026afb54fad711e79704ac1f6b220ee8/',
'md5': '87defe16681b1429c91f7a74809823c6',
'info_dict': {
'id': 'f5ae72f6fad611e794dbac1f6b220ee8',
'ext': 'mp4',
'title': 'Babiš a Zeman nesou vinu za to, že nemáme jasno v tom, kdo bude vládnout, říká Pekarová Adamová',
},
'params': {
'skip_download': True,
},
}]
def _parse_video_metadata(self, js, video_id, live_js=None):
def _parse_video_metadata(self, js, video_id):
data = self._parse_json(js, video_id, transform_source=js_to_json)
if live_js:
data.update(self._parse_json(
live_js, video_id, transform_source=js_to_json))
title = unescapeHTML(data['title'])
@ -156,18 +142,13 @@ class DVTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
# live content
live_item = self._search_regex(
r'(?s)embedData[0-9a-f]{32}\.asset\.liveStarter\s*=\s*(\{.+?\});',
webpage, 'video', default=None)
# single video
item = self._search_regex(
r'(?s)embedData[0-9a-f]{32}\[["\']asset["\']\]\s*=\s*(\{.+?\});',
webpage, 'video', default=None)
webpage, 'video', default=None, fatal=False)
if item:
return self._parse_video_metadata(item, video_id, live_item)
return self._parse_video_metadata(item, video_id)
# playlist
items = re.findall(

View File

@ -1,13 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import json
from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_str,
compat_urlparse,
compat_str,
)
from ..utils import (
extract_attributes,
@ -36,9 +36,9 @@ class EinthusanIE(InfoExtractor):
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
def _decrypt(self, encrypted_data, video_id):
return self._parse_json(compat_b64decode((
return self._parse_json(base64.b64decode((
encrypted_data[:10] + encrypted_data[-1] + encrypted_data[12:-1]
)).decode('utf-8'), video_id)
).encode('ascii')).decode('utf-8'), video_id)
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@ -162,7 +162,6 @@ from .cbc import (
CBCPlayerIE,
CBCWatchVideoIE,
CBCWatchIE,
CBCOlympicsIE,
)
from .cbs import CBSIE
from .cbslocal import CBSLocalIE
@ -260,7 +259,6 @@ from .deezer import DeezerPlaylistIE
from .democracynow import DemocracynowIE
from .dfb import DFBIE
from .dhm import DHMIE
from .digg import DiggIE
from .dotsub import DotsubIE
from .douyutv import (
DouyuShowIE,
@ -374,10 +372,8 @@ from .franceculture import FranceCultureIE
from .franceinter import FranceInterIE
from .francetv import (
FranceTVIE,
FranceTVSiteIE,
FranceTVEmbedIE,
FranceTVInfoIE,
FranceTVJeunesseIE,
GenerationWhatIE,
CultureboxIE,
)
@ -385,10 +381,7 @@ from .freesound import FreesoundIE
from .freespeech import FreespeechIE
from .freshlive import FreshLiveIE
from .funimation import FunimationIE
from .funk import (
FunkMixIE,
FunkChannelIE,
)
from .funk import FunkIE
from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
@ -432,7 +425,6 @@ from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .hgtv import HGTVComShowIE
from .hidive import HiDiveIE
from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE
from .hitrecord import HitRecordIE
@ -497,6 +489,7 @@ from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kakao import KakaoIE
from .kaltura import KalturaIE
from .kamcord import KamcordIE
from .kanalplay import KanalPlayIE
from .kankan import KankanIE
from .karaoketv import KaraoketvIE
@ -532,14 +525,13 @@ from .lcp import (
)
from .learnr import LearnrIE
from .lecture2go import Lecture2GoIE
from .lego import LEGOIE
from .lemonde import LemondeIE
from .leeco import (
LeIE,
LePlaylistIE,
LetvCloudIE,
)
from .lego import LEGOIE
from .lemonde import LemondeIE
from .lenta import LentaIE
from .libraryofcongress import LibraryOfCongressIE
from .libsyn import LibsynIE
from .lifenews import (
@ -551,7 +543,6 @@ from .limelight import (
LimelightChannelIE,
LimelightChannelListIE,
)
from .line import LineTVIE
from .litv import LiTVIE
from .liveleak import (
LiveLeakIE,
@ -572,11 +563,7 @@ from .lynda import (
)
from .m6 import M6IE
from .macgamestore import MacGameStoreIE
from .mailru import (
MailRuIE,
MailRuMusicIE,
MailRuMusicSearchIE,
)
from .mailru import MailRuIE
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .mangomolo import (
@ -643,10 +630,7 @@ from .musicplayon import MusicPlayOnIE
from .mwave import MwaveIE, MwaveMeetGreetIE
from .myspace import MySpaceIE, MySpaceAlbumIE
from .myspass import MySpassIE
from .myvi import (
MyviIE,
MyviEmbedIE,
)
from .myvi import MyviIE
from .myvidster import MyVidsterIE
from .nationalgeographic import (
NationalGeographicVideoIE,
@ -660,7 +644,6 @@ from .nbc import (
NBCIE,
NBCNewsIE,
NBCOlympicsIE,
NBCOlympicsStreamIE,
NBCSportsIE,
NBCSportsVPlayerIE,
)
@ -877,7 +860,6 @@ from .rai import (
RaiPlayPlaylistIE,
RaiIE,
)
from .raywenderlich import RayWenderlichIE
from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redbulltv import RedBullTVIE
@ -899,6 +881,7 @@ from .revision3 import (
Revision3IE,
)
from .rice import RICEIE
from .ringtv import RingTVIE
from .rmcdecouverte import RMCDecouverteIE
from .ro220 import Ro220IE
from .rockstargames import RockstarGamesIE
@ -918,7 +901,6 @@ from .rtp import RTPIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVELiveIE, RTVETelevisionIE
from .rtvnh import RTVNHIE
from .rtvs import RTVSIE
from .rudo import RudoIE
from .ruhd import RUHDIE
from .ruleporn import RulePornIE
@ -951,10 +933,6 @@ from .servingsys import ServingSysIE
from .servus import ServusIE
from .sevenplus import SevenPlusIE
from .sexu import SexuIE
from .seznamzpravy import (
SeznamZpravyIE,
SeznamZpravyArticleIE,
)
from .shahid import (
ShahidIE,
ShahidShowIE,
@ -1012,7 +990,7 @@ from .stitcher import StitcherIE
from .sport5 import Sport5IE
from .sportbox import SportBoxEmbedIE
from .sportdeutschland import SportDeutschlandIE
from .springboardplatform import SpringboardPlatformIE
from .sportschau import SportschauIE
from .sprout import SproutIE
from .srgssr import (
SRGSSRIE,
@ -1056,14 +1034,9 @@ from .telebruxelles import TeleBruxellesIE
from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .telequebec import (
TeleQuebecIE,
TeleQuebecEmissionIE,
TeleQuebecLiveIE,
)
from .telequebec import TeleQuebecIE
from .teletask import TeleTaskIE
from .telewebion import TelewebionIE
from .tennistv import TennisTVIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .tfo import TFOIE
@ -1073,6 +1046,7 @@ from .theplatform import (
ThePlatformFeedIE,
)
from .thescene import TheSceneIE
from .thesixtyone import TheSixtyOneIE
from .thestar import TheStarIE
from .thesun import TheSunIE
from .theweatherchannel import TheWeatherChannelIE
@ -1094,6 +1068,7 @@ from .tnaflix import (
from .toggle import ToggleIE
from .tonline import TOnlineIE
from .toongoggles import ToonGogglesIE
from .totalwebcasting import TotalWebCastingIE
from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE
@ -1218,6 +1193,7 @@ from .vice import (
ViceArticleIE,
ViceShowIE,
)
from .viceland import VicelandIE
from .vidbit import VidbitIE
from .viddler import ViddlerIE
from .videa import VideaIE
@ -1232,7 +1208,6 @@ from .videomore import (
from .videopremium import VideoPremiumIE
from .videopress import VideoPressIE
from .vidio import VidioIE
from .vidlii import VidLiiIE
from .vidme import (
VidmeIE,
VidmeUserIE,
@ -1314,8 +1289,6 @@ from .watchbox import WatchBoxIE
from .watchindianporn import WatchIndianPornIE
from .wdr import (
WDRIE,
WDRPageIE,
WDRElefantIE,
WDRMobileIE,
)
from .webcaster import (
@ -1326,10 +1299,6 @@ from .webofstories import (
WebOfStoriesIE,
WebOfStoriesPlaylistIE,
)
from .weibo import (
WeiboIE,
WeiboMobileIE
)
from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
@ -1355,10 +1324,6 @@ from .xiami import (
XiamiArtistIE,
XiamiCollectionIE
)
from .ximalaya import (
XimalayaIE,
XimalayaAlbumIE
)
from .xminus import XMinusIE
from .xnxx import XNXXIE
from .xstream import XstreamIE
@ -1376,7 +1341,6 @@ from .yandexmusic import (
YandexMusicPlaylistIE,
)
from .yandexdisk import YandexDiskIE
from .yapfiles import YapFilesIE
from .yesjapan import YesJapanIE
from .yinyuetai import YinYueTaiIE
from .ynet import YnetIE

View File

@ -33,7 +33,7 @@ class FranceInterIE(InfoExtractor):
description = self._og_search_description(webpage)
upload_date_str = self._search_regex(
r'class=["\']\s*cover-emission-period\s*["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
r'class=["\']cover-emission-period["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
webpage, 'upload date', fatal=False)
if upload_date_str:
upload_date_list = upload_date_str.split()

View File

@ -5,89 +5,19 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..compat import compat_urlparse
from ..utils import (
clean_html,
determine_ext,
ExtractorError,
int_or_none,
parse_duration,
try_get,
determine_ext,
)
from .dailymotion import DailymotionIE
class FranceTVBaseInfoExtractor(InfoExtractor):
def _make_url_result(self, video_or_full_id, catalog=None):
full_id = 'francetv:%s' % video_or_full_id
if '@' not in video_or_full_id and catalog:
full_id += '@%s' % catalog
return self.url_result(
full_id, ie=FranceTVIE.ie_key(),
video_id=video_or_full_id.split('@')[0])
class FranceTVIE(InfoExtractor):
_VALID_URL = r'''(?x)
(?:
https?://
sivideo\.webservices\.francetelevisions\.fr/tools/getInfosOeuvre/v2/\?
.*?\bidDiffusion=[^&]+|
(?:
https?://videos\.francetv\.fr/video/|
francetv:
)
(?P<id>[^@]+)(?:@(?P<catalog>.+))?
)
'''
_TESTS = [{
# without catalog
'url': 'https://sivideo.webservices.francetelevisions.fr/tools/getInfosOeuvre/v2/?idDiffusion=162311093&callback=_jsonp_loader_callback_request_0',
'md5': 'c2248a8de38c4e65ea8fae7b5df2d84f',
'info_dict': {
'id': '162311093',
'ext': 'mp4',
'title': '13h15, le dimanche... - Les mystères de Jésus',
'description': 'md5:75efe8d4c0a8205e5904498ffe1e1a42',
'timestamp': 1502623500,
'upload_date': '20170813',
},
}, {
# with catalog
'url': 'https://sivideo.webservices.francetelevisions.fr/tools/getInfosOeuvre/v2/?idDiffusion=NI_1004933&catalogue=Zouzous&callback=_jsonp_loader_callback_request_4',
'only_matching': True,
}, {
'url': 'http://videos.francetv.fr/video/NI_657393@Regions',
'only_matching': True,
}, {
'url': 'francetv:162311093',
'only_matching': True,
}, {
'url': 'francetv:NI_1004933@Zouzous',
'only_matching': True,
}, {
'url': 'francetv:NI_983319@Info-web',
'only_matching': True,
}, {
'url': 'francetv:NI_983319',
'only_matching': True,
}, {
'url': 'francetv:NI_657393@Regions',
'only_matching': True,
}, {
# france-3 live
'url': 'francetv:SIM_France3',
'only_matching': True,
}]
def _extract_video(self, video_id, catalogue=None):
# Videos are identified by idDiffusion so catalogue part is optional.
# However when provided, some extra formats may be returned so we pass
# it if available.
info = self._download_json(
'https://sivideo.webservices.francetelevisions.fr/tools/getInfosOeuvre/v2/',
video_id, 'Downloading video JSON', query={
@ -97,8 +27,7 @@ class FranceTVIE(InfoExtractor):
if info.get('status') == 'NOK':
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, info['message']),
expected=True)
'%s returned error: %s' % (self.IE_NAME, info['message']), expected=True)
allowed_countries = info['videos'][0].get('geoblocage')
if allowed_countries:
georestricted = True
@ -113,21 +42,6 @@ class FranceTVIE(InfoExtractor):
else:
georestricted = False
def sign(manifest_url, manifest_id):
for host in ('hdfauthftv-a.akamaihd.net', 'hdfauth.francetv.fr'):
signed_url = self._download_webpage(
'https://%s/esi/TA' % host, video_id,
'Downloading signed %s manifest URL' % manifest_id,
fatal=False, query={
'url': manifest_url,
})
if (signed_url and isinstance(signed_url, compat_str) and
re.search(r'^(?:https?:)?//', signed_url)):
return signed_url
return manifest_url
is_live = None
formats = []
for video in info['videos']:
if video['statut'] != 'ONLINE':
@ -135,10 +49,6 @@ class FranceTVIE(InfoExtractor):
video_url = video['url']
if not video_url:
continue
if is_live is None:
is_live = (try_get(
video, lambda x: x['plages_ouverture'][0]['direct'],
bool) is True) or '/live.francetv.fr/' in video_url
format_id = video['format']
ext = determine_ext(video_url)
if ext == 'f4m':
@ -146,14 +56,17 @@ class FranceTVIE(InfoExtractor):
# See https://github.com/rg3/youtube-dl/issues/3963
# m3u8 urls work fine
continue
f4m_url = self._download_webpage(
'http://hdfauth.francetv.fr/esi/TA?url=%s' % video_url,
video_id, 'Downloading f4m manifest token', fatal=False)
if f4m_url:
formats.extend(self._extract_f4m_formats(
sign(video_url, format_id) + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44',
f4m_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44',
video_id, f4m_id=format_id, fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
sign(video_url, format_id), video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id=format_id,
fatal=False))
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
elif video_url.startswith('rtmp'):
formats.append({
'url': video_url,
@ -184,48 +97,33 @@ class FranceTVIE(InfoExtractor):
return {
'id': video_id,
'title': self._live_title(title) if is_live else title,
'title': title,
'description': clean_html(info['synopsis']),
'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', info['image']),
'duration': int_or_none(info.get('real_duration')) or parse_duration(info['duree']),
'timestamp': int_or_none(info['diffusion']['timestamp']),
'is_live': is_live,
'formats': formats,
'subtitles': subtitles,
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
catalog = mobj.group('catalog')
if not video_id:
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
video_id = qs.get('idDiffusion', [None])[0]
catalog = qs.get('catalogue', [None])[0]
if not video_id:
raise ExtractorError('Invalid URL', expected=True)
return self._extract_video(video_id, catalog)
class FranceTVSiteIE(FranceTVBaseInfoExtractor):
class FranceTVIE(FranceTVBaseInfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?france\.tv|mobile\.france\.tv)/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_TESTS = [{
'url': 'https://www.france.tv/france-2/13h15-le-dimanche/140921-les-mysteres-de-jesus.html',
'info_dict': {
'id': '162311093',
'id': '157550144',
'ext': 'mp4',
'title': '13h15, le dimanche... - Les mystères de Jésus',
'description': 'md5:75efe8d4c0a8205e5904498ffe1e1a42',
'timestamp': 1502623500,
'upload_date': '20170813',
'timestamp': 1494156300,
'upload_date': '20170507',
},
'params': {
# m3u8 downloads
'skip_download': True,
},
'add_ie': [FranceTVIE.ie_key()],
}, {
# france3
'url': 'https://www.france.tv/france-3/des-chiffres-et-des-lettres/139063-emission-du-mardi-9-mai-2017.html',
@ -258,10 +156,6 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
}, {
'url': 'https://www.france.tv/142749-rouge-sang.html',
'only_matching': True,
}, {
# france-3 live
'url': 'https://www.france.tv/france-3/direct.html',
'only_matching': True,
}]
def _real_extract(self, url):
@ -278,14 +172,13 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
video_id, catalogue = self._html_search_regex(
r'(?:href=|player\.setVideo\(\s*)"http://videos?\.francetv\.fr/video/([^@]+@[^"]+)"',
webpage, 'video ID').split('@')
return self._make_url_result(video_id, catalogue)
return self._extract_video(video_id, catalogue)
class FranceTVEmbedIE(FranceTVBaseInfoExtractor):
_VALID_URL = r'https?://embed\.francetv\.fr/*\?.*?\bue=(?P<id>[^&]+)'
_TESTS = [{
_TEST = {
'url': 'http://embed.francetv.fr/?ue=7fd581a2ccf59d2fc5719c5c13cf6961',
'info_dict': {
'id': 'NI_983319',
@ -295,11 +188,7 @@ class FranceTVEmbedIE(FranceTVBaseInfoExtractor):
'timestamp': 1493981780,
'duration': 16,
},
'params': {
'skip_download': True,
},
'add_ie': [FranceTVIE.ie_key()],
}]
}
def _real_extract(self, url):
video_id = self._match_id(url)
@ -308,12 +197,12 @@ class FranceTVEmbedIE(FranceTVBaseInfoExtractor):
'http://api-embed.webservices.francetelevisions.fr/key/%s' % video_id,
video_id)
return self._make_url_result(video['video_id'], video.get('catalog'))
return self._extract_video(video['video_id'], video.get('catalog'))
class FranceTVInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = 'francetvinfo.fr'
_VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/(?:[^/]+/)*(?P<id>[^/?#&.]+)'
_VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/(?:[^/]+/)*(?P<title>[^/?#&.]+)'
_TESTS = [{
'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
@ -328,18 +217,51 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
},
},
'params': {
# m3u8 downloads
'skip_download': True,
},
'add_ie': [FranceTVIE.ie_key()],
}, {
'url': 'http://www.francetvinfo.fr/elections/europeennes/direct-europeennes-regardez-le-debat-entre-les-candidats-a-la-presidence-de-la-commission_600639.html',
'only_matching': True,
'info_dict': {
'id': 'EV_20019',
'ext': 'mp4',
'title': 'Débat des candidats à la Commission européenne',
'description': 'Débat des candidats à la Commission européenne',
},
'params': {
'skip_download': 'HLS (reqires ffmpeg)'
},
'skip': 'Ce direct est terminé et sera disponible en rattrapage dans quelques minutes.',
}, {
'url': 'http://www.francetvinfo.fr/economie/entreprises/les-entreprises-familiales-le-secret-de-la-reussite_933271.html',
'only_matching': True,
'md5': 'f485bda6e185e7d15dbc69b72bae993e',
'info_dict': {
'id': 'NI_173343',
'ext': 'mp4',
'title': 'Les entreprises familiales : le secret de la réussite',
'thumbnail': r're:^https?://.*\.jpe?g$',
'timestamp': 1433273139,
'upload_date': '20150602',
},
'params': {
# m3u8 downloads
'skip_download': True,
},
}, {
'url': 'http://france3-regions.francetvinfo.fr/bretagne/cotes-d-armor/thalassa-echappee-breizh-ce-venredi-dans-les-cotes-d-armor-954961.html',
'only_matching': True,
'md5': 'f485bda6e185e7d15dbc69b72bae993e',
'info_dict': {
'id': 'NI_657393',
'ext': 'mp4',
'title': 'Olivier Monthus, réalisateur de "Bretagne, le choix de lArmor"',
'description': 'md5:a3264114c9d29aeca11ced113c37b16c',
'thumbnail': r're:^https?://.*\.jpe?g$',
'timestamp': 1458300695,
'upload_date': '20160318',
},
'params': {
'skip_download': True,
},
}, {
# Dailymotion embed
'url': 'http://www.francetvinfo.fr/politique/notre-dame-des-landes/video-sur-france-inter-cecile-duflot-denonce-le-regard-meprisant-de-patrick-cohen_1520091.html',
@ -361,9 +283,9 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
dailymotion_urls = DailymotionIE._extract_urls(webpage)
if dailymotion_urls:
@ -375,13 +297,12 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
(r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'),
webpage, 'video id').split('@')
return self._make_url_result(video_id, catalogue)
return self._extract_video(video_id, catalogue)
class GenerationWhatIE(InfoExtractor):
IE_NAME = 'france2.fr:generation-what'
_VALID_URL = r'https?://generation-what\.francetv\.fr/[^/]+/video/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://generation-what\.francetv\.fr/[^/]+/video/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://generation-what.francetv.fr/portrait/video/present-arms',
@ -393,10 +314,6 @@ class GenerationWhatIE(InfoExtractor):
'uploader_id': 'UCHH9p1eetWCgt4kXBYCb3_w',
'upload_date': '20160411',
},
'params': {
'skip_download': True,
},
'add_ie': ['Youtube'],
}, {
'url': 'http://generation-what.francetv.fr/europe/video/present-arms',
'only_matching': True,
@ -404,87 +321,42 @@ class GenerationWhatIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
youtube_id = self._search_regex(
r"window\.videoURL\s*=\s*'([0-9A-Za-z_-]{11})';",
webpage, 'youtube id')
return self.url_result(youtube_id, ie='Youtube', video_id=youtube_id)
return self.url_result(youtube_id, 'Youtube', youtube_id)
class CultureboxIE(FranceTVBaseInfoExtractor):
_VALID_URL = r'https?://(?:m\.)?culturebox\.francetvinfo\.fr/(?:[^/]+/)*(?P<id>[^/?#&]+)'
IE_NAME = 'culturebox.francetvinfo.fr'
_VALID_URL = r'https?://(?:m\.)?culturebox\.francetvinfo\.fr/(?P<name>.*?)(\?|$)'
_TESTS = [{
'url': 'https://culturebox.francetvinfo.fr/opera-classique/musique-classique/c-est-baroque/concerts/cantates-bwv-4-106-et-131-de-bach-par-raphael-pichon-57-268689',
_TEST = {
'url': 'http://culturebox.francetvinfo.fr/live/musique/musique-classique/le-livre-vermeil-de-montserrat-a-la-cathedrale-delne-214511',
'md5': '9b88dc156781c4dbebd4c3e066e0b1d6',
'info_dict': {
'id': 'EV_134885',
'ext': 'mp4',
'title': 'Cantates BWV 4, 106 et 131 de Bach par Raphaël Pichon 5/7',
'description': 'md5:19c44af004b88219f4daa50fa9a351d4',
'upload_date': '20180206',
'timestamp': 1517945220,
'duration': 5981,
'id': 'EV_50111',
'ext': 'flv',
'title': "Le Livre Vermeil de Montserrat à la Cathédrale d'Elne",
'description': 'md5:f8a4ad202e8fe533e2c493cc12e739d9',
'upload_date': '20150320',
'timestamp': 1426892400,
'duration': 2760.9,
},
'params': {
'skip_download': True,
},
'add_ie': [FranceTVIE.ie_key()],
}]
}
def _real_extract(self, url):
display_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, display_id)
webpage = self._download_webpage(url, name)
if ">Ce live n'est plus disponible en replay<" in webpage:
raise ExtractorError(
'Video %s is not available' % display_id, expected=True)
raise ExtractorError('Video %s is not available' % name, expected=True)
video_id, catalogue = self._search_regex(
r'["\'>]https?://videos\.francetv\.fr/video/([^@]+@.+?)["\'<]',
webpage, 'video id').split('@')
return self._make_url_result(video_id, catalogue)
class FranceTVJeunesseIE(FranceTVBaseInfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?(?:zouzous|ludo)\.fr/heros/(?P<id>[^/?#&]+))'
_TESTS = [{
'url': 'https://www.zouzous.fr/heros/simon',
'info_dict': {
'id': 'simon',
},
'playlist_count': 9,
}, {
'url': 'https://www.ludo.fr/heros/ninjago',
'info_dict': {
'id': 'ninjago',
},
'playlist_count': 10,
}, {
'url': 'https://www.zouzous.fr/heros/simon?abc',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
playlist = self._download_json(
'%s/%s' % (mobj.group('url'), 'playlist'), playlist_id)
if not playlist.get('count'):
raise ExtractorError(
'%s is not available' % playlist_id, expected=True)
entries = []
for item in playlist['items']:
identity = item.get('identity')
if identity and isinstance(identity, compat_str):
entries.append(self._make_url_result(identity))
return self.playlist_result(entries, playlist_id)
return self._extract_video(video_id, catalogue)

View File

@ -1,102 +1,43 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .nexx import NexxIE
from ..utils import int_or_none
from ..utils import extract_attributes
class FunkBaseIE(InfoExtractor):
def _make_url_result(self, video):
return {
'_type': 'url_transparent',
'url': 'nexx:741:%s' % video['sourceId'],
'ie_key': NexxIE.ie_key(),
'id': video['sourceId'],
'title': video.get('title'),
'description': video.get('description'),
'duration': int_or_none(video.get('duration')),
'season_number': int_or_none(video.get('seasonNr')),
'episode_number': int_or_none(video.get('episodeNr')),
}
class FunkMixIE(FunkBaseIE):
_VALID_URL = r'https?://(?:www\.)?funk\.net/mix/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
class FunkIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funk\.net/(?:mix|channel)/(?:[^/]+/)*(?P<id>[^?/#]+)'
_TESTS = [{
'url': 'https://www.funk.net/mix/59d65d935f8b160001828b5b/die-realste-kifferdoku-aller-zeiten',
'md5': '8edf617c2f2b7c9847dfda313f199009',
'url': 'https://www.funk.net/mix/59d65d935f8b160001828b5b/0/59d517e741dca10001252574/',
'md5': '4d40974481fa3475f8bccfd20c5361f8',
'info_dict': {
'id': '123748',
'id': '716599',
'ext': 'mp4',
'title': '"Die realste Kifferdoku aller Zeiten"',
'description': 'md5:c97160f5bafa8d47ec8e2e461012aa9d',
'timestamp': 1490274721,
'upload_date': '20170323',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
mix_id = mobj.group('id')
alias = mobj.group('alias')
lists = self._download_json(
'https://www.funk.net/api/v3.1/curation/curatedLists/',
mix_id, headers={
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbC12Mi4wIiwic2NvcGUiOiJzdGF0aWMtY29udGVudC1hcGksY3VyYXRpb24tc2VydmljZSxzZWFyY2gtYXBpIn0.SGCC1IXHLtZYoo8PvRKlU2gXH1su8YSu47sB3S4iXBI',
'Referer': url,
}, query={
'size': 100,
})['result']['lists']
metas = next(
l for l in lists
if mix_id in (l.get('entityId'), l.get('alias')))['videoMetas']
video = next(
meta['videoDataDelegate']
for meta in metas if meta.get('alias') == alias)
return self._make_url_result(video)
class FunkChannelIE(FunkBaseIE):
_VALID_URL = r'https?://(?:www\.)?funk\.net/channel/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.funk.net/channel/ba/die-lustigsten-instrumente-aus-dem-internet-teil-2',
'info_dict': {
'id': '1155821',
'ext': 'mp4',
'title': 'Die LUSTIGSTEN INSTRUMENTE aus dem Internet - Teil 2',
'description': 'md5:a691d0413ef4835588c5b03ded670c1f',
'timestamp': 1514507395,
'upload_date': '20171229',
'title': 'Neue Rechte Welle',
'description': 'md5:a30a53f740ffb6bfd535314c2cc5fb69',
'timestamp': 1501337639,
'upload_date': '20170729',
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/mein-erster-job-lovemilla-folge-1/lovemilla/',
'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/0/59d52049999264000182e79d/',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
channel_id = mobj.group('id')
alias = mobj.group('alias')
video_id = self._match_id(url)
results = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/filter', channel_id,
headers={
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbCIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxzZWFyY2gtYXBpIn0.q4Y2xZG8PFHai24-4Pjx2gym9RmJejtmK6lMXP5wAgc',
'Referer': url,
}, query={
'channelId': channel_id,
'size': 100,
})['result']
webpage = self._download_webpage(url, video_id)
video = next(r for r in results if r.get('alias') == alias)
domain_id = NexxIE._extract_domain_id(webpage) or '741'
nexx_id = extract_attributes(self._search_regex(
r'(<div[^>]id=["\']mediaplayer-funk[^>]+>)',
webpage, 'media player'))['data-id']
return self._make_url_result(video)
return self.url_result(
'nexx:%s:%s' % (domain_id, nexx_id), ie=NexxIE.ie_key(),
video_id=nexx_id)

View File

@ -5,9 +5,9 @@ from .ooyala import OoyalaIE
class FusionIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fusion\.(?:net|tv)/video/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?fusion\.net/video/(?P<id>\d+)'
_TESTS = [{
'url': 'http://fusion.tv/video/201781/u-s-and-panamanian-forces-work-together-to-stop-a-vessel-smuggling-drugs/',
'url': 'http://fusion.net/video/201781/u-s-and-panamanian-forces-work-together-to-stop-a-vessel-smuggling-drugs/',
'info_dict': {
'id': 'ZpcWNoMTE6x6uVIIWYpHh0qQDjxBuq5P',
'ext': 'mp4',
@ -20,7 +20,7 @@ class FusionIE(InfoExtractor):
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://fusion.tv/video/201781',
'url': 'http://fusion.net/video/201781',
'only_matching': True,
}]

View File

@ -23,11 +23,6 @@ class GameInformerIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(
url, display_id, headers=self.geo_verification_headers())
brightcove_id = self._search_regex(
[r'<[^>]+\bid=["\']bc_(\d+)', r"getVideo\('[^']+video_id=(\d+)"],
webpage, 'brightcove id')
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew',
brightcove_id)
webpage = self._download_webpage(url, display_id)
brightcove_id = self._search_regex(r"getVideo\('[^']+video_id=(\d+)", webpage, 'brightcove id')
return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)

View File

@ -1,8 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
@ -11,52 +9,44 @@ from ..utils import (
class GameStarIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?game(?P<site>pro|star)\.de/videos/.*,(?P<id>[0-9]+)\.html'
_TESTS = [{
_VALID_URL = r'https?://(?:www\.)?gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
'md5': 'ee782f1f8050448c95c5cacd63bc851c',
'md5': '96974ecbb7fd8d0d20fca5a00810cea7',
'info_dict': {
'id': '76110',
'ext': 'mp4',
'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil',
'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den...',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1406542380,
'timestamp': 1406542020,
'upload_date': '20140728',
'duration': 17,
'duration': 17
}
}
}, {
'url': 'http://www.gamepro.de/videos/top-10-indie-spiele-fuer-nintendo-switch-video-tolle-nindies-games-zum-download,95316.html',
'only_matching': True,
}, {
'url': 'http://www.gamestar.de/videos/top-10-indie-spiele-fuer-nintendo-switch-video-tolle-nindies-games-zum-download,95316.html',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
site = mobj.group('site')
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
url = 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id
# TODO: there are multiple ld+json objects in the webpage,
# while _search_json_ld finds only the first one
json_ld = self._parse_json(self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>[^<]+VideoObject[^<]+)</script>',
webpage, 'JSON-LD', group='json_ld'), video_id)
info_dict = self._json_ld(json_ld, video_id)
info_dict['title'] = remove_end(
info_dict['title'], ' - Game%s' % site.title())
info_dict['title'] = remove_end(info_dict['title'], ' - GameStar')
view_count = int_or_none(json_ld.get('interactionCount'))
view_count = json_ld.get('interactionCount')
comment_count = int_or_none(self._html_search_regex(
r'<span>Kommentare</span>\s*<span[^>]+class=["\']count[^>]+>\s*\(\s*([0-9]+)',
webpage, 'comment count', fatal=False))
r'([0-9]+) Kommentare</span>', webpage, 'comment_count',
fatal=False))
info_dict.update({
'id': video_id,
'url': 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id,
'url': url,
'ext': 'mp4',
'view_count': view_count,
'comment_count': comment_count

View File

@ -101,10 +101,6 @@ from .vzaar import VzaarIE
from .channel9 import Channel9IE
from .vshare import VShareIE
from .mediasite import MediasiteIE
from .springboardplatform import SpringboardPlatformIE
from .yapfiles import YapFilesIE
from .vice import ViceIE
from .xfileshare import XFileShareIE
class GenericIE(InfoExtractor):
@ -1270,6 +1266,24 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Kaltura'],
},
# EaglePlatform embed (generic URL)
{
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
'info_dict': {
'id': '227304',
'ext': 'mp4',
'title': 'Навальный вышел на свободу',
'description': 'md5:d97861ac9ae77377f3f20eaf9d04b4f5',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 87,
'view_count': int,
'age_limit': 0,
},
'params': {
'skip_download': True,
},
},
# referrer protected EaglePlatform embed
{
'url': 'https://tvrain.ru/lite/teleshow/kak_vse_nachinalos/namin-418921/',
@ -1924,49 +1938,6 @@ class GenericIE(InfoExtractor):
'timestamp': 1474354800,
'upload_date': '20160920',
}
},
{
'url': 'http://www.kidzworld.com/article/30935-trolls-the-beat-goes-on-interview-skylar-astin-and-amanda-leighton',
'info_dict': {
'id': '1731611',
'ext': 'mp4',
'title': 'Official Trailer | TROLLS: THE BEAT GOES ON!',
'description': 'md5:eb5f23826a027ba95277d105f248b825',
'timestamp': 1516100691,
'upload_date': '20180116',
},
'params': {
'skip_download': True,
},
'add_ie': [SpringboardPlatformIE.ie_key()],
},
{
'url': 'https://www.youtube.com/shared?ci=1nEzmT-M4fU',
'info_dict': {
'id': 'uPDB5I9wfp8',
'ext': 'webm',
'title': 'Pocoyo: 90 minutos de episódios completos Português para crianças - PARTE 3',
'description': 'md5:d9e4d9346a2dfff4c7dc4c8cec0f546d',
'upload_date': '20160219',
'uploader': 'Pocoyo - Português (BR)',
'uploader_id': 'PocoyoBrazil',
},
'add_ie': [YoutubeIE.ie_key()],
'params': {
'skip_download': True,
},
},
{
'url': 'https://www.yapfiles.ru/show/1872528/690b05d3054d2dbe1e69523aa21bb3b1.mp4.html',
'info_dict': {
'id': 'vMDE4NzI1Mjgt690b',
'ext': 'mp4',
'title': 'Котята',
},
'add_ie': [YapFilesIE.ie_key()],
'params': {
'skip_download': True,
},
}
# {
# # TODO: find another test
@ -2214,11 +2185,7 @@ class GenericIE(InfoExtractor):
self._sort_formats(smil['formats'])
return smil
elif doc.tag == '{http://xspf.org/ns/0/}playlist':
return self.playlist_result(
self._parse_xspf(
doc, video_id, xspf_url=url,
xspf_base_url=compat_str(full_response.geturl())),
video_id)
return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats(
doc,
@ -2297,10 +2264,7 @@ class GenericIE(InfoExtractor):
# Look for Brightcove New Studio embeds
bc_urls = BrightcoveNewIE._extract_urls(self, webpage)
if bc_urls:
return self.playlist_from_matches(
bc_urls, video_id, video_title,
getter=lambda x: smuggle_url(x, {'referrer': url}),
ie='BrightcoveNew')
return self.playlist_from_matches(bc_urls, video_id, video_title, ie='BrightcoveNew')
# Look for Nexx embeds
nexx_urls = NexxIE._extract_urls(webpage)
@ -2744,9 +2708,9 @@ class GenericIE(InfoExtractor):
return self.url_result(viewlift_url)
# Look for JWPlatform embeds
jwplatform_urls = JWPlatformIE._extract_urls(webpage)
if jwplatform_urls:
return self.playlist_from_matches(jwplatform_urls, video_id, video_title, ie=JWPlatformIE.ie_key())
jwplatform_url = JWPlatformIE._extract_url(webpage)
if jwplatform_url:
return self.url_result(jwplatform_url, 'JWPlatform')
# Look for Digiteka embeds
digiteka_url = DigitekaIE._extract_url(webpage)
@ -2942,27 +2906,6 @@ class GenericIE(InfoExtractor):
for mediasite_url in mediasite_urls]
return self.playlist_result(entries, video_id, video_title)
springboardplatform_urls = SpringboardPlatformIE._extract_urls(webpage)
if springboardplatform_urls:
return self.playlist_from_matches(
springboardplatform_urls, video_id, video_title,
ie=SpringboardPlatformIE.ie_key())
yapfiles_urls = YapFilesIE._extract_urls(webpage)
if yapfiles_urls:
return self.playlist_from_matches(
yapfiles_urls, video_id, video_title, ie=YapFilesIE.ie_key())
vice_urls = ViceIE._extract_urls(webpage)
if vice_urls:
return self.playlist_from_matches(
vice_urls, video_id, video_title, ie=ViceIE.ie_key())
xfileshare_urls = XFileShareIE._extract_urls(webpage)
if xfileshare_urls:
return self.playlist_from_matches(
xfileshare_urls, video_id, video_title, ie=XFileShareIE.ie_key())
def merge_dicts(dict1, dict2):
merged = {}
for k, v in dict1.items():

View File

@ -2,14 +2,11 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .kaltura import KalturaIE
from .youtube import YoutubeIE
from ..utils import (
determine_ext,
int_or_none,
NO_DEFAULT,
parse_iso8601,
smuggle_url,
xpath_text,
)
@ -17,19 +14,18 @@ from ..utils import (
class HeiseIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?heise\.de/(?:[^/]+/)+[^/]+-(?P<id>[0-9]+)\.html'
_TESTS = [{
# kaltura embed
'url': 'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html',
'md5': 'ffed432483e922e88545ad9f2f15d30e',
'info_dict': {
'id': '1_kkrq94sm',
'id': '2404147',
'ext': 'mp4',
'title': "Podcast: c't uplink 3.3 Owncloud / Tastaturen / Peilsender Smartphone",
'timestamp': 1512734959,
'upload_date': '20171208',
'format_id': 'mp4_720p',
'timestamp': 1411812600,
'upload_date': '20140927',
'description': 'md5:c934cbfb326c669c2bcabcbe3d3fcd20',
},
'params': {
'skip_download': True,
},
'thumbnail': r're:^https?://.*/gallery/$',
}
}, {
# YouTube embed
'url': 'http://www.heise.de/newsticker/meldung/Netflix-In-20-Jahren-vom-Videoverleih-zum-TV-Revolutionaer-3814130.html',
@ -46,32 +42,6 @@ class HeiseIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.heise.de/video/artikel/nachgehakt-Wie-sichert-das-c-t-Tool-Restric-tor-Windows-10-ab-3700244.html',
'info_dict': {
'id': '1_ntrmio2s',
'ext': 'mp4',
'title': "nachgehakt: Wie sichert das c't-Tool Restric'tor Windows 10 ab?",
'description': 'md5:47e8ffb6c46d85c92c310a512d6db271',
'timestamp': 1512470717,
'upload_date': '20171205',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.heise.de/ct/artikel/c-t-uplink-20-8-Staubsaugerroboter-Xiaomi-Vacuum-2-AR-Brille-Meta-2-und-Android-rooten-3959893.html',
'info_dict': {
'id': '1_59mk80sf',
'ext': 'mp4',
'title': "c't uplink 20.8: Staubsaugerroboter Xiaomi Vacuum 2, AR-Brille Meta 2 und Android rooten",
'description': 'md5:f50fe044d3371ec73a8f79fcebd74afc',
'timestamp': 1517567237,
'upload_date': '20180202',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.heise.de/ct/artikel/c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2403911.html',
'only_matching': True,
@ -87,45 +57,19 @@ class HeiseIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
def extract_title(default=NO_DEFAULT):
title = self._html_search_meta(
('fulltitle', 'title'), webpage, default=None)
title = self._html_search_meta('fulltitle', webpage, default=None)
if not title or title == "c't":
title = self._search_regex(
r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
webpage, 'title', default=None)
if not title:
title = self._html_search_regex(
r'<h1[^>]+\bclass=["\']article_page_title[^>]+>(.+?)<',
webpage, 'title', default=default)
return title
title = extract_title(default=None)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'description', webpage)
kaltura_url = KalturaIE._extract_url(webpage)
if kaltura_url:
return {
'_type': 'url_transparent',
'url': smuggle_url(kaltura_url, {'source_url': url}),
'ie_key': KalturaIE.ie_key(),
'title': title,
'description': description,
}
webpage, 'title')
yt_urls = YoutubeIE._extract_urls(webpage)
if yt_urls:
return self.playlist_from_matches(
yt_urls, video_id, title, ie=YoutubeIE.ie_key())
title = extract_title()
return self.playlist_from_matches(yt_urls, video_id, title, ie=YoutubeIE.ie_key())
container_id = self._search_regex(
r'<div class="videoplayerjw"[^>]+data-container="([0-9]+)"',
webpage, 'container ID')
sequenz_id = self._search_regex(
r'<div class="videoplayerjw"[^>]+data-sequenz="([0-9]+)"',
webpage, 'sequenz ID')
@ -151,6 +95,10 @@ class HeiseIE(InfoExtractor):
})
self._sort_formats(formats)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'description', webpage)
return {
'id': video_id,
'title': title,

View File

@ -1,96 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
urlencode_postdata,
)
class HiDiveIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hidive\.com/stream/(?P<title>[^/]+)/(?P<key>[^/?#&]+)'
# Using X-Forwarded-For results in 403 HTTP error for HLS fragments,
# so disabling geo bypass completely
_GEO_BYPASS = False
_TESTS = [{
'url': 'https://www.hidive.com/stream/the-comic-artist-and-his-assistants/s01e001',
'info_dict': {
'id': 'the-comic-artist-and-his-assistants/s01e001',
'ext': 'mp4',
'title': 'the-comic-artist-and-his-assistants/s01e001',
'series': 'the-comic-artist-and-his-assistants',
'season_number': 1,
'episode_number': 1,
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title, key = mobj.group('title', 'key')
video_id = '%s/%s' % (title, key)
settings = self._download_json(
'https://www.hidive.com/play/settings', video_id,
data=urlencode_postdata({
'Title': title,
'Key': key,
}))
restriction = settings.get('restrictionReason')
if restriction == 'RegionRestricted':
self.raise_geo_restricted()
if restriction and restriction != 'None':
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, restriction), expected=True)
formats = []
subtitles = {}
for rendition_id, rendition in settings['renditions'].items():
bitrates = rendition.get('bitrates')
if not isinstance(bitrates, dict):
continue
m3u8_url = bitrates.get('hls')
if not isinstance(m3u8_url, compat_str):
continue
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='%s-hls' % rendition_id, fatal=False))
cc_files = rendition.get('ccFiles')
if not isinstance(cc_files, list):
continue
for cc_file in cc_files:
if not isinstance(cc_file, list) or len(cc_file) < 3:
continue
cc_lang = cc_file[0]
cc_url = cc_file[2]
if not isinstance(cc_lang, compat_str) or not isinstance(
cc_url, compat_str):
continue
subtitles.setdefault(cc_lang, []).append({
'url': cc_url,
})
season_number = int_or_none(self._search_regex(
r's(\d+)', key, 'season number', default=None))
episode_number = int_or_none(self._search_regex(
r'e(\d+)', key, 'episode number', default=None))
return {
'id': video_id,
'title': video_id,
'subtitles': subtitles,
'formats': formats,
'series': title,
'season_number': season_number,
'episode_number': episode_number,
}

View File

@ -1,7 +1,8 @@
from __future__ import unicode_literals
import base64
from .common import InfoExtractor
from ..compat import compat_b64decode
from ..utils import (
ExtractorError,
HEADRequest,
@ -47,7 +48,7 @@ class HotNewHipHopIE(InfoExtractor):
if 'mediaKey' not in mkd:
raise ExtractorError('Did not get a media key')
redirect_url = compat_b64decode(video_url_base64).decode('utf-8')
redirect_url = base64.b64decode(video_url_base64).decode('utf-8')
redirect_req = HEADRequest(redirect_url)
req = self._request_webpage(
redirect_req, video_id,

View File

@ -2,8 +2,9 @@
from __future__ import unicode_literals
import base64
from ..compat import (
compat_b64decode,
compat_urllib_parse_unquote,
compat_urlparse,
)
@ -60,7 +61,7 @@ class InfoQIE(BokeCCBaseIE):
encoded_id = self._search_regex(
r"jsclassref\s*=\s*'([^']*)'", webpage, 'encoded id', default=None)
real_id = compat_urllib_parse_unquote(compat_b64decode(encoded_id).decode('utf-8'))
real_id = compat_urllib_parse_unquote(base64.b64decode(encoded_id.encode('ascii')).decode('utf-8'))
playpath = 'mp4:' + real_id
return [{

View File

@ -1,7 +1,6 @@
from __future__ import unicode_literals
import itertools
import json
import re
from .common import InfoExtractor
@ -239,34 +238,36 @@ class InstagramUserIE(InfoExtractor):
}
def _entries(self, uploader_id):
def get_count(suffix):
query = {
'__a': 1,
}
def get_count(kind):
return int_or_none(try_get(
node, lambda x: x['edge_media_' + suffix]['count']))
node, lambda x: x['%ss' % kind]['count']))
cursor = ''
for page_num in itertools.count(1):
media = self._download_json(
'https://www.instagram.com/graphql/query/', uploader_id,
'Downloading JSON page %d' % page_num, query={
'query_hash': '472f257a40c653c64c666ce877d59d2b',
'variables': json.dumps({
'id': uploader_id,
'first': 100,
'after': cursor,
})
})['data']['user']['edge_owner_to_timeline_media']
edges = media.get('edges')
if not edges or not isinstance(edges, list):
page = self._download_json(
'https://instagram.com/%s/' % uploader_id, uploader_id,
note='Downloading page %d' % page_num,
fatal=False, query=query)
if not page:
break
for edge in edges:
node = edge.get('node')
if not node or not isinstance(node, dict):
continue
nodes = try_get(page, lambda x: x['user']['media']['nodes'], list)
if not nodes:
break
max_id = None
for node in nodes:
node_id = node.get('id')
if node_id:
max_id = node_id
if node.get('__typename') != 'GraphVideo' and node.get('is_video') is not True:
continue
video_id = node.get('shortcode')
video_id = node.get('code')
if not video_id:
continue
@ -275,14 +276,14 @@ class InstagramUserIE(InfoExtractor):
ie=InstagramIE.ie_key(), video_id=video_id)
description = try_get(
node, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
node, [lambda x: x['caption'], lambda x: x['text']['id']],
compat_str)
thumbnail = node.get('thumbnail_src') or node.get('display_src')
timestamp = int_or_none(node.get('taken_at_timestamp'))
timestamp = int_or_none(node.get('date'))
comment_count = get_count('to_comment')
like_count = get_count('preview_like')
view_count = int_or_none(node.get('video_view_count'))
comment_count = get_count('comment')
like_count = get_count('like')
view_count = int_or_none(node.get('video_views'))
info.update({
'description': description,
@ -295,23 +296,12 @@ class InstagramUserIE(InfoExtractor):
yield info
page_info = media.get('page_info')
if not page_info or not isinstance(page_info, dict):
if not max_id:
break
has_next_page = page_info.get('has_next_page')
if not has_next_page:
break
cursor = page_info.get('end_cursor')
if not cursor or not isinstance(cursor, compat_str):
break
query['max_id'] = max_id
def _real_extract(self, url):
username = self._match_id(url)
uploader_id = self._download_json(
'https://instagram.com/%s/' % username, username, query={
'__a': 1,
})['graphql']['user']['id']
uploader_id = self._match_id(url)
return self.playlist_result(
self._entries(uploader_id), username, username)
self._entries(uploader_id), uploader_id, uploader_id)

View File

@ -23,14 +23,11 @@ class JWPlatformIE(InfoExtractor):
@staticmethod
def _extract_url(webpage):
urls = JWPlatformIE._extract_urls(webpage)
return urls[0] if urls else None
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})',
mobj = re.search(
r'<(?:script|iframe)[^>]+?src=["\'](?P<url>(?:https?:)?//content.jwplatform.com/players/[a-zA-Z0-9]{8})',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@ -0,0 +1,71 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
qualities,
)
class KamcordIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?kamcord\.com/v/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.kamcord.com/v/hNYRduDgWb4',
'md5': 'c3180e8a9cfac2e86e1b88cb8751b54c',
'info_dict': {
'id': 'hNYRduDgWb4',
'ext': 'mp4',
'title': 'Drinking Madness',
'uploader': 'jacksfilms',
'uploader_id': '3044562',
'view_count': int,
'like_count': int,
'comment_count': int,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video = self._parse_json(
self._search_regex(
r'window\.__props\s*=\s*({.+?});?(?:\n|\s*</script)',
webpage, 'video'),
video_id)['video']
title = video['title']
formats = self._extract_m3u8_formats(
video['play']['hls'], video_id, 'mp4', entry_protocol='m3u8_native')
self._sort_formats(formats)
uploader = video.get('user', {}).get('username')
uploader_id = video.get('user', {}).get('id')
view_count = int_or_none(video.get('viewCount'))
like_count = int_or_none(video.get('heartCount'))
comment_count = int_or_none(video.get('messageCount'))
preference_key = qualities(('small', 'medium', 'large'))
thumbnails = [{
'url': thumbnail_url,
'id': thumbnail_id,
'preference': preference_key(thumbnail_id),
} for thumbnail_id, thumbnail_url in (video.get('thumbnail') or {}).items()
if isinstance(thumbnail_id, compat_str) and isinstance(thumbnail_url, compat_str)]
return {
'id': video_id,
'title': title,
'uploader': uploader,
'uploader_id': uploader_id,
'view_count': view_count,
'like_count': like_count,
'comment_count': comment_count,
'thumbnails': thumbnails,
'formats': formats,
}

View File

@ -49,9 +49,7 @@ class LA7IE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
player_data = self._parse_json(
self._search_regex(
[r'(?s)videoParams\s*=\s*({.+?});', r'videoLa7\(({[^;]+})\);'],
webpage, 'player data'),
self._search_regex(r'videoLa7\(({[^;]+})\);', webpage, 'player data'),
video_id, transform_source=js_to_json)
return {

View File

@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import datetime
import hashlib
import re
@ -8,7 +9,6 @@ import time
from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_ord,
compat_str,
compat_urllib_parse_urlencode,
@ -329,7 +329,7 @@ class LetvCloudIE(InfoExtractor):
raise ExtractorError('Letv cloud returned an unknwon error')
def b64decode(s):
return compat_b64decode(s).decode('utf-8')
return base64.b64decode(s.encode('utf-8')).decode('utf-8')
formats = []
for media in play_json['data']['video_info']['media'].values():

View File

@ -1,53 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class LentaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lenta\.ru/[^/]+/\d+/\d+/\d+/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://lenta.ru/news/2018/03/22/savshenko_go/',
'info_dict': {
'id': '964400',
'ext': 'mp4',
'title': 'Надежду Савченко задержали',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 61,
'view_count': int,
},
'params': {
'skip_download': True,
},
}, {
# EaglePlatform iframe embed
'url': 'http://lenta.ru/news/2015/03/06/navalny/',
'info_dict': {
'id': '227304',
'ext': 'mp4',
'title': 'Навальный вышел на свободу',
'description': 'md5:d97861ac9ae77377f3f20eaf9d04b4f5',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 87,
'view_count': int,
'age_limit': 0,
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'vid\s*:\s*["\']?(\d+)', webpage, 'eagleplatform id',
default=None)
if video_id:
return self.url_result(
'eagleplatform:lentaru.media.eagleplatform.com:%s' % video_id,
ie='EaglePlatform', video_id=video_id)
return self.url_result(url, ie='Generic')

View File

@ -1,28 +1,24 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
unified_strdate,
)
from ..utils import unified_strdate
class LibsynIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://html5-player\.libsyn\.com/embed/episode/id/(?P<id>[0-9]+))'
_TESTS = [{
'url': 'http://html5-player.libsyn.com/embed/episode/id/6385796/',
'md5': '2a55e75496c790cdeb058e7e6c087746',
'url': 'http://html5-player.libsyn.com/embed/episode/id/3377616/',
'md5': '443360ee1b58007bc3dcf09b41d093bb',
'info_dict': {
'id': '6385796',
'id': '3377616',
'ext': 'mp3',
'title': "Champion Minded - Developing a Growth Mindset",
'description': 'In this episode, Allistair talks about the importance of developing a growth mindset, not only in sports, but in life too.',
'upload_date': '20180320',
'title': "The Daily Show Podcast without Jon Stewart - Episode 12: Bassem Youssef: Egypt's Jon Stewart",
'description': 'md5:601cb790edd05908957dae8aaa866465',
'upload_date': '20150220',
'thumbnail': 're:^https?://.*',
},
}, {
@ -43,45 +39,31 @@ class LibsynIE(InfoExtractor):
url = m.group('mainurl')
webpage = self._download_webpage(url, video_id)
formats = [{
'url': media_url,
} for media_url in set(re.findall(r'var\s+mediaURL(?:Libsyn)?\s*=\s*"([^"]+)"', webpage))]
podcast_title = self._search_regex(
r'<h3>([^<]+)</h3>', webpage, 'podcast title', default=None)
if podcast_title:
podcast_title = podcast_title.strip()
r'<h2>([^<]+)</h2>', webpage, 'podcast title', default=None)
episode_title = self._search_regex(
r'(?:<div class="episode-title">|<h4>)([^<]+)</', webpage, 'episode title')
if episode_title:
episode_title = episode_title.strip()
r'(?:<div class="episode-title">|<h3>)([^<]+)</', webpage, 'episode title')
title = '%s - %s' % (podcast_title, episode_title) if podcast_title else episode_title
description = self._html_search_regex(
r'<p\s+id="info_text_body">(.+?)</p>', webpage,
r'<div id="info_text_body">(.+?)</div>', webpage,
'description', default=None)
if description:
# Strip non-breaking and normal spaces
description = description.replace('\u00A0', ' ').strip()
thumbnail = self._search_regex(
r'<img[^>]+class="info-show-icon"[^>]+src="([^"]+)"',
webpage, 'thumbnail', fatal=False)
release_date = unified_strdate(self._search_regex(
r'<div class="release_date">Released: ([^<]+)<', webpage, 'release date', fatal=False))
data_json = self._search_regex(r'var\s+playlistItem\s*=\s*(\{.*?\});\n', webpage, 'JSON data block')
data = json.loads(data_json)
formats = [{
'url': data['media_url'],
'format_id': 'main',
}, {
'url': data['media_url_libsyn'],
'format_id': 'libsyn',
}]
thumbnail = data.get('thumbnail_url')
duration = parse_duration(data.get('duration'))
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': release_date,
'duration': duration,
'formats': formats,
}

View File

@ -10,7 +10,6 @@ from ..utils import (
float_or_none,
int_or_none,
smuggle_url,
try_get,
unsmuggle_url,
ExtractorError,
)
@ -221,12 +220,6 @@ class LimelightBaseIE(InfoExtractor):
'subtitles': subtitles,
}
def _extract_info_helper(self, pc, mobile, i, metadata):
return self._extract_info(
try_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or [],
try_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or [],
metadata)
class LimelightMediaIE(LimelightBaseIE):
IE_NAME = 'limelight'
@ -289,7 +282,10 @@ class LimelightMediaIE(LimelightBaseIE):
'getMobilePlaylistByMediaId', 'properties',
smuggled_data.get('source_url'))
return self._extract_info_helper(pc, mobile, 0, metadata)
return self._extract_info(
pc['playlistItems'][0].get('streams', []),
mobile['mediaList'][0].get('mobileUrls', []) if mobile else [],
metadata)
class LimelightChannelIE(LimelightBaseIE):
@ -330,7 +326,10 @@ class LimelightChannelIE(LimelightBaseIE):
'media', smuggled_data.get('source_url'))
entries = [
self._extract_info_helper(pc, mobile, i, medias['media_list'][i])
self._extract_info(
pc['playlistItems'][i].get('streams', []),
mobile['mediaList'][i].get('mobileUrls', []) if mobile else [],
medias['media_list'][i])
for i in range(len(medias['media_list']))]
return self.playlist_result(entries, channel_id, pc['title'])

View File

@ -1,90 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import js_to_json
class LineTVIE(InfoExtractor):
_VALID_URL = r'https?://tv\.line\.me/v/(?P<id>\d+)_[^/]+-(?P<segment>ep\d+-\d+)'
_TESTS = [{
'url': 'https://tv.line.me/v/793123_goodbye-mrblack-ep1-1/list/69246',
'info_dict': {
'id': '793123_ep1-1',
'ext': 'mp4',
'title': 'Goodbye Mr.Black | EP.1-1',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 998.509,
'view_count': int,
},
}, {
'url': 'https://tv.line.me/v/2587507_%E6%B4%BE%E9%81%A3%E5%A5%B3%E9%86%ABx-ep1-02/list/185245',
'only_matching': True,
}]
def _real_extract(self, url):
series_id, segment = re.match(self._VALID_URL, url).groups()
video_id = '%s_%s' % (series_id, segment)
webpage = self._download_webpage(url, video_id)
player_params = self._parse_json(self._search_regex(
r'naver\.WebPlayer\(({[^}]+})\)', webpage, 'player parameters'),
video_id, transform_source=js_to_json)
video_info = self._download_json(
'https://global-nvapis.line.me/linetv/rmcnmv/vod_play_videoInfo.json',
video_id, query={
'videoId': player_params['videoId'],
'key': player_params['key'],
})
stream = video_info['streams'][0]
extra_query = '?__gda__=' + stream['key']['value']
formats = self._extract_m3u8_formats(
stream['source'] + extra_query, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
for a_format in formats:
a_format['url'] += extra_query
duration = None
for video in video_info.get('videos', {}).get('list', []):
encoding_option = video.get('encodingOption', {})
abr = video['bitrate']['audio']
vbr = video['bitrate']['video']
tbr = abr + vbr
formats.append({
'url': video['source'],
'format_id': 'http-%d' % int(tbr),
'height': encoding_option.get('height'),
'width': encoding_option.get('width'),
'abr': abr,
'vbr': vbr,
'filesize': video.get('size'),
})
if video.get('duration') and duration is None:
duration = video['duration']
self._sort_formats(formats)
if not formats[0].get('width'):
formats[0]['vcodec'] = 'none'
title = self._og_search_title(webpage)
# like_count requires an additional API request https://tv.line.me/api/likeit/getCount
return {
'id': video_id,
'title': title,
'formats': formats,
'extra_param_to_segment_url': extra_query[1:],
'duration': duration,
'thumbnails': [{'url': thumbnail['source']}
for thumbnail in video_info.get('thumbnails', {}).get('list', [])],
'view_count': video_info.get('meta', {}).get('count'),
}

View File

@ -1,17 +1,12 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import json
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
parse_duration,
remove_end,
try_get,
)
@ -162,153 +157,3 @@ class MailRuIE(InfoExtractor):
'view_count': view_count,
'formats': formats,
}
class MailRuMusicSearchBaseIE(InfoExtractor):
def _search(self, query, url, audio_id, limit=100, offset=0):
search = self._download_json(
'https://my.mail.ru/cgi-bin/my/ajax', audio_id,
'Downloading songs JSON page %d' % (offset // limit + 1),
headers={
'Referer': url,
'X-Requested-With': 'XMLHttpRequest',
}, query={
'xemail': '',
'ajax_call': '1',
'func_name': 'music.search',
'mna': '',
'mnb': '',
'arg_query': query,
'arg_extended': '1',
'arg_search_params': json.dumps({
'music': {
'limit': limit,
'offset': offset,
},
}),
'arg_limit': limit,
'arg_offset': offset,
})
return next(e for e in search if isinstance(e, dict))
@staticmethod
def _extract_track(t, fatal=True):
audio_url = t['URL'] if fatal else t.get('URL')
if not audio_url:
return
audio_id = t['File'] if fatal else t.get('File')
if not audio_id:
return
thumbnail = t.get('AlbumCoverURL') or t.get('FiledAlbumCover')
uploader = t.get('OwnerName') or t.get('OwnerName_Text_HTML')
uploader_id = t.get('UploaderID')
duration = int_or_none(t.get('DurationInSeconds')) or parse_duration(
t.get('Duration') or t.get('DurationStr'))
view_count = int_or_none(t.get('PlayCount') or t.get('PlayCount_hr'))
track = t.get('Name') or t.get('Name_Text_HTML')
artist = t.get('Author') or t.get('Author_Text_HTML')
if track:
title = '%s - %s' % (artist, track) if artist else track
else:
title = audio_id
return {
'extractor_key': MailRuMusicIE.ie_key(),
'id': audio_id,
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'view_count': view_count,
'vcodec': 'none',
'abr': int_or_none(t.get('BitRate')),
'track': track,
'artist': artist,
'album': t.get('Album'),
'url': audio_url,
}
class MailRuMusicIE(MailRuMusicSearchBaseIE):
IE_NAME = 'mailru:music'
IE_DESC = 'Музыка@Mail.Ru'
_VALID_URL = r'https?://my\.mail\.ru/music/songs/[^/?#&]+-(?P<id>[\da-f]+)'
_TESTS = [{
'url': 'https://my.mail.ru/music/songs/%D0%BC8%D0%BB8%D1%82%D1%85-l-a-h-luciferian-aesthetics-of-herrschaft-single-2017-4e31f7125d0dfaef505d947642366893',
'md5': '0f8c22ef8c5d665b13ac709e63025610',
'info_dict': {
'id': '4e31f7125d0dfaef505d947642366893',
'ext': 'mp3',
'title': 'L.A.H. (Luciferian Aesthetics of Herrschaft) single, 2017 - М8Л8ТХ',
'uploader': 'Игорь Мудрый',
'uploader_id': '1459196328',
'duration': 280,
'view_count': int,
'vcodec': 'none',
'abr': 320,
'track': 'L.A.H. (Luciferian Aesthetics of Herrschaft) single, 2017',
'artist': 'М8Л8ТХ',
},
}]
def _real_extract(self, url):
audio_id = self._match_id(url)
webpage = self._download_webpage(url, audio_id)
title = self._og_search_title(webpage)
music_data = self._search(title, url, audio_id)['MusicData']
t = next(t for t in music_data if t.get('File') == audio_id)
info = self._extract_track(t)
info['title'] = title
return info
class MailRuMusicSearchIE(MailRuMusicSearchBaseIE):
IE_NAME = 'mailru:music:search'
IE_DESC = 'Музыка@Mail.Ru'
_VALID_URL = r'https?://my\.mail\.ru/music/search/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://my.mail.ru/music/search/black%20shadow',
'info_dict': {
'id': 'black shadow',
},
'playlist_mincount': 532,
}]
def _real_extract(self, url):
query = compat_urllib_parse_unquote(self._match_id(url))
entries = []
LIMIT = 100
offset = 0
for _ in itertools.count(1):
search = self._search(query, url, query, LIMIT, offset)
music_data = search.get('MusicData')
if not music_data or not isinstance(music_data, list):
break
for t in music_data:
track = self._extract_track(t, fatal=False)
if track:
entries.append(track)
total = try_get(
search, lambda x: x['Results']['music']['Total'], int)
if total is not None:
if offset > total:
break
offset += LIMIT
return self.playlist_result(entries, query)

View File

@ -1,12 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_urllib_parse_unquote,
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
)
from ..utils import int_or_none
class MangomoloBaseIE(InfoExtractor):
@ -50,4 +51,4 @@ class MangomoloLiveIE(MangomoloBaseIE):
_IS_LIVE = True
def _get_real_id(self, page_id):
return compat_b64decode(compat_urllib_parse_unquote(page_id)).decode()
return base64.b64decode(compat_urllib_parse_unquote(page_id).encode()).decode()

View File

@ -1,12 +1,12 @@
from __future__ import unicode_literals
import base64
import functools
import itertools
import re
from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_chr,
compat_ord,
compat_str,
@ -79,7 +79,7 @@ class MixcloudIE(InfoExtractor):
if encrypted_play_info is not None:
# Decode
encrypted_play_info = compat_b64decode(encrypted_play_info)
encrypted_play_info = base64.b64decode(encrypted_play_info)
else:
# New path
full_info_json = self._parse_json(self._html_search_regex(
@ -109,7 +109,7 @@ class MixcloudIE(InfoExtractor):
kpa_target = encrypted_play_info
else:
kps = ['https://', 'http://']
kpa_target = compat_b64decode(info_json['streamInfo']['url'])
kpa_target = base64.b64decode(info_json['streamInfo']['url'])
for kp in kps:
partial_key = self._decrypt_xor_cipher(kpa_target, kp)
for quote in ["'", '"']:
@ -165,7 +165,7 @@ class MixcloudIE(InfoExtractor):
format_url = stream_info.get(url_key)
if not format_url:
continue
decrypted = self._decrypt_xor_cipher(key, compat_b64decode(format_url))
decrypted = self._decrypt_xor_cipher(key, base64.b64decode(format_url))
if not decrypted:
continue
if url_key == 'hlsUrl':

View File

@ -3,18 +3,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .vimple import SprutoBaseIE
class MyviIE(SprutoBaseIE):
_VALID_URL = r'''(?x)
(?:
https?://
(?:www\.)?
myvi\.
(?:
(?:ru/player|tv)/
myvi\.(?:ru/player|tv)/
(?:
(?:
embed/html|
@ -22,10 +17,6 @@ class MyviIE(SprutoBaseIE):
api/Video/Get
)/|
content/preloader\.swf\?.*\bid=
)|
ru/watch/
)|
myvi:
)
(?P<id>[\da-zA-Z_-]+)
'''
@ -51,12 +42,6 @@ class MyviIE(SprutoBaseIE):
}, {
'url': 'http://myvi.ru/player/flash/ocp2qZrHI-eZnHKQBK4cZV60hslH8LALnk0uBfKsB-Q4WnY26SeGoYPi8HWHxu0O30',
'only_matching': True,
}, {
'url': 'https://www.myvi.ru/watch/YwbqszQynUaHPn_s82sx0Q2',
'only_matching': True,
}, {
'url': 'myvi:YwbqszQynUaHPn_s82sx0Q2',
'only_matching': True,
}]
@classmethod
@ -73,39 +58,3 @@ class MyviIE(SprutoBaseIE):
'http://myvi.ru/player/api/Video/Get/%s?sig' % video_id, video_id)['sprutoData']
return self._extract_spruto(spruto, video_id)
class MyviEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?myvi\.tv/(?:[^?]+\?.*?\bv=|embed/)(?P<id>[\da-z]+)'
_TESTS = [{
'url': 'https://www.myvi.tv/embed/ccdqic3wgkqwpb36x9sxg43t4r',
'info_dict': {
'id': 'b3ea0663-3234-469d-873e-7fecf36b31d1',
'ext': 'mp4',
'title': 'Твоя (original song).mp4',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 277,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.myvi.tv/idmi6o?v=ccdqic3wgkqwpb36x9sxg43t4r#watch',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if MyviIE.suitable(url) else super(MyviEmbedIE, cls).suitable(url)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://www.myvi.tv/embed/%s' % video_id, video_id)
myvi_id = self._search_regex(
r'CreatePlayer\s*\(\s*["\'].*?\bv=([\da-zA-Z_]+)',
webpage, 'video id')
return self.url_result('myvi:%s' % myvi_id, ie=MyviIE.ie_key())

View File

@ -68,7 +68,7 @@ class NationalGeographicVideoIE(InfoExtractor):
class NationalGeographicIE(ThePlatformIE, AdobePassIE):
IE_NAME = 'natgeo'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:(?:wild/)?[^/]+/)?(?:videos|episodes)/(?P<id>[^/?]+)'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/(?:videos|episodes)/(?P<id>[^/?]+)'
_TESTS = [
{
@ -102,10 +102,6 @@ class NationalGeographicIE(ThePlatformIE, AdobePassIE):
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/episodes/the-power-of-miracles/',
'only_matching': True,
},
{
'url': 'http://channel.nationalgeographic.com/videos/treasures-rediscovered/',
'only_matching': True,
}
]

View File

@ -1,7 +1,6 @@
from __future__ import unicode_literals
import re
import base64
from .common import InfoExtractor
from .theplatform import ThePlatformIE
@ -359,7 +358,6 @@ class NBCNewsIE(ThePlatformIE):
class NBCOlympicsIE(InfoExtractor):
IE_NAME = 'nbcolympics'
_VALID_URL = r'https?://www\.nbcolympics\.com/video/(?P<id>[a-z-]+)'
_TEST = {
@ -397,54 +395,3 @@ class NBCOlympicsIE(InfoExtractor):
'ie_key': ThePlatformIE.ie_key(),
'display_id': display_id,
}
class NBCOlympicsStreamIE(AdobePassIE):
IE_NAME = 'nbcolympics:stream'
_VALID_URL = r'https?://stream\.nbcolympics\.com/(?P<id>[0-9a-z-]+)'
_TEST = {
'url': 'http://stream.nbcolympics.com/2018-winter-olympics-nbcsn-evening-feb-8',
'info_dict': {
'id': '203493',
'ext': 'mp4',
'title': 're:Curling, Alpine, Luge [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
_DATA_URL_TEMPLATE = 'http://stream.nbcolympics.com/data/%s_%s.json'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
pid = self._search_regex(r'pid\s*=\s*(\d+);', webpage, 'pid')
resource = self._search_regex(
r"resource\s*=\s*'(.+)';", webpage,
'resource').replace("' + pid + '", pid)
event_config = self._download_json(
self._DATA_URL_TEMPLATE % ('event_config', pid),
pid)['eventConfig']
title = self._live_title(event_config['eventTitle'])
source_url = self._download_json(
self._DATA_URL_TEMPLATE % ('live_sources', pid),
pid)['videoSources'][0]['sourceUrl']
media_token = self._extract_mvpd_auth(
url, pid, event_config.get('requestorId', 'NBCOlympics'), resource)
formats = self._extract_m3u8_formats(self._download_webpage(
'http://sp.auth.adobe.com/tvs/v1/sign', pid, query={
'cdn': 'akamai',
'mediaToken': base64.b64encode(media_token.encode()),
'resource': base64.b64encode(resource.encode()),
'url': source_url,
}), pid, 'mp4')
self._sort_formats(formats)
return {
'id': pid,
'display_id': display_id,
'title': title,
'formats': formats,
'is_live': True,
}

View File

@ -190,12 +190,10 @@ class NDREmbedBaseIE(InfoExtractor):
ext = determine_ext(src, None)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id,
f4m_id='hds', fatal=False))
src + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, f4m_id='hds'))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', m3u8_id='hls',
entry_protocol='m3u8_native', fatal=False))
src, video_id, 'mp4', m3u8_id='hls', entry_protocol='m3u8_native'))
else:
quality = f.get('quality')
ff = {

View File

@ -87,21 +87,19 @@ class NewgroundsIE(InfoExtractor):
self._check_formats(formats, media_id)
self._sort_formats(formats)
uploader = self._html_search_regex(
(r'(?s)<h4[^>]*>(.+?)</h4>.*?<em>\s*Author\s*</em>',
r'(?:Author|Writer)\s*<a[^>]+>([^<]+)'), webpage, 'uploader',
uploader = self._search_regex(
r'(?:Author|Writer)\s*<a[^>]+>([^<]+)', webpage, 'uploader',
fatal=False)
timestamp = unified_timestamp(self._html_search_regex(
(r'<dt>\s*Uploaded\s*</dt>\s*<dd>([^<]+</dd>\s*<dd>[^<]+)',
r'<dt>\s*Uploaded\s*</dt>\s*<dd>([^<]+)'), webpage, 'timestamp',
timestamp = unified_timestamp(self._search_regex(
r'<dt>Uploaded</dt>\s*<dd>([^<]+)', webpage, 'timestamp',
default=None))
duration = parse_duration(self._search_regex(
r'(?s)<dd>\s*Song\s*</dd>\s*<dd>.+?</dd>\s*<dd>([^<]+)', webpage,
'duration', default=None))
r'<dd>Song\s*</dd><dd>.+?</dd><dd>([^<]+)', webpage, 'duration',
default=None))
filesize_approx = parse_filesize(self._html_search_regex(
r'(?s)<dd>\s*Song\s*</dd>\s*<dd>(.+?)</dd>', webpage, 'filesize',
r'<dd>Song\s*</dd><dd>(.+?)</dd>', webpage, 'filesize',
default=None))
if len(formats) == 1:
formats[0]['filesize_approx'] = filesize_approx

View File

@ -21,8 +21,7 @@ class NexxIE(InfoExtractor):
_VALID_URL = r'''(?x)
(?:
https?://api\.nexx(?:\.cloud|cdn\.com)/v3/(?P<domain_id>\d+)/videos/byid/|
nexx:(?:(?P<domain_id_s>\d+):)?|
https?://arc\.nexx\.cloud/api/video/
nexx:(?P<domain_id_s>\d+):
)
(?P<id>\d+)
'''
@ -62,33 +61,12 @@ class NexxIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
# does not work via arc
'url': 'nexx:741:1269984',
'md5': 'c714b5b238b2958dc8d5642addba6886',
'info_dict': {
'id': '1269984',
'ext': 'mp4',
'title': '1 TAG ohne KLO... wortwörtlich! 😑',
'alt_title': '1 TAG ohne KLO... wortwörtlich! 😑',
'description': 'md5:4604539793c49eda9443ab5c5b1d612f',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 607,
'timestamp': 1518614955,
'upload_date': '20180214',
},
}, {
'url': 'https://api.nexxcdn.com/v3/748/videos/byid/128907',
'only_matching': True,
}, {
'url': 'nexx:748:128907',
'only_matching': True,
}, {
'url': 'nexx:128907',
'only_matching': True,
}, {
'url': 'https://arc.nexx.cloud/api/video/128907.json',
'only_matching': True,
}]
@staticmethod
@ -146,18 +124,6 @@ class NexxIE(InfoExtractor):
domain_id = mobj.group('domain_id') or mobj.group('domain_id_s')
video_id = mobj.group('id')
video = None
response = self._download_json(
'https://arc.nexx.cloud/api/video/%s.json' % video_id,
video_id, fatal=False)
if response and isinstance(response, dict):
result = response.get('result')
if result and isinstance(result, dict):
video = result
# not all videos work via arc, e.g. nexx:741:1269984
if not video:
# Reverse engineered from JS code (see getDeviceID function)
device_id = '%d:%d:%d%d' % (
random.randint(1, 4), int(time.time()),

View File

@ -198,7 +198,7 @@ class NickNightIE(NickDeIE):
class NickRuIE(MTVServicesInfoExtractor):
IE_NAME = 'nickelodeonru'
_VALID_URL = r'https?://(?:www\.)nickelodeon\.(?:ru|fr|es|pt|ro|hu|com\.tr)/[^/]+/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)nickelodeon\.(?:ru|fr|es|pt|ro|hu)/[^/]+/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.nickelodeon.ru/shows/henrydanger/videos/episodes/3-sezon-15-seriya-licenziya-na-polyot/pmomfb#playlist/7airc6',
'only_matching': True,
@ -220,9 +220,6 @@ class NickRuIE(MTVServicesInfoExtractor):
}, {
'url': 'http://www.nickelodeon.hu/musorok/spongyabob-kockanadrag/videok/episodes/buborekfujas-az-elszakadt-nadrag/q57iob#playlist/k6te4y',
'only_matching': True,
}, {
'url': 'http://www.nickelodeon.com.tr/programlar/sunger-bob/videolar/kayip-yatak/mgqbjy',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -13,7 +13,7 @@ class NineGagIE(InfoExtractor):
_TESTS = [{
'url': 'http://9gag.com/tv/p/Kk2X5/people-are-awesome-2013-is-absolutely-awesome',
'info_dict': {
'id': 'kXzwOKyGlSA',
'id': 'Kk2X5',
'ext': 'mp4',
'description': 'This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)',
'title': '\"People Are Awesome 2013\" Is Absolutely Awesome',

View File

@ -4,17 +4,15 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
float_or_none,
smuggle_url,
ExtractorError,
)
class NineNowIE(InfoExtractor):
IE_NAME = '9now.com.au'
_VALID_URL = r'https?://(?:www\.)?9now\.com\.au/(?:[^/]+/){2}(?P<id>[^/?#]+)'
_GEO_COUNTRIES = ['AU']
_TESTS = [{
# clip
'url': 'https://www.9now.com.au/afl-footy-show/2016/clip-ciql02091000g0hp5oktrnytc',
@ -77,9 +75,7 @@ class NineNowIE(InfoExtractor):
return {
'_type': 'url_transparent',
'url': smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
{'geo_countries': self._GEO_COUNTRIES}),
'url': self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
'id': video_id,
'title': title,
'description': common_data.get('description'),

View File

@ -43,8 +43,7 @@ class NJPWWorldIE(InfoExtractor):
webpage, urlh = self._download_webpage_handle(
'https://njpwworld.com/auth/login', None,
note='Logging in', errnote='Unable to login',
data=urlencode_postdata({'login_id': username, 'pw': password}),
headers={'Referer': 'https://njpwworld.com/auth'})
data=urlencode_postdata({'login_id': username, 'pw': password}))
# /auth/login will return 302 for successful logins
if urlh.geturl() == 'https://njpwworld.com/auth/login':
self.report_warning('unable to login')

View File

@ -11,7 +11,6 @@ from ..utils import (
determine_ext,
ExtractorError,
fix_xml_ampersands,
int_or_none,
orderedSet,
parse_duration,
qualities,
@ -39,7 +38,7 @@ class NPOIE(NPOBaseIE):
npo\.nl/(?!(?:live|radio)/)(?:[^/]+/){2}|
ntr\.nl/(?:[^/]+/){2,}|
omroepwnl\.nl/video/fragment/[^/]+__|
(?:zapp|npo3)\.nl/(?:[^/]+/){2,}
(?:zapp|npo3)\.nl/(?:[^/]+/){2}
)
)
(?P<id>[^/?#]+)
@ -157,9 +156,6 @@ class NPOIE(NPOBaseIE):
}, {
'url': 'http://www.npo.nl/radio-gaga/13-06-2017/BNN_101383373',
'only_matching': True,
}, {
'url': 'https://www.zapp.nl/1803-skelterlab/instructie-video-s/740-instructievideo-s/POMS_AT_11736927',
'only_matching': True,
}]
def _real_extract(self, url):
@ -174,10 +170,6 @@ class NPOIE(NPOBaseIE):
transform_source=strip_jsonp,
)
error = metadata.get('error')
if error:
raise ExtractorError(error, expected=True)
# For some videos actual video id (prid) is different (e.g. for
# http://www.omroepwnl.nl/video/fragment/vandaag-de-dag-verkiezingen__POMS_WNL_853698
# video id is POMS_WNL_853698 but prid is POW_00996502)
@ -195,15 +187,7 @@ class NPOIE(NPOBaseIE):
formats = []
urls = set()
def is_legal_url(format_url):
return format_url and format_url not in urls and re.match(
r'^(?:https?:)?//', format_url)
QUALITY_LABELS = ('Laag', 'Normaal', 'Hoog')
QUALITY_FORMATS = ('adaptive', 'wmv_sb', 'h264_sb', 'wmv_bb', 'h264_bb', 'wvc1_std', 'h264_std')
quality_from_label = qualities(QUALITY_LABELS)
quality_from_format_id = qualities(QUALITY_FORMATS)
quality = qualities(['adaptive', 'wmv_sb', 'h264_sb', 'wmv_bb', 'h264_bb', 'wvc1_std', 'h264_std'])
items = self._download_json(
'http://ida.omroep.nl/app.php/%s' % video_id, video_id,
'Downloading formats JSON', query={
@ -212,34 +196,18 @@ class NPOIE(NPOBaseIE):
})['items'][0]
for num, item in enumerate(items):
item_url = item.get('url')
if not is_legal_url(item_url):
if not item_url or item_url in urls:
continue
urls.add(item_url)
format_id = self._search_regex(
r'video/ida/([^/]+)', item_url, 'format id',
default=None)
item_label = item.get('label')
def add_format_url(format_url):
width = int_or_none(self._search_regex(
r'(\d+)[xX]\d+', format_url, 'width', default=None))
height = int_or_none(self._search_regex(
r'\d+[xX](\d+)', format_url, 'height', default=None))
if item_label in QUALITY_LABELS:
quality = quality_from_label(item_label)
f_id = item_label
elif item_label in QUALITY_FORMATS:
quality = quality_from_format_id(format_id)
f_id = format_id
else:
quality, f_id = [None] * 2
formats.append({
'url': format_url,
'format_id': f_id,
'width': width,
'height': height,
'quality': quality,
'format_id': format_id,
'quality': quality(format_id),
})
# Example: http://www.npo.nl/de-nieuwe-mens-deel-1/21-07-2010/WO_VPRO_043706
@ -251,7 +219,7 @@ class NPOIE(NPOBaseIE):
stream_info = self._download_json(
item_url + '&type=json', video_id,
'Downloading %s stream JSON'
% item_label or item.get('format') or format_id or num)
% item.get('label') or item.get('format') or format_id or num)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
error = (self._parse_json(
@ -283,7 +251,7 @@ class NPOIE(NPOBaseIE):
if not is_live:
for num, stream in enumerate(metadata.get('streams', [])):
stream_url = stream.get('url')
if not is_legal_url(stream_url):
if not stream_url or stream_url in urls:
continue
urls.add(stream_url)
# smooth streaming is not supported

View File

@ -19,11 +19,11 @@ from ..utils import (
class OdnoklassnikiIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer|live)/(?P<id>[\d-]+)'
_VALID_URL = r'https?://(?:(?:www|m|mobile)\.)?(?:odnoklassniki|ok)\.ru/(?:video(?:embed)?|web-api/video/moviePlayer)/(?P<id>[\d-]+)'
_TESTS = [{
# metadata in JSON
'url': 'http://ok.ru/video/20079905452',
'md5': '0b62089b479e06681abaaca9d204f152',
'md5': '6ba728d85d60aa2e6dd37c9e70fdc6bc',
'info_dict': {
'id': '20079905452',
'ext': 'mp4',
@ -35,6 +35,7 @@ class OdnoklassnikiIE(InfoExtractor):
'like_count': int,
'age_limit': 0,
},
'skip': 'Video has been blocked',
}, {
# metadataUrl
'url': 'http://ok.ru/video/63567059965189-0?fromTime=5',
@ -98,9 +99,6 @@ class OdnoklassnikiIE(InfoExtractor):
}, {
'url': 'http://mobile.ok.ru/video/20079905452',
'only_matching': True,
}, {
'url': 'https://www.ok.ru/live/484531969818',
'only_matching': True,
}]
def _real_extract(self, url):
@ -186,10 +184,6 @@ class OdnoklassnikiIE(InfoExtractor):
})
return info
assert title
if provider == 'LIVE_TV_APP':
info['title'] = self._live_title(title)
quality = qualities(('4', '0', '1', '2', '3', '5'))
formats = [{
@ -216,20 +210,6 @@ class OdnoklassnikiIE(InfoExtractor):
if fmt_type:
fmt['quality'] = quality(fmt_type)
# Live formats
m3u8_url = metadata.get('hlsMasterPlaylistUrl')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8',
m3u8_id='hls', fatal=False))
rtmp_url = metadata.get('rtmpUrl')
if rtmp_url:
formats.append({
'url': rtmp_url,
'format_id': 'rtmp',
'ext': 'flv',
})
self._sort_formats(formats)
info['formats'] = formats

View File

@ -1,13 +1,9 @@
from __future__ import unicode_literals
import re
import base64
from .common import InfoExtractor
from ..compat import (
compat_b64decode,
compat_str,
compat_urllib_parse_urlencode,
)
from ..compat import compat_str
from ..utils import (
determine_ext,
ExtractorError,
@ -16,6 +12,7 @@ from ..utils import (
try_get,
unsmuggle_url,
)
from ..compat import compat_urllib_parse_urlencode
class OoyalaBaseIE(InfoExtractor):
@ -47,7 +44,7 @@ class OoyalaBaseIE(InfoExtractor):
url_data = try_get(stream, lambda x: x['url']['data'], compat_str)
if not url_data:
continue
s_url = compat_b64decode(url_data).decode('utf-8')
s_url = base64.b64decode(url_data.encode('ascii')).decode('utf-8')
if not s_url or s_url in urls:
continue
urls.append(s_url)

View File

@ -1,8 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
@ -20,14 +18,7 @@ from ..utils import (
class PandoraTVIE(InfoExtractor):
IE_NAME = 'pandora.tv'
IE_DESC = '판도라TV'
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?pandora\.tv/view/(?P<user_id>[^/]+)/(?P<id>\d+)| # new format
(?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?| # old format
m\.pandora\.tv/?\? # mobile
)
'''
_VALID_URL = r'https?://(?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?'
_TESTS = [{
'url': 'http://jp.channel.pandora.tv/channel/video.ptv?c1=&prgid=53294230&ch_userid=mikakim&ref=main&lot=cate_01_2',
'info_dict': {
@ -62,20 +53,9 @@ class PandoraTVIE(InfoExtractor):
# Test metadata only
'skip_download': True,
},
}, {
'url': 'http://www.pandora.tv/view/mikakim/53294230#36797454_new',
'only_matching': True,
}, {
'url': 'http://m.pandora.tv/?c=view&ch_userid=mikakim&prgid=54600346',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user_id')
video_id = mobj.group('id')
if not user_id or not video_id:
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
video_id = qs.get('prgid', [None])[0]
user_id = qs.get('ch_userid', [None])[0]

View File

@ -56,16 +56,18 @@ class PeriscopeIE(PeriscopeBaseIE):
def _real_extract(self, url):
token = self._match_id(url)
stream = self._call_api(
'accessVideoPublic', {'broadcast_id': token}, token)
broadcast_data = self._call_api(
'getBroadcastPublic', {'broadcast_id': token}, token)
broadcast = broadcast_data['broadcast']
status = broadcast['status']
broadcast = stream['broadcast']
title = broadcast['status']
user = broadcast_data.get('user', {})
uploader = broadcast.get('user_display_name') or broadcast.get('username')
uploader_id = (broadcast.get('user_id') or broadcast.get('username'))
uploader = broadcast.get('user_display_name') or user.get('display_name')
uploader_id = (broadcast.get('username') or user.get('username') or
broadcast.get('user_id') or user.get('id'))
title = '%s - %s' % (uploader, title) if uploader else title
title = '%s - %s' % (uploader, status) if uploader else status
state = broadcast.get('state').lower()
if state == 'running':
title = self._live_title(title)
@ -75,6 +77,9 @@ class PeriscopeIE(PeriscopeBaseIE):
'url': broadcast[image],
} for image in ('image_url', 'image_url_small') if broadcast.get(image)]
stream = self._call_api(
'getAccessPublic', {'broadcast_id': token}, token)
video_urls = set()
formats = []
for format_id in ('replay', 'rtmp', 'hls', 'https_hls', 'lhls', 'lhlsweb'):

View File

@ -4,9 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
xpath_text,
@ -28,15 +26,17 @@ class PladformIE(InfoExtractor):
(?P<id>\d+)
'''
_TESTS = [{
'url': 'https://out.pladform.ru/player?pl=64471&videoid=3777899&vk_puid15=0&vk_puid34=0',
'md5': '53362fac3a27352da20fa2803cc5cd6f',
# http://muz-tv.ru/kinozal/view/7400/
'url': 'http://out.pladform.ru/player?pl=24822&videoid=100183293',
'md5': '61f37b575dd27f1bb2e1854777fe31f4',
'info_dict': {
'id': '3777899',
'id': '100183293',
'ext': 'mp4',
'title': 'СТУДИЯ СОЮЗ • Шоу Студия Союз, 24 выпуск (01.02.2018) Нурлан Сабуров и Слава Комиссаренко',
'description': 'md5:05140e8bf1b7e2d46e7ba140be57fd95',
'title': 'Тайны перевала Дятлова • 1 серия 2 часть',
'description': 'Документальный сериал-расследование одной из самых жутких тайн ХХ века',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 3190,
'duration': 694,
'age_limit': 0,
},
}, {
'url': 'http://static.pladform.ru/player.swf?pl=21469&videoid=100183293&vkcid=0',
@ -56,48 +56,22 @@ class PladformIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
pl = qs.get('pl', ['1'])[0]
video = self._download_xml(
'http://out.pladform.ru/getVideo', video_id, query={
'pl': pl,
'videoid': video_id,
})
def fail(text):
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, text),
expected=True)
'http://out.pladform.ru/getVideo?pl=1&videoid=%s' % video_id,
video_id)
if video.tag == 'error':
fail(video.text)
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, video.text),
expected=True)
quality = qualities(('ld', 'sd', 'hd'))
formats = []
for src in video.findall('./src'):
if src is None:
continue
format_url = src.text
if not format_url:
continue
if src.get('type') == 'hls' or determine_ext(format_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
formats = [{
'url': src.text,
'format_id': src.get('quality'),
'quality': quality(src.get('quality')),
})
if not formats:
error = xpath_text(video, './cap', 'error', default=None)
if error:
fail(error)
} for src in video.findall('./src')]
self._sort_formats(formats)
webpage = self._download_webpage(

View File

@ -11,34 +11,19 @@ from ..utils import (
class PokemonIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pokemon\.com/[a-z]{2}(?:.*?play=(?P<id>[a-z0-9]{32})|/(?:[^/]+/)+(?P<display_id>[^/?#&]+))'
_VALID_URL = r'https?://(?:www\.)?pokemon\.com/[a-z]{2}(?:.*?play=(?P<id>[a-z0-9]{32})|/[^/]+/\d+_\d+-(?P<display_id>[^/?#]+))'
_TESTS = [{
'url': 'https://www.pokemon.com/us/pokemon-episodes/20_30-the-ol-raise-and-switch/',
'md5': '2fe8eaec69768b25ef898cda9c43062e',
'url': 'http://www.pokemon.com/us/pokemon-episodes/19_01-from-a-to-z/?play=true',
'md5': '9fb209ae3a569aac25de0f5afc4ee08f',
'info_dict': {
'id': 'afe22e30f01c41f49d4f1d9eab5cd9a4',
'id': 'd0436c00c3ce4071ac6cee8130ac54a1',
'ext': 'mp4',
'title': 'The Ol Raise and Switch!',
'description': 'md5:7db77f7107f98ba88401d3adc80ff7af',
'timestamp': 1511824728,
'upload_date': '20171127',
},
'add_id': ['LimelightMedia'],
}, {
# no data-video-title
'url': 'https://www.pokemon.com/us/pokemon-episodes/pokemon-movies/pokemon-the-rise-of-darkrai-2008',
'info_dict': {
'id': '99f3bae270bf4e5097274817239ce9c8',
'ext': 'mp4',
'title': 'Pokémon: The Rise of Darkrai',
'description': 'md5:ea8fbbf942e1e497d54b19025dd57d9d',
'timestamp': 1417778347,
'upload_date': '20141205',
},
'add_id': ['LimelightMedia'],
'params': {
'skip_download': True,
'title': 'From A to Z!',
'description': 'Bonnie makes a new friend, Ash runs into an old friend, and a terrifying premonition begins to unfold!',
'timestamp': 1460478136,
'upload_date': '20160412',
},
'add_id': ['LimelightMedia']
}, {
'url': 'http://www.pokemon.com/uk/pokemon-episodes/?play=2e8b5c761f1d4a9286165d7748c1ece2',
'only_matching': True,
@ -57,9 +42,7 @@ class PokemonIE(InfoExtractor):
r'(<[^>]+data-video-id="%s"[^>]*>)' % (video_id if video_id else '[a-z0-9]{32}'),
webpage, 'video data element'))
video_id = video_data['data-video-id']
title = video_data.get('data-video-title') or self._html_search_meta(
'pkm-title', webpage, ' title', default=None) or self._search_regex(
r'<h1[^>]+\bclass=["\']us-title[^>]+>([^<]+)', webpage, 'title')
title = video_data['data-video-title']
return {
'_type': 'url_transparent',
'id': video_id,

View File

@ -115,13 +115,12 @@ class PornHubIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
self._set_cookie('pornhub.com', 'age_verified', '1')
def dl_webpage(platform):
self._set_cookie('pornhub.com', 'platform', platform)
return self._download_webpage(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id,
video_id)
video_id, headers={
'Cookie': 'age_verified=1; platform=%s' % platform,
})
webpage = dl_webpage('pc')
@ -276,7 +275,7 @@ class PornHubPlaylistIE(PornHubPlaylistBaseIE):
class PornHubUserVideosIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/(?:user|channel)s/(?P<id>[^/]+)/videos'
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/users/(?P<id>[^/]+)/videos'
_TESTS = [{
'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
'info_dict': {
@ -286,25 +285,6 @@ class PornHubUserVideosIE(PornHubPlaylistBaseIE):
}, {
'url': 'http://www.pornhub.com/users/rushandlia/videos',
'only_matching': True,
}, {
# default sorting as Top Rated Videos
'url': 'https://www.pornhub.com/channels/povd/videos',
'info_dict': {
'id': 'povd',
},
'playlist_mincount': 293,
}, {
# Top Rated Videos
'url': 'https://www.pornhub.com/channels/povd/videos?o=ra',
'only_matching': True,
}, {
# Most Recent Videos
'url': 'https://www.pornhub.com/channels/povd/videos?o=da',
'only_matching': True,
}, {
# Most Viewed Videos
'url': 'https://www.pornhub.com/channels/povd/videos?o=vi',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -129,11 +129,10 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
https?://
(?:www\.)?
(?:
(?:beta\.)?
(?:
prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
)\.(?:de|at|ch)|
ran\.de|fem\.com|advopedia\.de|galileo\.tv/video
ran\.de|fem\.com|advopedia\.de
)
/(?P<id>.+)
'''
@ -326,11 +325,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
'url': 'http://www.sat1gold.de/tv/edel-starck/video/11-staffel-1-episode-1-partner-wider-willen-ganze-folge',
'only_matching': True,
},
{
# geo restricted to Germany
'url': 'https://www.galileo.tv/video/diese-emojis-werden-oft-missverstanden',
'only_matching': True,
},
{
'url': 'http://www.sat1gold.de/tv/edel-starck/playlist/die-gesamte-1-staffel',
'only_matching': True,
@ -348,10 +342,8 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
r'"clip_id"\s*:\s+"(\d+)"',
r'clipid: "(\d+)"',
r'clip[iI]d=(\d+)',
r'clip[iI][dD]\s*=\s*["\'](\d+)',
r'clip[iI]d\s*=\s*["\'](\d+)',
r"'itemImageUrl'\s*:\s*'/dynamic/thumbnails/full/\d+/(\d+)",
r'proMamsId&quot;\s*:\s*&quot;(\d+)',
r'proMamsId"\s*:\s*"(\d+)',
]
_TITLE_REGEXES = [
r'<h2 class="subtitle" itemprop="name">\s*(.+?)</h2>',

View File

@ -1,102 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .vimeo import VimeoIE
from ..utils import (
extract_attributes,
ExtractorError,
smuggle_url,
unsmuggle_url,
urljoin,
)
class RayWenderlichIE(InfoExtractor):
_VALID_URL = r'https?://videos\.raywenderlich\.com/courses/(?P<course_id>[^/]+)/lessons/(?P<id>\d+)'
_TESTS = [{
'url': 'https://videos.raywenderlich.com/courses/105-testing-in-ios/lessons/1',
'info_dict': {
'id': '248377018',
'ext': 'mp4',
'title': 'Testing In iOS Episode 1: Introduction',
'duration': 133,
'uploader': 'Ray Wenderlich',
'uploader_id': 'user3304672',
},
'params': {
'noplaylist': True,
'skip_download': True,
},
'add_ie': [VimeoIE.ie_key()],
'expected_warnings': ['HTTP Error 403: Forbidden'],
}, {
'url': 'https://videos.raywenderlich.com/courses/105-testing-in-ios/lessons/1',
'info_dict': {
'title': 'Testing in iOS',
'id': '105-testing-in-ios',
},
'params': {
'noplaylist': False,
},
'playlist_count': 29,
}]
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url)
course_id, lesson_id = mobj.group('course_id', 'id')
video_id = '%s/%s' % (course_id, lesson_id)
webpage = self._download_webpage(url, video_id)
no_playlist = self._downloader.params.get('noplaylist')
if no_playlist or smuggled_data.get('force_video', False):
if no_playlist:
self.to_screen(
'Downloading just video %s because of --no-playlist'
% video_id)
if '>Subscribe to unlock' in webpage:
raise ExtractorError(
'This content is only available for subscribers',
expected=True)
vimeo_id = self._search_regex(
r'data-vimeo-id=["\'](\d+)', webpage, 'video id')
return self.url_result(
VimeoIE._smuggle_referrer(
'https://player.vimeo.com/video/%s' % vimeo_id, url),
ie=VimeoIE.ie_key(), video_id=vimeo_id)
self.to_screen(
'Downloading playlist %s - add --no-playlist to just download video'
% course_id)
lesson_ids = set((lesson_id, ))
for lesson in re.findall(
r'(<a[^>]+\bclass=["\']lesson-link[^>]+>)', webpage):
attrs = extract_attributes(lesson)
if not attrs:
continue
lesson_url = attrs.get('href')
if not lesson_url:
continue
lesson_id = self._search_regex(
r'/lessons/(\d+)', lesson_url, 'lesson id', default=None)
if not lesson_id:
continue
lesson_ids.add(lesson_id)
entries = []
for lesson_id in sorted(lesson_ids):
entries.append(self.url_result(
smuggle_url(urljoin(url, lesson_id), {'force_video': True}),
ie=RayWenderlichIE.ie_key()))
title = self._search_regex(
r'class=["\']course-title[^>]+>([^<]+)', webpage, 'course title',
default=None)
return self.playlist_result(entries, course_id, title)

View File

@ -5,93 +5,135 @@ from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
float_or_none,
int_or_none,
try_get,
# unified_timestamp,
ExtractorError,
)
class RedBullTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?redbull\.tv/video/(?P<id>AP-\w+)'
_VALID_URL = r'https?://(?:www\.)?redbull\.tv/(?:video|film|live)/(?:AP-\w+/segment/)?(?P<id>AP-\w+)'
_TESTS = [{
# film
'url': 'https://www.redbull.tv/video/AP-1Q6XCDTAN1W11',
'url': 'https://www.redbull.tv/video/AP-1Q756YYX51W11/abc-of-wrc',
'md5': 'fb0445b98aa4394e504b413d98031d1f',
'info_dict': {
'id': 'AP-1Q6XCDTAN1W11',
'id': 'AP-1Q756YYX51W11',
'ext': 'mp4',
'title': 'ABC of... WRC - ABC of... S1E6',
'title': 'ABC of...WRC',
'description': 'md5:5c7ed8f4015c8492ecf64b6ab31e7d31',
'duration': 1582.04,
# 'timestamp': 1488405786,
# 'upload_date': '20170301',
},
}, {
# episode
'url': 'https://www.redbull.tv/video/AP-1PMHKJFCW1W11',
'url': 'https://www.redbull.tv/video/AP-1PMT5JCWH1W11/grime?playlist=shows:shows-playall:web',
'info_dict': {
'id': 'AP-1PMHKJFCW1W11',
'id': 'AP-1PMT5JCWH1W11',
'ext': 'mp4',
'title': 'Grime - Hashtags S2E4',
'description': 'md5:b5f522b89b72e1e23216e5018810bb25',
'title': 'Grime - Hashtags S2 E4',
'description': 'md5:334b741c8c1ce65be057eab6773c1cf5',
'duration': 904.6,
# 'timestamp': 1487290093,
# 'upload_date': '20170217',
'series': 'Hashtags',
'season_number': 2,
'episode_number': 4,
},
'params': {
'skip_download': True,
},
}, {
# segment
'url': 'https://www.redbull.tv/live/AP-1R5DX49XS1W11/segment/AP-1QSAQJ6V52111/semi-finals',
'info_dict': {
'id': 'AP-1QSAQJ6V52111',
'ext': 'mp4',
'title': 'Semi Finals - Vans Park Series Pro Tour',
'description': 'md5:306a2783cdafa9e65e39aa62f514fd97',
'duration': 11791.991,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.redbull.tv/film/AP-1MSKKF5T92111/in-motion',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
session = self._download_json(
'https://api.redbull.tv/v3/session', video_id,
'https://api-v2.redbull.tv/session', video_id,
note='Downloading access token', query={
'build': '4.370.0',
'category': 'personal_computer',
'os_version': '1.0',
'os_family': 'http',
})
if session.get('code') == 'error':
raise ExtractorError('%s said: %s' % (
self.IE_NAME, session['message']))
token = session['token']
auth = '%s %s' % (session.get('token_type', 'Bearer'), session['access_token'])
try:
video = self._download_json(
'https://api.redbull.tv/v3/products/' + video_id,
info = self._download_json(
'https://api-v2.redbull.tv/content/%s' % video_id,
video_id, note='Downloading video information',
headers={'Authorization': token}
headers={'Authorization': auth}
)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
error_message = self._parse_json(
e.cause.read().decode(), video_id)['error']
e.cause.read().decode(), video_id)['message']
raise ExtractorError('%s said: %s' % (
self.IE_NAME, error_message), expected=True)
raise
title = video['title'].strip()
video = info['video_product']
title = info['title'].strip()
formats = self._extract_m3u8_formats(
'https://dms.redbull.tv/v3/%s/%s/playlist.m3u8' % (video_id, token),
video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls')
video['url'], video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
self._sort_formats(formats)
subtitles = {}
for resource in video.get('resources', []):
if resource.startswith('closed_caption_'):
splitted_resource = resource.split('_')
if splitted_resource[2]:
subtitles.setdefault('en', []).append({
'url': 'https://resources.redbull.tv/%s/%s' % (video_id, resource),
'ext': splitted_resource[2],
for _, captions in (try_get(
video, lambda x: x['attachments']['captions'],
dict) or {}).items():
if not captions or not isinstance(captions, list):
continue
for caption in captions:
caption_url = caption.get('url')
if not caption_url:
continue
ext = caption.get('format')
if ext == 'xml':
ext = 'ttml'
subtitles.setdefault(caption.get('lang') or 'en', []).append({
'url': caption_url,
'ext': ext,
})
subheading = video.get('subheading')
subheading = info.get('subheading')
if subheading:
title += ' - %s' % subheading
return {
'id': video_id,
'title': title,
'description': video.get('long_description') or video.get(
'description': info.get('long_description') or info.get(
'short_description'),
'duration': float_or_none(video.get('duration'), scale=1000),
# 'timestamp': unified_timestamp(info.get('published')),
'series': info.get('show_title'),
'season_number': int_or_none(info.get('season_number')),
'episode_number': int_or_none(info.get('episode_number')),
'formats': formats,
'subtitles': subtitles,
}

View File

@ -15,7 +15,7 @@ class RedditIE(InfoExtractor):
_TEST = {
# from https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/
'url': 'https://v.redd.it/zv89llsvexdz',
'md5': '0a070c53eba7ec4534d95a5a1259e253',
'md5': '655d06ace653ea3b87bccfb1b27ec99d',
'info_dict': {
'id': 'zv89llsvexdz',
'ext': 'mp4',

View File

@ -16,12 +16,12 @@ class RedTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?redtube\.com/|embed\.redtube\.com/\?.*?\bid=)(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.redtube.com/66418',
'md5': 'fc08071233725f26b8f014dba9590005',
'md5': '7b8c22b5e7098a3e1c09709df1126d2d',
'info_dict': {
'id': '66418',
'ext': 'mp4',
'title': 'Sucked on a toilet',
'upload_date': '20110811',
'upload_date': '20120831',
'duration': 596,
'view_count': int,
'age_limit': 18,
@ -46,10 +46,9 @@ class RedTubeIE(InfoExtractor):
raise ExtractorError('Video %s has been removed' % video_id, expected=True)
title = self._html_search_regex(
(r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
webpage, 'title', group='title',
default=None) or self._og_search_title(webpage)
(r'<h1 class="videoTitle[^"]*">(?P<title>.+?)</h1>',
r'videoTitle\s*:\s*(["\'])(?P<title>)\1'),
webpage, 'title', group='title')
formats = []
sources = self._parse_json(
@ -88,14 +87,12 @@ class RedTubeIE(InfoExtractor):
thumbnail = self._og_search_thumbnail(webpage)
upload_date = unified_strdate(self._search_regex(
r'<span[^>]+>ADDED ([^<]+)<',
r'<span[^>]+class="added-time"[^>]*>ADDED ([^<]+)<',
webpage, 'upload date', fatal=False))
duration = int_or_none(self._og_search_property(
'video:duration', webpage, default=None) or self._search_regex(
duration = int_or_none(self._search_regex(
r'videoDuration\s*:\s*(\d+)', webpage, 'duration', default=None))
view_count = str_to_int(self._search_regex(
(r'<div[^>]*>Views</div>\s*<div[^>]*>\s*([\d,.]+)',
r'<span[^>]*>VIEWS</span>\s*</td>\s*<td>\s*([\d,.]+)'),
r'<span[^>]*>VIEWS</span></td>\s*<td>([\d,.]+)',
webpage, 'view count', fatal=False))
# No self-labeling, but they describe themselves as

Some files were not shown because too many files have changed in this diff Show More