Compare commits

...

130 Commits

Author SHA1 Message Date
35ced3985a release 2016.02.13 2016-02-13 08:25:05 +01:00
3e18700d45 [nbc] Correct test 2016-02-13 07:45:32 +06:00
f9f49d87c2 [youtube] Add test for #8536 2016-02-13 05:18:58 +06:00
6863631c26 [youtube] Improve multifeed videos extraction (Closes #8536) 2016-02-13 05:01:20 +06:00
9d939cec48 [extractor/generic] Add direct mpd url test 2016-02-13 00:36:47 +06:00
4c77d3f52a [YoutubeDL] Allow bestvideo+bestaudio for any extractor 2016-02-13 00:23:14 +06:00
7be747b921 [extractor/generic] Pass mpd base url to _parse_mpd_formats 2016-02-13 00:15:59 +06:00
bb20526b64 [extractor/common] Improve base url construction 2016-02-13 00:13:56 +06:00
bcbb1b08b2 Revert "[aenetworks] extract http formats"
This reverts commit 3d98f97c64.
2016-02-12 17:56:06 +01:00
3d98f97c64 [aenetworks] extract http formats 2016-02-12 17:39:32 +01:00
c349456ef6 [extractor/common] strip http urls in smil manifest 2016-02-12 17:38:48 +01:00
5a4905924d [extractor/generic] Improve dailymotion embed detection (Closes #8521, closes #8325) 2016-02-12 22:03:10 +06:00
b826035dd5 [vimeo] Fix authentication (Closes #8520) 2016-02-12 03:16:26 +06:00
a7cab4d039 [theplatform] remove unused import and change smil url for ThePlatformFeedIE 2016-02-11 18:50:14 +01:00
fc3810f6d1 Merge branch 'master' of github.com:rg3/youtube-dl 2016-02-11 18:13:56 +01:00
3dc71d82ce [theplatform] fix pid extraction in the platform feed 2016-02-11 18:13:03 +01:00
9c7b38981c [utils] Bump Firefox version in User-Agent
Old version number causes Youtube not to serve some formats in ytplayer.config
2016-02-11 23:12:30 +06:00
8b85ac3fd9 [cbc] Add new extractor(closes #3803)(closes #4731)(closes #5309) 2016-02-11 18:10:32 +01:00
81e1c4e2fc [extractor/common] remove duplicate rtmp formats in smil manifest 2016-02-11 17:58:48 +01:00
388ae76b52 [YoutubeDL] Fix format resolution when height is missing 2016-02-11 22:46:13 +06:00
b67d63149d [youtube] Fix typos 2016-02-11 22:33:08 +06:00
28280e8ded [plays] PEP 8 2016-02-11 22:02:57 +06:00
6b3fbd3425 [pbs] Fix multi part videos extraction 2016-02-11 22:02:37 +06:00
a7ab46375b [pbs] Update some tests 2016-02-11 21:43:01 +06:00
b14d5e26f6 [pbs] Improve description extraction 2016-02-11 21:28:09 +06:00
9a61dfba0c [pbs] Revert prefer portalplayer 2016-02-11 21:22:57 +06:00
154c209e2d [extractor/common] improve dash format ids 2016-02-11 10:33:26 +01:00
d1ea5e171f [plays] Add new extractor(#8458) 2016-02-11 10:30:31 +01:00
a1188d0ed0 [crackle] add prefix to format ids 2016-02-10 22:39:33 +01:00
47d205a646 [crackle] improve format sorting 2016-02-10 22:23:56 +01:00
80f772c28a [crackle] Add new extractor 2016-02-10 22:16:21 +01:00
f817d9bec1 release 2016.02.10 2016-02-10 16:17:38 +01:00
e2effb08a4 [YoutubeDL] Sanitize format_id (Closes #8494) 2016-02-10 21:16:58 +06:00
7fcea295c5 [pbs] Switch to portal player by default (Closes #8491) 2016-02-10 20:46:38 +06:00
cc799437ea [youku] Report private videos (Closes #8498) 2016-02-10 20:05:17 +06:00
89d23f37f2 [hotstar] Relax _VALID_URL (Closes #8487) 2016-02-10 04:43:00 +06:00
b92071ef00 release 2016.02.09.1 2016-02-09 20:12:36 +01:00
47246ae26c [viddler] Update tests 2016-02-10 01:12:47 +06:00
9c15869c28 [viddler] Add support for secret videos (Closes #8481) 2016-02-10 01:09:07 +06:00
51e9094f4a [extractor/common] extract youtube dash formats filesize(fixes #8480) 2016-02-09 20:05:39 +01:00
5e3a6fec33 [fox] update test 2016-02-09 17:30:42 +01:00
d413095f7e [extractor/common] remove duplicated formats and subtiles in smil manifests 2016-02-09 17:15:41 +01:00
1bedf4de06 [fox] extract http formats 2016-02-09 17:12:34 +01:00
3967a761f4 [mailru] Fix tests 2016-02-09 21:31:51 +06:00
b081350bd9 [mailru] Improve and modernize 2016-02-09 21:30:48 +06:00
16f1430ba6 [mailru] Prefer metaUrl API (Closes #8474) 2016-02-09 21:14:02 +06:00
085ad71157 release 2016.02.09 2016-02-09 12:57:51 +01:00
35972ba172 [vk] Improve rutube embeds detection (Closes #8461) 2016-02-08 21:30:23 +06:00
3834d3e35c [youtube] Clarify itag 36 height and abr (Closes #8457) 2016-02-08 01:30:57 +06:00
8d0a2a2a4e [README.md] Fix typo 2016-02-07 21:23:29 +06:00
11c0339bec [README.md] Clarify quotes in output template 2016-02-07 21:22:33 +06:00
915dd77783 [README.md] Add output template example for streaming to stdout 2016-02-07 21:21:14 +06:00
b6bfa6fb79 [konserthusetplay] Reorder code pieces 2016-02-07 21:18:32 +06:00
f070197bd7 [konserthusetplay] Improve _VALID_URL 2016-02-07 21:16:31 +06:00
5a7699bb2e [konserthusetplay] Improve and extract all formats (Closes #8381) 2016-02-07 21:11:59 +06:00
8628d26f38 [KonserthusetPlay] Add new extractor (partial support) 2016-02-07 19:47:29 +06:00
8411229bd5 [utils] Allow dot in strip_jsonp 2016-02-07 19:47:09 +06:00
72b9ebc65d [README.md] Document extractor sequences in output template 2016-02-07 19:08:54 +06:00
3b799ca14c [README.md] Clarify percent literal and output to stdout 2016-02-07 19:06:42 +06:00
0474512e30 [README.md] Document even more sequences in output template 2016-02-07 19:00:59 +06:00
f0905c6ec3 [README.md] Document more sequences in output template 2016-02-07 18:45:44 +06:00
86296ad2cd [utils] Add ability to control skipping false values in dict_get 2016-02-07 08:13:04 +06:00
52f5889f77 [vlive] Improve and extract more metadata (Closes #8446) 2016-02-07 06:17:40 +06:00
81e0b4f2d1 Credit @EraYaN for vlive update (#8446) 2016-02-07 06:14:26 +06:00
cbecc9b903 [utils] Add dict_get convenience method 2016-02-07 06:12:53 +06:00
b8b465af3e [vlive] Updated to new V App/VLive api.
More robust with getting keys and ids from website.
2016-02-07 05:27:17 +06:00
59b35c6745 [IPrima] Remove test video_id 2016-02-06 21:42:24 +01:00
7032833011 [iprima] Follow pep8 2016-02-06 21:37:28 +01:00
f406c78785 [IPrima] Fix extractor (fixes #7617) 2016-02-06 21:23:41 +01:00
f326b5837a Merge pull request #8445 from bpfoley/rte-newurl
[rte:radio] Add support for RTMP downloads, alternate URL style
2016-02-07 00:28:39 +05:00
5dd4b3468f [rte:radio] Add support for RTMP downloads, alternate URL style
This is useful as
a) RTMP downloads are a good deal faster to download
b) Older items are available only as RTMP streams
2016-02-06 18:42:57 +00:00
d4f8e83404 [FFmpegSubtitlesConvertorPP] remove unused variable 2016-02-06 19:04:53 +01:00
7b8b007cd9 [FFmpegSubtitlesConvertorPP] remove intermediate srt files 2016-02-06 19:04:18 +01:00
3547d26587 [FFmpegSubtitlesConvertorPP] correctly update the extension (fixes #8444) 2016-02-06 18:58:18 +01:00
7e62c2eb6d [FFmpegSubtitlesConvertorPP] fix not working when srt is used as the intermediate format between ttml/dfxp and other format
It was trying to use the ttml/dfxp file with ffmpeg, which doesn't have support for them.
I broke it in e04398e397.
2016-02-06 18:51:05 +01:00
56401e1e5f [downloader/hls] Do not send 'q' to ffmpeg on Windows (Closes #8300) 2016-02-06 23:24:22 +06:00
860db2d508 [README.md] Fix typo 2016-02-06 22:40:20 +06:00
4b8874975c [README.md] Remove non-relevant info 2016-02-06 22:39:50 +06:00
bd6b6f6622 [README.md] Fix typo 2016-02-06 22:36:30 +06:00
4340727e6c [videomore] Fix typo 2016-02-06 22:36:30 +06:00
3ceccade87 [README.md] Improve output template documentation and add more examples 2016-02-06 22:33:49 +06:00
28ad7df65d [generic] detect MPD manfiest only from the content 2016-02-06 14:51:45 +01:00
79a3508579 [extractor/generic] Detect DASH manifests in found URLs and extract mpd formats 2016-02-06 19:42:03 +06:00
1b840245bd [extractor/generic] Detect DASH manifests and extract mpd formats 2016-02-06 19:35:32 +06:00
6a3828fddd [common] use float conversion instead of using division from __future__ 2016-02-06 14:27:04 +01:00
91cb6b5065 rename _parse_mpd to _parse_mpd_formats and add default value for mpd namespace 2016-02-06 14:03:48 +01:00
0826a0b555 [common] sort dash formats 2016-02-06 06:52:48 +01:00
bcbbb98bfe [generic] extract dash formats detected using content type 2016-02-06 06:47:38 +01:00
66159b38aa Merge pull request #8408 from remitamine/dash
Add generic support for mpd manifests(dash formats)
2016-02-06 06:26:02 +01:00
23d17e4beb [youtube] Fix automatic captions 2016-02-06 06:44:38 +06:00
d97b0e3241 [vidme] Clarify IE_NAMEs 2016-02-05 23:58:26 +06:00
eb2533ec4c [vidme:user:likes] Add extractor 2016-02-05 23:57:35 +06:00
b7b365067f [vidme:user] Add extractor (Closes #8416) 2016-02-05 23:32:22 +06:00
86e284e028 Merge branch 'master' of github.com:rg3/youtube-dl 2016-02-05 17:20:00 +01:00
d9e543b680 [spankbang] Add test with single format (#8398) 2016-02-05 22:16:56 +06:00
c773c232d8 [spankbang] Check formats (#8398) 2016-02-05 22:16:40 +06:00
58ae24336a [spankbang] Extend format id regex (Closes #8398) 2016-02-05 22:16:12 +06:00
7d3a035ee0 [ffmpeg] check for m3u8 protocol in FFmpegMetadataPP 2016-02-05 17:12:49 +01:00
e06e75c7e7 release 2016.02.05.1 2016-02-05 15:17:31 +01:00
593e0f43b4 [ffmpeg] fix condition(fixes #8440) 2016-02-05 15:12:06 +01:00
008ab0f814 release 2016.02.05 2016-02-05 11:04:00 +01:00
3f7e8750d4 [arte.tv:+7] Fix extraction (fixes #8427) 2016-02-04 20:16:47 +01:00
f1ed3acae5 release 2016.02.04 2016-02-04 13:39:26 +01:00
920d21b9d3 [test_subtitles] update youtube subtitles tests 2016-02-04 08:50:55 +01:00
2fb35d1c28 [youtube] fix subtitle order 2016-02-04 08:39:01 +01:00
09be85b8dd [youtube] fix subtitle extraction(fixes #8415) 2016-02-04 08:28:37 +01:00
eadc3ccd50 [generic] extract m3u8 formats when mpegurl content type detected 2016-02-04 01:25:36 +01:00
255732f0d3 [common] fix segment duration calculation 2016-02-03 23:57:08 +01:00
53c269c6fd [common] fix media_template string formating 2016-02-03 23:54:34 +01:00
675d001633 [common] skip drm protected dash formats 2016-02-03 18:44:43 +01:00
58be922079 [kuwo] Check for georestriction 2016-02-04 01:26:25 +08:00
c84d3a557d [README.md] Clarify unavailable sequences in output format 2016-02-03 19:18:25 +05:00
d577c79632 [common] ignore ISO 639-2 generic codes 2016-02-03 13:24:07 +01:00
6ad2b01e14 [srgssr] use flv as ext for rtmp formats 2016-02-02 23:09:50 +01:00
fd3a1f3d60 [cbsnews] add support for live videos(fixes #7010) 2016-02-02 23:02:18 +01:00
87de7069b9 [utils] dfxp2srt: make TTMLPElementParser inherit from object
For consistency between python 2 and 3.
2016-02-02 22:30:13 +01:00
6fba62c87a [ffmpeg] fix adding metadata when using --hls-prefer-native(#8350) 2016-02-02 22:14:23 +01:00
f14be22816 [common] remove duplicate reference to namespace 2016-02-02 22:02:08 +01:00
1df4141196 [test_YoutubeDL] Fix test_youtube_format_selection
Broken since a6c2c24479. Thanks to
@jaimeMF and @anisse for pointing that out
2016-02-03 03:42:37 +08:00
fae45ede08 Merge pull request #8354 from remitamine/m3u8_metadata
[ffmpeg] fix adding metadata when using m3u8_native(fixes #8350)
2016-02-02 19:13:58 +01:00
4e0cff2a50 Merge pull request #8348 from remitamine/dfxp2srt-text
[utils] fix dfxp2srt text extraction(fixes #8055)
2016-02-02 18:36:26 +01:00
9c74423510 [common] fix media template regex 2016-02-02 18:30:31 +01:00
5976e7ab57 [vevo] add support for dash formats 2016-02-02 18:13:01 +01:00
a1a22572fb [downloader/dash] make initialization_url optional 2016-02-02 18:12:32 +01:00
c11875b328 [facebook] use _parse_mpd 2016-02-02 18:11:16 +01:00
8ff648e4f9 [youtube] use _extract_mpd_formats 2016-02-02 18:10:23 +01:00
1bac34556f [common] add a generic support for mpd manifests 2016-02-02 18:09:25 +01:00
0436157b95 [vk:uservideos] Improve _VALID_URL (Closes #8389) 2016-02-02 00:52:37 +06:00
cf57433bbd [ffmpeg] fix adding metadata when using m3u8_native(fixes #8350) 2016-01-28 18:57:32 +01:00
2b14cb566f [utils] fix dfxp2srt text extraction(fixes #8055) 2016-01-28 12:38:34 +01:00
43 changed files with 1228 additions and 369 deletions

View File

@ -156,3 +156,4 @@ Tom Gijselinck
Founder Fang
Andrew Alexeyew
Saso Bezlaj
Erwin de Haan

View File

@ -442,28 +442,97 @@ On Windows you may also need to setup the `%HOME%` environment variable manually
The `-o` option allows users to indicate a template for the output file names. The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "http://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences have the format `%(NAME)s`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a lowercase S. Allowed names are:
- `id`: The sequence will be replaced by the video identifier.
- `url`: The sequence will be replaced by the video URL.
- `uploader`: The sequence will be replaced by the nickname of the person who uploaded the video.
- `upload_date`: The sequence will be replaced by the upload date in YYYYMMDD format.
- `title`: The sequence will be replaced by the video title.
- `ext`: The sequence will be replaced by the appropriate extension (like flv or mp4).
- `epoch`: The sequence will be replaced by the Unix epoch when creating the file.
- `autonumber`: The sequence will be replaced by a five-digit number that will be increased with each download, starting at zero.
- `playlist`: The sequence will be replaced by the name or the id of the playlist that contains the video.
- `playlist_index`: The sequence will be replaced by the index of the video in the playlist padded with leading zeros according to the total length of the playlist.
- `format_id`: The sequence will be replaced by the format code specified by `--format`.
- `duration`: The sequence will be replaced by the length of the video in seconds.
- `id`: Video identifier
- `title`: Video title
- `url`: Video URL
- `ext`: Video filename extension
- `alt_title`: A secondary title of the video
- `display_id`: An alternative identifier for the video
- `uploader`: Full name of the video uploader
- `creator`: The main artist who created the video
- `release_date`: The date (YYYYMMDD) when the video was released
- `timestamp`: UNIX timestamp of the moment the video became available
- `upload_date`: Video upload date (YYYYMMDD)
- `uploader_id`: Nickname or id of the video uploader
- `location`: Physical location where the video was filmed
- `duration`: Length of the video in seconds
- `view_count`: How many users have watched the video on the platform
- `like_count`: Number of positive ratings of the video
- `dislike_count`: Number of negative ratings of the video
- `repost_count`: Number of reposts of the video
- `average_rating`: Average rating give by users, the scale used depends on the webpage
- `comment_count`: Number of comments on the video
- `age_limit`: Age restriction for the video (years)
- `format`: A human-readable description of the format
- `format_id`: Format code specified by `--format`
- `format_note`: Additional info about the format
- `width`: Width of the video
- `height`: Height of the video
- `resolution`: Textual description of width and height
- `tbr`: Average bitrate of audio and video in KBit/s
- `abr`: Average audio bitrate in KBit/s
- `acodec`: Name of the audio codec in use
- `asr`: Audio sampling rate in Hertz
- `vbr`: Average video bitrate in KBit/s
- `fps`: Frame rate
- `vcodec`: Name of the video codec in use
- `container`: Name of the container format
- `filesize`: The number of bytes, if known in advance
- `filesize_approx`: An estimate for the number of bytes
- `protocol`: The protocol that will be used for the actual download
- `extractor`: Name of the extractor
- `extractor_key`: Key name of the extractor
- `epoch`: Unix epoch when creating the file
- `autonumber`: Five-digit number that will be increased with each download, starting at zero
- `playlist`: Name or id of the playlist that contains the video
- `playlist_index`: Index of the video in the playlist padded with leading zeros according to the total length of the playlist
Available for the video that belongs to some logical chapter or section:
- `chapter`: Name or title of the chapter the video belongs to
- `chapter_number`: Number of the chapter the video belongs to
- `chapter_id`: Id of the chapter the video belongs to
Available for the video that is an episode of some series or programme:
- `series`: Title of the series or programme the video episode belongs to
- `season`: Title of the season the video episode belongs to
- `season_number`: Number of the season the video episode belongs to
- `season_id`: Id of the season the video episode belongs to
- `episode`: Title of the video episode
- `episode_number`: Number of the video episode within a season
- `episode_id`: Id of the video episode
Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`.
For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
Output template can also contain arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'` that will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you.
To specify percent literal in output template use `%%`. To output to stdout use `-o -`.
The current default template is `%(title)s-%(id)s.%(ext)s`.
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title:
Examples (note on Windows you may need to use double quotes instead of single):
```bash
$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc
$ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc
youtube-dl test video ''_ä↭𝕐.mp4 # All kinds of weird characters
$ youtube-dl --get-filename -o "%(title)s.%(ext)s" BaW_jenozKc --restrict-filenames
$ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc --restrict-filenames
youtube-dl_test_video_.mp4 # A simple file name
# Download YouTube playlist videos in separate directory indexed by video order in a playlist
$ youtube-dl -o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re
# Download Udemy course keeping each chapter in separate directory under MyVideos directory in your home
$ youtube-dl -u user -p password -o '~/MyVideos/%(playlist)s/%(chapter_number)s - %(chapter)s/%(title)s.%(ext)s' https://www.udemy.com/java-tutorial/
# Download entire series season keeping each series and each season in separate directory under C:/MyVideos
$ youtube-dl -o "C:/MyVideos/%(series)s/%(season_number)s - %(season)s/%(episode_number)s - %(episode)s.%(ext)s" http://videomore.ru/kino_v_detalayah/5_sezon/367617
# Stream the video being downloaded to stdout
$ youtube-dl -o - BaW_jenozKc
```
# FORMAT SELECTION

View File

@ -89,8 +89,11 @@
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**
- **CBC**
- **CBCPlayer**
- **CBS**
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **CBSSports**
- **CeskaTelevize**
- **channel9**: Channel 9
@ -119,6 +122,7 @@
- **ComedyCentralShows**: The Daily Show / The Colbert Report
- **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
- **Cracked**
- **Crackle**
- **Criterion**
- **CrooksAndLiars**
- **Crunchyroll**
@ -261,7 +265,7 @@
- **Instagram**
- **instagram:user**: Instagram user profile
- **InternetVideoArchive**
- **IPrima** (Currently broken)
- **IPrima**
- **iqiyi**: 爱奇艺
- **Ir90Tv**
- **ivi**: ivi.ru
@ -282,6 +286,7 @@
- **KeezMovies**
- **KhanAcademy**
- **KickStarter**
- **KonserthusetPlay**
- **kontrtube**: KontrTube.ru - Труба зовёт
- **KrasView**: Красвью
- **Ku6**
@ -443,6 +448,7 @@
- **PlanetaPlay**
- **play.fm**
- **played.to**
- **PlaysTV**
- **Playtvak**: Playtvak.cz, iDNES.cz and Lidovky.cz
- **Playvid**
- **Playwire**
@ -680,7 +686,9 @@
- **VideoPremium**
- **VideoTt**: video.tt - Your True Tube (Currently broken)
- **videoweed**: VideoWeed
- **Vidme**
- **vidme**
- **vidme:user**
- **vidme:user:likes**
- **Vidzi**
- **vier**
- **vier:videos**

View File

@ -248,6 +248,17 @@ class TestFormatSelection(unittest.TestCase):
def format_info(f_id):
info = YoutubeIE._formats[f_id].copy()
# XXX: In real cases InfoExtractor._parse_mpd_formats() fills up 'acodec'
# and 'vcodec', while in tests such information is incomplete since
# commit a6c2c24479e5f4827ceb06f64d855329c0a6f593
# test_YoutubeDL.test_youtube_format_selection is broken without
# this fix
if 'acodec' in info and 'vcodec' not in info:
info['vcodec'] = 'none'
elif 'vcodec' in info and 'acodec' not in info:
info['acodec'] = 'none'
info['format_id'] = f_id
info['url'] = 'url:' + f_id
return info

View File

@ -65,16 +65,16 @@ class TestYoutubeSubtitles(BaseTestSubtitles):
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
for lang in ['it', 'fr', 'de']:
self.assertEqual(md5(subtitles['en']), '3cb210999d3e021bd6c7f0ea751eab06')
self.assertEqual(md5(subtitles['it']), '6d752b98c31f1cf8d597050c7a2cb4b5')
for lang in ['fr', 'de']:
self.assertTrue(subtitles.get(lang) is not None, 'Subtitles for \'%s\' not extracted' % lang)
def test_youtube_subtitles_sbv_format(self):
def test_youtube_subtitles_ttml_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'sbv'
self.DL.params['subtitlesformat'] = 'ttml'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
self.assertEqual(md5(subtitles['en']), 'e306f8c42842f723447d9f63ad65df54')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True

View File

@ -22,6 +22,7 @@ from youtube_dl.utils import (
DateRange,
detect_exe_version,
determine_ext,
dict_get,
encode_compat_str,
encodeFilename,
escape_rfc3986,
@ -450,6 +451,28 @@ class TestUtil(unittest.TestCase):
data = urlencode_postdata({'username': 'foo@bar.com', 'password': '1234'})
self.assertTrue(isinstance(data, bytes))
def test_dict_get(self):
FALSE_VALUES = {
'none': None,
'false': False,
'zero': 0,
'empty_string': '',
'empty_list': [],
}
d = FALSE_VALUES.copy()
d['a'] = 42
self.assertEqual(dict_get(d, 'a'), 42)
self.assertEqual(dict_get(d, 'b'), None)
self.assertEqual(dict_get(d, 'b', 42), 42)
self.assertEqual(dict_get(d, ('a', )), 42)
self.assertEqual(dict_get(d, ('b', 'a', )), 42)
self.assertEqual(dict_get(d, ('b', 'c', 'a', 'd', )), 42)
self.assertEqual(dict_get(d, ('b', 'c', )), None)
self.assertEqual(dict_get(d, ('b', 'c', ), 42), 42)
for key, false_value in FALSE_VALUES.items():
self.assertEqual(dict_get(d, ('b', 'c', key, )), None)
self.assertEqual(dict_get(d, ('b', 'c', key, ), skip_false_values=False), false_value)
def test_encode_compat_str(self):
self.assertEqual(encode_compat_str(b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82', 'utf-8'), 'тест')
self.assertEqual(encode_compat_str('тест', 'utf-8'), 'тест')
@ -471,6 +494,10 @@ class TestUtil(unittest.TestCase):
d = json.loads(stripped)
self.assertEqual(d, {'STATUS': 'OK'})
stripped = strip_jsonp('ps.embedHandler({"status": "success"});')
d = json.loads(stripped)
self.assertEqual(d, {'status': 'success'})
def test_uppercase_escape(self):
self.assertEqual(uppercase_escape(''), '')
self.assertEqual(uppercase_escape('\\U0001d550'), '𝕐')

View File

@ -1288,6 +1288,9 @@ class YoutubeDL(object):
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
else:
# Sanitize format_id from characters used in format selector expression
format['format_id'] = re.sub('[\s,/+\[\]()]', '_', format['format_id'])
format_id = format['format_id']
if format_id not in formats_dict:
formats_dict[format_id] = []
@ -1338,7 +1341,6 @@ class YoutubeDL(object):
if req_format is None:
req_format_list = []
if (self.params.get('outtmpl', DEFAULT_OUTTMPL) != '-' and
info_dict['extractor'] in ['youtube', 'ted'] and
not info_dict.get('is_live')):
merger = FFmpegMergerPP(self)
if merger.available and merger.can_merge():
@ -1795,7 +1797,7 @@ class YoutubeDL(object):
else:
res = '%sp' % format['height']
elif format.get('width') is not None:
res = '?x%d' % format['width']
res = '%dx?' % format['width']
else:
res = default
return res

View File

@ -40,9 +40,10 @@ class DashSegmentsFD(FileDownloader):
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
with open(tmpfilename, 'wb') as outf:
append_url_to_file(
outf, combine_url(base_url, info_dict['initialization_url']),
'initialization segment')
if info_dict.get('initialization_url'):
append_url_to_file(
outf, combine_url(base_url, info_dict['initialization_url']),
'initialization segment')
for i, segment_url in enumerate(segment_urls):
segment_len = append_url_to_file(
outf, combine_url(base_url, segment_url),

View File

@ -3,6 +3,7 @@ from __future__ import unicode_literals
import os
import re
import subprocess
import sys
from .common import FileDownloader
from .fragment import FragmentFD
@ -57,8 +58,10 @@ class HlsFD(FileDownloader):
# subprocces.run would send the SIGKILL signal to ffmpeg and the
# mp4 file couldn't be played, but if we ask ffmpeg to quit it
# produces a file that is playable (this is mostly useful for live
# streams)
proc.communicate(b'q')
# streams). Note that Windows is not affected and produces playable
# files (see https://github.com/rg3/youtube-dl/issues/8300).
if sys.platform != 'win32':
proc.communicate(b'q')
raise
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))

View File

@ -89,8 +89,15 @@ from .camdemy import (
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .canvas import CanvasIE
from .cbc import (
CBCIE,
CBCPlayerIE,
)
from .cbs import CBSIE
from .cbsnews import CBSNewsIE
from .cbsnews import (
CBSNewsIE,
CBSNewsLiveVideoIE,
)
from .cbssports import CBSSportsIE
from .ccc import CCCIE
from .ceskatelevize import CeskaTelevizeIE
@ -123,6 +130,7 @@ from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .condenast import CondeNastIE
from .cracked import CrackedIE
from .crackle import CrackleIE
from .criterion import CriterionIE
from .crooksandliars import CrooksAndLiarsIE
from .crunchyroll import (
@ -325,6 +333,7 @@ from .keezmovies import KeezMoviesIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .konserthusetplay import KonserthusetPlayIE
from .kontrtube import KontrTubeIE
from .krasview import KrasViewIE
from .ku6 import Ku6IE
@ -529,6 +538,7 @@ from .planetaplay import PlanetaPlayIE
from .pladform import PladformIE
from .played import PlayedIE
from .playfm import PlayFMIE
from .plays import PlaysTVIE
from .playtvak import PlaytvakIE
from .playvid import PlayvidIE
from .playwire import PlaywireIE
@ -819,7 +829,11 @@ from .videomore import (
)
from .videopremium import VideoPremiumIE
from .videott import VideoTtIE
from .vidme import VidmeIE
from .vidme import (
VidmeIE,
VidmeUserIE,
VidmeUserLikesIE,
)
from .vidzi import VidziIE
from .vier import VierIE, VierVideosIE
from .viewster import ViewsterIE

View File

@ -13,6 +13,7 @@ from ..utils import (
unified_strdate,
get_element_by_attribute,
int_or_none,
NO_DEFAULT,
qualities,
)
@ -93,9 +94,18 @@ class ArteTVPlus7IE(InfoExtractor):
json_url = self._html_search_regex(
patterns, webpage, 'json vp url', default=None)
if not json_url:
iframe_url = self._html_search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
webpage, 'iframe url', group='url')
def find_iframe_url(webpage, default=NO_DEFAULT):
return self._html_search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
webpage, 'iframe url', group='url', default=default)
iframe_url = find_iframe_url(webpage, None)
if not iframe_url:
embed_url = self._html_search_regex(
r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url')
player = self._download_json(
embed_url, video_id, 'Downloading player page')
iframe_url = find_iframe_url(player['html'])
json_url = compat_parse_qs(
compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
return self._extract_from_json_url(json_url, video_id, lang)

113
youtube_dl/extractor/cbc.py Normal file
View File

@ -0,0 +1,113 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import js_to_json
class CBCIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
# with mediaId
'url': 'http://www.cbc.ca/22minutes/videos/clips-season-23/don-cherry-play-offs',
'info_dict': {
'id': '2682904050',
'ext': 'flv',
'title': 'Don Cherry All-Stars',
'description': 'Don Cherry has a bee in his bonnet about AHL player John Scott because that guys got heart.',
'timestamp': 1454475540,
'upload_date': '20160203',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# with clipId
'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live',
'info_dict': {
'id': '2487345465',
'ext': 'flv',
'title': 'Robin Williams freestyles on 90 Minutes Live',
'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
'upload_date': '19700101',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
# multiple iframes
'url': 'http://www.cbc.ca/natureofthings/blog/birds-eye-view-from-vancouvers-burrard-street-bridge-how-we-got-the-shot',
'playlist': [{
'info_dict': {
'id': '2680832926',
'ext': 'flv',
'title': 'An Eagle\'s-Eye View Off Burrard Bridge',
'description': 'Hercules the eagle flies from Vancouver\'s Burrard Bridge down to a nearby park with a mini-camera strapped to his back.',
'upload_date': '19700101',
},
}, {
'info_dict': {
'id': '2658915080',
'ext': 'flv',
'title': 'Fly like an eagle!',
'description': 'Eagle equipped with a mini camera flies from the world\'s tallest tower',
'upload_date': '19700101',
},
}],
'params': {
# rtmp download
'skip_download': True,
},
}]
@classmethod
def suitable(cls, url):
return False if CBCPlayerIE.suitable(url) else super(CBCIE, cls).suitable(url)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
player_init = self._search_regex(
r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage, 'player init',
default=None)
if player_init:
player_info = self._parse_json(player_init, display_id, js_to_json)
media_id = player_info.get('mediaId')
if not media_id:
clip_id = player_info['clipId']
media_id = self._download_json(
'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
clip_id)['entries'][0]['id'].split('/')[-1]
return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
else:
entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
return self.playlist_result(entries)
class CBCPlayerIE(InfoExtractor):
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)'
_TEST = {
'url': 'http://www.cbc.ca/player/play/2683190193',
'info_dict': {
'id': '2683190193',
'ext': 'flv',
'title': 'Gerry Runs a Sweat Shop',
'description': 'md5:b457e1c01e8ff408d9d801c1c2cd29b0',
'timestamp': 1455067800,
'upload_date': '20160210',
},
'params': {
# rtmp download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'http://feed.theplatform.com/f/ExhSPC/vms_5akSXx4Ng_Zn?byGuid=%s' % video_id,
'ThePlatformFeed', video_id)

View File

@ -1,15 +1,14 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from ..utils import parse_duration
class CBSNewsIE(ThePlatformIE):
IE_DESC = 'CBS News'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:[^/]+/)+(?P<id>[\da-z_-]+)'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
_TESTS = [
{
@ -48,14 +47,13 @@ class CBSNewsIE(ThePlatformIE):
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = json.loads(self._html_search_regex(
video_info = self._parse_json(self._html_search_regex(
r'(?:<ul class="media-list items" id="media-related-items"><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
webpage, 'video JSON info'))
webpage, 'video JSON info'), video_id)
item = video_info['item'] if 'item' in video_info else video_info
title = item.get('articleTitle') or item.get('hed')
@ -88,3 +86,41 @@ class CBSNewsIE(ThePlatformIE):
'formats': formats,
'subtitles': subtitles,
}
class CBSNewsLiveVideoIE(InfoExtractor):
IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh',
'ext': 'flv',
'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = self._parse_json(self._html_search_regex(
r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story']
hdcore_sign = 'hdcore=3.3.1'
f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
if f4m_formats:
for entry in f4m_formats:
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
return {
'id': video_id,
'title': video_info['headline'],
'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
'duration': parse_duration(video_info.get('segmentDur')),
'formats': f4m_formats,
}

View File

@ -2,6 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
parse_duration,
@ -14,14 +15,13 @@ class ComCarCoffIE(InfoExtractor):
_TESTS = [{
'url': 'http://comediansincarsgettingcoffee.com/miranda-sings-happy-thanksgiving-miranda/',
'info_dict': {
'id': 'miranda-sings-happy-thanksgiving-miranda',
'id': '2494164',
'ext': 'mp4',
'upload_date': '20141127',
'timestamp': 1417107600,
'duration': 1232,
'title': 'Happy Thanksgiving Miranda',
'description': 'Jerry Seinfeld and his special guest Miranda Sings cruise around town in search of coffee, complaining and apologizing along the way.',
'thumbnail': 'http://ccc.crackle.com/images/s5e4_thumb.jpg',
},
'params': {
'skip_download': 'requires ffmpeg',
@ -39,15 +39,14 @@ class ComCarCoffIE(InfoExtractor):
r'window\.app\s*=\s*({.+?});\n', webpage, 'full data json'),
display_id)['videoData']
video_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(video_id) or full_data['singleshots'][video_id]
display_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]
video_id = compat_str(video_data['mediaId'])
thumbnails = [{
'url': video_data['images']['thumb'],
}, {
'url': video_data['images']['poster'],
}]
formats = self._extract_m3u8_formats(
video_data['mediaUrl'], video_id, ext='mp4')
timestamp = int_or_none(video_data.get('pubDateTime')) or parse_iso8601(
video_data.get('pubDate'))
@ -55,6 +54,8 @@ class ComCarCoffIE(InfoExtractor):
video_data.get('duration'))
return {
'_type': 'url_transparent',
'url': 'crackle:%s' % video_id,
'id': video_id,
'display_id': display_id,
'title': video_data['title'],
@ -62,6 +63,7 @@ class ComCarCoffIE(InfoExtractor):
'timestamp': timestamp,
'duration': duration,
'thumbnails': thumbnails,
'formats': formats,
'season_number': int_or_none(video_data.get('season')),
'episode_number': int_or_none(video_data.get('episode')),
'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),
}

View File

@ -10,6 +10,7 @@ import re
import socket
import sys
import time
import math
from ..compat import (
compat_cookiejar,
@ -44,6 +45,7 @@ from ..utils import (
xpath_text,
xpath_with_ns,
determine_protocol,
parse_duration,
)
@ -1184,11 +1186,13 @@ class InfoExtractor(object):
http_count = 0
m3u8_count = 0
srcs = []
videos = smil.findall(self._xpath_ns('.//video', namespace))
for video in videos:
src = video.get('src')
if not src:
if not src or src in srcs:
continue
srcs.append(src)
bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
filesize = int_or_none(video.get('size') or video.get('fileSize'))
@ -1220,6 +1224,7 @@ class InfoExtractor(object):
continue
src_url = src if src.startswith('http') else compat_urlparse.urljoin(base, src)
src_url = src_url.strip()
if proto == 'm3u8' or src_ext == 'm3u8':
m3u8_formats = self._extract_m3u8_formats(
@ -1265,11 +1270,13 @@ class InfoExtractor(object):
return formats
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
urls = []
subtitles = {}
for num, textstream in enumerate(smil.findall(self._xpath_ns('.//textstream', namespace))):
src = textstream.get('src')
if not src:
if not src or src in urls:
continue
urls.append(src)
ext = textstream.get('ext') or determine_ext(src)
if not ext:
type_ = textstream.get('type')
@ -1330,81 +1337,161 @@ class InfoExtractor(object):
})
return entries
def _download_dash_manifest(self, dash_manifest_url, video_id, fatal=True):
return self._download_xml(
dash_manifest_url, video_id,
note='Downloading DASH manifest',
errnote='Could not download DASH manifest',
def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}):
res = self._download_webpage_handle(
mpd_url, video_id,
note=note or 'Downloading MPD manifest',
errnote=errnote or 'Failed to download MPD manifest',
fatal=fatal)
if res is False:
return []
mpd, urlh = res
mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group()
def _extract_dash_manifest_formats(self, dash_manifest_url, video_id, fatal=True, namespace=None, formats_dict={}):
dash_doc = self._download_dash_manifest(dash_manifest_url, video_id, fatal)
if dash_doc is False:
return self._parse_mpd_formats(
compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict)
def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}):
if mpd_doc.get('type') == 'dynamic':
return []
return self._parse_dash_manifest(
dash_doc, namespace=namespace, formats_dict=formats_dict)
namespace = self._search_regex(r'(?i)^{([^}]+)?}MPD$', mpd_doc.tag, 'namespace', default=None)
def _parse_dash_manifest(self, dash_doc, namespace=None, formats_dict={}):
def _add_ns(path):
return self._xpath_ns(path, namespace)
formats = []
for a in dash_doc.findall('.//' + _add_ns('AdaptationSet')):
mime_type = a.attrib.get('mimeType')
for r in a.findall(_add_ns('Representation')):
mime_type = r.attrib.get('mimeType') or mime_type
url_el = r.find(_add_ns('BaseURL'))
if mime_type == 'text/vtt':
# TODO implement WebVTT downloading
pass
elif mime_type.startswith('audio/') or mime_type.startswith('video/'):
segment_list = r.find(_add_ns('SegmentList'))
format_id = r.attrib['id']
video_url = url_el.text if url_el is not None else None
filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
f = {
'format_id': format_id,
'url': video_url,
'width': int_or_none(r.attrib.get('width')),
'height': int_or_none(r.attrib.get('height')),
'tbr': int_or_none(r.attrib.get('bandwidth'), 1000),
'asr': int_or_none(r.attrib.get('audioSamplingRate')),
'filesize': filesize,
'fps': int_or_none(r.attrib.get('frameRate')),
}
if segment_list is not None:
initialization_url = segment_list.find(_add_ns('Initialization')).attrib['sourceURL']
f.update({
'initialization_url': initialization_url,
'segment_urls': [segment.attrib.get('media') for segment in segment_list.findall(_add_ns('SegmentURL'))],
'protocol': 'http_dash_segments',
})
if not f.get('url'):
f['url'] = initialization_url
try:
existing_format = next(
fo for fo in formats
if fo['format_id'] == format_id)
except StopIteration:
full_info = formats_dict.get(format_id, {}).copy()
full_info.update(f)
codecs = r.attrib.get('codecs')
if codecs:
if mime_type.startswith('video/'):
vcodec, acodec = codecs, 'none'
else: # mime_type.startswith('audio/')
vcodec, acodec = 'none', codecs
def is_drm_protected(element):
return element.find(_add_ns('ContentProtection')) is not None
full_info.update({
'vcodec': vcodec,
'acodec': acodec,
})
formats.append(full_info)
def extract_multisegment_info(element, ms_parent_info):
ms_info = ms_parent_info.copy()
segment_list = element.find(_add_ns('SegmentList'))
if segment_list is not None:
segment_urls_e = segment_list.findall(_add_ns('SegmentURL'))
if segment_urls_e:
ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e]
initialization = segment_list.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
else:
segment_template = element.find(_add_ns('SegmentTemplate'))
if segment_template is not None:
start_number = segment_template.get('startNumber')
if start_number:
ms_info['start_number'] = int(start_number)
segment_timeline = segment_template.find(_add_ns('SegmentTimeline'))
if segment_timeline is not None:
s_e = segment_timeline.findall(_add_ns('S'))
if s_e:
ms_info['total_number'] = 0
for s in s_e:
ms_info['total_number'] += 1 + int(s.get('r', '0'))
else:
existing_format.update(f)
else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
timescale = segment_template.get('timescale')
if timescale:
ms_info['timescale'] = int(timescale)
segment_duration = segment_template.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
media_template = segment_template.get('media')
if media_template:
ms_info['media_template'] = media_template
initialization = segment_template.get('initialization')
if initialization:
ms_info['initialization_url'] = initialization
else:
initialization = segment_template.find(_add_ns('Initialization'))
if initialization is not None:
ms_info['initialization_url'] = initialization.attrib['sourceURL']
return ms_info
mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
formats = []
for period in mpd_doc.findall(_add_ns('Period')):
period_duration = parse_duration(period.get('duration')) or mpd_duration
period_ms_info = extract_multisegment_info(period, {
'start_number': 1,
'timescale': 1,
})
for adaptation_set in period.findall(_add_ns('AdaptationSet')):
if is_drm_protected(adaptation_set):
continue
adaption_set_ms_info = extract_multisegment_info(adaptation_set, period_ms_info)
for representation in adaptation_set.findall(_add_ns('Representation')):
if is_drm_protected(representation):
continue
representation_attrib = adaptation_set.attrib.copy()
representation_attrib.update(representation.attrib)
mime_type = representation_attrib.get('mimeType')
content_type = mime_type.split('/')[0] if mime_type else representation_attrib.get('contentType')
if content_type == 'text':
# TODO implement WebVTT downloading
pass
elif content_type == 'video' or content_type == 'audio':
base_url = ''
for element in (representation, adaptation_set, period, mpd_doc):
base_url_e = element.find(_add_ns('BaseURL'))
if base_url_e is not None:
base_url = base_url_e.text + base_url
if re.match(r'^https?://', base_url):
break
if mpd_base_url and not re.match(r'^https?://', base_url):
if not mpd_base_url.endswith('/') and not base_url.startswith('/'):
mpd_base_url += '/'
base_url = mpd_base_url + base_url
representation_id = representation_attrib.get('id')
lang = representation_attrib.get('lang')
url_el = representation.find(_add_ns('BaseURL'))
filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
f = {
'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
'url': base_url,
'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')),
'tbr': int_or_none(representation_attrib.get('bandwidth'), 1000),
'asr': int_or_none(representation_attrib.get('audioSamplingRate')),
'fps': int_or_none(representation_attrib.get('frameRate')),
'vcodec': 'none' if content_type == 'audio' else representation_attrib.get('codecs'),
'acodec': 'none' if content_type == 'video' else representation_attrib.get('codecs'),
'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None,
'format_note': 'DASH %s' % content_type,
'filesize': filesize,
}
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
if 'total_number' not in representation_ms_info and 'segment_duration':
segment_duration = float(representation_ms_info['segment_duration']) / float(representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth)(?:%(0\d+)d)?\$', r'%(\1)\2d', media_template)
media_template.replace('$$', '$')
representation_ms_info['segment_urls'] = [media_template % {'Number': segment_number, 'Bandwidth': representation_attrib.get('bandwidth')} for segment_number in range(representation_ms_info['start_number'], representation_ms_info['total_number'] + representation_ms_info['start_number'])]
if 'segment_urls' in representation_ms_info:
f.update({
'segment_urls': representation_ms_info['segment_urls'],
'protocol': 'http_dash_segments',
})
if 'initialization_url' in representation_ms_info:
initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
f.update({
'initialization_url': initialization_url,
})
if not f.get('url'):
f['url'] = initialization_url
try:
existing_format = next(
fo for fo in formats
if fo['format_id'] == representation_id)
except StopIteration:
full_info = formats_dict.get(representation_id, {}).copy()
full_info.update(f)
formats.append(full_info)
else:
existing_format.update(f)
else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
self._sort_formats(formats)
return formats
def _live_title(self, name):

View File

@ -0,0 +1,95 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class CrackleIE(InfoExtractor):
_VALID_URL = r'(?:crackle:|https?://(?:www\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
_TEST = {
'url': 'http://www.crackle.com/the-art-of-more/2496419',
'info_dict': {
'id': '2496419',
'ext': 'mp4',
'title': 'Heavy Lies the Head',
'description': 'md5:bb56aa0708fe7b9a4861535f15c3abca',
},
'params': {
# m3u8 download
'skip_download': True,
}
}
# extracted from http://legacyweb-us.crackle.com/flash/QueryReferrer.ashx
_SUBTITLE_SERVER = 'http://web-us-az.crackle.com'
_UPLYNK_OWNER_ID = 'e8773f7770a44dbd886eee4fca16a66b'
_THUMBNAIL_TEMPLATE = 'http://images-us-am.crackle.com/%stnl_1920x1080.jpg?ts=20140107233116?c=635333335057637614'
# extracted from http://legacyweb-us.crackle.com/flash/ReferrerRedirect.ashx
_MEDIA_FILE_SLOTS = {
'c544.flv': {
'width': 544,
'height': 306,
},
'360p.mp4': {
'width': 640,
'height': 360,
},
'480p.mp4': {
'width': 852,
'height': 478,
},
'480p_1mbps.mp4': {
'width': 852,
'height': 478,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
item = self._download_xml(
'http://legacyweb-us.crackle.com/app/revamp/vidwallcache.aspx?flags=-1&fm=%s' % video_id,
video_id).find('i')
title = item.attrib['t']
thumbnail = None
subtitles = {}
formats = self._extract_m3u8_formats(
'http://content.uplynk.com/ext/%s/%s.m3u8' % (self._UPLYNK_OWNER_ID, video_id),
video_id, 'mp4', m3u8_id='hls', fatal=None)
path = item.attrib.get('p')
if path:
thumbnail = self._THUMBNAIL_TEMPLATE % path
http_base_url = 'http://ahttp.crackle.com/' + path
for mfs_path, mfs_info in self._MEDIA_FILE_SLOTS.items():
formats.append({
'url': http_base_url + mfs_path,
'format_id': 'http-' + mfs_path.split('.')[0],
'width': mfs_info['width'],
'height': mfs_info['height'],
})
for cc in item.findall('cc'):
locale = cc.attrib.get('l')
v = cc.attrib.get('v')
if locale and v:
if locale not in subtitles:
subtitles[locale] = []
subtitles[locale] = [{
'url': '%s/%s%s_%s.xml' % (self._SUBTITLE_SERVER, path, locale, v),
'ext': 'ttml',
}]
self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
return {
'id': video_id,
'title': title,
'description': item.attrib.get('d'),
'duration': int(item.attrib.get('r'), 16) if item.attrib.get('r') else None,
'series': item.attrib.get('sn'),
'season_number': int_or_none(item.attrib.get('se')),
'episode_number': int_or_none(item.attrib.get('ep')),
'thumbnail': thumbnail,
'subtitles': subtitles,
'formats': formats,
}

View File

@ -215,9 +215,8 @@ class FacebookIE(InfoExtractor):
})
dash_manifest = f[0].get('dash_manifest')
if dash_manifest:
formats.extend(self._parse_dash_manifest(
compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest)),
namespace='urn:mpeg:dash:schema:mpd:2011'))
formats.extend(self._parse_mpd_formats(
compat_etree_fromstring(compat_urllib_parse_unquote_plus(dash_manifest))))
if not formats:
raise ExtractorError('Cannot find video formats')

View File

@ -9,6 +9,7 @@ class FOXIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.fox.com/watch/255180355939/7684182528',
'md5': 'ebd296fcc41dd4b19f8115d8461a3165',
'info_dict': {
'id': '255180355939',
'ext': 'mp4',
@ -17,10 +18,6 @@ class FOXIE(InfoExtractor):
'duration': 129,
},
'add_ie': ['ThePlatform'],
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
@ -29,7 +26,7 @@ class FOXIE(InfoExtractor):
release_url = self._parse_json(self._search_regex(
r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
video_id)['release_url'] + '&manifest=m3u'
video_id)['release_url'] + '&switch=http'
return {
'_type': 'url_transparent',

View File

@ -224,6 +224,20 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
# MPD from http://dash-mse-test.appspot.com/media.html
{
'url': 'http://yt-dash-mse-test.commondatastorage.googleapis.com/media/car-20120827-manifest.mpd',
'md5': '4b57baab2e30d6eb3a6a09f0ba57ef53',
'info_dict': {
'id': 'car-20120827-manifest',
'ext': 'mp4',
'title': 'car-20120827-manifest',
'formats': 'mincount:9',
},
'params': {
'format': 'bestvideo',
},
},
# google redirect
{
'url': 'http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCUQtwIwAA&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DcmQHVoWB5FY&ei=F-sNU-LLCaXk4QT52ICQBQ&usg=AFQjCNEw4hL29zgOohLXvpJ-Bdh2bils1Q&bvm=bv.61965928,d.bGE',
@ -1229,19 +1243,24 @@ class GenericIE(InfoExtractor):
# Check for direct link to a video
content_type = head_response.headers.get('Content-Type', '')
m = re.match(r'^(?P<type>audio|video|application(?=/ogg$))/(?P<format_id>.+)$', content_type)
m = re.match(r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.apple\.|x-)?mpegurl)))/(?P<format_id>.+)$', content_type)
if m:
upload_date = unified_strdate(
head_response.headers.get('Last-Modified'))
formats = []
if m.group('format_id').endswith('mpegurl'):
formats = self._extract_m3u8_formats(url, video_id, 'mp4')
else:
formats = [{
'format_id': m.group('format_id'),
'url': url,
'vcodec': 'none' if m.group('type') == 'audio' else None
}]
return {
'id': video_id,
'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
'direct': True,
'formats': [{
'format_id': m.group('format_id'),
'url': url,
'vcodec': 'none' if m.group('type') == 'audio' else None
}],
'formats': formats,
'upload_date': upload_date,
}
@ -1284,7 +1303,7 @@ class GenericIE(InfoExtractor):
self.report_extraction(video_id)
# Is it an RSS feed, a SMIL file or a XSPF playlist?
# Is it an RSS feed, a SMIL file, an XSPF playlist or a MPD manifest?
try:
doc = compat_etree_fromstring(webpage.encode('utf-8'))
if doc.tag == 'rss':
@ -1293,6 +1312,13 @@ class GenericIE(InfoExtractor):
return self._parse_smil(doc, url, video_id)
elif doc.tag == '{http://xspf.org/ns/0/}playlist':
return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
return {
'id': video_id,
'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
'formats': self._parse_mpd_formats(
doc, video_id, mpd_base_url=url.rpartition('/')[0]),
}
except compat_xml_parse_error:
pass
@ -1402,7 +1428,7 @@ class GenericIE(InfoExtractor):
# Look for embedded Dailymotion player
matches = re.findall(
r'<(?:embed|iframe)[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
if matches:
return _playlist_from_matches(
matches, lambda m: unescapeHTML(m[1]))
@ -1946,6 +1972,8 @@ class GenericIE(InfoExtractor):
return self.playlist_result(self._extract_xspf_playlist(video_url, video_id), video_id)
elif ext == 'm3u8':
entry_info_dict['formats'] = self._extract_m3u8_formats(video_url, video_id, ext='mp4')
elif ext == 'mpd':
entry_info_dict['formats'] = self._extract_mpd_formats(video_url, video_id)
else:
entry_info_dict['url'] = video_url

View File

@ -10,8 +10,8 @@ from ..utils import (
class HotStarIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/.*?[/-](?P<id>\d{10})'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?hotstar\.com/(?:.+?[/-])?(?P<id>\d{10})'
_TESTS = [{
'url': 'http://www.hotstar.com/on-air-with-aib--english-1000076273',
'info_dict': {
'id': '1000076273',
@ -26,7 +26,13 @@ class HotStarIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}
}, {
'url': 'http://www.hotstar.com/sports/cricket/rajitha-sizzles-on-debut-with-329/2001477583',
'only_matching': True,
}, {
'url': 'http://www.hotstar.com/1000000515',
'only_matching': True,
}]
_GET_CONTENT_TEMPLATE = 'http://account.hotstar.com/AVS/besc?action=GetAggregatedContentDetails&channel=PCTV&contentId=%s'
_GET_CDN_TEMPLATE = 'http://getcdn.hotstar.com/AVS/besc?action=GetCDN&asJson=Y&channel=%s&id=%s&type=%s'

View File

@ -2,46 +2,30 @@
from __future__ import unicode_literals
import re
from random import random
from math import floor
import time
from .common import InfoExtractor
from ..utils import (
ExtractorError,
remove_end,
sanitized_Request,
)
class IPrimaIE(InfoExtractor):
_WORKING = False
_VALID_URL = r'https?://play\.iprima\.cz/(?:[^/]+/)*(?P<id>[^?#]+)'
_VALID_URL = r'https?://play\.iprima\.cz/(?:.+/)?(?P<id>[^?#]+)'
_TESTS = [{
'url': 'http://play.iprima.cz/gondici-s-r-o-33',
'info_dict': {
'id': 'p136534',
'ext': 'mp4',
'title': 'Gondíci s. r. o. (34)',
'description': 'md5:16577c629d006aa91f59ca8d8e7f99bd',
},
'params': {
'skip_download': True, # m3u8 download
},
}, {
'url': 'http://play.iprima.cz/particka/particka-92',
'info_dict': {
'id': '39152',
'ext': 'flv',
'title': 'Partička (92)',
'description': 'md5:74e9617e51bca67c3ecfb2c6f9766f45',
'thumbnail': 'http://play.iprima.cz/sites/default/files/image_crops/image_620x349/3/491483_particka-92_image_620x349.jpg',
},
'params': {
'skip_download': True, # requires rtmpdump
},
}, {
'url': 'http://play.iprima.cz/particka/tchibo-particka-jarni-moda',
'info_dict': {
'id': '9718337',
'ext': 'flv',
'title': 'Tchibo Partička - Jarní móda',
'thumbnail': 're:^http:.*\.jpg$',
},
'params': {
'skip_download': True, # requires rtmpdump
},
}, {
'url': 'http://play.iprima.cz/zpravy-ftv-prima-2752015',
'only_matching': True,
}]
@ -51,62 +35,24 @@ class IPrimaIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
if re.search(r'Nemáte oprávnění přistupovat na tuto stránku\.\s*</div>', webpage):
raise ExtractorError(
'%s said: You do not have permission to access this page' % self.IE_NAME, expected=True)
video_id = self._search_regex(r'data-product="([^"]+)">', webpage, 'real id')
player_url = (
'http://embed.livebox.cz/iprimaplay/player-embed-v2.js?__tok%s__=%s' %
(floor(random() * 1073741824), floor(random() * 1073741824))
)
req = sanitized_Request(player_url)
req = sanitized_Request(
'http://play.iprima.cz/prehravac/init?_infuse=1'
'&_ts=%s&productId=%s' % (round(time.time()), video_id))
req.add_header('Referer', url)
playerpage = self._download_webpage(req, video_id)
playerpage = self._download_webpage(req, video_id, note='Downloading player')
base_url = ''.join(re.findall(r"embed\['stream'\] = '(.+?)'.+'(\?auth=)'.+'(.+?)';", playerpage)[1])
m3u8_url = self._search_regex(r"'src': '([^']+\.m3u8)'", playerpage, 'm3u8 url')
zoneGEO = self._html_search_regex(r'"zoneGEO":(.+?),', webpage, 'zoneGEO')
if zoneGEO != '0':
base_url = base_url.replace('token', 'token_' + zoneGEO)
formats = []
for format_id in ['lq', 'hq', 'hd']:
filename = self._html_search_regex(
r'"%s_id":(.+?),' % format_id, webpage, 'filename')
if filename == 'null':
continue
real_id = self._search_regex(
r'Prima-(?:[0-9]{10}|WEB)-([0-9]+)[-_]',
filename, 'real video id')
if format_id == 'lq':
quality = 0
elif format_id == 'hq':
quality = 1
elif format_id == 'hd':
quality = 2
filename = 'hq/' + filename
formats.append({
'format_id': format_id,
'url': base_url,
'quality': quality,
'play_path': 'mp4:' + filename.replace('"', '')[:-4],
'rtmp_live': True,
'ext': 'flv',
})
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
self._sort_formats(formats)
return {
'id': real_id,
'title': remove_end(self._og_search_title(webpage), ' | Prima PLAY'),
'id': video_id,
'title': self._og_search_title(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
'description': self._search_regex(
r'<p[^>]+itemprop="description"[^>]*>([^<]+)',
webpage, 'description', default=None),
'description': self._og_search_description(webpage),
}

View File

@ -0,0 +1,107 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
)
class KonserthusetPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?konserthusetplay\.se/\?.*\bm=(?P<id>[^&]+)'
_TEST = {
'url': 'http://www.konserthusetplay.se/?m=CKDDnlCY-dhWAAqiMERd-A',
'info_dict': {
'id': 'CKDDnlCY-dhWAAqiMERd-A',
'ext': 'flv',
'title': 'Orkesterns instrument: Valthornen',
'description': 'md5:f10e1f0030202020396a4d712d2fa827',
'thumbnail': 're:^https?://.*$',
'duration': 398.8,
},
'params': {
# rtmp download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
e = self._search_regex(
r'https?://csp\.picsearch\.com/rest\?.*\be=(.+?)[&"\']', webpage, 'e')
rest = self._download_json(
'http://csp.picsearch.com/rest?e=%s&containerId=mediaplayer&i=object' % e,
video_id, transform_source=lambda s: s[s.index('{'):s.rindex('}') + 1])
media = rest['media']
player_config = media['playerconfig']
playlist = player_config['playlist']
source = next(f for f in playlist if f.get('bitrates'))
FORMAT_ID_REGEX = r'_([^_]+)_h264m\.mp4'
formats = []
fallback_url = source.get('fallbackUrl')
fallback_format_id = None
if fallback_url:
fallback_format_id = self._search_regex(
FORMAT_ID_REGEX, fallback_url, 'format id', default=None)
connection_url = (player_config.get('rtmp', {}).get(
'netConnectionUrl') or player_config.get(
'plugins', {}).get('bwcheck', {}).get('netConnectionUrl'))
if connection_url:
for f in source['bitrates']:
video_url = f.get('url')
if not video_url:
continue
format_id = self._search_regex(
FORMAT_ID_REGEX, video_url, 'format id', default=None)
f_common = {
'vbr': int_or_none(f.get('bitrate')),
'width': int_or_none(f.get('width')),
'height': int_or_none(f.get('height')),
}
f = f_common.copy()
f.update({
'url': connection_url,
'play_path': video_url,
'format_id': 'rtmp-%s' % format_id if format_id else 'rtmp',
'ext': 'flv',
})
formats.append(f)
if format_id and format_id == fallback_format_id:
f = f_common.copy()
f.update({
'url': fallback_url,
'format_id': 'http-%s' % format_id if format_id else 'http',
})
formats.append(f)
if not formats and fallback_url:
formats.append({
'url': fallback_url,
})
self._sort_formats(formats)
title = player_config.get('title') or media['title']
description = player_config.get('mediaInfo', {}).get('description')
thumbnail = media.get('image')
duration = float_or_none(media.get('duration'), 1000)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@ -31,6 +31,10 @@ class KuwoBaseIE(InfoExtractor):
(file_format['ext'], file_format.get('br', ''), song_id),
song_id, note='Download %s url info' % file_format['format'],
)
if song_url == 'IPDeny':
raise ExtractorError('This song is blocked in this region', expected=True)
if song_url.startswith('http://') or song_url.startswith('https://'):
formats.append({
'url': song_url,

View File

@ -4,6 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
remove_end,
)
class MailRuIE(InfoExtractor):
@ -34,14 +38,30 @@ class MailRuIE(InfoExtractor):
'id': '46843144_1263',
'ext': 'mp4',
'title': 'Samsung Galaxy S5 Hammer Smash Fail Battery Explosion',
'timestamp': 1397217632,
'upload_date': '20140411',
'uploader': 'hitech',
'timestamp': 1397039888,
'upload_date': '20140409',
'uploader': 'hitech@corp.mail.ru',
'uploader_id': 'hitech@corp.mail.ru',
'duration': 245,
},
'skip': 'Not accessible from Travis CI server',
},
{
# only available via metaUrl API
'url': 'http://my.mail.ru/mail/720pizle/video/_myvideo/502.html',
'md5': '3b26d2491c6949d031a32b96bd97c096',
'info_dict': {
'id': '56664382_502',
'ext': 'mp4',
'title': ':8336',
'timestamp': 1449094163,
'upload_date': '20151202',
'uploader': '720pizle@mail.ru',
'uploader_id': '720pizle@mail.ru',
'duration': 6001,
},
'skip': 'Not accessible from Travis CI server',
}
]
def _real_extract(self, url):
@ -51,32 +71,55 @@ class MailRuIE(InfoExtractor):
if not video_id:
video_id = mobj.group('idv2prefix') + mobj.group('idv2suffix')
video_data = self._download_json(
'http://api.video.mail.ru/videos/%s.json?new=1' % video_id, video_id, 'Downloading video JSON')
webpage = self._download_webpage(url, video_id)
author = video_data['author']
uploader = author['name']
uploader_id = author.get('id') or author.get('email')
view_count = video_data.get('views_count')
video_data = None
page_config = self._parse_json(self._search_regex(
r'(?s)<script[^>]+class="sp-video__page-config"[^>]*>(.+?)</script>',
webpage, 'page config', default='{}'), video_id, fatal=False)
if page_config:
meta_url = page_config.get('metaUrl') or page_config.get('video', {}).get('metaUrl')
if meta_url:
video_data = self._download_json(
meta_url, video_id, 'Downloading video meta JSON', fatal=False)
# Fallback old approach
if not video_data:
video_data = self._download_json(
'http://api.video.mail.ru/videos/%s.json?new=1' % video_id,
video_id, 'Downloading video JSON')
formats = []
for f in video_data['videos']:
video_url = f.get('url')
if not video_url:
continue
format_id = f.get('key')
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', format_id, 'height', default=None)) if format_id else None
formats.append({
'url': video_url,
'format_id': format_id,
'height': height,
})
self._sort_formats(formats)
meta_data = video_data['meta']
content_id = '%s_%s' % (
meta_data.get('accId', ''), meta_data['itemId'])
title = meta_data['title']
if title.endswith('.mp4'):
title = title[:-4]
thumbnail = meta_data['poster']
duration = meta_data['duration']
timestamp = meta_data['timestamp']
title = remove_end(meta_data['title'], '.mp4')
formats = [
{
'url': video['url'],
'format_id': video['key'],
'height': int(video['key'].rstrip('p'))
} for video in video_data['videos']
]
self._sort_formats(formats)
author = video_data.get('author')
uploader = author.get('name')
uploader_id = author.get('id') or author.get('email')
view_count = int_or_none(video_data.get('viewsCount') or video_data.get('views_count'))
acc_id = meta_data.get('accId')
item_id = meta_data.get('itemId')
content_id = '%s_%s' % (acc_id, item_id) if acc_id and item_id else video_id
thumbnail = meta_data.get('poster')
duration = int_or_none(meta_data.get('duration'))
timestamp = int_or_none(meta_data.get('timestamp'))
return {
'id': content_id,

View File

@ -57,7 +57,7 @@ class NBCIE(InfoExtractor):
{
# This video has expired but with an escaped embedURL
'url': 'http://www.nbc.com/parenthood/episode-guide/season-5/just-like-at-home/515',
'skip': 'Expired'
'only_matching': True,
}
]

View File

@ -4,10 +4,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
js_to_json,
strip_jsonp,
unified_strdate,
US_RATINGS,
@ -199,7 +201,7 @@ class PBSIE(InfoExtractor):
'id': '2365006249',
'ext': 'mp4',
'title': 'Constitution USA with Peter Sagal - A More Perfect Union',
'description': 'md5:ba0c207295339c8d6eced00b7c363c6a',
'description': 'md5:36f341ae62e251b8f5bd2b754b95a071',
'duration': 3190,
},
'params': {
@ -213,7 +215,7 @@ class PBSIE(InfoExtractor):
'id': '2365297690',
'ext': 'mp4',
'title': 'FRONTLINE - Losing Iraq',
'description': 'md5:f5bfbefadf421e8bb8647602011caf8e',
'description': 'md5:4d3eaa01f94e61b3e73704735f1196d9',
'duration': 5050,
},
'params': {
@ -227,7 +229,7 @@ class PBSIE(InfoExtractor):
'id': '2201174722',
'ext': 'mp4',
'title': 'PBS NewsHour - Cyber Schools Gain Popularity, but Quality Questions Persist',
'description': 'md5:5871c15cba347c1b3d28ac47a73c7c28',
'description': 'md5:95a19f568689d09a166dff9edada3301',
'duration': 801,
},
},
@ -237,8 +239,8 @@ class PBSIE(InfoExtractor):
'info_dict': {
'id': '2365297708',
'ext': 'mp4',
'description': 'md5:68d87ef760660eb564455eb30ca464fe',
'title': 'Great Performances - Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full',
'description': 'md5:657897370e09e2bc6bf0f8d2cd313c6b',
'duration': 6559,
'thumbnail': 're:^https?://.*\.jpg$',
},
@ -278,7 +280,7 @@ class PBSIE(InfoExtractor):
'display_id': 'player',
'ext': 'mp4',
'title': 'American Experience - Death and the Civil War, Chapter 1',
'description': 'American Experience, TVs most-watched history series, brings to life the compelling stories from our past that inform our understanding of the world today.',
'description': 'md5:1b80a74e0380ed2a4fb335026de1600d',
'duration': 682,
'thumbnail': 're:^https?://.*\.jpg$',
},
@ -287,20 +289,19 @@ class PBSIE(InfoExtractor):
},
},
{
'url': 'http://video.pbs.org/video/2365367186/',
'url': 'http://www.pbs.org/video/2365245528/',
'info_dict': {
'id': '2365367186',
'display_id': '2365367186',
'id': '2365245528',
'display_id': '2365245528',
'ext': 'mp4',
'title': 'To Catch A Comet - Full Episode',
'description': 'On November 12, 2014, billions of kilometers from Earth, spacecraft orbiter Rosetta and lander Philae did what no other had dared to attempt \u2014 land on the volatile surface of a comet as it zooms around the sun at 67,000 km/hr. The European Space Agency hopes this mission can help peer into our past and unlock secrets of our origins.',
'duration': 3342,
'title': 'FRONTLINE - United States of Secrets (Part One)',
'description': 'md5:55756bd5c551519cc4b7703e373e217e',
'duration': 6851,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # requires ffmpeg
},
'skip': 'Expired',
},
{
# Video embedded in iframe containing angle brackets as attribute's value (e.g.
@ -312,7 +313,7 @@ class PBSIE(InfoExtractor):
'display_id': 'a-chefs-life-season-3-episode-5-prickly-business',
'ext': 'mp4',
'title': "A Chef's Life - Season 3, Ep. 5: Prickly Business",
'description': 'md5:61db2ddf27c9912f09c241014b118ed1',
'description': 'md5:54033c6baa1f9623607c6e2ed245888b',
'duration': 1480,
'thumbnail': 're:^https?://.*\.jpg$',
},
@ -328,7 +329,7 @@ class PBSIE(InfoExtractor):
'display_id': 'the-atomic-artists',
'ext': 'mp4',
'title': 'FRONTLINE - The Atomic Artists',
'description': 'md5:f5bfbefadf421e8bb8647602011caf8e',
'description': 'md5:1a2481e86b32b2e12ec1905dd473e2c1',
'duration': 723,
'thumbnail': 're:^https?://.*\.jpg$',
},
@ -365,10 +366,14 @@ class PBSIE(InfoExtractor):
webpage, 'upload date', default=None))
# tabbed frontline videos
tabbed_videos = re.findall(
r'<div[^>]+class="videotab[^"]*"[^>]+vid="(\d+)"', webpage)
if tabbed_videos:
return tabbed_videos, presumptive_id, upload_date
MULTI_PART_REGEXES = (
r'<div[^>]+class="videotab[^"]*"[^>]+vid="(\d+)"',
r'<a[^>]+href=["\']#video-\d+["\'][^>]+data-coveid=["\'](\d+)',
)
for p in MULTI_PART_REGEXES:
tabbed_videos = re.findall(p, webpage)
if tabbed_videos:
return tabbed_videos, presumptive_id, upload_date
MEDIA_ID_REGEXES = [
r"div\s*:\s*'videoembed'\s*,\s*mediaid\s*:\s*'(\d+)'", # frontline video embed
@ -432,9 +437,21 @@ class PBSIE(InfoExtractor):
for vid_id in video_id]
return self.playlist_result(entries, display_id)
info = self._download_json(
'http://player.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
display_id)
try:
info = self._download_json(
'http://player.pbs.org/videoInfo/%s?format=json&type=partner' % video_id,
display_id, 'Downloading video info JSON')
except ExtractorError as e:
if not isinstance(e.cause, compat_HTTPError) or e.cause.code != 404:
raise
# videoInfo API may not work for some videos, fallback to portalplayer API
player = self._download_webpage(
'http://player.pbs.org/portalplayer/%s' % video_id, display_id)
info = self._parse_json(
self._search_regex(
r'(?s)PBS\.videoData\s*=\s*({.+?});\n',
player, 'video data', default='{}'),
display_id, transform_source=js_to_json, fatal=False)
formats = []
for encoding_name in ('recommended_encoding', 'alternate_encoding'):
@ -493,7 +510,7 @@ class PBSIE(InfoExtractor):
'id': video_id,
'display_id': display_id,
'title': info['title'],
'description': info['program'].get('description'),
'description': info.get('description') or info.get('program', {}).get('description'),
'thumbnail': info.get('image_url'),
'duration': int_or_none(info.get('duration')),
'age_limit': age_limit,

View File

@ -0,0 +1,51 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class PlaysTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?plays\.tv/video/(?P<id>[0-9a-f]{18})'
_TEST = {
'url': 'http://plays.tv/video/56af17f56c95335490/when-you-outplay-the-azir-wall',
'md5': 'dfeac1198506652b5257a62762cec7bc',
'info_dict': {
'id': '56af17f56c95335490',
'ext': 'mp4',
'title': 'When you outplay the Azir wall',
'description': 'Posted by Bjergsen',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
content = self._parse_json(
self._search_regex(
r'R\.bindContent\(({.+?})\);', webpage,
'content'), video_id)['content']
mpd_url, sources = re.search(
r'(?s)<video[^>]+data-mpd="([^"]+)"[^>]*>(.+?)</video>',
content).groups()
formats = self._extract_mpd_formats(
self._proto_relative_url(mpd_url), video_id, mpd_id='DASH')
for format_id, height, format_url in re.findall(r'<source\s+res="((\d+)h?)"\s+src="([^"]+)"', sources):
formats.append({
'url': self._proto_relative_url(format_url),
'format_id': 'http-' + format_id,
'height': int_or_none(height),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
}

View File

@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
float_or_none,
@ -61,12 +63,15 @@ class RteIE(InfoExtractor):
class RteRadioIE(InfoExtractor):
IE_NAME = 'rte:radio'
IE_DESC = 'Raidió Teilifís Éireann radio'
# Radioplayer URLs have the specifier #!rii=<channel_id>:<id>:<playable_item_id>:<date>:
# Radioplayer URLs have two distinct specifier formats,
# the old format #!rii=<channel_id>:<id>:<playable_item_id>:<date>:
# the new format #!rii=b<channel_id>_<id>_<playable_item_id>_<date>_
# where the IDs are int/empty, the date is DD-MM-YYYY, and the specifier may be truncated.
# An <id> uniquely defines an individual recording, and is the only part we require.
_VALID_URL = r'https?://(?:www\.)?rte\.ie/radio/utils/radioplayer/rteradioweb\.html#!rii=(?:[0-9]*)(?:%3A|:)(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?rte\.ie/radio/utils/radioplayer/rteradioweb\.html#!rii=(?:b?[0-9]*)(?:%3A|:|%5F|_)(?P<id>[0-9]+)'
_TEST = {
_TESTS = [{
# Old-style player URL; HLS and RTMPE formats
'url': 'http://www.rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=16:10507902:2414:27-12-2015:',
'info_dict': {
'id': '10507902',
@ -81,7 +86,23 @@ class RteRadioIE(InfoExtractor):
'params': {
'skip_download': 'f4m fails with --test atm'
}
}
}, {
# New-style player URL; RTMPE formats only
'url': 'http://rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=b16_3250678_8861_06-04-2012_',
'info_dict': {
'id': '3250678',
'ext': 'flv',
'title': 'The Lyric Concert with Paul Herriott',
'thumbnail': 're:^https?://.*\.jpg$',
'description': '',
'timestamp': 1333742400,
'upload_date': '20120406',
'duration': 7199.016,
},
'params': {
'skip_download': 'f4m fails with --test atm'
}
}]
def _real_extract(self, url):
item_id = self._match_id(url)
@ -102,8 +123,18 @@ class RteRadioIE(InfoExtractor):
formats = []
if mg.get('url') and not mg['url'].startswith('rtmpe:'):
formats.append({'url': mg['url']})
if mg.get('url'):
m = re.match(r'(?P<url>rtmpe?://[^/]+)/(?P<app>.+)/(?P<playpath>mp4:.*)', mg['url'])
if m:
m = m.groupdict()
formats.append({
'url': m['url'] + '/' + m['app'],
'app': m['app'],
'play_path': m['playpath'],
'player_url': url,
'ext': 'flv',
'format_id': 'rtmp',
})
if mg.get('hls_server') and mg.get('hls_url'):
formats.extend(self._extract_m3u8_formats(

View File

@ -7,7 +7,7 @@ from .common import InfoExtractor
class SpankBangIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|[a-z]{2})\.)?spankbang\.com/(?P<id>[\da-z]+)/video'
_TEST = {
_TESTS = [{
'url': 'http://spankbang.com/3vvn/video/fantasy+solo',
'md5': '1cc433e1d6aa14bc376535b8679302f7',
'info_dict': {
@ -19,7 +19,11 @@ class SpankBangIE(InfoExtractor):
'uploader': 'silly2587',
'age_limit': 18,
}
}
}, {
# 480p only
'url': 'http://spankbang.com/1vt0/video/solvane+gangbang',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -34,7 +38,8 @@ class SpankBangIE(InfoExtractor):
'ext': 'mp4',
'format_id': '%sp' % height,
'height': int(height),
} for height in re.findall(r'<(?:span|li)[^>]+q_(\d+)p', webpage)]
} for height in re.findall(r'<(?:span|li|p)[^>]+[qb]_(\d+)p', webpage)]
self._check_formats(formats, video_id)
self._sort_formats(formats)
title = self._html_search_regex(

View File

@ -70,14 +70,11 @@ class SRGSSRIE(InfoExtractor):
asset_url, media_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
else:
ext = None
if protocol == 'RTMP':
ext = self._search_regex(r'([a-z0-9]+):[^/]+', asset_url, 'ext')
formats.append({
'format_id': format_id,
'url': asset_url,
'preference': preference(quality),
'ext': ext,
'ext': 'flv' if protocol == 'RTMP' else None,
})
self._sort_formats(formats)

View File

@ -20,7 +20,6 @@ from ..utils import (
int_or_none,
sanitized_Request,
unsmuggle_url,
url_basename,
xpath_with_ns,
)
@ -283,8 +282,8 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
first_video_id = None
duration = None
for item in entry['media$content']:
smil_url = item['plfile$url'] + '&format=SMIL&Tracking=true&Embedded=true&formats=MPEG4,F4M'
cur_video_id = url_basename(smil_url)
smil_url = item['plfile$url'] + '&format=SMIL&mbr=true'
cur_video_id = ThePlatformIE._match_id(smil_url)
if first_video_id is None:
first_video_id = cur_video_id
duration = float_or_none(item.get('plfile$duration'))

View File

@ -197,8 +197,14 @@ class VevoIE(InfoExtractor):
if not version_url:
continue
if '.mpd' in version_url or '.ism' in version_url:
if '.ism' in version_url:
continue
elif '.mpd' in version_url:
formats.extend(self._extract_mpd_formats(
version_url, video_id, mpd_id='dash-%s' % version,
note='Downloading %s MPD information' % version,
errnote='Failed to download %s MPD information' % version,
fatal=False))
elif '.m3u8' in version_url:
formats.extend(self._extract_m3u8_formats(
version_url, video_id, 'mp4', 'm3u8_native',

View File

@ -1,6 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urlparse,
)
from ..utils import (
float_or_none,
int_or_none,
@ -12,10 +16,10 @@ class ViddlerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?viddler\.com/(?:v|embed|player)/(?P<id>[a-z0-9]+)'
_TESTS = [{
'url': 'http://www.viddler.com/v/43903784',
'md5': 'ae43ad7cb59431ce043f0ff7fa13cbf4',
'md5': '9eee21161d2c7f5b39690c3e325fab2f',
'info_dict': {
'id': '43903784',
'ext': 'mp4',
'ext': 'mov',
'title': 'Video Made Easy',
'description': 'md5:6a697ebd844ff3093bd2e82c37b409cd',
'uploader': 'viddler',
@ -29,10 +33,10 @@ class ViddlerIE(InfoExtractor):
}
}, {
'url': 'http://www.viddler.com/v/4d03aad9/',
'md5': 'faa71fbf70c0bee7ab93076fd007f4b0',
'md5': 'f12c5a7fa839c47a79363bfdf69404fb',
'info_dict': {
'id': '4d03aad9',
'ext': 'mp4',
'ext': 'ts',
'title': 'WALL-TO-GORTAT',
'upload_date': '20150126',
'uploader': 'deadspin',
@ -42,10 +46,10 @@ class ViddlerIE(InfoExtractor):
}
}, {
'url': 'http://www.viddler.com/player/221ebbbd/0/',
'md5': '0defa2bd0ea613d14a6e9bd1db6be326',
'md5': '740511f61d3d1bb71dc14a0fe01a1c10',
'info_dict': {
'id': '221ebbbd',
'ext': 'mp4',
'ext': 'mov',
'title': 'LETeens-Grammar-snack-third-conditional',
'description': ' ',
'upload_date': '20140929',
@ -54,16 +58,42 @@ class ViddlerIE(InfoExtractor):
'view_count': int,
'comment_count': int,
}
}, {
# secret protected
'url': 'http://www.viddler.com/v/890c0985?secret=34051570',
'info_dict': {
'id': '890c0985',
'ext': 'mp4',
'title': 'Complete Property Training - Traineeships',
'description': ' ',
'upload_date': '20130606',
'uploader': 'TiffanyBowtell',
'timestamp': 1370496993,
'view_count': int,
'comment_count': int,
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
json_url = (
'http://api.viddler.com/api/v2/viddler.videos.getPlaybackDetails.json?video_id=%s&key=v0vhrt7bg2xq1vyxhkct' %
video_id)
query = {
'video_id': video_id,
'key': 'v0vhrt7bg2xq1vyxhkct',
}
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
secret = qs.get('secret', [None])[0]
if secret:
query['secret'] = secret
headers = {'Referer': 'http://static.cdn-ec.viddler.com/js/arpeggio/v2/embed.html'}
request = sanitized_Request(json_url, None, headers)
request = sanitized_Request(
'http://api.viddler.com/api/v2/viddler.videos.getPlaybackDetails.json?%s'
% compat_urllib_parse.urlencode(query), None, headers)
data = self._download_json(request, video_id)['video']
formats = []

View File

@ -114,7 +114,7 @@ class VideomoreIE(InfoExtractor):
data = self._download_json(
'http://videomore.ru/video/tracks/%s.json' % video_id,
video_id, 'Downloadinng video JSON')
video_id, 'Downloading video JSON')
title = data.get('title') or data['project_title']
description = data.get('description') or data.get('description_raw')

View File

@ -1,5 +1,7 @@
from __future__ import unicode_literals
import itertools
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
@ -11,7 +13,8 @@ from ..utils import (
class VidmeIE(InfoExtractor):
_VALID_URL = r'https?://vid\.me/(?:e/)?(?P<id>[\da-zA-Z]+)'
IE_NAME = 'vidme'
_VALID_URL = r'https?://vid\.me/(?:e/)?(?P<id>[\da-zA-Z]{,5})(?:[^\da-zA-Z]|$)'
_TESTS = [{
'url': 'https://vid.me/QNB',
'md5': 'f42d05e7149aeaec5c037b17e5d3dc82',
@ -202,3 +205,69 @@ class VidmeIE(InfoExtractor):
'comment_count': comment_count,
'formats': formats,
}
class VidmeListBaseIE(InfoExtractor):
# Max possible limit according to https://docs.vid.me/#api-Videos-List
_LIMIT = 100
def _entries(self, user_id, user_name):
for page_num in itertools.count(1):
page = self._download_json(
'https://api.vid.me/videos/%s?user=%s&limit=%d&offset=%d'
% (self._API_ITEM, user_id, self._LIMIT, (page_num - 1) * self._LIMIT),
user_name, 'Downloading user %s page %d' % (self._API_ITEM, page_num))
videos = page.get('videos', [])
if not videos:
break
for video in videos:
video_url = video.get('full_url') or video.get('embed_url')
if video_url:
yield self.url_result(video_url, VidmeIE.ie_key())
total = int_or_none(page.get('page', {}).get('total'))
if total and self._LIMIT * page_num >= total:
break
def _real_extract(self, url):
user_name = self._match_id(url)
user_id = self._download_json(
'https://api.vid.me/userByUsername?username=%s' % user_name,
user_name)['user']['user_id']
return self.playlist_result(
self._entries(user_id, user_name), user_id,
'%s - %s' % (user_name, self._TITLE))
class VidmeUserIE(VidmeListBaseIE):
IE_NAME = 'vidme:user'
_VALID_URL = r'https?://vid\.me/(?:e/)?(?P<id>[\da-zA-Z]{6,})(?!/likes)(?:[^\da-zA-Z]|$)'
_API_ITEM = 'list'
_TITLE = 'Videos'
_TEST = {
'url': 'https://vid.me/EFARCHIVE',
'info_dict': {
'id': '3834632',
'title': 'EFARCHIVE - %s' % _TITLE,
},
'playlist_mincount': 238,
}
class VidmeUserLikesIE(VidmeListBaseIE):
IE_NAME = 'vidme:user:likes'
_VALID_URL = r'https?://vid\.me/(?:e/)?(?P<id>[\da-zA-Z]{6,})/likes'
_API_ITEM = 'likes'
_TITLE = 'Likes'
_TEST = {
'url': 'https://vid.me/ErinAlexis/likes',
'info_dict': {
'id': '6483530',
'title': 'ErinAlexis - %s' % _TITLE,
},
'playlist_mincount': 415,
}

View File

@ -57,7 +57,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
def _extract_xsrft_and_vuid(self, webpage):
xsrft = self._search_regex(
r'xsrft\s*[=:]\s*(?P<q>["\'])(?P<xsrft>.+?)(?P=q)',
r'(?:(?P<q1>["\'])xsrft(?P=q1)\s*:|xsrft\s*[=:])\s*(?P<q>["\'])(?P<xsrft>.+?)(?P=q)',
webpage, 'login token', group='xsrft')
vuid = self._search_regex(
r'["\']vuid["\']\s*:\s*(["\'])(?P<vuid>.+?)\1',

View File

@ -265,7 +265,7 @@ class VKIE(InfoExtractor):
return self.url_result(pladform_url)
m_rutube = re.search(
r'\ssrc="((?:https?:)?//rutube\.ru\\?/video\\?/embed(?:.*?))\\?"', info_page)
r'\ssrc="((?:https?:)?//rutube\.ru\\?/(?:video|play)\\?/embed(?:.*?))\\?"', info_page)
if m_rutube is not None:
rutube_url = self._proto_relative_url(
m_rutube.group(1).replace('\\', ''))
@ -321,7 +321,7 @@ class VKIE(InfoExtractor):
class VKUserVideosIE(InfoExtractor):
IE_NAME = 'vk:uservideos'
IE_DESC = "VK - User's Videos"
_VALID_URL = r'https?://vk\.com/videos(?P<id>-?[0-9]+)$'
_VALID_URL = r'https?://vk\.com/videos(?P<id>-?[0-9]+)(?!\?.*\bz=video)(?:[/?#&]|$)'
_TEMPLATE_URL = 'https://vk.com/videos'
_TESTS = [{
'url': 'http://vk.com/videos205387401',
@ -333,6 +333,9 @@ class VKUserVideosIE(InfoExtractor):
}, {
'url': 'http://vk.com/videos-77521',
'only_matching': True,
}, {
'url': 'http://vk.com/videos-97664626?section=all',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -1,86 +1,88 @@
# coding: utf-8
from __future__ import unicode_literals
import hmac
from hashlib import sha1
from base64 import b64encode
from time import time
from .common import InfoExtractor
from ..utils import (
ExtractorError,
determine_ext
dict_get,
float_or_none,
int_or_none,
)
from ..compat import compat_urllib_parse
class VLiveIE(InfoExtractor):
IE_NAME = 'vlive'
# www.vlive.tv/video/ links redirect to m.vlive.tv/video/ for mobile devices
_VALID_URL = r'https?://(?:(www|m)\.)?vlive\.tv/video/(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://m.vlive.tv/video/1326',
'url': 'http://www.vlive.tv/video/1326',
'md5': 'cc7314812855ce56de70a06a27314983',
'info_dict': {
'id': '1326',
'ext': 'mp4',
'title': '[V] Girl\'s Day\'s Broadcast',
'creator': 'Girl\'s Day',
'title': "[V] Girl's Day's Broadcast",
'creator': "Girl's Day",
'view_count': int,
},
}
_SECRET = 'rFkwZet6pqk1vQt6SxxUkAHX7YL3lmqzUMrU4IDusTo4jEBdtOhNfT4BYYAdArwH'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://m.vlive.tv/video/%s' % video_id,
video_id, note='Download video page')
'http://www.vlive.tv/video/%s' % video_id, video_id)
long_video_id = self._search_regex(
r'vlive\.tv\.video\.ajax\.request\.handler\.init\(\s*"[0-9]+"\s*,\s*"[^"]*"\s*,\s*"([^"]+)"',
webpage, 'long video id')
key = self._search_regex(
r'vlive\.tv\.video\.ajax\.request\.handler\.init\(\s*"[0-9]+"\s*,\s*"[^"]*"\s*,\s*"[^"]+"\s*,\s*"([^"]+)"',
webpage, 'key')
title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage)
creator = self._html_search_regex(
r'<span[^>]+class="name">([^<>]+)</span>', webpage, 'creator')
url = 'http://global.apis.naver.com/globalV/globalV/vod/%s/playinfo?' % video_id
msgpad = '%.0f' % (time() * 1000)
md = b64encode(
hmac.new(self._SECRET.encode('ascii'),
(url[:255] + msgpad).encode('ascii'), sha1).digest()
)
url += '&' + compat_urllib_parse.urlencode({'msgpad': msgpad, 'md': md})
playinfo = self._download_json(url, video_id, 'Downloading video json')
playinfo = self._download_json(
'http://global.apis.naver.com/rmcnmv/rmcnmv/vod_play_videoInfo.json?%s'
% compat_urllib_parse.urlencode({
'videoId': long_video_id,
'key': key,
'ptc': 'http',
'doct': 'json', # document type (xml or json)
'cpt': 'vtt', # captions type (vtt or ttml)
}), video_id)
if playinfo.get('message', '') != 'success':
raise ExtractorError(playinfo.get('message', 'JSON request unsuccessful'))
if not playinfo.get('result'):
raise ExtractorError('No videos found.')
formats = []
for vid in playinfo['result'].get('videos', {}).get('list', []):
formats.append({
'url': vid['source'],
'ext': 'mp4',
'abr': vid.get('bitrate', {}).get('audio'),
'vbr': vid.get('bitrate', {}).get('video'),
'format_id': vid['encodingOption']['name'],
'height': vid.get('height'),
'width': vid.get('width'),
})
formats = [{
'url': vid['source'],
'format_id': vid.get('encodingOption', {}).get('name'),
'abr': float_or_none(vid.get('bitrate', {}).get('audio')),
'vbr': float_or_none(vid.get('bitrate', {}).get('video')),
'width': int_or_none(vid.get('encodingOption', {}).get('width')),
'height': int_or_none(vid.get('encodingOption', {}).get('height')),
'filesize': int_or_none(vid.get('size')),
} for vid in playinfo.get('videos', {}).get('list', []) if vid.get('source')]
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)
creator = self._html_search_regex(
r'<div[^>]+class="info_area"[^>]*>\s*<strong[^>]+class="name"[^>]*>([^<]+)</strong>',
webpage, 'creator', fatal=False)
view_count = int_or_none(playinfo.get('meta', {}).get('count'))
subtitles = {}
for caption in playinfo['result'].get('captions', {}).get('list', []):
subtitles[caption['language']] = [
{'ext': determine_ext(caption['source'], default_ext='vtt'),
'url': caption['source']}]
for caption in playinfo.get('captions', {}).get('list', []):
lang = dict_get(caption, ('language', 'locale', 'country', 'label'))
if lang and caption.get('source'):
subtitles[lang] = [{
'ext': 'vtt',
'url': caption['source']}]
return {
'id': video_id,
'title': title,
'creator': creator,
'thumbnail': thumbnail,
'view_count': view_count,
'formats': formats,
'subtitles': subtitles,
}

View File

@ -229,6 +229,9 @@ class YoukuIE(InfoExtractor):
if error_note is not None and '因版权原因无法观看此视频' in error_note:
raise ExtractorError(
'Youku said: Sorry, this video is available in China only', expected=True)
elif error_note and '该视频被设为私密' in error_note:
raise ExtractorError(
'Youku said: Sorry, this video is private', expected=True)
else:
msg = 'Youku server reported error %i' % error.get('code')
if error_note is not None:

View File

@ -286,7 +286,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'22': {'ext': 'mp4', 'width': 1280, 'height': 720, 'acodec': 'aac', 'abr': 192, 'vcodec': 'h264'},
'34': {'ext': 'flv', 'width': 640, 'height': 360, 'acodec': 'aac', 'abr': 128, 'vcodec': 'h264'},
'35': {'ext': 'flv', 'width': 854, 'height': 480, 'acodec': 'aac', 'abr': 128, 'vcodec': 'h264'},
'36': {'ext': '3gp', 'width': 320, 'height': 240, 'acodec': 'aac', 'abr': 32, 'vcodec': 'mp4v'},
# itag 36 videos are either 320x180 (BaW_jenozKc) or 320x240 (__2ABJjxzNo), abr varies as well
'36': {'ext': '3gp', 'width': 320, 'acodec': 'aac', 'vcodec': 'mp4v'},
'37': {'ext': 'mp4', 'width': 1920, 'height': 1080, 'acodec': 'aac', 'abr': 192, 'vcodec': 'h264'},
'38': {'ext': 'mp4', 'width': 4096, 'height': 3072, 'acodec': 'aac', 'abr': 192, 'vcodec': 'h264'},
'43': {'ext': 'webm', 'width': 640, 'height': 360, 'acodec': 'vorbis', 'abr': 128, 'vcodec': 'vp8'},
@ -369,11 +370,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# RTMP (unnamed)
'_rtmp': {'protocol': 'rtmp'},
}
_SUBTITLE_FORMATS = ('ttml', 'vtt')
IE_NAME = 'youtube'
_TESTS = [
{
'url': 'http://www.youtube.com/watch?v=BaW_jenozKcj&t=1s&end=9',
'url': 'http://www.youtube.com/watch?v=BaW_jenozKc&t=1s&end=9',
'info_dict': {
'id': 'BaW_jenozKc',
'ext': 'mp4',
@ -439,7 +441,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}
},
{
'url': 'http://www.youtube.com/watch?v=BaW_jenozKcj&v=UxxajLWwzqY',
'url': 'http://www.youtube.com/watch?v=BaW_jenozKc&v=UxxajLWwzqY',
'note': 'Use the first video ID in the URL',
'info_dict': {
'id': 'BaW_jenozKc',
@ -702,6 +704,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'skip_download': True,
},
},
{
# Multifeed video with comma in title (see https://github.com/rg3/youtube-dl/issues/8536)
'url': 'https://www.youtube.com/watch?v=gVfLd0zydlo',
'info_dict': {
'id': 'gVfLd0zydlo',
'title': 'DevConf.cz 2016 Day 2 Workshops 1 14:00 - 15:30',
},
'playlist_count': 2,
},
{
'url': 'http://vid.plus/FlRa-iH7PGw',
'only_matching': True,
@ -918,7 +929,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if lang in sub_lang_list:
continue
sub_formats = []
for ext in ['sbv', 'vtt', 'srt']:
for ext in self._SUBTITLE_FORMATS:
params = compat_urllib_parse.urlencode({
'lang': lang,
'v': video_id,
@ -988,7 +999,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
for lang_node in caption_list.findall('target'):
sub_lang = lang_node.attrib['lang_code']
sub_formats = []
for ext in ['sbv', 'vtt', 'srt']:
for ext in self._SUBTITLE_FORMATS:
params = compat_urllib_parse.urlencode({
'lang': original_lang,
'tlang': sub_lang,
@ -1194,9 +1205,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if not self._downloader.params.get('noplaylist'):
entries = []
feed_ids = []
multifeed_metadata_list = compat_urllib_parse_unquote_plus(video_info['multifeed_metadata_list'][0])
multifeed_metadata_list = video_info['multifeed_metadata_list'][0]
for feed in multifeed_metadata_list.split(','):
feed_data = compat_parse_qs(feed)
# Unquote should take place before split on comma (,) since textual
# fields may contain comma as well (see
# https://github.com/rg3/youtube-dl/issues/8536)
feed_data = compat_parse_qs(compat_urllib_parse_unquote_plus(feed))
entries.append({
'_type': 'url_transparent',
'ie_key': 'Youtube',
@ -1463,7 +1477,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Look for the DASH manifest
if self._downloader.params.get('youtube_include_dash_manifest', True):
dash_mpd_fatal = True
for dash_manifest_url in dash_mpds:
for mpd_url in dash_mpds:
dash_formats = {}
try:
def decrypt_sig(mobj):
@ -1471,11 +1485,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
dec_s = self._decrypt_signature(s, video_id, player_url, age_gate)
return '/signature/%s' % dec_s
dash_manifest_url = re.sub(r'/s/([a-fA-F0-9\.]+)', decrypt_sig, dash_manifest_url)
mpd_url = re.sub(r'/s/([a-fA-F0-9\.]+)', decrypt_sig, mpd_url)
for df in self._extract_dash_manifest_formats(
dash_manifest_url, video_id, fatal=dash_mpd_fatal,
namespace='urn:mpeg:DASH:schema:MPD:2011', formats_dict=self._formats):
for df in self._extract_mpd_formats(
mpd_url, video_id, fatal=dash_mpd_fatal,
formats_dict=self._formats):
# Do not overwrite DASH format found in some previous DASH manifest
if df['format_id'] not in dash_formats:
dash_formats[df['format_id']] = df

View File

@ -391,6 +391,10 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
for (name, value) in metadata.items():
options.extend(['-metadata', '%s=%s' % (name, value)])
# https://github.com/rg3/youtube-dl/issues/8350
if info.get('protocol') == 'm3u8_native' or info.get('protocol') == 'm3u8' and self._downloader.params.get('hls_prefer_native', False):
options.extend(['-bsf:a', 'aac_adtstoasc'])
self._downloader.to_screen('[ffmpeg] Adding metadata to \'%s\'' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
@ -504,8 +508,8 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
with io.open(srt_file, 'wt', encoding='utf-8') as f:
f.write(srt_data)
old_file = srt_file
ext = 'srt'
subs[lang] = {
'ext': 'srt',
'data': srt_data
@ -513,12 +517,14 @@ class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
if new_ext == 'srt':
continue
else:
sub_filenames.append(srt_file)
self.run_ffmpeg(old_file, new_file, ['-f', new_format])
with io.open(new_file, 'rt', encoding='utf-8') as f:
subs[lang] = {
'ext': ext,
'ext': new_ext,
'data': f.read(),
}

View File

@ -56,7 +56,7 @@ from .compat import (
compiled_regex_type = type(re.compile(''))
std_headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/20.0 (Chrome)',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/44.0 (Chrome)',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate',
@ -1717,6 +1717,16 @@ def encode_dict(d, encoding='utf-8'):
return dict((encode(k), encode(v)) for k, v in d.items())
def dict_get(d, key_or_keys, default=None, skip_false_values=True):
if isinstance(key_or_keys, (list, tuple)):
for key in key_or_keys:
if key not in d or d[key] is None or skip_false_values and not d[key]:
continue
return d[key]
return default
return d.get(key_or_keys, default)
def encode_compat_str(string, encoding=preferredencoding(), errors='strict'):
return string if isinstance(string, compat_str) else compat_str(string, encoding, errors)
@ -1739,7 +1749,7 @@ def parse_age_limit(s):
def strip_jsonp(code):
return re.sub(
r'(?s)^[a-zA-Z0-9_]+\s*\(\s*(.*)\);?\s*?(?://[^\n]*)*$', r'\1', code)
r'(?s)^[a-zA-Z0-9_.]+\s*\(\s*(.*)\);?\s*?(?://[^\n]*)*$', r'\1', code)
def js_to_json(code):
@ -2017,20 +2027,27 @@ def dfxp2srt(dfxp_data):
'ttaf1': 'http://www.w3.org/2006/10/ttaf1',
})
class TTMLPElementParser(object):
out = ''
def start(self, tag, attrib):
if tag in (_x('ttml:br'), _x('ttaf1:br'), 'br'):
self.out += '\n'
def end(self, tag):
pass
def data(self, data):
self.out += data
def close(self):
return self.out.strip()
def parse_node(node):
str_or_empty = functools.partial(str_or_none, default='')
out = str_or_empty(node.text)
for child in node:
if child.tag in (_x('ttml:br'), _x('ttaf1:br'), 'br'):
out += '\n' + str_or_empty(child.tail)
elif child.tag in (_x('ttml:span'), _x('ttaf1:span'), 'span'):
out += str_or_empty(parse_node(child))
else:
out += str_or_empty(xml.etree.ElementTree.tostring(child))
return out
target = TTMLPElementParser()
parser = xml.etree.ElementTree.XMLParser(target=target)
parser.feed(xml.etree.ElementTree.tostring(node))
return parser.close()
dfxp = compat_etree_fromstring(dfxp_data.encode('utf-8'))
out = []

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.02.01'
__version__ = '2016.02.13'