Compare commits


51 Commits

Author SHA1 Message Date
07ac9e2cc2 release 2013.09.12 2013-09-12 11:26:44 +02:00
6bc520c207 Check for both automatic captions and subtitles with options --write-sub and --write-auto-sub (fixes #1224) 2013-09-12 11:15:25 +02:00
e3dc22ca3a [youtube] Fix detection of videos with automatic captions 2013-09-11 19:24:56 +02:00
d665f8d3cb [subtitles] Also list the available automatic captions languages with '--list-sub' 2013-09-11 19:17:30 +02:00
055e6f3657 [youtube] Support automatic captions with original language different from English (fixes #1225) and download in multiple languages. 2013-09-11 19:08:43 +02:00
ac4f319ba1 Credit @iemejia 2013-09-11 17:58:51 +02:00
542cca0e8c Merge branch 'subtitles_rework' (closes PR #1326) 2013-09-11 17:41:24 +02:00
6a2449df3b [howcast] Do not download from http://www.howcast.com/videos/{video_id}
It takes too long to follow the redirection.
2013-09-11 17:36:23 +02:00
7fad1c6328 [subtitles] Use self._download_webpage for extracting the subtitles
It raises ExtractorError for the same exceptions we have to catch.
2013-09-11 16:24:47 +02:00
d82134c339 [subtitles] Simplify the extraction of subtitles in subclasses and remove NoAutoSubtitlesInfoExtractor
Subclasses just need to call the method extract_subtitles, which will call _extract_subtitles and _request_automatic_caption
Now the default implementation of _request_automatic_caption returns {}.
2013-09-11 16:05:49 +02:00
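The refactor described above is a template-method pattern: one public entry point, two overridable hooks. A simplified standalone sketch under assumed names (plain Python, not the real youtube-dl classes; the real hooks take a downloader and URLs, here they just return dicts of lang/url pairs):

```python
class SubtitlesInfoExtractor:
    """Base class: the public method combines both subtitle sources."""
    def extract_subtitles(self, video_id):
        subtitles = {}
        subtitles.update(self._extract_subtitles(video_id))
        subtitles.update(self._request_automatic_caption(video_id))
        return subtitles

    def _extract_subtitles(self, video_id):
        # Overridden by extractors for sites that serve subtitles.
        return {}

    def _request_automatic_caption(self, video_id):
        # Default implementation: no automatic captions, so return {}.
        return {}


class DummySiteIE(SubtitlesInfoExtractor):
    # Hypothetical subclass: only the subtitle-listing hook is implemented.
    def _extract_subtitles(self, video_id):
        return {'en': 'http://example.com/%s.en.srt' % video_id}


print(DummySiteIE().extract_subtitles('abc'))  # {'en': 'http://example.com/abc.en.srt'}
```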
54d39d8b2f [subtitles] rename SubtitlesIE to SubtitlesInfoExtractor
Otherwise it can be automatically detected as an IE ready for use.
2013-09-11 15:51:04 +02:00
de7f3446e0 [youtube] move subtitles methods from the base extractor to YoutubeIE 2013-09-11 15:48:23 +02:00
f8e52269c1 [subtitles] made inheritance hierarchy flat as requested 2013-09-11 15:21:09 +02:00
cf1dd0c59e Merge branch 'master' into subtitles_rework 2013-09-11 14:26:48 +02:00
22c8b52545 In the supported sites page, sort the extractors case-insensitively 2013-09-11 12:04:27 +02:00
1f7dc42cd0 release 2013.09.11 2013-09-11 11:30:10 +02:00
aa8f2641da [youtube] update algo for length 85 (fixes #1408 and fixes #1406) 2013-09-11 11:24:58 +02:00
648d25d43d [francetv] Add an extractor for francetvinfo.fr (closes #1317)
It uses the same system as Pluzz, create a base class for both extractors.
2013-09-10 15:50:34 +02:00
df3e61003a Merge pull request #1402 from Rudloff/canalc2
Wrong property name
2013-09-10 03:19:37 -07:00
6b361ad5ee Wrong property name 2013-09-10 12:13:22 +02:00
5d8afe69f7 Add an extractor for pluzz.francetv.fr (closes PR #1399) 2013-09-10 12:00:00 +02:00
a1ab553858 release 2013.09.10 2013-09-10 11:25:11 +02:00
07463ea162 Add an extractor for Slideshare (closes #1400) 2013-09-10 11:19:58 +02:00
6d2d21f713 [sohu] add support for my.tv.sohu.com urls (fixes #1398) 2013-09-09 19:56:16 +02:00
061b2889a9 Fix the minutes part in FileDownloader.format_seconds (fixed #1397)
It printed the result of (seconds // 60) for the minutes.
2013-09-09 10:38:54 +02:00
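The bug was that the second divmod bound its remainder to a throwaway name while the format string kept using the raw minutes count. A sketch of the corrected helper, reconstructed from the diff further below (the two-part vs. three-part output format is as shown in youtube-dl's FileDownloader):

```python
def format_seconds(seconds):
    (mins, secs) = divmod(seconds, 60)
    # The fix: rebind `mins` here; the buggy version bound the remainder
    # to `eta_mins` and kept formatting the unreduced seconds // 60 value.
    (hours, mins) = divmod(mins, 60)
    if hours > 99:
        return '--:--:--'
    if hours == 0:
        return '%02d:%02d' % (mins, secs)
    return '%02d:%02d:%02d' % (hours, mins, secs)

print(format_seconds(3723))  # 01:02:03
print(format_seconds(125))   # 02:05
```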
8963d9c266 [youtube] Modify the regex to match ids of length 11 (fixes #1396)
In URLs like http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930 the query string can't simply be split, and IDs always have that length.
2013-09-09 10:33:12 +02:00
890f62e868 Revert "[youtube] Fix detection of tags from HLS videos."
They have undone the change.

This reverts commit 0638ad9999.
2013-09-08 18:50:07 +02:00
8f362589a5 release 2013.09.07 2013-09-07 22:29:15 +02:00
a27a2470cd Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-07 22:28:54 +02:00
72836fcee4 Merge branch 'master' into subtitles_rework 2013-09-06 23:24:41 +02:00
a7130543fa [generic] If the url doesn't specify the protocol, try to extract after prepending 'http://' 2013-09-06 18:39:35 +02:00
a490fda746 [dailymotion] accept embed urls (fixes #1386) 2013-09-06 18:36:07 +02:00
7e77275293 Add an extractor for Metacritic 2013-09-06 18:08:07 +02:00
d6e203b3dc [subtitles] fixed multiple subtitles language separated by comma after merge
As mentioned in the pull request, I forgot to include these changes.
aa6a10c44a
2013-09-06 16:30:13 +02:00
e3ea479087 [youtube] Fix some issues with the detection of playlist/channel urls (reported in #1374)
They were being caught by YoutubeUserIE; now it only extracts a URL if none of the other extractors are suitable.
Now the url tests check that the urls can only be extracted with a specific extractor.
2013-09-06 16:24:24 +02:00
faab1d3836 [youtube] Fix detection of feeds urls (fixes #1294)
URLs like https://www.youtube.com/feed/watch_later were being detected as users (and, before the last changes to YoutubeUserIE, as videos).
2013-09-06 14:45:49 +02:00
8851a574a3 Fix add-versions 2013-09-06 11:07:34 +02:00
06a401c845 Merge branch 'master' into subtitles_rework 2013-08-28 00:33:12 +02:00
bd2dee6c67 Merge branch 'master' into subtitles_rework 2013-08-23 01:47:10 +02:00
18b4e04f1c Merge branch 'master' into subtitles_rework 2013-08-22 23:29:36 +02:00
d80a064eff [subtitles] Added tests to check correct behavior when no subtitles are
available
2013-08-08 22:22:33 +02:00
d55de6eec2 [subtitles] Now skips subtitles that have already been downloaded.
Just a check that the file already exists; I also removed a method that
wasn't being used because it was copy-pasted from FileDownloader.
2013-08-08 18:30:04 +02:00
69df680b97 [subtitles] Improved docs + new class for servers that don't support
auto-caption
2013-08-08 11:20:56 +02:00
447591e1ae [test] Cleaned subtitles tests 2013-08-08 11:03:52 +02:00
33eb0ce4c4 [subtitles] removed only-sub option (--skip-download achieves the same
functionality)
2013-08-08 10:06:24 +02:00
505c28aac9 Separated subtitle options in their own group 2013-08-08 09:53:25 +02:00
8377574c9c [internal] Improved subtitle architecture + (update in
youtube/dailymotion)

The structure of subtitles was refined, you only need to implement one
method that returns a dictionary of the available subtitles (lang, url) to
support all the subtitle options in a website. I updated the subtitle
downloaders for youtube/dailymotion to show how it works.
2013-08-08 08:54:10 +02:00
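The division of labor described above can be sketched as follows: the site extractor only reports which tracks exist, and shared logic decides which to fetch from the user's options. All names here are illustrative assumptions, not the real youtube-dl API (only `_get_available_subtitles` matches the diffs further below):

```python
class SubtitlesBase:
    """Assumed sketch of the shared option-handling logic."""
    def __init__(self, params):
        # e.g. {'subtitleslangs': ['en'], 'allsubtitles': False}
        self.params = params

    def _get_available_subtitles(self, video_id):
        # The single method a site extractor has to implement:
        # return a dict mapping language code -> subtitle URL.
        return {}

    def select_subtitles(self, video_id):
        available = self._get_available_subtitles(video_id)
        if self.params.get('allsubtitles'):
            return available
        wanted = self.params.get('subtitleslangs', ['en'])
        return {lang: url for lang, url in available.items() if lang in wanted}


class FakeSiteIE(SubtitlesBase):
    # Hypothetical site with three hard-coded tracks.
    def _get_available_subtitles(self, video_id):
        return {'en': 'u1', 'fr': 'u2', 'de': 'u3'}


ie = FakeSiteIE({'subtitleslangs': ['en', 'fr']})
print(sorted(ie.select_subtitles('x')))  # ['en', 'fr']
```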
372297e713 Undo the previous commit (it was a mistake) 2013-08-07 21:24:42 +02:00
953e32b2c1 [dailymotion] Added support for subtitles + new InfoExtractor for
generic subtitle download.

The idea is that all subtitle downloaders must descend from SubtitlesIE
and implement only three basic methods to achieve the complete subtitle
download functionality. This will make it possible to reduce the code in YoutubeIE
once it is rewritten.
2013-08-07 18:59:11 +02:00
5898e28272 Fixed small type issue 2013-08-07 18:48:24 +02:00
67dfbc0cb9 Added exceptions for the subtitle and video types in .gitignore 2013-08-07 18:42:40 +02:00
23 changed files with 549 additions and 217 deletions

.gitignore

@@ -18,3 +18,9 @@ youtube-dl.tar.gz
 cover/
 updates_key.pem
 *.egg-info
+*.srt
+*.sbv
+*.vtt
+*.flv
+*.mp4
+*.part


@@ -123,10 +123,8 @@ which means you can modify it, redistribute it or use it however you like.
                                  only)
 ## Subtitle Options:
-    --write-sub                write subtitle file (currently youtube only)
-    --write-auto-sub           write automatic subtitle file (currently youtube
-                               only)
-    --only-sub                 [deprecated] alias of --skip-download
+    --write-sub                write subtitle file
+    --write-auto-sub           write automatic subtitle file (youtube only)
     --all-subs                 downloads all the available subtitles of the
                                video
     --list-subs                lists all available subtitles for the video


@@ -3,7 +3,8 @@
 import json
 import sys
 import hashlib
-import urllib.request
+import os.path
 
 if len(sys.argv) <= 1:
     print('Specify the version number as parameter')
@@ -25,6 +26,7 @@ filenames = {
     'tar': 'youtube-dl-%s.tar.gz' % version}
 build_dir = os.path.join('..', '..', 'build', version)
 for key, filename in filenames.items():
+    url = 'https://yt-dl.org/downloads/%s/%s' % (version, filename)
     fn = os.path.join(build_dir, filename)
     with open(fn, 'rb') as f:
         data = f.read()


@@ -14,7 +14,7 @@ def main():
         template = tmplf.read()
 
     ie_htmls = []
-    for ie in sorted(youtube_dl.gen_extractors(), key=lambda i: i.IE_NAME):
+    for ie in sorted(youtube_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower()):
         ie_html = '<b>{}</b>'.format(ie.IE_NAME)
         try:
             ie_html += ': {}'.format(ie.IE_DESC)
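The hunk above changes the sort key so the supported-sites list interleaves names regardless of case. A minimal standalone illustration of the difference (the extractor names here are just examples):

```python
extractor_names = ['Youtube', 'ARD', 'arte.tv', 'Bandcamp', 'youporn']

# A plain sort is case-sensitive: every uppercase letter sorts before
# every lowercase letter, so 'arte.tv' lands after 'Youtube'.
print(sorted(extractor_names))

# Sorting on the lowercased name gives the alphabetical order a reader expects.
print(sorted(extractor_names, key=lambda name: name.lower()))
```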


@@ -23,9 +23,9 @@ tests = [
     # 86 - vfluy6kdb 2013/09/06
     ("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<",
      "yuioplkjhgfdsazxcvbnm12345678q0QWrRTYUIOELKJHGFD-AZXCVBNM!@#$%^&*()_<+={[|};?/>.S"),
-    # 85
+    # 85 - vflkuzxcs 2013/09/11
     ("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?/>.<",
-     ".>/?;}[{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYTREWQ0q876543r1mnbvcx9asdfghjklpoiuyt2"),
+     "T>/?;}[{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOvUY.REWQ0987654321mnbqcxzasdfghjklpoiuytr"),
     # 84 - vflg0g8PQ 2013/08/29 (sporadic)
     ("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?>.<",
      ">?;}[{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYTREWq0987654321mnbvcxzasdfghjklpoiuytr"),


@@ -38,7 +38,6 @@
     "writedescription": false,
     "writeinfojson": true,
     "writesubtitles": false,
-    "onlysubtitles": false,
     "allsubtitles": false,
     "listssubtitles": false
 }


@@ -21,14 +21,15 @@ class TestAllURLsMatching(unittest.TestCase):
         self.assertEqual(self.matching_ies(url), ie_list)
 
     def test_youtube_playlist_matching(self):
-        self.assertTrue(YoutubePlaylistIE.suitable(u'ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8'))
-        self.assertTrue(YoutubePlaylistIE.suitable(u'UUBABnxM4Ar9ten8Mdjj1j0Q')) #585
-        self.assertTrue(YoutubePlaylistIE.suitable(u'PL63F0C78739B09958'))
-        self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q'))
-        self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8'))
-        self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC'))
-        self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012')) #668
-        self.assertFalse(YoutubePlaylistIE.suitable(u'PLtS2H6bU1M'))
+        assertPlaylist = lambda url: self.assertMatch(url, ['youtube:playlist'])
+        assertPlaylist(u'ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
+        assertPlaylist(u'UUBABnxM4Ar9ten8Mdjj1j0Q') #585
+        assertPlaylist(u'PL63F0C78739B09958')
+        assertPlaylist(u'https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
+        assertPlaylist(u'https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
+        assertPlaylist(u'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
+        assertPlaylist(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012') #668
+        self.assertFalse('youtube:playlist' in self.matching_ies(u'PLtS2H6bU1M'))
 
     def test_youtube_matching(self):
         self.assertTrue(YoutubeIE.suitable(u'PLtS2H6bU1M'))
@@ -37,13 +38,23 @@ class TestAllURLsMatching(unittest.TestCase):
         self.assertMatch('http://www.youtube.com/v/BaW_jenozKc', ['youtube'])
 
     def test_youtube_channel_matching(self):
-        self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM'))
-        self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec'))
-        self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM/videos'))
+        assertChannel = lambda url: self.assertMatch(url, ['youtube:channel'])
+        assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM')
+        assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec')
+        assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
 
     def test_youtube_user_matching(self):
         self.assertMatch('www.youtube.com/NASAgovVideo/videos', ['youtube:user'])
 
+    def test_youtube_feeds(self):
+        self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:watch_later'])
+        self.assertMatch('https://www.youtube.com/feed/subscriptions', ['youtube:subscriptions'])
+        self.assertMatch('https://www.youtube.com/feed/recommended', ['youtube:recommended'])
+        self.assertMatch('https://www.youtube.com/my_favorites', ['youtube:favorites'])
+
+    def test_youtube_show_matching(self):
+        self.assertMatch('http://www.youtube.com/show/airdisasters', ['youtube:show'])
+
     def test_justin_tv_channelid_matching(self):
         self.assertTrue(JustinTVIE.suitable(u"justin.tv/vanillatv"))
         self.assertTrue(JustinTVIE.suitable(u"twitch.tv/vanillatv"))
@@ -61,10 +72,13 @@ class TestAllURLsMatching(unittest.TestCase):
         self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/tsm_theoddone/c/2349361"))
 
     def test_youtube_extract(self):
-        self.assertEqual(YoutubeIE()._extract_id('http://www.youtube.com/watch?&v=BaW_jenozKc'), 'BaW_jenozKc')
-        self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch?&v=BaW_jenozKc'), 'BaW_jenozKc')
-        self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc'), 'BaW_jenozKc')
-        self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch_popup?v=BaW_jenozKc'), 'BaW_jenozKc')
+        assertExtractId = lambda url, id: self.assertEqual(YoutubeIE()._extract_id(url), id)
+        assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
+        assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
+        assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
+        assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
+        assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
+        assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
 
     def test_no_duplicates(self):
         ies = gen_extractors()


@@ -0,0 +1,69 @@
+#!/usr/bin/env python
+
+import sys
+import unittest
+import json
+import io
+import hashlib
+
+# Allow direct execution
+import os
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from youtube_dl.extractor import DailymotionIE
+from youtube_dl.utils import *
+from helper import FakeYDL
+
+md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
+
+class TestDailymotionSubtitles(unittest.TestCase):
+    def setUp(self):
+        self.DL = FakeYDL()
+        self.url = 'http://www.dailymotion.com/video/xczg00'
+    def getInfoDict(self):
+        IE = DailymotionIE(self.DL)
+        info_dict = IE.extract(self.url)
+        return info_dict
+    def getSubtitles(self):
+        info_dict = self.getInfoDict()
+        return info_dict[0]['subtitles']
+    def test_no_writesubtitles(self):
+        subtitles = self.getSubtitles()
+        self.assertEqual(subtitles, None)
+    def test_subtitles(self):
+        self.DL.params['writesubtitles'] = True
+        subtitles = self.getSubtitles()
+        self.assertEqual(md5(subtitles['en']), '976553874490cba125086bbfea3ff76f')
+    def test_subtitles_lang(self):
+        self.DL.params['writesubtitles'] = True
+        self.DL.params['subtitleslangs'] = ['fr']
+        subtitles = self.getSubtitles()
+        self.assertEqual(md5(subtitles['fr']), '594564ec7d588942e384e920e5341792')
+    def test_allsubtitles(self):
+        self.DL.params['allsubtitles'] = True
+        subtitles = self.getSubtitles()
+        self.assertEqual(len(subtitles.keys()), 5)
+    def test_list_subtitles(self):
+        self.DL.params['listsubtitles'] = True
+        info_dict = self.getInfoDict()
+        self.assertEqual(info_dict, None)
+    def test_automatic_captions(self):
+        self.DL.params['writeautomaticsub'] = True
+        self.DL.params['subtitleslang'] = ['en']
+        subtitles = self.getSubtitles()
+        self.assertTrue(len(subtitles.keys()) == 0)
+    def test_nosubtitles(self):
+        self.url = 'http://www.dailymotion.com/video/x12u166_le-zapping-tele-star-du-08-aout-2013_tv'
+        self.DL.params['allsubtitles'] = True
+        subtitles = self.getSubtitles()
+        self.assertEqual(len(subtitles), 0)
+    def test_multiple_langs(self):
+        self.DL.params['writesubtitles'] = True
+        langs = ['es', 'fr', 'de']
+        self.DL.params['subtitleslangs'] = langs
+        subtitles = self.getSubtitles()
+        for lang in langs:
+            self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
+
+if __name__ == '__main__':
+    unittest.main()


@@ -18,85 +18,63 @@ md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
 class TestYoutubeSubtitles(unittest.TestCase):
     def setUp(self):
-        DL = FakeYDL()
-        DL.params['allsubtitles'] = False
-        DL.params['writesubtitles'] = False
-        DL.params['subtitlesformat'] = 'srt'
-        DL.params['listsubtitles'] = False
-    def test_youtube_no_subtitles(self):
-        DL = FakeYDL()
-        DL.params['writesubtitles'] = False
-        IE = YoutubeIE(DL)
-        info_dict = IE.extract('QRS8MkLhQmM')
-        subtitles = info_dict[0]['subtitles']
+        self.DL = FakeYDL()
+        self.url = 'QRS8MkLhQmM'
+    def getInfoDict(self):
+        IE = YoutubeIE(self.DL)
+        info_dict = IE.extract(self.url)
+        return info_dict
+    def getSubtitles(self):
+        info_dict = self.getInfoDict()
+        return info_dict[0]['subtitles']
+    def test_youtube_no_writesubtitles(self):
+        self.DL.params['writesubtitles'] = False
+        subtitles = self.getSubtitles()
         self.assertEqual(subtitles, None)
     def test_youtube_subtitles(self):
-        DL = FakeYDL()
-        DL.params['writesubtitles'] = True
-        IE = YoutubeIE(DL)
-        info_dict = IE.extract('QRS8MkLhQmM')
-        sub = info_dict[0]['subtitles']['en']
-        self.assertEqual(md5(sub), '4cd9278a35ba2305f47354ee13472260')
-    def test_youtube_subtitles_it(self):
-        DL = FakeYDL()
-        DL.params['writesubtitles'] = True
-        DL.params['subtitleslangs'] = ['it']
-        IE = YoutubeIE(DL)
-        info_dict = IE.extract('QRS8MkLhQmM')
-        sub = info_dict[0]['subtitles']['it']
-        self.assertEqual(md5(sub), '164a51f16f260476a05b50fe4c2f161d')
-    def test_youtube_onlysubtitles(self):
-        DL = FakeYDL()
-        DL.params['writesubtitles'] = True
-        DL.params['onlysubtitles'] = True
-        IE = YoutubeIE(DL)
-        info_dict = IE.extract('QRS8MkLhQmM')
-        sub = info_dict[0]['subtitles']['en']
-        self.assertEqual(md5(sub), '4cd9278a35ba2305f47354ee13472260')
+        self.DL.params['writesubtitles'] = True
+        subtitles = self.getSubtitles()
+        self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
+    def test_youtube_subtitles_lang(self):
+        self.DL.params['writesubtitles'] = True
+        self.DL.params['subtitleslangs'] = ['it']
+        subtitles = self.getSubtitles()
+        self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
     def test_youtube_allsubtitles(self):
-        DL = FakeYDL()
-        DL.params['allsubtitles'] = True
-        IE = YoutubeIE(DL)
-        info_dict = IE.extract('QRS8MkLhQmM')
-        subtitles = info_dict[0]['subtitles']
+        self.DL.params['allsubtitles'] = True
+        subtitles = self.getSubtitles()
         self.assertEqual(len(subtitles.keys()), 13)
     def test_youtube_subtitles_sbv_format(self):
-        DL = FakeYDL()
-        DL.params['writesubtitles'] = True
-        DL.params['subtitlesformat'] = 'sbv'
-        IE = YoutubeIE(DL)
-        info_dict = IE.extract('QRS8MkLhQmM')
-        sub = info_dict[0]['subtitles']['en']
-        self.assertEqual(md5(sub), '13aeaa0c245a8bed9a451cb643e3ad8b')
+        self.DL.params['writesubtitles'] = True
+        self.DL.params['subtitlesformat'] = 'sbv'
+        subtitles = self.getSubtitles()
+        self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
     def test_youtube_subtitles_vtt_format(self):
-        DL = FakeYDL()
-        DL.params['writesubtitles'] = True
-        DL.params['subtitlesformat'] = 'vtt'
-        IE = YoutubeIE(DL)
-        info_dict = IE.extract('QRS8MkLhQmM')
-        sub = info_dict[0]['subtitles']['en']
-        self.assertEqual(md5(sub), '356cdc577fde0c6783b9b822e7206ff7')
+        self.DL.params['writesubtitles'] = True
+        self.DL.params['subtitlesformat'] = 'vtt'
+        subtitles = self.getSubtitles()
+        self.assertEqual(md5(subtitles['en']), '356cdc577fde0c6783b9b822e7206ff7')
     def test_youtube_list_subtitles(self):
-        DL = FakeYDL()
-        DL.params['listsubtitles'] = True
-        IE = YoutubeIE(DL)
-        info_dict = IE.extract('QRS8MkLhQmM')
+        self.DL.params['listsubtitles'] = True
+        info_dict = self.getInfoDict()
         self.assertEqual(info_dict, None)
     def test_youtube_automatic_captions(self):
-        DL = FakeYDL()
-        DL.params['writeautomaticsub'] = True
-        DL.params['subtitleslangs'] = ['it']
-        IE = YoutubeIE(DL)
-        info_dict = IE.extract('8YoUxe5ncPo')
-        sub = info_dict[0]['subtitles']['it']
-        self.assertTrue(sub is not None)
+        self.url = '8YoUxe5ncPo'
+        self.DL.params['writeautomaticsub'] = True
+        self.DL.params['subtitleslangs'] = ['it']
+        subtitles = self.getSubtitles()
+        self.assertTrue(subtitles['it'] is not None)
+    def test_youtube_nosubtitles(self):
+        self.url = 'sAjKT8FhjI8'
+        self.DL.params['allsubtitles'] = True
+        subtitles = self.getSubtitles()
+        self.assertEqual(len(subtitles), 0)
     def test_youtube_multiple_langs(self):
-        DL = FakeYDL()
-        DL.params['writesubtitles'] = True
+        self.url = 'QRS8MkLhQmM'
+        self.DL.params['writesubtitles'] = True
         langs = ['it', 'fr', 'de']
-        DL.params['subtitleslangs'] = langs
-        IE = YoutubeIE(DL)
-        subtitles = IE.extract('QRS8MkLhQmM')[0]['subtitles']
+        self.DL.params['subtitleslangs'] = langs
+        subtitles = self.getSubtitles()
         for lang in langs:
             self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)


@@ -66,7 +66,7 @@ class FileDownloader(object):
     @staticmethod
     def format_seconds(seconds):
         (mins, secs) = divmod(seconds, 60)
-        (hours, eta_mins) = divmod(mins, 60)
+        (hours, mins) = divmod(mins, 60)
         if hours > 99:
             return '--:--:--'
         if hours == 0:


@@ -29,6 +29,7 @@ __authors__ = (
     'Albert Kim',
     'Pierre Rudloff',
     'Huarong Huo',
+    'Ismael Mejía',
 )
 
 __license__ = 'Public Domain'
@@ -205,13 +206,10 @@ def parseOpts(overrideArguments=None):
     subtitles.add_option('--write-sub', '--write-srt',
             action='store_true', dest='writesubtitles',
-            help='write subtitle file (currently youtube only)', default=False)
+            help='write subtitle file', default=False)
     subtitles.add_option('--write-auto-sub', '--write-automatic-sub',
             action='store_true', dest='writeautomaticsub',
-            help='write automatic subtitle file (currently youtube only)', default=False)
-    subtitles.add_option('--only-sub',
-            action='store_true', dest='skip_download',
-            help='[deprecated] alias of --skip-download', default=False)
+            help='write automatic subtitle file (youtube only)', default=False)
     subtitles.add_option('--all-subs',
             action='store_true', dest='allsubtitles',
             help='downloads all the available subtitles of the video', default=False)
@@ -222,7 +220,7 @@ def parseOpts(overrideArguments=None):
             action='store', dest='subtitlesformat', metavar='FORMAT',
             help='subtitle format (default=srt) ([sbv/vtt] youtube only)', default='srt')
     subtitles.add_option('--sub-lang', '--sub-langs', '--srt-lang',
-            action='callback', dest='subtitleslang', metavar='LANGS', type='str',
+            action='callback', dest='subtitleslangs', metavar='LANGS', type='str',
             default=[], callback=_comma_separated_values_options_callback,
             help='languages of the subtitles to download (optional) separated by commas, use IETF language tags like \'en,pt\'')
@@ -593,7 +591,7 @@ def _real_main(argv=None):
         'allsubtitles': opts.allsubtitles,
         'listsubtitles': opts.listsubtitles,
         'subtitlesformat': opts.subtitlesformat,
-        'subtitleslangs': opts.subtitleslang,
+        'subtitleslangs': opts.subtitleslangs,
         'matchtitle': decodeOption(opts.matchtitle),
         'rejecttitle': decodeOption(opts.rejecttitle),
         'max_downloads': opts.max_downloads,
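The `--sub-lang` option above parses its argument with an optparse callback so that `en,pt` becomes a list. A self-contained sketch of that mechanism (the callback body here is an assumption about what the private `_comma_separated_values_options_callback` helper does; only its name comes from the diff):

```python
from optparse import OptionParser

# Assumed implementation of the comma-separated-values callback:
# split the raw string value and store the resulting list on the
# option's destination attribute.
def _comma_separated_values_options_callback(option, opt_str, value, parser):
    setattr(parser.values, option.dest, value.split(','))

parser = OptionParser()
parser.add_option('--sub-lang', '--sub-langs',
                  action='callback', dest='subtitleslangs', metavar='LANGS', type='str',
                  default=[], callback=_comma_separated_values_options_callback,
                  help='languages of the subtitles to download, separated by commas')

opts, args = parser.parse_args(['--sub-lang', 'en,pt'])
print(opts.subtitleslangs)  # ['en', 'pt']
```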


@@ -29,6 +29,10 @@ from .escapist import EscapistIE
 from .exfm import ExfmIE
 from .facebook import FacebookIE
 from .flickr import FlickrIE
+from .francetv import (
+    PluzzIE,
+    FranceTvInfoIE,
+)
 from .freesound import FreesoundIE
 from .funnyordie import FunnyOrDieIE
 from .gamespot import GameSpotIE
@@ -52,6 +56,7 @@ from .keek import KeekIE
 from .liveleak import LiveLeakIE
 from .livestream import LivestreamIE
 from .metacafe import MetacafeIE
+from .metacritic import MetacriticIE
 from .mit import TechTVMITIE, MITIE
 from .mixcloud import MixcloudIE
 from .mtv import MTVIE
@@ -74,6 +79,7 @@ from .roxwel import RoxwelIE
 from .rtlnow import RTLnowIE
 from .sina import SinaIE
 from .slashdot import SlashdotIE
+from .slideshare import SlideshareIE
 from .sohu import SohuIE
 from .soundcloud import SoundcloudIE, SoundcloudSetIE
 from .spiegel import SpiegelIE


@@ -5,7 +5,7 @@ from .common import InfoExtractor
 
 class Canalc2IE(InfoExtractor):
-    _IE_NAME = 'canalc2.tv'
+    IE_NAME = 'canalc2.tv'
     _VALID_URL = r'http://.*?\.canalc2\.tv/video\.asp\?idVideo=(\d+)&voir=oui'
 
     _TEST = {


@@ -3,18 +3,22 @@ import json
 import itertools
 
 from .common import InfoExtractor
+from .subtitles import SubtitlesInfoExtractor
 from ..utils import (
     compat_urllib_request,
+    compat_str,
     get_element_by_attribute,
     get_element_by_id,
     ExtractorError,
 )
 
-class DailymotionIE(InfoExtractor):
+
+class DailymotionIE(SubtitlesInfoExtractor):
     """Information Extractor for Dailymotion"""
 
-    _VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/video/([^/]+)'
+    _VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)'
     IE_NAME = u'dailymotion'
     _TEST = {
         u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech',
@@ -33,6 +37,7 @@ class DailymotionIE(InfoExtractor):
         video_id = mobj.group(1).split('_')[0].split('?')[0]
 
         video_extension = 'mp4'
+        url = 'http://www.dailymotion.com/video/%s' % video_id
 
         # Retrieve video webpage to extract further information
         request = compat_urllib_request.Request(url)
@@ -72,6 +77,12 @@ class DailymotionIE(InfoExtractor):
             raise ExtractorError(u'Unable to extract video URL')
         video_url = info[max_quality]
 
+        # subtitles
+        video_subtitles = self.extract_subtitles(video_id)
+        if self._downloader.params.get('listsubtitles', False):
+            self._list_available_subtitles(video_id)
+            return
+
         return [{
             'id': video_id,
             'url': video_url,
@@ -79,9 +90,25 @@ class DailymotionIE(InfoExtractor):
             'upload_date': video_upload_date,
             'title': self._og_search_title(webpage),
             'ext': video_extension,
+            'subtitles': video_subtitles,
             'thumbnail': info['thumbnail_url']
         }]
 
+    def _get_available_subtitles(self, video_id):
+        try:
+            sub_list = self._download_webpage(
+                'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
+                video_id, note=False)
+        except ExtractorError as err:
+            self._downloader.report_warning(u'unable to download video subtitles: %s' % compat_str(err))
+            return {}
+        info = json.loads(sub_list)
+        if (info['total'] > 0):
+            sub_lang_list = dict((l['language'], l['url']) for l in info['list'])
+            return sub_lang_list
+        self._downloader.report_warning(u'video doesn\'t have subtitles')
+        return {}
+
 class DailymotionPlaylistIE(InfoExtractor):
     _VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/'


@@ -0,0 +1,77 @@
+# encoding: utf-8
+import re
+import xml.etree.ElementTree
+
+from .common import InfoExtractor
+from ..utils import (
+    compat_urlparse,
+)
+
+
+class FranceTVBaseInfoExtractor(InfoExtractor):
+    def _extract_video(self, video_id):
+        xml_desc = self._download_webpage(
+            'http://www.francetvinfo.fr/appftv/webservices/video/'
+            'getInfosOeuvre.php?id-diffusion='
+            + video_id, video_id, 'Downloading XML config')
+        info = xml.etree.ElementTree.fromstring(xml_desc.encode('utf-8'))
+
+        manifest_url = info.find('videos/video/url').text
+        video_url = manifest_url.replace('manifest.f4m', 'index_2_av.m3u8')
+        video_url = video_url.replace('/z/', '/i/')
+        thumbnail_path = info.find('image').text
+
+        return {'id': video_id,
+                'ext': 'mp4',
+                'url': video_url,
+                'title': info.find('titre').text,
+                'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', thumbnail_path),
+                'description': info.find('synopsis').text,
+                }
+
+
+class PluzzIE(FranceTVBaseInfoExtractor):
+    IE_NAME = u'pluzz.francetv.fr'
+    _VALID_URL = r'https?://pluzz\.francetv\.fr/videos/(.*?)\.html'
+
+    _TEST = {
+        u'url': u'http://pluzz.francetv.fr/videos/allo_rufo_saison5_,88439064.html',
+        u'file': u'88439064.mp4',
+        u'info_dict': {
+            u'title': u'Allô Rufo',
+            u'description': u'md5:d909f1ebdf963814b65772aea250400e',
+        },
+        u'params': {
+            u'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        title = re.match(self._VALID_URL, url).group(1)
+        webpage = self._download_webpage(url, title)
+        video_id = self._search_regex(
+            r'data-diffusion="(\d+)"', webpage, 'ID')
+        return self._extract_video(video_id)
+
+
+class FranceTvInfoIE(FranceTVBaseInfoExtractor):
+    IE_NAME = u'francetvinfo.fr'
+    _VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+).html'
+
+    _TEST = {
+        u'url': u'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
+        u'file': u'84981923.mp4',
+        u'info_dict': {
+            u'title': u'Soir 3',
+        },
+        u'params': {
+            u'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        page_title = mobj.group('title')
+        webpage = self._download_webpage(url, page_title)
+        video_id = self._search_regex(r'id-video=(\d+?)"', webpage, u'video id')
+        return self._extract_video(video_id)
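`_extract_video` turns the f4m manifest URL from the France TV web service into an HLS playlist URL with two string replacements. The rewrite in isolation (the input URL below is a made-up example, only the transformation is taken from the patch):

```python
def to_m3u8(manifest_url):
    # Mirrors the two rewrites in FranceTVBaseInfoExtractor._extract_video:
    # swap the f4m manifest name for an HLS playlist, and /z/ for /i/.
    video_url = manifest_url.replace('manifest.f4m', 'index_2_av.m3u8')
    return video_url.replace('/z/', '/i/')

# Hypothetical manifest URL, for illustration only.
print(to_m3u8('http://example.com/z/ftv/88439064/manifest.f4m'))
# -> http://example.com/i/ftv/88439064/index_2_av.m3u8
```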


@@ -109,6 +109,11 @@ class GenericIE(InfoExtractor):
         return new_url

     def _real_extract(self, url):
+        parsed_url = compat_urlparse.urlparse(url)
+        if not parsed_url.scheme:
+            self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
+            return self.url_result('http://' + url)
+
         try:
             new_url = self._test_redirect(url)
             if new_url:


@@ -19,8 +19,7 @@ class HowcastIE(InfoExtractor):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
-        webpage_url = 'http://www.howcast.com/videos/' + video_id
-        webpage = self._download_webpage(webpage_url, video_id)
+        webpage = self._download_webpage(url, video_id)

         self.report_extraction(video_id)


@@ -0,0 +1,55 @@
+import re
+import xml.etree.ElementTree
+import operator
+
+from .common import InfoExtractor
+
+
+class MetacriticIE(InfoExtractor):
+    _VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
+
+    _TEST = {
+        u'url': u'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',
+        u'file': u'3698222.mp4',
+        u'info_dict': {
+            u'title': u'inFamous: Second Son - inSide Sucker Punch: Smoke & Mirrors',
+            u'description': u'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.',
+            u'duration': 221,
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        webpage = self._download_webpage(url, video_id)
+        # The xml is not well formatted, there are raw '&'
+        info_xml = self._download_webpage('http://www.metacritic.com/video_data?video=' + video_id,
+            video_id, u'Downloading info xml').replace('&', '&amp;')
+        info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
+
+        clip = next(c for c in info.findall('playList/clip') if c.find('id').text == video_id)
+        formats = []
+        for videoFile in clip.findall('httpURI/videoFile'):
+            rate_str = videoFile.find('rate').text
+            video_url = videoFile.find('filePath').text
+            formats.append({
+                'url': video_url,
+                'ext': 'mp4',
+                'format_id': rate_str,
+                'rate': int(rate_str),
+            })
+        formats.sort(key=operator.itemgetter('rate'))
+
+        description = self._html_search_regex(r'<b>Description:</b>(.*?)</p>',
+            webpage, u'description', flags=re.DOTALL)
+
+        info = {
+            'id': video_id,
+            'title': clip.find('title').text,
+            'formats': formats,
+            'description': description,
+            'duration': int(clip.find('duration').text),
+        }
+        # TODO: Remove when #980 has been merged
+        info.update(formats[-1])
+        return info
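MetacriticIE sorts its format list ascending by bitrate with `operator.itemgetter`, so `formats[-1]` is the highest quality. The same sort on toy data (the rates below are invented for the example):

```python
import operator

# Ascending sort by bitrate, as in MetacriticIE._real_extract.
formats = [
    {'format_id': '900', 'rate': 900},
    {'format_id': '200', 'rate': 200},
    {'format_id': '600', 'rate': 600},
]
formats.sort(key=operator.itemgetter('rate'))

print([f['format_id'] for f in formats])  # lowest first
print(formats[-1]['format_id'])           # best available format
```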


@@ -0,0 +1,47 @@
+import re
+import json
+
+from .common import InfoExtractor
+from ..utils import (
+    compat_urlparse,
+    ExtractorError,
+)
+
+
+class SlideshareIE(InfoExtractor):
+    _VALID_URL = r'https?://www\.slideshare\.net/[^/]+?/(?P<title>.+?)($|\?)'
+
+    _TEST = {
+        u'url': u'http://www.slideshare.net/Dataversity/keynote-presentation-managing-scale-and-complexity',
+        u'file': u'25665706.mp4',
+        u'info_dict': {
+            u'title': u'Managing Scale and Complexity',
+            u'description': u'This was a keynote presentation at the NoSQL Now! 2013 Conference & Expo (http://www.nosqlnow.com). This presentation was given by Adrian Cockcroft from Netflix',
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        page_title = mobj.group('title')
+        webpage = self._download_webpage(url, page_title)
+        slideshare_obj = self._search_regex(
+            r'var slideshare_object = ({.*?}); var user_info =',
+            webpage, u'slideshare object')
+        info = json.loads(slideshare_obj)
+        if info['slideshow']['type'] != u'video':
+            raise ExtractorError(u'Webpage type is "%s": only video extraction is supported for Slideshare' % info['slideshow']['type'], expected=True)
+
+        doc = info['doc']
+        bucket = info['jsplayer']['video_bucket']
+        ext = info['jsplayer']['video_extension']
+        video_url = compat_urlparse.urljoin(bucket, doc + '-SD.' + ext)
+
+        return {
+            '_type': 'video',
+            'id': info['slideshow']['id'],
+            'title': info['slideshow']['title'],
+            'ext': ext,
+            'url': video_url,
+            'thumbnail': info['slideshow']['pin_image_url'],
+            'description': self._og_search_description(webpage),
+        }
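SlideshareIE assembles the final video URL by joining the player's `video_bucket` with `doc + '-SD.' + ext`. The joining behaviour, with made-up bucket and doc values (only the `urljoin` call is taken from the patch; a trailing slash on the bucket is assumed here so the document name is appended rather than substituted):

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as compat_urlparse.urljoin

# Hypothetical values in the shape SlideshareIE reads from the page JSON.
bucket = 'http://media.slideshare.example/videos/'
doc = 'keynote-25665706'
ext = 'mp4'

video_url = urljoin(bucket, doc + '-SD.' + ext)
print(video_url)
```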


@@ -8,7 +8,7 @@ from ..utils import ExtractorError

 class SohuIE(InfoExtractor):
-    _VALID_URL = r'https?://tv\.sohu\.com/\d+?/n(?P<id>\d+)\.shtml.*?'
+    _VALID_URL = r'https?://(?P<mytv>my\.)?tv\.sohu\.com/.+?/(?(mytv)|n)(?P<id>\d+)\.shtml.*?'

     _TEST = {
         u'url': u'http://tv.sohu.com/20130724/n382479172.shtml#super',
@@ -21,7 +21,10 @@ class SohuIE(InfoExtractor):
     def _real_extract(self, url):

-        def _fetch_data(vid_id):
-            base_data_url = u'http://hot.vrs.sohu.com/vrs_flash.action?vid='
+        def _fetch_data(vid_id, mytv=False):
+            if mytv:
+                base_data_url = 'http://my.tv.sohu.com/play/videonew.do?vid='
+            else:
+                base_data_url = u'http://hot.vrs.sohu.com/vrs_flash.action?vid='
             data_url = base_data_url + str(vid_id)
             data_json = self._download_webpage(
@@ -31,15 +34,16 @@ class SohuIE(InfoExtractor):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
+        mytv = mobj.group('mytv') is not None

         webpage = self._download_webpage(url, video_id)
         raw_title = self._html_search_regex(r'(?s)<title>(.+?)</title>',
                                             webpage, u'video title')
         title = raw_title.partition('-')[0].strip()

-        vid = self._html_search_regex(r'var vid="(\d+)"', webpage,
-                                      u'video path')
-        data = _fetch_data(vid)
+        vid = self._html_search_regex(r'var vid ?= ?["\'](\d+)["\']', webpage,
+                                      u'video path')
+        data = _fetch_data(vid, mytv)

         QUALITIES = ('ori', 'super', 'high', 'nor')
         vid_ids = [data['data'][q + 'Vid']
@@ -51,7 +55,7 @@ class SohuIE(InfoExtractor):
         # For now, we just pick the highest available quality
         vid_id = vid_ids[-1]

-        format_data = data if vid == vid_id else _fetch_data(vid_id)
+        format_data = data if vid == vid_id else _fetch_data(vid_id, mytv)
         part_count = format_data['data']['totalBlocks']
         allot = format_data['allot']
         prot = format_data['prot']
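The reworked Sohu pattern uses a conditional group: `(?(mytv)|n)` requires the leading `n` before the numeric ID only when the optional `my.` subdomain did not match. A quick check of both branches (the first URL is the `_TEST` URL from the diff; the `my.tv.sohu.com` URL is a hypothetical example of the shape the patch targets):

```python
import re

# Same pattern as the new SohuIE._VALID_URL.
VALID_URL = r'https?://(?P<mytv>my\.)?tv\.sohu\.com/.+?/(?(mytv)|n)(?P<id>\d+)\.shtml.*?'

m = re.match(VALID_URL, 'http://tv.sohu.com/20130724/n382479172.shtml#super')
print(m.group('id'), m.group('mytv'))   # 'n' prefix required, mytv is None

# Hypothetical my.tv.sohu.com URL: no 'n' prefix, mytv group matched.
m2 = re.match(VALID_URL, 'http://my.tv.sohu.com/us/232799889/78693464.shtml')
print(m2.group('id'), m2.group('mytv'))
```

`mobj.group('mytv') is not None` then selects the `videonew.do` data endpoint in `_fetch_data`.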


@@ -0,0 +1,92 @@
+from .common import InfoExtractor
+
+from ..utils import (
+    compat_str,
+    ExtractorError,
+)
+
+
+class SubtitlesInfoExtractor(InfoExtractor):
+    @property
+    def _have_to_download_any_subtitles(self):
+        return any([self._downloader.params.get('writesubtitles', False),
+                    self._downloader.params.get('writeautomaticsub'),
+                    self._downloader.params.get('allsubtitles', False)])
+
+    def _list_available_subtitles(self, video_id, webpage=None):
+        """ outputs the available subtitles for the video """
+        sub_lang_list = self._get_available_subtitles(video_id)
+        auto_captions_list = self._get_available_automatic_caption(video_id, webpage)
+        sub_lang = ",".join(list(sub_lang_list.keys()))
+        self.to_screen(u'%s: Available subtitles for video: %s' %
+                       (video_id, sub_lang))
+        auto_lang = ",".join(auto_captions_list.keys())
+        self.to_screen(u'%s: Available automatic captions for video: %s' %
+                       (video_id, auto_lang))
+
+    def extract_subtitles(self, video_id, video_webpage=None):
+        """
+        returns {sub_lang: sub}, {} if subtitles not found or None if the
+        subtitles aren't requested.
+        """
+        if not self._have_to_download_any_subtitles:
+            return None
+        available_subs_list = {}
+        if self._downloader.params.get('writeautomaticsub', False):
+            available_subs_list.update(self._get_available_automatic_caption(video_id, video_webpage))
+        if self._downloader.params.get('writesubtitles', False) or self._downloader.params.get('allsubtitles', False):
+            available_subs_list.update(self._get_available_subtitles(video_id))
+
+        if not available_subs_list:  # error, it didn't get the available subtitles
+            return {}
+        if self._downloader.params.get('allsubtitles', False):
+            sub_lang_list = available_subs_list
+        else:
+            if self._downloader.params.get('subtitleslangs', False):
+                requested_langs = self._downloader.params.get('subtitleslangs')
+            elif 'en' in available_subs_list:
+                requested_langs = ['en']
+            else:
+                requested_langs = [list(available_subs_list.keys())[0]]
+
+            sub_lang_list = {}
+            for sub_lang in requested_langs:
+                if not sub_lang in available_subs_list:
+                    self._downloader.report_warning(u'no closed captions found in the specified language "%s"' % sub_lang)
+                    continue
+                sub_lang_list[sub_lang] = available_subs_list[sub_lang]
+
+        subtitles = {}
+        for sub_lang, url in sub_lang_list.items():
+            subtitle = self._request_subtitle_url(sub_lang, url)
+            if subtitle:
+                subtitles[sub_lang] = subtitle
+        return subtitles
+
+    def _request_subtitle_url(self, sub_lang, url):
+        """ makes the http request for the subtitle """
+        try:
+            sub = self._download_webpage(url, None, note=False)
+        except ExtractorError as err:
+            self._downloader.report_warning(u'unable to download video subtitles for %s: %s' % (sub_lang, compat_str(err)))
+            return
+        if not sub:
+            self._downloader.report_warning(u'Did not fetch video subtitles')
+            return
+        return sub
+
+    def _get_available_subtitles(self, video_id):
+        """
+        returns {sub_lang: url} or {} if not available
+        Must be redefined by the subclasses
+        """
+        pass
+
+    def _get_available_automatic_caption(self, video_id, webpage):
+        """
+        returns {sub_lang: url} or {} if not available
+        Must be redefined by the subclasses that support automatic captions,
+        otherwise it will return {}
+        """
+        self._downloader.report_warning(u'Automatic Captions not supported by this server')
+        return {}
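The language-selection policy in `extract_subtitles` is: `--all-subs` takes everything, otherwise the languages from `--sub-lang`, otherwise English if available, otherwise the first listed language. A standalone sketch of that policy as a plain function (`pick_subtitle_langs` is a name invented for this example; the real method reads the same switches from `self._downloader.params`):

```python
def pick_subtitle_langs(available, requested=None, all_subs=False):
    """Sketch of the selection order in SubtitlesInfoExtractor.extract_subtitles:
    all subtitles > requested languages > 'en' > first available."""
    if all_subs:
        return dict(available)
    if requested:
        langs = requested
    elif 'en' in available:
        langs = ['en']
    else:
        langs = [list(available.keys())[0]]
    # Unavailable requested languages are skipped (the extractor warns).
    return dict((l, available[l]) for l in langs if l in available)

available = {'en': 'url-en', 'fr': 'url-fr'}
print(pick_subtitle_langs(available))                    # falls back to English
print(pick_subtitle_langs(available, requested=['fr']))
print(pick_subtitle_langs(available, all_subs=True))
```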


@@ -5,8 +5,10 @@ import netrc
 import re
 import socket
 import itertools
+import xml.etree.ElementTree

 from .common import InfoExtractor, SearchInfoExtractor
+from .subtitles import SubtitlesInfoExtractor
 from ..utils import (
     compat_http_client,
     compat_parse_qs,
@@ -130,7 +132,8 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
             return
         self._confirm_age()

-class YoutubeIE(YoutubeBaseInfoExtractor):
+
+class YoutubeIE(YoutubeBaseInfoExtractor, SubtitlesInfoExtractor):
     IE_DESC = u'YouTube.com'
     _VALID_URL = r"""^
                      (
@@ -150,7 +153,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                          |youtu\.be/                                          # just youtu.be/xxxx
                          )
                      )?                                                       # all until now is optional -> you can pass the naked ID
-                     ([0-9A-Za-z_-]+)                                         # here is it! the YouTube video ID
+                     ([0-9A-Za-z_-]{11})                                      # here is it! the YouTube video ID
                      (?(1).+)?                                                # if we found the ID, everything can follow
                      $"""
     _NEXT_URL_RE = r'[\?&]next_url=([^&]+)'
@@ -386,7 +389,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
     @classmethod
     def suitable(cls, url):
         """Receives a URL and returns True if suitable for this IE."""
-        if YoutubePlaylistIE.suitable(url) or YoutubeSubscriptionsIE.suitable(url): return False
+        if YoutubePlaylistIE.suitable(url): return False
         return re.match(cls._VALID_URL, url, re.VERBOSE) is not None

     def report_video_webpage_download(self, video_id):
@@ -397,19 +400,6 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         """Report attempt to download video info webpage."""
         self.to_screen(u'%s: Downloading video info webpage' % video_id)

-    def report_video_subtitles_download(self, video_id):
-        """Report attempt to download video info webpage."""
-        self.to_screen(u'%s: Checking available subtitles' % video_id)
-
-    def report_video_subtitles_request(self, video_id, sub_lang, format):
-        """Report attempt to download video info webpage."""
-        self.to_screen(u'%s: Downloading video subtitles for %s.%s' % (video_id, sub_lang, format))
-
-    def report_video_subtitles_available(self, video_id, sub_lang_list):
-        """Report available subtitles."""
-        sub_lang = ",".join(list(sub_lang_list.keys()))
-        self.to_screen(u'%s: Available subtitles for video: %s' % (video_id, sub_lang))
-
     def report_information_extraction(self, video_id):
         """Report attempt to extract video information."""
         self.to_screen(u'%s: Extracting video information' % video_id)
@@ -438,7 +428,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         elif len(s) == 86:
             return s[5:34] + s[0] + s[35:38] + s[3] + s[39:45] + s[38] + s[46:53] + s[73] + s[54:73] + s[85] + s[74:85] + s[53]
         elif len(s) == 85:
-            return s[83:34:-1] + s[0] + s[33:27:-1] + s[3] + s[26:19:-1] + s[34] + s[18:3:-1] + s[27]
+            return s[40] + s[82:43:-1] + s[22] + s[42:40:-1] + s[83] + s[39:22:-1] + s[0] + s[21:2:-1]
         elif len(s) == 84:
             return s[81:36:-1] + s[0] + s[35:2:-1]
         elif len(s) == 83:
@@ -464,56 +454,38 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         # Fallback to the other algortihms
         return self._decrypt_signature(s)

     def _get_available_subtitles(self, video_id):
-        self.report_video_subtitles_download(video_id)
-        request = compat_urllib_request.Request('http://video.google.com/timedtext?hl=en&type=list&v=%s' % video_id)
         try:
-            sub_list = compat_urllib_request.urlopen(request).read().decode('utf-8')
-        except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
+            sub_list = self._download_webpage(
+                'http://video.google.com/timedtext?hl=en&type=list&v=%s' % video_id,
+                video_id, note=False)
+        except ExtractorError as err:
             self._downloader.report_warning(u'unable to download video subtitles: %s' % compat_str(err))
             return {}
-        sub_lang_list = re.findall(r'name="([^"]*)"[^>]+lang_code="([\w\-]+)"', sub_list)
-        sub_lang_list = dict((l[1], l[0]) for l in sub_lang_list)
+        lang_list = re.findall(r'name="([^"]*)"[^>]+lang_code="([\w\-]+)"', sub_list)
+
+        sub_lang_list = {}
+        for l in lang_list:
+            lang = l[1]
+            params = compat_urllib_parse.urlencode({
+                'lang': lang,
+                'v': video_id,
+                'fmt': self._downloader.params.get('subtitlesformat'),
+            })
+            url = u'http://www.youtube.com/api/timedtext?' + params
+            sub_lang_list[lang] = url
         if not sub_lang_list:
             self._downloader.report_warning(u'video doesn\'t have subtitles')
             return {}
         return sub_lang_list

-    def _list_available_subtitles(self, video_id):
-        sub_lang_list = self._get_available_subtitles(video_id)
-        self.report_video_subtitles_available(video_id, sub_lang_list)
-
-    def _request_subtitle(self, sub_lang, sub_name, video_id, format):
-        """
-        Return the subtitle as a string or None if they are not found
-        """
-        self.report_video_subtitles_request(video_id, sub_lang, format)
-        params = compat_urllib_parse.urlencode({
-            'lang': sub_lang,
-            'name': sub_name,
-            'v': video_id,
-            'fmt': format,
-        })
-        url = 'http://www.youtube.com/api/timedtext?' + params
-        try:
-            sub = compat_urllib_request.urlopen(url).read().decode('utf-8')
-        except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
-            self._downloader.report_warning(u'unable to download video subtitles for %s: %s' % (sub_lang, compat_str(err)))
-            return
-        if not sub:
-            self._downloader.report_warning(u'Did not fetch video subtitles')
-            return
-        return sub
-
-    def _request_automatic_caption(self, video_id, webpage):
+    def _get_available_automatic_caption(self, video_id, webpage):
         """We need the webpage for getting the captions url, pass it as an
            argument to speed up the process."""
-        sub_lang = (self._downloader.params.get('subtitleslangs') or ['en'])[0]
         sub_format = self._downloader.params.get('subtitlesformat')
         self.to_screen(u'%s: Looking for automatic captions' % video_id)
         mobj = re.search(r';ytplayer.config = ({.*?});', webpage)
-        err_msg = u'Couldn\'t find automatic captions for "%s"' % sub_lang
+        err_msg = u'Couldn\'t find automatic captions for %s' % video_id
         if mobj is None:
             self._downloader.report_warning(err_msg)
             return {}
@@ -522,54 +494,39 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             args = player_config[u'args']
             caption_url = args[u'ttsurl']
             timestamp = args[u'timestamp']
+            # We get the available subtitles
+            list_params = compat_urllib_parse.urlencode({
+                'type': 'list',
+                'tlangs': 1,
+                'asrs': 1,
+            })
+            list_url = caption_url + '&' + list_params
+            list_page = self._download_webpage(list_url, video_id)
+            caption_list = xml.etree.ElementTree.fromstring(list_page.encode('utf-8'))
+            original_lang_node = caption_list.find('track')
+            if original_lang_node.attrib.get('kind') != 'asr' :
+                self._downloader.report_warning(u'Video doesn\'t have automatic captions')
+                return {}
+            original_lang = original_lang_node.attrib['lang_code']
+
+            sub_lang_list = {}
+            for lang_node in caption_list.findall('target'):
+                sub_lang = lang_node.attrib['lang_code']
                 params = compat_urllib_parse.urlencode({
-                    'lang': 'en',
+                    'lang': original_lang,
                     'tlang': sub_lang,
                     'fmt': sub_format,
                     'ts': timestamp,
                     'kind': 'asr',
                 })
-            subtitles_url = caption_url + '&' + params
-            sub = self._download_webpage(subtitles_url, video_id, u'Downloading automatic captions')
-            return {sub_lang: sub}
+                sub_lang_list[sub_lang] = caption_url + '&' + params
+            return sub_lang_list
         # An extractor error can be raise by the download process if there are
         # no automatic captions but there are subtitles
         except (KeyError, ExtractorError):
             self._downloader.report_warning(err_msg)
             return {}

-    def _extract_subtitles(self, video_id):
-        """
-        Return a dictionary: {language: subtitles} or {} if the subtitles
-        couldn't be found
-        """
-        available_subs_list = self._get_available_subtitles(video_id)
-        sub_format = self._downloader.params.get('subtitlesformat')
-        if not available_subs_list: #There was some error, it didn't get the available subtitles
-            return {}
-        if self._downloader.params.get('allsubtitles', False):
-            sub_lang_list = available_subs_list
-        else:
-            if self._downloader.params.get('subtitleslangs', False):
-                reqested_langs = self._downloader.params.get('subtitleslangs')
-            elif 'en' in available_subs_list:
-                reqested_langs = ['en']
-            else:
-                reqested_langs = [list(available_subs_list.keys())[0]]
-
-            sub_lang_list = {}
-            for sub_lang in reqested_langs:
-                if not sub_lang in available_subs_list:
-                    self._downloader.report_warning(u'no closed captions found in the specified language "%s"' % sub_lang)
-                    continue
-                sub_lang_list[sub_lang] = available_subs_list[sub_lang]
-
-        subtitles = {}
-        for sub_lang in sub_lang_list:
-            subtitle = self._request_subtitle(sub_lang, sub_lang_list[sub_lang].encode('utf-8'), video_id, sub_format)
-            if subtitle:
-                subtitles[sub_lang] = subtitle
-        return subtitles
-
     def _print_formats(self, formats):
         print('Available formats:')
         for x in formats:
@@ -643,7 +600,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             manifest = self._download_webpage(manifest_url, video_id, u'Downloading formats manifest')
             formats_urls = _get_urls(manifest)
             for format_url in formats_urls:
-                itag = self._search_regex(r'itag%3D(\d+?)/', format_url, 'itag')
+                itag = self._search_regex(r'itag/(\d+?)/', format_url, 'itag')
                 url_map[itag] = format_url
             return url_map
@@ -768,15 +725,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             video_description = u''

         # subtitles
-        video_subtitles = None
-
-        if self._downloader.params.get('writesubtitles', False) or self._downloader.params.get('allsubtitles', False):
-            video_subtitles = self._extract_subtitles(video_id)
-        elif self._downloader.params.get('writeautomaticsub', False):
-            video_subtitles = self._request_automatic_caption(video_id, video_webpage)
+        video_subtitles = self.extract_subtitles(video_id, video_webpage)

         if self._downloader.params.get('listsubtitles', False):
-            self._list_available_subtitles(video_id)
+            self._list_available_subtitles(video_id, video_webpage)
             return

         if 'length_seconds' not in video_info:
@@ -1015,14 +967,18 @@ class YoutubeChannelIE(InfoExtractor):

 class YoutubeUserIE(InfoExtractor):
     IE_DESC = u'YouTube.com user videos (URL or "ytuser" keyword)'
-    _VALID_URL = r'(?:(?:(?:https?://)?(?:\w+\.)?youtube\.com/(?:user/)?)|ytuser:)([A-Za-z0-9_-]+)'
+    _VALID_URL = r'(?:(?:(?:https?://)?(?:\w+\.)?youtube\.com/(?:user/)?)|ytuser:)(?!feed/)([A-Za-z0-9_-]+)'
     _TEMPLATE_URL = 'http://gdata.youtube.com/feeds/api/users/%s'
     _GDATA_PAGE_SIZE = 50
     _GDATA_URL = 'http://gdata.youtube.com/feeds/api/users/%s/uploads?max-results=%d&start-index=%d&alt=json'
     IE_NAME = u'youtube:user'

+    @classmethod
     def suitable(cls, url):
-        if YoutubeIE.suitable(url): return False
+        # Don't return True if the url can be extracted with other youtube
+        # extractor, the regex would is too permissive and it would match.
+        other_ies = iter(klass for (name, klass) in globals().items() if name.endswith('IE') and klass is not cls)
+        if any(ie.suitable(url) for ie in other_ies): return False
         else: return super(YoutubeUserIE, cls).suitable(url)

     def _real_extract(self, url):
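The reworked YouTube `_get_available_subtitles` no longer downloads the subtitle bodies: it builds one timedtext API URL per language with `urlencode` and returns the `{lang: url}` map, leaving the fetch to the shared base class. The URL construction in isolation (the video ID is the one from youtube-dl's own tests; `timedtext_url` is a helper name invented for this sketch):

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2, as compat_urllib_parse.urlencode

def timedtext_url(lang, video_id, fmt='srt'):
    # Same query parameters the patch assembles per language.
    params = urlencode({'lang': lang, 'v': video_id, 'fmt': fmt})
    return 'http://www.youtube.com/api/timedtext?' + params

url = timedtext_url('en', 'BaW_jenozKc')
print(url)
```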


@@ -1,2 +1,2 @@
-__version__ = '2013.09.06.1'
+__version__ = '2013.09.12'