release 2016.06.22

[svt] Various improvements
+ [svt:play] Add fallback path looking for video id and fix extraction for oppetarkiv
* [svt:base] Detect geo restriction
* [svt:base] Extract series related metadata
2016-06-22 23:43:24 +07:00 · 2016-06-22 23:36:07 +07:00 · 2016-06-22 12:52:15 +01:00 · 2016-06-21 22:31:41 +07:00 · 2016-06-21 13:37:57 +01:00 · 2016-06-21 17:55:53 +08:00
19 changed files with 346 additions and 304 deletions
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -6,8 +6,8 @@

 ---

-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.19.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.19.1**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.22*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.22**

 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.06.19.1
+[debug] youtube-dl version 2016.06.22
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -44,8 +44,8 @@
 - **appletrailers:section**
 - **archive.org**: archive.org videos
 - **ARD**
- - **ARD:mediathek**
 - **ARD:mediathek**: Saarländischer Rundfunk
+ - **ARD:mediathek**
 - **arte.tv**
 - **arte.tv:+7**
 - **arte.tv:cinema**
@@ -248,7 +248,6 @@
 - **Gamersyde**
 - **GameSpot**
 - **GameStar**
- - **Gametrailers**
 - **Gazeta**
 - **GDCVault**
 - **generic**: Generic downloader that works on some sites
--- a/youtube_dl/downloader/hls.py
+++ b/youtube_dl/downloader/hls.py
@@ -2,14 +2,24 @@ from __future__ import unicode_literals

 import os.path
 import re
+import binascii
+try:
+    from Crypto.Cipher import AES
+    can_decrypt_frag = True
+except ImportError:
+    can_decrypt_frag = False

 from .fragment import FragmentFD
 from .external import FFmpegFD

-from ..compat import compat_urlparse
+from ..compat import (
+    compat_urlparse,
+    compat_struct_pack,
+)
 from ..utils import (
    encodeFilename,
    sanitize_open,
+    parse_m3u8_attributes,
 )


@@ -21,7 +31,7 @@ class HlsFD(FragmentFD):
    @staticmethod
    def can_download(manifest):
        UNSUPPORTED_FEATURES = (
-            r'#EXT-X-KEY:METHOD=(?!NONE)',  # encrypted streams [1]
+            r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)',  # encrypted streams [1]
            r'#EXT-X-BYTERANGE',  # playlists composed of byte ranges of media files [2]

            # Live streams heuristic does not always work (e.g. geo restricted to Germany
@@ -39,7 +49,9 @@ class HlsFD(FragmentFD):
            # 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
            # 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
        )
-        return all(not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES)
+        check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
+        check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
+        return all(check_results)

    def real_download(self, filename, info_dict):
        man_url = info_dict['url']
@@ -57,36 +69,60 @@ class HlsFD(FragmentFD):
                fd.add_progress_hook(ph)
            return fd.real_download(filename, info_dict)

-        fragment_urls = []
+        total_frags = 0
        for line in s.splitlines():
            line = line.strip()
            if line and not line.startswith('#'):
-                segment_url = (
-                    line
-                    if re.match(r'^https?://', line)
-                    else compat_urlparse.urljoin(man_url, line))
-                fragment_urls.append(segment_url)
-                # We only download the first fragment during the test
-                if self.params.get('test', False):
-                    break
+                total_frags += 1

        ctx = {
            'filename': filename,
-            'total_frags': len(fragment_urls),
+            'total_frags': total_frags,
        }

        self._prepare_and_start_frag_download(ctx)

+        i = 0
+        media_sequence = 0
+        decrypt_info = {'METHOD': 'NONE'}
        frags_filenames = []
-        for i, frag_url in enumerate(fragment_urls):
-            frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
-            success = ctx['dl'].download(frag_filename, {'url': frag_url})
-            if not success:
-                return False
-            down, frag_sanitized = sanitize_open(frag_filename, 'rb')
-            ctx['dest_stream'].write(down.read())
-            down.close()
-            frags_filenames.append(frag_sanitized)
+        for line in s.splitlines():
+            line = line.strip()
+            if line:
+                if not line.startswith('#'):
+                    frag_url = (
+                        line
+                        if re.match(r'^https?://', line)
+                        else compat_urlparse.urljoin(man_url, line))
+                    frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
+                    success = ctx['dl'].download(frag_filename, {'url': frag_url})
+                    if not success:
+                        return False
+                    down, frag_sanitized = sanitize_open(frag_filename, 'rb')
+                    frag_content = down.read()
+                    down.close()
+                    if decrypt_info['METHOD'] == 'AES-128':
+                        iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
+                        frag_content = AES.new(
+                            decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
+                    ctx['dest_stream'].write(frag_content)
+                    frags_filenames.append(frag_sanitized)
+                    # We only download the first fragment during the test
+                    if self.params.get('test', False):
+                        break
+                    i += 1
+                    media_sequence += 1
+                elif line.startswith('#EXT-X-KEY'):
+                    decrypt_info = parse_m3u8_attributes(line[11:])
+                    if decrypt_info['METHOD'] == 'AES-128':
+                        if 'IV' in decrypt_info:
+                            decrypt_info['IV'] = binascii.unhexlify(decrypt_info['IV'][2:])
+                        if not re.match(r'^https?://', decrypt_info['URI']):
+                            decrypt_info['URI'] = compat_urlparse.urljoin(
+                                man_url, decrypt_info['URI'])
+                        decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read()
+                elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
+                    media_sequence = int(line[22:])

        self._finish_frag_download(ctx)

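Not part of the patch: a minimal sketch of the IV handling that the hls.py hunk above relies on. When an `#EXT-X-KEY` tag carries no `IV` attribute, the patch packs the media sequence number with the `'>8xq'` format, which yields a 16-byte big-endian value; an explicit `IV` arrives as a `0x`-prefixed hex string and is decoded with `binascii.unhexlify`. The helper names `default_iv` and `explicit_iv` are hypothetical, and the stdlib `struct.pack` is assumed here as a stand-in for `compat_struct_pack`.

```python
import binascii
import struct


def default_iv(media_sequence):
    # Hypothetical helper: 8 zero pad bytes followed by the sequence
    # number as a big-endian 64-bit integer, 16 bytes in total.
    return struct.pack('>8xq', media_sequence)


def explicit_iv(attr_value):
    # Hypothetical helper: drop the leading '0x' and decode the hex digits,
    # as the patch does with decrypt_info['IV'][2:].
    return binascii.unhexlify(attr_value[2:])


iv_from_sequence = default_iv(7)
iv_from_attribute = explicit_iv('0x' + '00' * 15 + '07')
```

Under these assumptions both calls produce the same 16 bytes, which is the block size AES-128 in CBC mode expects for its IV.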
--- a/youtube_dl/extractor/cbs.py
+++ b/youtube_dl/extractor/cbs.py
@@ -1,17 +1,13 @@
 from __future__ import unicode_literals

-import re
-
-from .theplatform import ThePlatformIE
+from .theplatform import ThePlatformFeedIE
 from ..utils import (
-    xpath_text,
-    xpath_element,
    int_or_none,
    find_xpath_attr,
 )


-class CBSBaseIE(ThePlatformIE):
+class CBSBaseIE(ThePlatformFeedIE):
    def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
        closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
        return {
@@ -21,9 +17,22 @@ class CBSBaseIE(ThePlatformIE):
            }]
        } if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []

+    def _extract_video_info(self, filter_query, video_id):
+        return self._extract_feed_info(
+            'dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id, lambda entry: {
+                'series': entry.get('cbs$SeriesTitle'),
+                'season_number': int_or_none(entry.get('cbs$SeasonNumber')),
+                'episode': entry.get('cbs$EpisodeTitle'),
+                'episode_number': int_or_none(entry.get('cbs$EpisodeNumber')),
+            }, {
+                'StreamPack': {
+                    'manifest': 'm3u',
+                }
+            })
+

 class CBSIE(CBSBaseIE):
-    _VALID_URL = r'(?:cbs:(?P<content_id>\w+)|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<display_id>[^/]+))'
+    _VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'

    _TESTS = [{
        'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
@@ -38,25 +47,7 @@ class CBSIE(CBSBaseIE):
            'upload_date': '20131127',
            'uploader': 'CBSI-NEW',
        },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
-        '_skip': 'Blocked outside the US',
-    }, {
-        'url': 'http://www.cbs.com/shows/liveonletterman/artist/221752/st-vincent/',
-        'info_dict': {
-            'id': 'WWF_5KqY3PK1',
-            'display_id': 'st-vincent',
-            'ext': 'flv',
-            'title': 'Live on Letterman - St. Vincent',
-            'description': 'Live On Letterman: St. Vincent in concert from New York\'s Ed Sullivan Theater on Tuesday, July 16, 2014.',
-            'duration': 3221,
-        },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
+        'expected_warnings': ['Failed to download m3u8 information'],
        '_skip': 'Blocked outside the US',
    }, {
        'url': 'http://colbertlateshow.com/video/8GmB0oY0McANFvp2aEffk9jZZZ2YyXxy/the-colbeard/',
@@ -68,44 +59,5 @@ class CBSIE(CBSBaseIE):
    TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'

    def _real_extract(self, url):
-        content_id, display_id = re.match(self._VALID_URL, url).groups()
-        if not content_id:
-            webpage = self._download_webpage(url, display_id)
-            content_id = self._search_regex(
-                [r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
-                webpage, 'content id')
-        items_data = self._download_xml(
-            'http://can.cbs.com/thunder/player/videoPlayerService.php',
-            content_id, query={'partner': 'cbs', 'contentId': content_id})
-        video_data = xpath_element(items_data, './/item')
-        title = xpath_text(video_data, 'videoTitle', 'title', True)
-
-        subtitles = {}
-        formats = []
-        for item in items_data.findall('.//item'):
-            pid = xpath_text(item, 'pid')
-            if not pid:
-                continue
-            tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
-            if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
-                tp_release_url += '&manifest=m3u'
-            tp_formats, tp_subtitles = self._extract_theplatform_smil(
-                tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
-            formats.extend(tp_formats)
-            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
-        self._sort_formats(formats)
-
-        info = self.get_metadata('dJ5BDC/media/guid/2198311517/%s' % content_id, content_id)
-        info.update({
-            'id': content_id,
-            'display_id': display_id,
-            'title': title,
-            'series': xpath_text(video_data, 'seriesTitle'),
-            'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
-            'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
-            'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
-            'thumbnail': xpath_text(video_data, 'previewImageURL'),
-            'formats': formats,
-            'subtitles': subtitles,
-        })
-        return info
+        content_id = self._match_id(url)
+        return self._extract_video_info('byGuid=%s' % content_id, content_id)
--- a/youtube_dl/extractor/cbsnews.py
+++ b/youtube_dl/extractor/cbsnews.py
@@ -30,9 +30,12 @@ class CBSNewsIE(CBSBaseIE):
        {
            'url': 'http://www.cbsnews.com/videos/fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack/',
            'info_dict': {
-                'id': 'fort-hood-shooting-army-downplays-mental-illness-as-cause-of-attack',
+                'id': 'SNJBOYzXiWBOvaLsdzwH8fmtP1SCd91Y',
                'ext': 'mp4',
                'title': 'Fort Hood shooting: Army downplays mental illness as cause of attack',
+                'description': 'md5:4a6983e480542d8b333a947bfc64ddc7',
+                'upload_date': '19700101',
+                'uploader': 'CBSI-NEW',
                'thumbnail': 're:^https?://.*\.jpg$',
                'duration': 205,
                'subtitles': {
@@ -58,30 +61,8 @@ class CBSNewsIE(CBSBaseIE):
            webpage, 'video JSON info'), video_id)

        item = video_info['item'] if 'item' in video_info else video_info
-        title = item.get('articleTitle') or item.get('hed')
-        duration = item.get('duration')
-        thumbnail = item.get('mediaImage') or item.get('thumbnail')
-
-        subtitles = {}
-        formats = []
-        for format_id in ['RtmpMobileLow', 'RtmpMobileHigh', 'Hls', 'RtmpDesktop']:
-            pid = item.get('media' + format_id)
-            if not pid:
-                continue
-            release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' % pid
-            tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
-            formats.extend(tp_formats)
-            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'formats': formats,
-            'subtitles': subtitles,
-        }
+        guid = item['mpxRefId']
+        return self._extract_video_info('byGuid=%s' % guid, guid)


 class CBSNewsLiveVideoIE(InfoExtractor):
--- a/youtube_dl/extractor/cbssports.py
+++ b/youtube_dl/extractor/cbssports.py
@@ -1,30 +1,28 @@
 from __future__ import unicode_literals

-import re
-
-from .common import InfoExtractor
+from .cbs import CBSBaseIE


-class CBSSportsIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
+class CBSSportsIE(CBSBaseIE):
+    _VALID_URL = r'https?://www\.cbssports\.com/video/player/[^/]+/(?P<id>\d+)'

-    _TEST = {
-        'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',
+    _TESTS = [{
+        'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',
        'info_dict': {
-            'id': '_d5_GbO8p1sT',
-            'ext': 'flv',
-            'title': 'US Open flashbacks: 1990s',
-            'description': 'Bill Macatee relives the best moments in US Open history from the 1990s.',
+            'id': '708337219968',
+            'ext': 'mp4',
+            'title': 'Ben Simmons the next LeBron? Not so fast',
+            'description': 'md5:854294f627921baba1f4b9a990d87197',
+            'timestamp': 1466293740,
+            'upload_date': '20160618',
+            'uploader': 'CBSI-NEW',
        },
-    }
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }]

    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        section = mobj.group('section')
-        video_id = mobj.group('id')
-        all_videos = self._download_json(
-            'http://www.cbssports.com/data/video/player/getVideos/%s?as=json' % section,
-            video_id)
-        # The json file contains the info of all the videos in the section
-        video_info = next(v for v in all_videos if v['pcid'] == video_id)
-        return self.url_result('theplatform:%s' % video_info['pid'], 'ThePlatform')
+        video_id = self._match_id(url)
+        return self._extract_video_info('byId=%s' % video_id, video_id)
--- a/youtube_dl/extractor/common.py
+++ b/youtube_dl/extractor/common.py
@@ -53,6 +53,7 @@ from ..utils import (
    mimetype2ext,
    update_Request,
    update_url_query,
+    parse_m3u8_attributes,
 )


@@ -1150,23 +1151,11 @@ class InfoExtractor(object):
            }]
        last_info = None
        last_media = None
-        kv_rex = re.compile(
-            r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')
        for line in m3u8_doc.splitlines():
            if line.startswith('#EXT-X-STREAM-INF:'):
-                last_info = {}
-                for m in kv_rex.finditer(line):
-                    v = m.group('val')
-                    if v.startswith('"'):
-                        v = v[1:-1]
-                    last_info[m.group('key')] = v
+                last_info = parse_m3u8_attributes(line)
            elif line.startswith('#EXT-X-MEDIA:'):
-                last_media = {}
-                for m in kv_rex.finditer(line):
-                    v = m.group('val')
-                    if v.startswith('"'):
-                        v = v[1:-1]
-                    last_media[m.group('key')] = v
+                last_media = parse_m3u8_attributes(line)
            elif line.startswith('#') or not line.strip():
                continue
            else:
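Not part of the patch: the definition of `parse_m3u8_attributes` is not shown in this part of the diff, so the following is only a sketch of what such a helper might look like, built from the `kv_rex` loop that the hunk above removes. The name `parse_attribute_list` is hypothetical, and the real helper in `utils` may differ.

```python
import re

# Pattern taken from the removed kv_rex in common.py: KEY=value pairs,
# where the value is either a quoted string or a run without quotes/commas.
_KV_RE = re.compile(
    r'(?P<key>[a-zA-Z_-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)')


def parse_attribute_list(attrib):
    # Hypothetical stand-in for parse_m3u8_attributes.
    info = {}
    for m in _KV_RE.finditer(attrib):
        val = m.group('val')
        if val.startswith('"'):
            val = val[1:-1]  # strip the surrounding quotes
        info[m.group('key')] = val
    return info


stream_info = parse_attribute_list(
    '#EXT-X-STREAM-INF:BANDWIDTH=1280000,'
    'CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=640x360')
key_info = parse_attribute_list(
    '#EXT-X-KEY:METHOD=AES-128,URI="key.bin"'[11:])
```

Sharing one helper lets the extractor (for `#EXT-X-STREAM-INF` and `#EXT-X-MEDIA`) and the native HLS downloader (for `#EXT-X-KEY`) read attribute lists the same way, including quoted values that contain commas.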
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -285,7 +285,6 @@ from .gameone import (
 from .gamersyde import GamersydeIE
 from .gamespot import GameSpotIE
 from .gamestar import GameStarIE
-from .gametrailers import GametrailersIE
 from .gazeta import GazetaIE
 from .gdcvault import GDCVaultIE
 from .generic import GenericIE
--- a/youtube_dl/extractor/facebook.py
+++ b/youtube_dl/extractor/facebook.py
@@ -239,6 +239,8 @@ class FacebookIE(InfoExtractor):

        formats = []
        for format_id, f in video_data.items():
+            if f and isinstance(f, dict):
+                f = [f]
            if not f or not isinstance(f, list):
                continue
            for quality in ('sd', 'hd'):
--- a/youtube_dl/extractor/foxsports.py
+++ b/youtube_dl/extractor/foxsports.py
@@ -1,7 +1,10 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..utils import smuggle_url
+from ..utils import (
+    smuggle_url,
+    update_url_query,
+)


 class FoxSportsIE(InfoExtractor):
@@ -9,11 +12,15 @@ class FoxSportsIE(InfoExtractor):

    _TEST = {
        'url': 'http://www.foxsports.com/video?vid=432609859715',
+        'md5': 'b49050e955bebe32c301972e4012ac17',
        'info_dict': {
-            'id': 'gA0bHB3Ladz3',
-            'ext': 'flv',
+            'id': 'i0qKWsk3qJaM',
+            'ext': 'mp4',
            'title': 'Courtney Lee on going up 2-0 in series vs. Blazers',
            'description': 'Courtney Lee talks about Memphis being focused.',
+            'upload_date': '20150423',
+            'timestamp': 1429761109,
+            'uploader': 'NEWA-FNG-FOXSPORTS',
        },
        'add_ie': ['ThePlatform'],
    }
@@ -28,5 +35,8 @@ class FoxSportsIE(InfoExtractor):
                r"data-player-config='([^']+)'", webpage, 'data player config'),
            video_id)

-        return self.url_result(smuggle_url(
-            config['releaseURL'] + '&manifest=f4m', {'force_smil_url': True}))
+        return self.url_result(smuggle_url(update_url_query(
+            config['releaseURL'], {
+                'mbr': 'true',
+                'switch': 'http',
+            }), {'force_smil_url': True}))
--- a/youtube_dl/extractor/gamespot.py
+++ b/youtube_dl/extractor/gamespot.py
@@ -1,19 +1,19 @@
 from __future__ import unicode_literals

 import re
-import json

-from .common import InfoExtractor
+from .once import OnceIE
 from ..compat import (
    compat_urllib_parse_unquote,
-    compat_urlparse,
 )
 from ..utils import (
    unescapeHTML,
+    url_basename,
+    dict_get,
 )


-class GameSpotIE(InfoExtractor):
+class GameSpotIE(OnceIE):
    _VALID_URL = r'https?://(?:www\.)?gamespot\.com/.*-(?P<id>\d+)/?'
    _TESTS = [{
        'url': 'http://www.gamespot.com/videos/arma-3-community-guide-sitrep-i/2300-6410818/',
@@ -39,29 +39,73 @@ class GameSpotIE(InfoExtractor):
        webpage = self._download_webpage(url, page_id)
        data_video_json = self._search_regex(
            r'data-video=["\'](.*?)["\']', webpage, 'data video')
-        data_video = json.loads(unescapeHTML(data_video_json))
+        data_video = self._parse_json(unescapeHTML(data_video_json), page_id)
        streams = data_video['videoStreams']

+        manifest_url = None
        formats = []
        f4m_url = streams.get('f4m_stream')
-        if f4m_url is not None:
-            # Transform the manifest url to a link to the mp4 files
-            # they are used in mobile devices.
-            f4m_path = compat_urlparse.urlparse(f4m_url).path
-            QUALITIES_RE = r'((,\d+)+,?)'
-            qualities = self._search_regex(QUALITIES_RE, f4m_path, 'qualities').strip(',').split(',')
-            http_path = f4m_path[1:].split('/', 1)[1]
-            http_template = re.sub(QUALITIES_RE, r'%s', http_path)
-            http_template = http_template.replace('.csmil/manifest.f4m', '')
-            http_template = compat_urlparse.urljoin(
-                'http://video.gamespotcdn.com/', http_template)
-            for q in qualities:
-                formats.append({
-                    'url': http_template % q,
-                    'ext': 'mp4',
-                    'format_id': q,
-                })
-        else:
+        if f4m_url:
+            manifest_url = f4m_url
+            formats.extend(self._extract_f4m_formats(
+                f4m_url + '?hdcore=3.7.0', page_id, f4m_id='hds', fatal=False))
+        m3u8_url = streams.get('m3u8_stream')
+        if m3u8_url:
+            manifest_url = m3u8_url
+            m3u8_formats = self._extract_m3u8_formats(
+                m3u8_url, page_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False)
+            formats.extend(m3u8_formats)
+        progressive_url = dict_get(
+            streams, ('progressive_hd', 'progressive_high', 'progressive_low'))
+        if progressive_url and manifest_url:
+            qualities_basename = self._search_regex(
+                '/([^/]+)\.csmil/',
+                manifest_url, 'qualities basename', default=None)
+            if qualities_basename:
+                QUALITIES_RE = r'((,\d+)+,?)'
+                qualities = self._search_regex(
+                    QUALITIES_RE, qualities_basename,
+                    'qualities', default=None)
+                if qualities:
+                    qualities = list(map(lambda q: int(q), qualities.strip(',').split(',')))
+                    qualities.sort()
+                    http_template = re.sub(QUALITIES_RE, r'%d', qualities_basename)
+                    http_url_basename = url_basename(progressive_url)
+                    if m3u8_formats:
+                        self._sort_formats(m3u8_formats)
+                        m3u8_formats = list(filter(
+                            lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+                            m3u8_formats))
+                    if len(qualities) == len(m3u8_formats):
+                        for q, m3u8_format in zip(qualities, m3u8_formats):
+                            f = m3u8_format.copy()
+                            f.update({
+                                'url': progressive_url.replace(
+                                    http_url_basename, http_template % q),
+                                'format_id': f['format_id'].replace('hls', 'http'),
+                                'protocol': 'http',
+                            })
+                            formats.append(f)
+                    else:
+                        for q in qualities:
+                            formats.append({
+                                'url': progressive_url.replace(
+                                    http_url_basename, http_template % q),
+                                'ext': 'mp4',
+                                'format_id': 'http-%d' % q,
+                                'tbr': q,
+                            })
+
+        onceux_json = self._search_regex(
+            r'data-onceux-options=["\'](.*?)["\']', webpage, 'data video', default=None)
+        if onceux_json:
+            onceux_url = self._parse_json(unescapeHTML(onceux_json), page_id).get('metadataUri')
+            if onceux_url:
+                formats.extend(self._extract_once_formats(re.sub(
+                    r'https?://[^/]+', 'http://once.unicornmedia.com', onceux_url).replace('ads/vmap/', '')))
+
+        if not formats:
            for quality in ['sd', 'hd']:
                # It's actually a link to a flv file
                flv_url = streams.get('f4m_{0}'.format(quality))
@@ -71,6 +115,7 @@ class GameSpotIE(InfoExtractor):
                        'ext': 'flv',
                        'format_id': quality,
                    })
+        self._sort_formats(formats)

        return {
            'id': data_video['guid'],
--- a/youtube_dl/extractor/gametrailers.py
+++ b/youtube_dl/extractor/gametrailers.py
@@ -1,62 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    parse_age_limit,
-    url_basename,
-)
-
-
-class GametrailersIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.gametrailers\.com/videos/view/[^/]+/(?P<id>.+)'
-
-    _TEST = {
-        'url': 'http://www.gametrailers.com/videos/view/gametrailers-com/116437-Just-Cause-3-Review',
-        'md5': 'f28c4efa0bdfaf9b760f6507955b6a6a',
-        'info_dict': {
-            'id': '2983958',
-            'ext': 'mp4',
-            'display_id': '116437-Just-Cause-3-Review',
-            'title': 'Just Cause 3 - Review',
-            'description': 'It\'s a lot of fun to shoot at things and then watch them explode in Just Cause 3, but should there be more to the experience than that?',
-        },
-    }
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-        title = self._html_search_regex(
-            r'<title>(.+?)\|', webpage, 'title').strip()
-        embed_url = self._proto_relative_url(
-            self._search_regex(
-                r'src=\'(//embed.gametrailers.com/embed/[^\']+)\'', webpage,
-                'embed url'),
-            scheme='http:')
-        video_id = url_basename(embed_url)
-        embed_page = self._download_webpage(embed_url, video_id)
-        embed_vars_json = self._search_regex(
-            r'(?s)var embedVars = (\{.*?\})\s*</script>', embed_page,
-            'embed vars')
-        info = self._parse_json(embed_vars_json, video_id)
-
-        formats = []
-        for media in info['media']:
-            if media['mediaPurpose'] == 'play':
-                formats.append({
-                    'url': media['uri'],
-                    'height': media['height'],
-                    'width:': media['width'],
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'formats': formats,
-            'thumbnail': info.get('thumbUri'),
-            'description': self._og_search_description(webpage),
-            'duration': int_or_none(info.get('videoLengthInSeconds')),
-            'age_limit': parse_age_limit(info.get('audienceRating')),
-        }
--- a/youtube_dl/extractor/radiojavan.py
+++ b/youtube_dl/extractor/radiojavan.py
@@ -3,7 +3,7 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from ..utils import(
+from ..utils import (
    unified_strdate,
    str_to_int,
 )
--- a/youtube_dl/extractor/streamcloud.py
+++ b/youtube_dl/extractor/streamcloud.py
@@ -6,7 +6,6 @@ import re
 from .common import InfoExtractor
 from ..utils import (
    ExtractorError,
-    sanitized_Request,
    urlencode_postdata,
 )

@@ -45,20 +44,26 @@ class StreamcloudIE(InfoExtractor):
            (?:id="[^"]+"\s+)?
            value="([^"]*)"
            ''', orig_webpage)
-        post = urlencode_postdata(fields)

        self._sleep(12, video_id)
-        headers = {
-            b'Content-Type': b'application/x-www-form-urlencoded',
-        }
-        req = sanitized_Request(url, post, headers)

        webpage = self._download_webpage(
-            req, video_id, note='Downloading video page ...')
-        title = self._html_search_regex(
-            r'<h1[^>]*>([^<]+)<', webpage, 'title')
-        video_url = self._search_regex(
-            r'file:\s*"([^"]+)"', webpage, 'video URL')
+            url, video_id, data=urlencode_postdata(fields), headers={
+                b'Content-Type': b'application/x-www-form-urlencoded',
+            })
+
+        try:
+            title = self._html_search_regex(
+                r'<h1[^>]*>([^<]+)<', webpage, 'title')
+            video_url = self._search_regex(
+                r'file:\s*"([^"]+)"', webpage, 'video URL')
+        except ExtractorError:
+            message = self._html_search_regex(
+                r'(?s)<div[^>]+class=(["\']).*?msgboxinfo.*?\1[^>]*>(?P<message>.+?)</div>',
+                webpage, 'message', default=None, group='message')
+            if message:
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
+            raise
        thumbnail = self._search_regex(
            r'image:\s*"([^"]+)"', webpage, 'thumbnail URL', fatal=False)

--- a/youtube_dl/extractor/svt.py
+++ b/youtube_dl/extractor/svt.py
@@ -6,17 +6,14 @@ import re
 from .common import InfoExtractor
 from ..utils import (
    determine_ext,
+    dict_get,
+    int_or_none,
+    try_get,
 )


 class SVTBaseIE(InfoExtractor):
-    def _extract_video(self, url, video_id):
-        info = self._download_json(url, video_id)
-
-        title = info['context']['title']
-        thumbnail = info['context'].get('thumbnailImage')
-
-        video_info = info['video']
+    def _extract_video(self, video_info, video_id):
        formats = []
        for vr in video_info['videoReferences']:
            player_type = vr.get('playerType')
@@ -40,27 +37,49 @@ class SVTBaseIE(InfoExtractor):
                    'format_id': player_type,
                    'url': vurl,
                })
+        if not formats and video_info.get('rights', {}).get('geoBlockedSweden'):
+            self.raise_geo_restricted('This video is only available in Sweden')
        self._sort_formats(formats)

        subtitles = {}
-        subtitle_references = video_info.get('subtitleReferences')
+        subtitle_references = dict_get(video_info, ('subtitles', 'subtitleReferences'))
        if isinstance(subtitle_references, list):
            for sr in subtitle_references:
                subtitle_url = sr.get('url')
+                subtitle_lang = sr.get('language', 'sv')
                if subtitle_url:
-                    subtitles.setdefault('sv', []).append({'url': subtitle_url})
+                    if determine_ext(subtitle_url) == 'm3u8':
+                        # TODO(yan12125): handle WebVTT in m3u8 manifests
+                        continue

-        duration = video_info.get('materialLength')
-        age_limit = 18 if video_info.get('inappropriateForChildren') else 0
+                    subtitles.setdefault(subtitle_lang, []).append({'url': subtitle_url})
+
+        title = video_info.get('title')
+
+        series = video_info.get('programTitle')
+        season_number = int_or_none(video_info.get('season'))
+        episode = video_info.get('episodeTitle')
+        episode_number = int_or_none(video_info.get('episodeNumber'))
+
+        duration = int_or_none(dict_get(video_info, ('materialLength', 'contentDuration')))
+        age_limit = None
+        adult = dict_get(
+            video_info, ('inappropriateForChildren', 'blockedForChildren'),
+            skip_false_values=False)
+        if adult is not None:
+            age_limit = 18 if adult else 0

        return {
            'id': video_id,
            'title': title,
            'formats': formats,
            'subtitles': subtitles,
-            'thumbnail': thumbnail,
            'duration': duration,
            'age_limit': age_limit,
+            'series': series,
+            'season_number': season_number,
+            'episode': episode,
+            'episode_number': episode_number,
        }


@ -68,11 +87,11 @@ class SVTIE(SVTBaseIE):
    _VALID_URL = r'https?://(?:www\.)?svt\.se/wd\?(?:.*?&)?widgetId=(?P<widget_id>\d+)&.*?\barticleId=(?P<id>\d+)'
    _TEST = {
        'url': 'http://www.svt.se/wd?widgetId=23991&sectionId=541&articleId=2900353&type=embed&contextSectionId=123&autostart=false',
-        'md5': '9648197555fc1b49e3dc22db4af51d46',
+        'md5': '33e9a5d8f646523ce0868ecfb0eed77d',
        'info_dict': {
            'id': '2900353',
-            'ext': 'flv',
-            'title': 'Här trycker Jagr till Giroux (under SVT-intervjun)',
+            'ext': 'mp4',
+            'title': 'Stjärnorna skojar till det - under SVT-intervjun',
            'duration': 27,
            'age_limit': 0,
        },
@ -89,15 +108,20 @@ class SVTIE(SVTBaseIE):
        mobj = re.match(self._VALID_URL, url)
        widget_id = mobj.group('widget_id')
        article_id = mobj.group('id')
-        return self._extract_video(
+
+        info = self._download_json(
            'http://www.svt.se/wd?widgetId=%s&articleId=%s&format=json&type=embed&output=json' % (widget_id, article_id),
            article_id)

+        info_dict = self._extract_video(info['video'], article_id)
+        info_dict['title'] = info['context']['title']
+        return info_dict
+

 class SVTPlayIE(SVTBaseIE):
    IE_DESC = 'SVT Play and Öppet arkiv'
-    _VALID_URL = r'https?://(?:www\.)?(?P<host>svtplay|oppetarkiv)\.se/video/(?P<id>[0-9]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/video/(?P<id>[0-9]+)'
+    _TESTS = [{
        'url': 'http://www.svtplay.se/video/5996901/flygplan-till-haile-selassie/flygplan-till-haile-selassie-2',
        'md5': '2b6704fe4a28801e1a098bbf3c5ac611',
        'info_dict': {
@ -113,12 +137,47 @@ class SVTPlayIE(SVTBaseIE):
                }]
            },
        },
-    }
+    }, {
+        # geo restricted to Sweden
+        'url': 'http://www.oppetarkiv.se/video/5219710/trollflojten',
+        'only_matching': True,
+    }]

    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        host = mobj.group('host')
-        return self._extract_video(
-            'http://www.%s.se/video/%s?output=json' % (host, video_id),
-            video_id)
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        data = self._parse_json(
+            self._search_regex(
+                r'root\["__svtplay"\]\s*=\s*([^;]+);',
+                webpage, 'embedded data', default='{}'),
+            video_id, fatal=False)
+
+        thumbnail = self._og_search_thumbnail(webpage)
+
+        if data:
+            video_info = try_get(
+                data, lambda x: x['context']['dispatcher']['stores']['VideoTitlePageStore']['data']['video'],
+                dict)
+            if video_info:
+                info_dict = self._extract_video(video_info, video_id)
+                info_dict.update({
+                    'title': data['context']['dispatcher']['stores']['MetaStore']['title'],
+                    'thumbnail': thumbnail,
+                })
+                return info_dict
+
+        video_id = self._search_regex(
+            r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
+            webpage, 'video id', default=None)
+
+        if video_id:
+            data = self._download_json(
+                'http://www.svt.se/videoplayer-api/video/%s' % video_id, video_id)
+            info_dict = self._extract_video(data, video_id)
+            if not info_dict.get('title'):
+                info_dict['title'] = re.sub(
+                    r'\s*\|\s*.+?$', '',
+                    info_dict.get('episode') or self._og_search_title(webpage))
+            return info_dict
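Review note on the `age_limit` hunk in `_extract_video`: the call passes `skip_false_values=False`, which appears to be what lets an explicit `False` flag map to `age_limit = 0` while a missing flag stays `None`. The sketch below illustrates that distinction. The `dict_get` here is a simplified stand-in written for this note, not the helper from `youtube_dl/utils.py`, and `age_limit_for` is a hypothetical wrapper around the logic in the hunk.

```python
def dict_get(d, keys, default=None, skip_false_values=True):
    """Simplified stand-in for youtube_dl.utils.dict_get (assumed behaviour)."""
    for key in keys:
        value = d.get(key)
        if value is None:
            continue
        if skip_false_values and not value:
            # By default, falsy values such as False are treated as missing.
            continue
        return value
    return default


def age_limit_for(video_info):
    # Hypothetical wrapper mirroring the age_limit logic in the hunk above.
    adult = dict_get(
        video_info, ('inappropriateForChildren', 'blockedForChildren'),
        skip_false_values=False)
    if adult is None:
        return None
    return 18 if adult else 0


print(age_limit_for({'inappropriateForChildren': False}))  # explicit "not adult"
print(age_limit_for({'blockedForChildren': True}))         # flagged as adult
print(age_limit_for({}))                                   # no information
```

With the default `skip_false_values=True`, the first case would be indistinguishable from the third, since the `False` value would be skipped.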
--- a/youtube_dl/extractor/theplatform.py
+++ b/youtube_dl/extractor/theplatform.py
@ -277,9 +277,9 @@ class ThePlatformIE(ThePlatformBaseIE):


 class ThePlatformFeedIE(ThePlatformBaseIE):
-    _URL_TEMPLATE = '%s//feed.theplatform.com/f/%s/%s?form=json&byGuid=%s'
-    _VALID_URL = r'https?://feed\.theplatform\.com/f/(?P<provider_id>[^/]+)/(?P<feed_id>[^?/]+)\?(?:[^&]+&)*byGuid=(?P<id>[a-zA-Z0-9_]+)'
-    _TEST = {
+    _URL_TEMPLATE = '%s//feed.theplatform.com/f/%s/%s?form=json&%s'
+    _VALID_URL = r'https?://feed\.theplatform\.com/f/(?P<provider_id>[^/]+)/(?P<feed_id>[^?/]+)\?(?:[^&]+&)*(?P<filter>by(?:Gui|I)d=(?P<id>[\w-]+))'
+    _TESTS = [{
        # From http://player.theplatform.com/p/7wvmTC/MSNBCEmbeddedOffSite?guid=n_hardball_5biden_140207
        'url': 'http://feed.theplatform.com/f/7wvmTC/msnbc_video-p-test?form=json&pretty=true&range=-40&byGuid=n_hardball_5biden_140207',
        'md5': '6e32495b5073ab414471b615c5ded394',
@ -295,32 +295,38 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
            'categories': ['MSNBC/Issues/Democrats', 'MSNBC/Issues/Elections/Election 2016'],
            'uploader': 'NBCU-NEWS',
        },
-    }
+    }]

-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-
-        video_id = mobj.group('id')
-        provider_id = mobj.group('provider_id')
-        feed_id = mobj.group('feed_id')
-
-        real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, video_id)
-        feed = self._download_json(real_url, video_id)
-        entry = feed['entries'][0]
+    def _extract_feed_info(self, provider_id, feed_id, filter_query, video_id, custom_fields=None, asset_types_query={}):
+        real_url = self._URL_TEMPLATE % (self.http_scheme(), provider_id, feed_id, filter_query)
+        entry = self._download_json(real_url, video_id)['entries'][0]

        formats = []
        subtitles = {}
        first_video_id = None
        duration = None
+        asset_types = []
        for item in entry['media$content']:
-            smil_url = item['plfile$url'] + '&mbr=true'
+            smil_url = item['plfile$url']
            cur_video_id = ThePlatformIE._match_id(smil_url)
            if first_video_id is None:
                first_video_id = cur_video_id
                duration = float_or_none(item.get('plfile$duration'))
-            cur_formats, cur_subtitles = self._extract_theplatform_smil(smil_url, video_id, 'Downloading SMIL data for %s' % cur_video_id)
-            formats.extend(cur_formats)
-            subtitles = self._merge_subtitles(subtitles, cur_subtitles)
+            for asset_type in item['plfile$assetTypes']:
+                if asset_type in asset_types:
+                    continue
+                asset_types.append(asset_type)
+                query = {
+                    'mbr': 'true',
+                    'formats': item['plfile$format'],
+                    'assetTypes': asset_type,
+                }
+                if asset_type in asset_types_query:
+                    query.update(asset_types_query[asset_type])
+                cur_formats, cur_subtitles = self._extract_theplatform_smil(update_url_query(
+                    smil_url, query), video_id, 'Downloading SMIL data for %s' % asset_type)
+                formats.extend(cur_formats)
+                subtitles = self._merge_subtitles(subtitles, cur_subtitles)

        self._sort_formats(formats)

@ -344,5 +350,17 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
            'timestamp': timestamp,
            'categories': categories,
        })
+        if custom_fields:
+            ret.update(custom_fields(entry))

        return ret
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+
+        video_id = mobj.group('id')
+        provider_id = mobj.group('provider_id')
+        feed_id = mobj.group('feed_id')
+        filter_query = mobj.group('filter')
+
+        return self._extract_feed_info(provider_id, feed_id, filter_query, video_id)
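Review note on the new `_VALID_URL` and `_URL_TEMPLATE` in `ThePlatformFeedIE`: the added `filter` group captures the whole `byGuid=...` or `byId=...` pair so it can be passed through to the feed URL unchanged. A minimal sketch of that round trip follows. `build_feed_url` and its `scheme` argument are hypothetical stand-ins for `_real_extract` and `self.http_scheme()`, and the `byId` URL is made up for illustration.

```python
import re

# Copied from the new _VALID_URL and _URL_TEMPLATE in the hunk above.
VALID_URL = (
    r'https?://feed\.theplatform\.com/f/(?P<provider_id>[^/]+)/(?P<feed_id>[^?/]+)'
    r'\?(?:[^&]+&)*(?P<filter>by(?:Gui|I)d=(?P<id>[\w-]+))')
URL_TEMPLATE = '%s//feed.theplatform.com/f/%s/%s?form=json&%s'


def build_feed_url(url, scheme='http:'):
    # `scheme` stands in for self.http_scheme(); its value here is an assumption.
    mobj = re.match(VALID_URL, url)
    return URL_TEMPLATE % (
        scheme, mobj.group('provider_id'), mobj.group('feed_id'),
        mobj.group('filter'))


# URL taken from the existing test case in the diff.
guid_url = ('http://feed.theplatform.com/f/7wvmTC/msnbc_video-p-test'
            '?form=json&pretty=true&range=-40&byGuid=n_hardball_5biden_140207')
# Made-up URL exercising the newly accepted byId form.
id_url = 'http://feed.theplatform.com/f/abc/def?byId=123-456'

print(build_feed_url(guid_url))
print(build_feed_url(id_url))
```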
--- a/youtube_dl/jsinterp.py
+++ b/youtube_dl/jsinterp.py
@ -131,8 +131,9 @@ class JSInterpreter(object):
            if variable in local_vars:
                obj = local_vars[variable]
            else:
-                obj = self._objects.setdefault(
-                    variable, self.extract_object(variable))
+                if variable not in self._objects:
+                    self._objects[variable] = self.extract_object(variable)
+                obj = self._objects[variable]

            if arg_str is None:
                # Member access
@ -203,7 +204,8 @@ class JSInterpreter(object):
            argvals = tuple([
                int(v) if v.isdigit() else local_vars[v]
                for v in m.group('args').split(',')])
-            self._functions.setdefault(fname, self.extract_function(fname))
+            if fname not in self._functions:
+                self._functions[fname] = self.extract_function(fname)
            return self._functions[fname](argvals)

        raise ExtractorError('Unsupported JS expression %r' % expr)
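Review note on the two `jsinterp.py` hunks: both replace `dict.setdefault(key, expensive_call())` with an explicit membership test. The diff does not state the motivation, but one relevant Python behaviour is that the default argument to `setdefault` is evaluated on every call, even when the key is already cached. The sketch below shows the difference; `extract_object` here is a hypothetical stand-in that only records how often it runs.

```python
calls = []


def extract_object(name):
    # Hypothetical stand-in for an expensive extraction step.
    calls.append(name)
    return {'name': name}


cache = {}

# setdefault: the second argument is evaluated before the lookup, on every call.
cache.setdefault('a', extract_object('a'))
cache.setdefault('a', extract_object('a'))
setdefault_calls = len(calls)

del calls[:]
cache.clear()

# Explicit membership test, as in the patch: the call only happens on a miss.
for _ in range(2):
    if 'a' not in cache:
        cache['a'] = extract_object('a')
guarded_calls = len(calls)

print(setdefault_calls, guarded_calls)
```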
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@ -2852,3 +2852,12 @@ def decode_packed_codes(code):
    return re.sub(
        r'\b(\w+)\b', lambda mobj: symbol_table[mobj.group(0)],
        obfucasted_code)
+
+
+def parse_m3u8_attributes(attrib):
+    info = {}
+    for (key, val) in re.findall(r'(?P<key>[A-Z0-9-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)', attrib):
+        if val.startswith('"'):
+            val = val[1:-1]
+        info[key] = val
+    return info
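Review note on `parse_m3u8_attributes`: a short usage sketch may help reviewers check the regex against quoted values that contain commas. The function body is copied from the hunk above; the attribute string is made up in the style of an `EXT-X-KEY` tag and is not taken from a real playlist.

```python
import re


def parse_m3u8_attributes(attrib):
    # Copied from the hunk above.
    info = {}
    for (key, val) in re.findall(r'(?P<key>[A-Z0-9-]+)=(?P<val>"[^"]+"|[^",]+)(?:,|$)', attrib):
        if val.startswith('"'):
            # Strip the surrounding double quotes from quoted values.
            val = val[1:-1]
        info[key] = val
    return info


# Made-up attribute list; the quoted URI deliberately contains a comma.
attrs = parse_m3u8_attributes(
    'METHOD=AES-128,URI="https://example.com/key?a=1,b=2",IV=0x0123456789abcdef')

for key in sorted(attrs):
    print(key, attrs[key])
```

The comma inside the quoted `URI` value does not split the attribute, because the quoted alternative in the regex is tried before the unquoted one.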
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2016.06.19.1'
+__version__ = '2016.06.22'
Author	SHA1	Message	Date
Sergey M․	cf40fdf5c1	release 2016.06.22	2016-06-22 23:43:24 +07:00
Sergey M․	23bdae0955	[svt] Various improvements + [svt:play] Add fallback path looking for video id and fix extraction for oppetarkiv * [svt:base] Detect geo restriction * [svt:base] Extract series related metadata	2016-06-22 23:36:07 +07:00
Shai Coleman	ca74c90bf5	Fix issue downloading facebook videos: youtube-dl expects the format items to be returned as a list, but when there's only one item Facebook returns a dict instead; this wraps the dict in a list if necessary	2016-06-22 12:52:15 +01:00
Sergey M․	7cfc1e2a10	[gametrailers] Remove extractor: gametrailers closed (see http://www.polygon.com/2016/2/8/10944452/gametrailers-shuts-down-after-13-year-run)	2016-06-21 22:31:41 +07:00
Remita Amine	1ac5705f62	[gamespot] extract all formats	2016-06-21 13:37:57 +01:00
Yen Chi Hsuan	e4f90ea0a7	[svt] Fix extraction for SVTPlay (closes #9809)	2016-06-21 17:55:53 +08:00
Sergey M․	cdfc187cd5	[cbs] Remove unused import	2016-06-20 22:40:33 +07:00
Sergey M․	feef925f49	[streamcloud] Capture error message (#9840)	2016-06-20 22:40:22 +07:00
Sergey M․	19e2d1cdea	release 2016.06.20	2016-06-20 20:50:01 +07:00
Sergey M․	8369a4fe76	[downloader/hls] Simplify and carry long lines	2016-06-20 21:55:17 +07:00
Philipp Hagemeister	1f749b6658	Revert "[jsinterp] Avoid double key lookup for setting new key" This reverts commit `7c05097633`.	2016-06-20 13:29:13 +02:00
Remita Amine	819707920a	[cbs] fix _VALID_URL	2016-06-19 23:55:19 +01:00
Remita Amine	43518503a6	[cbs,cbsnews,cbssports] reduce requests while extracting all formats	2016-06-19 23:40:00 +01:00
Remita Amine	5839d556e4	[theplatform] reduce requests for theplatform feed info extraction	2016-06-19 23:37:05 +01:00
Yen Chi Hsuan	6c83e583b3	[radiojavan] PEP8: E275 is added in pycodestyle 2.6. See https://github.com/PyCQA/pycodestyle/pull/491	2016-06-19 13:32:08 +08:00
Yen Chi Hsuan	6aeb64b673	Merge pull request #8201 from remitamine/hls-aes: [downloader/hls] Add support for AES-128 encrypted segments in hlsnative downloader	2016-06-19 13:25:08 +08:00
Remita Amine	6cd64b6806	[foxsports] extract http formats	2016-06-19 05:45:48 +01:00
remitamine	e154c65128	[downloader/hls] Add support for AES-128 encrypted segments in hlsnative downloader	2016-06-19 01:01:40 +01:00