release 2016.06.19

[r7] Fix extraction and add support for articles (Closes #9826)
[closertotruth] Update and improve (Closes #8680)
2016-06-19 02:30:29 +07:00 · 2016-06-19 02:25:34 +07:00 · 2016-06-19 00:35:29 +07:00 · 2016-06-18 23:19:56 +07:00 · 2016-06-18 22:23:48 +07:00 · 2016-06-18 22:08:48 +07:00
18 changed files with 380 additions and 265 deletions
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.18.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.19*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.18.1**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.19**
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.06.18.1
+[debug] youtube-dl version 2016.06.19
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
--- a/README.md
+++ b/README.md
@@ -44,7 +44,7 @@ Or with [MacPorts](https://www.macports.org/):
 Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
 # DESCRIPTION
-**youtube-dl** is a small command-line program to download videos from
+**youtube-dl** is a command-line program to download videos from
 YouTube.com and a few more sites. It requires the Python interpreter, version
 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on
 your Unix box, on Windows or on Mac OS X. It is released to the public domain,
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -44,8 +44,8 @@
 - **appletrailers:section**
 - **archive.org**: archive.org videos
 - **ARD**
 - **ARD:mediathek**: Saarländischer Rundfunk
 - **ARD:mediathek**
 - **ARD:mediathek**: Saarländischer Rundfunk
 - **arte.tv**
 - **arte.tv:+7**
 - **arte.tv:cinema**
@@ -128,6 +128,7 @@
 - **cliphunter**
 - **ClipRs**
 - **Clipsyndicate**
 - **CloserToTruth**
 - **cloudtime**: CloudTime
 - **Cloudy**
 - **Clubic**
@@ -521,6 +522,7 @@
 - **qqmusic:singer**: QQ音乐 - 歌手
 - **qqmusic:toplist**: QQ音乐 - 排行榜
 - **R7**
 - **R7Article**
 - **radio.de**
 - **radiobremen**
 - **radiocanada**
--- a/youtube_dl/extractor/adobetv.py
+++ b/youtube_dl/extractor/adobetv.py
@@ -156,7 +156,10 @@ class AdobeTVVideoIE(InfoExtractor):
    def _real_extract(self, url):
        video_id = self._match_id(url)
-        video_data = self._download_json(url + '?format=json', video_id)
+        webpage = self._download_webpage(url, video_id)
        video_data = self._parse_json(self._search_regex(
            r'var\s+bridge\s*=\s*([^;]+);', webpage, 'bridged data'), video_id)
        formats = [{
            'format_id': '%s-%s' % (determine_ext(source['src']), source.get('height')),
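The AdobeTV change drops the `?format=json` request and instead reads the player data from an inline `var bridge = ...;` assignment in the page. A minimal standalone sketch of that regex-then-parse step, using plain `re` and `json` in place of youtube-dl's `_search_regex`/`_parse_json`; the sample HTML, the `sources` key, and the extension helper are illustrative assumptions, not the extractor's code:

```python
import json
import re


def extract_bridge_data(webpage):
    """Return the object assigned to `var bridge` in an inline script, or None."""
    # Same pattern as the extractor: capture everything up to the first ';'.
    mobj = re.search(r'var\s+bridge\s*=\s*([^;]+);', webpage)
    if mobj is None:
        return None
    return json.loads(mobj.group(1))


def build_formats(video_data):
    # Hypothetical simplification of determine_ext(): take the text after the last '.'.
    # The 'sources' key is an assumption about the bridge object's layout.
    return [{
        'format_id': '%s-%s' % (source['src'].rsplit('.', 1)[-1], source.get('height')),
        'url': source['src'],
        'height': source.get('height'),
    } for source in video_data.get('sources', [])]


# Invented page fragment; real AdobeTV markup may differ.
sample = ('<script>var bridge = {"title": "Demo", '
          '"sources": [{"src": "http://example.com/v.mp4", "height": 720}]};</script>')
video_data = extract_bridge_data(sample)
formats = build_formats(video_data)
```

Because `[^;]+` stops at the first semicolon, a bridge object containing `;` inside a string value would be cut short; the pattern relies on that not happening.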
--- a/youtube_dl/extractor/aftonbladet.py
+++ b/youtube_dl/extractor/aftonbladet.py
@@ -24,10 +24,10 @@ class AftonbladetIE(InfoExtractor):
        webpage = self._download_webpage(url, video_id)
        # find internal video meta data
-        meta_url = 'http://aftonbladet-play.drlib.aptoma.no/video/%s.json'
+        meta_url = 'http://aftonbladet-play-metadata.cdn.drvideo.aptoma.no/video/%s.json'
        player_config = self._parse_json(self._html_search_regex(
            r'data-player-config="([^"]+)"', webpage, 'player config'), video_id)
-        internal_meta_id = player_config['videoId']
+        internal_meta_id = player_config['aptomaVideoId']
        internal_meta_url = meta_url % internal_meta_id
        internal_meta_json = self._download_json(
            internal_meta_url, video_id, 'Downloading video meta data')
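The Aftonbladet fix switches both the metadata host and the config key (`videoId` to `aptomaVideoId`). Since the config sits in an HTML attribute, it arrives HTML-escaped; `_html_search_regex` unescapes it before `_parse_json`. A rough standard-library sketch of that flow, assuming `html.unescape` is an adequate stand-in for youtube-dl's cleanup (which also does other normalisation); the markup and id are invented:

```python
import html
import json
import re

META_URL = 'http://aftonbladet-play-metadata.cdn.drvideo.aptoma.no/video/%s.json'


def build_meta_url(webpage):
    """Read the HTML-escaped player config and build the metadata JSON URL."""
    # Raises AttributeError if the attribute is missing; youtube-dl raises its own error instead.
    raw = re.search(r'data-player-config="([^"]+)"', webpage).group(1)
    # The attribute value is HTML-escaped JSON, so unescape before parsing.
    player_config = json.loads(html.unescape(raw))
    return META_URL % player_config['aptomaVideoId']


# Invented markup with a made-up id, for illustration only.
sample = '<div data-player-config="{&quot;aptomaVideoId&quot;: &quot;abc123&quot;}"></div>'
meta_url = build_meta_url(sample)
```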
--- a/youtube_dl/extractor/ard.py
+++ b/youtube_dl/extractor/ard.py
@@ -8,7 +8,6 @@ from .generic import GenericIE
 from ..utils import (
    determine_ext,
    ExtractorError,
    get_element_by_attribute,
    qualities,
    int_or_none,
    parse_duration,
@@ -274,41 +273,3 @@ class ARDIE(InfoExtractor):
            'upload_date': upload_date,
            'thumbnail': thumbnail,
        }
 class SportschauIE(ARDMediathekIE):
    IE_NAME = 'Sportschau'
    _VALID_URL = r'(?P<baseurl>https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video(?P<id>[^/#?]+))\.html'
    _TESTS = [{
        'url': 'http://www.sportschau.de/tourdefrance/videoseppeltkokainhatnichtsmitklassischemdopingzutun100.html',
        'info_dict': {
            'id': 'seppeltkokainhatnichtsmitklassischemdopingzutun100',
            'ext': 'mp4',
            'title': 'Seppelt: "Kokain hat nichts mit klassischem Doping zu tun"',
            'thumbnail': 're:^https?://.*\.jpg$',
            'description': 'Der ARD-Doping Experte Hajo Seppelt gibt seine Einschätzung zum ersten Dopingfall der diesjährigen Tour de France um den Italiener Luca Paolini ab.',
        },
        'params': {
            # m3u8 download
            'skip_download': True,
        },
    }]
    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        video_id = mobj.group('id')
        base_url = mobj.group('baseurl')
        webpage = self._download_webpage(url, video_id)
        title = get_element_by_attribute('class', 'headline', webpage)
        description = self._html_search_meta('description', webpage, 'description')
        info = self._extract_media_info(
            base_url + '-mc_defaultQuality-h.json', webpage, video_id)
        info.update({
            'title': title,
            'description': description,
        })
        return info
--- a/youtube_dl/extractor/arte.py
+++ b/youtube_dl/extractor/arte.py
@@ -180,11 +180,14 @@ class ArteTVBaseIE(InfoExtractor):
 class ArteTVPlus7IE(ArteTVBaseIE):
    IE_NAME = 'arte.tv:+7'
-    _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/[^/]+/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
    _TESTS = [{
        'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
        'only_matching': True,
    }, {
        'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22',
        'only_matching': True,
    }]
    @classmethod
@@ -240,10 +243,10 @@ class ArteTVPlus7IE(ArteTVBaseIE):
            return self._extract_from_json_url(json_url, video_id, lang, title=title)
        # Different kind of embed URL (e.g.
        # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
-        embed_url = self._search_regex(
+        entries = [
-            r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
+            self.url_result(url)
-            webpage, 'embed url', group='url')
+            for _, url in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]
-        return self.url_result(embed_url)
+        return self.playlist_result(entries)
 # It also uses the arte_vp_url url from the webpage to extract the information
@@ -252,22 +255,17 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
    _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
    _TESTS = [{
-        'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
+        'url': 'http://creative.arte.tv/fr/episode/osmosis-episode-1',
        'info_dict': {
-            'id': '72176',
+            'id': '057405-001-A',
            'ext': 'mp4',
-            'title': 'Folge 2 - Corporate Design',
+            'title': 'OSMOSIS - N\'AYEZ PLUS PEUR D\'AIMER (1)',
-            'upload_date': '20131004',
+            'upload_date': '20150716',
        },
    }, {
        'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
-        'info_dict': {
+        'playlist_count': 11,
-            'id': '160676',
+        'add_ie': ['Youtube'],
            'ext': 'mp4',
            'title': 'Monty Python live (mostly)',
            'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
            'upload_date': '20140805',
        }
    }, {
        'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
        'only_matching': True,
@@ -349,14 +347,13 @@ class ArteTVCinemaIE(ArteTVPlus7IE):
    _VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
    _TESTS = [{
-        'url': 'http://cinema.arte.tv/de/node/38291',
+        'url': 'http://cinema.arte.tv/fr/article/les-ailes-du-desir-de-julia-reck',
-        'md5': '6b275511a5107c60bacbeeda368c3aa1',
+        'md5': 'a5b9dd5575a11d93daf0e3f404f45438',
        'info_dict': {
-            'id': '055876-000_PWA12025-D',
+            'id': '062494-000-A',
            'ext': 'mp4',
-            'title': 'Tod auf dem Nil',
+            'title': 'Film lauréat du concours web - "Les ailes du désir" de Julia Reck',
-            'upload_date': '20160122',
+            'upload_date': '20150807',
            'description': 'md5:7f749bbb77d800ef2be11d54529b96bc',
        },
    }]
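The broadened `ArteTVPlus7IE._VALID_URL` now accepts any first path segment and any depth before the id, and the embed fallback collects every `<iframe>` into a playlist instead of taking only the first. A small sketch exercising the pattern copied from the diff against its two test URLs, plus the `re.findall` tuple unpacking the new list comprehension relies on (the iframe markup is invented):

```python
import re

# New ArteTVPlus7IE pattern, copied from the diff.
VALID_URL = (r'https?://(?:(?:www|sites)\.)?arte\.tv/[^/]+/(?P<lang>fr|de|en|es)/'
             r'(?:[^/]+/)*(?P<id>[^/?#&]+)')


def match_arte(url):
    mobj = re.match(VALID_URL, url)
    return (mobj.group('lang'), mobj.group('id')) if mobj else None


def iframe_urls(webpage):
    # With two groups, re.findall yields (quote, url) tuples; keep only the URL.
    # Naming it `src` avoids confusion with the `url` argument of _real_extract.
    return [src for _, src in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]


print(match_arte('http://sites.arte.tv/karambolage/de/video/karambolage-22'))
# ('de', 'karambolage-22')
```

For the `guide/de/sendungen/XEN/xenius/?vid=...` URL, the id group cannot start at `?`, so the regex backtracks and yields `xenius`.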
--- a/youtube_dl/extractor/azubu.py
+++ b/youtube_dl/extractor/azubu.py
@@ -46,6 +46,7 @@ class AzubuIE(InfoExtractor):
                'uploader_id': 272749,
                'view_count': int,
            },
            'skip': 'Channel offline',
        },
    ]
@@ -56,22 +57,26 @@ class AzubuIE(InfoExtractor):
            'http://www.azubu.tv/api/video/%s' % video_id, video_id)['data']
        title = data['title'].strip()
-        description = data['description']
+        description = data.get('description')
-        thumbnail = data['thumbnail']
+        thumbnail = data.get('thumbnail')
-        view_count = data['view_count']
+        view_count = data.get('view_count')
-        uploader = data['user']['username']
+        user = data.get('user', {})
-        uploader_id = data['user']['id']
+        uploader = user.get('username')
        uploader_id = user.get('id')
        stream_params = json.loads(data['stream_params'])
-        timestamp = float_or_none(stream_params['creationDate'], 1000)
+        timestamp = float_or_none(stream_params.get('creationDate'), 1000)
-        duration = float_or_none(stream_params['length'], 1000)
+        duration = float_or_none(stream_params.get('length'), 1000)
        renditions = stream_params.get('renditions') or []
        video = stream_params.get('FLVFullLength') or stream_params.get('videoFullLength')
        if video:
            renditions.append(video)
        if not renditions and not user.get('channel', {}).get('is_live', True):
            raise ExtractorError('%s said: channel is offline.' % self.IE_NAME, expected=True)
        formats = [{
            'url': fmt['url'],
            'width': fmt['frameWidth'],
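The Azubu change replaces hard `[...]` lookups with `.get()` so that missing optional fields yield `None` rather than a `KeyError`, while `title` stays mandatory. The sketch below reproduces that pattern with a simplified re-implementation of `float_or_none` (not the library's own code). It uses `data.get('user') or {}`, one possible tightening: `data.get('user', {})` still returns `None` when the API sends `"user": null`.

```python
def float_or_none(v, scale=1, invscale=1, default=None):
    """Simplified re-implementation of youtube-dl's float_or_none helper."""
    if v is None:
        return default
    try:
        return float(v) * invscale / scale
    except (TypeError, ValueError):
        return default


def parse_azubu(data, stream_params):
    # `or {}` also covers an explicit null, which data.get('user', {}) would pass through.
    user = data.get('user') or {}
    return {
        'title': data['title'].strip(),  # still required: a missing title is a hard error
        'description': data.get('description'),
        'thumbnail': data.get('thumbnail'),
        'view_count': data.get('view_count'),
        'uploader': user.get('username'),
        'uploader_id': user.get('id'),
        # Divide by 1000, i.e. treat the values as milliseconds, as the diff does.
        'timestamp': float_or_none(stream_params.get('creationDate'), 1000),
        'duration': float_or_none(stream_params.get('length'), 1000),
    }


info = parse_azubu({'title': '  Live match  ', 'user': None}, {'length': 98000})
```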
--- a/youtube_dl/extractor/bbc.py
+++ b/youtube_dl/extractor/bbc.py
@@ -192,6 +192,7 @@ class BBCCoUkIE(InfoExtractor):
                # rtmp download
                'skip_download': True,
            },
            'skip': 'Now it\'s really geo-restricted',
        }, {
            # compact player (https://github.com/rg3/youtube-dl/issues/8147)
            'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
--- a/youtube_dl/extractor/bet.py
+++ b/youtube_dl/extractor/bet.py
@@ -1,31 +1,27 @@
 from __future__ import unicode_literals
-from .common import InfoExtractor
+from .mtv import MTVServicesInfoExtractor
-from ..compat import compat_urllib_parse_unquote
+from ..utils import unified_strdate
-from ..utils import (
+from ..compat import compat_urllib_parse_urlencode
    xpath_text,
    xpath_with_ns,
    int_or_none,
    parse_iso8601,
 )
-class BetIE(InfoExtractor):
+class BetIE(MTVServicesInfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?bet\.com/(?:[^/]+/)+(?P<id>.+?)\.html'
    _TESTS = [
        {
            'url': 'http://www.bet.com/news/politics/2014/12/08/in-bet-exclusive-obama-talks-race-and-racism.html',
            'info_dict': {
-                'id': 'news/national/2014/a-conversation-with-president-obama',
+                'id': '07e96bd3-8850-3051-b856-271b457f0ab8',
                'display_id': 'in-bet-exclusive-obama-talks-race-and-racism',
                'ext': 'flv',
                'title': 'A Conversation With President Obama',
-                'description': 'md5:699d0652a350cf3e491cd15cc745b5da',
+                'description': 'President Obama urges persistence in confronting racism and bias.',
                'duration': 1534,
                'timestamp': 1418075340,
                'upload_date': '20141208',
                'uploader': 'admin',
                'thumbnail': 're:(?i)^https?://.*\.jpg$',
                'subtitles': {
                    'en': 'mincount:2',
                }
            },
            'params': {
                # rtmp download
@@ -35,16 +31,17 @@ class BetIE(InfoExtractor):
        {
            'url': 'http://www.bet.com/video/news/national/2014/justice-for-ferguson-a-community-reacts.html',
            'info_dict': {
-                'id': 'news/national/2014/justice-for-ferguson-a-community-reacts',
+                'id': '9f516bf1-7543-39c4-8076-dd441b459ba9',
                'display_id': 'justice-for-ferguson-a-community-reacts',
                'ext': 'flv',
                'title': 'Justice for Ferguson: A Community Reacts',
                'description': 'A BET News special.',
                'duration': 1696,
                'timestamp': 1416942360,
                'upload_date': '20141125',
                'uploader': 'admin',
                'thumbnail': 're:(?i)^https?://.*\.jpg$',
                'subtitles': {
                    'en': 'mincount:2',
                }
            },
            'params': {
                # rtmp download
@@ -53,57 +50,32 @@ class BetIE(InfoExtractor):
        }
    ]
    _FEED_URL = "http://feeds.mtvnservices.com/od/feed/bet-mrss-player"
    def _get_feed_query(self, uri):
        return compat_urllib_parse_urlencode({
            'uuid': uri,
        })
    def _extract_mgid(self, webpage):
        return self._search_regex(r'data-uri="([^"]+)', webpage, 'mgid')
    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)
        mgid = self._extract_mgid(webpage)
        videos_info = self._get_videos_info(mgid)
-        media_url = compat_urllib_parse_unquote(self._search_regex(
+        info_dict = videos_info['entries'][0]
            [r'mediaURL\s*:\s*"([^"]+)"', r"var\s+mrssMediaUrl\s*=\s*'([^']+)'"],
            webpage, 'media URL'))
-        video_id = self._search_regex(
+        upload_date = unified_strdate(self._html_search_meta('date', webpage))
-            r'/video/(.*)/_jcr_content/', media_url, 'video id')
+        description = self._html_search_meta('description', webpage)
-        mrss = self._download_xml(media_url, display_id)
+        info_dict.update({
        item = mrss.find('./channel/item')
        NS_MAP = {
            'dc': 'http://purl.org/dc/elements/1.1/',
            'media': 'http://search.yahoo.com/mrss/',
            'ka': 'http://kickapps.com/karss',
        }
        title = xpath_text(item, './title', 'title')
        description = xpath_text(
            item, './description', 'description', fatal=False)
        timestamp = parse_iso8601(xpath_text(
            item, xpath_with_ns('./dc:date', NS_MAP),
            'upload date', fatal=False))
        uploader = xpath_text(
            item, xpath_with_ns('./dc:creator', NS_MAP),
            'uploader', fatal=False)
        media_content = item.find(
            xpath_with_ns('./media:content', NS_MAP))
        duration = int_or_none(media_content.get('duration'))
        smil_url = media_content.get('url')
        thumbnail = media_content.find(
            xpath_with_ns('./media:thumbnail', NS_MAP)).get('url')
        formats = self._extract_smil_formats(smil_url, display_id)
        self._sort_formats(formats)
        return {
            'id': video_id,
            'display_id': display_id,
            'title': title,
            'description': description,
-            'thumbnail': thumbnail,
+            'upload_date': upload_date,
-            'timestamp': timestamp,
+        })
-            'uploader': uploader,
+
-            'duration': duration,
+        return info_dict
            'formats': formats,
        }
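`BetIE` now subclasses `MTVServicesInfoExtractor`: it supplies `_FEED_URL`, `_get_feed_query` and `_extract_mgid`, then lets `_get_videos_info` do the work. A sketch of how the mgid and feed query combine, assuming (not confirmed by this diff) that the base class appends the query string to `_FEED_URL` after a `?`. Python 3's `urllib.parse.urlencode` stands in for `compat_urllib_parse_urlencode`, and the markup and mgid are hypothetical:

```python
import re
from urllib.parse import urlencode

FEED_URL = 'http://feeds.mtvnservices.com/od/feed/bet-mrss-player'


def extract_mgid(webpage):
    # Same pattern as BetIE._extract_mgid.
    return re.search(r'data-uri="([^"]+)', webpage).group(1)


def feed_info_url(mgid):
    # Mirrors _get_feed_query(); joining with '?' is an assumption about the base class.
    return FEED_URL + '?' + urlencode({'uuid': mgid})


# Hypothetical markup and mgid value.
page = '<div class="player" data-uri="mgid:example:video:bet.com:1234"></div>'
info_url = feed_info_url(extract_mgid(page))
```

`urlencode` percent-encodes the colons in the mgid, so the query becomes `uuid=mgid%3Aexample%3A...`.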
--- a/youtube_dl/extractor/br.py
+++ b/youtube_dl/extractor/br.py
@@ -29,7 +29,8 @@ class BRIE(InfoExtractor):
                'duration': 180,
                'uploader': 'Reinhard Weber',
                'upload_date': '20150422',
-            }
+            },
            'skip': '404 not found',
        },
        {
            'url': 'http://www.br.de/nachrichten/oberbayern/inhalt/muenchner-polizeipraesident-schreiber-gestorben-100.html',
@@ -40,7 +41,8 @@ class BRIE(InfoExtractor):
                'title': 'Manfred Schreiber ist tot',
                'description': 'md5:b454d867f2a9fc524ebe88c3f5092d97',
                'duration': 26,
-            }
+            },
            'skip': '404 not found',
        },
        {
            'url': 'https://www.br-klassik.de/audio/peeping-tom-premierenkritik-dance-festival-muenchen-100.html',
@@ -51,7 +53,8 @@ class BRIE(InfoExtractor):
                'title': 'Kurzweilig und sehr bewegend',
                'description': 'md5:0351996e3283d64adeb38ede91fac54e',
                'duration': 296,
-            }
+            },
            'skip': '404 not found',
        },
        {
            'url': 'http://www.br.de/radio/bayern1/service/team/videos/team-video-erdelt100.html',
--- a/youtube_dl/extractor/closertotruth.py
+++ b/youtube_dl/extractor/closertotruth.py
@@ -0,0 +1,92 @@
 # coding: utf-8
 from __future__ import unicode_literals
 import re
 from .common import InfoExtractor
 class CloserToTruthIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?closertotruth\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
    _TESTS = [{
        'url': 'http://closertotruth.com/series/solutions-the-mind-body-problem#video-3688',
        'info_dict': {
            'id': '0_zof1ktre',
            'display_id': 'solutions-the-mind-body-problem',
            'ext': 'mov',
            'title': 'Solutions to the Mind-Body Problem?',
            'upload_date': '20140221',
            'timestamp': 1392956007,
            'uploader_id': 'CTTXML'
        },
        'params': {
            'skip_download': True,
        },
    }, {
        'url': 'http://closertotruth.com/episodes/how-do-brains-work',
        'info_dict': {
            'id': '0_iuxai6g6',
            'display_id': 'how-do-brains-work',
            'ext': 'mov',
            'title': 'How do Brains Work?',
            'upload_date': '20140221',
            'timestamp': 1392956024,
            'uploader_id': 'CTTXML'
        },
        'params': {
            'skip_download': True,
        },
    }, {
        'url': 'http://closertotruth.com/interviews/1725',
        'info_dict': {
            'id': '1725',
            'title': 'AyaFr-002',
        },
        'playlist_mincount': 2,
    }]
    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)
        partner_id = self._search_regex(
            r'<script[^>]+src=["\'].*?\b(?:partner_id|p)/(\d+)',
            webpage, 'kaltura partner_id')
        title = self._search_regex(
            r'<title>(.+?)\s*\|\s*.+?</title>', webpage, 'video title')
        select = self._search_regex(
            r'(?s)<select[^>]+id="select-version"[^>]*>(.+?)</select>',
            webpage, 'select version', default=None)
        if select:
            entry_ids = set()
            entries = []
            for mobj in re.finditer(
                    r'<option[^>]+value=(["\'])(?P<id>[0-9a-z_]+)(?:#.+?)?\1[^>]*>(?P<title>[^<]+)',
                    webpage):
                entry_id = mobj.group('id')
                if entry_id in entry_ids:
                    continue
                entry_ids.add(entry_id)
                entries.append({
                    '_type': 'url_transparent',
                    'url': 'kaltura:%s:%s' % (partner_id, entry_id),
                    'ie_key': 'Kaltura',
                    'title': mobj.group('title'),
                })
            if entries:
                return self.playlist_result(entries, display_id, title)
        entry_id = self._search_regex(
            r'<a[^>]+id=(["\'])embed-kaltura\1[^>]+data-kaltura=(["\'])(?P<id>[0-9a-z_]+)\2',
            webpage, 'kaltura entry_id', group='id')
        return {
            '_type': 'url_transparent',
            'display_id': display_id,
            'url': 'kaltura:%s:%s' % (partner_id, entry_id),
            'ie_key': 'Kaltura',
            'title': title
        }
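When a Closer To Truth page has a version `<select>`, the new extractor turns each distinct Kaltura entry id into a `url_transparent` entry pointing at `kaltura:<partner_id>:<entry_id>`, skipping options that only differ by a `#...` suffix. A standalone sketch of that deduplication, using the option regex from the diff; the `<select>` contents and partner id are invented. Note that this sketch scans only the captured `<select>` HTML, whereas the loop in the diff runs `re.finditer` over the whole `webpage`.

```python
import re

OPTION_RE = (r'<option[^>]+value=(["\'])(?P<id>[0-9a-z_]+)(?:#.+?)?\1[^>]*>'
             r'(?P<title>[^<]+)')


def kaltura_entries(select_html, partner_id):
    """Build one url_transparent entry per distinct Kaltura entry id."""
    seen = set()
    entries = []
    for mobj in re.finditer(OPTION_RE, select_html):
        entry_id = mobj.group('id')
        if entry_id in seen:  # several options may point at the same entry
            continue
        seen.add(entry_id)
        entries.append({
            '_type': 'url_transparent',
            'url': 'kaltura:%s:%s' % (partner_id, entry_id),
            'ie_key': 'Kaltura',
            'title': mobj.group('title'),
        })
    return entries


# Invented <select> contents and partner id.
select_html = ('<option value="0_abc">Part 1</option>'
               '<option value="0_abc#t=60">Part 1 (later)</option>'
               "<option value='1_def'>Part 2</option>")
entries = kaltura_entries(select_html, '123')
```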
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -44,7 +44,6 @@ from .archiveorg import ArchiveOrgIE
 from .ard import (
    ARDIE,
    ARDMediathekIE,
    SportschauIE,
 )
 from .arte import (
    ArteTvIE,
@@ -141,6 +140,7 @@ from .cliprs import ClipRsIE
 from .clipfish import ClipfishIE
 from .cliphunter import CliphunterIE
 from .clipsyndicate import ClipsyndicateIE
 from .closertotruth import CloserToTruthIE
 from .cloudy import CloudyIE
 from .clubic import ClubicIE
 from .clyp import ClypIE
@@ -631,7 +631,10 @@ from .qqmusic import (
    QQMusicToplistIE,
    QQMusicPlaylistIE,
 )
-from .r7 import R7IE
+from .r7 import (
    R7IE,
    R7ArticleIE,
 )
 from .radiocanada import (
    RadioCanadaIE,
    RadioCanadaAudioVideoIE,
@@ -747,6 +750,7 @@ from .sportbox import (
    SportBoxEmbedIE,
 )
 from .sportdeutschland import SportDeutschlandIE
 from .sportschau import SportschauIE
 from .srgssr import (
    SRGSSRIE,
    SRGSSRPlayIE,
--- a/youtube_dl/extractor/mtv.py
+++ b/youtube_dl/extractor/mtv.py
@@ -85,9 +85,10 @@ class MTVServicesInfoExtractor(InfoExtractor):
                rtmp_video_url = rendition.find('./src').text
                if rtmp_video_url.endswith('siteunavail.png'):
                    continue
                new_url = self._transform_rtmp_url(rtmp_video_url)
                formats.append({
-                    'ext': ext,
+                    'ext': 'flv' if new_url.startswith('rtmp') else ext,
-                    'url': self._transform_rtmp_url(rtmp_video_url),
+                    'url': new_url,
                    'format_id': rendition.get('bitrate'),
                    'width': int(rendition.get('width')),
                    'height': int(rendition.get('height')),
--- a/youtube_dl/extractor/r7.py
+++ b/youtube_dl/extractor/r7.py
@@ -2,22 +2,19 @@
 from __future__ import unicode_literals
 from .common import InfoExtractor
-from ..utils import (
+from ..utils import int_or_none
    js_to_json,
    unescapeHTML,
    int_or_none,
 )
 class R7IE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://
+    _VALID_URL = r'''(?x)
                        https?://
                        (?:
                            (?:[a-zA-Z]+)\.r7\.com(?:/[^/]+)+/idmedia/|
                            noticias\.r7\.com(?:/[^/]+)+/[^/]+-|
                            player\.r7\.com/video/i/
                        )
                        (?P<id>[\da-f]{24})
-                        '''
+                    '''
    _TESTS = [{
        'url': 'http://videos.r7.com/policiais-humilham-suspeito-a-beira-da-morte-morre-com-dignidade-/idmedia/54e7050b0cf2ff57e0279389.html',
        'md5': '403c4e393617e8e8ddc748978ee8efde',
@@ -25,6 +22,7 @@ class R7IE(InfoExtractor):
            'id': '54e7050b0cf2ff57e0279389',
            'ext': 'mp4',
            'title': 'Policiais humilham suspeito à beira da morte: "Morre com dignidade"',
            'description': 'md5:01812008664be76a6479aa58ec865b72',
            'thumbnail': 're:^https?://.*\.jpg$',
            'duration': 98,
            'like_count': int,
@@ -44,45 +42,72 @@ class R7IE(InfoExtractor):
    def _real_extract(self, url):
        video_id = self._match_id(url)
-        webpage = self._download_webpage(
+        video = self._download_json(
-            'http://player.r7.com/video/i/%s' % video_id, video_id)
+            'http://player-api.r7.com/video/i/%s' % video_id, video_id)
-        item = self._parse_json(js_to_json(self._search_regex(
+        title = video['title']
            r'(?s)var\s+item\s*=\s*({.+?});', webpage, 'player')), video_id)
        title = unescapeHTML(item['title'])
        thumbnail = item.get('init', {}).get('thumbUri')
        duration = None
        statistics = item.get('statistics', {})
        like_count = int_or_none(statistics.get('likes'))
        view_count = int_or_none(statistics.get('views'))
        formats = []
-        for format_key, format_dict in item['playlist'][0].items():
+        media_url_hls = video.get('media_url_hls')
-            src = format_dict.get('src')
+        if media_url_hls:
-            if not src:
+            formats.extend(self._extract_m3u8_formats(
-                continue
+                media_url_hls, video_id, 'mp4', entry_protocol='m3u8_native',
-            format_id = format_dict.get('format') or format_key
+                m3u8_id='hls', fatal=False))
-            if duration is None:
+        media_url = video.get('media_url')
-                duration = format_dict.get('duration')
+        if media_url:
-            if '.f4m' in src:
+            f = {
-                formats.extend(self._extract_f4m_formats(src, video_id, preference=-1))
+                'url': media_url,
-            elif src.endswith('.m3u8'):
+                'format_id': 'http',
-                formats.extend(self._extract_m3u8_formats(src, video_id, 'mp4', preference=-2))
+            }
-            else:
+            # m3u8 format always matches the http format, let's copy metadata from
-                formats.append({
+            # one to another
-                    'url': src,
+            m3u8_formats = list(filter(
-                    'format_id': format_id,
+                lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
-                })
+                formats))
            if len(m3u8_formats) == 1:
                f_copy = m3u8_formats[0].copy()
                f_copy.update(f)
                f_copy['protocol'] = 'http'
                f = f_copy
            formats.append(f)
        self._sort_formats(formats)
        description = video.get('description')
        thumbnail = video.get('thumb')
        duration = int_or_none(video.get('media_duration'))
        like_count = int_or_none(video.get('likes'))
        view_count = int_or_none(video.get('views'))
        return {
            'id': video_id,
            'title': title,
            'description': description,
            'thumbnail': thumbnail,
            'duration': duration,
            'like_count': like_count,
            'view_count': view_count,
            'formats': formats,
        }
 class R7ArticleIE(InfoExtractor):
    _VALID_URL = r'https?://(?:[a-zA-Z]+)\.r7\.com/(?:[^/]+/)+[^/?#&]+-(?P<id>\d+)'
    _TEST = {
        'url': 'http://tv.r7.com/record-play/balanco-geral/videos/policiais-humilham-suspeito-a-beira-da-morte-morre-com-dignidade-16102015',
        'only_matching': True,
    }
    @classmethod
    def suitable(cls, url):
        return False if R7IE.suitable(url) else super(R7ArticleIE, cls).suitable(url)
    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)
        video_id = self._search_regex(
            r'<div[^>]+(?:id=["\']player-|class=["\']embed["\'][^>]+id=["\'])([\da-f]{24})',
            webpage, 'video id')
        return self.url_result('http://player.r7.com/video/i/%s' % video_id, R7IE.ie_key())
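Two ideas in the R7 rewrite are worth isolating. First, when exactly one HLS variant is a real video rendition (not audio-only, not the multi-variant master), its metadata is copied onto the progressive `http` format, on the premise stated in the code comment that both carry the same stream. Second, `R7ArticleIE.suitable` returns `False` whenever `R7IE` matches, because a `noticias.r7.com/...-<24 hex>` video URL also satisfies the article pattern (its id starts with digits). A sketch of both, with the regexes copied from the diff (the verbose pattern flattened to one line) and invented format dicts and URLs:

```python
import re

R7_URL = (r'https?://(?:(?:[a-zA-Z]+)\.r7\.com(?:/[^/]+)+/idmedia/|'
          r'noticias\.r7\.com(?:/[^/]+)+/[^/]+-|player\.r7\.com/video/i/)'
          r'(?P<id>[\da-f]{24})')
R7_ARTICLE_URL = r'https?://(?:[a-zA-Z]+)\.r7\.com/(?:[^/]+/)+[^/?#&]+-(?P<id>\d+)'


def article_suitable(url):
    # Mirrors R7ArticleIE.suitable: defer to the video extractor whenever it matches.
    return False if re.match(R7_URL, url) else re.match(R7_ARTICLE_URL, url) is not None


def add_http_format(formats, media_url):
    """Append the progressive HTTP format, borrowing metadata from a lone HLS video format."""
    f = {'url': media_url, 'format_id': 'http'}
    video_formats = [
        fmt for fmt in formats
        if fmt.get('vcodec') != 'none' and fmt.get('resolution') != 'multiple']
    if len(video_formats) == 1:
        f_copy = video_formats[0].copy()
        f_copy.update(f)  # keep our url/format_id, inherit height, codecs, etc.
        f_copy['protocol'] = 'http'
        f = f_copy
    formats.append(f)
    return formats
```

With two or more candidate renditions there is no way to tell which one the progressive file corresponds to, so the bare `{'url', 'format_id'}` dict is appended unchanged.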
--- a/youtube_dl/extractor/sportschau.py
+++ b/youtube_dl/extractor/sportschau.py
@@ -0,0 +1,38 @@
 # coding: utf-8
 from __future__ import unicode_literals
 from .wdr import WDRBaseIE
 from ..utils import get_element_by_attribute
 class SportschauIE(WDRBaseIE):
    IE_NAME = 'Sportschau'
    _VALID_URL = r'https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video-?(?P<id>[^/#?]+)\.html'
    _TEST = {
        'url': 'http://www.sportschau.de/uefaeuro2016/videos/video-dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100.html',
        'info_dict': {
            'id': 'mdb-1140188',
            'display_id': 'dfb-team-geht-gut-gelaunt-ins-spiel-gegen-polen-100',
            'ext': 'mp4',
            'title': 'DFB-Team geht gut gelaunt ins Spiel gegen Polen',
            'description': 'Vor dem zweiten Gruppenspiel gegen Polen herrscht gute Stimmung im deutschen Team. Insbesondere Bastian Schweinsteiger strotzt vor Optimismus nach seinem Tor gegen die Ukraine.',
            'upload_date': '20160615',
        },
        'skip': 'Geo-restricted to Germany',
    }
    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        title = get_element_by_attribute('class', 'headline', webpage)
        description = self._html_search_meta('description', webpage, 'description')
        info = self._extract_wdr_video(webpage, video_id)
        info.update({
            'title': title,
            'description': description,
        })
        return info
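The relocated `SportschauIE` uses `video-?` so that both the older `.../videoseppelt...100.html` style and the newer `.../video-dfb-team-...-100.html` style match. The captured group only serves as the display id; the final `id` comes from the WDR tracker data (`trackerClipId`, e.g. `mdb-1140188` in the test). A quick check of the pattern copied from the diff:

```python
import re

# Pattern from the new SportschauIE; `video-?` accepts both URL styles.
VALID_URL = r'https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video-?(?P<id>[^/#?]+)\.html'


def sportschau_display_id(url):
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None
```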
--- a/youtube_dl/extractor/wdr.py
+++ b/youtube_dl/extractor/wdr.py
@@ -15,7 +15,87 @@ from ..utils import (
 )
-class WDRIE(InfoExtractor):
+class WDRBaseIE(InfoExtractor):
+    def _extract_wdr_video(self, webpage, display_id):
+        # for wdr.de the data-extension is in a tag with the class "mediaLink"
+        # for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
+        # for wdrmaus its in a link to the page in a multiline "videoLink"-tag
+        json_metadata = self._html_search_regex(
+            r'class=(?:"(?:mediaLink|wdrrPlayerPlayBtn)\b[^"]*"[^>]+|"videoLink\b[^"]*"[\s]*>\n[^\n]*)data-extension="([^"]+)"',
+            webpage, 'media link', default=None, flags=re.MULTILINE)
+        if not json_metadata:
+            return
+        media_link_obj = self._parse_json(json_metadata, display_id,
+                                          transform_source=js_to_json)
+        jsonp_url = media_link_obj['mediaObj']['url']
+        metadata = self._download_json(
+            jsonp_url, 'metadata', transform_source=strip_jsonp)
+        metadata_tracker_data = metadata['trackerData']
+        metadata_media_resource = metadata['mediaResource']
+        formats = []
+        # check if the metadata contains a direct URL to a file
+        for kind, media_resource in metadata_media_resource.items():
+            if kind not in ('dflt', 'alt'):
+                continue
+            for tag_name, medium_url in media_resource.items():
+                if tag_name not in ('videoURL', 'audioURL'):
+                    continue
+                ext = determine_ext(medium_url)
+                if ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        medium_url, display_id, 'mp4', 'm3u8_native',
+                        m3u8_id='hls'))
+                elif ext == 'f4m':
+                    manifest_url = update_url_query(
+                        medium_url, {'hdcore': '3.2.0', 'plugin': 'aasp-3.2.0.77.18'})
+                    formats.extend(self._extract_f4m_formats(
+                        manifest_url, display_id, f4m_id='hds', fatal=False))
+                elif ext == 'smil':
+                    formats.extend(self._extract_smil_formats(
+                        medium_url, 'stream', fatal=False))
+                else:
+                    a_format = {
+                        'url': medium_url
+                    }
+                    if ext == 'unknown_video':
+                        urlh = self._request_webpage(
+                            medium_url, display_id, note='Determining extension')
+                        ext = urlhandle_detect_ext(urlh)
+                        a_format['ext'] = ext
+                    formats.append(a_format)
+        self._sort_formats(formats)
+        subtitles = {}
+        caption_url = metadata_media_resource.get('captionURL')
+        if caption_url:
+            subtitles['de'] = [{
+                'url': caption_url,
+                'ext': 'ttml',
+            }]
+        title = metadata_tracker_data['trackerClipTitle']
+        return {
+            'id': metadata_tracker_data.get('trackerClipId', display_id),
+            'display_id': display_id,
+            'title': title,
+            'alt_title': metadata_tracker_data.get('trackerClipSubcategory'),
+            'formats': formats,
+            'subtitles': subtitles,
+            'upload_date': unified_strdate(metadata_tracker_data.get('trackerClipAirTime')),
+        }
+class WDRIE(WDRBaseIE):
    _CURRENT_MAUS_URL = r'https?://(?:www\.)wdrmaus.de/(?:[^/]+/){1,2}[^/?#]+\.php5'
    _PAGE_REGEX = r'/(?:mediathek/)?[^/]+/(?P<type>[^/]+)/(?P<display_id>.+)\.html'
    _VALID_URL = r'(?P<page_url>https?://(?:www\d\.)?wdr\d?\.de)' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL
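`_extract_wdr_video` routes each media URL by its extension: HLS for `m3u8`, HDS for `f4m` (after appending the `hdcore` and `plugin` query parameters with `update_url_query`), SMIL for `smil`, and a direct file otherwise. The Python 3 sketch below reproduces only that routing step with the standard library. `guess_ext` is a simplified, hypothetical stand-in for youtube-dl's `determine_ext`, `add_query` approximates `update_url_query`, and `route_media_url` is an illustrative name; none of them is the library's actual code.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

# Query parameters the extractor appends to f4m manifest URLs.
HDS_PARAMS = {'hdcore': '3.2.0', 'plugin': 'aasp-3.2.0.77.18'}


def guess_ext(url, default='unknown_video'):
    """Simplified, hypothetical stand-in for determine_ext(): use the text
    after the last dot, ignoring the query string."""
    guess = url.partition('?')[0].rpartition('.')[2]
    return guess if re.match(r'^[A-Za-z0-9]+$', guess) else default


def add_query(url, params):
    """Approximation of update_url_query(): merge params into the query."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query.update(params)
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def route_media_url(url):
    """Return (handler, url) following the extractor's if/elif chain."""
    ext = guess_ext(url)
    if ext == 'm3u8':
        return 'hls', url
    if ext == 'f4m':
        return 'hds', add_query(url, HDS_PARAMS)
    if ext == 'smil':
        return 'smil', url
    # Anything else is a direct file; for 'unknown_video' the real extractor
    # makes an extra request to detect the extension from the response.
    return 'direct', url
```

For example, `route_media_url('http://example.com/z/manifest.f4m')` yields `('hds', 'http://example.com/z/manifest.f4m?hdcore=3.2.0&plugin=aasp-3.2.0.77.18')`.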
@ -91,10 +171,10 @@ class WDRIE(InfoExtractor):
        },
        {
            'url': 'http://www.wdrmaus.de/sachgeschichten/sachgeschichten/achterbahn.php5',
-            # HDS download, MD5 is unstable
+            'md5': '803138901f6368ee497b4d195bb164f2',
            'info_dict': {
                'id': 'mdb-186083',
-                'ext': 'flv',
+                'ext': 'mp4',
                'upload_date': '20130919',
                'title': 'Sachgeschichte - Achterbahn ',
                'description': '- Die Sendung mit der Maus -',
@ -120,14 +200,9 @@ class WDRIE(InfoExtractor):
        display_id = mobj.group('display_id')
        webpage = self._download_webpage(url, display_id)
-        # for wdr.de the data-extension is in a tag with the class "mediaLink"
-        # for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
-        # for wdrmaus its in a link to the page in a multiline "videoLink"-tag
-        json_metadata = self._html_search_regex(
-            r'class=(?:"(?:mediaLink|wdrrPlayerPlayBtn)\b[^"]*"[^>]+|"videoLink\b[^"]*"[\s]*>\n[^\n]*)data-extension="([^"]+)"',
-            webpage, 'media link', default=None, flags=re.MULTILINE)
-        if not json_metadata:
+        info_dict = self._extract_wdr_video(webpage, display_id)
+        if not info_dict:
            entries = [
                self.url_result(page_url + href[0], 'WDR')
                for href in re.findall(
@ -140,86 +215,22 @@ class WDRIE(InfoExtractor):
            raise ExtractorError('No downloadable streams found', expected=True)
-        media_link_obj = self._parse_json(json_metadata, display_id,
-                                          transform_source=js_to_json)
-        jsonp_url = media_link_obj['mediaObj']['url']
-        metadata = self._download_json(
-            jsonp_url, 'metadata', transform_source=strip_jsonp)
-        metadata_tracker_data = metadata['trackerData']
-        metadata_media_resource = metadata['mediaResource']
-        formats = []
-        # check if the metadata contains a direct URL to a file
-        for kind, media_resource in metadata_media_resource.items():
-            if kind not in ('dflt', 'alt'):
-                continue
-            for tag_name, medium_url in media_resource.items():
-                if tag_name not in ('videoURL', 'audioURL'):
-                    continue
-                ext = determine_ext(medium_url)
-                if ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        medium_url, display_id, 'mp4', 'm3u8_native',
-                        m3u8_id='hls'))
-                elif ext == 'f4m':
-                    manifest_url = update_url_query(
-                        medium_url, {'hdcore': '3.2.0', 'plugin': 'aasp-3.2.0.77.18'})
-                    formats.extend(self._extract_f4m_formats(
-                        manifest_url, display_id, f4m_id='hds', fatal=False))
-                elif ext == 'smil':
-                    formats.extend(self._extract_smil_formats(
-                        medium_url, 'stream', fatal=False))
-                else:
-                    a_format = {
-                        'url': medium_url
-                    }
-                    if ext == 'unknown_video':
-                        urlh = self._request_webpage(
-                            medium_url, display_id, note='Determining extension')
-                        ext = urlhandle_detect_ext(urlh)
-                        a_format['ext'] = ext
-                    formats.append(a_format)
-        self._sort_formats(formats)
-        subtitles = {}
-        caption_url = metadata_media_resource.get('captionURL')
-        if caption_url:
-            subtitles['de'] = [{
-                'url': caption_url,
-                'ext': 'ttml',
-            }]
-        title = metadata_tracker_data.get('trackerClipTitle')
        is_live = url_type == 'live'
        if is_live:
-            title = self._live_title(title)
-            upload_date = None
-        elif 'trackerClipAirTime' in metadata_tracker_data:
-            upload_date = metadata_tracker_data['trackerClipAirTime']
-        else:
-            upload_date = self._html_search_meta('DC.Date', webpage, 'upload date')
-        if upload_date:
-            upload_date = unified_strdate(upload_date)
-        return {
-            'id': metadata_tracker_data.get('trackerClipId', display_id),
-            'display_id': display_id,
-            'title': title,
-            'alt_title': metadata_tracker_data.get('trackerClipSubcategory'),
-            'formats': formats,
-            'upload_date': upload_date,
+            info_dict.update({
+                'title': self._live_title(info_dict['title']),
+                'upload_date': None,
+            })
+        elif 'upload_date' not in info_dict:
+            info_dict['upload_date'] = unified_strdate(self._html_search_meta('DC.Date', webpage, 'upload date'))
+        info_dict.update({
            'description': self._html_search_meta('Description', webpage),
            'is_live': is_live,
-            'subtitles': subtitles,
-        }
+        })
+
+        return info_dict
 class WDRMobileIE(InfoExtractor):
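After this change `WDRIE._real_extract` no longer builds the info dict itself; it post-processes the dict returned by `_extract_wdr_video`. For live URLs it rewrites the title and clears `upload_date`; otherwise it is meant to fall back to the page's `DC.Date` meta tag when the tracker data gave no date. Note that the dict built by `_extract_wdr_video` always contains an `upload_date` key (possibly `None`), so the membership test `'upload_date' not in info_dict` never fires; the sketch below tests for a falsy value instead. It is a minimal, hypothetical illustration: `finalize_info` and `normalize_date` are invented names, `normalize_date` handles only `YYYY-MM-DD...` strings (the real `unified_strdate` accepts many formats), and a fixed `' (live)'` suffix stands in for `InfoExtractor._live_title`, which appends the current date and time.

```python
import re


def normalize_date(value):
    """Hypothetical, much narrower stand-in for unified_strdate():
    accepts only 'YYYY-MM-DD...' strings and returns 'YYYYMMDD'."""
    if not value:
        return None
    m = re.match(r'(\d{4})-(\d{2})-(\d{2})', value)
    return ''.join(m.groups()) if m else None


def finalize_info(info, is_live, dc_date=None, description=None):
    """Post-process the helper's info dict the way WDRIE intends (sketch)."""
    info = dict(info)  # work on a copy; do not mutate the caller's dict
    if is_live:
        info.update({
            # Stand-in for _live_title(), which appends a timestamp instead.
            'title': info['title'] + ' (live)',
            'upload_date': None,
        })
    elif not info.get('upload_date'):
        # Falsy check: the helper always sets the key, possibly to None.
        info['upload_date'] = normalize_date(dc_date)
    info.update({
        'description': description,
        'is_live': is_live,
    })
    return info
```

Checking the value rather than the key keeps the `DC.Date` fallback reachable when the tracker data has no `trackerClipAirTime`.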
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@ -1,3 +1,3 @@
 from __future__ import unicode_literals
-__version__ = '2016.06.18.1'
+__version__ = '2016.06.19'
Author	SHA1	Message	Date
Sergey M․	589568789f	release 2016.06.19	2016-06-19 02:30:29 +07:00
Sergey M․	7577d849a6	[r7] Fix extraction and add support for articles (Closes #9826 )	2016-06-19 02:25:34 +07:00
Sergey M․	cb23192bc4	[closertotruth] Update and improve (Closes #8680 )	2016-06-19 00:35:29 +07:00
Steven Gosseling	41c1023300	[closertotruth] Add extractor Removed print statement from code. Replaced two regex searches with the corret ones. Removed some unnecessary semicolumns fixed title extraction refactored everything to search_regex processed comments on commit 5650b0d, fixed feedback from flake8 Improved regexes and returns info dict now. Added support for closertotruth interview URL Added support for episodes page	2016-06-18 23:19:56 +07:00
Sergey M․	90b6288cce	[arte:+7] Simplify _VALID_URL	2016-06-18 22:23:48 +07:00
Sergey M․	c1823c8ad9	[README.md] Remove 'small' from description (#9814 )	2016-06-18 22:08:48 +07:00
Sergey M․	d7c6c656c5	[arte:+7] Expand _VALID_URL (Closes #9820 )	2016-06-18 21:42:17 +07:00
Yen Chi Hsuan	b0b128049a	[extractors] Update references to sportschau (#9799 )	2016-06-18 13:43:47 +08:00
Yen Chi Hsuan	e8f13f2637	[sportschau.de] Fix extraction and moved to its own file (closes #9799 )	2016-06-18 13:42:58 +08:00
Yen Chi Hsuan	b5aad37f6b	[ard] Remove SportschauIE, which is now based on WDR (#9799 )	2016-06-18 13:42:39 +08:00
Yen Chi Hsuan	6d0d4fc26d	[wdr] Add WDRBaseIE, for Sportschau (#9799 )	2016-06-18 13:40:55 +08:00
Yen Chi Hsuan	0278aa443f	[br] Skip invalid tests	2016-06-18 12:53:48 +08:00
Yen Chi Hsuan	1f35745758	[azubu] Don't fail on optional fields	2016-06-18 12:39:08 +08:00
Yen Chi Hsuan	573c35272f	[bbc] Skip a geo-restricted test case	2016-06-18 12:35:55 +08:00
Yen Chi Hsuan	09e3f91e40	[arte] Update _TESTS and fix for pages with multiple YouTube videos Some tests are from #6895 and #6613	2016-06-18 12:34:58 +08:00
Yen Chi Hsuan	1b6cf16be7	[aftonbladet] Fix extraction	2016-06-18 12:27:39 +08:00
Yen Chi Hsuan	26264cb056	[adobetv] Use embedded data in the webpage Sometimes the HTML webpage is returned even with '?format=json'	2016-06-18 12:21:40 +08:00
Yen Chi Hsuan	a72df5f36f	[mtvservices] Fix ext for RTMP streams	2016-06-18 12:19:06 +08:00
Yen Chi Hsuan	c878e635de	[bet] Moved to MTVServices	2016-06-18 12:17:24 +08:00