Compare commits

19 Commits

Author SHA1 Message Date
cd7342755f release 2015.02.03.1 2015-02-03 10:59:27 +01:00
9bb8e0a3f9 [wsj] Add new extractor (Fixes #4854) 2015-02-03 10:58:28 +01:00
1a6373ef39 [sort_formats] Prefer bitrate over video size
720p @ 1000KB/s looks way better than 1080p @ 500KB/s
2015-02-03 10:53:07 +01:00
f6c24009be [YoutubeDL] Calculate thumbnail IDs automatically 2015-02-03 10:52:22 +01:00
d862042301 [aftonbladet] Modernize 2015-02-03 10:18:32 +01:00
23d9ded655 [franceculture] Rewrite for new HTML scheme (Fixes #4853) 2015-02-03 10:17:13 +01:00
4c1a017e69 release 2015.02.03 2015-02-03 00:22:52 +01:00
ee623d9247 [descripts/release] Regenerate auxiliary documentation on build as well 2015-02-03 00:22:17 +01:00
330537d08a [README] typo 2015-02-03 00:20:57 +01:00
2cf0ecac7b [ffmpeg] --add-metadata: Set comment and purl fields (Fixes #4847) 2015-02-03 00:16:45 +01:00
d200b11c7e [Makefile] Simplify clean/cleanall 2015-02-03 00:14:42 +01:00
d0eca21021 release 2015.02.02.5 2015-02-02 23:47:19 +01:00
c1147c05e1 [brightcove] Fix up more generically invalid XML (Fixes #4849) 2015-02-02 23:47:14 +01:00
55898ad2cf release 2015.02.02.4 2015-02-02 23:39:03 +01:00
a465808592 Merge branch 'master' of github.com:rg3/youtube-dl 2015-02-02 23:38:54 +01:00
5c4862bad4 [normalboots] Remove unused import 2015-02-02 23:38:45 +01:00
995029a142 [nerdist] Add new extractor (Fixes #4851) 2015-02-02 23:38:35 +01:00
a57b562cff [nfl] Add support for articles pages (fixes #4848) 2015-02-02 23:17:00 +01:00
531572578e [normalboots] Modernize 2015-02-02 23:04:39 +01:00
20 changed files with 329 additions and 87 deletions

View File

@ -1,4 +1,6 @@
Please include the full output of the command when run with `--verbose`. The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever. **Please include the full output of youtube-dl when run with `-v`**.
The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
Please re-read your issue once again to avoid a couple of common mistakes (you can and should use this as a checklist): Please re-read your issue once again to avoid a couple of common mistakes (you can and should use this as a checklist):
@ -122,7 +124,7 @@ If you want to add support for a new site, you can follow this quick list (assum
5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py). 5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L38). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L38). Add tests and code for as many as you want.
8. If you can, check the code with [pyflakes](https://pypi.python.org/pypi/pyflakes) (a good idea) and [pep8](https://pypi.python.org/pypi/pep8) (optional, ignore E501). 8. If you can, check the code with [flake8](https://pypi.python.org/pypi/flake8).
9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this: 9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/__init__.py $ git add youtube_dl/extractor/__init__.py

View File

@ -1,10 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean: clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json CONTRIBUTING.md.tmp rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 CONTRIBUTING.md.tmp youtube-dl youtube-dl.exe
cleanall: clean
rm -f youtube-dl youtube-dl.exe
PREFIX ?= /usr/local PREFIX ?= /usr/local
BINDIR ?= $(PREFIX)/bin BINDIR ?= $(PREFIX)/bin

View File

@ -728,7 +728,7 @@ In particular, every site support request issue should only pertain to services
### Is anyone going to need the feature? ### Is anyone going to need the feature?
Only post features that you (or an incapicated friend you can personally talk to) require. Do not post features because they seem like a good idea. If they are really useful, they will be requested by someone who requires them. Only post features that you (or an incapacitated friend you can personally talk to) require. Do not post features because they seem like a good idea. If they are really useful, they will be requested by someone who requires them.
### Is your question about youtube-dl? ### Is your question about youtube-dl?

View File

@ -35,7 +35,7 @@ if [ ! -z "$useless_files" ]; then echo "ERROR: Non-.py files in youtube_dl: $us
if [ ! -f "updates_key.pem" ]; then echo 'ERROR: updates_key.pem missing'; exit 1; fi if [ ! -f "updates_key.pem" ]; then echo 'ERROR: updates_key.pem missing'; exit 1; fi
/bin/echo -e "\n### First of all, testing..." /bin/echo -e "\n### First of all, testing..."
make cleanall make clean
if $skip_tests ; then if $skip_tests ; then
echo 'SKIPPING TESTS' echo 'SKIPPING TESTS'
else else
@ -45,9 +45,9 @@ fi
/bin/echo -e "\n### Changing version in version.py..." /bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Committing README.md and youtube_dl/version.py..." /bin/echo -e "\n### Committing documentation and youtube_dl/version.py..."
make README.md make README.md CONTRIBUTING.md supportedsites
git add README.md youtube_dl/version.py git add README.md CONTRIBUTING.md docs/supportedsites.md youtube_dl/version.py
git commit -m "release $version" git commit -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..." /bin/echo -e "\n### Now tagging, signing and pushing..."

View File

@ -9,6 +9,7 @@
- **8tracks** - **8tracks**
- **9gag** - **9gag**
- **abc.net.au** - **abc.net.au**
- **Abc7News**
- **AcademicEarth:Course** - **AcademicEarth:Course**
- **AddAnime** - **AddAnime**
- **AdobeTV** - **AdobeTV**
@ -16,9 +17,12 @@
- **Aftonbladet** - **Aftonbladet**
- **AlJazeera** - **AlJazeera**
- **Allocine** - **Allocine**
- **AlphaPorno**
- **anitube.se** - **anitube.se**
- **AnySex** - **AnySex**
- **Aparat** - **Aparat**
- **AppleDailyAnimationNews**
- **AppleDailyRealtimeNews**
- **AppleTrailers** - **AppleTrailers**
- **archive.org**: archive.org videos - **archive.org**: archive.org videos
- **ARD** - **ARD**
@ -30,8 +34,10 @@
- **arte.tv:ddc** - **arte.tv:ddc**
- **arte.tv:embed** - **arte.tv:embed**
- **arte.tv:future** - **arte.tv:future**
- **AtresPlayer**
- **ATTTechChannel**
- **audiomack** - **audiomack**
- **AUEngine** - **audiomack:album**
- **Azubu** - **Azubu**
- **bambuser** - **bambuser**
- **bambuser:channel** - **bambuser:channel**
@ -71,8 +77,10 @@
- **cmt.com** - **cmt.com**
- **CNET** - **CNET**
- **CNN** - **CNN**
- **CNNArticle**
- **CNNBlogs** - **CNNBlogs**
- **CollegeHumor** - **CollegeHumor**
- **CollegeRama**
- **ComCarCoff** - **ComCarCoff**
- **ComedyCentral** - **ComedyCentral**
- **ComedyCentralShows**: The Daily Show / The Colbert Report - **ComedyCentralShows**: The Daily Show / The Colbert Report
@ -82,23 +90,27 @@
- **Crunchyroll** - **Crunchyroll**
- **crunchyroll:playlist** - **crunchyroll:playlist**
- **CSpan**: C-SPAN - **CSpan**: C-SPAN
- **CtsNews**
- **culturebox.francetvinfo.fr** - **culturebox.francetvinfo.fr**
- **dailymotion** - **dailymotion**
- **dailymotion:playlist** - **dailymotion:playlist**
- **dailymotion:user** - **dailymotion:user**
- **daum.net** - **daum.net**
- **DBTV** - **DBTV**
- **DctpTv**
- **DeezerPlaylist** - **DeezerPlaylist**
- **defense.gouv.fr** - **defense.gouv.fr**
- **Discovery** - **Discovery**
- **divxstage**: DivxStage - **divxstage**: DivxStage
- **Dotsub** - **Dotsub**
- **DRBonanza**
- **Dropbox** - **Dropbox**
- **DrTuber** - **DrTuber**
- **DRTV** - **DRTV**
- **Dump** - **Dump**
- **dvtv**: http://video.aktualne.cz/ - **dvtv**: http://video.aktualne.cz/
- **EbaumsWorld** - **EbaumsWorld**
- **EchoMsk**
- **eHow** - **eHow**
- **Einthusan** - **Einthusan**
- **eitb.tv** - **eitb.tv**
@ -108,6 +120,7 @@
- **EMPFlix** - **EMPFlix**
- **Engadget** - **Engadget**
- **Eporner** - **Eporner**
- **EroProfile**
- **Escapist** - **Escapist**
- **EveryonesMixtape** - **EveryonesMixtape**
- **exfm**: ex.fm - **exfm**: ex.fm
@ -143,6 +156,7 @@
- **GDCVault** - **GDCVault**
- **generic**: Generic downloader that works on some sites - **generic**: Generic downloader that works on some sites
- **GiantBomb** - **GiantBomb**
- **Giga**
- **Glide**: Glide mobile video messages (glide.me) - **Glide**: Glide mobile video messages (glide.me)
- **Globo** - **Globo**
- **GodTube** - **GodTube**
@ -153,9 +167,14 @@
- **Grooveshark** - **Grooveshark**
- **Groupon** - **Groupon**
- **Hark** - **Hark**
- **HearThisAt**
- **Heise** - **Heise**
- **HellPorno**
- **Helsinki**: helsinki.fi - **Helsinki**: helsinki.fi
- **HentaiStigma** - **HentaiStigma**
- **HistoricFilms**
- **hitbox**
- **hitbox:live**
- **HornBunny** - **HornBunny**
- **HostingBulk** - **HostingBulk**
- **HotNewHipHop** - **HotNewHipHop**
@ -182,6 +201,7 @@
- **jpopsuki.tv** - **jpopsuki.tv**
- **Jukebox** - **Jukebox**
- **Kankan** - **Kankan**
- **Karaoketv**
- **keek** - **keek**
- **KeezMovies** - **KeezMovies**
- **KhanAcademy** - **KhanAcademy**
@ -195,6 +215,7 @@
- **LiveLeak** - **LiveLeak**
- **livestream** - **livestream**
- **livestream:original** - **livestream:original**
- **LnkGo**
- **lrt.lt** - **lrt.lt**
- **lynda**: lynda.com videos - **lynda**: lynda.com videos
- **lynda:course**: lynda.com online courses - **lynda:course**: lynda.com online courses
@ -235,6 +256,7 @@
- **MySpass** - **MySpass**
- **myvideo** - **myvideo**
- **MyVidster** - **MyVidster**
- **n-tv.de**
- **Naver** - **Naver**
- **NBA** - **NBA**
- **NBC** - **NBC**
@ -242,11 +264,16 @@
- **ndr**: NDR.de - Mediathek - **ndr**: NDR.de - Mediathek
- **NDTV** - **NDTV**
- **NerdCubedFeed** - **NerdCubedFeed**
- **Nerdist**
- **Netzkino**
- **Newgrounds** - **Newgrounds**
- **Newstube** - **Newstube**
- **NextMedia**
- **NextMediaActionNews**
- **nfb**: National Film Board of Canada - **nfb**: National Film Board of Canada
- **nfl.com** - **nfl.com**
- **nhl.com** - **nhl.com**
- **nhl.com:news**: NHL news
- **nhl.com:videocenter**: NHL videocenter category - **nhl.com:videocenter**: NHL videocenter category
- **niconico**: ニコニコ動画 - **niconico**: ニコニコ動画
- **NiconicoPlaylist** - **NiconicoPlaylist**
@ -257,18 +284,20 @@
- **Nowness** - **Nowness**
- **nowvideo**: NowVideo - **nowvideo**: NowVideo
- **npo.nl** - **npo.nl**
- **npo.nl:live**
- **NRK** - **NRK**
- **NRKTV** - **NRKTV**
- **NTV** - **ntv.ru**
- **Nuvid** - **Nuvid**
- **NYTimes** - **NYTimes**
- **ocw.mit.edu** - **ocw.mit.edu**
- **OktoberfestTV** - **OktoberfestTV**
- **on.aol.com** - **on.aol.com**
- **Ooyala** - **Ooyala**
- **OpenFilm**
- **orf:fm4**: radio FM4
- **orf:oe1**: Radio Österreich 1 - **orf:oe1**: Radio Österreich 1
- **orf:tvthek**: ORF TVthek - **orf:tvthek**: ORF TVthek
- **ORFFM4**: radio FM4
- **parliamentlive.tv**: UK parliament videos - **parliamentlive.tv**: UK parliament videos
- **Patreon** - **Patreon**
- **PBS** - **PBS**
@ -290,6 +319,7 @@
- **Pyvideo** - **Pyvideo**
- **QuickVid** - **QuickVid**
- **radio.de** - **radio.de**
- **radiobremen**
- **radiofrance** - **radiofrance**
- **Rai** - **Rai**
- **RBMARadio** - **RBMARadio**
@ -300,6 +330,8 @@
- **RottenTomatoes** - **RottenTomatoes**
- **Roxwel** - **Roxwel**
- **RTBF** - **RTBF**
- **Rte**
- **RTL2**
- **RTLnow** - **RTLnow**
- **rtlxl.nl** - **rtlxl.nl**
- **RTP** - **RTP**
@ -309,6 +341,7 @@
- **RUHD** - **RUHD**
- **rutube**: Rutube videos - **rutube**: Rutube videos
- **rutube:channel**: Rutube channels - **rutube:channel**: Rutube channels
- **rutube:embed**: Rutube embedded videos
- **rutube:movie**: Rutube movies - **rutube:movie**: Rutube movies
- **rutube:person**: Rutube person videos - **rutube:person**: Rutube person videos
- **RUTV**: RUTV.RU - **RUTV**: RUTV.RU
@ -351,11 +384,12 @@
- **Sport5** - **Sport5**
- **SportBox** - **SportBox**
- **SportDeutschland** - **SportDeutschland**
- **SRMediathek**: Süddeutscher Rundfunk - **SRMediathek**: Saarländischer Rundfunk
- **stanfordoc**: Stanford Open ClassRoom - **stanfordoc**: Stanford Open ClassRoom
- **Steam** - **Steam**
- **streamcloud.eu** - **streamcloud.eu**
- **StreamCZ** - **StreamCZ**
- **StreetVoice**
- **SunPorno** - **SunPorno**
- **SWRMediathek** - **SWRMediathek**
- **Syfy** - **Syfy**
@ -375,7 +409,9 @@
- **TeleBruxelles** - **TeleBruxelles**
- **telecinco.es** - **telecinco.es**
- **TeleMB** - **TeleMB**
- **TeleTask**
- **TenPlay** - **TenPlay**
- **TestTube**
- **TF1** - **TF1**
- **TheOnion** - **TheOnion**
- **ThePlatform** - **ThePlatform**
@ -403,8 +439,15 @@
- **tv.dfb.de** - **tv.dfb.de**
- **tvigle**: Интернет-телевидение Tvigle.ru - **tvigle**: Интернет-телевидение Tvigle.ru
- **tvp.pl** - **tvp.pl**
- **tvp.pl:Series**
- **TVPlay**: TV3Play and related services - **TVPlay**: TV3Play and related services
- **Twitch** - **twitch:bookmarks**
- **twitch:chapter**
- **twitch:past_broadcasts**
- **twitch:profile**
- **twitch:stream**
- **twitch:video**
- **twitch:vod**
- **Ubu** - **Ubu**
- **udemy** - **udemy**
- **udemy:course** - **udemy:course**
@ -433,6 +476,8 @@
- **videoweed**: VideoWeed - **videoweed**: VideoWeed
- **Vidme** - **Vidme**
- **Vidzi** - **Vidzi**
- **vier**
- **vier:videos**
- **viki** - **viki**
- **vimeo** - **vimeo**
- **vimeo:album** - **vimeo:album**
@ -460,11 +505,13 @@
- **WDR** - **WDR**
- **wdr:mobile** - **wdr:mobile**
- **WDRMaus**: Sendung mit der Maus - **WDRMaus**: Sendung mit der Maus
- **WebOfStories**
- **Weibo** - **Weibo**
- **Wimp** - **Wimp**
- **Wistia** - **Wistia**
- **WorldStarHipHop** - **WorldStarHipHop**
- **wrzuta.pl** - **wrzuta.pl**
- **WSJ**: Wall Street Journal
- **XBef** - **XBef**
- **XboxClips** - **XboxClips**
- **XHamster** - **XHamster**
@ -472,7 +519,9 @@
- **XNXX** - **XNXX**
- **XTube** - **XTube**
- **XTubeUser**: XTube user profile - **XTubeUser**: XTube user profile
- **Xuite**
- **XVideos** - **XVideos**
- **XXXYMovies**
- **Yahoo**: Yahoo screen and movies - **Yahoo**: Yahoo screen and movies
- **YesJapan** - **YesJapan**
- **Ynet** - **Ynet**
@ -491,7 +540,6 @@
- **youtube:search_url**: YouTube.com search URLs - **youtube:search_url**: YouTube.com search URLs
- **youtube:show**: YouTube.com (multi-season) shows - **youtube:show**: YouTube.com (multi-season) shows
- **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication) - **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
- **youtube:toplist**: YouTube.com top lists, "yttoplist:{channel}:{list title}" (Example: "yttoplist:music:Top Tracks")
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword) - **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)
- **youtube:watch_later**: Youtube watch later list, ":ytwatchlater" for short (requires authentication) - **youtube:watch_later**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
- **ZDF** - **ZDF**

View File

@ -103,6 +103,16 @@ def expect_info_dict(self, got_dict, expected_dict):
self.assertTrue( self.assertTrue(
match_rex.match(got), match_rex.match(got),
'field %s (value: %r) should match %r' % (info_field, got, match_str)) 'field %s (value: %r) should match %r' % (info_field, got, match_str))
elif isinstance(expected, compat_str) and expected.startswith('startswith:'):
got = got_dict.get(info_field)
start_str = expected[len('startswith:'):]
self.assertTrue(
isinstance(got, compat_str),
'Expected a %s object, but got %s for field %s' % (
compat_str.__name__, type(got).__name__, info_field))
self.assertTrue(
got.startswith(start_str),
'field %s (value: %r) should start with %r' % (info_field, got, start_str))
elif isinstance(expected, type): elif isinstance(expected, type):
got = got_dict.get(info_field) got = got_dict.get(info_field)
self.assertTrue(isinstance(got, expected), self.assertTrue(isinstance(got, expected),
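
The `startswith:` branch added in the hunk above can be illustrated in isolation. This is a minimal sketch, assuming Python 3 and plain `str` in place of `compat_str`; `matches_expected` is a hypothetical helper name and is not part of the test suite.

```python
PREFIX = 'startswith:'


def matches_expected(got, expected):
    """Return True if `got` satisfies `expected`.

    An expected value of the form 'startswith:<text>' only requires the
    actual value to be a string beginning with <text>; anything else is
    compared for equality.
    """
    if isinstance(expected, str) and expected.startswith(PREFIX):
        start_str = expected[len(PREFIX):]
        return isinstance(got, str) and got.startswith(start_str)
    return got == expected
```

A prefix check of this kind suits fields such as long descriptions, where a site may truncate or extend the text without the test needing an update.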

View File

@ -156,6 +156,9 @@ class TestUtil(unittest.TestCase):
self.assertEqual( self.assertEqual(
unified_strdate('11/26/2014 11:30:00 AM PST', day_first=False), unified_strdate('11/26/2014 11:30:00 AM PST', day_first=False),
'20141126') '20141126')
self.assertEqual(
unified_strdate('2/2/2015 6:47:40 PM', day_first=False),
'20150202')
def test_find_xpath_attr(self): def test_find_xpath_attr(self):
testxml = '''<root> testxml = '''<root>
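
The new test case expects a month-first timestamp with unpadded month, day and hour to come out as `20150202`. As a rough illustration only, the standard library can parse that single format as sketched below. The format string and the function name are assumptions chosen for this input; `unified_strdate` itself is not reproduced here and presumably handles many more formats.

```python
from datetime import datetime


def us_datetime_to_yyyymmdd(date_str):
    # Month-first date with a 12-hour clock and an AM/PM marker
    parsed = datetime.strptime(date_str, '%m/%d/%Y %I:%M:%S %p')
    return parsed.strftime('%Y%m%d')
```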

View File

@ -964,9 +964,11 @@ class YoutubeDL(object):
thumbnails.sort(key=lambda t: ( thumbnails.sort(key=lambda t: (
t.get('preference'), t.get('width'), t.get('height'), t.get('preference'), t.get('width'), t.get('height'),
t.get('id'), t.get('url'))) t.get('id'), t.get('url')))
for t in thumbnails: for i, t in enumerate(thumbnails):
if 'width' in t and 'height' in t: if 'width' in t and 'height' in t:
t['resolution'] = '%dx%d' % (t['width'], t['height']) t['resolution'] = '%dx%d' % (t['width'], t['height'])
if t.get('id') is None:
t['id'] = '%d' % i
if thumbnails and 'thumbnail' not in info_dict: if thumbnails and 'thumbnail' not in info_dict:
info_dict['thumbnail'] = thumbnails[-1]['url'] info_dict['thumbnail'] = thumbnails[-1]['url']
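
The automatic thumbnail IDs from the hunk above can be tried on plain dictionaries. This is a sketch under assumptions: `assign_thumbnail_ids` is a hypothetical standalone function, and missing values are replaced by `-1` or `''` in the sort key so that the tuples compare cleanly on Python 3 (the diff sorts on the raw values).

```python
def assign_thumbnail_ids(thumbnails):
    def sort_key(t):
        # Assumption: substitute defaults for None so tuples stay comparable
        return (
            t.get('preference') if t.get('preference') is not None else -1,
            t.get('width') if t.get('width') is not None else -1,
            t.get('height') if t.get('height') is not None else -1,
            t.get('id') or '',
            t.get('url') or '',
        )

    thumbnails.sort(key=sort_key)
    for i, t in enumerate(thumbnails):
        if 'width' in t and 'height' in t:
            t['resolution'] = '%dx%d' % (t['width'], t['height'])
        # Only fill in an ID when the extractor did not supply one
        if t.get('id') is None:
            t['id'] = '%d' % i
    return thumbnails
```

Because the ID is the position after sorting, it stays stable for a given set of thumbnails, and an ID supplied by the extractor is never overwritten.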

View File

@ -285,6 +285,7 @@ from .ndr import NDRIE
from .ndtv import NDTVIE from .ndtv import NDTVIE
from .netzkino import NetzkinoIE from .netzkino import NetzkinoIE
from .nerdcubed import NerdCubedFeedIE from .nerdcubed import NerdCubedFeedIE
from .nerdist import NerdistIE
from .newgrounds import NewgroundsIE from .newgrounds import NewgroundsIE
from .newstube import NewstubeIE from .newstube import NewstubeIE
from .nextmedia import ( from .nextmedia import (
@ -553,6 +554,7 @@ from .wimp import WimpIE
from .wistia import WistiaIE from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE from .worldstarhiphop import WorldStarHipHopIE
from .wrzuta import WrzutaIE from .wrzuta import WrzutaIE
from .wsj import WSJIE
from .xbef import XBefIE from .xbef import XBefIE
from .xboxclips import XboxClipsIE from .xboxclips import XboxClipsIE
from .xhamster import XHamsterIE from .xhamster import XHamsterIE

View File

@ -1,8 +1,6 @@
# encoding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
@ -21,9 +19,7 @@ class AftonbladetIE(InfoExtractor):
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.search(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
# find internal video meta data # find internal video meta data

View File

@ -108,7 +108,7 @@ class BrightcoveIE(InfoExtractor):
""" """
# Fix up some stupid HTML, see https://github.com/rg3/youtube-dl/issues/1553 # Fix up some stupid HTML, see https://github.com/rg3/youtube-dl/issues/1553
object_str = re.sub(r'(<param name="[^"]+" value="[^"]+")>', object_str = re.sub(r'(<param(?:\s+[a-zA-Z0-9_]+="[^"]*")*)>',
lambda m: m.group(1) + '/>', object_str) lambda m: m.group(1) + '/>', object_str)
# Fix up some stupid XML, see https://github.com/rg3/youtube-dl/issues/1608 # Fix up some stupid XML, see https://github.com/rg3/youtube-dl/issues/1608
object_str = object_str.replace('<--', '<!--') object_str = object_str.replace('<--', '<!--')
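
To see what the more generic `<param>` pattern changes, both regular expressions from the hunk above can be run against a small fragment. This is a sketch: the sample markup and the `close_params` helper are made up for illustration, and only the two patterns come from the diff.

```python
import re
import xml.etree.ElementTree as ET

OLD_RE = r'(<param name="[^"]+" value="[^"]+")>'
NEW_RE = r'(<param(?:\s+[a-zA-Z0-9_]+="[^"]*")*)>'


def close_params(object_str, pattern=NEW_RE):
    # Turn unclosed <param ...> tags into self-closing <param .../> tags
    return re.sub(pattern, lambda m: m.group(1) + '/>', object_str)


# Hypothetical input: an empty value and attributes in reverse order
SAMPLE = '<object><param name="a" value=""><param value="1" name="b"></object>'
```

The old pattern requires `name` before `value` and a non-empty value, so it leaves this sample untouched; the new one accepts any attribute order and empty values, and the result parses as XML.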

View File

@ -145,6 +145,7 @@ class InfoExtractor(object):
thumbnail: Full URL to a video thumbnail image. thumbnail: Full URL to a video thumbnail image.
description: Full video description. description: Full video description.
uploader: Full name of the video uploader. uploader: Full name of the video uploader.
creator: The main artist who created the video.
timestamp: UNIX timestamp of the moment the video became available. timestamp: UNIX timestamp of the moment the video became available.
upload_date: Video upload date (YYYYMMDD). upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp. If not explicitly set, calculated from timestamp.
@ -704,11 +705,11 @@ class InfoExtractor(object):
preference, preference,
f.get('language_preference') if f.get('language_preference') is not None else -1, f.get('language_preference') if f.get('language_preference') is not None else -1,
f.get('quality') if f.get('quality') is not None else -1, f.get('quality') if f.get('quality') is not None else -1,
f.get('height') if f.get('height') is not None else -1,
f.get('width') if f.get('width') is not None else -1,
ext_preference,
f.get('tbr') if f.get('tbr') is not None else -1, f.get('tbr') if f.get('tbr') is not None else -1,
f.get('vbr') if f.get('vbr') is not None else -1, f.get('vbr') if f.get('vbr') is not None else -1,
ext_preference,
f.get('height') if f.get('height') is not None else -1,
f.get('width') if f.get('width') is not None else -1,
f.get('abr') if f.get('abr') is not None else -1, f.get('abr') if f.get('abr') is not None else -1,
audio_ext_preference, audio_ext_preference,
f.get('fps') if f.get('fps') is not None else -1, f.get('fps') if f.get('fps') is not None else -1,
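
The effect of moving `tbr` and `vbr` ahead of `height` and `width` can be reproduced with a reduced sort key. This is a simplified sketch: it keeps only four of the fields, uses made-up format entries, and assumes (as elsewhere in youtube-dl) that formats are sorted from worst to best so the last entry is preferred.

```python
def _or_minus_one(value):
    return value if value is not None else -1


def old_key(f):
    # Before the change: resolution outranks bitrate
    return (
        _or_minus_one(f.get('height')), _or_minus_one(f.get('width')),
        _or_minus_one(f.get('tbr')), _or_minus_one(f.get('vbr')),
    )


def new_key(f):
    # After the change: bitrate outranks resolution
    return (
        _or_minus_one(f.get('tbr')), _or_minus_one(f.get('vbr')),
        _or_minus_one(f.get('height')), _or_minus_one(f.get('width')),
    )


# Hypothetical formats mirroring the commit message example
FORMATS = [
    {'format_id': '1080p-low', 'height': 1080, 'width': 1920, 'tbr': 500},
    {'format_id': '720p-high', 'height': 720, 'width': 1280, 'tbr': 1000},
]

best_old = sorted(FORMATS, key=old_key)[-1]['format_id']
best_new = sorted(FORMATS, key=new_key)[-1]['format_id']
```

With the old ordering the 1080p entry wins on height alone; with the new ordering the 720p entry wins because its total bitrate is higher, which matches the rationale given in commit 1a6373ef39.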
@ -860,10 +861,13 @@ class InfoExtractor(object):
return formats return formats
# TODO: improve extraction # TODO: improve extraction
def _extract_smil_formats(self, smil_url, video_id): def _extract_smil_formats(self, smil_url, video_id, fatal=True):
smil = self._download_xml( smil = self._download_xml(
smil_url, video_id, 'Downloading SMIL file', smil_url, video_id, 'Downloading SMIL file',
'Unable to download SMIL file') 'Unable to download SMIL file', fatal=fatal)
if smil is False:
assert not fatal
return []
base = smil.find('./head/meta').get('base') base = smil.find('./head/meta').get('base')
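
The `fatal=False` path added to `_extract_smil_formats` follows a common pattern: the download helper returns `False` instead of raising, and the caller returns an empty list. A minimal sketch of that pattern, assuming a hypothetical `parse_xml` stand-in that parses a string rather than downloading anything:

```python
import xml.etree.ElementTree as ET


def parse_xml(text, fatal=True):
    """Hypothetical stand-in for a download helper.

    Returns the parsed document, or False on failure when fatal is False.
    """
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        if fatal:
            raise
        return False


def extract_formats(text, fatal=True):
    doc = parse_xml(text, fatal=fatal)
    # Compare with `is False`: an Element without children is itself falsy
    if doc is False:
        return []
    return [{'url': video.get('src')} for video in doc.iter('video')]
```

The explicit `is False` comparison matters because a successfully parsed but childless element would also fail a plain truthiness test.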

View File

@ -1,77 +1,69 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_parse_qs,
compat_urlparse, compat_urlparse,
) )
from ..utils import (
determine_ext,
int_or_none,
)
class FranceCultureIE(InfoExtractor): class FranceCultureIE(InfoExtractor):
_VALID_URL = r'(?P<baseurl>http://(?:www\.)?franceculture\.fr/)player/reecouter\?play=(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?franceculture\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
_TEST = { _TEST = {
'url': 'http://www.franceculture.fr/player/reecouter?play=4795174', 'url': 'http://www.franceculture.fr/player/reecouter?play=4795174',
'info_dict': { 'info_dict': {
'id': '4795174', 'id': '4795174',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Rendez-vous au pays des geeks', 'title': 'Rendez-vous au pays des geeks',
'alt_title': 'Carnet nomade | 13-14',
'vcodec': 'none', 'vcodec': 'none',
'uploader': 'Colette Fellous',
'upload_date': '20140301', 'upload_date': '20140301',
'duration': 3601,
'thumbnail': r're:^http://www\.franceculture\.fr/.*/images/player/Carnet-nomade\.jpg$', 'thumbnail': r're:^http://www\.franceculture\.fr/.*/images/player/Carnet-nomade\.jpg$',
'description': 'Avec :Jean-Baptiste Péretié pour son documentaire sur Arte "La revanche des « geeks », une enquête menée aux Etats-Unis dans la S ...', 'description': 'startswith:Avec :Jean-Baptiste Péretié pour son documentaire sur Arte "La revanche des « geeks », une enquête menée aux Etats',
'timestamp': 1393700400,
} }
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
baseurl = mobj.group('baseurl')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
params_code = self._search_regex(
r"<param name='movie' value='/sites/all/modules/rf/rf_player/swf/loader.swf\?([^']+)' />", video_path = self._search_regex(
webpage, 'parameter code') r'<a id="player".*?href="([^"]+)"', webpage, 'video path')
params = compat_parse_qs(params_code) video_url = compat_urlparse.urljoin(url, video_path)
video_url = compat_urlparse.urljoin(baseurl, params['urlAOD'][0]) timestamp = int_or_none(self._search_regex(
r'<a id="player".*?data-date="([0-9]+)"',
webpage, 'upload date', fatal=False))
thumbnail = self._search_regex(
r'<a id="player".*?>\s+<img src="([^"]+)"',
webpage, 'thumbnail', fatal=False)
title = self._html_search_regex( title = self._html_search_regex(
r'<h1 class="title[^"]+">(.+?)</h1>', webpage, 'title') r'<span class="title-diffusion">(.*?)</span>', webpage, 'title')
alt_title = self._html_search_regex(
r'<span class="title">(.*?)</span>',
webpage, 'alt_title', fatal=False)
description = self._html_search_regex(
r'<span class="description">(.*?)</span>',
webpage, 'description', fatal=False)
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'(?s)<div id="emission".*?<span class="author">(.*?)</span>', r'(?s)<div id="emission".*?<span class="author">(.*?)</span>',
webpage, 'uploader', fatal=False) webpage, 'uploader', default=None)
thumbnail_part = self._html_search_regex( vcodec = 'none' if determine_ext(video_url.lower()) == 'mp3' else None
r'(?s)<div id="emission".*?<img src="([^"]+)"', webpage,
'thumbnail', fatal=False)
if thumbnail_part is None:
thumbnail = None
else:
thumbnail = compat_urlparse.urljoin(baseurl, thumbnail_part)
description = self._html_search_regex(
r'(?s)<p class="desc">(.*?)</p>', webpage, 'description')
info = json.loads(params['infoData'][0])[0]
duration = info.get('media_length')
upload_date_candidate = info.get('media_section5')
upload_date = (
upload_date_candidate
if (upload_date_candidate is not None and
re.match(r'[0-9]{8}$', upload_date_candidate))
else None)
return { return {
'id': video_id, 'id': video_id,
'url': video_url, 'url': video_url,
'vcodec': 'none' if video_url.lower().endswith('.mp3') else None, 'vcodec': vcodec,
'duration': duration,
'uploader': uploader, 'uploader': uploader,
'upload_date': upload_date, 'timestamp': timestamp,
'title': title, 'title': title,
'alt_title': alt_title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'description': description, 'description': description,
} }
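
The rewritten extractor resolves the player link against the page URL and marks the format as audio-only when the file is an MP3. A small sketch of those two steps using only the standard library; `guess_ext` and `vcodec_for` are hypothetical stand-ins (the diff uses `determine_ext`), and the file path in the usage note is invented.

```python
import os
from urllib.parse import urljoin, urlparse


def guess_ext(url):
    # Hypothetical replacement for determine_ext: extension of the URL path
    return os.path.splitext(urlparse(url).path)[1].lstrip('.')


def vcodec_for(video_url):
    # 'none' tells downstream code that the format has no video stream
    return 'none' if guess_ext(video_url.lower()) == 'mp3' else None


def resolve_video_url(page_url, video_path):
    return urljoin(page_url, video_path)
```

For example, `resolve_video_url('http://www.franceculture.fr/player/reecouter?play=4795174', '/sites/example/file.MP3')` yields an absolute URL on the same host, so a separate `baseurl` group in `_VALID_URL` is no longer needed.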

View File

@ -0,0 +1,80 @@
# encoding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_iso8601,
xpath_text,
)
class NerdistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nerdist\.com/vepisode/(?P<id>[^/?#]+)'
_TEST = {
'url': 'http://www.nerdist.com/vepisode/exclusive-which-dc-characters-w',
'md5': '3698ed582931b90d9e81e02e26e89f23',
'info_dict': {
'display_id': 'exclusive-which-dc-characters-w',
'id': 'RPHpvJyr',
'ext': 'mp4',
'title': 'Your TEEN TITANS Revealed! Who\'s on the show?',
'thumbnail': r're:^https?://.*/thumbs/.*\.jpg$',
'description': 'Exclusive: Find out which DC Comics superheroes will star in TEEN TITANS Live-Action TV Show on Nerdist News with Jessica Chobot!',
'uploader': 'Eric Diaz',
'upload_date': '20150202',
'timestamp': 1422892808,
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'''(?x)<script\s+(?:type="text/javascript"\s+)?
src="https?://content\.nerdist\.com/players/([a-zA-Z0-9_]+)-''',
webpage, 'video ID')
timestamp = parse_iso8601(self._html_search_meta(
'shareaholic:article_published_time', webpage, 'upload date'))
uploader = self._html_search_meta(
'shareaholic:article_author_name', webpage, 'article author')
doc = self._download_xml(
'http://content.nerdist.com/jw6/%s.xml' % video_id, video_id)
video_info = doc.find('.//item')
title = xpath_text(video_info, './title', fatal=True)
description = xpath_text(video_info, './description')
thumbnail = xpath_text(
video_info, './{http://rss.jwpcdn.com/}image', 'thumbnail')
formats = []
for source in video_info.findall('./{http://rss.jwpcdn.com/}source'):
vurl = source.attrib['file']
ext = determine_ext(vurl)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
vurl, video_id, entry_protocol='m3u8_native', ext='mp4',
preference=0))
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
vurl, video_id, fatal=False
))
else:
formats.append({
'format_id': ext,
'url': vurl,
})
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'formats': formats,
'uploader': uploader,
}

View File

@ -46,7 +46,18 @@ class NFLIE(InfoExtractor):
'timestamp': 1388354455, 'timestamp': 1388354455,
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
} }
} },
{
'url': 'http://www.nfl.com/news/story/0ap3000000467586/article/patriots-seahawks-involved-in-lategame-skirmish',
'info_dict': {
'id': '0ap3000000467607',
'ext': 'mp4',
'title': 'Frustrations flare on the field',
'description': 'Emotions ran high at the end of the Super Bowl on both sides of the ball after a dramatic finish.',
'timestamp': 1422850320,
'upload_date': '20150202',
},
},
] ]
@staticmethod @staticmethod
@@ -80,7 +91,11 @@ class NFLIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
         config_url = NFLIE.prepend_host(host, self._search_regex(
-            r'(?:config|configURL)\s*:\s*"([^"]+)"', webpage, 'config URL'))
+            r'(?:config|configURL)\s*:\s*"([^"]+)"', webpage, 'config URL',
+            default='static/content/static/config/video/config.json'))
+        # For articles, the id in the url is not the video id
+        video_id = self._search_regex(
+            r'contentId\s*:\s*"([^"]+)"', webpage, 'video id', default=video_id)
         config = self._download_json(config_url, video_id,
                                      note='Downloading player config')
         url_template = NFLIE.prepend_host(
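
The hunk above passes `default=` to both regex searches, so a missing match falls back to a known value instead of raising. The sketch below shows the same fallback with plain `re`. `search_with_default` and the page snippets are hypothetical stand-ins for this note; the real helper takes more arguments.

```python
import re


def search_with_default(pattern, text, default):
    """Return the first capture group of pattern in text, or default if there is no match."""
    match = re.search(pattern, text)
    return match.group(1) if match else default


# Hypothetical page snippets, for illustration only.
article_page = 'contentId: "0ap3000000467607", other: 1'
video_page = 'config: "/static/config.json"'

url_id = '0ap3000000467586'  # id taken from the article URL
pattern = r'contentId\s*:\s*"([^"]+)"'

# Article pages embed a different id than the one in their URL.
article_id = search_with_default(pattern, article_page, url_id)
# Pages without a contentId keep the id from the URL.
plain_id = search_with_default(pattern, video_page, url_id)
```

With this pattern the URL id acts only as a fallback, which matches the comment in the hunk: for articles, the id in the URL is not the video id.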


@@ -1,8 +1,6 @@
 # encoding: utf-8
 from __future__ import unicode_literals
-import re
 from .common import InfoExtractor
 from ..utils import (
@@ -11,7 +9,7 @@ from ..utils import (
 class NormalbootsIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?normalboots\.com/video/(?P<videoid>[0-9a-z-]*)/?$'
+    _VALID_URL = r'http://(?:www\.)?normalboots\.com/video/(?P<id>[0-9a-z-]*)/?$'
     _TEST = {
         'url': 'http://normalboots.com/video/home-alone-games-jontron/',
         'md5': '8bf6de238915dd501105b44ef5f1e0f6',
@@ -30,19 +28,22 @@ class NormalbootsIE(InfoExtractor):
     }
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('videoid')
+        video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
-        video_uploader = self._html_search_regex(r'Posted\sby\s<a\shref="[A-Za-z0-9/]*">(?P<uploader>[A-Za-z]*)\s</a>',
-                                                 webpage, 'uploader')
-        raw_upload_date = self._html_search_regex('<span style="text-transform:uppercase; font-size:inherit;">[A-Za-z]+, (?P<date>.*)</span>',
-                                                  webpage, 'date')
-        video_upload_date = unified_strdate(raw_upload_date)
-        player_url = self._html_search_regex(r'<iframe\swidth="[0-9]+"\sheight="[0-9]+"\ssrc="(?P<url>[\S]+)"', webpage, 'url')
+        video_uploader = self._html_search_regex(
+            r'Posted\sby\s<a\shref="[A-Za-z0-9/]*">(?P<uploader>[A-Za-z]*)\s</a>',
+            webpage, 'uploader', fatal=False)
+        video_upload_date = unified_strdate(self._html_search_regex(
+            r'<span style="text-transform:uppercase; font-size:inherit;">[A-Za-z]+, (?P<date>.*)</span>',
+            webpage, 'date', fatal=False))
+        player_url = self._html_search_regex(
+            r'<iframe\swidth="[0-9]+"\sheight="[0-9]+"\ssrc="(?P<url>[\S]+)"',
+            webpage, 'player url')
         player_page = self._download_webpage(player_url, video_id)
-        video_url = self._html_search_regex(r"file:\s'(?P<file>[^']+\.mp4)'", player_page, 'file')
+        video_url = self._html_search_regex(
+            r"file:\s'(?P<file>[^']+\.mp4)'", player_page, 'file')
         return {
             'id': video_id,
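
The modernization above replaces the manual `re.match` / `mobj.group('videoid')` pair with `self._match_id(url)`, which is why the named group in `_VALID_URL` is renamed to `id`. A rough sketch of that behaviour, assuming the helper simply matches the URL and returns the `id` group, is given below. `match_id` is a hypothetical stand-in, not the library code.

```python
import re

VALID_URL = r'http://(?:www\.)?normalboots\.com/video/(?P<id>[0-9a-z-]*)/?$'


def match_id(valid_url, url):
    """Hypothetical stand-in: match the URL and return its named group 'id'."""
    mobj = re.match(valid_url, url)
    if mobj is None:
        raise ValueError('URL does not match pattern: %s' % url)
    return mobj.group('id')


video_id = match_id(
    VALID_URL, 'http://normalboots.com/video/home-alone-games-jontron/')
```

Because the group name is fixed, the extractor no longer needs to import `re`, which matches the first hunk of this file.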


@@ -0,0 +1,89 @@
# encoding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
    int_or_none,
    unified_strdate,
)
class WSJIE(InfoExtractor):
    _VALID_URL = r'https?://video-api\.wsj\.com/api-video/player/iframe\.html\?guid=(?P<id>[a-zA-Z0-9-]+)'
    IE_DESC = 'Wall Street Journal'
    _TEST = {
        'url': 'http://video-api.wsj.com/api-video/player/iframe.html?guid=1BD01A4C-BFE8-40A5-A42F-8A8AF9898B1A',
        'md5': '9747d7a6ebc2f4df64b981e1dde9efa9',
        'info_dict': {
            'id': '1BD01A4C-BFE8-40A5-A42F-8A8AF9898B1A',
            'ext': 'mp4',
            'upload_date': '20150202',
            'uploader_id': 'bbright',
            'creator': 'bbright',
            'categories': list,  # a long list
            'duration': 90,
            'title': 'Bills Coach Rex Ryan Updates His Old Jets Tattoo',
        },
    }
    def _real_extract(self, url):
        video_id = self._match_id(url)
        bitrates = [128, 174, 264, 320, 464, 664, 1264]
        api_url = (
            'http://video-api.wsj.com/api-video/find_all_videos.asp?'
            'type=guid&count=1&query=%s&'
            'fields=hls,adZone,thumbnailList,guid,state,secondsUntilStartTime,'
            'author,description,name,linkURL,videoStillURL,duration,videoURL,'
            'adCategory,catastrophic,linkShortURL,doctypeID,youtubeID,'
            'titletag,rssURL,wsj-section,wsj-subsection,allthingsd-section,'
            'allthingsd-subsection,sm-section,sm-subsection,provider,'
            'formattedCreationDate,keywords,keywordsOmniture,column,editor,'
            'emailURL,emailPartnerID,showName,omnitureProgramName,'
            'omnitureVideoFormat,linkRelativeURL,touchCastID,'
            'omniturePublishDate,%s') % (
                video_id, ','.join('video%dkMP4Url' % br for br in bitrates))
        info = self._download_json(api_url, video_id)['items'][0]
        # Thumbnails are conveniently in the correct format already
        thumbnails = info.get('thumbnailList')
        creator = info.get('author')
        uploader_id = info.get('editor')
        categories = info.get('keywords')
        duration = int_or_none(info.get('duration'))
        upload_date = unified_strdate(
            info.get('formattedCreationDate'), day_first=False)
        title = info.get('name', info.get('titletag'))
        formats = [{
            'format_id': 'f4m',
            'format_note': 'f4m (meta URL)',
            'url': info['videoURL'],
        }]
        if info.get('hls'):
            formats.extend(self._extract_m3u8_formats(
                info['hls'], video_id, ext='mp4',
                preference=0, entry_protocol='m3u8_native'))
        for br in bitrates:
            field = 'video%dkMP4Url' % br
            if info.get(field):
                formats.append({
                    'format_id': 'mp4-%d' % br,
                    'container': 'mp4',
                    'tbr': br,
                    'url': info[field],
                })
        self._sort_formats(formats)
        return {
            'id': video_id,
            'formats': formats,
            'thumbnails': thumbnails,
            'creator': creator,
            'uploader_id': uploader_id,
            'duration': duration,
            'upload_date': upload_date,
            'title': title,
            'categories': categories,
        }
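
The new extractor derives both the requested API fields and the resulting MP4 formats from a single `bitrates` list. The sketch below isolates that step. `build_mp4_formats` and the sample response are hypothetical, and the URLs are placeholders rather than real API output.

```python
bitrates = [128, 174, 264, 320, 464, 664, 1264]

# Field names requested from the API, one per bitrate.
field_names = ['video%dkMP4Url' % br for br in bitrates]


def build_mp4_formats(info, bitrates):
    """Build one format entry per bitrate field that is present and non-empty."""
    formats = []
    for br in bitrates:
        url = info.get('video%dkMP4Url' % br)
        if url:
            formats.append({
                'format_id': 'mp4-%d' % br,
                'container': 'mp4',
                'tbr': br,
                'url': url,
            })
    return formats


# Hypothetical API item: two bitrates available, one empty, the rest missing.
sample_info = {
    'video128kMP4Url': '',
    'video264kMP4Url': 'http://example.invalid/264.mp4',
    'video1264kMP4Url': 'http://example.invalid/1264.mp4',
}
formats = build_mp4_formats(sample_info, bitrates)
```

Keeping the bitrate list in one place means a newly offered bitrate only has to be added once to be both requested and parsed.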


@@ -511,8 +511,9 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
             metadata['artist'] = info['uploader_id']
         if info.get('description') is not None:
             metadata['description'] = info['description']
+            metadata['comment'] = info['description']
         if info.get('webpage_url') is not None:
-            metadata['comment'] = info['webpage_url']
+            metadata['purl'] = info['webpage_url']
         if not metadata:
             self._downloader.to_screen('[ffmpeg] There isn\'t any metadata to add')
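
After this change the description is written to both `description` and `comment`, and the page URL moves from `comment` to `purl`. A minimal sketch of just that mapping follows. `build_metadata` is a hypothetical function covering only the fields touched by the hunk, and the sample values are placeholders.

```python
def build_metadata(info):
    """Map info-dict fields to metadata tags, following the new side of the hunk."""
    metadata = {}
    if info.get('description') is not None:
        metadata['description'] = info['description']
        metadata['comment'] = info['description']
    if info.get('webpage_url') is not None:
        metadata['purl'] = info['webpage_url']
    return metadata


# Placeholder values, for illustration only.
full = build_metadata({
    'description': 'A short clip',
    'webpage_url': 'http://example.invalid/watch/1',
})
url_only = build_metadata({'webpage_url': 'http://example.invalid/watch/1'})
empty = build_metadata({})
```

Note that with only a URL available, no `comment` tag is written any more.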


@@ -701,7 +701,7 @@ def unified_strdate(date_str, day_first=True):
     # %z (UTC offset) is only supported in python>=3.2
     date_str = re.sub(r' ?(\+|-)[0-9]{2}:?[0-9]{2}$', '', date_str)
     # Remove AM/PM + timezone
-    date_str = re.sub(r'(?i)\s*(?:AM|PM)\s+[A-Z]+', '', date_str)
+    date_str = re.sub(r'(?i)\s*(?:AM|PM)(?:\s+[A-Z]+)?', '', date_str)
     format_expressions = [
         '%d %B %Y',
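
The regex change makes the timezone name after AM/PM optional, so a bare `PM` suffix is now stripped as well. The comparison below runs both patterns on two sample strings. The date strings are made up for illustration and are not taken from any API response.

```python
import re

OLD_PATTERN = r'(?i)\s*(?:AM|PM)\s+[A-Z]+'        # requires a timezone name
NEW_PATTERN = r'(?i)\s*(?:AM|PM)(?:\s+[A-Z]+)?'   # timezone name is optional

with_tz = '2/2/2015 6:47:23 PM EST'
without_tz = '2/2/2015 6:47:23 PM'

old_with_tz = re.sub(OLD_PATTERN, '', with_tz)
old_without_tz = re.sub(OLD_PATTERN, '', without_tz)   # left unchanged
new_with_tz = re.sub(NEW_PATTERN, '', with_tz)
new_without_tz = re.sub(NEW_PATTERN, '', without_tz)   # suffix now removed
```

Only the second case differs between the two patterns, which is the case the hunk fixes.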


@@ -1,3 +1,3 @@
 from __future__ import unicode_literals
-__version__ = '2015.02.02.3'
+__version__ = '2015.02.03.1'