Set version number

Fix code for metacafe.com (this fixes issue #8)
Fix some minor Unicode-related problems
2010-10-31 11:24:08 +01:00
4 changed files with 213 additions and 453 deletions
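
Note on the Unicode fix: the patch below replaces the inline lambda that looked up every `&...;` sequence in `htmlentitydefs.name2codepoint` with a helper, `htmlentity_transform`, that also accepts numeric entities and leaves unknown ones untouched. The following is a minimal sketch of that idea, not the code from the patch. It assumes Python 3 (so `html.entities`, `chr` and `int` stand in for `htmlentitydefs`, `unichr` and `long`), the `unescape_title` wrapper is a hypothetical name used only for illustration, and the numeric pattern here also accepts the hex digits a-f, which the `x?\d+` pattern in the patch does not.

```python
import re
from html.entities import name2codepoint


def htmlentity_transform(match):
    """Return the character for one matched HTML entity, or the entity unchanged."""
    entity = match.group(1)

    # Named entity such as "amp" or "eacute".
    if entity in name2codepoint:
        return chr(name2codepoint[entity])

    # Numeric entity: "#233" (decimal) or "#xe9" (hexadecimal).
    # Assumption: hex digits a-f are accepted here, unlike in the patch.
    numeric = re.fullmatch(r'#(x[0-9a-fA-F]+|[0-9]+)', entity)
    if numeric is not None:
        digits = numeric.group(1)
        if digits.startswith('x'):
            return chr(int(digits[1:], 16))
        return chr(int(digits, 10))

    # Unknown entity: keep its literal text instead of raising KeyError.
    return '&%s;' % entity


def unescape_title(title):
    """Hypothetical wrapper: replace each &...; sequence in a title string."""
    return re.sub(r'&([^;&\s]+);', htmlentity_transform, title)
```

With the old lambda, a title containing an entity that is not a known name raised `KeyError`; in this sketch such an entity passes through as literal text. Code points outside the valid Unicode range are not handled and would still raise `ValueError`.
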
--- a/.hgignore
+++ b/.hgignore
@@ -1,4 +1,2 @@
 syntax: glob
-index.html
-youtube-dl-*
 .*.swp
--- a/15
+++ b/15
@@ -1,15 +0,0 @@
-#!/usr/bin/env python
-import hashlib
-import subprocess
-
-template = file('index.html.in', 'r').read()
-version = subprocess.Popen(['./youtube-dl', '--version'], stdout=subprocess.PIPE).communicate()[0].strip()
-data = file('youtube-dl', 'rb').read()
-md5sum = hashlib.md5(data).hexdigest()
-sha1sum = hashlib.sha1(data).hexdigest()
-sha256sum = hashlib.sha256(data).hexdigest()
-template = template.replace('@PROGRAM_VERSION@', version)
-template = template.replace('@PROGRAM_MD5SUM@', md5sum)
-template = template.replace('@PROGRAM_SHA1SUM@', sha1sum)
-template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
-file('index.html', 'w').write(template)
--- a/index.html.in
+++ b/index.html.in
@@ -1,229 +0,0 @@
-<!DOCTYPE html 
-     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
-<head>
-	<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
-	<title>youtube-dl: Download videos from YouTube.com</title>
-	<style type="text/css"><!--
-		body {
-			font-family: sans-serif;
-			font-size: small;
-		}
-		h1 {
-			text-align: center;
-			text-decoration: underline;
-			color: #006699;
-		}
-		h2 {
-			color: #006699;
-		}
-		p {
-			text-align: justify;
-			margin-left: 5%;
-			margin-right: 5%;
-		}
-		ul {
-			margin-left: 5%;
-			margin-right: 5%;
-			list-style-type: square;
-		}
-		li {
-			margin-bottom: 0.5ex;
-		}
-		.smallnote {
-			font-size: x-small;
-			text-align: center;
-		}
-		--></style>
-</head>
-<body>
-<h1>youtube-dl: Download videos from YouTube.com</h1>
-
-<p class="smallnote">(and more...)</p>
-
-<h2>What is it?</h2>
-
-<p><em>youtube-dl</em> is a small command-line program to download videos
-from YouTube.com. It requires the <a href="http://www.python.org/">Python
-interpreter</a>, version 2.4 or later, and it's not platform specific.
-It should work in your Unix box, in Windows or in Mac OS X. The latest version
-is <strong>@PROGRAM_VERSION@</strong>. It's released to the public domain,
-which means you can modify it, redistribute it or use it however you like.</p>
-
-<p>I'll try to keep it updated if YouTube.com changes the way you access
-their videos. After all, it's a simple and short program. However, I can't
-guarantee anything. If you detect it stops working, check for new versions
-and/or inform me about the problem, indicating the program version you
-are using. If the program stops working and I can't solve the problem but
-you have a solution, I'd like to know it. If that happens and you feel you
-can maintain the program yourself, tell me. My contact information is
-at <a href="http://freshmeat.net/~rg3/">freshmeat.net</a>.</p>
-
-<p>Thanks for all the feedback received so far. I'm glad people find my
-program useful.</p>
-
-<h2>Usage instructions</h2>
-
-<p>In Windows, once you have installed the Python interpreter, save the
-program with the <em>.py</em> extension and put it somewhere in the PATH.
-Try to follow the
-<a href="http://rg03.wordpress.com/youtube-dl-under-windows-xp/">guide to
-install youtube-dl under Windows XP</a>.</p>
-
-<p>In Unix, download it, give it execution permission and copy it to one
-of the PATH directories (typically, <em>/usr/local/bin</em>).</p>
-
-<p>After that, you should be able to call it from the command line as
-<em>youtube-dl</em> or <em>youtube-dl.py</em>. I will use <em>youtube-dl</em>
-in the following examples. Usage instructions are easy. Use <em>youtube-dl</em>
-followed by a video URL or identifier. Example: <em>youtube-dl
-"http://www.youtube.com/watch?v=foobar"</em>. The video will be saved
-to the file <em>foobar.flv</em> in that example. As YouTube.com
-videos are in Flash Video format, their extension should be <em>flv</em>.
-In Linux and other unices, video players using a recent version of
-<em>ffmpeg</em> can play them. That includes MPlayer, VLC, etc. Those two
-work under Windows and other platforms, but you could also get a
-specific FLV player of your taste.</p>
-
-<p>If you try to run the program and you receive an error message containing the
-keyword <em>SyntaxError</em> near the end, it means your Python interpreter
-is too old.</p>
-
-<h2>More usage tips</h2>
-
-<ul>
-
-<li>You can change the file name of the video using the -o option, like in
-<em>youtube-dl -o vid.flv "http://www.youtube.com/watch?v=foobar"</em>.
-Read the <a href="#otpl">Output template</a> section for more details on
-this.</li>
-
-<li>Some videos require an account to be downloaded, mostly because they're
-flagged as mature content. You can pass the program a username and password
-for a YouTube.com account with the -u and -p options, like <em>youtube-dl
-u myusername -p mypassword "http://www.youtube.com/watch?v=foobar"</em>.</li>
-
-<li>The account data can also be read from the user .netrc file by indicating
-the -n or --netrc option. The machine name is <em>youtube</em> in that
-case.</li>
-
-<li>The <em>simulate mode</em> (activated with -s or --simulate) can be used
-to just get the real video URL and use it with a download manager if you
-prefer that option.</li>
-
-<li>The <em>quiet mode</em> (activated with -q or --quiet) can be used to
-supress all output messages. This allows, in systems featuring /dev/stdout
-and other similar special files, outputting the video data to standard output
-in order to pipe it to another program without interferences.</li>
-
-<li>The program can be told to simply print the final video URL to standard
-output using the -g or --get-url option.</li>
-
-<li>In a similar line, the -e or --get-title option tells the program to print
-the video title.</li>
-
-<li>The default filename is <em>video_id.flv</em>. But you can also use the
-video title in the filename with the -t or --title option, or preserve the
-literal title in the filename with the -l or --literal option.</li>
-
-<li>You can make the program append <em>&amp;fmt=something</em> to the URL
-by using the -f or --format option. This makes it possible to download high
-quality versions of the videos when available.</li>
-
-<li>The -b or --best-quality option is an alias for -f 18.</li>
-
-<li>The -m or --mobile-version option is an alias for -f 17.</li>
-
-<li>Normally, the program will stop on the first error, but you can tell it
-to attempt to download every video with the -i or --ignore-errors option.</li>
-
-<li>The -a or --batch-file option lets you specify a file to read URLs from.
-The file must contain one URL per line.</li>
-
-<li>The program can be told not to overwrite existing files using the -w or
--no-overwrites option.</li>
-
-<li>For YouTube, you can also use the URL of a playlist, and it will download
-all the videos in that playlist.</li>
-
-<li>For YouTube, you can also use the special word <em>ytsearch</em> to
-download search results. With <em>ytsearch</em> it will download the
-first search result. With <em>ytsearchN</em>, where N is a number, it
-will download the first N results. With <em>ytsearchall</em> it will
-download every result for that search. In most systems you'll need to
-use quotes for multiple words. Example: <em>youtube-dl "ytsearch3:cute
-kittens"</em>.
-
-<li><em>youtube-dl</em> honors the <em>http_proxy</em> environment variable
-if you want to use a proxy. Set it to something like
-<em>http://proxy.example.com:8080</em>, and do not leave the <em>http://</em>
-prefix out.</li>
-
-<li>You can get the program version by calling it as <em>youtube-dl
-v</em> or <em>youtube-dl --version</em>.</li>
-
-<li>For usage instructions, use <em>youtube-dl -h</em> or <em>youtube-dl
--help.</em></li>
-
-<li>You can cancel the program at any time pressing Ctrl+C. It may print
-some error lines saying something about <em>KeyboardInterrupt</em>.
-That's ok.</li>
-
-</ul>
-
-<h2>Download it</h2>
-
-<p>Note that if you directly click on these hyperlinks, your web browser will
-most likely display the program contents. It's usually better to
-right-click on it and choose the appropriate option, normally called <em>Save
-Target As</em> or <em>Save Link As</em>, depending on the web browser you
-are using.</p>
-
-<p><a href="youtube-dl">@PROGRAM_VERSION@</a></p>
-<ul>
-        <li><strong>MD5</strong>: @PROGRAM_MD5SUM@</li>
-        <li><strong>SHA1</strong>: @PROGRAM_SHA1SUM@</li>
-        <li><strong>SHA256</strong>: @PROGRAM_SHA256SUM@</li>
-</ul>
-
-<h2 id="otpl">Output template</h2>
-
-<p>The -o option allows users to indicate a template for the output file names.
-The basic usage is not to set any template arguments when downloading a single
-file, like in <em>youtube-dl -o funny_video.flv 'http://some/video'</em>.
-However, it may contain special sequences that will be replaced when
-downloading each video. The special sequences have the format
-<strong>%(NAME)s</strong>. To clarify, that's a percent symbol followed by a
-name in parenthesis, followed by a lowercase S. Allowed names are:</p>
-
-<ul>
-<li><em>id</em>: The sequence will be replaced by the video identifier.</li>
-<li><em>url</em>: The sequence will be replaced by the video URL.</li>
-<li><em>uploader</em>: The sequence will be replaced by the nickname of the
-person who uploaded the video.</li>
-<li><em>title</em>: The sequence will be replaced by the literal video
-title.</li>
-<li><em>stitle</em>: The sequence will be replaced by a simplified video
-title, restricted to alphanumeric characters and dashes.</li>
-<li><em>ext</em>: The sequence will be replaced by the appropriate
-extension (like <em>flv</em> or <em>mp4</em>).</li>
-</ul>
-
-<p>As you may have guessed, the default template is <em>%(id)s.%(ext)s</em>.
-When some command line options are used, it's replaced by other templates like
-<em>%(title)s-%(id)s.%(ext)s</em>. You can specify your own.</p>
-
-<h2>Authors</h2>
-
-<ul>
-<li>Ricardo Garcia Gonzalez: program core, YouTube.com InfoExtractor,
-metacafe.com InfoExtractor and YouTube playlist InfoExtractor.</li>
-<li>Danny Colligan: YouTube search InfoExtractor, ideas and patches.</li>
-<li>Many other people contributing patches, code, ideas and kind messages. Too
-many to be listed here. You know who you are. Thank you very much.</li>
-</ul>
-
-<p class="smallnote">Copyright &copy; 2006-2007 Ricardo Garcia Gonzalez</p>
-</body>
-</html>
--- a/420
+++ b/420
@@ -18,8 +18,8 @@ import time
 import urllib
 import urllib2

-std_headers = {	
-	'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5',
+std_headers = {
+	'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8',
 	'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
 	'Accept': 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
 	'Accept-Language': 'en-us,en;q=0.5',
@@ -65,16 +65,17 @@ class FileDownloader(object):
 	For this, file downloader objects have a method that allows
 	InfoExtractors to be registered in a given order. When it is passed
 	a URL, the file downloader handles it to the first InfoExtractor it
-	finds that reports being able to handle it. The InfoExtractor returns
-	all the information to the FileDownloader and the latter downloads the
-	file or does whatever it's instructed to do.
+	finds that reports being able to handle it. The InfoExtractor extracts
+	all the information about the video or videos the URL refers to, and
+	asks the FileDownloader to process the video information, possibly
+	downloading the video.

 	File downloaders accept a lot of parameters. In order not to saturate
 	the object constructor with arguments, it receives a dictionary of
-	options instead. These options are available through the get_params()
-	method for the InfoExtractors to use. The FileDownloader also registers
-	itself as the downloader in charge for the InfoExtractors that are
-	added to it, so this is a "mutual registration".
+	options instead. These options are available through the params
+	attribute for the InfoExtractors to use. The FileDownloader also
+	registers itself as the downloader in charge for the InfoExtractors
+	that are added to it, so this is a "mutual registration".

 	Available options:

@@ -92,15 +93,17 @@ class FileDownloader(object):
 	nooverwrites:	Prevent overwriting files.
 	"""

-	_params = None
+	params = None
 	_ies = []
 	_pps = []
+	_download_retcode = None

 	def __init__(self, params):
 		"""Create a FileDownloader object with the given options."""
 		self._ies = []
 		self._pps = []
-		self.set_params(params)
+		self._download_retcode = 0
+		self.params = params
 	
 	@staticmethod
 	def pmkdir(filename):
@@ -144,7 +147,7 @@ class FileDownloader(object):
 			return '--:--'
 		return '%02d:%02d' % (eta_mins, eta_secs)

- 	@staticmethod
+	@staticmethod
 	def calc_speed(start, now, bytes):
 		dif = now - start
 		if bytes == 0 or dif < 0.001: # One millisecond
@@ -174,16 +177,6 @@ class FileDownloader(object):
 		multiplier = 1024.0 ** 'bkmgtpezy'.index(matchobj.group(2).lower())
 		return long(round(number * multiplier))

-	def set_params(self, params):
-		"""Sets parameters."""
-		if type(params) != dict:
-			raise ValueError('params: dictionary expected')
-		self._params = params
-	
-	def get_params(self):
-		"""Get parameters."""
-		return self._params
-
 	def add_info_extractor(self, ie):
 		"""Add an InfoExtractor object to the end of the list."""
 		self._ies.append(ie)
@@ -196,8 +189,8 @@ class FileDownloader(object):
 	
 	def to_stdout(self, message, skip_eol=False):
 		"""Print message to stdout if not in quiet mode."""
-		if not self._params.get('quiet', False):
-			print u'%s%s' % (message, [u'\n', u''][skip_eol]),
+		if not self.params.get('quiet', False):
+			print (u'%s%s' % (message, [u'\n', u''][skip_eol])).encode(locale.getpreferredencoding()),
 			sys.stdout.flush()
 	
 	def to_stderr(self, message):
@@ -206,26 +199,24 @@ class FileDownloader(object):
 	
 	def fixed_template(self):
 		"""Checks if the output template is fixed."""
-		return (re.search(ur'(?u)%\(.+?\)s', self._params['outtmpl']) is None)
+		return (re.search(ur'(?u)%\(.+?\)s', self.params['outtmpl']) is None)

 	def trouble(self, message=None):
 		"""Determine action to take when a download problem appears.

 		Depending on if the downloader has been configured to ignore
 		download errors or not, this method may throw an exception or
-		not when errors are found, after printing the message. If it
-		doesn't raise, it returns an error code suitable to be returned
-		later as a program exit code to indicate error.
+		not when errors are found, after printing the message.
 		"""
 		if message is not None:
 			self.to_stderr(message)
-		if not self._params.get('ignoreerrors', False):
+		if not self.params.get('ignoreerrors', False):
 			raise DownloadError(message)
-		return 1
+		self._download_retcode = 1

 	def slow_down(self, start_time, byte_counter):
 		"""Sleep if the download speed is over the rate limit."""
-		rate_limit = self._params.get('ratelimit', None)
+		rate_limit = self.params.get('ratelimit', None)
 		if rate_limit is None or byte_counter == 0:
 			return
 		now = time.time()
@@ -249,77 +240,78 @@ class FileDownloader(object):
 		"""Report download finished."""
 		self.to_stdout(u'')

+	def process_info(self, info_dict):
+		"""Process a single dictionary returned by an InfoExtractor."""
+		# Forced printings
+		if self.params.get('forcetitle', False):
+			print info_dict['title'].encode(locale.getpreferredencoding())
+		if self.params.get('forceurl', False):
+			print info_dict['url'].encode(locale.getpreferredencoding())
+			
+		# Do nothing else if in simulate mode
+		if self.params.get('simulate', False):
+			return
+
+		try:
+			filename = self.params['outtmpl'] % info_dict
+			self.report_destination(filename)
+		except (ValueError, KeyError), err:
+			self.trouble('ERROR: invalid output template or system charset: %s' % str(err))
+		if self.params['nooverwrites'] and os.path.exists(filename):
+			self.to_stderr('WARNING: file exists: %s; skipping' % filename)
+			return
+		try:
+			self.pmkdir(filename)
+		except (OSError, IOError), err:
+			self.trouble('ERROR: unable to create directories: %s' % str(err))
+			return
+		try:
+			outstream = open(filename, 'wb')
+		except (OSError, IOError), err:
+			self.trouble('ERROR: unable to open for writing: %s' % str(err))
+			return
+		try:
+			self._do_download(outstream, info_dict['url'])
+			outstream.close()
+		except (OSError, IOError), err:
+			self.trouble('ERROR: unable to write video data: %s' % str(err))
+			return
+		except (urllib2.URLError, httplib.HTTPException, socket.error), err:
+			self.trouble('ERROR: unable to download video data: %s' % str(err))
+			return
+		try:
+			self.post_process(filename, info_dict)
+		except (PostProcessingError), err:
+			self.trouble('ERROR: postprocessing: %s' % str(err))
+			return
+
+		return
+
 	def download(self, url_list):
 		"""Download a given list of URLs."""
-		retcode = 0
 		if len(url_list) > 1 and self.fixed_template():
-			raise SameFileError(self._params['outtmpl'])
+			raise SameFileError(self.params['outtmpl'])

 		for url in url_list:
 			suitable_found = False
 			for ie in self._ies:
+				# Go to next InfoExtractor if not suitable
 				if not ie.suitable(url):
 					continue
+
 				# Suitable InfoExtractor found
 				suitable_found = True
-				all_results = ie.extract(url)
-				results = [x for x in all_results if x is not None]
-				if len(results) != len(all_results):
-					retcode = self.trouble()

-				if len(results) > 1 and self.fixed_template():
-					raise SameFileError(self._params['outtmpl'])
-
-				for result in results:
-					# Forced printings
-					if self._params.get('forcetitle', False):
-						print result['title']
-					if self._params.get('forceurl', False):
-						print result['url']
-						
-					# Do nothing else if in simulate mode
-					if self._params.get('simulate', False):
-						continue
-
-					try:
-						filename = self._params['outtmpl'] % result
-						self.report_destination(filename)
-					except (ValueError, KeyError), err:
-						retcode = self.trouble('ERROR: invalid output template or system charset: %s' % str(err))
-						continue
-					if self._params['nooverwrites'] and os.path.exists(filename):
-						self.to_stderr('WARNING: file exists: %s; skipping' % filename)
-						continue
-					try:
-						self.pmkdir(filename)
-					except (OSError, IOError), err:
-						retcode = self.trouble('ERROR: unable to create directories: %s' % str(err))
-						continue
-					try:
-						outstream = open(filename, 'wb')
-					except (OSError, IOError), err:
-						retcode = self.trouble('ERROR: unable to open for writing: %s' % str(err))
-						continue
-					try:
-						self._do_download(outstream, result['url'])
-						outstream.close()
-					except (OSError, IOError), err:
-						retcode = self.trouble('ERROR: unable to write video data: %s' % str(err))
-						continue
-					except (urllib2.URLError, httplib.HTTPException, socket.error), err:
-						retcode = self.trouble('ERROR: unable to download video data: %s' % str(err))
-						continue
-					try:
-						self.post_process(filename, result)
-					except (PostProcessingError), err:
-						retcode = self.trouble('ERROR: postprocessing: %s' % str(err))
-						continue
+				# Extract information from URL and process it
+				ie.extract(url)

+				# Suitable InfoExtractor had been found; go to next URL
 				break
-			if not suitable_found:
-				retcode = self.trouble('ERROR: no suitable InfoExtractor: %s' % url)

-		return retcode
+			if not suitable_found:
+				self.trouble('ERROR: no suitable InfoExtractor: %s' % url)
+
+		return self._download_retcode

 	def post_process(self, filename, ie_info):
 		"""Run the postprocessing chain on the given file."""
@@ -369,9 +361,10 @@ class InfoExtractor(object):
 	Information extractors are the classes that, given a URL, extract
 	information from the video (or videos) the URL refers to. This
 	information includes the real video URL, the video title and simplified
-	title, author and others. It is returned in a list of dictionaries when
-	calling its extract() method. It is a list because a URL can refer to
-	more than one video (think of playlists). The dictionaries must include
+	title, author and others. The information is stored in a dictionary
+	which is then passed to the FileDownloader. The FileDownloader
+	processes this information possibly downloading the video to the file
+	system, among other possible outcomes. The dictionaries must include
 	the following fields:

 	id:		Video identifier.
@@ -415,15 +408,6 @@ class InfoExtractor(object):
 		"""Sets the downloader for this IE."""
 		self._downloader = downloader
 	
-	def to_stdout(self, message):
-		"""Print message to stdout if downloader is not in quiet mode."""
-		if self._downloader is None or not self._downloader.get_params().get('quiet', False):
-			print message
-	
-	def to_stderr(self, message):
-		"""Print message to stderr."""
-		print >>sys.stderr, message
-
 	def _real_initialize(self):
 		"""Real initialization process. Redefine in subclasses."""
 		pass
@@ -445,37 +429,60 @@ class YoutubeIE(InfoExtractor):
 	def suitable(url):
 		return (re.match(YoutubeIE._VALID_URL, url) is not None)

+	@staticmethod
+	def htmlentity_transform(matchobj):
+		"""Transforms an HTML entity to a Unicode character."""
+		entity = matchobj.group(1)
+
+		# Known non-numeric HTML entity
+		if entity in htmlentitydefs.name2codepoint:
+			return unichr(htmlentitydefs.name2codepoint[entity])
+
+		# Unicode character
+		mobj = re.match(ur'(?u)#(x?\d+)', entity)
+		if mobj is not None:
+			numstr = mobj.group(1)
+			if numstr.startswith(u'x'):
+				base = 16
+				numstr = u'0%s' % numstr
+			else:
+				base = 10
+			return unichr(long(numstr, base))
+
+		# Unknown entity in name, return its literal representation
+		return (u'&%s;' % entity)
+
 	def report_lang(self):
 		"""Report attempt to set language."""
-		self.to_stdout(u'[youtube] Setting language')
+		self._downloader.to_stdout(u'[youtube] Setting language')

 	def report_login(self):
 		"""Report attempt to log in."""
-		self.to_stdout(u'[youtube] Logging in')
+		self._downloader.to_stdout(u'[youtube] Logging in')
 	
 	def report_age_confirmation(self):
 		"""Report attempt to confirm age."""
-		self.to_stdout(u'[youtube] Confirming age')
+		self._downloader.to_stdout(u'[youtube] Confirming age')
 	
 	def report_webpage_download(self, video_id):
 		"""Report attempt to download webpage."""
-		self.to_stdout(u'[youtube] %s: Downloading video webpage' % video_id)
+		self._downloader.to_stdout(u'[youtube] %s: Downloading video webpage' % video_id)
 	
 	def report_information_extraction(self, video_id):
 		"""Report attempt to extract video information."""
-		self.to_stdout(u'[youtube] %s: Extracting video information' % video_id)
+		self._downloader.to_stdout(u'[youtube] %s: Extracting video information' % video_id)
 	
 	def report_video_url(self, video_id, video_real_url):
 		"""Report extracted video URL."""
-		self.to_stdout(u'[youtube] %s: URL: %s' % (video_id, video_real_url))
-
+		self._downloader.to_stdout(u'[youtube] %s: URL: %s' % (video_id, video_real_url))
+	
 	def _real_initialize(self):
 		if self._downloader is None:
 			return

 		username = None
 		password = None
-		downloader_params = self._downloader.get_params()
+		downloader_params = self._downloader.params

 		# Attempt to use provided username and password or .netrc data
 		if downloader_params.get('username', None) is not None:
@@ -490,20 +497,20 @@ class YoutubeIE(InfoExtractor):
 				else:
 					raise netrc.NetrcParseError('No authenticators for %s' % self._NETRC_MACHINE)
 			except (IOError, netrc.NetrcParseError), err:
-				self.to_stderr(u'WARNING: parsing .netrc: %s' % str(err))
+				self._downloader.to_stderr(u'WARNING: parsing .netrc: %s' % str(err))
 				return

-		# No authentication to be performed
-		if username is None:
-			return
-
 		# Set language
-		request = urllib2.Request(self._LOGIN_URL, None, std_headers)
+		request = urllib2.Request(self._LANG_URL, None, std_headers)
 		try:
 			self.report_lang()
 			urllib2.urlopen(request).read()
 		except (urllib2.URLError, httplib.HTTPException, socket.error), err:
-			self.to_stderr(u'WARNING: unable to set language: %s' % str(err))
+			self._downloader.to_stderr(u'WARNING: unable to set language: %s' % str(err))
+			return
+
+		# No authentication to be performed
+		if username is None:
 			return

 		# Log in
@@ -519,10 +526,10 @@ class YoutubeIE(InfoExtractor):
 			self.report_login()
 			login_results = urllib2.urlopen(request).read()
 			if re.search(r'(?i)<form[^>]* name="loginForm"', login_results) is not None:
-				self.to_stderr(u'WARNING: unable to log in: bad username or password')
+				self._downloader.to_stderr(u'WARNING: unable to log in: bad username or password')
 				return
 		except (urllib2.URLError, httplib.HTTPException, socket.error), err:
-			self.to_stderr(u'WARNING: unable to log in: %s' % str(err))
+			self._downloader.to_stderr(u'WARNING: unable to log in: %s' % str(err))
 			return
 	
 		# Confirm age
@@ -535,25 +542,29 @@ class YoutubeIE(InfoExtractor):
 			self.report_age_confirmation()
 			age_results = urllib2.urlopen(request).read()
 		except (urllib2.URLError, httplib.HTTPException, socket.error), err:
-			self.to_stderr(u'ERROR: unable to confirm age: %s' % str(err))
+			self._downloader.trouble(u'ERROR: unable to confirm age: %s' % str(err))
 			return

 	def _real_extract(self, url):
 		# Extract video id from URL
 		mobj = re.match(self._VALID_URL, url)
 		if mobj is None:
-			self.to_stderr(u'ERROR: invalid URL: %s' % url)
-			return [None]
+			self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
+			return
 		video_id = mobj.group(2)

 		# Downloader parameters
 		format_param = None
 		if self._downloader is not None:
-			params = self._downloader.get_params()
+			params = self._downloader.params
 			format_param = params.get('format', None)

 		# Extension
-		video_extension = {'18': 'mp4', '17': '3gp'}.get(format_param, 'flv')
+		video_extension = {
+			'17': '3gp',
+			'18': 'mp4',
+			'22': 'mp4',
+		}.get(format_param, 'flv')

 		# Normalize URL, including format
 		normalized_url = 'http://www.youtube.com/watch?v=%s&gl=US&hl=en' % video_id
@@ -564,16 +575,16 @@ class YoutubeIE(InfoExtractor):
 			self.report_webpage_download(video_id)
 			video_webpage = urllib2.urlopen(request).read()
 		except (urllib2.URLError, httplib.HTTPException, socket.error), err:
-			self.to_stderr(u'ERROR: unable to download video webpage: %s' % str(err))
-			return [None]
+			self._downloader.trouble(u'ERROR: unable to download video webpage: %s' % str(err))
+			return
 		self.report_information_extraction(video_id)
 		
 		# "t" param
 		mobj = re.search(r', "t": "([^"]+)"', video_webpage)
 		if mobj is None:
-			self.to_stderr(u'ERROR: unable to extract "t" parameter')
-			return [None]
-		video_real_url = 'http://www.youtube.com/get_video?video_id=%s&t=%s' % (video_id, mobj.group(1))
+			self._downloader.trouble(u'ERROR: unable to extract "t" parameter')
+			return
+		video_real_url = 'http://www.youtube.com/get_video?video_id=%s&t=%s&el=detailpage&ps=' % (video_id, mobj.group(1))
 		if format_param is not None:
 			video_real_url = '%s&fmt=%s' % (video_real_url, format_param)
 		self.report_video_url(video_id, video_real_url)
@@ -581,38 +592,39 @@ class YoutubeIE(InfoExtractor):
 		# uploader
 		mobj = re.search(r"var watchUsername = '([^']+)';", video_webpage)
 		if mobj is None:
-			self.to_stderr(u'ERROR: unable to extract uploader nickname')
-			return [None]
+			self._downloader.trouble(u'ERROR: unable to extract uploader nickname')
+			return
 		video_uploader = mobj.group(1)

 		# title
 		mobj = re.search(r'(?im)<title>YouTube - ([^<]*)</title>', video_webpage)
 		if mobj is None:
-			self.to_stderr(u'ERROR: unable to extract video title')
-			return [None]
+			self._downloader.trouble(u'ERROR: unable to extract video title')
+			return
 		video_title = mobj.group(1).decode('utf-8')
-		video_title = re.sub(ur'(?u)&(.+?);', lambda x: unichr(htmlentitydefs.name2codepoint[x.group(1)]), video_title)
+		video_title = re.sub(ur'(?u)&(.+?);', self.htmlentity_transform, video_title)
 		video_title = video_title.replace(os.sep, u'%')

 		# simplified title
 		simple_title = re.sub(ur'(?u)([^%s]+)' % simple_title_chars, ur'_', video_title)
 		simple_title = simple_title.strip(ur'_')

-		# Return information
-		return [{
+		# Process video information
+		self._downloader.process_info({
 			'id':		video_id.decode('utf-8'),
 			'url':		video_real_url.decode('utf-8'),
 			'uploader':	video_uploader.decode('utf-8'),
 			'title':	video_title,
 			'stitle':	simple_title,
 			'ext':		video_extension.decode('utf-8'),
-			}]
+			})

 class MetacafeIE(InfoExtractor):
 	"""Information Extractor for metacafe.com."""

 	_VALID_URL = r'(?:http://)?(?:www\.)?metacafe\.com/watch/([^/]+)/([^/]+)/.*'
 	_DISCLAIMER = 'http://www.metacafe.com/family_filter/'
+	_FILTER_POST = 'http://www.metacafe.com/f/index.php?inputType=filter&controllerGroup=user'
 	_youtube_ie = None

 	def __init__(self, youtube_ie, downloader=None):
@@ -625,19 +637,19 @@ class MetacafeIE(InfoExtractor):

 	def report_disclaimer(self):
 		"""Report disclaimer retrieval."""
-		self.to_stdout(u'[metacafe] Retrieving disclaimer')
+		self._downloader.to_stdout(u'[metacafe] Retrieving disclaimer')

 	def report_age_confirmation(self):
 		"""Report attempt to confirm age."""
-		self.to_stdout(u'[metacafe] Confirming age')
+		self._downloader.to_stdout(u'[metacafe] Confirming age')
 	
 	def report_download_webpage(self, video_id):
 		"""Report webpage download."""
-		self.to_stdout(u'[metacafe] %s: Downloading webpage' % video_id)
+		self._downloader.to_stdout(u'[metacafe] %s: Downloading webpage' % video_id)
 	
 	def report_extraction(self, video_id):
 		"""Report information extraction."""
-		self.to_stdout(u'[metacafe] %s: Extracting information' % video_id)
+		self._downloader.to_stdout(u'[metacafe] %s: Extracting information' % video_id)

 	def _real_initialize(self):
 		# Retrieve disclaimer
@@ -646,7 +658,7 @@ class MetacafeIE(InfoExtractor):
 			self.report_disclaimer()
 			disclaimer = urllib2.urlopen(request).read()
 		except (urllib2.URLError, httplib.HTTPException, socket.error), err:
-			self.to_stderr(u'ERROR: unable to retrieve disclaimer: %s' % str(err))
+			self._downloader.trouble(u'ERROR: unable to retrieve disclaimer: %s' % str(err))
 			return

 		# Confirm age
@@ -654,27 +666,28 @@ class MetacafeIE(InfoExtractor):
 			'filters': '0',
 			'submit': "Continue - I'm over 18",
 			}
-		request = urllib2.Request('http://www.metacafe.com/', urllib.urlencode(disclaimer_form), std_headers)
+		request = urllib2.Request(self._FILTER_POST, urllib.urlencode(disclaimer_form), std_headers)
 		try:
 			self.report_age_confirmation()
 			disclaimer = urllib2.urlopen(request).read()
 		except (urllib2.URLError, httplib.HTTPException, socket.error), err:
-			self.to_stderr(u'ERROR: unable to confirm age: %s' % str(err))
+			self._downloader.trouble(u'ERROR: unable to confirm age: %s' % str(err))
 			return
 	
 	def _real_extract(self, url):
 		# Extract id and simplified title from URL
 		mobj = re.match(self._VALID_URL, url)
 		if mobj is None:
-			self.to_stderr(u'ERROR: invalid URL: %s' % url)
-			return [None]
+			self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
+			return

 		video_id = mobj.group(1)

 		# Check if video comes from YouTube
 		mobj2 = re.match(r'^yt-(.*)$', video_id)
 		if mobj2 is not None:
-			return self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % mobj2.group(1))
+			self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % mobj2.group(1))
+			return

 		simple_title = mobj.group(2).decode('utf-8')
 		video_extension = 'flv'
@@ -685,46 +698,46 @@ class MetacafeIE(InfoExtractor):
 			self.report_download_webpage(video_id)
 			webpage = urllib2.urlopen(request).read()
 		except (urllib2.URLError, httplib.HTTPException, socket.error), err:
-			self.to_stderr(u'ERROR: unable retrieve video webpage: %s' % str(err))
-			return [None]
+			self._downloader.trouble(u'ERROR: unable retrieve video webpage: %s' % str(err))
+			return

 		# Extract URL, uploader and title from webpage
 		self.report_extraction(video_id)
-		mobj = re.search(r'(?m)"mediaURL":"(http.*?\.flv)"', webpage)
+		mobj = re.search(r'(?m)&mediaURL=(http.*?\.flv)', webpage)
 		if mobj is None:
-			self.to_stderr(u'ERROR: unable to extract media URL')
-			return [None]
-		mediaURL = mobj.group(1).replace('\\', '')
+			self._downloader.trouble(u'ERROR: unable to extract media URL')
+			return
+		mediaURL = urllib.unquote(mobj.group(1))

-		mobj = re.search(r'(?m)"gdaKey":"(.*?)"', webpage)
+		mobj = re.search(r'(?m)&gdaKey=(.*?)&', webpage)
 		if mobj is None:
-			self.to_stderr(u'ERROR: unable to extract gdaKey')
-			return [None]
+			self._downloader.trouble(u'ERROR: unable to extract gdaKey')
+			return
 		gdaKey = mobj.group(1)

 		video_url = '%s?__gda__=%s' % (mediaURL, gdaKey)

 		mobj = re.search(r'(?im)<title>(.*) - Video</title>', webpage)
 		if mobj is None:
-			self.to_stderr(u'ERROR: unable to extract title')
-			return [None]
+			self._downloader.trouble(u'ERROR: unable to extract title')
+			return
 		video_title = mobj.group(1).decode('utf-8')

-		mobj = re.search(r'(?m)<li id="ChnlUsr">.*?Submitter:<br />(.*?)</li>', webpage)
+		mobj = re.search(r'(?ms)<li id="ChnlUsr">.*?Submitter:.*?<a .*?>(.*?)<', webpage)
 		if mobj is None:
-			self.to_stderr(u'ERROR: unable to extract uploader nickname')
-			return [None]
-		video_uploader = re.sub(r'<.*?>', '', mobj.group(1))
+			self._downloader.trouble(u'ERROR: unable to extract uploader nickname')
+			return
+		video_uploader = mobj.group(1)

-		# Return information
-		return [{
+		# Process video information
+		self._downloader.process_info({
 			'id':		video_id.decode('utf-8'),
 			'url':		video_url.decode('utf-8'),
 			'uploader':	video_uploader.decode('utf-8'),
 			'title':	video_title,
 			'stitle':	simple_title,
 			'ext':		video_extension.decode('utf-8'),
-			}]
+			})


 class YoutubeSearchIE(InfoExtractor):
@@ -734,8 +747,9 @@ class YoutubeSearchIE(InfoExtractor):
 	_VIDEO_INDICATOR = r'href="/watch\?v=.+?"'
 	_MORE_PAGES_INDICATOR = r'>Next</a>'
 	_youtube_ie = None
+	_max_youtube_results = 1000

-	def __init__(self, youtube_ie, downloader=None): 
+	def __init__(self, youtube_ie, downloader=None):
 		InfoExtractor.__init__(self, downloader)
 		self._youtube_ie = youtube_ie
 	
@@ -745,7 +759,7 @@ class YoutubeSearchIE(InfoExtractor):

 	def report_download_page(self, query, pagenum):
 		"""Report attempt to download playlist page with given number."""
-		self.to_stdout(u'[youtube] query "%s": Downloading page %s' % (query, pagenum))
+		self._downloader.to_stdout(u'[youtube] query "%s": Downloading page %s' % (query, pagenum))

 	def _real_initialize(self):
 		self._youtube_ie.initialize()
@@ -753,24 +767,31 @@ class YoutubeSearchIE(InfoExtractor):
 	def _real_extract(self, query):
 		mobj = re.match(self._VALID_QUERY, query)
 		if mobj is None:
-			self.to_stderr(u'ERROR: invalid search query "%s"' % query)
-			return [None]
+			self._downloader.trouble(u'ERROR: invalid search query "%s"' % query)
+			return

 		prefix, query = query.split(':')
 		prefix = prefix[8:]
-		if prefix == '': 
-			return self._download_n_results(query, 1)
-		elif prefix == 'all': 
-			return self._download_n_results(query, -1)
-		else: 
+		if prefix == '':
+			self._download_n_results(query, 1)
+			return
+		elif prefix == 'all':
+			self._download_n_results(query, self._max_youtube_results)
+			return
+		else:
 			try:
 				n = int(prefix)
 				if n <= 0:
-					self.to_stderr(u'ERROR: invalid download number %s for query "%s"' % (n, query))
-					return [None]
-				return self._download_n_results(query, n)
+					self._downloader.trouble(u'ERROR: invalid download number %s for query "%s"' % (n, query))
+					return
+				elif n > self._max_youtube_results:
+					self._downloader.to_stderr(u'WARNING: ytsearch returns max %i results (you requested %i)'  % (self._max_youtube_results, n))
+					n = self._max_youtube_results
+				self._download_n_results(query, n)
+				return
 			except ValueError: # parsing prefix as int fails
-				return self._download_n_results(query, 1)
+				self._download_n_results(query, 1)
+				return

 	def _download_n_results(self, query, n):
 		"""Downloads a specified number of results for a query"""
@@ -786,8 +807,8 @@ class YoutubeSearchIE(InfoExtractor):
 			try:
 				page = urllib2.urlopen(request).read()
 			except (urllib2.URLError, httplib.HTTPException, socket.error), err:
-				self.to_stderr(u'ERROR: unable to download webpage: %s' % str(err))
-				return [None]
+				self._downloader.trouble(u'ERROR: unable to download webpage: %s' % str(err))
+				return

 			# Extract video identifiers
 			for mobj in re.finditer(self._VIDEO_INDICATOR, page):
@@ -797,16 +818,14 @@ class YoutubeSearchIE(InfoExtractor):
 					already_seen.add(video_id)
 					if len(video_ids) == n:
 						# Specified n videos reached
-						information = []
 						for id in video_ids:
-							information.extend(self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id))
-						return information
+							self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id)
+						return

 			if self._MORE_PAGES_INDICATOR not in page:
-				information = []
 				for id in video_ids:
-					information.extend(self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id))
-				return information
+					self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id)
+				return

 			pagenum = pagenum + 1

@@ -829,7 +848,7 @@ class YoutubePlaylistIE(InfoExtractor):

 	def report_download_page(self, playlist_id, pagenum):
 		"""Report attempt to download playlist page with given number."""
-		self.to_stdout(u'[youtube] PL %s: Downloading page #%s' % (playlist_id, pagenum))
+		self._downloader.to_stdout(u'[youtube] PL %s: Downloading page #%s' % (playlist_id, pagenum))

 	def _real_initialize(self):
 		self._youtube_ie.initialize()
@@ -838,8 +857,8 @@ class YoutubePlaylistIE(InfoExtractor):
 		# Extract playlist id
 		mobj = re.match(self._VALID_URL, url)
 		if mobj is None:
-			self.to_stderr(u'ERROR: invalid url: %s' % url)
-			return [None]
+			self._downloader.trouble(u'ERROR: invalid url: %s' % url)
+			return

 		# Download playlist pages
 		playlist_id = mobj.group(1)
@@ -852,8 +871,8 @@ class YoutubePlaylistIE(InfoExtractor):
 			try:
 				page = urllib2.urlopen(request).read()
 			except (urllib2.URLError, httplib.HTTPException, socket.error), err:
-				self.to_stderr(u'ERROR: unable to download webpage: %s' % str(err))
-				return [None]
+				self._downloader.trouble(u'ERROR: unable to download webpage: %s' % str(err))
+				return

 			# Extract video identifiers
 			ids_in_page = []
@@ -866,10 +885,9 @@ class YoutubePlaylistIE(InfoExtractor):
 				break
 			pagenum = pagenum + 1

-		information = []
 		for id in video_ids:
-			information.extend(self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id))
-		return information
+			self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id)
+		return

 class PostProcessor(object):
 	"""Post Processor class.
@@ -893,15 +911,6 @@ class PostProcessor(object):
 	def __init__(self, downloader=None):
 		self._downloader = downloader

-	def to_stdout(self, message):
-		"""Print message to stdout if downloader is not in quiet mode."""
-		if self._downloader is None or not self._downloader.get_params().get('quiet', False):
-			print message
-	
-	def to_stderr(self, message):
-		"""Print message to stderr."""
-		print >>sys.stderr, message
-
 	def set_downloader(self, downloader):
 		"""Sets the downloader for this PP."""
 		self._downloader = downloader
@@ -941,7 +950,7 @@ if __name__ == '__main__':
 		# Parse command line
 		parser = optparse.OptionParser(
 				usage='Usage: %prog [options] url...',
-				version='2009.02.07',
+				version='2009.04.25',
 				conflict_handler='resolve',
 				)
 		parser.add_option('-h', '--help',
@@ -970,10 +979,10 @@ if __name__ == '__main__':
 				action='store_true', dest='gettitle', help='simulate, quiet but print title', default=False)
 		parser.add_option('-f', '--format',
 				dest='format', metavar='FMT', help='video format code')
-		parser.add_option('-b', '--best-quality',
-				action='store_const', dest='format', help='alias for -f 18', const='18')
 		parser.add_option('-m', '--mobile-version',
 				action='store_const', dest='format', help='alias for -f 17', const='17')
+		parser.add_option('-d', '--high-def',
+				action='store_const', dest='format', help='alias for -f 22', const='22')
 		parser.add_option('-i', '--ignore-errors',
 				action='store_true', dest='ignoreerrors', help='continue on download errors', default=False)
 		parser.add_option('-r', '--rate-limit',
@@ -1019,9 +1028,6 @@ if __name__ == '__main__':
 		youtube_search_ie = YoutubeSearchIE(youtube_ie)

 		# File downloader
-		charset = locale.getdefaultlocale()[1]
-		if charset is None:
-			charset = 'ascii'
 		fd = FileDownloader({
 			'usenetrc': opts.usenetrc,
 			'username': opts.username,
@@ -1031,7 +1037,7 @@ if __name__ == '__main__':
 			'forcetitle': opts.gettitle,
 			'simulate': (opts.simulate or opts.geturl or opts.gettitle),
 			'format': opts.format,
-			'outtmpl': ((opts.outtmpl is not None and opts.outtmpl.decode(charset))
+			'outtmpl': ((opts.outtmpl is not None and opts.outtmpl.decode(locale.getpreferredencoding()))
 				or (opts.usetitle and u'%(stitle)s-%(id)s.%(ext)s')
 				or (opts.useliteral and u'%(title)s-%(id)s.%(ext)s')
 				or u'%(id)s.%(ext)s'),
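The last hunk drops the manual `charset` fallback and decodes the output template with `locale.getpreferredencoding()`. The removed lines show why the fallback existed: `locale.getdefaultlocale()[1]` can be `None`. A minimal sketch of the same idea follows, written for present-day Python 3 rather than the Python 2 of the diff. `decode_template` and its `default` argument are hypothetical names used only for illustration; the script does this inline while building the `FileDownloader` parameters.

```python
import locale


def decode_template(raw, default=u'%(id)s.%(ext)s'):
    """Decode a byte-string output template with the preferred encoding.

    Hypothetical helper: not part of youtube-dl, and simpler than the
    and/or chain in the diff (it ignores the title-based templates).
    """
    if raw is None:
        # No template given on the command line: fall back to the default.
        return default
    # getpreferredencoding() returns an encoding name as a string, so the
    # "if charset is None" guard from the removed lines is not needed here.
    encoding = locale.getpreferredencoding()
    return raw.decode(encoding)


print(decode_template(None))
print(decode_template(b'%(title)s-%(id)s.%(ext)s'))
```

The sketch assumes the template bytes are valid in the preferred encoding; a real caller would probably want to handle `UnicodeDecodeError` as well.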
Author	SHA1	Message	Date
Ricardo Garcia	27c3383e2d	Set version number	2010-10-31 11:24:08 +01:00
Ricardo Garcia	dbccb6cd84	Fix code for metacafe.com (this fixes issue #8 )	2010-10-31 11:24:08 +01:00
Ricardo Garcia	98164eb3b9	Fix some minor unicode-related problems	2010-10-31 11:24:08 +01:00
Ricardo Garcia	2851b2ca18	Update internal documentation to reflect the new behaviour	2010-10-31 11:24:08 +01:00
Ricardo Garcia	6f21f68629	Download videos after extracting information. This is achieved by letting the InfoExtractors instruct their downloader to process the information dictionary just after extracting the information. As a consequence, some code is simplified too.	2010-10-31 11:24:08 +01:00
Ricardo Garcia	147753eb33	Replace self._downloader.to_stderr() with self._downloader.trouble()	2010-10-31 11:24:08 +01:00
Ricardo Garcia	3aaf887e98	Put the downloader in full control of output messages	2010-10-31 11:24:08 +01:00
Ricardo Garcia	9bf386d74b	Move the downloader return code to a class member. This makes it possible to initialize it with value zero and later let trouble() overwrite the value. It simplifies error treatment and paves the way for the InfoExtractor objects to call process_info() themselves, which should solve the issues with tor and some other problems.	2010-10-31 11:24:08 +01:00
Ricardo Garcia	2f4d18a9f7	Use getpreferredencoding() instead of getdefaultlocale(). This fixes issue #7 and is recommended after a bug report I made to the Python team: http://bugs.python.org/issue5815	2010-10-31 11:24:08 +01:00
Ricardo Garcia	b0eddb2eb4	Update User-Agent string	2010-10-31 11:24:08 +01:00
Ricardo Garcia	9cee6d9035	Minor adjustments to closely match what a web browser does	2010-10-31 11:24:08 +01:00
Ricardo Garcia	c8619e0163	Move the code to process an InfoExtractor result to its own method	2010-10-31 11:24:08 +01:00
dannycolligan	257453b92b	Added cap if user requests ytsearch number over 1000 (with warning)	2010-10-31 11:24:08 +01:00
dannyc@omega	fd9288c315	Changed ytsearchall to retrieve max 1000 results	2010-10-31 11:24:07 +01:00
Ricardo Garcia	1db4ff6054	Restore internal version number indicator	2010-10-31 11:24:07 +01:00
Ricardo Garcia	763826cf2c	Establish version number	2010-10-31 11:24:04 +01:00
Ricardo Garcia	af6a92f4c9	Fix issue #5	2010-10-31 11:24:04 +01:00
Ricardo Garcia	f995f7127c	Remove some extra whitespace	2010-10-31 11:24:04 +01:00
Ricardo Garcia	e54930cf71	Switch to "INTERNAL" version again	2010-10-31 11:24:04 +01:00
Ricardo Garcia	c6b311c524	Set version number for release	2010-10-31 11:23:58 +01:00
Ricardo Garcia	79e75f66c8	Remove --best-quality option and add proper support for high definition format	2010-10-31 11:23:58 +01:00
Ricardo Garcia	053e77d6ed	Remove old ignore patterns which are no longer needed	2010-10-31 11:23:58 +01:00
Ricardo Garcia	d0a9affb46	Replace setter and getter with simple attribute access	2010-10-31 11:23:58 +01:00
Ricardo Garcia	76800042fd	Replace version number while in progress	2010-10-31 11:23:58 +01:00
Ricardo Garcia	7ab2043c9c	Bump version number	2010-10-31 11:23:52 +01:00
Ricardo Garcia	3e703dd1cd	Remove generator and webpage template, moved to wiki	2010-10-31 11:23:52 +01:00
Ricardo Garcia	cc10940385	Fix very wrong code for setting the language. It turned out that, despite the program working without apparent errors, the code for setting the language was completely wrong. First, it didn't run unless some form of authentication was performed. Second, I mistyped _LANG_URL as _LOGIN_URL, so the language was not being set at all! Amazing it still worked.	2010-10-31 11:23:48 +01:00
Ricardo Garcia	5121ef2071	Fix wrong indentation	2010-10-31 11:23:48 +01:00
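
Several of the commits listed above (3aaf887e98, 147753eb33, 9bf386d74b, 6f21f68629) describe one design change: the downloader owns all output and the return code, and extractors hand each information dictionary to it instead of returning a list. The following is a rough sketch of that arrangement in Python 3, not the script's actual code. `SketchDownloader`, `SketchIE`, the `return_code` and `processed` attributes, and the example URLs are all hypothetical; only the method names `to_stdout`, `trouble` and `process_info` come from the diff, and what the real `trouble()` does beyond reporting an error and overwriting the return code is not shown here.

```python
import sys


class SketchDownloader(object):
    """Hypothetical stand-in for FileDownloader, reduced to the parts discussed."""

    def __init__(self, params):
        self.params = params
        self.return_code = 0   # initialized to zero, overwritten by trouble()
        self.processed = []    # illustration only: records what was handed over

    def to_stdout(self, message):
        # The downloader, not the extractor, decides whether to print.
        if not self.params.get('quiet', False):
            print(message)

    def trouble(self, message):
        # Report the error and remember that something went wrong.
        print(message, file=sys.stderr)
        self.return_code = 1

    def process_info(self, info):
        # The real method would download the video; here it is only recorded.
        self.to_stdout(u'[sketch] %s: Processing' % info['id'])
        self.processed.append(info)


class SketchIE(object):
    """Hypothetical extractor that reports everything through its downloader."""

    def __init__(self, downloader):
        self._downloader = downloader

    def extract(self, url):
        if not url.startswith('http://'):
            self._downloader.trouble(u'ERROR: invalid URL: %s' % url)
            return
        video_id = url.rsplit('/', 1)[-1]
        # Hand the result over right away instead of returning a list.
        self._downloader.process_info({'id': video_id, 'url': url, 'ext': u'flv'})


fd = SketchDownloader({'quiet': True})
ie = SketchIE(fd)
ie.extract('http://example.com/videos/abc123')
ie.extract('not-a-url')
print(len(fd.processed), fd.return_code)
```

Because each extractor calls `process_info()` as soon as it has a result, a search or playlist extractor can simply loop over video ids and call the YouTube extractor for each one, which is the shape the `+` lines in the diff take.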