pyRdfa.extras.httpheader
Utility functions to work with HTTP headers.
This module provides utility functions for parsing and dealing with those HTTP 1.1 protocol headers which are not adequately covered by the standard Python libraries.
Requires Python 2.2 or later.
The functionality includes the correct interpretation of the various Accept-* style headers, content negotiation, byte range requests, HTTP-style date/times, and more.
There are a few classes defined by this module:
- class content_type -- media types such as 'text/plain'
- class language_tag -- language tags such as 'en-US'
- class range_set -- a collection of (byte) range specifiers
- class range_spec -- a single (byte) range specifier
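The first/last semantics behind range_spec can be sketched standalone: `first=None` marks a suffix range ("the last N bytes"), `last=None` means "through the end of the file". The helper below is a hypothetical illustration of those semantics only, not this module's API (the module additionally validates satisfiability rather than silently clamping):

```python
# Standalone sketch of (byte) range-specifier semantics, as modeled by
# range_spec: offsets are zero-based and inclusive.  Hypothetical helper,
# not part of this module.
def resolve_range(first, last, size):
    """Resolve one (first, last) spec against a file of `size` bytes."""
    if first is None:                  # suffix range, e.g. "-20"
        return (max(0, size - last), size - 1)
    if last is None:                   # unbounded range, e.g. "500-"
        return (first, size - 1)
    return (first, last)               # absolute range, e.g. "0-99"

print(resolve_range(0, 99, 1000))      # (0, 99)
print(resolve_range(None, 20, 1000))   # (980, 999)
print(resolve_range(500, None, 1000))  # (500, 999)
```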
The primary functions in this module may be categorized as follows:
Content negotiation functions...
- acceptable_content_type()
- acceptable_language()
- acceptable_charset()
- acceptable_encoding()
Mid-level header parsing functions...
- parse_accept_header()
- parse_accept_language_header()
- parse_range_header()
Date and time...
- http_datetime()
- parse_http_datetime()
Utility functions...
- quote_string()
- remove_comments()
- canonical_charset()
Low level string parsing functions...
- parse_comma_list()
- parse_comment()
- parse_qvalue_accept_list()
- parse_media_type()
- parse_number()
- parse_parameter_list()
- parse_quoted_string()
- parse_range_set()
- parse_range_spec()
- parse_token()
- parse_token_or_quoted_string()
And there are some specialized exception classes:
- RangeUnsatisfiableError
- RangeUnmergableError
- ParseError
See also:
- RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", June 1999. http://www.ietf.org/rfc/rfc2616.txt Errata at http://purl.org/NET/http-errata
- RFC 2046, "(MIME) Part Two: Media Types", November 1996. http://www.ietf.org/rfc/rfc2046.txt
- RFC 3066, "Tags for the Identification of Languages", January 2001. http://www.ietf.org/rfc/rfc3066.txt
Note: I have made a small modification to the regexp for internet dates, to make it more liberal (i.e., accept a time zone string of the form +0000). Ivan Herman http://www.ivan-herman.net, March 2011.
I have added statements to make it (hopefully) Python 3 compatible. Ivan Herman http://www.ivan-herman.net, August 2012.
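The date/time helpers described above round-trip between Python datetime objects and the RFC 1123 format that HTTP uses. The same conversion can be sketched with only the standard library (email.utils here is a stdlib stand-in for illustration, not part of this module):

```python
# Sketch of the HTTP date round-trip that http_datetime() and
# parse_http_datetime() provide, using only the standard library.
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# RFC 2616's canonical example date, as a timezone-aware datetime.
dt = datetime(1994, 11, 6, 8, 49, 37, tzinfo=timezone.utc)

# Format as an RFC 1123 HTTP-date (usegmt=True emits "GMT", not "+0000").
stamp = format_datetime(dt, usegmt=True)
print(stamp)  # Sun, 06 Nov 1994 08:49:37 GMT

# Parsing it back yields a datetime equal to the original.
assert parsedate_to_datetime(stamp) == dt
```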
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
""" Utility functions to work with HTTP headers.

    This module provides some utility functions useful for parsing
    and dealing with some of the HTTP 1.1 protocol headers which
    are not adequately covered by the standard Python libraries.

    Requires Python 2.2 or later.

    The functionality includes the correct interpretation of the various
    Accept-* style headers, content negotiation, byte range requests,
    HTTP-style date/times, and more.

    There are a few classes defined by this module:

       * class content_type -- media types such as 'text/plain'
       * class language_tag -- language tags such as 'en-US'
       * class range_set    -- a collection of (byte) range specifiers
       * class range_spec   -- a single (byte) range specifier

    The primary functions in this module may be categorized as follows:

       * Content negotiation functions...
         * acceptable_content_type()
         * acceptable_language()
         * acceptable_charset()
         * acceptable_encoding()

       * Mid-level header parsing functions...
         * parse_accept_header()
         * parse_accept_language_header()
         * parse_range_header()

       * Date and time...
         * http_datetime()
         * parse_http_datetime()

       * Utility functions...
         * quote_string()
         * remove_comments()
         * canonical_charset()

       * Low level string parsing functions...
         * parse_comma_list()
         * parse_comment()
         * parse_qvalue_accept_list()
         * parse_media_type()
         * parse_number()
         * parse_parameter_list()
         * parse_quoted_string()
         * parse_range_set()
         * parse_range_spec()
         * parse_token()
         * parse_token_or_quoted_string()

    And there are some specialized exception classes:

       * RangeUnsatisfiableError
       * RangeUnmergableError
       * ParseError

    See also:

       * RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", June 1999.
         <http://www.ietf.org/rfc/rfc2616.txt>
         Errata at <http://purl.org/NET/http-errata>
       * RFC 2046, "(MIME) Part Two: Media Types", November 1996.
         <http://www.ietf.org/rfc/rfc2046.txt>
       * RFC 3066, "Tags for the Identification of Languages", January 2001.
         <http://www.ietf.org/rfc/rfc3066.txt>


    Note: I have made a small modification to the regexp for internet dates,
    to make it more liberal (i.e., accept a time zone string of the form +0000).
    Ivan Herman <http://www.ivan-herman.net>, March 2011.

    I have added statements to make it (hopefully) Python 3 compatible.
    Ivan Herman <http://www.ivan-herman.net>, August 2012.
"""

__author__ = "Deron Meranda <http://deron.meranda.us/>"
__date__ = "2012-08-31"
__version__ = "1.02"
__credits__ = """Copyright (c) 2005 Deron E. Meranda <http://deron.meranda.us/>
Licensed under GNU LGPL 2.1 or later.  See <http://www.fsf.org/>.

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
"""

# Character classes from RFC 2616 section 2.2
SEPARATORS = '()<>@,;:\\"/[]?={} \t'
LWS = ' \t\n\r'  # linear white space
CRLF = '\r\n'
DIGIT = '0123456789'
HEX = '0123456789ABCDEFabcdef'

try:
    # Turn character classes into set types (for Python 2.4 or greater)
    SEPARATORS = frozenset([c for c in SEPARATORS])
    LWS = frozenset([c for c in LWS])
    CRLF = frozenset([c for c in CRLF])
    DIGIT = frozenset([c for c in DIGIT])
    HEX = frozenset([c for c in HEX])
    del c
except NameError:
    # on frozenset error, leave as simple strings
    pass


def _is_string(obj):
    """Returns True if the object is a string."""
    return isinstance(obj, str)


def http_datetime(dt=None):
    """Formats a datetime as an HTTP 1.1 Date/Time string.

    Takes a standard Python datetime object and returns a string
    formatted according to the HTTP 1.1 date/time format.

    If no datetime is provided (or None) then the current
    time is used.

    ABOUT TIMEZONES: If the passed in datetime object is naive it is
    assumed to be in UTC already.  But if it has a tzinfo component,
    the returned timestamp string will have been converted to UTC
    automatically.  So if you use timezone-aware datetimes, you need
    not worry about conversion to UTC.

    """
    if not dt:
        import datetime
        dt = datetime.datetime.utcnow()
    else:
        try:
            dt = dt - dt.utcoffset()
        except TypeError:
            pass  # no timezone offset, just assume already in UTC
    s = dt.strftime('%a, %d %b %Y %H:%M:%S GMT')
    return s


def parse_http_datetime(datestring, utc_tzinfo=None, strict=False):
    """Returns a datetime object from an HTTP 1.1 Date/Time string.

    Note that HTTP dates are always in UTC, so the returned datetime
    object will also be in UTC.

    You can optionally pass in a tzinfo object which should represent
    the UTC timezone, and the returned datetime will then be
    timezone-aware (allowing you to more easily translate it into
    different timezones later).

    If you set 'strict' to True, then only the RFC 1123 format
    is recognized.  Otherwise the backwards-compatible RFC 1036
    and Unix asctime(3) formats are also recognized.

    Please note that the day-of-the-week is not validated.
    Also two-digit years, although not HTTP 1.1 compliant, are
    treated according to recommended Y2K rules.

    """
    import re, datetime
    m = re.match(r'(?P<DOW>[a-z]+), (?P<D>\d+) (?P<MON>[a-z]+) (?P<Y>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+(\.\d+)?) (?P<TZ>[a-zA-Z0-9_+]+)$',
                 datestring, re.IGNORECASE)
    if not m and not strict:
        m = re.match(r'(?P<DOW>[a-z]+) (?P<MON>[a-z]+) (?P<D>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+) (?P<Y>\d+)$',
                     datestring, re.IGNORECASE)
        if not m:
            m = re.match(r'(?P<DOW>[a-z]+), (?P<D>\d+)-(?P<MON>[a-z]+)-(?P<Y>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+(\.\d+)?) (?P<TZ>\w+)$',
                         datestring, re.IGNORECASE)
    if not m:
        raise ValueError('HTTP date is not correctly formatted')

    try:
        tz = m.group('TZ').upper()
    except IndexError:
        # The asctime(3) format carries no timezone; it is implicitly GMT.
        tz = 'GMT'
    if tz not in ('GMT', 'UTC', '0000', '00:00'):
        raise ValueError('HTTP date is not in GMT timezone')

    monname = m.group('MON').upper()
    mdict = {'JAN':1, 'FEB':2, 'MAR':3, 'APR':4, 'MAY':5, 'JUN':6,
             'JUL':7, 'AUG':8, 'SEP':9, 'OCT':10, 'NOV':11, 'DEC':12}
    month = mdict.get(monname)
    if not month:
        raise ValueError('HTTP date has an unrecognizable month')
    y = int(m.group('Y'))
    if y < 100:
        # Integer division, so this also works under Python 3.
        century = datetime.datetime.utcnow().year // 100
        if y < 50:
            y = century * 100 + y
        else:
            y = (century - 1) * 100 + y
    d = int(m.group('D'))
    hour = int(m.group('H'))
    minute = int(m.group('M'))
    microsecond = 0
    try:
        second = int(m.group('S'))
    except ValueError:
        # Fractional seconds: datetime requires integers, so split the
        # fraction off into the microsecond field.
        fsec = float(m.group('S'))
        second = int(fsec)
        microsecond = int((fsec - second) * 1000000)
    dt = datetime.datetime(y, month, d, hour, minute, second, microsecond,
                           tzinfo=utc_tzinfo)
    return dt


class RangeUnsatisfiableError(ValueError):
    """Exception class when a byte range lies outside the file size boundaries."""
    def __init__(self, reason=None):
        if not reason:
            reason = 'Range is unsatisfiable'
        ValueError.__init__(self, reason)


class RangeUnmergableError(ValueError):
    """Exception class when byte ranges are noncontiguous and can not be merged together."""
    def __init__(self, reason=None):
        if not reason:
            reason = 'Ranges can not be merged together'
        ValueError.__init__(self, reason)


class ParseError(ValueError):
    """Exception class representing a string parsing error."""
    def __init__(self, args, input_string, at_position):
        ValueError.__init__(self, args)
        self.input_string = input_string
        self.at_position = at_position
    def __str__(self):
        if self.at_position >= len(self.input_string):
            return '%s\n\tOccurred at end of string' % self.args[0]
        else:
            return '%s\n\tOccurred near %s' % (self.args[0],
                    repr(self.input_string[self.at_position:self.at_position+16]))


def is_token(s):
    """Determines if the string is a valid token."""
    for c in s:
        # CTLs (0-31 and 127) and non-CHARs (above 127) are not allowed.
        if ord(c) < 32 or ord(c) > 126 or c in SEPARATORS:
            return False
    return True


def parse_comma_list(s, start=0, element_parser=None, min_count=0, max_count=0):
    """Parses a comma-separated list with optional whitespace.

    Takes an optional callback function `element_parser`, which
    is assumed to be able to parse an individual element.  It
    will be passed the string and a `start` argument, and
    is expected to return a tuple (parsed_result, chars_consumed).

    If no element_parser is given, then either single tokens or
    quoted strings will be parsed.

    If min_count > 0, then at least that many non-empty elements
    must be in the list, or an error is raised.

    If max_count > 0, then no more than that many non-empty elements
    may be in the list, or an error is raised.

    """
    if min_count > 0 and start == len(s):
        raise ParseError('Comma-separated list must contain some elements', s, start)
    elif start >= len(s):
        raise ParseError('Starting position is beyond the end of the string', s, start)

    if not element_parser:
        element_parser = parse_token_or_quoted_string
    results = []
    pos = start
    while pos < len(s):
        e = element_parser(s, pos)
        if not e or e[1] == 0:
            break  # end of data?
        else:
            results.append(e[0])
            pos += e[1]
            while pos < len(s) and s[pos] in LWS:
                pos += 1
            if pos < len(s) and s[pos] != ',':
                break
            while pos < len(s) and s[pos] == ',':
                # skip comma and any "empty" elements
                pos += 1  # skip comma
                while pos < len(s) and s[pos] in LWS:
                    pos += 1
    if len(results) < min_count:
        raise ParseError('Comma-separated list does not have enough elements', s, pos)
    elif max_count and len(results) > max_count:
        raise ParseError('Comma-separated list has too many elements', s, pos)
    return (results, pos - start)


def parse_token(s, start=0):
    """Parses a token.

    A token is a string defined by RFC 2616 section 2.2 as:
       token = 1*<any CHAR except CTLs or separators>

    Returns a tuple (token, chars_consumed), or ('',0) if no token
    starts at the given string position.  On a syntax error, a
    ParseError exception will be raised.

    """
    return parse_token_or_quoted_string(s, start, allow_quoted=False, allow_token=True)


def quote_string(s, always_quote=True):
    """Produces a quoted string according to HTTP 1.1 rules.

    If always_quote is False and if the string is also a valid token,
    then this function may return a string without quotes.

    """
    need_quotes = False
    q = ''
    for c in s:
        if ord(c) < 32 or ord(c) > 127 or c in SEPARATORS:
            q += '\\' + c
            need_quotes = True
        else:
            q += c
    if need_quotes or always_quote:
        return '"' + q + '"'
    else:
        return q


def parse_quoted_string(s, start=0):
    """Parses a quoted string.

    Returns a tuple (string, chars_consumed).  The quote marks will
    have been removed and all \\-escapes will have been replaced with
    the characters they represent.

    """
    return parse_token_or_quoted_string(s, start, allow_quoted=True, allow_token=False)


def parse_token_or_quoted_string(s, start=0, allow_quoted=True, allow_token=True):
    """Parses a token or a quoted-string.

    's' is the string to parse, while start is the position within the
    string where parsing should begin.  It returns a tuple
    (token, chars_consumed), with all \\-escapes and quotation already
    processed.

    Syntax is according to BNF rules in RFC 2616 section 2.2,
    specifically the 'token' and 'quoted-string' declarations.
    Syntax errors in the input string will result in ParseError
    being raised.

    If allow_quoted is False, then only tokens will be parsed instead
    of either a token or quoted-string.

    If allow_token is False, then only quoted-strings will be parsed
    instead of either a token or quoted-string.
    """
    if not allow_quoted and not allow_token:
        raise ValueError('Parsing can not continue with options provided')

    if start >= len(s):
        raise ParseError('Starting position is beyond the end of the string', s, start)
    has_quote = (s[start] == '"')
    if has_quote and not allow_quoted:
        raise ParseError('A quoted string was not expected', s, start)
    if not has_quote and not allow_token:
        raise ParseError('Expected a quotation mark', s, start)

    s2 = ''
    pos = start
    if has_quote:
        pos += 1
    while pos < len(s):
        c = s[pos]
        if c == '\\' and has_quote:
            # Note this is NOT C-style escaping; the character after the \ is
            # taken literally.
            pos += 1
            if pos == len(s):
                raise ParseError("End of string while expecting a character after '\\'", s, pos)
            s2 += s[pos]
            pos += 1
        elif c == '"' and has_quote:
            break
        elif not has_quote and (c in SEPARATORS or ord(c) < 32 or ord(c) > 127):
            break
        else:
            s2 += c
            pos += 1
    if has_quote:
        # Make sure we have a closing quote mark
        if pos >= len(s) or s[pos] != '"':
            raise ParseError('Quoted string is missing closing quote mark', s, pos)
        else:
            pos += 1
    return s2, (pos - start)


def remove_comments(s, collapse_spaces=True):
    """Removes any ()-style comments from a string.

    In HTTP, ()-comments can nest, and this function will correctly
    deal with that.

    If 'collapse_spaces' is True, then if there is any whitespace
    surrounding the comment, it will be replaced with a single space
    character.  Whitespace also collapses across multiple comment
    sequences, so that "a (b) (c) d" becomes just "a d".

    Otherwise, if 'collapse_spaces' is False then all whitespace which
    is outside any comments is left intact as-is.

    """
    if '(' not in s:
        return s  # simple case
    A = []
    dostrip = False
    added_comment_space = False
    pos = 0
    if collapse_spaces:
        # eat any leading spaces before a comment
        i = s.find('(')
        if i >= 0:
            while pos < i and s[pos] in LWS:
                pos += 1
            if pos != i:
                pos = 0
            else:
                dostrip = True
                added_comment_space = True  # lie
    while pos < len(s):
        if s[pos] == '(':
            _cmt, k = parse_comment(s, pos)
            pos += k
            if collapse_spaces:
                dostrip = True
                if not added_comment_space:
                    if len(A) > 0 and A[-1] and A[-1][-1] in LWS:
                        # previous part ended with whitespace
                        A[-1] = A[-1].rstrip()
                        A.append(' ')  # comment becomes one space
                        added_comment_space = True
        else:
            i = s.find('(', pos)
            if i == -1:
                if dostrip:
                    text = s[pos:].lstrip()
                    if s[pos] in LWS and not added_comment_space:
                        A.append(' ')
                        added_comment_space = True
                else:
                    text = s[pos:]
                if text:
                    A.append(text)
                    dostrip = False
                    added_comment_space = False
                break  # end of string
            else:
                if dostrip:
                    text = s[pos:i].lstrip()
                    if s[pos] in LWS and not added_comment_space:
                        A.append(' ')
                        added_comment_space = True
                else:
                    text = s[pos:i]
                if text:
                    A.append(text)
                    dostrip = False
                    added_comment_space = False
                pos = i
    if dostrip and len(A) > 0 and A[-1] and A[-1][-1] in LWS:
        A[-1] = A[-1].rstrip()
    return ''.join(A)


def _test_comments():
    """A self-test on comment processing.  Returns number of test failures."""
    def _testrm(a, b, collapse):
        b2 = remove_comments(a, collapse)
        if b != b2:
            print('Comment test failed:')
            print('  remove_comments( %s, collapse_spaces=%s ) -> %s' % (repr(a), repr(collapse), repr(b2)))
            print('  expected %s' % repr(b))
            return 1
        return 0
    failures = 0
    failures += _testrm(r'', '', False)
    failures += _testrm(r'(hello)', '', False)
    failures += _testrm(r'abc (hello) def', 'abc  def', False)
    failures += _testrm(r'abc (he(xyz)llo) def', 'abc  def', False)
    failures += _testrm(r'abc (he\(xyz)llo) def', 'abc llo) def', False)
    failures += _testrm(r'abc(hello)def', 'abcdef', True)
    failures += _testrm(r'abc (hello) def', 'abc def', True)
    failures += _testrm(r'abc (hello)def', 'abc def', True)
    failures += _testrm(r'abc(hello) def', 'abc def', True)
    failures += _testrm(r'abc(hello) (world)def', 'abc def', True)
    failures += _testrm(r'abc(hello)(world)def', 'abcdef', True)
    failures += _testrm(r' (hello) (world) def', 'def', True)
    failures += _testrm(r'abc (hello) (world) ', 'abc', True)
    return failures

def parse_comment(s, start=0):
    """Parses a ()-style comment from a header value.

    Returns tuple (comment, chars_consumed), where the comment will
    have had the outer-most parentheses and white space stripped.  Any
    nested comments will still have their parentheses and whitespace
    left intact.

    All \\-escaped quoted pairs will have been replaced with the actual
    characters they represent, even within the inner nested comments.

    You should note that only a few HTTP headers, such as User-Agent
    or Via, allow ()-style comments within the header value.

    A comment is defined by RFC 2616 section 2.2 as:

       comment = "(" *( ctext | quoted-pair | comment ) ")"
       ctext   = <any TEXT excluding "(" and ")">
    """
    if start >= len(s):
        raise ParseError('Starting position is beyond the end of the string', s, start)
    if s[start] != '(':
        raise ParseError('Comment must begin with opening parenthesis', s, start)

    s2 = ''
    nestlevel = 1
    pos = start + 1
    while pos < len(s) and s[pos] in LWS:
        pos += 1

    while pos < len(s):
        c = s[pos]
        if c == '\\':
            # Note this is not C-style escaping; the character after the \ is
            # taken literally.
            pos += 1
            if pos == len(s):
                raise ParseError("End of string while expecting a character after '\\'", s, pos)
            s2 += s[pos]
            pos += 1
        elif c == '(':
            nestlevel += 1
            s2 += c
            pos += 1
        elif c == ')':
            nestlevel -= 1
            pos += 1
            if nestlevel >= 1:
                s2 += c
            else:
                break
        else:
            s2 += c
            pos += 1
    if nestlevel > 0:
        raise ParseError('End of string reached before comment was closed', s, pos)
    # Now rstrip s2 of all LWS chars.
    while len(s2) and s2[-1] in LWS:
        s2 = s2[:-1]
    return s2, (pos - start)


class range_spec(object):
    """A single contiguous (byte) range.

    A range_spec defines a range (of bytes) by specifying two offsets,
    the 'first' and 'last', which are inclusive in the range.  Offsets
    are zero-based (the first byte is offset 0).  The range can not be
    empty or negative (it has to satisfy first <= last).

    The range can be unbounded on either end, represented here by the
    None value, with these semantics:

       * A 'last' of None always indicates the last possible byte
         (although that offset may not be known).

       * A 'first' of None indicates this is a suffix range, where
         the last value is actually interpreted to be the number
         of bytes at the end of the file (regardless of file size).

    Note that it is not valid for both first and last to be None.

    """

    __slots__ = ['first', 'last']

    def __init__(self, first=0, last=None):
        self.set(first, last)

    def set(self, first, last):
        """Sets the value of this range given the first and last offsets.
        """
        if first is not None and last is not None and first > last:
            raise ValueError("Byte range does not satisfy first <= last.")
        elif first is None and last is None:
            raise ValueError("Byte range can not omit both first and last offsets.")
        self.first = first
        self.last = last

    def __repr__(self):
        return '%s.%s(%s,%s)' % (self.__class__.__module__, self.__class__.__name__,
                                 self.first, self.last)

    def __str__(self):
        """Returns a string form of the range as would appear in a Range: header."""
        if self.first is None and self.last is None:
            return ''
        s = ''
        if self.first is not None:
            s += '%d' % self.first
        s += '-'
        if self.last is not None:
            s += '%d' % self.last
        return s

    def __eq__(self, other):
        """Compare ranges for equality.

        Note that if non-specific ranges are involved (such as 34- and -5),
        they could compare as not equal even though they may represent
        the same set of bytes in some contexts.
        """
        return self.first == other.first and self.last == other.last

    def __ne__(self, other):
        """Compare ranges for inequality.

        Note that if non-specific ranges are involved (such as 34- and -5),
        they could compare as not equal even though they may represent
        the same set of bytes in some contexts.
        """
        return not self.__eq__(other)

    def __lt__(self, other):
        """< operator is not defined"""
        raise NotImplementedError('Ranges can not be relationally compared')
    def __le__(self, other):
        """<= operator is not defined"""
        raise NotImplementedError('Ranges can not be relationally compared')
    def __gt__(self, other):
        """> operator is not defined"""
        raise NotImplementedError('Ranges can not be relationally compared')
    def __ge__(self, other):
        """>= operator is not defined"""
        raise NotImplementedError('Ranges can not be relationally compared')

    def copy(self):
        """Makes a copy of this range object."""
        return self.__class__(self.first, self.last)

    def is_suffix(self):
        """Returns True if this is a suffix range.

        A suffix range is one that specifies the last N bytes of a
        file regardless of file size.

        """
        return self.first is None

    def is_fixed(self):
        """Returns True if this range is absolute and a fixed size.

        This occurs only if neither first nor last is None.  Converse
        is the is_unbounded() method.

        """
        return self.first is not None and self.last is not None

    def is_unbounded(self):
        """Returns True if the number of bytes in the range is unspecified.

        This can only occur if either the 'first' or the 'last' member
        is None.  Converse is the is_fixed() method.

        """
        return self.first is None or self.last is None

    def is_whole_file(self):
        """Returns True if this range includes all possible bytes.

        This can only occur if the 'last' member is None and the 'first'
        member is 0.

        """
        return self.first == 0 and self.last is None

    def __contains__(self, offset):
        """Does this byte range contain the given byte offset?

        If the offset < 0, then it is taken as an offset from the end
        of the file, where -1 is the last byte.  This type of offset
        will only work with suffix ranges.

        """
        if offset < 0:
            if self.first is not None:
                return False
            else:
                return self.last >= -offset
        elif self.first is None:
            return False
        elif self.last is None:
            return True
        else:
            return self.first <= offset <= self.last

    def fix_to_size(self, size):
        """Changes a length-relative range to an absolute range based upon the given file size.

        Ranges that are already absolute are left as is.

        Note that zero-length files are handled as special cases,
        since the only way possible to specify a zero-length range is
        with the suffix range "-0".  Thus unless this range is a suffix
        range, it can not satisfy a zero-length file.

        If the resulting range (partly) lies outside the file size then an
        error is raised.
        """
        if size == 0:
            if self.first is None:
                self.last = 0
                return
            else:
                raise RangeUnsatisfiableError("Range can not satisfy a zero-length file.")

        if self.first is None:
            # A suffix range
            self.first = size - self.last
            if self.first < 0:
                self.first = 0
            self.last = size - 1
        else:
            if self.first > size - 1:
                raise RangeUnsatisfiableError('Range begins beyond the file size.')
            else:
                if self.last is None:
                    # An unbounded range
                    self.last = size - 1
        return

    def merge_with(self, other):
        """Tries to merge the given range into this one.

        The size of this range may be enlarged as a result.

        An error is raised if the two ranges do not overlap or are not
        contiguous with each other.
        """
        if self.is_whole_file() or self == other:
            return
        elif other.is_whole_file():
            self.first, self.last = 0, None
            return

        a1, z1 = self.first, self.last
        a2, z2 = other.first, other.last

        if self.is_suffix():
            if z1 == 0:  # self is zero-length, so merge becomes a copy
                self.first, self.last = a2, z2
                return
            elif other.is_suffix():
                self.last = max(z1, z2)
                return  # both are suffix ranges; already merged
            else:
                raise RangeUnmergableError()
        elif other.is_suffix():
            if z2 == 0:  # other is zero-length, so nothing to merge
                return
            else:
                raise RangeUnmergableError()

        assert a1 is not None and a2 is not None

        if a2 < a1:
            # swap ranges so a1 <= a2
            a1, z1, a2, z2 = a2, z2, a1, z1

        assert a1 <= a2

        if z1 is None:
            if z2 is not None and z2 + 1 < a1:
                raise RangeUnmergableError()
            else:
                self.first = min(a1, a2)
                self.last = None
        elif z2 is None:
            if z1 + 1 < a2:
                raise RangeUnmergableError()
            else:
                self.first = min(a1, a2)
                self.last = None
        else:
            if a2 > z1 + 1:
                raise RangeUnmergableError()
            else:
                self.first = a1
                self.last = max(z1, z2)
        return


class range_set(object):
    """A collection of range_specs, with units (e.g., bytes).
    """
    __slots__ = ['units', 'range_specs']

    def __init__(self):
        self.units = 'bytes'
        self.range_specs = []  # a list of range_spec objects

    def __str__(self):
        return self.units + '=' + ', '.join([str(s) for s in self.range_specs])

    def __repr__(self):
        return '%s.%s(%s)' % (self.__class__.__module__,
                              self.__class__.__name__,
                              repr(self.__str__()))

    def from_str(self, s, valid_units=('bytes', 'none')):
        """Sets this range set based upon a string, such as the Range: header.

        You can also use the parse_range_set() function for more control.

        If a parsing error occurs, the pre-existing value of this range
        set is left unchanged.

        """
        r, k = parse_range_set(s, valid_units=valid_units)
        if k < len(s):
            raise ParseError("Extra unparsable characters in range set specifier", s, k)
        self.units = r.units
        self.range_specs = r.range_specs

    def is_single_range(self):
        """Does this range specifier consist of only a single range_spec?"""
        return len(self.range_specs) == 1

    def is_contiguous(self):
        """Can the collection of range_specs be coalesced into a single contiguous range?"""
        if len(self.range_specs) <= 1:
            return True
        merged = self.range_specs[0].copy()
        for s in self.range_specs[1:]:
            try:
                merged.merge_with(s)
            except RangeUnmergableError:
                return False
        return True

    def fix_to_size(self, size):
        """Changes all length-relative range_specs to absolute range_specs based upon the given file size.

        If none of the range_specs in this set can be satisfied, then the
        entire set is considered unsatisfiable and an error is raised.
        Otherwise any unsatisfiable range_specs will simply be removed
        from this set.

        """
        for i in range(len(self.range_specs)):
            try:
                self.range_specs[i].fix_to_size(size)
            except RangeUnsatisfiableError:
                self.range_specs[i] = None
        self.range_specs = [s for s in self.range_specs if s is not None]
        if len(self.range_specs) == 0:
            raise RangeUnsatisfiableError('No ranges can be satisfied')

    def coalesce(self):
        """Collapses all consecutive range_specs which together define a contiguous range.

        Note though that this method will not re-sort the range_specs, so a
        potentially contiguous range may not be collapsed if they are
        not sorted.  For example the ranges:
            10-20, 30-40, 20-30
        will not be collapsed to just 10-40.  However if the ranges are
        sorted first as with:
            10-20, 20-30, 30-40
        then they will collapse to 10-40.
895 """ 896 if len(self.range_specs) <= 1: 897 return 898 for i in range(len(self.range_specs) - 1): 899 a = self.range_specs[i] 900 b = self.range_specs[i+1] 901 if a is not None: 902 try: 903 a.merge_with( b ) 904 self.range_specs[i+1] = None # to be deleted later 905 except RangeUnmergableError: 906 pass 907 self.range_specs = [r for r in self.range_specs if r is not None] 908 909 910def parse_number( s, start=0 ): 911 """Parses a positive decimal integer number from the string. 912 913 A tuple is returned (number, chars_consumed). If the 914 string is not a valid decimal number, then (None,0) is returned. 915 """ 916 if start >= len(s): 917 raise ParseError('Starting position is beyond the end of the string',s,start) 918 if s[start] not in DIGIT: 919 return (None,0) # not a number 920 pos = start 921 n = 0 922 while pos < len(s): 923 c = s[pos] 924 if c in DIGIT: 925 n *= 10 926 n += ord(c) - ord('0') 927 pos += 1 928 else: 929 break 930 return n, pos-start 931 932 933def parse_range_spec( s, start=0 ): 934 """Parses a (byte) range_spec. 935 936 Returns a tuple (range_spec, chars_consumed). 937 """ 938 if start >= len(s): 939 raise ParseError('Starting position is beyond the end of the string',s,start) 940 if s[start] not in DIGIT and s[start] != '-': 941 raise ParseError("Invalid range, expected a digit or '-'",s,start) 942 _first, last = None, None 943 pos = start 944 first, k = parse_number( s, pos ) 945 pos += k 946 if s[pos] == '-': 947 pos += 1 948 if pos < len(s): 949 last, k = parse_number( s, pos ) 950 pos += k 951 else: 952 raise ParseError("Byte range must include a '-'",s,pos) 953 if first is None and last is None: 954 raise ParseError('Byte range can not omit both first and last indices.',s,start) 955 R = range_spec( first, last ) 956 return R, pos-start 957 958 959def parse_range_header( header_value, valid_units=('bytes','none') ): 960 """Parses the value of an HTTP Range: header. 

    The value of the header as a string should be passed in, without
    the header name itself.

    Returns a range_set object.
    """
    ranges, k = parse_range_set( header_value, valid_units=valid_units )
    if k < len(header_value):
        raise ParseError('Range header has unexpected or unparsable characters',
                         header_value, k)
    return ranges


def parse_range_set( s, start=0, valid_units=('bytes','none') ):
    """Parses a (byte) range set specifier.

    Returns a tuple (range_set, chars_consumed).
    """
    if start >= len(s):
        raise ParseError('Starting position is beyond the end of the string',s,start)
    pos = start
    units, k = parse_token( s, pos )
    pos += k
    if valid_units and units not in valid_units:
        raise ParseError('Unsupported units type in range specifier',s,start)
    while pos < len(s) and s[pos] in LWS:
        pos += 1
    if pos < len(s) and s[pos] == '=':
        pos += 1
    else:
        raise ParseError("Invalid range specifier, expected '='",s,pos)
    while pos < len(s) and s[pos] in LWS:
        pos += 1
    range_specs, k = parse_comma_list( s, pos, parse_range_spec, min_count=1 )
    pos += k
    # Make sure no trash is at the end of the string
    while pos < len(s) and s[pos] in LWS:
        pos += 1
    if pos < len(s):
        raise ParseError('Unparsable characters in range set specifier',s,pos)

    ranges = range_set()
    ranges.units = units
    ranges.range_specs = range_specs
    return ranges, pos-start


def _split_at_qfactor( s ):
    """Splits a string at the quality factor (;q=) parameter.

    Returns the left and right substrings as a two-member tuple.

    """
    # It may be faster, but incorrect, to use s.split(';q=',1), since
    # HTTP allows any amount of linear white space (LWS) to appear
    # between the parts, so it could also be "; q = ".

    # We do this parsing 'manually' for speed rather than using a
    # regex, which would be r';[ \t\r\n]*q[ \t\r\n]*=[ \t\r\n]*'

    pos = 0
    while 0 <= pos < len(s):
        pos = s.find(';', pos)
        if pos < 0:
            break  # no more parameters
        startpos = pos
        pos = pos + 1
        while pos < len(s) and s[pos] in LWS:
            pos = pos + 1
        if pos < len(s) and s[pos] == 'q':
            pos = pos + 1
            while pos < len(s) and s[pos] in LWS:
                pos = pos + 1
            if pos < len(s) and s[pos] == '=':
                pos = pos + 1
                while pos < len(s) and s[pos] in LWS:
                    pos = pos + 1
                return ( s[:startpos], s[pos:] )
    return (s, '')


def parse_qvalue_accept_list( s, start=0, item_parser=parse_token ):
    """Parses any of the Accept-* style headers with quality factors.

    This is a low-level function.  It returns a list of tuples, each like:
        (item, item_parms, qvalue, accept_parms)

    You can pass in a function which parses each of the item strings, or
    accept the default where the items must be simple tokens.  Note that
    your parser should not consume any parameters (past the special "q"
    parameter anyway).

    The item_parms and accept_parms are each lists of (name,value) tuples.

    The qvalue is the quality factor, a number from 0 to 1 inclusive.

    """
    itemlist = []
    pos = start
    if pos >= len(s):
        raise ParseError('Starting position is beyond the end of the string',s,pos)
    item = None
    while pos < len(s):
        item, k = item_parser(s, pos)
        pos += k
        while pos < len(s) and s[pos] in LWS:
            pos += 1
        if pos >= len(s) or s[pos] in ',;':
            itemparms, qvalue, acptparms = [], None, []
            if pos < len(s) and s[pos] == ';':
                pos += 1
                while pos < len(s) and s[pos] in LWS:
                    pos += 1
                parmlist, k = parse_parameter_list(s, pos)
                for p, v in parmlist:
                    if p == 'q' and qvalue is None:
                        try:
                            qvalue = float(v)
                        except ValueError:
                            raise ParseError('qvalue must be a floating point number',s,pos)
                        if qvalue < 0 or qvalue > 1:
                            raise ParseError('qvalue must be between 0 and 1, inclusive',s,pos)
                    elif qvalue is None:
                        itemparms.append( (p,v) )
                    else:
                        acptparms.append( (p,v) )
                pos += k
            if item:
                # Add the item to the list
                if qvalue is None:
                    qvalue = 1
                itemlist.append( (item, itemparms, qvalue, acptparms) )
                item = None
            # skip commas
            while pos < len(s) and s[pos] == ',':
                pos += 1
                while pos < len(s) and s[pos] in LWS:
                    pos += 1
        else:
            break
    return itemlist, pos - start


def parse_accept_header( header_value ):
    """Parses the Accept: header.

    The value of the header as a string should be passed in, without
    the header name itself.

    This will parse the value of any of the HTTP headers "Accept",
    "Accept-Charset", "Accept-Encoding", or "Accept-Language".  These
    headers are similarly formatted, in that they are a list of items
    with associated quality factors.  The quality factor, or qvalue,
    is a number in the range [0.0..1.0] which indicates the relative
    preference of each item.

    This function returns a list of those items, in the same order in
    which they appear in the header.  Each item in the returned
    list is actually a tuple consisting of:

        ( item_name, item_parms, qvalue, accept_parms )

    As an example, the following string,
        text/plain; charset="utf-8"; q=.5; columns=80
    would be parsed into this resulting tuple,
        ( 'text/plain', [('charset','utf-8')], 0.5, [('columns','80')] )

    The value of the returned item_name depends upon which header is
    being parsed, but for example it may be a MIME content or media
    type (without parameters), a language tag, or so on.  Any optional
    parameters (delimited by semicolons) occurring before the "q="
    attribute will be in the item_parms list as (attribute,value)
    tuples in the same order as they appear in the header.  Any quoted
    values will have been unquoted and unescaped.

    The qvalue is a floating point number in the inclusive range 0.0
    to 1.0, and roughly indicates the preference for this item.
    Values outside this range will cause a ParseError to be raised.

    (!) Note that a qvalue of 0 indicates that the item is
    explicitly NOT acceptable to the user agent, and should be
    handled differently by the caller.

    The accept_parms, like the item_parms, is a list of any attributes
    occurring after the "q=" attribute, and will be in the list as
    (attribute,value) tuples in the same order as they occur.
    Usually accept_parms will be an empty list, as the HTTP spec
    allows these extra parameters in the syntax but does not
    currently define any possible values.

    All empty items will be removed from the list.  However, duplicate
    or conflicting values are not detected or handled in any way by
    this function.
    """
    def parse_mt_only(s, start):
        mt, k = parse_media_type(s, start, with_parameters=False)
        ct = content_type()
        ct.major = mt[0]
        ct.minor = mt[1]
        return ct, k

    alist, k = parse_qvalue_accept_list( header_value, item_parser=parse_mt_only )
    if k < len(header_value):
        raise ParseError('Accept header is invalid',header_value,k)

    ctlist = []
    for ct, ctparms, q, acptparms in alist:
        if ctparms:
            ct.set_parameters( dict(ctparms) )
        ctlist.append( (ct, q, acptparms) )
    return ctlist


def parse_media_type(media_type, start=0, with_parameters=True):
    """Parses a media type (MIME type) designator into its parts.

    Given a media type string, returns a nested tuple of its parts.

        ((major,minor,parmlist), chars_consumed)

    where parmlist is a list of tuples of (parm_name, parm_value).
    Quoted-values are appropriately unquoted and unescaped.

    If 'with_parameters' is False, then parsing will stop immediately
    after the minor media type, and will not proceed to parse any
    of the semicolon-separated parameters.

    Examples:
        image/png -> (('image','png',[]), 9)
        text/plain; charset="utf-16be"
                  -> (('text','plain',[('charset','utf-16be')]), 30)

    """

    s = media_type
    pos = start
    ctmaj, k = parse_token(s, pos)
    if k == 0:
        raise ParseError('Media type must be of the form "major/minor".', s, pos)
    pos += k
    if pos >= len(s) or s[pos] != '/':
        raise ParseError('Media type must be of the form "major/minor".', s, pos)
    pos += 1
    ctmin, k = parse_token(s, pos)
    if k == 0:
        raise ParseError('Media type must be of the form "major/minor".', s, pos)
    pos += k
    if with_parameters:
        parmlist, k = parse_parameter_list(s, pos)
        pos += k
    else:
        parmlist = []
    return ((ctmaj, ctmin, parmlist), pos - start)


def parse_parameter_list(s, start=0):
    """Parses a semicolon-separated 'parameter=value' list.

    Returns a tuple (parmlist, chars_consumed), where parmlist
    is a list of tuples (parm_name, parm_value).

    The parameter values will be unquoted and unescaped as needed.

    Empty parameters (as in ";;") are skipped, as is insignificant
    white space.  The list returned is kept in the same order as the
    parameters appear in the string.

    """
    pos = start
    parmlist = []
    while pos < len(s):
        while pos < len(s) and s[pos] in LWS:
            pos += 1  # skip whitespace
        if pos < len(s) and s[pos] == ';':
            pos += 1
            while pos < len(s) and s[pos] in LWS:
                pos += 1  # skip whitespace
        if pos >= len(s):
            break
        parmname, k = parse_token(s, pos)
        if parmname:
            pos += k
            while pos < len(s) and s[pos] in LWS:
                pos += 1  # skip whitespace
            if not (pos < len(s) and s[pos] == '='):
                raise ParseError('Expected an "=" after parameter name', s, pos)
            pos += 1
            while pos < len(s) and s[pos] in LWS:
                pos += 1  # skip whitespace
            parmval, k = parse_token_or_quoted_string( s, pos )
            pos += k
            parmlist.append( (parmname, parmval) )
        else:
            break
    return parmlist, pos - start


class content_type(object):
    """This class represents a media type (aka a MIME content type), including parameters.

    You initialize these by passing in a content-type declaration
    string, such as "text/plain; charset=ascii", to the constructor or
    to the set() method.  If you provide no string value, the object
    returned will represent the wildcard */* content type.

    Normally you will get the value back by using str(), or optionally
    you can access the components via the 'major', 'minor', 'media_type',
    or 'parmdict' members.

    """
    def __init__(self, content_type_string=None, with_parameters=True):
        """Create a new content_type object.

        See the set() method for a description of the arguments.
        """
        if content_type_string:
            self.set( content_type_string, with_parameters=with_parameters )
        else:
            self.set( '*/*' )

    def set_parameters(self, parameter_list_or_dict):
        """Sets the optional parameters based upon the parameter list.

        The parameter list should be a semicolon-separated name=value string.
        Any parameters which already exist on this object will be deleted,
        unless they appear in the given parameter_list.

        """
        if isinstance(parameter_list_or_dict, dict):
            # already a dictionary
            pl = parameter_list_or_dict
        else:
            pl, k = parse_parameter_list(parameter_list_or_dict)
            if k < len(parameter_list_or_dict):
                raise ParseError('Invalid parameter list', parameter_list_or_dict, k)
        self.parmdict = dict(pl)

    def set(self, content_type_string, with_parameters=True):
        """Parses the content type string and sets this object to its value.

        For a more complete description of the arguments, see the
        documentation for the parse_media_type() function in this module.
        """
        mt, k = parse_media_type( content_type_string, with_parameters=with_parameters )
        if k < len(content_type_string):
            raise ParseError('Not a valid content type',content_type_string, k)
        major, minor, pdict = mt
        self._set_major( major )
        self._set_minor( minor )
        self.parmdict = dict(pdict)

    def _get_major(self):
        return self._major
    def _set_major(self, s):
        s = s.lower()  # case-insensitive
        if not is_token(s):
            raise ValueError('Major media type contains an invalid character')
        self._major = s

    def _get_minor(self):
        return self._minor
    def _set_minor(self, s):
        s = s.lower()  # case-insensitive
        if not is_token(s):
            raise ValueError('Minor media type contains an invalid character')
        self._minor = s

    major = property(_get_major, _set_major, doc="Major media classification")
    minor = property(_get_minor, _set_minor, doc="Minor media sub-classification")

    def __str__(self):
        """String value."""
        s = '%s/%s' % (self.major, self.minor)
        if self.parmdict:
            extra = '; '.join([ '%s=%s' % (a[0],quote_string(a[1],False)) for a in self.parmdict.items()])
            s += '; ' + extra
        return s

    def __unicode__(self):
        """Unicode string value."""
        # In Python 3 this is unnecessary in general; it is kept just to avoid possible syntax issues. I.H.
        return str(self.__str__())

    def __repr__(self):
        """Python representation of this object."""
        s = '%s(%s)' % (self.__class__.__name__, repr(self.__str__()))
        return s

    def __hash__(self):
        """Hash this object; the hash is dependent only upon the value."""
        return hash(str(self))

    def __getstate__(self):
        """Pickler"""
        return str(self)

    def __setstate__(self, state):
        """Unpickler"""
        self.set(state)

    def __len__(self):
        """Logical length of this media type.

        For example:
            len('*/*')  -> 0
            len('image/*') -> 1
            len('image/png') -> 2
            len('text/plain; charset=utf-8')  -> 3
            len('text/plain; charset=utf-8; filename=xyz.txt') -> 4

        """
        if self.major == '*':
            return 0
        elif self.minor == '*':
            return 1
        else:
            return 2 + len(self.parmdict)

    def __eq__(self, other):
        """Equality test.

        Note that this is an exact match, including any parameters if any.
        """
        return self.major == other.major and \
               self.minor == other.minor and \
               self.parmdict == other.parmdict

    def __ne__(self, other):
        """Inequality test."""
        return not self.__eq__(other)

    def _get_media_type(self):
        """Returns the media 'type/subtype' string, without parameters."""
        return '%s/%s' % (self.major, self.minor)

    media_type = property(_get_media_type, doc="Returns just the media type 'type/subtype' without any parameters (read-only).")

    def is_wildcard(self):
        """Returns True if this is a 'something/*' media type.
        """
        return self.minor == '*'

    def is_universal_wildcard(self):
        """Returns True if this is the unspecified '*/*' media type.
        """
        return self.major == '*' and self.minor == '*'

    def is_composite(self):
        """Is this media type composed of multiple parts?
        """
        return self.major == 'multipart' or self.major == 'message'

    def is_xml(self):
        """Returns True if this media type is XML-based.

        Note this does not consider text/html to be XML, but
        application/xhtml+xml is.
        """
        return self.minor == 'xml' or self.minor.endswith('+xml')

# Some common media types
content_formdata = content_type('multipart/form-data')
content_urlencoded = content_type('application/x-www-form-urlencoded')
content_byteranges = content_type('multipart/byteranges')  # RFC 2616 sect 14.16
content_opaque = content_type('application/octet-stream')
content_html = content_type('text/html')
content_xhtml = content_type('application/xhtml+xml')


def acceptable_content_type( accept_header, content_types, ignore_wildcard=True ):
    """Determines if the given content type is acceptable to the user agent.

    The accept_header should be the value present in the HTTP
    "Accept:" header.  In mod_python this is typically obtained from
    the req.http_headers_in table; in WSGI it is environ["HTTP_ACCEPT"];
    other web frameworks may provide other methods of obtaining it.

    Optionally the accept_header parameter can be pre-parsed, as
    returned from the parse_accept_header() function in this module.

    The content_types argument should either be a single MIME media
    type string, or a sequence of them.  It represents the set of
    content types that the caller (server) is willing to send.
    Generally, the server content_types should not contain any
    wildcarded values.

    This function determines which content type is the most
    preferred and is acceptable to both the user agent and the server.
    If one is negotiated it will return a four-valued tuple like:

        (server_content_type, ua_content_range, qvalue, accept_parms)

    The first tuple value is one of the server's content_types, while
    the remaining tuple values describe which of the client's
    acceptable content_types was matched.  In most cases accept_parms
    will be an empty list (see the description of parse_accept_header()
    for more details).

    If no content type could be negotiated, then this function will
    return None (and the caller should typically send an HTTP 406 Not
    Acceptable as a response).

    Note that the wildcarded content type "*/*" sent by the client
    will be ignored, since it is often incorrectly sent by web
    browsers that don't really mean it.  To override this, call with
    ignore_wildcard=False.  Partial wildcards such as "image/*" will
    always be processed, but be at a lower priority than a complete
    matching type.

    See also: RFC 2616 section 14.1, and
    <http://www.iana.org/assignments/media-types/>

    """
    if _is_string(accept_header):
        accept_list = parse_accept_header(accept_header)
    else:
        accept_list = accept_header

    if _is_string(content_types):
        content_types = [content_types]

    server_ctlist = [content_type(ct) for ct in content_types]

    #print 'AC', repr(accept_list)
    #print 'SV', repr(server_ctlist)

    best = None  # (server_ct, client_ct, qvalue, accept_parms, matchlen)

    for server_ct in server_ctlist:
        best_for_this = None
        for client_ct, qvalue, aargs in accept_list:
            if ignore_wildcard and client_ct.is_universal_wildcard():
                continue  # */* being ignored

            matchlen = 0  # how specifically this one matches (0 is a non-match)
            if client_ct.is_universal_wildcard():
                matchlen = 1  # */* is a 1
            elif client_ct.major == server_ct.major:
                if client_ct.minor == '*':  # something/* is a 2
                    matchlen = 2
                elif client_ct.minor == server_ct.minor:  # something/something is a 3
                    matchlen = 3
                    # must make sure all the parms match too
                    for pname, pval in client_ct.parmdict.items():
                        sval = server_ct.parmdict.get(pname)
                        if pname == 'charset':
                            # special case for charset to match aliases
                            pval = canonical_charset(pval)
                            sval = canonical_charset(sval)
                        if sval == pval:
                            matchlen = matchlen + 1
                        else:
                            matchlen = 0
                            break
            else:
                matchlen = 0

            #print 'S',server_ct,' C',client_ct,' M',matchlen,'Q',qvalue
            if matchlen > 0:
                if not best_for_this \
                       or matchlen > best_for_this[-1] \
                       or (matchlen == best_for_this[-1] and qvalue > best_for_this[2]):
                    # This match is better
                    best_for_this = (server_ct, client_ct, qvalue, aargs, matchlen)
                    #print 'BEST2 NOW', repr(best_for_this)
        if not best or \
               (best_for_this and best_for_this[2] > best[2]):
            best = best_for_this
            #print 'BEST NOW', repr(best)
    if not best or best[2] <= 0:
        return None
    return best[:-1]


# Aliases of common charsets, see <http://www.iana.org/assignments/character-sets>.
character_set_aliases = {
    'ASCII': 'US-ASCII',
    'ISO646-US': 'US-ASCII',
    'IBM367': 'US-ASCII',
    'CP367': 'US-ASCII',
    'CSASCII': 'US-ASCII',
    'ANSI_X3.4-1968': 'US-ASCII',
    'ISO_646.IRV:1991': 'US-ASCII',

    'UTF7': 'UTF-7',

    'UTF8': 'UTF-8',

    'UTF16': 'UTF-16',
    'UTF16LE': 'UTF-16LE',
    'UTF16BE': 'UTF-16BE',

    'UTF32': 'UTF-32',
    'UTF32LE': 'UTF-32LE',
    'UTF32BE': 'UTF-32BE',

    'UCS2': 'ISO-10646-UCS-2',
    'UCS_2': 'ISO-10646-UCS-2',
    'UCS-2': 'ISO-10646-UCS-2',
    'CSUNICODE': 'ISO-10646-UCS-2',

    'UCS4': 'ISO-10646-UCS-4',
    'UCS_4': 'ISO-10646-UCS-4',
    'UCS-4': 'ISO-10646-UCS-4',
    'CSUCS4': 'ISO-10646-UCS-4',

    'ISO_8859-1': 'ISO-8859-1',
    'LATIN1': 'ISO-8859-1',
    'CP819': 'ISO-8859-1',
    'IBM819': 'ISO-8859-1',

    'ISO_8859-2': 'ISO-8859-2',
    'LATIN2': 'ISO-8859-2',

    'ISO_8859-3': 'ISO-8859-3',
    'LATIN3': 'ISO-8859-3',

    'ISO_8859-4': 'ISO-8859-4',
    'LATIN4': 'ISO-8859-4',

    'ISO_8859-5': 'ISO-8859-5',
    'CYRILLIC': 'ISO-8859-5',

    'ISO_8859-6': 'ISO-8859-6',
    'ARABIC': 'ISO-8859-6',
    'ECMA-114': 'ISO-8859-6',

    'ISO_8859-6-E': 'ISO-8859-6-E',
    'ISO_8859-6-I': 'ISO-8859-6-I',

    'ISO_8859-7': 'ISO-8859-7',
    'GREEK': 'ISO-8859-7',
    'GREEK8': 'ISO-8859-7',
    'ECMA-118': 'ISO-8859-7',

    'ISO_8859-8': 'ISO-8859-8',
    'HEBREW': 'ISO-8859-8',

    'ISO_8859-8-E': 'ISO-8859-8-E',
    'ISO_8859-8-I': 'ISO-8859-8-I',

    'ISO_8859-9': 'ISO-8859-9',
    'LATIN5': 'ISO-8859-9',

    'ISO_8859-10': 'ISO-8859-10',
    'LATIN6': 'ISO-8859-10',

    'ISO_8859-13': 'ISO-8859-13',

    'ISO_8859-14': 'ISO-8859-14',
    'LATIN8': 'ISO-8859-14',

    'ISO_8859-15': 'ISO-8859-15',
    'LATIN9': 'ISO-8859-15',

    'ISO_8859-16': 'ISO-8859-16',
    'LATIN10': 'ISO-8859-16',
    }

def canonical_charset(charset):
    """Returns the canonical or preferred name of a charset.

    Additional character sets can be recognized by this function by
    altering the character_set_aliases dictionary in this module.
    Charsets which are not recognized are simply converted to
    upper-case (as charset names are always case-insensitive).

    See <http://www.iana.org/assignments/character-sets>.

    """
    # It would be nice to use Python's codecs module for this, but
    # there is no fixed public interface to its alias mappings.
    if not charset:
        return charset
    uc = charset.upper()
    uccon = character_set_aliases.get( uc, uc )
    return uccon


def acceptable_charset(accept_charset_header, charsets, ignore_wildcard=True, default='ISO-8859-1'):
    """
    Determines if the given charset is acceptable to the user agent.

    The accept_charset_header should be the value present in the HTTP
    "Accept-Charset:" header.  In mod_python this is typically
    obtained from the req.http_headers_in table; in WSGI it is
    environ["HTTP_ACCEPT_CHARSET"]; other web frameworks may provide
    other methods of obtaining it.

    Optionally the accept_charset_header parameter can instead be the
    list returned from the parse_accept_header() function in this
    module.

    The charsets argument should either be a charset identifier string,
    or a sequence of them.

    This function returns the charset identifier string which is the
    most preferred and is acceptable to both the user agent and the
    caller.  It will return the default value if no charset is negotiable.

    Note that the wildcarded charset "*" will be ignored.  To override
    this, call with ignore_wildcard=False.

    See also: RFC 2616 section 14.2, and
    <http://www.iana.org/assignments/character-sets>

    """
    if default:
        default = canonical_charset(default)

    if _is_string(accept_charset_header):
        accept_list = parse_accept_header(accept_charset_header)
    else:
        accept_list = accept_charset_header

    if _is_string(charsets):
        charsets = [canonical_charset(charsets)]
    else:
        charsets = [canonical_charset(c) for c in charsets]

    # Note per the RFC that 'ISO-8859-1' is special, and is implicitly in the
    # accept list with q=1; unless it is already in the list, or '*' is in the list.

    best = None
    for c, qvalue, _junk in accept_list:
        if c == '*':
            default = None
            if ignore_wildcard:
                continue
            if not best or qvalue > best[1]:
                best = (c, qvalue)
        else:
            c = canonical_charset(c)
            for test_c in charsets:
                if c == default:
                    default = None
                if c == test_c and (not best or best[0]=='*' or qvalue > best[1]):
                    best = (c, qvalue)
    if default and default in [test_c.upper() for test_c in charsets]:
        best = (default, 1)
    if best and best[0] == '*':
        best = (charsets[0], best[1])
    return best


class language_tag(object):
    """This class represents an RFC 3066 language tag.

    Initialize objects of this class with a single string representing
    the language tag, such as "en-US".

    Case is insensitive.  Wildcarded subtags are ignored or stripped as
    they have no significance, so that "en-*" is the same as "en".
    However the universal wildcard "*" language tag is kept as-is.

    Note that although relational operators such as < are defined,
    they only form a partial order based upon specialization.

    Thus for example,
        "en" <= "en-US"
    but,
        not "en" <= "de", and
        not "de" <= "en".

    """

    def __init__(self, tagname):
        """Initialize objects of this class with a single string representing
        the language tag, such as "en-US".  Case is insensitive.

        """
        self.parts = tagname.lower().split('-')
        while len(self.parts) > 1 and self.parts[-1] == '*':
            del self.parts[-1]

    def __len__(self):
        """Number of subtags in this tag."""
        if len(self.parts) == 1 and self.parts[0] == '*':
            return 0
        return len(self.parts)

    def __str__(self):
        """The standard string form of this language tag."""
        a = []
        if len(self.parts) >= 1:
            a.append(self.parts[0])
        if len(self.parts) >= 2:
            if len(self.parts[1]) == 2:
                a.append( self.parts[1].upper() )
            else:
                a.append( self.parts[1] )
        a.extend( self.parts[2:] )
        return '-'.join(a)

    def __unicode__(self):
        """The unicode string form of this language tag."""
        return str(self.__str__())

    def __repr__(self):
        """The python representation of this language tag."""
        s = '%s("%s")' % (self.__class__.__name__, self.__str__())
        return s

    def superior(self):
        """Returns another instance of language_tag which is the superior.

        Thus en-US gives en, and en gives *.

        """
        if len(self) <= 1:
            return self.__class__('*')
        return self.__class__( '-'.join(self.parts[:-1]) )

    def all_superiors(self, include_wildcard=False):
        """Returns a list of this language and all its superiors.

        If include_wildcard is False, then "*" will not be among the
        output list, unless this language is itself "*".

        """
        langlist = [ self ]
        l = self
        while not l.is_universal_wildcard():
            l = l.superior()
            if l.is_universal_wildcard() and not include_wildcard:
                continue
            langlist.append(l)
        return langlist

    def is_universal_wildcard(self):
        """Returns True if this language tag represents all possible
        languages, by using the reserved tag of "*".

        """
        return len(self.parts) == 1 and self.parts[0] == '*'

    def dialect_of(self, other, ignore_wildcard=True):
        """Is this language a dialect (or subset/specialization) of another?

        This method returns True if this language is the same as or a
        specialization (dialect) of the other language_tag.

        If ignore_wildcard is False, then all languages will be
        considered to be a dialect of the special language tag of "*".

        """
        if not ignore_wildcard and self.is_universal_wildcard():
            return True
        for i in range( min(len(self), len(other)) ):
            if self.parts[i] != other.parts[i]:
                return False
        if len(self) >= len(other):
            return True
        return False

    def __eq__(self, other):
        """== operator.  Are the two languages the same?"""
        return self.parts == other.parts

    def __ne__(self, other):
        """!= operator.  Are the two languages different?"""
        return not self.__eq__(other)

    def __lt__(self, other):
        """< operator.  Returns True if the other language is a more
        specialized dialect of this one."""
        return other.dialect_of(self) and self != other

    def __le__(self, other):
        """<= operator.  Returns True if the other language is the same
        as or a more specialized dialect of this one."""
        return other.dialect_of(self)

    def __gt__(self, other):
        """> operator.
        Returns True if this language is a more
        specialized dialect of the other one."""
        return self.dialect_of(other) and self != other

    def __ge__(self, other):
        """>= operator.  Returns True if this language is the same as
        or a more specialized dialect of the other one."""
        return self.dialect_of(other)


def parse_accept_language_header( header_value ):
    """Parses the Accept-Language header.

    Returns a list of tuples, each like:

        (language_tag, qvalue, accept_parameters)

    """
    alist, k = parse_qvalue_accept_list( header_value )
    if k < len(header_value):
        raise ParseError('Accept-Language header is invalid',header_value,k)

    langlist = []
    for token, langparms, q, acptparms in alist:
        if langparms:
            raise ParseError('Language tag may not have any parameters',header_value,0)
        lang = language_tag( token )
        langlist.append( (lang, q, acptparms) )

    return langlist


def acceptable_language( accept_header, server_languages, ignore_wildcard=True, assume_superiors=True ):
    """Determines if the given language is acceptable to the user agent.

    The accept_header should be the value present in the HTTP
    "Accept-Language:" header.  In mod_python this is typically
    obtained from the req.http_headers_in table; in WSGI it is
    environ["HTTP_ACCEPT_LANGUAGE"]; other web frameworks may provide
    other methods of obtaining it.

    Optionally the accept_header parameter can be pre-parsed, as
    returned by the parse_accept_language_header() function defined in
    this module.

    The server_languages argument should either be a single language
    string, a language_tag object, or a sequence of them.  It
    represents the set of languages that the server is willing to
    send to the user agent.

    Note that the wildcarded language tag "*" will be ignored.
    To override this, call with ignore_wildcard=False, and even then
    it will be the lowest-priority choice regardless of its
    quality factor (as per the HTTP spec).

    If assume_superiors is True, then the languages that the
    browser accepts will automatically include all superior languages.
    Any superior languages which must be added are done so with one
    half the qvalue of the language which is present.  For example, if
    the accept string is "en-US", then it will be treated as if it
    were "en-US, en;q=0.5".  Note that although the HTTP 1.1 spec says
    that browsers are supposed to encourage users to configure all
    acceptable languages, sometimes they don't; hence this function's
    ability to assume them.  Setting assume_superiors to False will
    ensure strict adherence to the HTTP 1.1 spec, which means that if
    the browser accepts "en-US", then it will not be acceptable to
    send just "en" to it.

    This function returns the language which is the most preferred and
    is acceptable to both the user agent and the caller.  It will
    return None if no language is negotiable, otherwise the return
    value is always an instance of language_tag.

    See also: RFC 3066 <http://www.ietf.org/rfc/rfc3066.txt>, and
    ISO 639, links at <http://en.wikipedia.org/wiki/ISO_639>, and
    <http://www.iana.org/assignments/language-tags>.

    """
    # Note special instructions from RFC 2616 sect. 14.1:
    #   "The language quality factor assigned to a language-tag by the
    #   Accept-Language field is the quality value of the longest
    #   language-range in the field that matches the language-tag."

    if _is_string(accept_header):
        accept_list = parse_accept_language_header(accept_header)
    else:
        accept_list = accept_header

    # Possibly add in any "missing" languages that the browser may
    # have forgotten to include in the list.
Insure list is sorted so 1933 # more general languages come before more specific ones. 1934 1935 accept_list.sort() 1936 all_tags = [a[0] for a in accept_list] 1937 if assume_superiors: 1938 to_add = [] 1939 for langtag, qvalue, _args in accept_list: 1940 if len(langtag) >= 2: 1941 for suptag in langtag.all_superiors( include_wildcard=False ): 1942 if suptag not in all_tags: 1943 # Add in superior at half the qvalue 1944 to_add.append( (suptag, qvalue / 2, '') ) 1945 all_tags.append( suptag ) 1946 accept_list.extend( to_add ) 1947 1948 # Convert server_languages to a list of language_tags 1949 if _is_string(server_languages): 1950 server_languages = [language_tag(server_languages)] 1951 elif isinstance(server_languages, language_tag): 1952 server_languages = [server_languages] 1953 else: 1954 server_languages = [language_tag(lang) for lang in server_languages] 1955 1956 # Select the best one 1957 best = None # tuple (langtag, qvalue, matchlen) 1958 1959 for langtag, qvalue, _args in accept_list: 1960 # aargs is ignored for Accept-Language 1961 if qvalue <= 0: 1962 continue # UA doesn't accept this language 1963 1964 if ignore_wildcard and langtag.is_universal_wildcard(): 1965 continue # "*" being ignored 1966 1967 for svrlang in server_languages: 1968 # The best match is determined first by the quality factor, 1969 # and then by the most specific match. 1970 1971 matchlen = -1 # how specifically this one matches (0 is a non-match) 1972 if svrlang.dialect_of( langtag, ignore_wildcard=ignore_wildcard ): 1973 matchlen = len(langtag) 1974 if not best \ 1975 or matchlen > best[2] \ 1976 or (matchlen == best[2] and qvalue > best[1]): 1977 # This match is better 1978 best = (langtag, qvalue, matchlen) 1979 if not best: 1980 return None 1981 return best[0] 1982 1983# end of file
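The longest-match/qvalue rule used by acceptable_language() can be illustrated with a minimal, self-contained sketch that does not use this module. The helper name best_language and its simplified header parsing (no accept-parameters, no superior-language assumption) are illustrative assumptions, not part of this module's API:

```python
def best_language(accept_header, server_languages):
    """Pick the server language matched by the longest, highest-q range."""
    # Parse "da, en-gb;q=0.8, en;q=0.7" into (tag, qvalue) pairs.
    accepts = []
    for item in accept_header.split(','):
        tag, _, params = item.strip().partition(';')
        q = 1.0
        if params.strip().startswith('q='):
            q = float(params.strip()[2:])
        accepts.append((tag.lower(), q))

    best, best_key = None, (0, 0.0)
    for lang in server_languages:
        lang_l = lang.lower()
        for tag, q in accepts:
            if q <= 0:
                continue  # explicitly refused by the user agent
            # RFC 3066 matching: exact tag, or a prefix followed by '-'.
            if tag == '*' or lang_l == tag or lang_l.startswith(tag + '-'):
                key = (len(tag), q)  # longer match wins, then higher q
                if key > best_key:
                    best, best_key = lang, key
    return best
```

For example, best_language('da, en-gb;q=0.8, en;q=0.7', ['en-US', 'en-GB']) prefers 'en-GB', because the 'en-gb' range matches it more specifically than 'en' matches 'en-US'.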
def http_datetime(dt=None):
    """Formats a datetime as an HTTP 1.1 Date/Time string.

    Takes a standard Python datetime object and returns a string
    formatted according to the HTTP 1.1 date/time format.

    If no datetime is provided (or None) then the current
    time is used.

    ABOUT TIMEZONES: If the passed in datetime object is naive it is
    assumed to be in UTC already.  But if it has a tzinfo component,
    the returned timestamp string will have been converted to UTC
    automatically.  So if you use timezone-aware datetimes, you need
    not worry about conversion to UTC.

    """
    if not dt:
        import datetime
        dt = datetime.datetime.utcnow()
    else:
        try:
            dt = dt - dt.utcoffset()
        except TypeError:
            pass  # naive datetime, no timezone offset; assume already in UTC

    s = dt.strftime('%a, %d %b %Y %H:%M:%S GMT')
    return s
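For comparison, the Python 3 standard library can produce the same RFC 1123 form independently of this module; a short sketch using email.utils (locale-independent, unlike strftime's %a and %b):

```python
import datetime
import email.utils

# A timezone-aware UTC datetime formats to the fixed RFC 1123 shape
# "Wdy, DD Mon YYYY HH:MM:SS GMT" when usegmt=True.
dt = datetime.datetime(1999, 6, 30, 12, 0, 0, tzinfo=datetime.timezone.utc)
http_date = email.utils.format_datetime(dt, usegmt=True)
print(http_date)  # Wed, 30 Jun 1999 12:00:00 GMT
```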
def parse_http_datetime(datestring, utc_tzinfo=None, strict=False):
    """Returns a datetime object from an HTTP 1.1 Date/Time string.

    Note that HTTP dates are always in UTC, so the returned datetime
    object will also be in UTC.

    You can optionally pass in a tzinfo object which should represent
    the UTC timezone, and the returned datetime will then be
    timezone-aware (allowing you to more easily translate it into
    different timezones later).

    If you set 'strict' to True, then only the RFC 1123 format
    is recognized.  Otherwise the backwards-compatible RFC 1036
    and Unix asctime(3) formats are also recognized.

    Please note that the day-of-the-week is not validated.
    Also two-digit years, although not HTTP 1.1 compliant, are
    treated according to recommended Y2K rules.

    """
    import re, datetime
    m = re.match(r'(?P<DOW>[a-z]+), (?P<D>\d+) (?P<MON>[a-z]+) (?P<Y>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+(\.\d+)?) (?P<TZ>[a-zA-Z0-9_+]+)$',
                 datestring, re.IGNORECASE)
    if not m and not strict:
        m = re.match(r'(?P<DOW>[a-z]+) (?P<MON>[a-z]+) (?P<D>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+) (?P<Y>\d+)$',
                     datestring, re.IGNORECASE)
        if not m:
            m = re.match(r'(?P<DOW>[a-z]+), (?P<D>\d+)-(?P<MON>[a-z]+)-(?P<Y>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+(\.\d+)?) (?P<TZ>\w+)$',
                         datestring, re.IGNORECASE)
    if not m:
        raise ValueError('HTTP date is not correctly formatted')

    try:
        tz = m.group('TZ').upper()
    except IndexError:
        tz = 'GMT'  # the asctime(3) format carries no timezone field
    if tz not in ('GMT', 'UTC', '0000', '+0000', '00:00'):
        raise ValueError('HTTP date is not in GMT timezone')

    monname = m.group('MON').upper()
    mdict = {'JAN':1, 'FEB':2, 'MAR':3, 'APR':4, 'MAY':5, 'JUN':6,
             'JUL':7, 'AUG':8, 'SEP':9, 'OCT':10, 'NOV':11, 'DEC':12}
    month = mdict.get(monname)
    if not month:
        raise ValueError('HTTP date has an unrecognizable month')
    y = int(m.group('Y'))
    if y < 100:
        # Apply the recommended Y2K rule for two-digit years
        century = datetime.datetime.utcnow().year // 100
        if y < 50:
            y = century * 100 + y
        else:
            y = (century - 1) * 100 + y
    d = int(m.group('D'))
    hour = int(m.group('H'))
    minute = int(m.group('M'))
    try:
        second = int(m.group('S'))
    except ValueError:
        second = float(m.group('S'))
    dt = datetime.datetime( y, month, d, hour, minute, second, tzinfo=utc_tzinfo )
    return dt
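The Python 3 standard library offers a comparable parser for the RFC 1123 form; a small sketch using email.utils, independent of this module:

```python
import email.utils

# parsedate_to_datetime returns a timezone-aware datetime for a
# "GMT"-suffixed HTTP date (tzinfo is datetime.timezone.utc).
dt = email.utils.parsedate_to_datetime('Wed, 30 Jun 1999 12:00:00 GMT')
print(dt.isoformat())  # 1999-06-30T12:00:00+00:00
```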
class RangeUnsatisfiableError(ValueError):
    """Exception class when a byte range lies outside the file size boundaries."""
    def __init__(self, reason=None):
        if not reason:
            reason = 'Range is unsatisfiable'
        ValueError.__init__(self, reason)
class RangeUnmergableError(ValueError):
    """Exception class when byte ranges are noncontiguous and can not be merged together."""
    def __init__(self, reason=None):
        if not reason:
            reason = 'Ranges can not be merged together'
        ValueError.__init__(self, reason)
class ParseError(ValueError):
    """Exception class representing a string parsing error."""
    def __init__(self, args, input_string, at_position):
        ValueError.__init__(self, args)
        self.input_string = input_string
        self.at_position = at_position
    def __str__(self):
        if self.at_position >= len(self.input_string):
            return '%s\n\tOccurred at end of string' % self.args[0]
        else:
            return '%s\n\tOccurred near %s' % (self.args[0], repr(self.input_string[self.at_position:self.at_position+16]))
def is_token(s):
    """Determines if the string is a valid token."""
    for c in s:
        # RFC 2616 CHAR is 0-127; CTLs are 0-31 and 127 (DEL).
        if ord(c) < 32 or ord(c) >= 127 or c in SEPARATORS:
            return False
    return True
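A self-contained version of the same check, with the RFC 2616 separator set written out explicitly (the SEPARATORS constant below restates the RFC's list and is assumed, not copied from this module; unlike the module's helper, this sketch also rejects the empty string):

```python
SEPARATORS = '()<>@,;:\\"/[]?={} \t'  # the 19 separators of RFC 2616 sect. 2.2

def is_token(s):
    # A token is 1 or more CHARs excluding CTLs (0-31, 127) and separators.
    return bool(s) and all(31 < ord(c) < 127 and c not in SEPARATORS
                           for c in s)
```

For example, 'text' is a token but 'text/plain' is not, because '/' is a separator.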
def parse_comma_list(s, start=0, element_parser=None, min_count=0, max_count=0):
    """Parses a comma-separated list with optional whitespace.

    Takes an optional callback function `element_parser`, which
    is assumed to be able to parse an individual element.  It
    will be passed the string and a `start` argument, and
    is expected to return a tuple (parsed_result, chars_consumed).

    If no element_parser is given, then either single tokens or
    quoted strings will be parsed.

    If min_count > 0, then at least that many non-empty elements
    must be in the list, or an error is raised.

    If max_count > 0, then no more than that many non-empty elements
    may be in the list, or an error is raised.

    """
    if min_count > 0 and start == len(s):
        raise ParseError('Comma-separated list must contain some elements',s,start)
    elif start >= len(s):
        raise ParseError('Starting position is beyond the end of the string',s,start)

    if not element_parser:
        element_parser = parse_token_or_quoted_string
    results = []
    pos = start
    while pos < len(s):
        e = element_parser( s, pos )
        if not e or e[1] == 0:
            break  # end of data?
        else:
            results.append( e[0] )
            pos += e[1]
        while pos < len(s) and s[pos] in LWS:
            pos += 1
        if pos < len(s) and s[pos] != ',':
            break
        while pos < len(s) and s[pos] == ',':
            # skip comma and any "empty" elements
            pos += 1  # skip comma
            while pos < len(s) and s[pos] in LWS:
                pos += 1
    if len(results) < min_count:
        raise ParseError('Comma-separated list does not have enough elements',s,pos)
    elif max_count and len(results) > max_count:
        raise ParseError('Comma-separated list has too many elements',s,pos)
    return (results, pos-start)
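The HTTP #rule this implements (commas separate elements, surrounding whitespace is optional, empty elements are skipped) can be shown in a much simpler sketch. Note this sketch is an assumption for illustration only: unlike parse_comma_list() it does not honor commas inside quoted strings, and the helper name is hypothetical:

```python
def split_comma_list(s):
    # Split on commas, trimming optional whitespace and skipping empty
    # elements, so "a,, b" yields two elements per HTTP's #rule.
    return [e.strip() for e in s.split(',') if e.strip()]
```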
def parse_token(s, start=0):
    """Parses a token.

    A token is a string defined by RFC 2616 section 2.2 as:
       token = 1*<any CHAR except CTLs or separators>

    Returns a tuple (token, chars_consumed), or ('',0) if no token
    starts at the given string position.  On a syntax error, a
    ParseError exception will be raised.

    """
    return parse_token_or_quoted_string(s, start, allow_quoted=False, allow_token=True)
def quote_string(s, always_quote=True):
    """Produces a quoted string according to HTTP 1.1 rules.

    If always_quote is False and if the string is also a valid token,
    then this function may return a string without quotes.

    """
    need_quotes = False
    q = ''
    for c in s:
        if ord(c) < 32 or ord(c) > 127 or c in SEPARATORS:
            q += '\\' + c
            need_quotes = True
        else:
            q += c
    if need_quotes or always_quote:
        return '"' + q + '"'
    else:
        return q
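A minimal standalone sketch of the quoting rule: in a quoted-string, the characters that strictly need backslash-escaping are the double-quote and the backslash itself (quote_string() above escapes a wider set, which is also legal). The helper name quote is hypothetical:

```python
def quote(s):
    # Escape backslash first, then the double-quote, then wrap.
    return '"' + s.replace('\\', '\\\\').replace('"', '\\"') + '"'
```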
def parse_quoted_string(s, start=0):
    """Parses a quoted string.

    Returns a tuple (string, chars_consumed).  The quote marks will
    have been removed and all \-escapes will have been replaced with
    the characters they represent.

    """
    return parse_token_or_quoted_string(s, start, allow_quoted=True, allow_token=False)
def parse_token_or_quoted_string(s, start=0, allow_quoted=True, allow_token=True):
    """Parses a token or a quoted-string.

    's' is the string to parse, while start is the position within the
    string where parsing should begin.  It returns a tuple
    (token, chars_consumed), with all \-escapes and quotation already
    processed.

    Syntax is according to BNF rules in RFC 2616 section 2.2,
    specifically the 'token' and 'quoted-string' declarations.
    Syntax errors in the input string will result in ParseError
    being raised.

    If allow_quoted is False, then only tokens will be parsed instead
    of either a token or quoted-string.

    If allow_token is False, then only quoted-strings will be parsed
    instead of either a token or quoted-string.
    """
    if not allow_quoted and not allow_token:
        raise ValueError('Parsing can not continue with options provided')

    if start >= len(s):
        raise ParseError('Starting position is beyond the end of the string',s,start)
    has_quote = (s[start] == '"')
    if has_quote and not allow_quoted:
        raise ParseError('A quoted string was not expected', s, start)
    if not has_quote and not allow_token:
        raise ParseError('Expected a quotation mark', s, start)

    s2 = ''
    pos = start
    if has_quote:
        pos += 1
    while pos < len(s):
        c = s[pos]
        if c == '\\' and has_quote:
            # Note this is NOT C-style escaping; the character after the \ is
            # taken literally.
            pos += 1
            if pos == len(s):
                raise ParseError("End of string while expecting a character after '\\'",s,pos)
            s2 += s[pos]
            pos += 1
        elif c == '"' and has_quote:
            break
        elif not has_quote and (c in SEPARATORS or ord(c)<32 or ord(c)>127):
            break
        else:
            s2 += c
            pos += 1
    if has_quote:
        # Make sure we have a closing quote mark
        if pos >= len(s) or s[pos] != '"':
            raise ParseError('Quoted string is missing closing quote mark',s,pos)
        else:
            pos += 1
    return s2, (pos - start)
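The quoted-string half of the algorithm can be condensed into a standalone sketch (the function name parse_quoted is hypothetical; it omits the token branch and uses plain ValueError instead of this module's ParseError):

```python
def parse_quoted(s, start=0):
    # Minimal quoted-string parser: returns (text, chars_consumed).
    if s[start] != '"':
        raise ValueError('expected opening quote')
    out, pos = [], start + 1
    while pos < len(s) and s[pos] != '"':
        if s[pos] == '\\':
            pos += 1  # not C-style escaping: the next char is taken literally
            if pos == len(s):
                raise ValueError('dangling backslash at end of string')
        out.append(s[pos])
        pos += 1
    if pos >= len(s):
        raise ValueError('missing closing quote mark')
    return ''.join(out), pos + 1 - start
```

For example, parsing '"a\"b"' yields the five input characters collapsed to 'a"b', with 6 characters consumed.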
def remove_comments(s, collapse_spaces=True):
    """Removes any ()-style comments from a string.

    In HTTP, ()-comments can nest, and this function will correctly
    deal with that.

    If 'collapse_spaces' is True, then if there is any whitespace
    surrounding the comment, it will be replaced with a single space
    character.  Whitespace also collapses across multiple comment
    sequences, so that "a (b) (c) d" becomes just "a d".

    Otherwise, if 'collapse_spaces' is False then all whitespace which
    is outside any comments is left intact as-is.

    """
    if '(' not in s:
        return s  # simple case
    A = []
    dostrip = False
    added_comment_space = False
    pos = 0
    if collapse_spaces:
        # eat any leading spaces before a comment
        i = s.find('(')
        if i >= 0:
            while pos < i and s[pos] in LWS:
                pos += 1
            if pos != i:
                pos = 0
            else:
                dostrip = True
                added_comment_space = True  # lie
    while pos < len(s):
        if s[pos] == '(':
            _cmt, k = parse_comment( s, pos )
            pos += k
            if collapse_spaces:
                dostrip = True
                if not added_comment_space:
                    if len(A) > 0 and A[-1] and A[-1][-1] in LWS:
                        # previous part ended with whitespace
                        A[-1] = A[-1].rstrip()
                    A.append(' ')  # comment becomes one space
                    added_comment_space = True
        else:
            i = s.find( '(', pos )
            if i == -1:
                if dostrip:
                    text = s[pos:].lstrip()
                    if s[pos] in LWS and not added_comment_space:
                        A.append(' ')
                        added_comment_space = True
                else:
                    text = s[pos:]
                if text:
                    A.append(text)
                    dostrip = False
                    added_comment_space = False
                break  # end of string
            else:
                if dostrip:
                    text = s[pos:i].lstrip()
                    if s[pos] in LWS and not added_comment_space:
                        A.append(' ')
                        added_comment_space = True
                else:
                    text = s[pos:i]
                if text:
                    A.append(text)
                    dostrip = False
                    added_comment_space = False
                pos = i
    if dostrip and len(A) > 0 and A[-1] and A[-1][-1] in LWS:
        A[-1] = A[-1].rstrip()
    return ''.join(A)
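The core of the nesting behavior (a depth counter that suppresses output while inside parentheses) can be shown in a standalone sketch; the name strip_comments is hypothetical, and for brevity it ignores backslash quoted-pairs and does not collapse surrounding whitespace:

```python
def strip_comments(s):
    # Drop ()-comments, honoring nesting via a depth counter.
    depth, out = 0, []
    for c in s:
        if c == '(':
            depth += 1
        elif c == ')' and depth:
            depth -= 1
        elif depth == 0:
            out.append(c)
    return ''.join(out)
```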
def parse_comment(s, start=0):
    """Parses a ()-style comment from a header value.

    Returns tuple (comment, chars_consumed), where the comment will
    have had the outer-most parentheses and white space stripped.  Any
    nested comments will still have their parentheses and whitespace
    left intact.

    All \-escaped quoted pairs will have been replaced with the actual
    characters they represent, even within the inner nested comments.

    You should note that only a few HTTP headers, such as User-Agent
    or Via, allow ()-style comments within the header value.

    A comment is defined by RFC 2616 section 2.2 as:

       comment = "(" *( ctext | quoted-pair | comment ) ")"
       ctext   = <any TEXT excluding "(" and ")">

    """
    if start >= len(s):
        raise ParseError('Starting position is beyond the end of the string',s,start)
    if s[start] != '(':
        raise ParseError('Comment must begin with opening parenthesis',s,start)

    s2 = ''
    nestlevel = 1
    pos = start + 1
    while pos < len(s) and s[pos] in LWS:
        pos += 1

    while pos < len(s):
        c = s[pos]
        if c == '\\':
            # Note this is not C-style escaping; the character after the \ is
            # taken literally.
            pos += 1
            if pos == len(s):
                raise ParseError("End of string while expecting a character after '\\'",s,pos)
            s2 += s[pos]
            pos += 1
        elif c == '(':
            nestlevel += 1
            s2 += c
            pos += 1
        elif c == ')':
            nestlevel -= 1
            pos += 1
            if nestlevel >= 1:
                s2 += c
            else:
                break
        else:
            s2 += c
            pos += 1
    if nestlevel > 0:
        raise ParseError('End of string reached before comment was closed',s,pos)
    # Now rstrip s2 of all LWS chars.
    while len(s2) and s2[-1] in LWS:
        s2 = s2[:-1]
    return s2, (pos - start)
class range_spec(object):
    """A single contiguous (byte) range.

    A range_spec defines a range (of bytes) by specifying two offsets,
    the 'first' and 'last', which are inclusive in the range.  Offsets
    are zero-based (the first byte is offset 0).  The range can not be
    empty or negative (has to satisfy first <= last).

    The range can be unbounded on either end, represented here by the
    None value, with these semantics:

       * A 'last' of None always indicates the last possible byte
         (although that offset may not be known).

       * A 'first' of None indicates this is a suffix range, where
         the last value is actually interpreted to be the number
         of bytes at the end of the file (regardless of file size).

    Note that it is not valid for both first and last to be None.

    """

    __slots__ = ['first','last']

    def __init__(self, first=0, last=None):
        self.set( first, last )

    def set(self, first, last):
        """Sets the value of this range given the first and last offsets.
        """
        if first is not None and last is not None and first > last:
            raise ValueError("Byte range does not satisfy first <= last.")
        elif first is None and last is None:
            raise ValueError("Byte range can not omit both first and last offsets.")
        self.first = first
        self.last = last

    def __repr__(self):
        return '%s.%s(%s,%s)' % (self.__class__.__module__, self.__class__.__name__,
                                 self.first, self.last)

    def __str__(self):
        """Returns a string form of the range as would appear in a Range: header."""
        if self.first is None and self.last is None:
            return ''
        s = ''
        if self.first is not None:
            s += '%d' % self.first
        s += '-'
        if self.last is not None:
            s += '%d' % self.last
        return s

    def __eq__(self, other):
        """Compare ranges for equality.

        Note that if non-specific ranges are involved (such as 34- and -5),
        they could compare as not equal even though they may represent
        the same set of bytes in some contexts.
        """
        return self.first == other.first and self.last == other.last

    def __ne__(self, other):
        """Compare ranges for inequality.

        Note that if non-specific ranges are involved (such as 34- and -5),
        they could compare as not equal even though they may represent
        the same set of bytes in some contexts.
        """
        return not self.__eq__(other)

    def __lt__(self, other):
        """< operator is not defined"""
        raise NotImplementedError('Ranges can not be relationally compared')
    def __le__(self, other):
        """<= operator is not defined"""
        raise NotImplementedError('Ranges can not be relationally compared')
    def __gt__(self, other):
        """> operator is not defined"""
        raise NotImplementedError('Ranges can not be relationally compared')
    def __ge__(self, other):
        """>= operator is not defined"""
        raise NotImplementedError('Ranges can not be relationally compared')

    def copy(self):
        """Makes a copy of this range object."""
        return self.__class__( self.first, self.last )

    def is_suffix(self):
        """Returns True if this is a suffix range.

        A suffix range is one that specifies the last N bytes of a
        file regardless of file size.

        """
        return self.first is None

    def is_fixed(self):
        """Returns True if this range is absolute and a fixed size.

        This occurs only if neither first nor last is None.  Converse
        is the is_unbounded() method.

        """
        return self.first is not None and self.last is not None

    def is_unbounded(self):
        """Returns True if the number of bytes in the range is unspecified.

        This can only occur if either the 'first' or the 'last' member
        is None.  Converse is the is_fixed() method.

        """
        return self.first is None or self.last is None

    def is_whole_file(self):
        """Returns True if this range includes all possible bytes.

        This can only occur if the 'last' member is None and the 'first'
        member is 0.

        """
        return self.first == 0 and self.last is None

    def __contains__(self, offset):
        """Does this byte range contain the given byte offset?

        If the offset < 0, then it is taken as an offset from the end
        of the file, where -1 is the last byte.  This type of offset
        will only work with suffix ranges.

        """
        if offset < 0:
            if self.first is not None:
                return False
            else:
                return self.last >= -offset
        elif self.first is None:
            return False
        elif self.last is None:
            return True
        else:
            return self.first <= offset <= self.last

    def fix_to_size(self, size):
        """Changes a length-relative range to an absolute range based upon given file size.

        Ranges that are already absolute are left as is.

        Note that zero-length files are handled as special cases,
        since the only way possible to specify a zero-length range is
        with the suffix range "-0".  Thus unless this range is a suffix
        range, it can not satisfy a zero-length file.

        If the resulting range (partly) lies outside the file size then an
        error is raised.
        """

        if size == 0:
            if self.first is None:
                self.last = 0
                return
            else:
                raise RangeUnsatisfiableError("Range can not satisfy a zero-length file.")

        if self.first is None:
            # A suffix range
            self.first = size - self.last
            if self.first < 0:
                self.first = 0
            self.last = size - 1
        else:
            if self.first > size - 1:
                raise RangeUnsatisfiableError('Range begins beyond the file size.')
            else:
                if self.last is None:
                    # An unbounded range
                    self.last = size - 1
        return

    def merge_with(self, other):
        """Tries to merge the given range into this one.

        The size of this range may be enlarged as a result.

        An error is raised if the two ranges do not overlap or are not
        contiguous with each other.
        """
        if self.is_whole_file() or self == other:
            return
        elif other.is_whole_file():
            self.first, self.last = 0, None
            return

        a1, z1 = self.first, self.last
        a2, z2 = other.first, other.last

        if self.is_suffix():
            if z1 == 0:  # self is zero-length, so merge becomes a copy
                self.first, self.last = a2, z2
                return
            elif other.is_suffix():
                self.last = max(z1, z2)
                return
            else:
                raise RangeUnmergableError()
        elif other.is_suffix():
            if z2 == 0:  # other is zero-length, so nothing to merge
                return
            else:
                raise RangeUnmergableError()

        assert a1 is not None and a2 is not None

        if a2 < a1:
            # swap ranges so a1 <= a2
            a1, z1, a2, z2 = a2, z2, a1, z1

        assert a1 <= a2

        if z1 is None:
            if z2 is not None and z2 + 1 < a1:
                raise RangeUnmergableError()
            else:
                self.first = min(a1, a2)
                self.last = None
        elif z2 is None:
            if z1 + 1 < a2:
                raise RangeUnmergableError()
            else:
                self.first = min(a1, a2)
                self.last = None
        else:
            if a2 > z1 + 1:
                raise RangeUnmergableError()
            else:
                self.first = a1
                self.last = max(z1, z2)
        return
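The fix_to_size() semantics (resolving suffix and unbounded ranges against a known file size) can be summarized in a small standalone sketch. It is an assumption for illustration: the function name is hypothetical, it skips the zero-length-file special case, and it clamps 'last' to the end of the file as RFC byte-range semantics allow:

```python
def fix_range(first, last, size):
    # Resolve a byte-range spec against a nonzero file size, returning
    # absolute, inclusive (first, last) offsets.
    if first is None:                 # suffix range "-N": last N bytes
        n = min(last, size)
        return (size - n, size - 1)
    if first > size - 1:
        raise ValueError('range begins beyond the file size')
    # Unbounded "N-" reaches end of file; a 'last' past EOF is clamped.
    return (first, size - 1 if last is None else min(last, size - 1))
```

For a 100-byte file: the suffix "-5" resolves to bytes 95-99, and "10-" resolves to 10-99.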
607 def set(self, first, last): 608 """Sets the value of this range given the first and last offsets. 609 """ 610 if first is not None and last is not None and first > last: 611 raise ValueError("Byte range does not satisfy first <= last.") 612 elif first is None and last is None: 613 raise ValueError("Byte range can not omit both first and last offsets.") 614 self.first = first 615 self.last = last
Sets the value of this range given the first and last offsets.
664 def copy(self): 665 """Makes a copy of this range object.""" 666 return self.__class__( self.first, self.last )
Makes a copy of this range object.
668 def is_suffix(self): 669 """Returns True if this is a suffix range. 670 671 A suffix range is one that specifies the last N bytes of a 672 file regardless of file size. 673 674 """ 675 return self.first == None
Returns True if this is a suffix range.
A suffix range is one that specifies the last N bytes of a file regardless of file size.
677 def is_fixed(self): 678 """Returns True if this range is absolute and a fixed size. 679 680 This occurs only if neither first or last is None. Converse 681 is the is_unbounded() method. 682 683 """ 684 return self.first is not None and self.last is not None
Returns True if this range is absolute and a fixed size.
This occurs only if neither first or last is None. Converse is the is_unbounded() method.
686 def is_unbounded(self): 687 """Returns True if the number of bytes in the range is unspecified. 688 689 This can only occur if either the 'first' or the 'last' member 690 is None. Converse is the is_fixed() method. 691 692 """ 693 return self.first is None or self.last is None
Returns True if the number of bytes in the range is unspecified.
This can only occur if either the 'first' or the 'last' member is None. Converse is the is_fixed() method.
695 def is_whole_file(self): 696 """Returns True if this range includes all possible bytes. 697 698 This can only occur if the 'last' member is None and the first 699 member is 0. 700 701 """ 702 return self.first == 0 and self.last is None
Returns True if this range includes all possible bytes.
This can only occur if the 'last' member is None and the first member is 0.
724 def fix_to_size(self, size): 725 """Changes a length-relative range to an absolute range based upon given file size. 726 727 Ranges that are already absolute are left as is. 728 729 Note that zero-length files are handled as special cases, 730 since the only way possible to specify a zero-length range is 731 with the suffix range "-0". Thus unless this range is a suffix 732 range, it can not satisfy a zero-length file. 733 734 If the resulting range (partly) lies outside the file size then an 735 error is raised. 736 """ 737 738 if size == 0: 739 if self.first is None: 740 self.last = 0 741 return 742 else: 743 raise RangeUnsatisfiableError("Range can satisfy a zero-length file.") 744 745 if self.first is None: 746 # A suffix range 747 self.first = size - self.last 748 if self.first < 0: 749 self.first = 0 750 self.last = size - 1 751 else: 752 if self.first > size - 1: 753 raise RangeUnsatisfiableError('Range begins beyond the file size.') 754 else: 755 if self.last is None: 756 # An unbounded range 757 self.last = size - 1 758 return
Changes a length-relative range to an absolute range based upon given file size.
Ranges that are already absolute are left as is.
Note that zero-length files are handled as special cases, since the only way possible to specify a zero-length range is with the suffix range "-0". Thus unless this range is a suffix range, it can not satisfy a zero-length file.
If the resulting range (partly) lies outside the file size then an error is raised.
760 def merge_with(self, other): 761 """Tries to merge the given range into this one. 762 763 The size of this range may be enlarged as a result. 764 765 An error is raised if the two ranges do not overlap or are not 766 contiguous with each other. 767 """ 768 if self.is_whole_file() or self == other: 769 return 770 elif other.is_whole_file(): 771 self.first, self.last = 0, None 772 return 773 774 a1, z1 = self.first, self.last 775 a2, z2 = other.first, other.last 776 777 if self.is_suffix(): 778 if z1 == 0: # self is zero-length, so merge becomes a copy 779 self.first, self.last = a2, z2 780 return 781 elif other.is_suffix(): 782 self.last = max(z1, z2) 783 else: 784 raise RangeUnmergableError() 785 elif other.is_suffix(): 786 if z2 == 0: # other is zero-length, so nothing to merge 787 return 788 else: 789 raise RangeUnmergableError() 790 791 assert a1 is not None and a2 is not None 792 793 if a2 < a1: 794 # swap ranges so a1 <= a2 795 a1, z1, a2, z2 = a2, z2, a1, z1 796 797 assert a1 <= a2 798 799 if z1 is None: 800 if z2 is not None and z2 + 1 < a1: 801 raise RangeUnmergableError() 802 else: 803 self.first = min(a1, a2) 804 self.last = None 805 elif z2 is None: 806 if z1 + 1 < a2: 807 raise RangeUnmergableError() 808 else: 809 self.first = min(a1, a2) 810 self.last = None 811 else: 812 if a2 > z1 + 1: 813 raise RangeUnmergableError() 814 else: 815 self.first = a1 816 self.last = max(z1, z2) 817 return
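The core merge rule for two absolute ranges, once sorted so the lower-starting range comes first, is simply "fail if there is a gap, otherwise take the union". A minimal standalone sketch (hypothetical helper, using plain `(first, last)` tuples rather than range_spec objects):

```python
def merge_ranges(a, b):
    """Merge two absolute inclusive byte ranges given as (first, last)
    tuples.  Returns the combined range if they overlap or are
    contiguous (e.g. 0-499 and 500-999), else raises ValueError."""
    (a1, z1), (a2, z2) = sorted([a, b])
    if a2 > z1 + 1:
        # a gap of at least one byte separates the two ranges
        raise ValueError('ranges neither overlap nor touch')
    return (a1, max(z1, z2))
```

Note the `z1 + 1` test: because the ranges are inclusive, `0-499` and `500-999` touch and merge into `0-999`, while `0-499` and `501-999` leave byte 500 uncovered and cannot be merged.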
class range_set(object):
    """A collection of range_specs, with units (e.g., bytes).
    """
    __slots__ = ['units', 'range_specs']

    def __init__(self):
        self.units = 'bytes'
        self.range_specs = []  # a list of range_spec objects

    def __str__(self):
        return self.units + '=' + ', '.join([str(s) for s in self.range_specs])

    def __repr__(self):
        return '%s.%s(%s)' % (self.__class__.__module__,
                              self.__class__.__name__,
                              repr(self.__str__()) )

    def from_str(self, s, valid_units=('bytes','none')):
        """Sets this range set based upon a string, such as the Range: header.

        You can also use the parse_range_set() function for more control.

        If a parsing error occurs, the pre-existing value of this range
        set is left unchanged.

        """
        r, k = parse_range_set( s, valid_units=valid_units )
        if k < len(s):
            raise ParseError("Extra unparsable characters in range set specifier",s,k)
        self.units = r.units
        self.range_specs = r.range_specs

    def is_single_range(self):
        """Does this range set consist of only a single range_spec?"""
        return len(self.range_specs) == 1

    def is_contiguous(self):
        """Can the collection of range_specs be coalesced into a single contiguous range?"""
        if len(self.range_specs) <= 1:
            return True
        merged = self.range_specs[0].copy()
        for s in self.range_specs[1:]:
            try:
                merged.merge_with(s)
            except RangeUnmergableError:
                return False
        return True

    def fix_to_size(self, size):
        """Changes all length-relative range_specs to absolute range_specs based upon given file size.

        If none of the range_specs in this set can be satisfied, then the
        entire set is considered unsatisfiable and an error is raised.
        Otherwise any unsatisfiable range_specs will simply be removed
        from this set.

        """
        for i in range(len(self.range_specs)):
            try:
                self.range_specs[i].fix_to_size( size )
            except RangeUnsatisfiableError:
                self.range_specs[i] = None
        self.range_specs = [s for s in self.range_specs if s is not None]
        if len(self.range_specs) == 0:
            raise RangeUnsatisfiableError('No ranges can be satisfied')

    def coalesce(self):
        """Collapses all consecutive range_specs which together define a contiguous range.

        Note though that this method will not re-sort the range_specs, so a
        potentially contiguous range may not be collapsed if they are
        not sorted.  For example the ranges:
            10-20, 30-40, 20-30
        will not be collapsed to just 10-40.  However if the ranges are
        sorted first as with:
            10-20, 20-30, 30-40
        then they will collapse to 10-40.
        """
        if len(self.range_specs) <= 1:
            return
        for i in range(len(self.range_specs) - 1):
            a = self.range_specs[i]
            b = self.range_specs[i+1]
            if a is None:
                continue
            try:
                a.merge_with( b )
                # carry the merged range forward so the next range_spec
                # can also be coalesced into it
                self.range_specs[i] = None
                self.range_specs[i+1] = a
            except RangeUnmergableError:
                pass
        self.range_specs = [r for r in self.range_specs if r is not None]
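The coalescing rule described above, applied to a pre-sorted list, amounts to "extend the previous range whenever the next one overlaps or touches it". A standalone sketch on plain `(first, last)` tuples (hypothetical helper, not part of this module):

```python
def coalesce_sorted(ranges):
    """Collapse a list of absolute inclusive (first, last) byte ranges
    that is already sorted by starting position, merging neighbours
    that overlap or are contiguous.  Unsorted input is not re-sorted,
    mirroring the coalesce() method's documented behaviour."""
    out = []
    for first, last in ranges:
        if out and first <= out[-1][1] + 1:
            # overlaps or touches the previous range: extend it in place
            out[-1] = (out[-1][0], max(out[-1][1], last))
        else:
            out.append((first, last))
    return out
```

With sorted input, `10-20, 20-30, 30-40` collapses to the single range `10-40`, matching the docstring's example.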
def parse_number( s, start=0 ):
    """Parses a positive decimal integer number from the string.

    A tuple is returned (number, chars_consumed).  If the
    string is not a valid decimal number, then (None,0) is returned.
    """
    if start >= len(s):
        raise ParseError('Starting position is beyond the end of the string',s,start)
    if s[start] not in DIGIT:
        return (None,0)  # not a number
    pos = start
    n = 0
    while pos < len(s):
        c = s[pos]
        if c in DIGIT:
            n *= 10
            n += ord(c) - ord('0')
            pos += 1
        else:
            break
    return n, pos-start
def parse_range_spec( s, start=0 ):
    """Parses a (byte) range_spec.

    Returns a tuple (range_spec, chars_consumed).
    """
    if start >= len(s):
        raise ParseError('Starting position is beyond the end of the string',s,start)
    if s[start] not in DIGIT and s[start] != '-':
        raise ParseError("Invalid range, expected a digit or '-'",s,start)
    first, last = None, None
    pos = start
    first, k = parse_number( s, pos )
    pos += k
    if pos < len(s) and s[pos] == '-':
        pos += 1
        if pos < len(s):
            last, k = parse_number( s, pos )
            pos += k
    else:
        raise ParseError("Byte range must include a '-'",s,pos)
    if first is None and last is None:
        raise ParseError('Byte range can not omit both first and last indices.',s,start)
    R = range_spec( first, last )
    return R, pos-start
def parse_range_header( header_value, valid_units=('bytes','none') ):
    """Parses the value of an HTTP Range: header.

    The value of the header as a string should be passed in; without
    the header name itself.

    Returns a range_set object.
    """
    ranges, k = parse_range_set( header_value, valid_units=valid_units )
    if k < len(header_value):
        raise ParseError('Range header has unexpected or unparsable characters',
                         header_value, k)
    return ranges
def parse_range_set( s, start=0, valid_units=('bytes','none') ):
    """Parses a (byte) range set specifier.

    Returns a tuple (range_set, chars_consumed).
    """
    if start >= len(s):
        raise ParseError('Starting position is beyond the end of the string',s,start)
    pos = start
    units, k = parse_token( s, pos )
    pos += k
    if valid_units and units not in valid_units:
        raise ParseError('Unsupported units type in range specifier',s,start)
    while pos < len(s) and s[pos] in LWS:
        pos += 1
    if pos < len(s) and s[pos] == '=':
        pos += 1
    else:
        raise ParseError("Invalid range specifier, expected '='",s,pos)
    while pos < len(s) and s[pos] in LWS:
        pos += 1
    range_specs, k = parse_comma_list( s, pos, parse_range_spec, min_count=1 )
    pos += k
    # Make sure no trash is at the end of the string
    while pos < len(s) and s[pos] in LWS:
        pos += 1
    if pos < len(s):
        raise ParseError('Unparsable characters in range set specifier',s,pos)

    ranges = range_set()
    ranges.units = units
    ranges.range_specs = range_specs
    return ranges, pos-start
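A rough standalone sketch of what parse_range_set() extracts from a Range header value, using a regex split instead of the module's token-by-token scanning (hypothetical helper; the real parser is stricter about whitespace and units):

```python
import re

def parse_byte_ranges(header_value):
    """Parse a Range header value like 'bytes=0-499, 500-999, -100'
    into a list of (first, last) pairs, where None marks an open end:
    (None, 100) is the suffix range '-100', (9500, None) is '9500-'."""
    m = re.match(r'\s*(\w+)\s*=\s*(.+)$', header_value)
    if not m or m.group(1) != 'bytes':
        raise ValueError('unsupported or malformed range units')
    specs = []
    for part in m.group(2).split(','):
        first, sep, last = part.strip().partition('-')
        if not sep:
            raise ValueError("byte range must include a '-'")
        if not first and not last:
            raise ValueError('range can not omit both first and last indices')
        specs.append((int(first) if first else None,
                      int(last) if last else None))
    return specs
```

For example, `'bytes=0-499, 500-999, -100'` yields `[(0, 499), (500, 999), (None, 100)]`, the same shape of data that range_set carries in its range_specs list.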
def parse_qvalue_accept_list( s, start=0, item_parser=parse_token ):
    """Parses any of the Accept-* style headers with quality factors.

    This is a low-level function.  It returns a list of tuples, each like:
        (item, item_parms, qvalue, accept_parms)

    You can pass in a function which parses each of the item strings, or
    accept the default where the items must be simple tokens.  Note that
    your parser should not consume any parameters (past the special "q"
    parameter anyway).

    The item_parms and accept_parms are each lists of (name,value) tuples.

    The qvalue is the quality factor, a number from 0 to 1 inclusive.

    """
    itemlist = []
    pos = start
    if pos >= len(s):
        raise ParseError('Starting position is beyond the end of the string',s,pos)
    item = None
    while pos < len(s):
        item, k = item_parser(s, pos)
        pos += k
        while pos < len(s) and s[pos] in LWS:
            pos += 1
        if pos >= len(s) or s[pos] in ',;':
            itemparms, qvalue, acptparms = [], None, []
            if pos < len(s) and s[pos] == ';':
                pos += 1
                while pos < len(s) and s[pos] in LWS:
                    pos += 1
                parmlist, k = parse_parameter_list(s, pos)
                for p, v in parmlist:
                    if p == 'q' and qvalue is None:
                        try:
                            qvalue = float(v)
                        except ValueError:
                            raise ParseError('qvalue must be a floating point number',s,pos)
                        if qvalue < 0 or qvalue > 1:
                            raise ParseError('qvalue must be between 0 and 1, inclusive',s,pos)
                    elif qvalue is None:
                        itemparms.append( (p,v) )
                    else:
                        acptparms.append( (p,v) )
                pos += k
            if item:
                # Add the item to the list
                if qvalue is None:
                    qvalue = 1
                itemlist.append( (item, itemparms, qvalue, acptparms) )
                item = None
            # skip commas
            while pos < len(s) and s[pos] == ',':
                pos += 1
                while pos < len(s) and s[pos] in LWS:
                    pos += 1
        else:
            break
    return itemlist, pos - start
def parse_accept_header( header_value ):
    """Parses the Accept: header.

    The value of the header as a string should be passed in; without
    the header name itself.

    This will parse the value of any of the HTTP headers "Accept",
    "Accept-Charset", "Accept-Encoding", or "Accept-Language".  These
    headers are similarly formatted, in that they are a list of items
    with associated quality factors.  The quality factor, or qvalue,
    is a number in the range [0.0..1.0] which indicates the relative
    preference of each item.

    This function returns a list of those items, sorted by preference
    (from most-preferred to least-preferred).  Each item in the returned
    list is actually a tuple consisting of:

       ( item_name, item_parms, qvalue, accept_parms )

    As an example, the following string,
        text/plain; charset="utf-8"; q=.5; columns=80
    would be parsed into this resulting tuple,
        ( 'text/plain', [('charset','utf-8')], 0.5, [('columns','80')] )

    The value of the returned item_name depends upon which header is
    being parsed, but for example it may be a MIME content or media
    type (without parameters), a language tag, or so on.  Any optional
    parameters (delimited by semicolons) occurring before the "q="
    attribute will be in the item_parms list as (attribute,value)
    tuples in the same order as they appear in the header.  Any quoted
    values will have been unquoted and unescaped.

    The qvalue is a floating point number in the inclusive range 0.0
    to 1.0, and roughly indicates the preference for this item.
    Values outside this range will be capped to the closest extreme.

         (!) Note that a qvalue of 0 indicates that the item is
         explicitly NOT acceptable to the user agent, and should be
         handled differently by the caller.

    The accept_parms, like the item_parms, is a list of any attributes
    occurring after the "q=" attribute, and will be in the list as
    (attribute,value) tuples in the same order as they occur.
    Usually accept_parms will be an empty list, as the HTTP spec
    allows these extra parameters in the syntax but does not
    currently define any possible values.

    All empty items will be removed from the list.  However, duplicate
    or conflicting values are not detected or handled in any way by
    this function.
    """
    def parse_mt_only(s, start):
        mt, k = parse_media_type(s, start, with_parameters=False)
        ct = content_type()
        ct.major = mt[0]
        ct.minor = mt[1]
        return ct, k

    alist, k = parse_qvalue_accept_list( header_value, item_parser=parse_mt_only )
    if k < len(header_value):
        raise ParseError('Accept header is invalid',header_value,k)

    ctlist = []
    for ct, ctparms, q, acptparms in alist:
        if ctparms:
            ct.set_parameters( dict(ctparms) )
        ctlist.append( (ct, q, acptparms) )
    return ctlist
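The qvalue mechanics described above can be illustrated with a deliberately simplified sketch (hypothetical helper, not this module's parser): split on commas, pull out an optional `q=` parameter, cap it to [0, 1], default it to 1, and sort most-preferred first.

```python
def parse_accept_sketch(header_value):
    """Rough sketch of Accept-style parsing: returns (item, qvalue)
    pairs sorted most-preferred first.  Ignores extension parameters
    and quoted strings, which the full parser above handles."""
    items = []
    for element in header_value.split(','):
        parts = [p.strip() for p in element.split(';')]
        item, q = parts[0], 1.0  # a missing qvalue defaults to 1
        for parm in parts[1:]:
            name, _, value = parm.partition('=')
            if name.strip() == 'q':
                # cap out-of-range qvalues to the closest extreme
                q = max(0.0, min(1.0, float(value)))
        if item:
            items.append((item, q))
    # sort() is stable, so items with equal qvalues keep header order
    items.sort(key=lambda pair: pair[1], reverse=True)
    return items
```

So `'text/plain; q=.5, text/html'` yields `[('text/html', 1.0), ('text/plain', 0.5)]`: text/html wins because its implicit qvalue is 1.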
def parse_media_type(media_type, start=0, with_parameters=True):
    """Parses a media type (MIME type) designator into its parts.

    Given a media type string, returns a nested tuple of its parts.

        ((major,minor,parmlist), chars_consumed)

    where parmlist is a list of tuples of (parm_name, parm_value).
    Quoted-values are appropriately unquoted and unescaped.

    If 'with_parameters' is False, then parsing will stop immediately
    after the minor media type; and will not proceed to parse any
    of the semicolon-separated parameters.

    Examples:
        image/png -> (('image','png',[]), 9)
        text/plain; charset="utf-16be"
                  -> (('text','plain',[('charset','utf-16be')]), 30)

    """
    s = media_type
    pos = start
    ctmaj, k = parse_token(s, pos)
    if k == 0:
        raise ParseError('Media type must be of the form "major/minor".', s, pos)
    pos += k
    if pos >= len(s) or s[pos] != '/':
        raise ParseError('Media type must be of the form "major/minor".', s, pos)
    pos += 1
    ctmin, k = parse_token(s, pos)
    if k == 0:
        raise ParseError('Media type must be of the form "major/minor".', s, pos)
    pos += k
    if with_parameters:
        parmlist, k = parse_parameter_list(s, pos)
        pos += k
    else:
        parmlist = []
    return ((ctmaj, ctmin, parmlist), pos - start)
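A simplified standalone version of this parse (hypothetical helper): split off the `major/minor` pair, then collect `name=value` parameters. Unlike the real function it splits naively on `;` and only strips surrounding quotes, without unescaping quoted-pairs or handling `;` inside quoted values.

```python
def parse_media_type_sketch(s):
    """Split a media type like 'text/plain; charset="utf-16be"' into
    (major, minor, [(name, value), ...]).  A simplified sketch: it
    strips surrounding quotes but does not unescape quoted-pairs."""
    fields = s.split(';')
    major, sep, minor = fields[0].strip().partition('/')
    if not sep or not major or not minor:
        raise ValueError('media type must be of the form "major/minor"')
    parms = []
    for field in fields[1:]:
        name, sep, value = field.strip().partition('=')
        if sep:
            parms.append((name.strip(), value.strip().strip('"')))
    # major and minor types are case-insensitive per RFC 2046
    return (major.lower(), minor.lower(), parms)
```

This reproduces the docstring's examples: `'image/png'` gives `('image', 'png', [])`, and the charset parameter of `'text/plain; charset="utf-16be"'` comes back unquoted.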
def parse_parameter_list(s, start=0):
    """Parses a semicolon-separated 'parameter=value' list.

    Returns a tuple (parmlist, chars_consumed), where parmlist
    is a list of tuples (parm_name, parm_value).

    The parameter values will be unquoted and unescaped as needed.

    Empty parameters (as in ";;") are skipped, as is insignificant
    white space.  The list returned is kept in the same order as the
    parameters appear in the string.

    """
    pos = start
    parmlist = []
    while pos < len(s):
        while pos < len(s) and s[pos] in LWS:
            pos += 1  # skip whitespace
        if pos < len(s) and s[pos] == ';':
            pos += 1
            while pos < len(s) and s[pos] in LWS:
                pos += 1  # skip whitespace
        if pos >= len(s):
            break
        parmname, k = parse_token(s, pos)
        if parmname:
            pos += k
            while pos < len(s) and s[pos] in LWS:
                pos += 1  # skip whitespace
            if not (pos < len(s) and s[pos] == '='):
                raise ParseError('Expected an "=" after parameter name', s, pos)
            pos += 1
            while pos < len(s) and s[pos] in LWS:
                pos += 1  # skip whitespace
            parmval, k = parse_token_or_quoted_string( s, pos )
            pos += k
            parmlist.append( (parmname, parmval) )
        else:
            break
    return parmlist, pos - start
class content_type(object):
    """This class represents a media type (aka a MIME content type), including parameters.

    You initialize these by passing in a content-type declaration
    string, such as "text/plain; charset=ascii", to the constructor or
    to the set() method.  If you provide no string value, the object
    returned will represent the wildcard */* content type.

    Normally you will get the value back by using str(), or optionally
    you can access the components via the 'major', 'minor', 'media_type',
    or 'parmdict' members.

    """
    def __init__(self, content_type_string=None, with_parameters=True):
        """Create a new content_type object.

        See the set() method for a description of the arguments.
        """
        if content_type_string:
            self.set( content_type_string, with_parameters=with_parameters )
        else:
            self.set( '*/*' )

    def set_parameters(self, parameter_list_or_dict):
        """Sets the optional parameters based upon the parameter list.

        The parameter list should be a semicolon-separated name=value
        string, or a dictionary of names to values.  Any parameters
        which already exist on this object will be deleted, unless they
        appear in the given parameter list.

        """
        if isinstance(parameter_list_or_dict, dict):
            # already a dictionary
            pl = parameter_list_or_dict
        else:
            pl, k = parse_parameter_list(parameter_list_or_dict)
            if k < len(parameter_list_or_dict):
                raise ParseError('Invalid parameter list', parameter_list_or_dict, k)
        self.parmdict = dict(pl)

    def set(self, content_type_string, with_parameters=True):
        """Parses the content type string and sets this object to its value.

        For a more complete description of the arguments, see the
        documentation for the parse_media_type() function in this module.
        """
        mt, k = parse_media_type( content_type_string, with_parameters=with_parameters )
        if k < len(content_type_string):
            raise ParseError('Not a valid content type',content_type_string, k)
        major, minor, pdict = mt
        self._set_major( major )
        self._set_minor( minor )
        self.parmdict = dict(pdict)

    def _get_major(self):
        return self._major
    def _set_major(self, s):
        s = s.lower()  # case-insensitive
        if not is_token(s):
            raise ValueError('Major media type contains an invalid character')
        self._major = s

    def _get_minor(self):
        return self._minor
    def _set_minor(self, s):
        s = s.lower()  # case-insensitive
        if not is_token(s):
            raise ValueError('Minor media type contains an invalid character')
        self._minor = s

    major = property(_get_major, _set_major, doc="Major media classification")
    minor = property(_get_minor, _set_minor, doc="Minor media sub-classification")

    def __str__(self):
        """String value."""
        s = '%s/%s' % (self.major, self.minor)
        if self.parmdict:
            extra = '; '.join([ '%s=%s' % (a[0],quote_string(a[1],False)) for a in self.parmdict.items()])
            s += '; ' + extra
        return s

    def __unicode__(self):
        """Unicode string value."""
        # In Python 3 this is probably unnecessary in general, this is just to avoid possible syntax issues.  I.H.
        return str(self.__str__())

    def __repr__(self):
        """Python representation of this object."""
        s = '%s(%s)' % (self.__class__.__name__, repr(self.__str__()))
        return s

    def __hash__(self):
        """Hash this object; the hash is dependent only upon the value."""
        return hash(str(self))

    def __getstate__(self):
        """Pickler"""
        return str(self)

    def __setstate__(self, state):
        """Unpickler"""
        self.set(state)

    def __len__(self):
        """Logical length of this media type.

        For example:
           len('*/*')  -> 0
           len('image/*')  -> 1
           len('image/png')  -> 2
           len('text/plain; charset=utf-8')  -> 3
           len('text/plain; charset=utf-8; filename=xyz.txt')  -> 4

        """
        if self.major == '*':
            return 0
        elif self.minor == '*':
            return 1
        else:
            return 2 + len(self.parmdict)

    def __eq__(self, other):
        """Equality test.

        Note that this is an exact match, including any parameters if any.
        """
        return self.major == other.major and \
               self.minor == other.minor and \
               self.parmdict == other.parmdict

    def __ne__(self, other):
        """Inequality test."""
        return not self.__eq__(other)

    def _get_media_type(self):
        """Returns the media 'type/subtype' string, without parameters."""
        return '%s/%s' % (self.major, self.minor)

    media_type = property(_get_media_type, doc="Returns just the media type 'type/subtype' without any parameters (read-only).")

    def is_wildcard(self):
        """Returns True if this is a 'something/*' media type.
        """
        return self.minor == '*'

    def is_universal_wildcard(self):
        """Returns True if this is the unspecified '*/*' media type.
        """
        return self.major == '*' and self.minor == '*'

    def is_composite(self):
        """Is this media type composed of multiple parts.
        """
        return self.major == 'multipart' or self.major == 'message'

    def is_xml(self):
        """Returns True if this media type is XML-based.

        Note this does not consider text/html to be XML, but
        application/xhtml+xml is.
        """
        return self.minor == 'xml' or self.minor.endswith('+xml')
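The "logical length" that content_type.__len__() computes is effectively a specificity score used to rank matches: a bare wildcard is least specific, a partial wildcard more so, and each parameter adds one. The same rule as a standalone function (hypothetical helper mirroring __len__, not part of this module):

```python
def media_type_specificity(major, minor, parms):
    """How specific a media type is, mirroring content_type.__len__():
    '*/*' -> 0, 'image/*' -> 1, 'image/png' -> 2, plus one per
    parameter (so 'text/plain; charset=utf-8' -> 3)."""
    if major == '*':
        return 0  # universal wildcard matches anything
    if minor == '*':
        return 1  # partial wildcard, e.g. image/*
    return 2 + len(parms)
```

Sorting candidate types by this score descending puts the most specific match first, which is the ordering content negotiation prefers when qvalues tie.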
def acceptable_content_type( accept_header, content_types, ignore_wildcard=True ):
    """Determines if the given content type is acceptable to the user agent.

    The accept_header should be the value present in the HTTP
    "Accept:" header.  In mod_python this is typically obtained from
    the req.http_headers_in table; in WSGI it is environ["Accept"];
    other web frameworks may provide other methods of obtaining it.

    Optionally the accept_header parameter can be pre-parsed, as
    returned from the parse_accept_header() function in this module.

    The content_types argument should either be a single MIME media
    type string, or a sequence of them.  It represents the set of
    content types that the caller (server) is willing to send.
    Generally, the server content_types should not contain any
    wildcarded values.

    This function determines which content type is the most preferred
    and is acceptable to both the user agent and the server.  If one
    is negotiated it will return a four-valued tuple like:

        (server_content_type, ua_content_type, qvalue, accept_parms)

    The first tuple value is one of the server's content_types, while
    the remaining tuple values describe which of the client's
    acceptable content_types was matched.  In most cases accept_parms
    will be an empty list (see description of parse_accept_header()
    for more details).

    If no content type could be negotiated, then this function will
    return None (and the caller should typically cause an HTTP 406 Not
    Acceptable as a response).

    Note that the wildcarded content type "*/*" sent by the client
    will be ignored, since it is often incorrectly sent by web
    browsers that don't really mean it.  To override this, call with
    ignore_wildcard=False.  Partial wildcards such as "image/*" will
    always be processed, but be at a lower priority than a complete
    matching type.

    See also: RFC 2616 section 14.1, and
    <http://www.iana.org/assignments/media-types/>

    """
    if _is_string(accept_header):
        accept_list = parse_accept_header(accept_header)
    else:
        accept_list = accept_header

    if _is_string(content_types):
        content_types = [content_types]

    server_ctlist = [content_type(ct) for ct in content_types]

    #print 'AC', repr(accept_list)
    #print 'SV', repr(server_ctlist)

    best = None   # (content_type, qvalue, accept_parms, matchlen)

    for server_ct in server_ctlist:
        best_for_this = None
        for client_ct, qvalue, aargs in accept_list:
            if ignore_wildcard and client_ct.is_universal_wildcard():
                continue  # */* being ignored

            matchlen = 0  # how specifically this one matches (0 is a non-match)
            if client_ct.is_universal_wildcard():
                matchlen = 1  # */* is a 1
            elif client_ct.major == server_ct.major:
                if client_ct.minor == '*':  # something/* is a 2
                    matchlen = 2
                elif client_ct.minor == server_ct.minor:  # something/something is a 3
                    matchlen = 3
                    # must make sure all the parms match too
                    for pname, pval in client_ct.parmdict.items():
                        sval = server_ct.parmdict.get(pname)
                        if pname == 'charset':
                            # special case for charset to match aliases
                            pval = canonical_charset(pval)
                            sval = canonical_charset(sval)
                        if sval == pval:
                            matchlen = matchlen + 1
                        else:
                            matchlen = 0
                            break
                else:
                    matchlen = 0

            #print 'S',server_ct,' C',client_ct,' M',matchlen,'Q',qvalue
            if matchlen > 0:
                if not best_for_this \
                       or matchlen > best_for_this[-1] \
                       or (matchlen == best_for_this[-1] and qvalue > best_for_this[2]):
                    # This match is better
                    best_for_this = (server_ct, client_ct, qvalue, aargs, matchlen)
                    #print 'BEST2 NOW', repr(best_for_this)
        if not best or \
               (best_for_this and best_for_this[2] > best[2]):
            best = best_for_this
best_for_this 1531 #print 'BEST NOW', repr(best) 1532 if not best or best[1] <= 0: 1533 return None 1534 return best[:-1]
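The match-specificity ranking used above (exact match beats a subtype wildcard, which beats "*/*") can be illustrated with a small standalone sketch. This is a simplified, hypothetical helper for illustration only, not the module's implementation; it skips media-type parameters entirely:

```python
# Simplified sketch of the ranking used by acceptable_content_type():
# an exact "type/subtype" match scores 3, "type/*" scores 2, and
# "*/*" scores 1 (and is skipped entirely when ignore_wildcard=True).

def match_specificity(client_ct, server_ct, ignore_wildcard=True):
    """Return 0 for no match, else 1-3 by increasing specificity."""
    c_major, c_minor = client_ct.split('/')
    s_major, s_minor = server_ct.split('/')
    if c_major == '*' and c_minor == '*':
        return 0 if ignore_wildcard else 1
    if c_major != s_major:
        return 0
    if c_minor == '*':
        return 2
    return 3 if c_minor == s_minor else 0

print(match_specificity('text/html', 'text/html'))  # 3 (exact)
print(match_specificity('text/*', 'text/html'))     # 2 (subtype wildcard)
print(match_specificity('*/*', 'text/html'))        # 0 (ignored by default)
```

With several server types on offer, the negotiated winner is the pairing with the highest specificity, ties broken by qvalue, just as the loop above does.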
def canonical_charset( charset ):
    """Returns the canonical or preferred name of a charset.

    Additional character sets can be recognized by this function by
    altering the character_set_aliases dictionary in this module.
    Charsets which are not recognized are simply converted to
    upper-case (as charset names are always case-insensitive).

    See <http://www.iana.org/assignments/character-sets>.

    """
    # It would be nice to use Python's codecs module for this, but
    # there is no fixed public interface to its alias mappings.
    if not charset:
        return charset
    uc = charset.upper()
    uccon = character_set_aliases.get( uc, uc )
    return uccon
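The canonicalization amounts to an upper-case normalization followed by an alias-table lookup. A minimal standalone sketch, where the ALIASES dictionary below is purely illustrative (the module's real table is character_set_aliases):

```python
# Sketch of canonical_charset(): upper-case the name, then consult an
# alias table. These alias entries are illustrative examples only.
ALIASES = {'LATIN-1': 'ISO-8859-1', 'UTF8': 'UTF-8'}

def canonical(charset):
    if not charset:
        return charset
    uc = charset.upper()
    return ALIASES.get(uc, uc)

print(canonical('latin-1'))  # ISO-8859-1
print(canonical('utf8'))     # UTF-8
print(canonical('koi8-r'))   # KOI8-R (unknown names are just upper-cased)
```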
def acceptable_charset( accept_charset_header, charsets, ignore_wildcard=True, default='ISO-8859-1' ):
    """Determines if the given charset is acceptable to the user agent.

    The accept_charset_header should be the value present in the HTTP
    "Accept-Charset:" header.  In mod_python this is typically
    obtained from the req.http_headers table; in WSGI it is
    environ["Accept-Charset"]; other web frameworks may provide other
    methods of obtaining it.

    Optionally the accept_charset_header parameter can instead be the
    list returned from the parse_accept_header() function in this
    module.

    The charsets argument should either be a charset identifier string,
    or a sequence of them.

    This function returns the charset identifier string which is the
    most preferred and is acceptable to both the user agent and the
    caller.  It will return the default value if no charset is negotiable.

    Note that the wildcarded charset "*" will be ignored.  To override
    this, call with ignore_wildcard=False.

    See also: RFC 2616 section 14.2, and
    <http://www.iana.org/assignments/character-sets>

    """
    if default:
        default = canonical_charset(default)

    if _is_string(accept_charset_header):
        accept_list = parse_accept_header(accept_charset_header)
    else:
        accept_list = accept_charset_header

    if _is_string(charsets):
        charsets = [canonical_charset(charsets)]
    else:
        charsets = [canonical_charset(c) for c in charsets]

    # Note per the RFC that 'ISO-8859-1' is special, and is implicitly in
    # the accept list with q=1, unless it is already in the list or '*'
    # is in the list.

    best = None
    for c, qvalue, _junk in accept_list:
        if c == '*':
            default = None
            if ignore_wildcard:
                continue
            if not best or qvalue > best[1]:
                best = (c, qvalue)
        else:
            c = canonical_charset(c)
            for test_c in charsets:
                if c == default:
                    default = None
                if c == test_c and (not best or best[0] == '*' or qvalue > best[1]):
                    best = (c, qvalue)
    if default and default in [test_c.upper() for test_c in charsets]:
        best = (default, 1)
    if best and best[0] == '*':
        best = (charsets[0], best[1])
    return best
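The implicit ISO-8859-1 rule is the subtle part of the function above. The standalone sketch below illustrates just that rule; pick_charset is a hypothetical helper that takes a pre-parsed list of (charset, qvalue) pairs rather than a raw header, and it omits the wildcard-substitution step:

```python
# Sketch of Accept-Charset selection: choose the server charset with
# the highest client qvalue. Per RFC 2616 sec. 14.2, ISO-8859-1 is
# implicitly acceptable (q=1) unless the client listed it or sent "*".
# Illustrative only -- the module's acceptable_charset() is the real thing.

def pick_charset(accept, server_charsets, default='ISO-8859-1'):
    accept = {c.upper(): q for c, q in accept}
    server_charsets = [c.upper() for c in server_charsets]
    if default and default not in accept and '*' not in accept:
        accept[default] = 1.0  # the implicit ISO-8859-1 rule
    best = None
    for c in server_charsets:
        q = accept.get(c, accept.get('*', 0.0))
        if q > 0 and (best is None or q > best[1]):
            best = (c, q)
    return best

# The client never mentioned ISO-8859-1, so it is implicitly q=1 and
# beats the explicitly listed iso-8859-5 at q=0.5:
print(pick_charset([('utf-8', 1.0), ('iso-8859-5', 0.5)],
                   ['ISO-8859-5', 'ISO-8859-1']))  # ('ISO-8859-1', 1.0)
```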
class language_tag(object):
    """This class represents an RFC 3066 language tag.

    Initialize objects of this class with a single string representing
    the language tag, such as "en-US".

    Case is insensitive.  Wildcarded subtags are ignored or stripped as
    they have no significance, so that "en-*" is the same as "en".
    However the universal wildcard "*" language tag is kept as-is.

    Note that although relational operators such as < are defined,
    they only form a partial order based upon specialization.

    Thus for example,
          "en" <= "en-US"
    but,
          not "en" <= "de", and
          not "de" <= "en".

    """

    def __init__(self, tagname):
        """Initialize objects of this class with a single string representing
        the language tag, such as "en-US".  Case is insensitive.

        """
        self.parts = tagname.lower().split('-')
        while len(self.parts) > 1 and self.parts[-1] == '*':
            del self.parts[-1]

    def __len__(self):
        """Number of subtags in this tag."""
        if len(self.parts) == 1 and self.parts[0] == '*':
            return 0
        return len(self.parts)

    def __str__(self):
        """The standard string form of this language tag."""
        a = []
        if len(self.parts) >= 1:
            a.append(self.parts[0])
        if len(self.parts) >= 2:
            if len(self.parts[1]) == 2:
                a.append( self.parts[1].upper() )
            else:
                a.append( self.parts[1] )
        a.extend( self.parts[2:] )
        return '-'.join(a)

    def __unicode__(self):
        """The unicode string form of this language tag."""
        return str(self.__str__())

    def __repr__(self):
        """The python representation of this language tag."""
        s = '%s("%s")' % (self.__class__.__name__, self.__str__())
        return s

    def superior(self):
        """Returns another instance of language_tag which is the superior.

        Thus en-US gives en, and en gives *.

        """
        if len(self) <= 1:
            return self.__class__('*')
        return self.__class__( '-'.join(self.parts[:-1]) )

    def all_superiors(self, include_wildcard=False):
        """Returns a list of this language and all its superiors.

        If include_wildcard is False, then "*" will not be among the
        output list, unless this language is itself "*".

        """
        langlist = [ self ]
        l = self
        while not l.is_universal_wildcard():
            l = l.superior()
            if l.is_universal_wildcard() and not include_wildcard:
                continue
            langlist.append(l)
        return langlist

    def is_universal_wildcard(self):
        """Returns True if this language tag represents all possible
        languages, by using the reserved tag of "*".

        """
        return len(self.parts) == 1 and self.parts[0] == '*'

    def dialect_of(self, other, ignore_wildcard=True):
        """Is this language a dialect (or subset/specialization) of another.

        This method returns True if this language is the same as or a
        specialization (dialect) of the other language_tag.

        If ignore_wildcard is False, then all languages will be
        considered to be a dialect of the special language tag of "*".

        """
        if not ignore_wildcard and self.is_universal_wildcard():
            return True
        for i in range( min(len(self), len(other)) ):
            if self.parts[i] != other.parts[i]:
                return False
        if len(self) >= len(other):
            return True
        return False

    def __eq__(self, other):
        """== operator.  Are the two languages the same?"""
        return self.parts == other.parts

    def __ne__(self, other):
        """!= operator.  Are the two languages different?"""
        return not self.__eq__(other)

    def __lt__(self, other):
        """< operator.  Returns True if the other language is a more
        specialized dialect of this one."""
        return other.dialect_of(self) and self != other

    def __le__(self, other):
        """<= operator.  Returns True if the other language is the same
        as or a more specialized dialect of this one."""
        return other.dialect_of(self)

    def __gt__(self, other):
        """> operator.  Returns True if this language is a more
        specialized dialect of the other one."""
        return self.dialect_of(other) and self != other

    def __ge__(self, other):
        """>= operator.  Returns True if this language is the same as
        or a more specialized dialect of the other one."""
        return self.dialect_of(other)
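The specialization order that dialect_of() and the comparison operators implement can be sketched standalone on plain strings: tag A is at or below tag B exactly when B's subtags are a prefix of A's. A simplified illustration (it ignores the "*" wildcard handling the class provides):

```python
# Standalone sketch of the specialization (partial) order: "en-US" is
# a dialect of "en", but "de" and "en" are incomparable. The class
# above implements this via dialect_of() and the rich comparisons.

def subtags(tag):
    return tag.lower().split('-')

def dialect_of(a, b):
    """True if tag a is the same as, or a specialization of, tag b."""
    pa, pb = subtags(a), subtags(b)
    return len(pa) >= len(pb) and pa[:len(pb)] == pb

print(dialect_of('en-US', 'en'))  # True
print(dialect_of('en', 'en-US'))  # False
print(dialect_of('de', 'en'))     # False: neither is below the other
```

Because neither "de" <= "en" nor "en" <= "de" holds, the operators form only a partial order, as the class docstring warns.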
def parse_accept_language_header( header_value ):
    """Parses the Accept-Language header.

    Returns a list of tuples, each like:

       (language_tag, qvalue, accept_parameters)

    """
    alist, k = parse_qvalue_accept_list( header_value )
    if k < len(header_value):
        raise ParseError('Accept-Language header is invalid', header_value, k)

    langlist = []
    for token, langparms, q, acptparms in alist:
        if langparms:
            raise ParseError('Language tag may not have any parameters', header_value, 0)
        lang = language_tag( token )
        langlist.append( (lang, q, acptparms) )

    return langlist
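The shape of the header this parses is a comma-separated list of tags, each with an optional ";q=" parameter defaulting to 1.0. A rough standalone sketch of that grammar (the module's parse_accept_language_header() is stricter: it validates syntax, rejects per-tag parameters, and returns language_tag instances):

```python
# Rough sketch of Accept-Language syntax: comma-separated tags, each
# optionally followed by ";q=<qvalue>" (default 1.0). Illustrative only.

def parse_accept_language(value):
    result = []
    for item in value.split(','):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(';')
        q = 1.0
        for p in params.split(';'):
            name, _, val = p.partition('=')
            if name.strip().lower() == 'q':
                q = float(val)
        result.append((tag.strip(), q))
    return result

print(parse_accept_language('da, en-GB;q=0.8, en;q=0.7'))
# [('da', 1.0), ('en-GB', 0.8), ('en', 0.7)]
```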
def acceptable_language( accept_header, server_languages, ignore_wildcard=True, assume_superiors=True ):
    """Determines if the given language is acceptable to the user agent.

    The accept_header should be the value present in the HTTP
    "Accept-Language:" header.  In mod_python this is typically
    obtained from the req.http_headers_in table; in WSGI it is
    environ["Accept-Language"]; other web frameworks may provide other
    methods of obtaining it.

    Optionally the accept_header parameter can be pre-parsed, as
    returned by the parse_accept_language_header() function defined in
    this module.

    The server_languages argument should either be a single language
    string, a language_tag object, or a sequence of them.  It
    represents the set of languages that the server is willing to
    send to the user agent.

    Note that the wildcarded language tag "*" will be ignored.  To
    override this, call with ignore_wildcard=False, and even then
    it will be the lowest-priority choice regardless of its
    quality factor (as per the HTTP spec).

    If assume_superiors is True, then the languages that the browser
    accepts will automatically include all superior languages.  Any
    superior languages which must be added are done so with one half
    the qvalue of the language which is present.  For example, if the
    accept string is "en-US", then it will be treated as if it were
    "en-US, en;q=0.5".  Note that although the HTTP 1.1 spec says
    that browsers are supposed to encourage users to configure all
    acceptable languages, sometimes they don't; this option lets the
    function compensate for that.  Setting assume_superiors to False
    will ensure strict adherence to the HTTP 1.1 spec, which means
    that if the browser accepts "en-US", then it will not be
    acceptable to send just "en" to it.

    This function returns the language which is the most preferred
    and is acceptable to both the user agent and the caller.  It will
    return None if no language is negotiable, otherwise the return
    value is always an instance of language_tag.

    See also: RFC 3066 <http://www.ietf.org/rfc/rfc3066.txt>, and
    ISO 639, links at <http://en.wikipedia.org/wiki/ISO_639>, and
    <http://www.iana.org/assignments/language-tags>.

    """
    # Note special instructions from RFC 2616 sect. 14.1:
    #   "The language quality factor assigned to a language-tag by the
    #   Accept-Language field is the quality value of the longest
    #   language-range in the field that matches the language-tag."

    if _is_string(accept_header):
        accept_list = parse_accept_language_header(accept_header)
    else:
        accept_list = accept_header

    # Possibly add in any "missing" languages that the browser may
    # have forgotten to include in the list.  Ensure the list is
    # sorted so more general languages come before more specific ones.

    accept_list.sort()
    all_tags = [a[0] for a in accept_list]
    if assume_superiors:
        to_add = []
        for langtag, qvalue, _args in accept_list:
            if len(langtag) >= 2:
                for suptag in langtag.all_superiors( include_wildcard=False ):
                    if suptag not in all_tags:
                        # Add in superior at half the qvalue
                        to_add.append( (suptag, qvalue / 2, '') )
                        all_tags.append( suptag )
        accept_list.extend( to_add )

    # Convert server_languages to a list of language_tags
    if _is_string(server_languages):
        server_languages = [language_tag(server_languages)]
    elif isinstance(server_languages, language_tag):
        server_languages = [server_languages]
    else:
        server_languages = [language_tag(lang) for lang in server_languages]

    # Select the best one
    best = None  # tuple (langtag, qvalue, matchlen)

    for langtag, qvalue, _args in accept_list:
        # aargs is ignored for Accept-Language
        if qvalue <= 0:
            continue  # UA doesn't accept this language

        if ignore_wildcard and langtag.is_universal_wildcard():
            continue  # "*" being ignored

        for svrlang in server_languages:
            # The best match is determined first by the quality factor,
            # and then by the most specific match.

            matchlen = -1  # how specifically this one matches (0 is a non-match)
            if svrlang.dialect_of( langtag, ignore_wildcard=ignore_wildcard ):
                matchlen = len(langtag)
                if not best \
                       or matchlen > best[2] \
                       or (matchlen == best[2] and qvalue > best[1]):
                    # This match is better
                    best = (langtag, qvalue, matchlen)
    if not best:
        return None
    return best[0]
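The assume_superiors expansion is worth seeing in isolation: every superior of an accepted tag is grafted onto the accept list at half that tag's qvalue, unless already present. A standalone sketch on plain (tag, qvalue) pairs, purely illustrative (the function above does this with language_tag.all_superiors()):

```python
# Sketch of the assume_superiors step: "en-US" at q=1.0 implicitly
# also accepts "en" at q=0.5, unless "en" was listed explicitly.

def add_superiors(accept):
    seen = {tag for tag, _ in accept}
    out = list(accept)
    for tag, q in accept:
        parts = tag.split('-')
        while len(parts) > 1:
            parts = parts[:-1]          # drop the most specific subtag
            sup = '-'.join(parts)
            if sup not in seen:
                seen.add(sup)
                out.append((sup, q / 2))  # superior at half the qvalue
    return out

print(add_superiors([('en-US', 1.0)]))  # [('en-US', 1.0), ('en', 0.5)]
```

This mirrors the docstring's example: an Accept-Language of "en-US" is treated as "en-US, en;q=0.5".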