pyRdfa.extras.httpheader

Utility functions to work with HTTP headers.

This module provides utility functions for parsing and handling several HTTP 1.1 protocol headers that are not adequately covered by the standard Python libraries.

Requires Python 2.2 or later.

The functionality includes the correct interpretation of the various Accept-* style headers, content negotiation, byte range requests, HTTP-style date/times, and more.

There are a few classes defined by this module:

  • class content_type -- media types such as 'text/plain'
  • class language_tag -- language tags such as 'en-US'
  • class range_set -- a collection of (byte) range specifiers
  • class range_spec -- a single (byte) range specifier

The primary functions in this module may be categorized as follows:

  • Content negotiation functions...

    • acceptable_content_type()
    • acceptable_language()
    • acceptable_charset()
    • acceptable_encoding()
  • Mid-level header parsing functions...

    • parse_accept_header()
    • parse_accept_language_header()
    • parse_range_header()
  • Date and time...

    • http_datetime()
    • parse_http_datetime()
  • Utility functions...

    • quote_string()
    • remove_comments()
    • canonical_charset()
  • Low level string parsing functions...

    • parse_comma_list()
    • parse_comment()
    • parse_qvalue_accept_list()
    • parse_media_type()
    • parse_number()
    • parse_parameter_list()
    • parse_quoted_string()
    • parse_range_set()
    • parse_range_spec()
    • parse_token()
    • parse_token_or_quoted_string()
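As a point of reference for the date/time functions above, the standard library's email.utils offers rough counterparts (a sketch using the stdlib, not this module's API; parse_http_datetime() additionally accepts the obsolete RFC 1036 and asctime() forms):

```python
import datetime
from email.utils import format_datetime, parsedate_to_datetime

# Format an aware UTC datetime in the RFC 1123 form HTTP requires,
# then parse it back.  This mirrors the http_datetime() /
# parse_http_datetime() round trip, though details differ.
dt = datetime.datetime(1999, 6, 15, 12, 0, 0, tzinfo=datetime.timezone.utc)
stamp = format_datetime(dt, usegmt=True)   # RFC 1123 string ending in 'GMT'
parsed = parsedate_to_datetime(stamp)      # back to an aware UTC datetime
```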

And there are some specialized exception classes:

  • RangeUnsatisfiableError
  • RangeUnmergableError
  • ParseError

See also:

  • RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", June 1999. <http://www.ietf.org/rfc/rfc2616.txt> (errata at <http://purl.org/NET/http-errata>)
  • RFC 2046, "(MIME) Part Two: Media Types", November 1996. <http://www.ietf.org/rfc/rfc2046.txt>
  • RFC 3066, "Tags for the Identification of Languages", January 2001. <http://www.ietf.org/rfc/rfc3066.txt>

   1#!/usr/bin/env python
   2# -*- coding: utf-8 -*-
   3#
   4""" Utility functions to work with HTTP headers.
   5
   6 This module provides some utility functions useful for parsing
   7 and dealing with some of the HTTP 1.1 protocol headers which
   8 are not adequately covered by the standard Python libraries.
   9
  10 Requires Python 2.2 or later.
  11
  12 The functionality includes the correct interpretation of the various
  13 Accept-* style headers, content negotiation, byte range requests,
  14 HTTP-style date/times, and more.
  15
  16 There are a few classes defined by this module:
  17
  18   * class content_type   -- media types such as 'text/plain'
  19   * class language_tag   -- language tags such as 'en-US'
  20   * class range_set      -- a collection of (byte) range specifiers
  21   * class range_spec     -- a single (byte) range specifier
  22
  23 The primary functions in this module may be categorized as follows:
  24
  25   * Content negotiation functions...
  26     * acceptable_content_type()
  27     * acceptable_language()
  28     * acceptable_charset()
  29     * acceptable_encoding()
  30
  31   * Mid-level header parsing functions...
  32     * parse_accept_header()
  33     * parse_accept_language_header()
  34     * parse_range_header()
  35 
  36   * Date and time...
  37     * http_datetime()
  38     * parse_http_datetime()
  39
  40   * Utility functions...
  41     * quote_string()
  42     * remove_comments()
  43     * canonical_charset()
  44
  45   * Low level string parsing functions...
  46     * parse_comma_list()
  47     * parse_comment()
  48     * parse_qvalue_accept_list()
  49     * parse_media_type()
  50     * parse_number()
  51     * parse_parameter_list()
  52     * parse_quoted_string()
  53     * parse_range_set()
  54     * parse_range_spec()
  55     * parse_token()
  56     * parse_token_or_quoted_string()
  57
  58 And there are some specialized exception classes:
  59
  60   * RangeUnsatisfiableError
  61   * RangeUnmergableError
  62   * ParseError
  63
  64 See also:
  65
  66   * RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", June 1999.
  67             <http://www.ietf.org/rfc/rfc2616.txt>
  68             Errata at <http://purl.org/NET/http-errata>
  69   * RFC 2046, "(MIME) Part Two: Media Types", November 1996.
  70             <http://www.ietf.org/rfc/rfc2046.txt>
  71   * RFC 3066, "Tags for the Identification of Languages", January 2001.
  72             <http://www.ietf.org/rfc/rfc3066.txt>
  73             
  74             
  75  Note: I have made a small modification on the regexp for internet date, 
  76  to make it more liberal (ie, accept a time zone string of the form +0000)
  77  Ivan Herman <http://www.ivan-herman.net>, March 2011.
  78  
  79  Have added statements to make it (hopefully) Python 3 compatible.
  80  Ivan Herman <http://www.ivan-herman.net>, August 2012.
  81"""
  82
  83__author__ =  "Deron Meranda <http://deron.meranda.us/>"
  84__date__ =    "2012-08-31"
  85__version__ = "1.02"
  86__credits__ = """Copyright (c) 2005 Deron E. Meranda <http://deron.meranda.us/>
  87Licensed under GNU LGPL 2.1 or later.  See <http://www.fsf.org/>.
  88
  89This library is free software; you can redistribute it and/or
  90modify it under the terms of the GNU Lesser General Public
  91License as published by the Free Software Foundation; either
  92version 2.1 of the License, or (at your option) any later version.
  93
  94This library is distributed in the hope that it will be useful,
  95but WITHOUT ANY WARRANTY; without even the implied warranty of
  96MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  97Lesser General Public License for more details.
  98
  99You should have received a copy of the GNU Lesser General Public
 100License along with this library; if not, write to the Free Software
 101Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
 102"""
 103
 104# Character classes from RFC 2616 section 2.2
 105SEPARATORS = '()<>@,;:\\"/[]?={} \t'
 106LWS =        ' \t\n\r'  # linear white space
 107CRLF =       '\r\n'
 108DIGIT =      '0123456789'
 109HEX =        '0123456789ABCDEFabcdef'
 110
 111try:
 112    # Turn character classes into set types (for Python 2.4 or greater)
 113    SEPARATORS = frozenset([c for c in SEPARATORS])
 114    LWS = frozenset([c for c in LWS])
 115    CRLF = frozenset([c for c in CRLF])
 116    DIGIT = frozenset([c for c in DIGIT])
 117    HEX = frozenset([c for c in HEX])
 118    del c
 119except NameError:
 120    # on frozenset error, leave as simple strings
 121    pass
 122
 123
 124def _is_string(obj):
 125    """Returns True if the object is a string."""
 126    return isinstance(obj,str)
 127
 128
 129def http_datetime(dt=None):
 130    """Formats a datetime as an HTTP 1.1 Date/Time string.
 131
 132    Takes a standard Python datetime object and returns a string
 133    formatted according to the HTTP 1.1 date/time format.
 134
 135    If no datetime is provided (or None) then the current
 136    time is used.
 137    
 138    ABOUT TIMEZONES: If the passed in datetime object is naive it is
 139    assumed to be in UTC already.  But if it has a tzinfo component,
 140    the returned timestamp string will have been converted to UTC
 141    automatically.  So if you use timezone-aware datetimes, you need
 142    not worry about conversion to UTC.
 143
 144    """
 145    if not dt:
 146        import datetime
 147        dt = datetime.datetime.utcnow()
 148    else:
  149        try:
  150            dt = dt - dt.utcoffset()
  151        except TypeError:
  152            pass  # a naive datetime: assume it is already in UTC
 153
 154    s = dt.strftime('%a, %d %b %Y %H:%M:%S GMT')
 155    return s
 156
 157
 158def parse_http_datetime(datestring, utc_tzinfo=None, strict=False):
 159    """Returns a datetime object from an HTTP 1.1 Date/Time string.
 160
 161    Note that HTTP dates are always in UTC, so the returned datetime
 162    object will also be in UTC.
 163
 164    You can optionally pass in a tzinfo object which should represent
 165    the UTC timezone, and the returned datetime will then be
  166    timezone-aware (allowing you to more easily translate it into
  167    different timezones later).
 168
 169    If you set 'strict' to True, then only the RFC 1123 format
 170    is recognized.  Otherwise the backwards-compatible RFC 1036
 171    and Unix asctime(3) formats are also recognized.
 172    
 173    Please note that the day-of-the-week is not validated.
 174    Also two-digit years, although not HTTP 1.1 compliant, are
 175    treated according to recommended Y2K rules.
 176
 177    """
 178    import re, datetime
 179    m = re.match(r'(?P<DOW>[a-z]+), (?P<D>\d+) (?P<MON>[a-z]+) (?P<Y>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+(\.\d+)?) (?P<TZ>[a-zA-Z0-9_+]+)$',
 180                 datestring, re.IGNORECASE)
 181    if not m and not strict:
 182        m = re.match(r'(?P<DOW>[a-z]+) (?P<MON>[a-z]+) (?P<D>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+) (?P<Y>\d+)$',
 183                     datestring, re.IGNORECASE)
 184        if not m:
 185            m = re.match(r'(?P<DOW>[a-z]+), (?P<D>\d+)-(?P<MON>[a-z]+)-(?P<Y>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+(\.\d+)?) (?P<TZ>\w+)$',
 186                         datestring, re.IGNORECASE)
 187    if not m:
 188        raise ValueError('HTTP date is not correctly formatted')
 189
  190    try:
  191        tz = m.group('TZ').upper()
  192    except (IndexError, AttributeError):
  193        tz = 'GMT'  # the asctime() format carries no timezone field
  194    if tz not in ('GMT','UTC','0000','+0000','00:00'):
 195        raise ValueError('HTTP date is not in GMT timezone')
 196
 197    monname = m.group('MON').upper()
 198    mdict = {'JAN':1, 'FEB':2, 'MAR':3, 'APR':4, 'MAY':5, 'JUN':6,
 199             'JUL':7, 'AUG':8, 'SEP':9, 'OCT':10, 'NOV':11, 'DEC':12}
 200    month = mdict.get(monname)
 201    if not month:
 202        raise ValueError('HTTP date has an unrecognizable month')
 203    y = int(m.group('Y'))
 204    if y < 100:
  205        century = datetime.datetime.utcnow().year // 100  # integer division (Python 3)
 206        if y < 50:
 207            y = century * 100 + y
 208        else:
 209            y = (century - 1) * 100 + y
 210    d = int(m.group('D'))
 211    hour = int(m.group('H'))
 212    minute = int(m.group('M'))
  213    try:
  214        second = int(m.group('S'))
  215    except ValueError:
  216        second = int(float(m.group('S')))  # datetime needs an int; truncate fractions
 217    dt = datetime.datetime( y, month, d, hour, minute, second, tzinfo=utc_tzinfo )
 218    return dt
 219
 220
 221class RangeUnsatisfiableError(ValueError):
 222    """Exception class when a byte range lies outside the file size boundaries."""
 223    def __init__(self, reason=None):
 224        if not reason:
 225            reason = 'Range is unsatisfiable'
 226        ValueError.__init__(self, reason)
 227
 228
 229class RangeUnmergableError(ValueError):
 230    """Exception class when byte ranges are noncontiguous and can not be merged together."""
 231    def __init__(self, reason=None):
 232        if not reason:
 233            reason = 'Ranges can not be merged together'
 234        ValueError.__init__(self, reason)
 235
 236
 237class ParseError(ValueError):
 238    """Exception class representing a string parsing error."""
 239    def __init__(self, args, input_string, at_position):
 240        ValueError.__init__(self, args)
 241        self.input_string = input_string
 242        self.at_position = at_position
 243    def __str__(self):
 244        if self.at_position >= len(self.input_string):
  245            return '%s\n\tOccurred at end of string' % self.args[0]
  246        else:
  247            return '%s\n\tOccurred near %s' % (self.args[0], repr(self.input_string[self.at_position:self.at_position+16]))
 248
 249
 250def is_token(s):
 251    """Determines if the string is a valid token."""
 252    for c in s:
  253        if ord(c) < 32 or ord(c) > 126 or c in SEPARATORS:  # CTLs (incl. DEL) and non-ASCII are not token chars
 254            return False
 255    return True
 256
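The token rule can also be expressed as one regular expression over printable ASCII (33-126) minus the separator characters. A hypothetical stand-alone check (unlike is_token() above, it also rejects the empty string):

```python
import re

# Hypothetical stand-alone equivalent of is_token(): RFC 2616 tokens
# are non-empty runs of printable ASCII excluding the separators.
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+\Z")

def looks_like_token(s):
    return bool(TOKEN_RE.match(s))
```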
 257
 258def parse_comma_list(s, start=0, element_parser=None, min_count=0, max_count=0):
 259    """Parses a comma-separated list with optional whitespace.
 260
 261    Takes an optional callback function `element_parser`, which
 262    is assumed to be able to parse an individual element.  It
 263    will be passed the string and a `start` argument, and
 264    is expected to return a tuple (parsed_result, chars_consumed).
 265
 266    If no element_parser is given, then either single tokens or
 267    quoted strings will be parsed.
 268
 269    If min_count > 0, then at least that many non-empty elements
 270    must be in the list, or an error is raised.
 271
 272    If max_count > 0, then no more than that many non-empty elements
 273    may be in the list, or an error is raised.
 274
 275    """
 276    if min_count > 0 and start == len(s):
 277        raise ParseError('Comma-separated list must contain some elements',s,start)
 278    elif start >= len(s):
 279        raise ParseError('Starting position is beyond the end of the string',s,start)
 280
 281    if not element_parser:
 282        element_parser = parse_token_or_quoted_string
 283    results = []
 284    pos = start
 285    while pos < len(s):
 286        e = element_parser( s, pos )
 287        if not e or e[1] == 0:
 288            break # end of data?
 289        else:
 290            results.append( e[0] )
 291            pos += e[1]
 292        while pos < len(s) and s[pos] in LWS:
 293            pos += 1
 294        if pos < len(s) and s[pos] != ',':
 295            break
 296        while pos < len(s) and s[pos] == ',':
 297            # skip comma and any "empty" elements
 298            pos += 1  # skip comma
 299            while pos < len(s) and s[pos] in LWS:
 300                pos += 1
 301    if len(results) < min_count:
 302        raise ParseError('Comma-separated list does not have enough elements',s,pos)
 303    elif max_count and len(results) > max_count:
 304        raise ParseError('Comma-separated list has too many elements',s,pos)
 305    return (results, pos-start)
 306
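For simple values, the comma-splitting behaviour can be sketched with plain string operations. This simplification does NOT honour quoted-strings containing commas, unlike parse_comma_list():

```python
def split_comma_list(s):
    # Simplified sketch of the RFC 2616 #rule: split on commas, strip
    # linear white space, and drop empty elements, so "a,, b" yields
    # two elements.  Quoted-strings with embedded commas would break.
    return [e.strip(' \t') for e in s.split(',') if e.strip(' \t')]
```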
 307
 308def parse_token(s, start=0):
 309    """Parses a token.
 310
 311    A token is a string defined by RFC 2616 section 2.2 as:
 312       token = 1*<any CHAR except CTLs or separators>
 313
 314    Returns a tuple (token, chars_consumed), or ('',0) if no token
 315    starts at the given string position.  On a syntax error, a
 316    ParseError exception will be raised.
 317
 318    """
 319    return parse_token_or_quoted_string(s, start, allow_quoted=False, allow_token=True)
 320
 321
 322def quote_string(s, always_quote=True):
 323    """Produces a quoted string according to HTTP 1.1 rules.
 324
 325    If always_quote is False and if the string is also a valid token,
 326    then this function may return a string without quotes.
 327
 328    """
 329    need_quotes = False
 330    q = ''
 331    for c in s:
 332        if ord(c) < 32 or ord(c) > 127 or c in SEPARATORS:
 333            q += '\\' + c
 334            need_quotes = True
 335        else:
 336            q += c
 337    if need_quotes or always_quote:
 338        return '"' + q + '"'
 339    else:
 340        return q
 341
 342
 343def parse_quoted_string(s, start=0):
 344    """Parses a quoted string.
 345
 346    Returns a tuple (string, chars_consumed).  The quote marks will
 347    have been removed and all \-escapes will have been replaced with
 348    the characters they represent.
 349
 350    """
 351    return parse_token_or_quoted_string(s, start, allow_quoted=True, allow_token=False)
 352
 353
 354def parse_token_or_quoted_string(s, start=0, allow_quoted=True, allow_token=True):
 355    """Parses a token or a quoted-string.
 356
 357    's' is the string to parse, while start is the position within the
  358    string where parsing should begin.  It returns a tuple
 359    (token, chars_consumed), with all \-escapes and quotation already
 360    processed.
 361
  362    Syntax is according to BNF rules in RFC 2616 section 2.2,
 363    specifically the 'token' and 'quoted-string' declarations.
 364    Syntax errors in the input string will result in ParseError
 365    being raised.
 366
 367    If allow_quoted is False, then only tokens will be parsed instead
 368    of either a token or quoted-string.
 369
 370    If allow_token is False, then only quoted-strings will be parsed
 371    instead of either a token or quoted-string.
 372    """
 373    if not allow_quoted and not allow_token:
 374        raise ValueError('Parsing can not continue with options provided')
 375
 376    if start >= len(s):
 377        raise ParseError('Starting position is beyond the end of the string',s,start)
 378    has_quote = (s[start] == '"')
 379    if has_quote and not allow_quoted:
 380        raise ParseError('A quoted string was not expected', s, start)
 381    if not has_quote and not allow_token:
 382        raise ParseError('Expected a quotation mark', s, start)
 383
 384    s2 = ''
 385    pos = start
 386    if has_quote:
 387        pos += 1
 388    while pos < len(s):
 389        c = s[pos]
 390        if c == '\\' and has_quote:
 391            # Note this is NOT C-style escaping; the character after the \ is
 392            # taken literally.
 393            pos += 1
 394            if pos == len(s):
 395                raise ParseError("End of string while expecting a character after '\\'",s,pos)
 396            s2 += s[pos]
 397            pos += 1
 398        elif c == '"' and has_quote:
 399            break
 400        elif not has_quote and (c in SEPARATORS or ord(c)<32 or ord(c)>127):
 401            break
 402        else:
 403            s2 += c
 404            pos += 1
 405    if has_quote:
 406        # Make sure we have a closing quote mark
 407        if pos >= len(s) or s[pos] != '"':
 408            raise ParseError('Quoted string is missing closing quote mark',s,pos)
 409        else:
 410            pos += 1
 411    return s2, (pos - start)
 412
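The quoted-string escaping rules are easy to get wrong because HTTP quoted-pairs are literal: a backslash makes the next character literal, rather than C-style escaping. A minimal, hypothetical stand-alone unescaper illustrating just that rule:

```python
def unquote_http_string(s):
    # Hypothetical minimal version of the quoted-string handling:
    # strip the surrounding quotes, then take the character after each
    # backslash literally (so r'\n' is the letter n, not a newline).
    if len(s) < 2 or s[0] != '"' or s[-1] != '"':
        raise ValueError('not a quoted-string')
    out, body, i = [], s[1:-1], 0
    while i < len(body):
        if body[i] == '\\':
            i += 1
            if i >= len(body):
                raise ValueError("no character after '\\'")
        out.append(body[i])
        i += 1
    return ''.join(out)
```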
 413
 414def remove_comments(s, collapse_spaces=True):
 415    """Removes any ()-style comments from a string.
 416
 417    In HTTP, ()-comments can nest, and this function will correctly
 418    deal with that.
 419
 420    If 'collapse_spaces' is True, then if there is any whitespace
 421    surrounding the comment, it will be replaced with a single space
 422    character.  Whitespace also collapses across multiple comment
 423    sequences, so that "a (b) (c) d" becomes just "a d".
 424
 425    Otherwise, if 'collapse_spaces' is False then all whitespace which
 426    is outside any comments is left intact as-is.
 427
 428    """
 429    if '(' not in s:
 430        return s  # simple case
 431    A = []
 432    dostrip = False
 433    added_comment_space = False
 434    pos = 0
 435    if collapse_spaces:
 436        # eat any leading spaces before a comment
 437        i = s.find('(')
 438        if i >= 0:
 439            while pos < i and s[pos] in LWS:
 440                pos += 1
 441            if pos != i:
 442                pos = 0
 443            else:
 444                dostrip = True
 445                added_comment_space = True  # lie
 446    while pos < len(s):
 447        if s[pos] == '(':
 448            _cmt, k = parse_comment( s, pos )
 449            pos += k
 450            if collapse_spaces:
 451                dostrip = True
 452                if not added_comment_space:
 453                    if len(A) > 0 and A[-1] and A[-1][-1] in LWS:
 454                        # previous part ended with whitespace
 455                        A[-1] = A[-1].rstrip()
 456                        A.append(' ')  # comment becomes one space
 457                        added_comment_space = True
 458        else:
 459            i = s.find( '(', pos )
 460            if i == -1:
 461                if dostrip:
 462                    text = s[pos:].lstrip()
 463                    if s[pos] in LWS and not added_comment_space:
 464                        A.append(' ')
 465                        added_comment_space = True
 466                else:
 467                    text = s[pos:]
 468                if text:
 469                    A.append(text)
 470                    dostrip = False
 471                    added_comment_space = False
 472                break # end of string
 473            else:
 474                if dostrip:
 475                    text = s[pos:i].lstrip()
 476                    if s[pos] in LWS and not added_comment_space:
 477                        A.append(' ')
 478                        added_comment_space = True
 479                else:
 480                    text = s[pos:i]
 481                if text:
 482                    A.append(text)
 483                    dostrip = False
 484                    added_comment_space = False
 485                pos = i
 486    if dostrip and len(A) > 0 and A[-1] and A[-1][-1] in LWS:
 487        A[-1] = A[-1].rstrip()
 488    return ''.join(A)
 489
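The nesting behaviour can be illustrated with a compact stand-alone sketch. It mirrors remove_comments(s, collapse_spaces=False) for inputs with balanced comments, without the space-collapsing logic:

```python
def strip_http_comments(s):
    # Stand-alone sketch: drop ()-comments, honouring nesting and
    # \-escaped quoted pairs, keeping all text outside comments as-is.
    out, depth, i = [], 0, 0
    while i < len(s):
        c = s[i]
        if c == '\\' and depth > 0:
            i += 2            # quoted pair inside a comment: skip both chars
            continue
        if c == '(':
            depth += 1
        elif c == ')' and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(c)
        i += 1
    return ''.join(out)
```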
 490
 491def _test_comments():
 492    """A self-test on comment processing.  Returns number of test failures."""
 493    def _testrm( a, b, collapse ):
 494        b2 = remove_comments( a, collapse )
 495        if b != b2:
 496            print( 'Comment test failed:' )
 497            print( '   remove_comments( %s, collapse_spaces=%s ) -> %s' % (repr(a), repr(collapse), repr(b2)) )
 498            print( '   expected %s' % repr(b) )
 499            return 1
 500        return 0
 501    failures = 0
 502    failures += _testrm( r'', '', False )
 503    failures += _testrm( r'(hello)', '', False)
 504    failures += _testrm( r'abc (hello) def', 'abc  def', False)
 505    failures += _testrm( r'abc (he(xyz)llo) def', 'abc  def', False)
 506    failures += _testrm( r'abc (he\(xyz)llo) def', 'abc llo) def', False)
 507    failures += _testrm( r'abc(hello)def', 'abcdef', True)
 508    failures += _testrm( r'abc (hello) def', 'abc def', True)
 509    failures += _testrm( r'abc   (hello)def', 'abc def', True)
 510    failures += _testrm( r'abc(hello)  def', 'abc def', True)
 511    failures += _testrm( r'abc(hello) (world)def', 'abc def', True)
 512    failures += _testrm( r'abc(hello)(world)def', 'abcdef', True)
 513    failures += _testrm( r'  (hello) (world) def', 'def', True)
 514    failures += _testrm( r'abc  (hello) (world) ', 'abc', True)
 515    return failures
 516
 517def parse_comment(s, start=0):
 518    """Parses a ()-style comment from a header value.
 519
 520    Returns tuple (comment, chars_consumed), where the comment will
 521    have had the outer-most parentheses and white space stripped.  Any
 522    nested comments will still have their parentheses and whitespace
 523    left intact.
 524
 525    All \-escaped quoted pairs will have been replaced with the actual
 526    characters they represent, even within the inner nested comments.
 527
 528    You should note that only a few HTTP headers, such as User-Agent
 529    or Via, allow ()-style comments within the header value.
 530
 531    A comment is defined by RFC 2616 section 2.2 as:
 532    
 533       comment = "(" *( ctext | quoted-pair | comment ) ")"
 534       ctext   = <any TEXT excluding "(" and ")">
 535    """
 536    if start >= len(s):
 537        raise ParseError('Starting position is beyond the end of the string',s,start)
 538    if s[start] != '(':
 539        raise ParseError('Comment must begin with opening parenthesis',s,start)
 540
 541    s2 = ''
 542    nestlevel = 1
 543    pos = start + 1
 544    while pos < len(s) and s[pos] in LWS:
 545        pos += 1
 546
 547    while pos < len(s):
 548        c = s[pos]
 549        if c == '\\':
 550            # Note this is not C-style escaping; the character after the \ is
 551            # taken literally.
 552            pos += 1
 553            if pos == len(s):
 554                raise ParseError("End of string while expecting a character after '\\'",s,pos)
 555            s2 += s[pos]
 556            pos += 1
 557        elif c == '(':
 558            nestlevel += 1
 559            s2 += c
 560            pos += 1
 561        elif c == ')':
 562            nestlevel -= 1
 563            pos += 1
 564            if nestlevel >= 1:
 565                s2 += c
 566            else:
 567                break
 568        else:
 569            s2 += c
 570            pos += 1
 571    if nestlevel > 0:
 572        raise ParseError('End of string reached before comment was closed',s,pos)
 573    # Now rstrip s2 of all LWS chars.
 574    while len(s2) and s2[-1] in LWS:
 575        s2 = s2[:-1]
 576    return s2, (pos - start)
 577    
 578
 579class range_spec(object):
 580    """A single contiguous (byte) range.
 581
 582    A range_spec defines a range (of bytes) by specifying two offsets,
 583    the 'first' and 'last', which are inclusive in the range.  Offsets
 584    are zero-based (the first byte is offset 0).  The range can not be
 585    empty or negative (has to satisfy first <= last).
 586
 587    The range can be unbounded on either end, represented here by the
 588    None value, with these semantics:
 589
 590       * A 'last' of None always indicates the last possible byte
 591        (although that offset may not be known).
 592
 593       * A 'first' of None indicates this is a suffix range, where
 594         the last value is actually interpreted to be the number
 595         of bytes at the end of the file (regardless of file size).
 596
 597    Note that it is not valid for both first and last to be None.
 598
 599    """
 600
 601    __slots__ = ['first','last']
 602
 603    def __init__(self, first=0, last=None):
 604        self.set( first, last )
 605
 606    def set(self, first, last):
 607        """Sets the value of this range given the first and last offsets.
 608        """
 609        if first is not None and last is not None and first > last:
 610            raise ValueError("Byte range does not satisfy first <= last.")
 611        elif first is None and last is None:
 612            raise ValueError("Byte range can not omit both first and last offsets.")
 613        self.first = first
 614        self.last = last
 615
 616    def __repr__(self):
 617        return '%s.%s(%s,%s)' % (self.__class__.__module__, self.__class__.__name__,
 618                                 self.first, self.last)
 619
 620    def __str__(self):
 621        """Returns a string form of the range as would appear in a Range: header."""
 622        if self.first is None and self.last is None:
 623            return ''
 624        s = ''
 625        if self.first is not None:
 626            s += '%d' % self.first
 627        s += '-'
 628        if self.last is not None:
 629            s += '%d' % self.last
 630        return s
 631
 632    def __eq__(self, other):
 633        """Compare ranges for equality.
 634
 635        Note that if non-specific ranges are involved (such as 34- and -5),
 636        they could compare as not equal even though they may represent
 637        the same set of bytes in some contexts.
 638        """
 639        return self.first == other.first and self.last == other.last
 640
 641    def __ne__(self, other):
 642        """Compare ranges for inequality.
 643
 644        Note that if non-specific ranges are involved (such as 34- and -5),
 645        they could compare as not equal even though they may represent
 646        the same set of bytes in some contexts.
 647        """
 648        return not self.__eq__(other)
 649
 650    def __lt__(self, other):
 651        """< operator is not defined"""
 652        raise NotImplementedError('Ranges can not be relationally compared')
 653    def __le__(self, other):
 654        """<= operator is not defined"""
  655        raise NotImplementedError('Ranges can not be relationally compared')
 656    def __gt__(self, other):
 657        """> operator is not defined"""
 658        raise NotImplementedError('Ranges can not be relationally compared')
 659    def __ge__(self, other):
 660        """>= operator is not defined"""
 661        raise NotImplementedError('Ranges can not be relationally compared')
 662    
 663    def copy(self):
 664        """Makes a copy of this range object."""
 665        return self.__class__( self.first, self.last )
 666
 667    def is_suffix(self):
 668        """Returns True if this is a suffix range.
 669
 670        A suffix range is one that specifies the last N bytes of a
 671        file regardless of file size.
 672
 673        """
  674        return self.first is None
 675
 676    def is_fixed(self):
 677        """Returns True if this range is absolute and a fixed size.
 678
  679        This occurs only if neither first nor last is None.  Converse
 680        is the is_unbounded() method.
 681
 682        """
 683        return self.first is not None and self.last is not None
 684
 685    def is_unbounded(self):
 686        """Returns True if the number of bytes in the range is unspecified.
 687
 688        This can only occur if either the 'first' or the 'last' member
 689        is None.  Converse is the is_fixed() method.
 690
 691        """
 692        return self.first is None or self.last is None
 693
 694    def is_whole_file(self):
 695        """Returns True if this range includes all possible bytes.
 696
 697        This can only occur if the 'last' member is None and the first
 698        member is 0.
 699
 700        """
 701        return self.first == 0 and self.last is None
 702
 703    def __contains__(self, offset):
 704        """Does this byte range contain the given byte offset?
 705
 706        If the offset < 0, then it is taken as an offset from the end
 707        of the file, where -1 is the last byte.  This type of offset
 708        will only work with suffix ranges.
 709
 710        """
 711        if offset < 0:
 712            if self.first is not None:
 713                return False
 714            else:
 715                return self.last >= -offset
 716        elif self.first is None:
 717            return False
 718        elif self.last is None:
 719            return True
 720        else:
 721            return self.first <= offset <= self.last
 722
 723    def fix_to_size(self, size):
 724        """Changes a length-relative range to an absolute range based upon given file size.
 725
 726        Ranges that are already absolute are left as is.
 727
 728        Note that zero-length files are handled as special cases,
 729        since the only way possible to specify a zero-length range is
 730        with the suffix range "-0".  Thus unless this range is a suffix
 731        range, it can not satisfy a zero-length file.
 732
 733        If the resulting range (partly) lies outside the file size then an
 734        error is raised.
 735        """
 736
 737        if size == 0:
 738            if self.first is None:
 739                self.last = 0
 740                return
 741            else:
  742                raise RangeUnsatisfiableError("Range can not satisfy a zero-length file.")
 743
 744        if self.first is None:
 745            # A suffix range
 746            self.first = size - self.last
 747            if self.first < 0:
 748                self.first = 0
 749            self.last = size - 1
 750        else:
 751            if self.first > size - 1:
 752                raise RangeUnsatisfiableError('Range begins beyond the file size.')
 753            else:
 754                if self.last is None:
 755                    # An unbounded range
 756                    self.last = size - 1
 757        return
 758
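Numerically, resolving a range against a known file size works as sketched below (a hypothetical stand-alone analogue of fix_to_size(), ignoring the zero-length special case):

```python
def fix_range(first, last, size):
    # Hypothetical analogue of range_spec.fix_to_size() for size > 0:
    # None for 'first' marks a suffix range (the last N bytes), and
    # None for 'last' marks a range unbounded to end of file.
    if first is None:
        return (max(0, size - last), size - 1)
    if first > size - 1:
        raise ValueError('range begins beyond the file size')
    return (first, size - 1 if last is None else last)
```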
 759    def merge_with(self, other):
 760        """Tries to merge the given range into this one.
 761
 762        The size of this range may be enlarged as a result.
 763
  764    An error is raised if the two ranges neither overlap nor are
  765    contiguous with each other.
 766        """
 767        if self.is_whole_file() or self == other:
 768            return
 769        elif other.is_whole_file():
 770            self.first, self.last = 0, None
 771            return
 772
 773        a1, z1 = self.first, self.last
 774        a2, z2 = other.first, other.last
 775
 776        if self.is_suffix():
 777            if z1 == 0: # self is zero-length, so merge becomes a copy
 778                self.first, self.last = a2, z2
 779                return
 780            elif other.is_suffix():
 781                self.last = max(z1, z2)
 782            else:
 783                raise RangeUnmergableError()
 784        elif other.is_suffix():
 785            if z2 == 0: # other is zero-length, so nothing to merge
 786                return
 787            else:
 788                raise RangeUnmergableError()
 789
 790        assert a1 is not None and a2 is not None
 791
 792        if a2 < a1:
 793            # swap ranges so a1 <= a2
 794            a1, z1, a2, z2 = a2, z2, a1, z1
 795
 796        assert a1 <= a2
 797
 798        if z1 is None:
 799            if z2 is not None and z2 + 1 < a1:
 800                raise RangeUnmergableError()
 801            else:
 802                self.first = min(a1, a2)
 803                self.last = None
 804        elif z2 is None:
 805            if z1 + 1 < a2:
 806                raise RangeUnmergableError()
 807            else:
 808                self.first = min(a1, a2)
 809                self.last = None
 810        else:
 811            if a2 > z1 + 1:
 812                raise RangeUnmergableError()
 813            else:
 814                self.first = a1
 815                self.last = max(z1, z2)
 816        return
 817
 818
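# The suffix/unbounded arithmetic in fix_to_size() above can be sketched as a
# standalone function (a hypothetical helper for illustration, not part of
# this module's API).  It assumes size > 0; the real method also
# special-cases zero-length files.

```python
def fix_to_size(first, last, size):
    """Return an absolute (first, last) pair; None marks an open bound."""
    if first is None:               # suffix range "-N": the final N bytes
        return max(0, size - last), size - 1
    if first > size - 1:
        raise ValueError('range begins beyond the file size')
    if last is None:                # unbounded range "N-": through end of file
        last = size - 1
    return first, last
```

# e.g. a suffix range "-500" against a 10000-byte file resolves to (9500, 9999).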
 819class range_set(object):
 820    """A collection of range_specs, with units (e.g., bytes).
 821    """
 822    __slots__ = ['units', 'range_specs']
 823
 824    def __init__(self):
 825        self.units = 'bytes'
 826        self.range_specs = []  # a list of range_spec objects
 827
 828    def __str__(self):
 829        return self.units + '=' + ', '.join([str(s) for s in self.range_specs])
 830
 831    def __repr__(self):
 832        return '%s.%s(%s)' % (self.__class__.__module__,
 833                              self.__class__.__name__,
 834                              repr(self.__str__()) )
 835
 836    def from_str(self, s, valid_units=('bytes','none')):
 837        """Sets this range set based upon a string, such as the Range: header.
 838
 839        You can also use the parse_range_set() function for more control.
 840
 841        If a parsing error occurs, the pre-existing value of this range
 842        set is left unchanged.
 843
 844        """
 845        r, k = parse_range_set( s, valid_units=valid_units )
 846        if k < len(s):
 847            raise ParseError("Extra unparsable characters in range set specifier",s,k)
 848        self.units = r.units
 849        self.range_specs = r.range_specs
 850
 851    def is_single_range(self):
 852        """Does this range specifier consist of only a single range?"""
 853        return len(self.range_specs) == 1
 854
 855    def is_contiguous(self):
 856        """Can the collection of range_specs be coalesced into a single contiguous range?"""
 857        if len(self.range_specs) <= 1:
 858            return True
 859        merged = self.range_specs[0].copy()
 860        for s in self.range_specs[1:]:
 861            try:
 862                merged.merge_with(s)
 863            except RangeUnmergableError:
 864                return False
 865        return True
 866
 867    def fix_to_size(self, size):
 868        """Changes all length-relative range_specs to absolute range_specs based upon the given file size.
 869        If none of the range_specs in this set can be satisfied, then the
 870        entire set is considered unsatisfiable and an error is raised.
 871        Otherwise any unsatisfiable range_specs will simply be removed
 872        from this set.
 873
 874        """
 875        for i in range(len(self.range_specs)):
 876            try:
 877                self.range_specs[i].fix_to_size( size )
 878            except RangeUnsatisfiableError:
 879                self.range_specs[i] = None
 880        self.range_specs = [s for s in self.range_specs if s is not None]
 881        if len(self.range_specs) == 0:
 882            raise RangeUnsatisfiableError('No ranges can be satisfied')
 883
 884    def coalesce(self):
 885        """Collapses all consecutive range_specs which together define a contiguous range.
 886
 887        Note though that this method will not re-sort the range_specs, so a
 888        potentially contiguous range may not be collapsed if they are
 889        not sorted.  For example the ranges:
 890            10-20, 30-40, 20-30
 891        will not be collapsed to just 10-40.  However if the ranges are
 892        sorted first as with:
 893            10-20, 20-30, 30-40
 894        then they will collapse to 10-40.
 895        """
 896        if len(self.range_specs) <= 1:
 897            return
 898        for i in range(len(self.range_specs) - 1):
 899            a = self.range_specs[i]
 900            b = self.range_specs[i+1]
 901            if a is not None:
 902                try:
 903                    a.merge_with( b )
 904                    self.range_specs[i+1] = None # to be deleted later
 905                except RangeUnmergableError:
 906                    pass
 907        self.range_specs = [r for r in self.range_specs if r is not None]
 908
 909
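# The single coalescing pass above can be sketched standalone over plain
# (first, last) pairs (a hypothetical helper, not this module's API): each
# pair is merged into the previous one when the two overlap or touch, and,
# as the coalesce() docstring warns, no re-sorting is done first.

```python
def coalesce(pairs):
    # One pass over consecutive absolute (first, last) pairs, merging
    # neighbours that overlap or are contiguous.
    out = []
    for first, last in pairs:
        if out and first <= out[-1][1] + 1 and out[-1][0] <= last + 1:
            out[-1] = (min(out[-1][0], first), max(out[-1][1], last))
        else:
            out.append((first, last))
    return out
```

# Note the unsorted case mirrors the docstring's caveat: 10-20, 30-40, 20-30
# does not collapse all the way to a single 10-40 range.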
 910def parse_number( s, start=0 ):
 911    """Parses a non-negative decimal integer from the string.
 912
 913    A tuple is returned (number, chars_consumed).  If the
 914    string is not a valid decimal number, then (None,0) is returned.
 915    """
 916    if start >= len(s):
 917        raise ParseError('Starting position is beyond the end of the string',s,start)
 918    if s[start] not in DIGIT:
 919        return (None,0)  # not a number
 920    pos = start
 921    n = 0
 922    while pos < len(s):
 923        c = s[pos]
 924        if c in DIGIT:
 925            n *= 10
 926            n += ord(c) - ord('0')
 927            pos += 1
 928        else:
 929            break
 930    return n, pos-start
 931
 932
 933def parse_range_spec( s, start=0 ):
 934    """Parses a (byte) range_spec.
 935
 936    Returns a tuple (range_spec, chars_consumed).
 937    """
 938    if start >= len(s):
 939        raise ParseError('Starting position is beyond the end of the string',s,start)
 940    if s[start] not in DIGIT and s[start] != '-':
 941        raise ParseError("Invalid range, expected a digit or '-'",s,start)
 942    first, last = None, None
 943    pos = start
 944    first, k = parse_number( s, pos )
 945    pos += k
 946    if pos < len(s) and s[pos] == '-':
 947        pos += 1
 948        if pos < len(s):
 949            last, k = parse_number( s, pos )
 950            pos += k
 951    else:
 952        raise ParseError("Byte range must include a '-'",s,pos)
 953    if first is None and last is None:
 954        raise ParseError('Byte range can not omit both first and last indices.',s,start)
 955    R = range_spec( first, last )
 956    return R, pos-start
 957
 958
 959def parse_range_header( header_value, valid_units=('bytes','none') ):
 960    """Parses the value of an HTTP Range: header.
 961
 962    The value of the header as a string should be passed in; without
 963    the header name itself.
 964
 965    Returns a range_set object.
 966    """
 967    ranges, k = parse_range_set( header_value, valid_units=valid_units )
 968    if k < len(header_value):
 969        raise ParseError('Range header has unexpected or unparsable characters',
 970                         header_value, k)
 971    return ranges
 972
 973
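# A hedged end-to-end sketch of what parse_range_header() produces, using a
# simple split instead of this module's tokenizer (the hypothetical helper
# below ignores the LWS tolerance and unit validation of the real parser):

```python
def simple_range_header(value):
    """Parse 'bytes=0-499,-1' into (units, [(first, last), ...])."""
    units, _, rest = value.partition('=')
    specs = []
    for item in rest.split(','):
        first_s, _, last_s = item.strip().partition('-')
        first = int(first_s) if first_s else None   # None: suffix range "-N"
        last = int(last_s) if last_s else None      # None: unbounded "N-"
        if first is None and last is None:
            raise ValueError('range cannot omit both first and last')
        specs.append((first, last))
    return units.strip(), specs
```
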
 974def parse_range_set( s, start=0, valid_units=('bytes','none') ):
 975    """Parses a (byte) range set specifier.
 976
 977    Returns a tuple (range_set, chars_consumed).
 978    """
 979    if start >= len(s):
 980        raise ParseError('Starting position is beyond the end of the string',s,start)
 981    pos = start
 982    units, k = parse_token( s, pos )
 983    pos += k
 984    if valid_units and units not in valid_units:
 985        raise ParseError('Unsupported units type in range specifier',s,start)
 986    while pos < len(s) and s[pos] in LWS:
 987        pos += 1
 988    if pos < len(s) and s[pos] == '=':
 989        pos += 1
 990    else:
 991        raise ParseError("Invalid range specifier, expected '='",s,pos)
 992    while pos < len(s) and s[pos] in LWS:
 993        pos += 1
 994    range_specs, k = parse_comma_list( s, pos, parse_range_spec, min_count=1 )
 995    pos += k
 996    # Make sure no trash is at the end of the string
 997    while pos < len(s) and s[pos] in LWS:
 998        pos += 1
 999    if pos < len(s):
1000        raise ParseError('Unparsable characters in range set specifier',s,pos)
1001
1002    ranges = range_set()
1003    ranges.units = units
1004    ranges.range_specs = range_specs
1005    return ranges, pos-start
1006
1007
1008def _split_at_qfactor( s ):
1009    """Splits a string at the quality factor (;q=) parameter.
1010
1011    Returns the left and right substrings as a two-member tuple.
1012
1013    """
1014    # It may be faster, but incorrect, to use s.split(';q=',1), since
1015    # HTTP allows any amount of linear white space (LWS) to appear
1016    # between the parts, so it could also be "; q = ".
1017
1018    # We do this parsing 'manually' for speed rather than using a
1019    # regex, which would be r';[ \t\r\n]*q[ \t\r\n]*=[ \t\r\n]*'
1020
1021    pos = 0
1022    while 0 <= pos < len(s):
1023        pos = s.find(';', pos)
1024        if pos < 0:
1025            break # no more parameters
1026        startpos = pos
1027        pos = pos + 1
1028        while pos < len(s) and s[pos] in LWS:
1029            pos = pos + 1
1030        if pos < len(s) and s[pos] == 'q':
1031            pos = pos + 1
1032            while pos < len(s) and s[pos] in LWS:
1033                pos = pos + 1
1034            if pos < len(s) and s[pos] == '=':
1035                pos = pos + 1
1036                while pos < len(s) and s[pos] in LWS:
1037                    pos = pos + 1
1038                return ( s[:startpos], s[pos:] )
1039    return (s, '')
1040
1041
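# The comment in _split_at_qfactor() above gives the regex it hand-codes for
# speed; the equivalent regex version can be sketched directly (LWS here is
# space, tab, CR, LF):

```python
import re

# same pattern as noted in _split_at_qfactor()
_QSPLIT = re.compile(r';[ \t\r\n]*q[ \t\r\n]*=[ \t\r\n]*')

def split_at_qfactor(s):
    """Split a header item at its ';q=' parameter, tolerating LWS."""
    m = _QSPLIT.search(s)
    if m is None:
        return s, ''
    return s[:m.start()], s[m.end():]
```
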
1042def parse_qvalue_accept_list( s, start=0, item_parser=parse_token ):
1043    """Parses any of the Accept-* style headers with quality factors.
1044
1045    This is a low-level function.  It returns a list of tuples, each like:
1046       (item, item_parms, qvalue, accept_parms)
1047
1048    You can pass in a function which parses each of the item strings, or
1049    accept the default where the items must be simple tokens.  Note that
1050    your parser should not consume any parameters (past the special "q"
1051    parameter anyway).
1052
1053    The item_parms and accept_parms are each lists of (name,value) tuples.
1054
1055    The qvalue is the quality factor, a number from 0 to 1 inclusive.
1056
1057    """
1058    itemlist = []
1059    pos = start
1060    if pos >= len(s):
1061        raise ParseError('Starting position is beyond the end of the string',s,pos)
1062    item = None
1063    while pos < len(s):
1064        item, k = item_parser(s, pos)
1065        pos += k
1066        while pos < len(s) and s[pos] in LWS:
1067            pos += 1
1068        if pos >= len(s) or s[pos] in ',;':
1069            itemparms, qvalue, acptparms = [], None, []
1070            if pos < len(s) and s[pos] == ';':
1071                pos += 1
1072                while pos < len(s) and s[pos] in LWS:
1073                    pos += 1
1074                parmlist, k = parse_parameter_list(s, pos)
1075                for p, v in parmlist:
1076                    if p == 'q' and qvalue is None:
1077                        try:
1078                            qvalue = float(v)
1079                        except ValueError:
1080                            raise ParseError('qvalue must be a floating point number',s,pos)
1081                        if qvalue < 0 or qvalue > 1:
1082                            raise ParseError('qvalue must be between 0 and 1, inclusive',s,pos)
1083                    elif qvalue is None:
1084                        itemparms.append( (p,v) )
1085                    else:
1086                        acptparms.append( (p,v) )
1087                pos += k
1088            if item:
1089                # Add the item to the list
1090                if qvalue is None:
1091                    qvalue = 1
1092                itemlist.append( (item, itemparms, qvalue, acptparms) )
1093                item = None
1094            # skip commas
1095            while pos < len(s) and s[pos] == ',':
1096                pos += 1
1097                while pos < len(s) and s[pos] in LWS:
1098                    pos += 1
1099        else:
1100            break
1101    return itemlist, pos - start
1102
1103
1104def parse_accept_header( header_value ):
1105    """Parses the Accept: header.
1106
1107    The value of the header as a string should be passed in; without
1108    the header name itself.
1109    
1110    This will parse the value of any of the HTTP headers "Accept",
1111    "Accept-Charset", "Accept-Encoding", or "Accept-Language".  These
1112    headers are similarly formatted, in that they are a list of items
1113    with associated quality factors.  The quality factor, or qvalue,
1114    is a number in the range [0.0..1.0] which indicates the relative
1115    preference of each item.
1116
1117    This function returns a list of those items, sorted by preference
1118    (from most-preferred to least-preferred).  Each item in the returned
1119    list is actually a tuple consisting of:
1120
1121       ( item_name, item_parms, qvalue, accept_parms )
1122
1123    As an example, the following string,
1124        text/plain; charset="utf-8"; q=.5; columns=80
1125    would be parsed into this resulting tuple,
1126        ( 'text/plain', [('charset','utf-8')], 0.5, [('columns','80')] )
1127
1128    The value of the returned item_name depends upon which header is
1129    being parsed, but for example it may be a MIME content or media
1130    type (without parameters), a language tag, or so on.  Any optional
1131    parameters (delimited by semicolons) occurring before the "q="
1132    attribute will be in the item_parms list as (attribute,value)
1133    tuples in the same order as they appear in the header.  Any quoted
1134    values will have been unquoted and unescaped.
1135
1136    The qvalue is a floating point number in the inclusive range 0.0
1137    to 1.0, and roughly indicates the preference for this item.
1138    Values outside this range are rejected with a ParseError.
1139
1140         (!) Note that a qvalue of 0 indicates that the item is
1141         explicitly NOT acceptable to the user agent, and should be
1142         handled differently by the caller.
1143
1144    The accept_parms, like the item_parms, is a list of any attributes
1145    occurring after the "q=" attribute, and will be in the list as
1146    (attribute,value) tuples in the same order as they occur.
1147    Usually accept_parms will be an empty list, as the HTTP spec
1148    allows these extra parameters in the syntax but does not
1149    currently define any possible values.
1150
1151    All empty items will be removed from the list.  However, duplicate
1152    or conflicting values are not detected or handled in any way by
1153    this function.
1154    """
1155    def parse_mt_only(s, start):
1156        mt, k = parse_media_type(s, start, with_parameters=False)
1157        ct = content_type()
1158        ct.major = mt[0]
1159        ct.minor = mt[1]
1160        return ct, k
1161
1162    alist, k = parse_qvalue_accept_list( header_value, item_parser=parse_mt_only )
1163    if k < len(header_value):
1164        raise ParseError('Accept header is invalid',header_value,k)
1165
1166    ctlist = []
1167    for ct, ctparms, q, acptparms  in alist:
1168        if ctparms:
1169            ct.set_parameters( dict(ctparms) )
1170        ctlist.append( (ct, q, acptparms) )
1171    return ctlist
1172
1173
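# The qvalue mechanics above can be sketched in miniature (a hypothetical,
# simplified parser, not this module's implementation: it tolerates no
# whitespace around ';q=' and no quoted or extra parameters).  It defaults a
# missing q to 1.0, as parse_accept_header() does, and ranks by qvalue:

```python
def simple_accept(value):
    items = []
    for part in value.split(','):
        item, _, q = part.strip().partition(';q=')
        items.append((item.strip(), float(q) if q else 1.0))
    # highest qvalue first; Python's sort is stable, so header order breaks ties
    items.sort(key=lambda pair: pair[1], reverse=True)
    return items
```
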
1174def parse_media_type(media_type, start=0, with_parameters=True):
1175    """Parses a media type (MIME type) designator into its parts.
1176
1177    Given a media type string, returns a nested tuple of its parts.
1178
1179        ((major,minor,parmlist), chars_consumed)
1180
1181    where parmlist is a list of tuples of (parm_name, parm_value).
1182    Quoted-values are appropriately unquoted and unescaped.
1183    
1184    If 'with_parameters' is False, then parsing will stop immediately
1185    after the minor media type, and will not proceed to parse any
1186    of the semicolon-separated parameters.
1187
1188    Examples:
1189        image/png -> (('image','png',[]), 9)
1190        text/plain; charset="utf-16be"
1191              -> (('text','plain',[('charset','utf-16be')]), 30)
1192
1193    """
1194
1195    s = media_type
1196    pos = start
1197    ctmaj, k = parse_token(s, pos)
1198    if k == 0:
1199        raise ParseError('Media type must be of the form "major/minor".', s, pos)
1200    pos += k
1201    if pos >= len(s) or s[pos] != '/':
1202        raise ParseError('Media type must be of the form "major/minor".', s, pos)
1203    pos += 1
1204    ctmin, k = parse_token(s, pos)
1205    if k == 0:
1206        raise ParseError('Media type must be of the form "major/minor".', s, pos)
1207    pos += k
1208    if with_parameters:
1209        parmlist, k = parse_parameter_list(s, pos)
1210        pos += k
1211    else:
1212        parmlist = []
1213    return ((ctmaj, ctmin, parmlist), pos - start)
1214
1215
1216def parse_parameter_list(s, start=0):
1217    """Parses a semicolon-separated 'parameter=value' list.
1218
1219    Returns a tuple (parmlist, chars_consumed), where parmlist
1220    is a list of tuples (parm_name, parm_value).
1221
1222    The parameter values will be unquoted and unescaped as needed.
1223
1224    Empty parameters (as in ";;") are skipped, as is insignificant
1225    white space.  The list returned is kept in the same order as the
1226    parameters appear in the string.
1227
1228    """
1229    pos = start
1230    parmlist = []
1231    while pos < len(s):
1232        while pos < len(s) and s[pos] in LWS:
1233            pos += 1 # skip whitespace
1234        if pos < len(s) and s[pos] == ';':
1235            pos += 1
1236            while pos < len(s) and s[pos] in LWS:
1237                pos += 1 # skip whitespace
1238        if pos >= len(s):
1239            break
1240        parmname, k = parse_token(s, pos)
1241        if parmname:
1242            pos += k
1243            while pos < len(s) and s[pos] in LWS:
1244                pos += 1 # skip whitespace
1245            if not (pos < len(s) and s[pos] == '='):
1246                raise ParseError('Expected an "=" after parameter name', s, pos)
1247            pos += 1
1248            while pos < len(s) and s[pos] in LWS:
1249                pos += 1 # skip whitespace
1250            parmval, k = parse_token_or_quoted_string( s, pos )
1251            pos += k
1252            parmlist.append( (parmname, parmval) )
1253        else:
1254            break
1255    return parmlist, pos - start
1256
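# The parameter-list grammar above can be sketched with a plain split (a
# hypothetical simplified helper: unlike parse_parameter_list(), it does not
# honour LWS inside tokens and only crudely strips surrounding quotes,
# without unescaping):

```python
def simple_parameter_list(s):
    parms = []
    for part in s.split(';'):
        part = part.strip()
        if not part:
            continue                    # skip empty ";;" segments
        name, sep, value = part.partition('=')
        if not sep:
            raise ValueError('expected "=" after parameter name')
        parms.append((name.strip(), value.strip().strip('"')))
    return parms
```
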
1257
1258class content_type(object):
1259    """This class represents a media type (aka a MIME content type), including parameters.
1260
1261    You initialize these by passing in a content-type declaration
1262    string, such as "text/plain; charset=ascii", to the constructor or
1263    to the set() method.  If you provide no string value, the object
1264    returned will represent the wildcard */* content type.
1265
1266    Normally you will get the value back by using str(), or optionally
1267    you can access the components via the 'major', 'minor', 'media_type',
1268    or 'parmdict' members.
1269
1270    """
1271    def __init__(self, content_type_string=None, with_parameters=True):
1272        """Create a new content_type object.
1273
1274        See the set() method for a description of the arguments.
1275        """
1276        if content_type_string:
1277            self.set( content_type_string, with_parameters=with_parameters )
1278        else:
1279            self.set( '*/*' )
1280
1281    def set_parameters(self, parameter_list_or_dict):
1282        """Sets the optional parameters based upon the parameter list.
1283
1284        The parameter list may be a semicolon-separated name=value string,
1285        or a dictionary.  Any parameters which already exist on this object
1286        will be deleted, unless they appear in the given parameter list.
1287
1288        """
1289        if isinstance(parameter_list_or_dict, dict):
1290            # already a dictionary
1291            pl = parameter_list_or_dict
1292        else:
1293            pl, k = parse_parameter_list(parameter_list_or_dict)
1294            if k < len(parameter_list_or_dict):
1295                raise ParseError('Invalid parameter list', parameter_list_or_dict, k)
1296        self.parmdict = dict(pl)
1297
1298    def set(self, content_type_string, with_parameters=True):
1299        """Parses the content type string and sets this object to its value.
1300
1301        For a more complete description of the arguments, see the
1302        documentation for the parse_media_type() function in this module.
1303        """
1304        mt, k = parse_media_type( content_type_string, with_parameters=with_parameters )
1305        if k < len(content_type_string):
1306            raise ParseError('Not a valid content type',content_type_string, k)
1307        major, minor, pdict = mt
1308        self._set_major( major )
1309        self._set_minor( minor )
1310        self.parmdict = dict(pdict)
1311        
1312    def _get_major(self):
1313        return self._major
1314    def _set_major(self, s):
1315        s = s.lower()  # case-insensitive
1316        if not is_token(s):
1317            raise ValueError('Major media type contains an invalid character')
1318        self._major = s
1319
1320    def _get_minor(self):
1321        return self._minor
1322    def _set_minor(self, s):
1323        s = s.lower()  # case-insensitive
1324        if not is_token(s):
1325            raise ValueError('Minor media type contains an invalid character')
1326        self._minor = s
1327
1328    major = property(_get_major, _set_major, doc="Major media classification")
1329    minor = property(_get_minor, _set_minor, doc="Minor media sub-classification")
1330
1331    def __str__(self):
1332        """String value."""
1333        s = '%s/%s' % (self.major, self.minor)
1334        if self.parmdict:
1335            extra = '; '.join([ '%s=%s' % (a[0],quote_string(a[1],False)) for a in self.parmdict.items()])
1336            s += '; ' + extra
1337        return s
1338
1339    def __unicode__(self):
1340        """Unicode string value."""
1341        # In Python 3 this is probably unnecessary in general, this is just to avoid possible syntax issues. I.H.
1342        return str(self.__str__())
1343
1344    def __repr__(self):
1345        """Python representation of this object."""
1346        s = '%s(%s)' % (self.__class__.__name__, repr(self.__str__()))
1347        return s
1348
1349
1350    def __hash__(self):
1351        """Hash this object; the hash is dependent only upon the value."""
1352        return hash(str(self))
1353
1354    def __getstate__(self):
1355        """Pickler"""
1356        return str(self)
1357
1358    def __setstate__(self, state):
1359        """Unpickler"""
1360        self.set(state)
1361
1362    def __len__(self):
1363        """Logical length of this media type.
1364        For example:
1365           len('*/*')  -> 0
1366           len('image/*') -> 1
1367           len('image/png') -> 2
1368           len('text/plain; charset=utf-8')  -> 3
1369           len('text/plain; charset=utf-8; filename=xyz.txt') -> 4
1370
1371        """
1372        if self.major == '*':
1373            return 0
1374        elif self.minor == '*':
1375            return 1
1376        else:
1377            return 2 + len(self.parmdict)
1378
1379    def __eq__(self, other):
1380        """Equality test.
1381
1382        Note that this is an exact match, including any parameters if any.
1383        """
1384        return self.major == other.major and \
1385                   self.minor == other.minor and \
1386                   self.parmdict == other.parmdict
1387
1388    def __ne__(self, other):
1389        """Inequality test."""
1390        return not self.__eq__(other)
1391            
1392    def _get_media_type(self):
1393        """Returns the media 'type/subtype' string, without parameters."""
1394        return '%s/%s' % (self.major, self.minor)
1395
1396    media_type = property(_get_media_type, doc="Returns just the media type 'type/subtype' without any parameters (read-only).")
1397
1398    def is_wildcard(self):
1399        """Returns True if this is a 'something/*' media type.
1400        """
1401        return self.minor == '*'
1402
1403    def is_universal_wildcard(self):
1404        """Returns True if this is the unspecified '*/*' media type.
1405        """
1406        return self.major == '*' and self.minor == '*'
1407
1408    def is_composite(self):
1409        """Is this media type composed of multiple parts?
1410        """
1411        return self.major == 'multipart' or self.major == 'message'
1412
1413    def is_xml(self):
1414        """Returns True if this media type is XML-based.
1415
1416        Note this does not consider text/html to be XML, but
1417        application/xhtml+xml is.
1418        """
1419        return self.minor == 'xml' or self.minor.endswith('+xml')
1420
1421# Some common media types
1422content_formdata = content_type('multipart/form-data')
1423content_urlencoded = content_type('application/x-www-form-urlencoded')
1424content_byteranges = content_type('multipart/byteranges') # RFC 2616 sect 14.16
1425content_opaque = content_type('application/octet-stream')
1426content_html = content_type('text/html')
1427content_xhtml = content_type('application/xhtml+xml')
1428
1429
1430def acceptable_content_type( accept_header, content_types, ignore_wildcard=True ):
1431    """Determines if the given content type is acceptable to the user agent.
1432
1433    The accept_header should be the value present in the HTTP
1434    "Accept:" header.  In mod_python this is typically obtained from
1435    the req.headers_in table; in WSGI it is environ["HTTP_ACCEPT"];
1436    other web frameworks may provide other methods of obtaining it.
1437
1438    Optionally the accept_header parameter can be pre-parsed, as
1439    returned from the parse_accept_header() function in this module.
1440
1441    The content_types argument should either be a single MIME media
1442    type string, or a sequence of them.  It represents the set of
1443    content types that the caller (server) is willing to send.
1444    Generally, the server content_types should not contain any
1445    wildcarded values.
1446
1447    This function determines which content type which is the most
1448    preferred and is acceptable to both the user agent and the server.
1449    If one is negotiated it will return a four-valued tuple like:
1450
1451        (server_content_type, ua_content_range, qvalue, accept_parms)
1452
1453    The first tuple value is one of the server's content_types, while
1454    the remaining tuple values describe which of the client's
1455    acceptable content_types was matched.  In most cases accept_parms
1456    will be an empty list (see description of parse_accept_header()
1457    for more details).
1458
1459    If no content type could be negotiated, then this function will
1460    return None (and the caller should typically cause an HTTP 406 Not
1461    Acceptable as a response).
1462
1463    Note that the wildcarded content type "*/*" sent by the client
1464    will be ignored, since it is often incorrectly sent by web
1465    browsers that don't really mean it.  To override this, call with
1466    ignore_wildcard=False.  Partial wildcards such as "image/*" will
1467    always be processed, but be at a lower priority than a complete
1468    matching type.
1469
1470    See also: RFC 2616 section 14.1, and
1471    <http://www.iana.org/assignments/media-types/>
1472
1473    """
1474    if _is_string(accept_header):
1475        accept_list = parse_accept_header(accept_header)
1476    else:
1477        accept_list = accept_header
1478
1479    if _is_string(content_types):
1480        content_types = [content_types]
1481
1482    server_ctlist = [content_type(ct) for ct in content_types]
1484
1485    #print 'AC', repr(accept_list)
1486    #print 'SV', repr(server_ctlist)
1487
1488    best = None   # (server_ct, client_ct, qvalue, accept_parms, matchlen)
1489
1490    for server_ct in server_ctlist:
1491        best_for_this = None
1492        for client_ct, qvalue, aargs in accept_list:
1493            if ignore_wildcard and client_ct.is_universal_wildcard():
1494                continue  # */* being ignored
1495
1496            matchlen = 0 # how specifically this one matches (0 is a non-match)
1497            if client_ct.is_universal_wildcard():
1498                matchlen = 1   # */* is a 1
1499            elif client_ct.major == server_ct.major:
1500                if client_ct.minor == '*':  # something/* is a 2
1501                    matchlen = 2
1502                elif client_ct.minor == server_ct.minor: # something/something is a 3
1503                    matchlen = 3
1504                    # must make sure all the parms match too
1505                    for pname, pval in client_ct.parmdict.items():
1506                        sval = server_ct.parmdict.get(pname)
1507                        if pname == 'charset':
1508                            # special case for charset to match aliases
1509                            pval = canonical_charset(pval)
1510                            sval = canonical_charset(sval)
1511                        if sval == pval:
1512                            matchlen = matchlen + 1
1513                        else:
1514                            matchlen = 0
1515                            break
1516                else:
1517                    matchlen = 0
1518
1519            #print 'S',server_ct,'  C',client_ct,'  M',matchlen,'Q',qvalue
1520            if matchlen > 0:
1521                if not best_for_this \
1522                       or matchlen > best_for_this[-1] \
1523                       or (matchlen == best_for_this[-1] and qvalue > best_for_this[2]):
1524                    # This match is better
1525                    best_for_this = (server_ct, client_ct, qvalue, aargs, matchlen)
1526                    #print 'BEST2 NOW', repr(best_for_this)
1527        if not best or \
1528               (best_for_this and best_for_this[2] > best[2]):
1529            best = best_for_this
1530            #print 'BEST NOW', repr(best)
1531    if not best or best[2] <= 0:
1532        return None
1533    return best[:-1]
1534
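# The specificity scoring inside acceptable_content_type() above can be
# sketched on bare 'type/subtype' strings (a hypothetical helper for
# illustration; the real function additionally scores matching parameters
# and handles qvalues):

```python
def match_specificity(client, server):
    """Score how specifically a client type matches a server type."""
    cmaj, cmin = client.split('/')
    smaj, smin = server.split('/')
    if (cmaj, cmin) == ('*', '*'):
        return 1                        # */* is the weakest match
    if cmaj == smaj and cmin == '*':
        return 2                        # something/* is stronger
    if (cmaj, cmin) == (smaj, smin):
        return 3                        # exact type/subtype match
    return 0                            # no match
```
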
1535
1536# Aliases of common charsets, see <http://www.iana.org/assignments/character-sets>.
1537character_set_aliases = {
1538    'ASCII': 'US-ASCII',
1539    'ISO646-US': 'US-ASCII',
1540    'IBM367': 'US-ASCII',
1541    'CP367': 'US-ASCII',
1542    'CSASCII': 'US-ASCII',
1543    'ANSI_X3.4-1968': 'US-ASCII',
1544    'ISO_646.IRV:1991': 'US-ASCII',
1545
1546    'UTF7': 'UTF-7',
1547
1548    'UTF8': 'UTF-8',
1549
1550    'UTF16': 'UTF-16',
1551    'UTF16LE': 'UTF-16LE',
1552    'UTF16BE': 'UTF-16BE',
1553
1554    'UTF32': 'UTF-32',
1555    'UTF32LE': 'UTF-32LE',
1556    'UTF32BE': 'UTF-32BE',
1557
1558    'UCS2': 'ISO-10646-UCS-2',
1559    'UCS_2': 'ISO-10646-UCS-2',
1560    'UCS-2': 'ISO-10646-UCS-2',
1561    'CSUNICODE': 'ISO-10646-UCS-2',
1562
1563    'UCS4': 'ISO-10646-UCS-4',
1564    'UCS_4': 'ISO-10646-UCS-4',
1565    'UCS-4': 'ISO-10646-UCS-4',
1566    'CSUCS4': 'ISO-10646-UCS-4',
1567
1568    'ISO_8859-1': 'ISO-8859-1',
1569    'LATIN1': 'ISO-8859-1',
1570    'CP819': 'ISO-8859-1',
1571    'IBM819': 'ISO-8859-1',
1572
1573    'ISO_8859-2': 'ISO-8859-2',
1574    'LATIN2': 'ISO-8859-2',
1575
1576    'ISO_8859-3': 'ISO-8859-3',
1577    'LATIN3': 'ISO-8859-3',
1578
1579    'ISO_8859-4': 'ISO-8859-4',
1580    'LATIN4': 'ISO-8859-4',
1581
1582    'ISO_8859-5': 'ISO-8859-5',
1583    'CYRILLIC': 'ISO-8859-5',
1584
1585    'ISO_8859-6': 'ISO-8859-6',
1586    'ARABIC': 'ISO-8859-6',
1587    'ECMA-114': 'ISO-8859-6',
1588
1589    'ISO_8859-6-E': 'ISO-8859-6-E',
1590    'ISO_8859-6-I': 'ISO-8859-6-I',
1591
1592    'ISO_8859-7': 'ISO-8859-7',
1593    'GREEK': 'ISO-8859-7',
1594    'GREEK8': 'ISO-8859-7',
1595    'ECMA-118': 'ISO-8859-7',
1596
1597    'ISO_8859-8': 'ISO-8859-8',
1598    'HEBREW': 'ISO-8859-8',
1599
1600    'ISO_8859-8-E': 'ISO-8859-8-E',
1601    'ISO_8859-8-I': 'ISO-8859-8-I',
1602
1603    'ISO_8859-9': 'ISO-8859-9',
1604    'LATIN5': 'ISO-8859-9',
1605
1606    'ISO_8859-10': 'ISO-8859-10',
1607    'LATIN6': 'ISO-8859-10',
1608
1609    'ISO_8859-13': 'ISO-8859-13',
1610
1611    'ISO_8859-14': 'ISO-8859-14',
1612    'LATIN8': 'ISO-8859-14',
1613
1614    'ISO_8859-15': 'ISO-8859-15',
1615    'LATIN9': 'ISO-8859-15',
1616
1617    'ISO_8859-16': 'ISO-8859-16',
1618    'LATIN10': 'ISO-8859-16',
1619    }
1620
1621def canonical_charset(charset):
1622    """Returns the canonical or preferred name of a charset.
1623
1624    Additional character sets can be recognized by this function by
1625    altering the character_set_aliases dictionary in this module.
1626    Charsets which are not recognized are simply converted to
1627    upper-case (as charset names are always case-insensitive).
1628    
1629    See <http://www.iana.org/assignments/character-sets>.
1630
1631    """
1632    # It would be nice to use Python's codecs modules for this, but
1633    # there is no fixed public interface to its alias mappings.
1634    if not charset:
1635        return charset
1636    uc = charset.upper()
1637    uccon = character_set_aliases.get( uc, uc )
1638    return uccon
1639
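
The alias lookup above can be condensed into a short sketch; `ALIASES` here is a hypothetical subset of the full character_set_aliases table, and `canonical` is an illustrative stand-in for canonical_charset(), not part of the module:

```python
# Condensed sketch of the canonical_charset() lookup; ALIASES is a
# hypothetical subset of the module's character_set_aliases table.
ALIASES = {
    'ASCII': 'US-ASCII',
    'UTF8': 'UTF-8',
    'LATIN1': 'ISO-8859-1',
}

def canonical(charset):
    if not charset:
        return charset
    uc = charset.upper()        # charset names are case-insensitive
    return ALIASES.get(uc, uc)  # unknown names are just upper-cased

print(canonical('latin1'))  # ISO-8859-1
print(canonical('koi8-r'))  # KOI8-R
```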
1640
1641def acceptable_charset(accept_charset_header, charsets, ignore_wildcard=True, default='ISO-8859-1'):
1642    """
1643    Determines if the given charset is acceptable to the user agent.
1644
1645    The accept_charset_header should be the value present in the HTTP
1646    "Accept-Charset:" header.  In mod_python this is typically
1647    obtained from the req.http_headers_in table; in WSGI it is
1648    environ["Accept-Charset"]; other web frameworks may provide other
1649    methods of obtaining it.
1650
1651    Optionally the accept_charset_header parameter can instead be the
1652    list returned from the parse_accept_header() function in this
1653    module.
1654
1655    The charsets argument should either be a charset identifier string,
1656    or a sequence of them.
1657
1658    This function returns the charset identifier string which is the
1659    most preferred and is acceptable to both the user agent and the
1660    caller.  It will return the default value if no charset is negotiable.
1661    
1662    Note that the wildcarded charset "*" will be ignored.  To override
1663    this, call with ignore_wildcard=False.
1664
1665    See also: RFC 2616 section 14.2, and
1666    <http://www.iana.org/assignments/character-sets>
1667
1668    """
1669    if default:
1670        default = canonical_charset(default)
1671
1672    if _is_string(accept_charset_header):
1673        accept_list = parse_accept_header(accept_charset_header)
1674    else:
1675        accept_list = accept_charset_header
1676
1677    if _is_string(charsets):
1678        charsets = [canonical_charset(charsets)]
1679    else:
1680        charsets = [canonical_charset(c) for c in charsets]
1681
1682    # Note per RFC that 'ISO-8859-1' is special, and is implicitly in the
1683    # accept list with q=1; unless it is already in the list, or '*' is in the list.
1684
1685    best = None
1686    for c, qvalue, _junk in accept_list:
1687        if c == '*':
1688            default = None
1689            if ignore_wildcard:
1690                continue
1691            if not best or qvalue > best[1]:
1692                best = (c, qvalue)
1693        else:
1694            c = canonical_charset(c)
1695            for test_c in charsets:
1696                if c == default:
1697                    default = None
1698                if c == test_c and (not best or best[0]=='*' or qvalue > best[1]):
1699                    best = (c, qvalue)
1700    if default and default in [test_c.upper() for test_c in charsets]:
1701        best = (default, 1)
1702    if best and best[0] == '*':
1703        best = (charsets[0], best[1])
1704    return best
1705
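
The core q-value selection can be sketched in isolation; this omits the '*' wildcard and the implicit ISO-8859-1 default from RFC 2616 section 14.2, and `pick_charset` is an illustrative name, not a module function:

```python
# Sketch of the q-value selection inside acceptable_charset(),
# without wildcard handling or the implicit ISO-8859-1 default.
def pick_charset(accept, server_charsets):
    """accept: parsed (charset, qvalue) pairs; all names canonical."""
    best = None
    for c, q in accept:
        if q > 0 and c in server_charsets and (best is None or q > best[1]):
            best = (c, q)
    return best

print(pick_charset([('UTF-8', 1.0), ('ISO-8859-1', 0.5)],
                   ['ISO-8859-1', 'UTF-8']))  # ('UTF-8', 1.0)
```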
1706
1707
1708class language_tag(object):
1709    """This class represents an RFC 3066 language tag.
1710
1711    Initialize objects of this class with a single string representing
1712    the language tag, such as "en-US".
1713        
1714    Case is insensitive. Wildcarded subtags are ignored or stripped as
1715    they have no significance, so that "en-*" is the same as "en".
1716    However the universal wildcard "*" language tag is kept as-is.
1717
1718    Note that although relational operators such as < are defined,
1719    they only form a partial order based upon specialization.
1720
1721    Thus for example,
1722         "en" <= "en-US"
1723    but,
1724         not "en" <= "de", and
1725         not "de" <= "en".
1726
1727    """
1728
1729    def __init__(self, tagname):
1730        """Initialize objects of this class with a single string representing
1731        the language tag, such as "en-US".  Case is insensitive.
1732
1733        """
1734
1735        self.parts = tagname.lower().split('-')
1736        while len(self.parts) > 1 and self.parts[-1] == '*':
1737            del self.parts[-1]
1738
1739    def __len__(self):
1740        """Number of subtags in this tag."""
1741        if len(self.parts) == 1 and self.parts[0] == '*':
1742            return 0
1743        return len(self.parts)
1744
1745    def __str__(self):
1746        """The standard string form of this language tag."""
1747        a = []
1748        if len(self.parts) >= 1:
1749            a.append(self.parts[0])
1750        if len(self.parts) >= 2:
1751            if len(self.parts[1]) == 2:
1752                a.append( self.parts[1].upper() )
1753            else:
1754                a.append( self.parts[1] )
1755        a.extend( self.parts[2:] )
1756        return '-'.join(a)
1757
1758    def __unicode__(self):
1759        """The unicode string form of this language tag."""
1760        return str(self.__str__())
1761
1762    def __repr__(self):
1763        """The python representation of this language tag."""
1764        s = '%s("%s")' % (self.__class__.__name__, self.__str__())
1765        return s
1766
1767    def superior(self):
1768        """Returns another instance of language_tag which is the superior.
1769
1770        Thus en-US gives en, and en gives *.
1771
1772        """
1773        if len(self) <= 1:
1774            return self.__class__('*')
1775        return self.__class__( '-'.join(self.parts[:-1]) )
1776
1777    def all_superiors(self, include_wildcard=False):
1778        """Returns a list of this language and all its superiors.
1779
1780        If include_wildcard is False, then "*" will not be among the
1781        output list, unless this language is itself "*".
1782
1783        """
1784        langlist = [ self ]
1785        l = self
1786        while not l.is_universal_wildcard():
1787            l = l.superior()
1788            if l.is_universal_wildcard() and not include_wildcard:
1789                continue
1790            langlist.append(l)
1791        return langlist
1792                
1793    def is_universal_wildcard(self):
1794        """Returns True if this language tag represents all possible
1795        languages, by using the reserved tag of "*".
1796
1797        """
1798        return len(self.parts) == 1 and self.parts[0] == '*'
1799
1800    def dialect_of(self, other, ignore_wildcard=True):
1801        """Is this language a dialect (or subset/specialization) of another.
1802
1803        This method returns True if this language is the same as or a
1804        specialization (dialect) of the other language_tag.
1805
1806        If ignore_wildcard is False, then all languages will be
1807        considered to be a dialect of the special language tag of "*".
1808
1809        """
1810        if not ignore_wildcard and self.is_universal_wildcard():
1811            return True
1812        for i in range( min(len(self), len(other)) ):
1813            if self.parts[i] != other.parts[i]:
1814                return False
1815        if len(self) >= len(other):
1816            return True
1817        return False
1818
1819    def __eq__(self, other):
1820        """== operator. Are the two languages the same?"""
1821
1822        return self.parts == other.parts
1823
1824    def __ne__(self, other):
1825        """!= operator. Are the two languages different?"""
1826
1827        return not self.__eq__(other)
1828
1829    def __lt__(self, other):
1830        """< operator. Returns True if the other language is a more
1831        specialized dialect of this one."""
1832
1833        return other.dialect_of(self) and self != other
1834
1835    def __le__(self, other):
1836        """<= operator. Returns True if the other language is the same
1837        as or a more specialized dialect of this one."""
1838        return other.dialect_of(self)
1839
1840    def __gt__(self, other):
1841        """> operator.  Returns True if this language is a more
1842        specialized dialect of the other one."""
1843
1844        return self.dialect_of(other) and self != other
1845
1846    def __ge__(self, other):
1847        """>= operator.  Returns True if this language is the same as
1848        or a more specialized dialect of the other one."""
1849
1850        return self.dialect_of(other)
1851
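
The specialization partial order from the class docstring can be shown in miniature; `parts` and `dialect_of` here are simplified stand-ins for the language_tag methods, not the module's API:

```python
# A tag is a dialect of another when the other's subtags are a
# prefix of its own (case-insensitive).
def parts(tag):
    return tag.lower().split('-')

def dialect_of(tag, other):
    p, q = parts(tag), parts(other)
    return len(p) >= len(q) and p[:len(q)] == q

print(dialect_of('en-US', 'en'))  # True:  "en" <= "en-US"
print(dialect_of('en', 'de'))     # False: unrelated tags are incomparable
print(dialect_of('de', 'en'))     # False
```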
1852
1853def parse_accept_language_header( header_value ):
1854    """Parses the Accept-Language header.
1855
1856    Returns a list of tuples, each like:
1857
1858        (language_tag, qvalue, accept_parameters)
1859
1860    """
1861    alist, k = parse_qvalue_accept_list( header_value)
1862    if k < len(header_value):
1863        raise ParseError('Accept-Language header is invalid',header_value,k)
1864
1865    langlist = []
1866    for token, langparms, q, acptparms in alist:
1867        if langparms:
1868            raise ParseError('Language tag may not have any parameters',header_value,0)
1869        lang = language_tag( token )
1870        langlist.append( (lang, q, acptparms) )
1871
1872    return langlist
1873
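
A deliberately simplified version of this parse (tags and q-values only, without the full RFC 2616 grammar that parse_qvalue_accept_list() implements) looks like:

```python
# Naive Accept-Language parsing sketch: split on commas, then pull
# an optional ';q=' parameter off each element.
def parse_accept_language(value):
    result = []
    for item in value.split(','):
        fields = item.strip().split(';')
        tag, q = fields[0].strip(), 1.0
        for param in fields[1:]:
            name, _, v = param.strip().partition('=')
            if name.strip() == 'q':
                q = float(v)
        result.append((tag, q))
    return result

print(parse_accept_language('da, en-gb;q=0.8, en;q=0.7'))
# [('da', 1.0), ('en-gb', 0.8), ('en', 0.7)]
```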
1874
1875def acceptable_language( accept_header, server_languages, ignore_wildcard=True, assume_superiors=True ):
1876    """Determines if the given language is acceptable to the user agent.
1877
1878    The accept_header should be the value present in the HTTP
1879    "Accept-Language:" header.  In mod_python this is typically
1880    obtained from the req.http_headers_in table; in WSGI it is
1881    environ["Accept-Language"]; other web frameworks may provide other
1882    methods of obtaining it.
1883
1884    Optionally the accept_header parameter can be pre-parsed, as
1885    returned by the parse_accept_language_header() function defined in
1886    this module.
1887
1888    The server_languages argument should either be a single language
1889    string, a language_tag object, or a sequence of them.  It
1890    represents the set of languages that the server is willing to
1891    send to the user agent.
1892
1893    Note that the wildcarded language tag "*" will be ignored.  To
1894    override this, call with ignore_wildcard=False, and even then
1895    it will be the lowest-priority choice regardless of its
1896    quality factor (as per HTTP spec).
1897
1898    If assume_superiors is True, then the languages that the
1899    browser accepts will automatically include all superior languages.
1900    Any superior languages which must be added are given one half
1901    the qvalue of the language which is present.  For example, if
1902    the accept string is "en-US", then it will be treated as if it
1903    were "en-US, en;q=0.5".  Note that although the HTTP 1.1 spec says
1904    that browsers are supposed to encourage users to configure all
1905    acceptable languages, sometimes they don't, hence this option.
1906    Setting assume_superiors to False will ensure strict adherence
1907    to the HTTP 1.1 spec, which means that if the browser accepts
1908    "en-US", then it will not be acceptable to send just "en"
1909    to it.
1910
1911    This function returns the language which is the most preferred and
1912    is acceptable to both the user agent and the caller.  It will
1913    return None if no language is negotiable, otherwise the return
1914    value is always an instance of language_tag.
1915
1916    See also: RFC 3066 <http://www.ietf.org/rfc/rfc3066.txt>, and
1917    ISO 639, links at <http://en.wikipedia.org/wiki/ISO_639>, and
1918    <http://www.iana.org/assignments/language-tags>.
1919    
1920    """
1921    # Note special instructions from RFC 2616 sect. 14.1:
1922    #   "The language quality factor assigned to a language-tag by the
1923    #   Accept-Language field is the quality value of the longest
1924    #   language- range in the field that matches the language-tag."
1925
1926    if _is_string(accept_header):
1927        accept_list = parse_accept_language_header(accept_header)
1928    else:
1929        accept_list = accept_header
1930
1931    # Possibly add in any "missing" languages that the browser may
1932    # have forgotten to include in the list. Ensure the list is sorted so
1933    # more general languages come before more specific ones.
1934
1935    accept_list.sort()
1936    all_tags = [a[0] for a in accept_list]
1937    if assume_superiors:
1938        to_add = []
1939        for langtag, qvalue, _args in accept_list:
1940            if len(langtag) >= 2:
1941                for suptag in langtag.all_superiors( include_wildcard=False ):
1942                    if suptag not in all_tags:
1943                        # Add in superior at half the qvalue
1944                        to_add.append( (suptag, qvalue / 2, '') )
1945                        all_tags.append( suptag )
1946        accept_list.extend( to_add )
1947
1948    # Convert server_languages to a list of language_tags
1949    if _is_string(server_languages):
1950        server_languages = [language_tag(server_languages)]
1951    elif isinstance(server_languages, language_tag):
1952        server_languages = [server_languages]
1953    else:
1954        server_languages = [language_tag(lang) for lang in server_languages]
1955
1956    # Select the best one
1957    best = None  # tuple (langtag, qvalue, matchlen)
1958    
1959    for langtag, qvalue, _args in accept_list:
1960        # aargs is ignored for Accept-Language
1961        if qvalue <= 0:
1962            continue # UA doesn't accept this language
1963
1964        if ignore_wildcard and langtag.is_universal_wildcard():
1965            continue  # "*" being ignored
1966
1967        for svrlang in server_languages:
1968            # The best match is determined first by the quality factor,
1969            # and then by the most specific match.
1970
1971            matchlen = -1 # how specifically this one matches (0 is a non-match)
1972            if svrlang.dialect_of( langtag, ignore_wildcard=ignore_wildcard ):
1973                matchlen = len(langtag)
1974                if not best \
1975                       or matchlen > best[2] \
1976                       or (matchlen == best[2] and qvalue > best[1]):
1977                    # This match is better
1978                    best = (langtag, qvalue, matchlen)
1979    if not best:
1980        return None
1981    return best[0]
1982
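
The assume_superiors expansion described above can be sketched on its own: each accepted tag also implies its superiors at half the quality value, so "en-US" behaves like "en-US, en;q=0.5". `with_superiors` is an illustrative helper, not a module function:

```python
# Add each tag's superiors (en-US -> en) at half the q-value,
# unless that superior is already present.
def with_superiors(accept):
    seen = {tag for tag, _q in accept}
    extra = []
    for tag, q in accept:
        parts = tag.split('-')
        while len(parts) > 1:
            parts = parts[:-1]          # drop the most specific subtag
            sup = '-'.join(parts)
            if sup not in seen:
                extra.append((sup, q / 2.0))
                seen.add(sup)
    return accept + extra

print(with_superiors([('en-US', 1.0)]))  # [('en-US', 1.0), ('en', 0.5)]
```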
1983# end of file
SEPARATORS = frozenset('()<>@,;:\\"/[]?={} \t')
LWS = frozenset(' \t\n\r')
CRLF = frozenset('\r\n')
DIGIT = frozenset('0123456789')
HEX = frozenset('0123456789abcdefABCDEF')
130def http_datetime(dt=None):
131    """Formats a datetime as an HTTP 1.1 Date/Time string.
132
133    Takes a standard Python datetime object and returns a string
134    formatted according to the HTTP 1.1 date/time format.
135
136    If no datetime is provided (or None) then the current
137    time is used.
138    
139    ABOUT TIMEZONES: If the passed in datetime object is naive it is
140    assumed to be in UTC already.  But if it has a tzinfo component,
141    the returned timestamp string will have been converted to UTC
142    automatically.  So if you use timezone-aware datetimes, you need
143    not worry about conversion to UTC.
144
145    """
146    if not dt:
147        import datetime
148        dt = datetime.datetime.utcnow()
149    else:
150        try:
151            dt = dt - dt.utcoffset()
152        except TypeError:
153            pass  # no timezone offset, just assume already in UTC
154
155    s = dt.strftime('%a, %d %b %Y %H:%M:%S GMT')
156    return s
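
The output format can be seen directly with a plain strftime call; this assumes the default C locale, so the weekday and month names come out in English:

```python
# The RFC 1123 string that http_datetime() produces.
import datetime

dt = datetime.datetime(1994, 11, 6, 8, 49, 37)
print(dt.strftime('%a, %d %b %Y %H:%M:%S GMT'))
# Sun, 06 Nov 1994 08:49:37 GMT
```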

159def parse_http_datetime(datestring, utc_tzinfo=None, strict=False):
160    """Returns a datetime object from an HTTP 1.1 Date/Time string.
161
162    Note that HTTP dates are always in UTC, so the returned datetime
163    object will also be in UTC.
164
165    You can optionally pass in a tzinfo object which should represent
166    the UTC timezone, and the returned datetime will then be
167    timezone-aware (allowing you to more easily translate it into
168    different timezones later).
169
170    If you set 'strict' to True, then only the RFC 1123 format
171    is recognized.  Otherwise the backwards-compatible RFC 1036
172    and Unix asctime(3) formats are also recognized.
173    
174    Please note that the day-of-the-week is not validated.
175    Also two-digit years, although not HTTP 1.1 compliant, are
176    treated according to recommended Y2K rules.
177
178    """
179    import re, datetime
180    m = re.match(r'(?P<DOW>[a-z]+), (?P<D>\d+) (?P<MON>[a-z]+) (?P<Y>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+(\.\d+)?) (?P<TZ>[a-zA-Z0-9_+]+)$',
181                 datestring, re.IGNORECASE)
182    if not m and not strict:
183        m = re.match(r'(?P<DOW>[a-z]+) (?P<MON>[a-z]+) (?P<D>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+) (?P<Y>\d+)$',
184                     datestring, re.IGNORECASE)
185        if not m:
186            m = re.match(r'(?P<DOW>[a-z]+), (?P<D>\d+)-(?P<MON>[a-z]+)-(?P<Y>\d+) (?P<H>\d+):(?P<M>\d+):(?P<S>\d+(\.\d+)?) (?P<TZ>\w+)$',
187                         datestring, re.IGNORECASE)
188    if not m:
189        raise ValueError('HTTP date is not correctly formatted')
190
191    try:
192        tz = m.group('TZ').upper()
193    except IndexError:  # the asctime pattern has no TZ group
194        tz = 'GMT'
195    if tz not in ('GMT','UTC','0000','00:00'):
196        raise ValueError('HTTP date is not in GMT timezone')
197
198    monname = m.group('MON').upper()
199    mdict = {'JAN':1, 'FEB':2, 'MAR':3, 'APR':4, 'MAY':5, 'JUN':6,
200             'JUL':7, 'AUG':8, 'SEP':9, 'OCT':10, 'NOV':11, 'DEC':12}
201    month = mdict.get(monname)
202    if not month:
203        raise ValueError('HTTP date has an unrecognizable month')
204    y = int(m.group('Y'))
205    if y < 100:
206        century = datetime.datetime.utcnow().year // 100
207        if y < 50:
208            y = century * 100 + y
209        else:
210            y = (century - 1) * 100 + y
211    d = int(m.group('D'))
212    hour = int(m.group('H'))
213    minute = int(m.group('M'))
214    try:
215        second = int(m.group('S'))
216    except ValueError:
217        second = float(m.group('S'))
218    dt = datetime.datetime( y, month, d, hour, minute, second, tzinfo=utc_tzinfo )
219    return dt
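
The three recognized date formats can be illustrated with strptime; parse_http_datetime() accepts all of them (the latter two only when strict=False). This assumes the default C locale for the English day and month names:

```python
# RFC 1123, RFC 1036, and asctime(3) renderings of the same instant.
import datetime

want = datetime.datetime(1994, 11, 6, 8, 49, 37)
rfc1123 = datetime.datetime.strptime(
    'Sun, 06 Nov 1994 08:49:37 GMT', '%a, %d %b %Y %H:%M:%S GMT')
rfc1036 = datetime.datetime.strptime(
    'Sunday, 06-Nov-94 08:49:37 GMT', '%A, %d-%b-%y %H:%M:%S GMT')
asctime = datetime.datetime.strptime(
    'Sun Nov  6 08:49:37 1994', '%a %b %d %H:%M:%S %Y')
assert rfc1123 == rfc1036 == asctime == want
```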

222class RangeUnsatisfiableError(ValueError):
223    """Exception class when a byte range lies outside the file size boundaries."""
224    def __init__(self, reason=None):
225        if not reason:
226            reason = 'Range is unsatisfiable'
227        ValueError.__init__(self, reason)

230class RangeUnmergableError(ValueError):
231    """Exception class when byte ranges are noncontiguous and can not be merged together."""
232    def __init__(self, reason=None):
233        if not reason:
234            reason = 'Ranges can not be merged together'
235        ValueError.__init__(self, reason)

238class ParseError(ValueError):
239    """Exception class representing a string parsing error."""
240    def __init__(self, args, input_string, at_position):
241        ValueError.__init__(self, args)
242        self.input_string = input_string
243        self.at_position = at_position
244    def __str__(self):
245        if self.at_position >= len(self.input_string):
246            return '%s\n\tOccurred at end of string' % self.args[0]
247        else:
248            return '%s\n\tOccurred near %s' % (self.args[0], repr(self.input_string[self.at_position:self.at_position+16]))

251def is_token(s):
252    """Determines if the string is a valid token."""
253    for c in s:
254        if ord(c) < 32 or ord(c) > 126 or c in SEPARATORS:
255            return False
256    return True
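
The token test in miniature; `token_ok` is an illustrative stand-in (and, unlike is_token() above, it rejects the empty string, matching the 1*CHAR rule of the grammar):

```python
# A token is one or more printable ASCII characters that are neither
# CTLs nor RFC 2616 separators.
SEP = set('()<>@,;:\\"/[]?={} \t')

def token_ok(s):
    return bool(s) and all(32 <= ord(c) <= 126 and c not in SEP for c in s)

print(token_ok('text'))        # True
print(token_ok('text/plain'))  # False: '/' is a separator
```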

259def parse_comma_list(s, start=0, element_parser=None, min_count=0, max_count=0):
260    """Parses a comma-separated list with optional whitespace.
261
262    Takes an optional callback function `element_parser`, which
263    is assumed to be able to parse an individual element.  It
264    will be passed the string and a `start` argument, and
265    is expected to return a tuple (parsed_result, chars_consumed).
266
267    If no element_parser is given, then either single tokens or
268    quoted strings will be parsed.
269
270    If min_count > 0, then at least that many non-empty elements
271    must be in the list, or an error is raised.
272
273    If max_count > 0, then no more than that many non-empty elements
274    may be in the list, or an error is raised.
275
276    """
277    if min_count > 0 and start == len(s):
278        raise ParseError('Comma-separated list must contain some elements',s,start)
279    elif start >= len(s):
280        raise ParseError('Starting position is beyond the end of the string',s,start)
281
282    if not element_parser:
283        element_parser = parse_token_or_quoted_string
284    results = []
285    pos = start
286    while pos < len(s):
287        e = element_parser( s, pos )
288        if not e or e[1] == 0:
289            break # end of data?
290        else:
291            results.append( e[0] )
292            pos += e[1]
293        while pos < len(s) and s[pos] in LWS:
294            pos += 1
295        if pos < len(s) and s[pos] != ',':
296            break
297        while pos < len(s) and s[pos] == ',':
298            # skip comma and any "empty" elements
299            pos += 1  # skip comma
300            while pos < len(s) and s[pos] in LWS:
301                pos += 1
302    if len(results) < min_count:
303        raise ParseError('Comma-separated list does not have enough elements',s,pos)
304    elif max_count and len(results) > max_count:
305        raise ParseError('Comma-separated list has too many elements',s,pos)
306    return (results, pos-start)
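
For contrast, a deliberately naive comma-list split; unlike parse_comma_list() it knows nothing about quoted strings, so a comma inside quotes would be split incorrectly:

```python
# Split on commas, trim whitespace, drop empty elements.
def simple_comma_list(s):
    return [item.strip() for item in s.split(',') if item.strip()]

print(simple_comma_list('gzip, deflate,, br '))
# ['gzip', 'deflate', 'br']
```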

309def parse_token(s, start=0):
310    """Parses a token.
311
312    A token is a string defined by RFC 2616 section 2.2 as:
313       token = 1*<any CHAR except CTLs or separators>
314
315    Returns a tuple (token, chars_consumed), or ('',0) if no token
316    starts at the given string position.  On a syntax error, a
317    ParseError exception will be raised.
318
319    """
320    return parse_token_or_quoted_string(s, start, allow_quoted=False, allow_token=True)

323def quote_string(s, always_quote=True):
324    """Produces a quoted string according to HTTP 1.1 rules.
325
326    If always_quote is False and if the string is also a valid token,
327    then this function may return a string without quotes.
328
329    """
330    need_quotes = False
331    q = ''
332    for c in s:
333        if ord(c) < 32 or ord(c) > 127 or c in SEPARATORS:
334            q += '\\' + c
335            need_quotes = True
336        else:
337            q += c
338    if need_quotes or always_quote:
339        return '"' + q + '"'
340    else:
341        return q
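
The quoting rule in miniature; `quote` and `SEP` are illustrative stand-ins for quote_string() and the module's SEPARATORS set:

```python
# Backslash-escape separators and controls, then wrap in double
# quotes when needed (or always, by default).
SEP = set('()<>@,;:\\"/[]?={} \t')

def quote(s, always_quote=True):
    out, need_quotes = '', False
    for c in s:
        if ord(c) < 32 or ord(c) > 127 or c in SEP:
            out += '\\' + c
            need_quotes = True
        else:
            out += c
    return '"%s"' % out if (need_quotes or always_quote) else out

print(quote('token', always_quote=False))      # token
print(quote('two words', always_quote=False))  # "two\ words"
```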

344def parse_quoted_string(s, start=0):
345    """Parses a quoted string.
346
347    Returns a tuple (string, chars_consumed).  The quote marks will
348    have been removed and all \-escapes will have been replaced with
349    the characters they represent.
350
351    """
352    return parse_token_or_quoted_string(s, start, allow_quoted=True, allow_token=False)

355def parse_token_or_quoted_string(s, start=0, allow_quoted=True, allow_token=True):
356    """Parses a token or a quoted-string.
357
358    's' is the string to parse, while start is the position within the
359    string where parsing should begin.  It returns a tuple
360    (token, chars_consumed), with all \-escapes and quotation already
361    processed.
362
363    Syntax is according to BNF rules in RFC 2616 section 2.2,
364    specifically the 'token' and 'quoted-string' declarations.
365    Syntax errors in the input string will result in ParseError
366    being raised.
367
368    If allow_quoted is False, then only tokens will be parsed instead
369    of either a token or quoted-string.
370
371    If allow_token is False, then only quoted-strings will be parsed
372    instead of either a token or quoted-string.
373    """
374    if not allow_quoted and not allow_token:
375        raise ValueError('Parsing can not continue with options provided')
376
377    if start >= len(s):
378        raise ParseError('Starting position is beyond the end of the string',s,start)
379    has_quote = (s[start] == '"')
380    if has_quote and not allow_quoted:
381        raise ParseError('A quoted string was not expected', s, start)
382    if not has_quote and not allow_token:
383        raise ParseError('Expected a quotation mark', s, start)
384
385    s2 = ''
386    pos = start
387    if has_quote:
388        pos += 1
389    while pos < len(s):
390        c = s[pos]
391        if c == '\\' and has_quote:
392            # Note this is NOT C-style escaping; the character after the \ is
393            # taken literally.
394            pos += 1
395            if pos == len(s):
396                raise ParseError("End of string while expecting a character after '\\'",s,pos)
397            s2 += s[pos]
398            pos += 1
399        elif c == '"' and has_quote:
400            break
401        elif not has_quote and (c in SEPARATORS or ord(c)<32 or ord(c)>127):
402            break
403        else:
404            s2 += c
405            pos += 1
406    if has_quote:
407        # Make sure we have a closing quote mark
408        if pos >= len(s) or s[pos] != '"':
409            raise ParseError('Quoted string is missing closing quote mark',s,pos)
410        else:
411            pos += 1
412    return s2, (pos - start)
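
The escape rule noted in the code (the character after a backslash is taken literally, not a C-style escape) can be shown with a miniature quoted-string parser; `parse_quoted` is an illustrative stand-in, not the module's API:

```python
# Parse a quoted-string starting at 'start'; returns the unescaped
# text and the number of characters consumed.
def parse_quoted(s, start=0):
    if s[start] != '"':
        raise ValueError('expected opening quote')
    out, pos = '', start + 1
    while pos < len(s):
        c = s[pos]
        if c == '\\':
            pos += 1
            if pos == len(s):
                raise ValueError('dangling backslash')
            out += s[pos]     # escaped character, taken literally
        elif c == '"':
            return out, pos + 1 - start
        else:
            out += c
        pos += 1
    raise ValueError('missing closing quote')

print(parse_quoted(r'"a\"b"'))  # ('a"b', 6)
```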

415def remove_comments(s, collapse_spaces=True):
416    """Removes any ()-style comments from a string.
417
418    In HTTP, ()-comments can nest, and this function will correctly
419    deal with that.
420
421    If 'collapse_spaces' is True, then if there is any whitespace
422    surrounding the comment, it will be replaced with a single space
423    character.  Whitespace also collapses across multiple comment
424    sequences, so that "a (b) (c) d" becomes just "a d".
425
426    Otherwise, if 'collapse_spaces' is False then all whitespace which
427    is outside any comments is left intact as-is.
428
429    """
430    if '(' not in s:
431        return s  # simple case
432    A = []
433    dostrip = False
434    added_comment_space = False
435    pos = 0
436    if collapse_spaces:
437        # eat any leading spaces before a comment
438        i = s.find('(')
439        if i >= 0:
440            while pos < i and s[pos] in LWS:
441                pos += 1
442            if pos != i:
443                pos = 0
444            else:
445                dostrip = True
446                added_comment_space = True  # lie
447    while pos < len(s):
448        if s[pos] == '(':
449            _cmt, k = parse_comment( s, pos )
450            pos += k
451            if collapse_spaces:
452                dostrip = True
453                if not added_comment_space:
454                    if len(A) > 0 and A[-1] and A[-1][-1] in LWS:
455                        # previous part ended with whitespace
456                        A[-1] = A[-1].rstrip()
457                        A.append(' ')  # comment becomes one space
458                        added_comment_space = True
459        else:
460            i = s.find( '(', pos )
461            if i == -1:
462                if dostrip:
463                    text = s[pos:].lstrip()
464                    if s[pos] in LWS and not added_comment_space:
465                        A.append(' ')
466                        added_comment_space = True
467                else:
468                    text = s[pos:]
469                if text:
470                    A.append(text)
471                    dostrip = False
472                    added_comment_space = False
473                break # end of string
474            else:
475                if dostrip:
476                    text = s[pos:i].lstrip()
477                    if s[pos] in LWS and not added_comment_space:
478                        A.append(' ')
479                        added_comment_space = True
480                else:
481                    text = s[pos:i]
482                if text:
483                    A.append(text)
484                    dostrip = False
485                    added_comment_space = False
486                pos = i
487    if dostrip and len(A) > 0 and A[-1] and A[-1][-1] in LWS:
488        A[-1] = A[-1].rstrip()
489    return ''.join(A)

Removes any ()-style comments from a string.

In HTTP, ()-comments can nest, and this function will correctly deal with that.

If 'collapse_spaces' is True, then if there is any whitespace surrounding the comment, it will be replaced with a single space character. Whitespace also collapses across multiple comment sequences, so that "a (b) (c) d" becomes just "a d".

Otherwise, if 'collapse_spaces' is False then all whitespace which is outside any comments is left intact as-is.
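As a rough illustration of the collapsing behaviour, here is a self-contained sketch (strip_http_comments is an illustrative name; unlike the real remove_comments it ignores \-escapes inside comments and always collapses whitespace):

```python
def strip_http_comments(s):
    """Remove nested ()-comments, then collapse the whitespace
    left behind to single spaces (illustrative sketch only)."""
    out = []
    depth = 0
    for c in s:
        if c == '(':
            depth += 1
        elif c == ')' and depth:
            depth -= 1
        elif depth == 0:
            out.append(c)
    # collapse runs of whitespace created by removed comments
    return ' '.join(''.join(out).split())
```

This reproduces the documented example: "a (b) (c) d" becomes "a d", and nesting such as "outer (nested (inner)) end" becomes "outer end".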

def parse_comment(s, start=0):
518def parse_comment(s, start=0):
519    """Parses a ()-style comment from a header value.
520
521    Returns tuple (comment, chars_consumed), where the comment will
522    have had the outer-most parentheses and white space stripped.  Any
523    nested comments will still have their parentheses and whitespace
524    left intact.
525
526    All \-escaped quoted pairs will have been replaced with the actual
527    characters they represent, even within the inner nested comments.
528
529    You should note that only a few HTTP headers, such as User-Agent
530    or Via, allow ()-style comments within the header value.
531
532    A comment is defined by RFC 2616 section 2.2 as:
533    
534       comment = "(" *( ctext | quoted-pair | comment ) ")"
535       ctext   = <any TEXT excluding "(" and ")">
536    """
537    if start >= len(s):
538        raise ParseError('Starting position is beyond the end of the string',s,start)
539    if s[start] != '(':
540        raise ParseError('Comment must begin with opening parenthesis',s,start)
541
542    s2 = ''
543    nestlevel = 1
544    pos = start + 1
545    while pos < len(s) and s[pos] in LWS:
546        pos += 1
547
548    while pos < len(s):
549        c = s[pos]
550        if c == '\\':
551            # Note this is not C-style escaping; the character after the \ is
552            # taken literally.
553            pos += 1
554            if pos == len(s):
555                raise ParseError("End of string while expecting a character after '\\'",s,pos)
556            s2 += s[pos]
557            pos += 1
558        elif c == '(':
559            nestlevel += 1
560            s2 += c
561            pos += 1
562        elif c == ')':
563            nestlevel -= 1
564            pos += 1
565            if nestlevel >= 1:
566                s2 += c
567            else:
568                break
569        else:
570            s2 += c
571            pos += 1
572    if nestlevel > 0:
573        raise ParseError('End of string reached before comment was closed',s,pos)
574    # Now rstrip s2 of all LWS chars.
575    while len(s2) and s2[-1] in LWS:
576        s2 = s2[:-1]
577    return s2, (pos - start)

Parses a ()-style comment from a header value.

Returns tuple (comment, chars_consumed), where the comment will have had the outer-most parentheses and white space stripped. Any nested comments will still have their parentheses and whitespace left intact.

All \-escaped quoted pairs will have been replaced with the actual characters they represent, even within the inner nested comments.

You should note that only a few HTTP headers, such as User-Agent or Via, allow ()-style comments within the header value.

A comment is defined by RFC 2616 section 2.2 as:

comment = "(" *( ctext | quoted-pair | comment ) ")"
ctext   = <any TEXT excluding "(" and ")">
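A minimal sketch of extracting one such comment, assuming no \-escapes are present (parse_comment_sketch is an illustrative name, not the module's function):

```python
def parse_comment_sketch(s, start=0):
    """Extract one ()-comment starting at 'start'.

    Outer parentheses and surrounding whitespace are stripped;
    nested comments keep their parentheses verbatim.
    Returns (comment, chars_consumed).
    """
    if s[start] != '(':
        raise ValueError('comment must begin with "("')
    depth, pos, out = 1, start + 1, []
    while pos < len(s) and depth:
        c = s[pos]
        if c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
        if depth:           # the final ')' is not part of the comment
            out.append(c)
        pos += 1
    if depth:
        raise ValueError('end of string before comment was closed')
    return ''.join(out).strip(), pos - start
```

So "(outer (inner) text)" parses to the comment "outer (inner) text", with the inner parentheses left intact.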

class range_spec:
580class range_spec(object):
581    """A single contiguous (byte) range.
582
583    A range_spec defines a range (of bytes) by specifying two offsets,
584    the 'first' and 'last', which are inclusive in the range.  Offsets
585    are zero-based (the first byte is offset 0).  The range can not be
586    empty or negative (has to satisfy first <= last).
587
588    The range can be unbounded on either end, represented here by the
589    None value, with these semantics:
590
591       * A 'last' of None always indicates the last possible byte
592        (although that offset may not be known).
593
594       * A 'first' of None indicates this is a suffix range, where
595         the last value is actually interpreted to be the number
596         of bytes at the end of the file (regardless of file size).
597
598    Note that it is not valid for both first and last to be None.
599
600    """
601
602    __slots__ = ['first','last']
603
604    def __init__(self, first=0, last=None):
605        self.set( first, last )
606
607    def set(self, first, last):
608        """Sets the value of this range given the first and last offsets.
609        """
610        if first is not None and last is not None and first > last:
611            raise ValueError("Byte range does not satisfy first <= last.")
612        elif first is None and last is None:
613            raise ValueError("Byte range can not omit both first and last offsets.")
614        self.first = first
615        self.last = last
616
617    def __repr__(self):
618        return '%s.%s(%s,%s)' % (self.__class__.__module__, self.__class__.__name__,
619                                 self.first, self.last)
620
621    def __str__(self):
622        """Returns a string form of the range as would appear in a Range: header."""
623        if self.first is None and self.last is None:
624            return ''
625        s = ''
626        if self.first is not None:
627            s += '%d' % self.first
628        s += '-'
629        if self.last is not None:
630            s += '%d' % self.last
631        return s
632
633    def __eq__(self, other):
634        """Compare ranges for equality.
635
636        Note that if non-specific ranges are involved (such as 34- and -5),
637        they could compare as not equal even though they may represent
638        the same set of bytes in some contexts.
639        """
640        return self.first == other.first and self.last == other.last
641
642    def __ne__(self, other):
643        """Compare ranges for inequality.
644
645        Note that if non-specific ranges are involved (such as 34- and -5),
646        they could compare as not equal even though they may represent
647        the same set of bytes in some contexts.
648        """
649        return not self.__eq__(other)
650
651    def __lt__(self, other):
652        """< operator is not defined"""
653        raise NotImplementedError('Ranges can not be relationally compared')
654    def __le__(self, other):
655        """<= operator is not defined"""
656        raise NotImplementedError('Ranges can not be relationally compared')
657    def __gt__(self, other):
658        """> operator is not defined"""
659        raise NotImplementedError('Ranges can not be relationally compared')
660    def __ge__(self, other):
661        """>= operator is not defined"""
662        raise NotImplementedError('Ranges can not be relationally compared')
663    
664    def copy(self):
665        """Makes a copy of this range object."""
666        return self.__class__( self.first, self.last )
667
668    def is_suffix(self):
669        """Returns True if this is a suffix range.
670
671        A suffix range is one that specifies the last N bytes of a
672        file regardless of file size.
673
674        """
675        return self.first is None
676
677    def is_fixed(self):
678        """Returns True if this range is absolute and a fixed size.
679
680        This occurs only if neither first or last is None.  Converse
681        is the is_unbounded() method.
682
683        """
684        return self.first is not None and self.last is not None
685
686    def is_unbounded(self):
687        """Returns True if the number of bytes in the range is unspecified.
688
689        This can only occur if either the 'first' or the 'last' member
690        is None.  Converse is the is_fixed() method.
691
692        """
693        return self.first is None or self.last is None
694
695    def is_whole_file(self):
696        """Returns True if this range includes all possible bytes.
697
698        This can only occur if the 'last' member is None and the first
699        member is 0.
700
701        """
702        return self.first == 0 and self.last is None
703
704    def __contains__(self, offset):
705        """Does this byte range contain the given byte offset?
706
707        If the offset < 0, then it is taken as an offset from the end
708        of the file, where -1 is the last byte.  This type of offset
709        will only work with suffix ranges.
710
711        """
712        if offset < 0:
713            if self.first is not None:
714                return False
715            else:
716                return self.last >= -offset
717        elif self.first is None:
718            return False
719        elif self.last is None:
720            return True
721        else:
722            return self.first <= offset <= self.last
723
724    def fix_to_size(self, size):
725        """Changes a length-relative range to an absolute range based upon given file size.
726
727        Ranges that are already absolute are left as is.
728
729        Note that zero-length files are handled as special cases,
730        since the only way possible to specify a zero-length range is
731        with the suffix range "-0".  Thus unless this range is a suffix
732        range, it can not satisfy a zero-length file.
733
734        If the resulting range (partly) lies outside the file size then an
735        error is raised.
736        """
737
738        if size == 0:
739            if self.first is None:
740                self.last = 0
741                return
742            else:
743                raise RangeUnsatisfiableError("Range can not satisfy a zero-length file.")
744
745        if self.first is None:
746            # A suffix range
747            self.first = size - self.last
748            if self.first < 0:
749                self.first = 0
750            self.last = size - 1
751        else:
752            if self.first > size - 1:
753                raise RangeUnsatisfiableError('Range begins beyond the file size.')
754            else:
755                if self.last is None:
756                    # An unbounded range
757                    self.last = size - 1
758        return
759
760    def merge_with(self, other):
761        """Tries to merge the given range into this one.
762
763        The size of this range may be enlarged as a result.
764
765        An error is raised if the two ranges do not overlap or are not
766        contiguous with each other.
767        """
768        if self.is_whole_file() or self == other:
769            return
770        elif other.is_whole_file():
771            self.first, self.last = 0, None
772            return
773
774        a1, z1 = self.first, self.last
775        a2, z2 = other.first, other.last
776
777        if self.is_suffix():
778            if z1 == 0: # self is zero-length, so merge becomes a copy
779                self.first, self.last = a2, z2
780                return
781            elif other.is_suffix():
782                self.last = max(z1, z2)
783            else:
784                raise RangeUnmergableError()
785        elif other.is_suffix():
786            if z2 == 0: # other is zero-length, so nothing to merge
787                return
788            else:
789                raise RangeUnmergableError()
790
791        assert a1 is not None and a2 is not None
792
793        if a2 < a1:
794            # swap ranges so a1 <= a2
795            a1, z1, a2, z2 = a2, z2, a1, z1
796
797        assert a1 <= a2
798
799        if z1 is None:
800            if z2 is not None and z2 + 1 < a1:
801                raise RangeUnmergableError()
802            else:
803                self.first = min(a1, a2)
804                self.last = None
805        elif z2 is None:
806            if z1 + 1 < a2:
807                raise RangeUnmergableError()
808            else:
809                self.first = min(a1, a2)
810                self.last = None
811        else:
812            if a2 > z1 + 1:
813                raise RangeUnmergableError()
814            else:
815                self.first = a1
816                self.last = max(z1, z2)
817        return

A single contiguous (byte) range.

A range_spec defines a range (of bytes) by specifying two offsets, the 'first' and 'last', which are inclusive in the range. Offsets are zero-based (the first byte is offset 0). The range can not be empty or negative (has to satisfy first <= last).

The range can be unbounded on either end, represented here by the None value, with these semantics:

  • A 'last' of None always indicates the last possible byte (although that offset may not be known).

  • A 'first' of None indicates this is a suffix range, where the last value is actually interpreted to be the number of bytes at the end of the file (regardless of file size).

Note that it is not valid for both first and last to be None.
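As an illustration of these semantics, the three byte-range forms found in a Range: header map onto (first, last) pairs roughly as follows (a sketch, not part of the module; describe is an illustrative helper):

```python
# Illustrative (first, last) pairs for the three byte-range forms:
examples = {
    '500-999': (500, 999),    # fixed: bytes 500..999 inclusive
    '9500-':   (9500, None),  # unbounded: from offset 9500 to the end
    '-500':    (None, 500),   # suffix: the final 500 bytes of the file
}

def describe(first, last):
    """Render the meaning of a (first, last) pair in words."""
    if first is None:
        return 'last %d bytes' % last
    if last is None:
        return 'bytes %d to end' % first
    return 'bytes %d..%d (%d bytes)' % (first, last, last - first + 1)
```

Note that because both offsets are inclusive, a fixed range 500-999 covers 500 bytes, not 499.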

range_spec(first=0, last=None)
604    def __init__(self, first=0, last=None):
605        self.set( first, last )
def set(self, first, last):
607    def set(self, first, last):
608        """Sets the value of this range given the first and last offsets.
609        """
610        if first is not None and last is not None and first > last:
611            raise ValueError("Byte range does not satisfy first <= last.")
612        elif first is None and last is None:
613            raise ValueError("Byte range can not omit both first and last offsets.")
614        self.first = first
615        self.last = last

Sets the value of this range given the first and last offsets.

def copy(self):
664    def copy(self):
665        """Makes a copy of this range object."""
666        return self.__class__( self.first, self.last )

Makes a copy of this range object.

def is_suffix(self):
668    def is_suffix(self):
669        """Returns True if this is a suffix range.
670
671        A suffix range is one that specifies the last N bytes of a
672        file regardless of file size.
673
674        """
675        return self.first is None

Returns True if this is a suffix range.

A suffix range is one that specifies the last N bytes of a file regardless of file size.

def is_fixed(self):
677    def is_fixed(self):
678        """Returns True if this range is absolute and a fixed size.
679
680        This occurs only if neither first or last is None.  Converse
681        is the is_unbounded() method.
682
683        """
684        return self.first is not None and self.last is not None

Returns True if this range is absolute and a fixed size.

This occurs only if neither first or last is None. Converse is the is_unbounded() method.

def is_unbounded(self):
686    def is_unbounded(self):
687        """Returns True if the number of bytes in the range is unspecified.
688
689        This can only occur if either the 'first' or the 'last' member
690        is None.  Converse is the is_fixed() method.
691
692        """
693        return self.first is None or self.last is None

Returns True if the number of bytes in the range is unspecified.

This can only occur if either the 'first' or the 'last' member is None. Converse is the is_fixed() method.

def is_whole_file(self):
695    def is_whole_file(self):
696        """Returns True if this range includes all possible bytes.
697
698        This can only occur if the 'last' member is None and the first
699        member is 0.
700
701        """
702        return self.first == 0 and self.last is None

Returns True if this range includes all possible bytes.

This can only occur if the 'last' member is None and the first member is 0.

def fix_to_size(self, size):
724    def fix_to_size(self, size):
725        """Changes a length-relative range to an absolute range based upon given file size.
726
727        Ranges that are already absolute are left as is.
728
729        Note that zero-length files are handled as special cases,
730        since the only way possible to specify a zero-length range is
731        with the suffix range "-0".  Thus unless this range is a suffix
732        range, it can not satisfy a zero-length file.
733
734        If the resulting range (partly) lies outside the file size then an
735        error is raised.
736        """
737
738        if size == 0:
739            if self.first is None:
740                self.last = 0
741                return
742            else:
743                raise RangeUnsatisfiableError("Range can not satisfy a zero-length file.")
744
745        if self.first is None:
746            # A suffix range
747            self.first = size - self.last
748            if self.first < 0:
749                self.first = 0
750            self.last = size - 1
751        else:
752            if self.first > size - 1:
753                raise RangeUnsatisfiableError('Range begins beyond the file size.')
754            else:
755                if self.last is None:
756                    # An unbounded range
757                    self.last = size - 1
758        return

Changes a length-relative range to an absolute range based upon given file size.

Ranges that are already absolute are left as is.

Note that zero-length files are handled as special cases, since the only way possible to specify a zero-length range is with the suffix range "-0". Thus unless this range is a suffix range, it can not satisfy a zero-length file.

If the resulting range (partly) lies outside the file size then an error is raised.
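The resolution rules above can be sketched as a plain function over (first, last) pairs (an illustrative stand-in for the method, returning a new pair rather than mutating):

```python
def fix_to_size(first, last, size):
    """Resolve a (first, last) byte range against a file of 'size'
    bytes, mirroring the rules described above (sketch only)."""
    if size == 0:
        if first is not None:
            raise ValueError('range can not satisfy a zero-length file')
        return (None, 0)              # only the suffix range "-0" fits
    if first is None:                 # suffix range: the last N bytes
        return (max(0, size - last), size - 1)
    if first > size - 1:
        raise ValueError('range begins beyond the file size')
    return (first, size - 1 if last is None else last)
```

For a 10000-byte file, the suffix range -500 resolves to 9500-9999; for a 100-byte file the same suffix is clamped to 0-99.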

def merge_with(self, other):
760    def merge_with(self, other):
761        """Tries to merge the given range into this one.
762
763        The size of this range may be enlarged as a result.
764
765        An error is raised if the two ranges do not overlap or are not
766        contiguous with each other.
767        """
768        if self.is_whole_file() or self == other:
769            return
770        elif other.is_whole_file():
771            self.first, self.last = 0, None
772            return
773
774        a1, z1 = self.first, self.last
775        a2, z2 = other.first, other.last
776
777        if self.is_suffix():
778            if z1 == 0: # self is zero-length, so merge becomes a copy
779                self.first, self.last = a2, z2
780                return
781            elif other.is_suffix():
782                self.last = max(z1, z2)
783            else:
784                raise RangeUnmergableError()
785        elif other.is_suffix():
786            if z2 == 0: # other is zero-length, so nothing to merge
787                return
788            else:
789                raise RangeUnmergableError()
790
791        assert a1 is not None and a2 is not None
792
793        if a2 < a1:
794            # swap ranges so a1 <= a2
795            a1, z1, a2, z2 = a2, z2, a1, z1
796
797        assert a1 <= a2
798
799        if z1 is None:
800            if z2 is not None and z2 + 1 < a1:
801                raise RangeUnmergableError()
802            else:
803                self.first = min(a1, a2)
804                self.last = None
805        elif z2 is None:
806            if z1 + 1 < a2:
807                raise RangeUnmergableError()
808            else:
809                self.first = min(a1, a2)
810                self.last = None
811        else:
812            if a2 > z1 + 1:
813                raise RangeUnmergableError()
814            else:
815                self.first = a1
816                self.last = max(z1, z2)
817        return

Tries to merge the given range into this one.

The size of this range may be enlarged as a result.

An error is raised if the two ranges do not overlap or are not contiguous with each other.
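For the common case of two absolute (fixed) ranges, the overlap-or-contiguous test reduces to a few lines (illustrative sketch; the real method also handles suffix and unbounded ranges):

```python
def merge_fixed(a, b):
    """Merge two absolute (first, last) ranges if they overlap or
    are contiguous; raise ValueError otherwise (sketch only)."""
    (a1, z1), (a2, z2) = sorted([a, b])   # ensure a1 <= a2
    if a2 > z1 + 1:
        raise ValueError('ranges neither overlap nor touch')
    return (a1, max(z1, z2))
```

So 10-20 merged with 21-30 (contiguous) gives 10-30, and 30-40 merged with 10-35 (overlapping) gives 10-40, while 10-20 and 30-40 raise an error because of the gap.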

first
last
class range_set:
820class range_set(object):
821    """A collection of range_specs, with units (e.g., bytes).
822    """
823    __slots__ = ['units', 'range_specs']
824
825    def __init__(self):
826        self.units = 'bytes'
827        self.range_specs = []  # a list of range_spec objects
828
829    def __str__(self):
830        return self.units + '=' + ', '.join([str(s) for s in self.range_specs])
831
832    def __repr__(self):
833        return '%s.%s(%s)' % (self.__class__.__module__,
834                              self.__class__.__name__,
835                              repr(self.__str__()) )
836
837    def from_str(self, s, valid_units=('bytes','none')):
838        """Sets this range set based upon a string, such as the Range: header.
839
840        You can also use the parse_range_set() function for more control.
841
842        If a parsing error occurs, the pre-existing value of this range
843        set is left unchanged.
844
845        """
846        r, k = parse_range_set( s, valid_units=valid_units )
847        if k < len(s):
848            raise ParseError("Extra unparsable characters in range set specifier",s,k)
849        self.units = r.units
850        self.range_specs = r.range_specs
851
852    def is_single_range(self):
853        """Does this range specifier consist of only a single range set?"""
854        return len(self.range_specs) == 1
855
856    def is_contiguous(self):
857        """Can the collection of range_specs be coalesced into a single contiguous range?"""
858        if len(self.range_specs) <= 1:
859            return True
860        merged = self.range_specs[0].copy()
861        for s in self.range_specs[1:]:
862            try:
863                merged.merge_with(s)
864            except RangeUnmergableError:
865                return False
866        return True
867
868    def fix_to_size(self, size):
869        """Changes all length-relative range_specs to absolute range_specs based upon given file size.
870        If none of the range_specs in this set can be satisfied, then the
871        entire set is considered unsatisfiable and an error is raised.
872        Otherwise any unsatisfiable range_specs will simply be removed
873        from this set.
874
875        """
876        for i in range(len(self.range_specs)):
877            try:
878                self.range_specs[i].fix_to_size( size )
879            except RangeUnsatisfiableError:
880                self.range_specs[i] = None
881        self.range_specs = [s for s in self.range_specs if s is not None]
882        if len(self.range_specs) == 0:
883            raise RangeUnsatisfiableError('No ranges can be satisfied')
884
885    def coalesce(self):
886        """Collapses all consecutive range_specs which together define a contiguous range.
887
888        Note though that this method will not re-sort the range_specs, so a
889        potentially contiguous range may not be collapsed if they are
890        not sorted.  For example the ranges:
891            10-20, 30-40, 20-30
892        will not be collapsed to just 10-40.  However if the ranges are
893        sorted first as with:
894            10-20, 20-30, 30-40
895        then they will collapse to 10-40.
896        """
897        if len(self.range_specs) <= 1:
898            return
899        for i in range(len(self.range_specs) - 1):
900            a = self.range_specs[i]
901            b = self.range_specs[i+1]
902            if a is not None:
903                try:
904                    a.merge_with( b )
905                    self.range_specs[i+1] = None # to be deleted later
906                except RangeUnmergableError:
907                    pass
908        self.range_specs = [r for r in self.range_specs if r is not None]

A collection of range_specs, with units (e.g., bytes).

units
range_specs
def from_str(self, s, valid_units=('bytes', 'none')):
837    def from_str(self, s, valid_units=('bytes','none')):
838        """Sets this range set based upon a string, such as the Range: header.
839
840        You can also use the parse_range_set() function for more control.
841
842        If a parsing error occurs, the pre-existing value of this range
843        set is left unchanged.
844
845        """
846        r, k = parse_range_set( s, valid_units=valid_units )
847        if k < len(s):
848            raise ParseError("Extra unparsable characters in range set specifier",s,k)
849        self.units = r.units
850        self.range_specs = r.range_specs

Sets this range set based upon a string, such as the Range: header.

You can also use the parse_range_set() function for more control.

If a parsing error occurs, the pre-existing value of this range set is left unchanged.
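A toy version of this parsing step, for illustration only (the real parser validates tokens, units and whitespace far more strictly; parse_range_header_sketch is a hypothetical name):

```python
def parse_range_header_sketch(value):
    """Split a Range: header value such as 'bytes=0-499, -500'
    into (units, [(first, last), ...]) pairs (sketch only)."""
    units, _, rest = value.partition('=')
    specs = []
    for part in rest.split(','):
        first, _, last = part.strip().partition('-')
        specs.append((int(first) if first else None,
                      int(last) if last else None))
    return units.strip(), specs
```

For instance, 'bytes=0-499, 500-999, -500' yields units 'bytes' and the specs (0,499), (500,999) and the suffix (None,500).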

def is_single_range(self):
852    def is_single_range(self):
853        """Does this range specifier consist of only a single range set?"""
854        return len(self.range_specs) == 1

Does this range specifier consist of only a single range set?

def is_contiguous(self):
856    def is_contiguous(self):
857        """Can the collection of range_specs be coalesced into a single contiguous range?"""
858        if len(self.range_specs) <= 1:
859            return True
860        merged = self.range_specs[0].copy()
861        for s in self.range_specs[1:]:
862            try:
863                merged.merge_with(s)
864            except RangeUnmergableError:
865                return False
866        return True

Can the collection of range_specs be coalesced into a single contiguous range?

def fix_to_size(self, size):
868    def fix_to_size(self, size):
869        """Changes all length-relative range_specs to absolute range_specs based upon given file size.
870        If none of the range_specs in this set can be satisfied, then the
871        entire set is considered unsatisfiable and an error is raised.
872        Otherwise any unsatisfiable range_specs will simply be removed
873        from this set.
874
875        """
876        for i in range(len(self.range_specs)):
877            try:
878                self.range_specs[i].fix_to_size( size )
879            except RangeUnsatisfiableError:
880                self.range_specs[i] = None
881        self.range_specs = [s for s in self.range_specs if s is not None]
882        if len(self.range_specs) == 0:
883            raise RangeUnsatisfiableError('No ranges can be satisfied')

Changes all length-relative range_specs to absolute range_specs based upon given file size. If none of the range_specs in this set can be satisfied, then the entire set is considered unsatisfiable and an error is raised. Otherwise any unsatisfiable range_specs will simply be removed from this set.

def coalesce(self):
885    def coalesce(self):
886        """Collapses all consecutive range_specs which together define a contiguous range.
887
888        Note though that this method will not re-sort the range_specs, so a
889        potentially contiguous range may not be collapsed if they are
890        not sorted.  For example the ranges:
891            10-20, 30-40, 20-30
892        will not be collapsed to just 10-40.  However if the ranges are
893        sorted first as with:
894            10-20, 20-30, 30-40
895        then they will collapse to 10-40.
896        """
897        if len(self.range_specs) <= 1:
898            return
899        for i in range(len(self.range_specs) - 1):
900            a = self.range_specs[i]
901            b = self.range_specs[i+1]
902            if a is not None:
903                try:
904                    a.merge_with( b )
905                    self.range_specs[i+1] = None # to be deleted later
906                except RangeUnmergableError:
907                    pass
908        self.range_specs = [r for r in self.range_specs if r is not None]

Collapses all consecutive range_specs which together define a contiguous range.

Note though that this method will not re-sort the range_specs, so a potentially contiguous range may not be collapsed if the specs are not sorted. For example the ranges 10-20, 30-40, 20-30 will not be collapsed to just 10-40. However if the ranges are sorted first, as with 10-20, 20-30, 30-40, then they will collapse to 10-40.
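For contrast, a sketch that sorts before merging, so the unsorted example above does collapse (illustrative only, restricted to absolute ranges; the method itself deliberately preserves the client's ordering):

```python
def coalesce_sorted(specs):
    """Coalesce a list of absolute (first, last) ranges after
    sorting them, merging any that overlap or touch (sketch)."""
    out = []
    for a, z in sorted(specs):
        if out and a <= out[-1][1] + 1:          # overlaps or touches
            out[-1] = (out[-1][0], max(out[-1][1], z))
        else:
            out.append((a, z))
    return out
```

With this variant, [10-20, 30-40, 20-30] collapses to the single range 10-40, while disjoint ranges such as [0-4, 10-20] are left separate.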

def parse_number(s, start=0):
911def parse_number( s, start=0 ):
912    """Parses a positive decimal integer number from the string.
913
914    A tuple is returned (number, chars_consumed).  If the
915    string is not a valid decimal number, then (None,0) is returned.
916    """
917    if start >= len(s):
918        raise ParseError('Starting position is beyond the end of the string',s,start)
919    if s[start] not in DIGIT:
920        return (None,0)  # not a number
921    pos = start
922    n = 0
923    while pos < len(s):
924        c = s[pos]
925        if c in DIGIT:
926            n *= 10
927            n += ord(c) - ord('0')
928            pos += 1
929        else:
930            break
931    return n, pos-start

Parses a positive decimal integer number from the string.

A tuple is returned (number, chars_consumed). If the string is not a valid decimal number, then (None,0) is returned.

def parse_range_spec(s, start=0):
934def parse_range_spec( s, start=0 ):
935    """Parses a (byte) range_spec.
936
937    Returns a tuple (range_spec, chars_consumed).
938    """
939    if start >= len(s):
940        raise ParseError('Starting position is beyond the end of the string',s,start)
941    if s[start] not in DIGIT and s[start] != '-':
942        raise ParseError("Invalid range, expected a digit or '-'",s,start)
943    _first, last = None, None
944    pos = start
945    first, k = parse_number( s, pos )
946    pos += k
947    if pos < len(s) and s[pos] == '-':
948        pos += 1
949        if pos < len(s):
950            last, k = parse_number( s, pos )
951            pos += k
952    else:
953        raise ParseError("Byte range must include a '-'",s,pos)
954    if first is None and last is None:
955        raise ParseError('Byte range can not omit both first and last indices.',s,start)
956    R = range_spec( first, last )
957    return R, pos-start

Parses a (byte) range_spec.

Returns a tuple (range_spec, chars_consumed).

def parse_range_header(header_value, valid_units=('bytes', 'none')):
960def parse_range_header( header_value, valid_units=('bytes','none') ):
961    """Parses the value of an HTTP Range: header.
962
963    The value of the header as a string should be passed in, without
964    the header name itself.
965
966    Returns a range_set object.
967    """
968    ranges, k = parse_range_set( header_value, valid_units=valid_units )
969    if k < len(header_value):
970        raise ParseError('Range header has unexpected or unparsable characters',
971                         header_value, k)
972    return ranges

Parses the value of an HTTP Range: header.

The value of the header as a string should be passed in, without the header name itself.

Returns a range_set object.
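A simplified standalone sketch of the same parsing (hypothetical; it ignores the whitespace and units validation handled by the real parser):

```python
def parse_range_header(value):
    """Parse 'units=first-last,...' into (units, [(first, last), ...]).

    Open-ended ranges use None for the missing bound.
    """
    units, eq, specs = value.partition('=')
    if not eq:
        raise ValueError("range specifier must contain '='")
    ranges = []
    for spec in specs.split(','):
        first, _, last = spec.strip().partition('-')
        ranges.append((int(first) if first else None,
                       int(last) if last else None))
    return units.strip(), ranges
```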

def parse_range_set(s, start=0, valid_units=('bytes', 'none')):
 975def parse_range_set( s, start=0, valid_units=('bytes','none') ):
 976    """Parses a (byte) range set specifier.
 977
 978    Returns a tuple (range_set, chars_consumed).
 979    """
 980    if start >= len(s):
 981        raise ParseError('Starting position is beyond the end of the string',s,start)
 982    pos = start
 983    units, k = parse_token( s, pos )
 984    pos += k
 985    if valid_units and units not in valid_units:
 986        raise ParseError('Unsupported units type in range specifier',s,start)
 987    while pos < len(s) and s[pos] in LWS:
 988        pos += 1
 989    if pos < len(s) and s[pos] == '=':
 990        pos += 1
 991    else:
 992        raise ParseError("Invalid range specifier, expected '='",s,pos)
 993    while pos < len(s) and s[pos] in LWS:
 994        pos += 1
 995    range_specs, k = parse_comma_list( s, pos, parse_range_spec, min_count=1 )
 996    pos += k
 997    # Make sure no trash is at the end of the string
 998    while pos < len(s) and s[pos] in LWS:
 999        pos += 1
1000    if pos < len(s):
1001        raise ParseError('Unparsable characters in range set specifier',s,pos)
1002
1003    ranges = range_set()
1004    ranges.units = units
1005    ranges.range_specs = range_specs
1006    return ranges, pos-start

Parses a (byte) range set specifier.

Returns a tuple (range_set, chars_consumed).

def parse_qvalue_accept_list(s, start=0, item_parser=parse_token):
1043def parse_qvalue_accept_list( s, start=0, item_parser=parse_token ):
1044    """Parses any of the Accept-* style headers with quality factors.
1045
1046    This is a low-level function.  It returns a list of tuples, each like:
1047       (item, item_parms, qvalue, accept_parms)
1048
1049    You can pass in a function which parses each of the item strings, or
1050    accept the default where the items must be simple tokens.  Note that
1051    your parser should not consume any parameters (past the special "q"
1052    parameter anyway).
1053
1054    The item_parms and accept_parms are each lists of (name,value) tuples.
1055
1056    The qvalue is the quality factor, a number from 0 to 1 inclusive.
1057
1058    """
1059    itemlist = []
1060    pos = start
1061    if pos >= len(s):
1062        raise ParseError('Starting position is beyond the end of the string',s,pos)
1063    item = None
1064    while pos < len(s):
1065        item, k = item_parser(s, pos)
1066        pos += k
1067        while pos < len(s) and s[pos] in LWS:
1068            pos += 1
1069        if pos >= len(s) or s[pos] in ',;':
1070            itemparms, qvalue, acptparms = [], None, []
1071            if pos < len(s) and s[pos] == ';':
1072                pos += 1
1073                while pos < len(s) and s[pos] in LWS:
1074                    pos += 1
1075                parmlist, k = parse_parameter_list(s, pos)
1076                for p, v in parmlist:
1077                    if p == 'q' and qvalue is None:
1078                        try:
1079                            qvalue = float(v)
1080                        except ValueError:
1081                            raise ParseError('qvalue must be a floating point number',s,pos)
1082                        if qvalue < 0 or qvalue > 1:
1083                            raise ParseError('qvalue must be between 0 and 1, inclusive',s,pos)
1084                    elif qvalue is None:
1085                        itemparms.append( (p,v) )
1086                    else:
1087                        acptparms.append( (p,v) )
1088                pos += k
1089            if item:
1090                # Add the item to the list
1091                if qvalue is None:
1092                    qvalue = 1
1093                itemlist.append( (item, itemparms, qvalue, acptparms) )
1094                item = None
1095            # skip commas
1096            while pos < len(s) and s[pos] == ',':
1097                pos += 1
1098                while pos < len(s) and s[pos] in LWS:
1099                    pos += 1
1100        else:
1101            break
1102    return itemlist, pos - start

Parses any of the Accept-* style headers with quality factors.

This is a low-level function. It returns a list of tuples, each like: (item, item_parms, qvalue, accept_parms)

You can pass in a function which parses each of the item strings, or accept the default where the items must be simple tokens. Note that your parser should not consume any parameters (past the special "q" parameter anyway).

The item_parms and accept_parms are each lists of (name,value) tuples.

The qvalue is the quality factor, a number from 0 to 1 inclusive.
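As an illustration of the qvalue convention, a stripped-down sketch (hypothetical; it handles only simple tokens with no quoted strings or extra parameters):

```python
def parse_qvalue_list(header):
    """Split 'a;q=0.5, b' into (item, qvalue) pairs; a missing q defaults to 1."""
    items = []
    for part in header.split(','):
        fields = [f.strip() for f in part.split(';')]
        qvalue = 1.0  # items without an explicit q get quality 1
        for f in fields[1:]:
            if f.startswith('q='):
                qvalue = float(f[2:])
        if fields[0]:  # empty items are dropped
            items.append((fields[0], qvalue))
    return items
```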

def parse_accept_header(header_value):
1105def parse_accept_header( header_value ):
1106    """Parses the Accept: header.
1107
1108    The value of the header as a string should be passed in, without
1109    the header name itself.
1110    
1111    This will parse the value of any of the HTTP headers "Accept",
1112    "Accept-Charset", "Accept-Encoding", or "Accept-Language".  These
1113    headers are similarly formatted, in that they are a list of items
1114    with associated quality factors.  The quality factor, or qvalue,
1115    is a number in the range [0.0..1.0] which indicates the relative
1116    preference of each item.
1117
1118    This function returns a list of those items, sorted by preference
1119    (from most-preferred to least-preferred).  Each item in the returned
1120    list is actually a tuple consisting of:
1121
1122       ( item_name, item_parms, qvalue, accept_parms )
1123
1124    As an example, the following string,
1125        text/plain; charset="utf-8"; q=.5; columns=80
1126    would be parsed into this resulting tuple,
1127        ( 'text/plain', [('charset','utf-8')], 0.5, [('columns','80')] )
1128
1129    The value of the returned item_name depends upon which header is
1130    being parsed, but for example it may be a MIME content or media
1131    type (without parameters), a language tag, or so on.  Any optional
1132    parameters (delimited by semicolons) occurring before the "q="
1133    attribute will be in the item_parms list as (attribute,value)
1134    tuples in the same order as they appear in the header.  Any quoted
1135    values will have been unquoted and unescaped.
1136
1137    The qvalue is a floating point number in the inclusive range 0.0
1138    to 1.0, and roughly indicates the preference for this item.
1139    Values outside this range will be capped to the closest extreme.
1140
1141         (!) Note that a qvalue of 0 indicates that the item is
1142         explicitly NOT acceptable to the user agent, and should be
1143         handled differently by the caller.
1144
1145    The accept_parms, like the item_parms, is a list of any attributes
1146    occurring after the "q=" attribute, and will be in the list as
1147    (attribute,value) tuples in the same order as they occur.
1148    Usually accept_parms will be an empty list, as the HTTP spec
1149    allows these extra parameters in the syntax but does not
1150    currently define any possible values.
1151
1152    All empty items will be removed from the list.  However, duplicate
1153    or conflicting values are not detected or handled in any way by
1154    this function.
1155    """
1156    def parse_mt_only(s, start):
1157        mt, k = parse_media_type(s, start, with_parameters=False)
1158        ct = content_type()
1159        ct.major = mt[0]
1160        ct.minor = mt[1]
1161        return ct, k
1162
1163    alist, k = parse_qvalue_accept_list( header_value, item_parser=parse_mt_only )
1164    if k < len(header_value):
1165        raise ParseError('Accept header is invalid',header_value,k)
1166
1167    ctlist = []
1168    for ct, ctparms, q, acptparms  in alist:
1169        if ctparms:
1170            ct.set_parameters( dict(ctparms) )
1171        ctlist.append( (ct, q, acptparms) )
1172    return ctlist

Parses the Accept: header.

The value of the header as a string should be passed in, without the header name itself.

This will parse the value of any of the HTTP headers "Accept", "Accept-Charset", "Accept-Encoding", or "Accept-Language". These headers are similarly formatted, in that they are a list of items with associated quality factors. The quality factor, or qvalue, is a number in the range [0.0..1.0] which indicates the relative preference of each item.

This function returns a list of those items, sorted by preference (from most-preferred to least-preferred). Each item in the returned list is actually a tuple consisting of:

( item_name, item_parms, qvalue, accept_parms )

As an example, the following string,

    text/plain; charset="utf-8"; q=.5; columns=80

would be parsed into this resulting tuple,

    ( 'text/plain', [('charset','utf-8')], 0.5, [('columns','80')] )

The value of the returned item_name depends upon which header is being parsed, but for example it may be a MIME content or media type (without parameters), a language tag, or so on. Any optional parameters (delimited by semicolons) occurring before the "q=" attribute will be in the item_parms list as (attribute,value) tuples in the same order as they appear in the header. Any quoted values will have been unquoted and unescaped.

The qvalue is a floating point number in the inclusive range 0.0 to 1.0, and roughly indicates the preference for this item. Values outside this range will be capped to the closest extreme.

 (!) Note that a qvalue of 0 indicates that the item is
 explicitly NOT acceptable to the user agent, and should be
 handled differently by the caller.

The accept_parms, like the item_parms, is a list of any attributes occurring after the "q=" attribute, and will be in the list as (attribute,value) tuples in the same order as they occur. Usually accept_parms will be an empty list, as the HTTP spec allows these extra parameters in the syntax but does not currently define any possible values.

All empty items will be removed from the list. However, duplicate or conflicting values are not detected or handled in any way by this function.
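The split between item_parms and accept_parms around the "q=" attribute can be sketched for a single item (a hypothetical simplification; quoted-string unescaping is reduced to stripping quotes):

```python
def parse_accept_item(item):
    """Split one Accept-header item into (name, item_parms, qvalue, accept_parms)."""
    fields = [f.strip() for f in item.split(';')]
    name, item_parms, accept_parms, qvalue = fields[0], [], [], None
    for parm in fields[1:]:
        attr, _, value = parm.partition('=')
        value = value.strip('"')
        if attr == 'q' and qvalue is None:
            qvalue = float(value)               # the quality factor itself
        elif qvalue is None:
            item_parms.append((attr, value))    # parameters before "q="
        else:
            accept_parms.append((attr, value))  # parameters after "q="
    return (name, item_parms, 1.0 if qvalue is None else qvalue, accept_parms)
```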

def parse_media_type(media_type, start=0, with_parameters=True):
1175def parse_media_type(media_type, start=0, with_parameters=True):
1176    """Parses a media type (MIME type) designator into it's parts.
1177
1178    Given a media type string, returns a nested tuple of it's parts.
1179
1180        ((major,minor,parmlist), chars_consumed)
1181
1182    where parmlist is a list of tuples of (parm_name, parm_value).
1183    Quoted-values are appropriately unquoted and unescaped.
1184    
1185    If 'with_parameters' is False, then parsing will stop immediately
1186    after the minor media type; and will not proceed to parse any
1187    of the semicolon-separated parameters.
1188
1189    Examples:
1190        image/png -> (('image','png',[]), 9)
1191        text/plain; charset="utf-16be"
1192              -> (('text','plain',[('charset','utf-16be')]), 30)
1193
1194    """
1195
1196    s = media_type
1197    pos = start
1198    ctmaj, k = parse_token(s, pos)
1199    if k == 0:
1200        raise ParseError('Media type must be of the form "major/minor".', s, pos)
1201    pos += k
1202    if pos >= len(s) or s[pos] != '/':
1203        raise ParseError('Media type must be of the form "major/minor".', s, pos)
1204    pos += 1
1205    ctmin, k = parse_token(s, pos)
1206    if k == 0:
1207        raise ParseError('Media type must be of the form "major/minor".', s, pos)
1208    pos += k
1209    if with_parameters:
1210        parmlist, k = parse_parameter_list(s, pos)
1211        pos += k
1212    else:
1213        parmlist = []
1214    return ((ctmaj, ctmin, parmlist), pos - start)

Parses a media type (MIME type) designator into its parts.

Given a media type string, returns a nested tuple of its parts.

((major,minor,parmlist), chars_consumed)

where parmlist is a list of tuples of (parm_name, parm_value). Quoted-values are appropriately unquoted and unescaped.

If 'with_parameters' is False, then parsing will stop immediately after the minor media type, and will not proceed to parse any of the semicolon-separated parameters.

Examples:

    image/png -> (('image','png',[]), 9)
    text/plain; charset="utf-16be" -> (('text','plain',[('charset','utf-16be')]), 30)
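A standalone sketch of the major/minor split (hypothetical; parameter and whitespace handling are simplified compared to the real parser):

```python
def split_media_type(media_type):
    """Split 'major/minor; parms' into (major, minor, parmlist)."""
    head, _, rest = media_type.partition(';')
    major, slash, minor = head.strip().partition('/')
    if not slash or not major or not minor:
        raise ValueError('Media type must be of the form "major/minor".')
    parmlist = []
    for parm in rest.split(';') if rest else []:
        name, _, value = parm.strip().partition('=')
        parmlist.append((name, value.strip('"')))
    return (major, minor, parmlist)
```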

def parse_parameter_list(s, start=0):
1217def parse_parameter_list(s, start=0):
1218    """Parses a semicolon-separated 'parameter=value' list.
1219
1220    Returns a tuple (parmlist, chars_consumed), where parmlist
1221    is a list of tuples (parm_name, parm_value).
1222
1223    The parameter values will be unquoted and unescaped as needed.
1224
1225    Empty parameters (as in ";;") are skipped, as is insignificant
1226    white space.  The list returned is kept in the same order as the
1227    parameters appear in the string.
1228
1229    """
1230    pos = start
1231    parmlist = []
1232    while pos < len(s):
1233        while pos < len(s) and s[pos] in LWS:
1234            pos += 1 # skip whitespace
1235        if pos < len(s) and s[pos] == ';':
1236            pos += 1
1237            while pos < len(s) and s[pos] in LWS:
1238                pos += 1 # skip whitespace
1239        if pos >= len(s):
1240            break
1241        parmname, k = parse_token(s, pos)
1242        if parmname:
1243            pos += k
1244            while pos < len(s) and s[pos] in LWS:
1245                pos += 1 # skip whitespace
1246            if not (pos < len(s) and s[pos] == '='):
1247                raise ParseError('Expected an "=" after parameter name', s, pos)
1248            pos += 1
1249            while pos < len(s) and s[pos] in LWS:
1250                pos += 1 # skip whitespace
1251            parmval, k = parse_token_or_quoted_string( s, pos )
1252            pos += k
1253            parmlist.append( (parmname, parmval) )
1254        else:
1255            break
1256    return parmlist, pos - start

Parses a semicolon-separated 'parameter=value' list.

Returns a tuple (parmlist, chars_consumed), where parmlist is a list of tuples (parm_name, parm_value).

The parameter values will be unquoted and unescaped as needed.

Empty parameters (as in ";;") are skipped, as is insignificant white space. The list returned is kept in the same order as the parameters appear in the string.
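The skipping of empty parameters can be sketched as follows (a hypothetical helper; LWS handling is reduced to str.strip, and quoted-value unescaping to stripping quotes):

```python
def split_parameter_list(s):
    """Parse '; a=1;; b="x"' into [('a', '1'), ('b', 'x')], skipping empties."""
    parmlist = []
    for part in s.split(';'):
        part = part.strip()
        if not part:
            continue  # empty parameters (as in ";;") are skipped
        name, eq, value = part.partition('=')
        if not eq:
            raise ValueError('Expected an "=" after parameter name')
        parmlist.append((name.strip(), value.strip().strip('"')))
    return parmlist
```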

class content_type:
1259class content_type(object):
1260    """This class represents a media type (aka a MIME content type), including parameters.
1261
1262    You initialize these by passing in a content-type declaration
1263    string, such as "text/plain; charset=ascii", to the constructor or
1264    to the set() method.  If you provide no string value, the object
1265    returned will represent the wildcard */* content type.
1266
1267    Normally you will get the value back by using str(), or optionally
1268    you can access the components via the 'major', 'minor', 'media_type',
1269    or 'parmdict' members.
1270
1271    """
1272    def __init__(self, content_type_string=None, with_parameters=True):
1273        """Create a new content_type object.
1274
1275        See the set() method for a description of the arguments.
1276        """
1277        if content_type_string:
1278            self.set( content_type_string, with_parameters=with_parameters )
1279        else:
1280            self.set( '*/*' )
1281
1282    def set_parameters(self, parameter_list_or_dict):
1283        """Sets the optional paramters based upon the parameter list.
1284
1285        The paramter list should be a semicolon-separated name=value string.
1286        Any paramters which already exist on this object will be deleted,
1287        unless they appear in the given paramter_list.
1288
1289        """
1290        if isinstance(parameter_list_or_dict, dict):
1291            # already a dictionary
1292            pl = parameter_list_or_dict
1293        else:
1294            pl, k = parse_parameter_list(parameter_list_or_dict)
1295            if k < len(parameter_list_or_dict):
1296                raise ParseError('Invalid parameter list', parameter_list_or_dict, k)
1297        self.parmdict = dict(pl)
1298
1299    def set(self, content_type_string, with_parameters=True):
1300        """Parses the content type string and sets this object to it's value.
1301
1302        For a more complete description of the arguments, see the
1303        documentation for the parse_media_type() function in this module.
1304        """
1305        mt, k = parse_media_type( content_type_string, with_parameters=with_parameters )
1306        if k < len(content_type_string):
1307            raise ParseError('Not a valid content type',content_type_string, k)
1308        major, minor, pdict = mt
1309        self._set_major( major )
1310        self._set_minor( minor )
1311        self.parmdict = dict(pdict)
1312        
1313    def _get_major(self):
1314        return self._major
1315    def _set_major(self, s):
1316        s = s.lower()  # case-insensitive
1317        if not is_token(s):
1318            raise ValueError('Major media type contains an invalid character')
1319        self._major = s
1320
1321    def _get_minor(self):
1322        return self._minor
1323    def _set_minor(self, s):
1324        s = s.lower()  # case-insensitive
1325        if not is_token(s):
1326            raise ValueError('Minor media type contains an invalid character')
1327        self._minor = s
1328
1329    major = property(_get_major, _set_major, doc="Major media classification")
1330    minor = property(_get_minor, _set_minor, doc="Minor media sub-classification")
1331
1332    def __str__(self):
1333        """String value."""
1334        s = '%s/%s' % (self.major, self.minor)
1335        if self.parmdict:
1336            extra = '; '.join([ '%s=%s' % (a[0],quote_string(a[1],False)) for a in self.parmdict.items()])
1337            s += '; ' + extra
1338        return s
1339
1340    def __unicode__(self):
1341        """Unicode string value."""
1342        # In Python 3 this is probably unnecessary in general, this is just to avoid possible syntax issues. I.H.
1343        return str(self.__str__())
1344
1345    def __repr__(self):
1346        """Python representation of this object."""
1347        s = '%s(%s)' % (self.__class__.__name__, repr(self.__str__()))
1348        return s
1349
1350
1351    def __hash__(self):
1352        """Hash this object; the hash is dependent only upon the value."""
1353        return hash(str(self))
1354
1355    def __getstate__(self):
1356        """Pickler"""
1357        return str(self)
1358
1359    def __setstate__(self, state):
1360        """Unpickler"""
1361        self.set(state)
1362
1363    def __len__(self):
1364        """Logical length of this media type.
1365        For example:
1366           len('*/*')  -> 0
1367           len('image/*') -> 1
1368           len('image/png') -> 2
1369           len('text/plain; charset=utf-8')  -> 3
1370           len('text/plain; charset=utf-8; filename=xyz.txt') -> 4
1371
1372        """
1373        if self.major == '*':
1374            return 0
1375        elif self.minor == '*':
1376            return 1
1377        else:
1378            return 2 + len(self.parmdict)
1379
1380    def __eq__(self, other):
1381        """Equality test.
1382
1383        Note that this is an exact match, including any parameters if any.
1384        """
1385        return self.major == other.major and \
1386                   self.minor == other.minor and \
1387                   self.parmdict == other.parmdict
1388
1389    def __ne__(self, other):
1390        """Inequality test."""
1391        return not self.__eq__(other)
1392            
1393    def _get_media_type(self):
1394        """Returns the media 'type/subtype' string, without parameters."""
1395        return '%s/%s' % (self.major, self.minor)
1396
1397    media_type = property(_get_media_type, doc="Returns just the media type 'type/subtype' without any parameters (read-only).")
1398
1399    def is_wildcard(self):
1400        """Returns True if this is a 'something/*' media type.
1401        """
1402        return self.minor == '*'
1403
1404    def is_universal_wildcard(self):
1405        """Returns True if this is the unspecified '*/*' media type.
1406        """
1407        return self.major == '*' and self.minor == '*'
1408
1409    def is_composite(self):
1410        """Is this media type composed of multiple parts.
1411        """
1412        return self.major == 'multipart' or self.major == 'message'
1413
1414    def is_xml(self):
1415        """Returns True if this media type is XML-based.
1416
1417        Note this does not consider text/html to be XML, but
1418        application/xhtml+xml is.
1419        """
1420        return self.minor == 'xml' or self.minor.endswith('+xml')

This class represents a media type (aka a MIME content type), including parameters.

You initialize these by passing in a content-type declaration string, such as "text/plain; charset=ascii", to the constructor or to the set() method. If you provide no string value, the object returned will represent the wildcard */* content type.

Normally you will get the value back by using str(), or optionally you can access the components via the 'major', 'minor', 'media_type', or 'parmdict' members.

content_type(content_type_string=None, with_parameters=True)
1272    def __init__(self, content_type_string=None, with_parameters=True):
1273        """Create a new content_type object.
1274
1275        See the set() method for a description of the arguments.
1276        """
1277        if content_type_string:
1278            self.set( content_type_string, with_parameters=with_parameters )
1279        else:
1280            self.set( '*/*' )

Create a new content_type object.

See the set() method for a description of the arguments.

def set_parameters(self, parameter_list_or_dict):
1282    def set_parameters(self, parameter_list_or_dict):
1283        """Sets the optional paramters based upon the parameter list.
1284
1285        The paramter list should be a semicolon-separated name=value string.
1286        Any paramters which already exist on this object will be deleted,
1287        unless they appear in the given paramter_list.
1288
1289        """
1290        if isinstance(parameter_list_or_dict, dict):
1291            # already a dictionary
1292            pl = parameter_list_or_dict
1293        else:
1294            pl, k = parse_parameter_list(parameter_list_or_dict)
1295            if k < len(parameter_list_or_dict):
1296                raise ParseError('Invalid parameter list', parameter_list_or_dict, k)
1297        self.parmdict = dict(pl)

Sets the optional parameters based upon the parameter list.

The parameter list should be a semicolon-separated name=value string. Any parameters which already exist on this object will be deleted, unless they appear in the given parameter list.

def set(self, content_type_string, with_parameters=True):
1299    def set(self, content_type_string, with_parameters=True):
1300        """Parses the content type string and sets this object to it's value.
1301
1302        For a more complete description of the arguments, see the
1303        documentation for the parse_media_type() function in this module.
1304        """
1305        mt, k = parse_media_type( content_type_string, with_parameters=with_parameters )
1306        if k < len(content_type_string):
1307            raise ParseError('Not a valid content type',content_type_string, k)
1308        major, minor, pdict = mt
1309        self._set_major( major )
1310        self._set_minor( minor )
1311        self.parmdict = dict(pdict)

Parses the content type string and sets this object to its value.

For a more complete description of the arguments, see the documentation for the parse_media_type() function in this module.

major
1313    def _get_major(self):
1314        return self._major

Major media classification

minor
1321    def _get_minor(self):
1322        return self._minor

Minor media sub-classification

media_type
1393    def _get_media_type(self):
1394        """Returns the media 'type/subtype' string, without parameters."""
1395        return '%s/%s' % (self.major, self.minor)

Returns the media 'type/subtype' string, without parameters.

def is_wildcard(self):
1399    def is_wildcard(self):
1400        """Returns True if this is a 'something/*' media type.
1401        """
1402        return self.minor == '*'

Returns True if this is a 'something/*' media type.

def is_universal_wildcard(self):
1404    def is_universal_wildcard(self):
1405        """Returns True if this is the unspecified '*/*' media type.
1406        """
1407        return self.major == '*' and self.minor == '*'

Returns True if this is the unspecified '*/*' media type.

def is_composite(self):
1409    def is_composite(self):
1410        """Is this media type composed of multiple parts.
1411        """
1412        return self.major == 'multipart' or self.major == 'message'

Returns True if this media type is composed of multiple parts.

def is_xml(self):
1414    def is_xml(self):
1415        """Returns True if this media type is XML-based.
1416
1417        Note this does not consider text/html to be XML, but
1418        application/xhtml+xml is.
1419        """
1420        return self.minor == 'xml' or self.minor.endswith('+xml')

Returns True if this media type is XML-based.

Note this does not consider text/html to be XML, but application/xhtml+xml is.
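The rule can be sketched independently of the class (a hypothetical helper taking a plain media-type string):

```python
def is_xml_media_type(media_type):
    """True when the minor type is 'xml' or carries a '+xml' suffix."""
    minor = media_type.split('/', 1)[1]
    return minor == 'xml' or minor.endswith('+xml')
```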

content_formdata = content_type('multipart/form-data')
content_urlencoded = content_type('application/x-www-form-urlencoded')
content_byteranges = content_type('multipart/byteranges')
content_opaque = content_type('application/octet-stream')
content_html = content_type('text/html')
content_xhtml = content_type('application/xhtml+xml')
def acceptable_content_type(accept_header, content_types, ignore_wildcard=True):
1431def acceptable_content_type( accept_header, content_types, ignore_wildcard=True ):
1432    """Determines if the given content type is acceptable to the user agent.
1433
1434    The accept_header should be the value present in the HTTP
1435    "Accept:" header.  In mod_python this is typically obtained from
1436    the req.http_headers_in table; in WSGI it is environ["HTTP_ACCEPT"];
1437    other web frameworks may provide other methods of obtaining it.
1438
1439    Optionally the accept_header parameter can be pre-parsed, as
1440    returned from the parse_accept_header() function in this module.
1441
1442    The content_types argument should either be a single MIME media
1443    type string, or a sequence of them.  It represents the set of
1444    content types that the caller (server) is willing to send.
1445    Generally, the server content_types should not contain any
1446    wildcarded values.
1447
1448    This function determines which content type is the most
1449    preferred and is acceptable to both the user agent and the server.
1450    If one is negotiated it will return a four-valued tuple like:
1451
1452        (server_content_type, ua_content_range, qvalue, accept_parms)
1453
1454    The first tuple value is one of the server's content_types, while
1455    the remaining tuple values describe which of the client's
1456    acceptable content_types was matched.  In most cases accept_parms
1457    will be an empty list (see description of parse_accept_header()
1458    for more details).
1459
1460    If no content type could be negotiated, then this function will
1461    return None (and the caller should typically cause an HTTP 406 Not
1462    Acceptable as a response).
1463
1464    Note that the wildcarded content type "*/*" sent by the client
1465    will be ignored, since it is often incorrectly sent by web
1466    browsers that don't really mean it.  To override this, call with
1467    ignore_wildcard=False.  Partial wildcards such as "image/*" will
1468    always be processed, but at a lower priority than a completely
1469    matching type.
1470
1471    See also: RFC 2616 section 14.1, and
1472    <http://www.iana.org/assignments/media-types/>
1473
1474    """
1475    if _is_string(accept_header):
1476        accept_list = parse_accept_header(accept_header)
1477    else:
1478        accept_list = accept_header
1479
1480    if _is_string(content_types):
1481        content_types = [content_types]
1482
1483    server_ctlist = [content_type(ct) for ct in content_types]
1484
1485
1486    #print 'AC', repr(accept_list)
1487    #print 'SV', repr(server_ctlist)
1488
1489    best = None   # (content_type, qvalue, accept_parms, matchlen)
1490
1491    for server_ct in server_ctlist:
1492        best_for_this = None
1493        for client_ct, qvalue, aargs in accept_list:
1494            if ignore_wildcard and client_ct.is_universal_wildcard():
1495                continue  # */* being ignored
1496
1497            matchlen = 0 # how specifically this one matches (0 is a non-match)
1498            if client_ct.is_universal_wildcard():
1499                matchlen = 1   # */* is a 1
1500            elif client_ct.major == server_ct.major:
1501                if client_ct.minor == '*':  # something/* is a 2
1502                    matchlen = 2
1503                elif client_ct.minor == server_ct.minor: # something/something is a 3
1504                    matchlen = 3
1505                    # must make sure all the parms match too
1506                    for pname, pval in client_ct.parmdict.items():
1507                        sval = server_ct.parmdict.get(pname)
1508                        if pname == 'charset':
1509                            # special case for charset to match aliases
1510                            pval = canonical_charset(pval)
1511                            sval = canonical_charset(sval)
1512                        if sval == pval:
1513                            matchlen = matchlen + 1
1514                        else:
1515                            matchlen = 0
1516                            break
1517                else:
1518                    matchlen = 0
1519
1520            #print 'S',server_ct,'  C',client_ct,'  M',matchlen,'Q',qvalue
1521            if matchlen > 0:
1522                if not best_for_this \
1523                       or matchlen > best_for_this[-1] \
1524                       or (matchlen == best_for_this[-1] and qvalue > best_for_this[2]):
1525                    # This match is better
1526                    best_for_this = (server_ct, client_ct, qvalue, aargs, matchlen)
1527                    #print 'BEST2 NOW', repr(best_for_this)
1528        if not best or \
1529               (best_for_this and best_for_this[2] > best[2]):
1530            best = best_for_this
1531            #print 'BEST NOW', repr(best)
1532    if not best or best[2] <= 0:
1533        return None
1534    return best[:-1]

Determines if the given content type is acceptable to the user agent.

The accept_header should be the value present in the HTTP "Accept:" header. In mod_python this is typically obtained from the req.http_headers_in table; in WSGI it is environ["Accept"]; other web frameworks may provide other methods of obtaining it.

Optionally the accept_header parameter can be pre-parsed, as returned from the parse_accept_header() function in this module.

The content_types argument should either be a single MIME media type string, or a sequence of them. It represents the set of content types that the caller (server) is willing to send. Generally, the server content_types should not contain any wildcarded values.

This function determines which content type is the most preferred and is acceptable to both the user agent and the server. If one is negotiated it will return a four-valued tuple like:

(server_content_type, ua_content_type, qvalue, accept_parms)

The first tuple value is one of the server's content_types, while the remaining tuple values describe which of the client's acceptable content_types was matched. In most cases accept_parms will be an empty list (see description of parse_accept_header() for more details).

If no content type could be negotiated, then this function will return None (and the caller should typically respond with HTTP 406 Not Acceptable).

Note that the wildcarded content type "*/*" sent by the client will be ignored, since it is often incorrectly sent by web browsers that don't really mean it. To override this, call with ignore_wildcard=False. Partial wildcards such as "image/*" will always be processed, but will be at a lower priority than a complete matching type.

See also: RFC 2616 section 14.1, and http://www.iana.org/assignments/media-types/
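The match-specificity ranking described above (a full wildcard counts least, a partial wildcard more, an exact type/subtype most) can be sketched in a simplified, self-contained form. `match_specificity` below is an illustrative name, not part of this module, and it ignores media-type parameters such as charset:

```python
def match_specificity(client_ct, server_ct):
    """Return 0 for no match, 1 for "*/*", 2 for "type/*", 3 for exact.

    Simplified sketch of the ranking used during content negotiation.
    """
    c_major, _, c_minor = client_ct.partition('/')
    s_major, _, s_minor = server_ct.partition('/')
    if c_major == '*' and c_minor == '*':
        return 1                      # universal wildcard matches anything
    if c_major == s_major:
        if c_minor == '*':
            return 2                  # partial wildcard, e.g. "image/*"
        if c_minor == s_minor:
            return 3                  # exact type/subtype match
    return 0
```

Among matches, the real function prefers the most specific one, breaking ties on qvalue.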

character_set_aliases = {
    'ASCII': 'US-ASCII', 'ISO646-US': 'US-ASCII', 'IBM367': 'US-ASCII',
    'CP367': 'US-ASCII', 'CSASCII': 'US-ASCII', 'ANSI_X3.4-1968': 'US-ASCII',
    'ISO_646.IRV:1991': 'US-ASCII',
    'UTF7': 'UTF-7', 'UTF8': 'UTF-8', 'UTF16': 'UTF-16',
    'UTF16LE': 'UTF-16LE', 'UTF16BE': 'UTF-16BE',
    'UTF32': 'UTF-32', 'UTF32LE': 'UTF-32LE', 'UTF32BE': 'UTF-32BE',
    'UCS2': 'ISO-10646-UCS-2', 'UCS_2': 'ISO-10646-UCS-2',
    'UCS-2': 'ISO-10646-UCS-2', 'CSUNICODE': 'ISO-10646-UCS-2',
    'UCS4': 'ISO-10646-UCS-4', 'UCS_4': 'ISO-10646-UCS-4',
    'UCS-4': 'ISO-10646-UCS-4', 'CSUCS4': 'ISO-10646-UCS-4',
    'ISO_8859-1': 'ISO-8859-1', 'LATIN1': 'ISO-8859-1',
    'CP819': 'ISO-8859-1', 'IBM819': 'ISO-8859-1',
    'ISO_8859-2': 'ISO-8859-2', 'LATIN2': 'ISO-8859-2',
    'ISO_8859-3': 'ISO-8859-3', 'LATIN3': 'ISO-8859-3',
    'ISO_8859-4': 'ISO-8859-4', 'LATIN4': 'ISO-8859-4',
    'ISO_8859-5': 'ISO-8859-5', 'CYRILLIC': 'ISO-8859-5',
    'ISO_8859-6': 'ISO-8859-6', 'ARABIC': 'ISO-8859-6', 'ECMA-114': 'ISO-8859-6',
    'ISO_8859-6-E': 'ISO-8859-6-E', 'ISO_8859-6-I': 'ISO-8859-6-I',
    'ISO_8859-7': 'ISO-8859-7', 'GREEK': 'ISO-8859-7',
    'GREEK8': 'ISO-8859-7', 'ECMA-118': 'ISO-8859-7',
    'ISO_8859-8': 'ISO-8859-8', 'HEBREW': 'ISO-8859-8',
    'ISO_8859-8-E': 'ISO-8859-8-E', 'ISO_8859-8-I': 'ISO-8859-8-I',
    'ISO_8859-9': 'ISO-8859-9', 'LATIN5': 'ISO-8859-9',
    'ISO_8859-10': 'ISO-8859-10', 'LATIN6': 'ISO-8859-10',
    'ISO_8859-13': 'ISO-8859-13',
    'ISO_8859-14': 'ISO-8859-14', 'LATIN8': 'ISO-8859-14',
    'ISO_8859-15': 'ISO-8859-15', 'LATIN9': 'ISO-8859-15',
    'ISO_8859-16': 'ISO-8859-16', 'LATIN10': 'ISO-8859-16'}
def canonical_charset(charset):
1622def canonical_charset(charset):
1623    """Returns the canonical or preferred name of a charset.
1624
1625    Additional character sets can be recognized by this function by
1626    altering the character_set_aliases dictionary in this module.
1627    Charsets which are not recognized are simply converted to
1628    upper-case (as charset names are always case-insensitive).
1629    
1630    See <http://www.iana.org/assignments/character-sets>.
1631
1632    """
1633    # It would be nice to use Python's codecs modules for this, but
1634    # there is no fixed public interface to its alias mappings.
1635    if not charset:
1636        return charset
1637    uc = charset.upper()
1638    uccon = character_set_aliases.get( uc, uc )
1639    return uccon

Returns the canonical or preferred name of a charset.

Additional character sets can be recognized by this function by altering the character_set_aliases dictionary in this module. Charsets which are not recognized are simply converted to upper-case (as charset names are always case-insensitive).

See http://www.iana.org/assignments/character-sets.
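As a rough, self-contained sketch of this behaviour (the alias table below is a tiny excerpt, not the module's full character_set_aliases dictionary, and `canonical` is an illustrative stand-in for canonical_charset()):

```python
# Tiny excerpt of alias mappings; the module's table is much larger.
_aliases = {'UTF8': 'UTF-8', 'LATIN1': 'ISO-8859-1', 'ASCII': 'US-ASCII'}

def canonical(charset):
    """Upper-case the name and map known aliases to their preferred form."""
    if not charset:
        return charset
    uc = charset.upper()
    return _aliases.get(uc, uc)
```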

def acceptable_charset(accept_charset_header, charsets, ignore_wildcard=True, default='ISO-8859-1'):
1642def acceptable_charset(accept_charset_header, charsets, ignore_wildcard=True, default='ISO-8859-1'):
1643    """
1644    Determines if the given charset is acceptable to the user agent.
1645
1646    The accept_charset_header should be the value present in the HTTP
1647    "Accept-Charset:" header.  In mod_python this is typically
1648    obtained from the req.http_headers table; in WSGI it is
1649    environ["Accept-Charset"]; other web frameworks may provide other
1650    methods of obtaining it.
1651
1652    Optionally the accept_charset_header parameter can instead be the
1653    list returned from the parse_accept_header() function in this
1654    module.
1655
1656    The charsets argument should either be a charset identifier string,
1657    or a sequence of them.
1658
1659    This function returns the charset identifier string which is the
1660    most preferred and is acceptable to both the user agent and the
1661    caller.  It will return the default value if no charset is negotiable.
1662    
1663    Note that the wildcarded charset "*" will be ignored.  To override
1664    this, call with ignore_wildcard=False.
1665
1666    See also: RFC 2616 section 14.2, and
1667    <http://www.iana.org/assignments/character-sets>
1668
1669    """
1670    if default:
1671        default = canonical_charset(default)
1672
1673    if _is_string(accept_charset_header):
1674        accept_list = parse_accept_header(accept_charset_header)
1675    else:
1676        accept_list = accept_charset_header
1677
1678    if _is_string(charsets):
1679        charsets = [canonical_charset(charsets)]
1680    else:
1681        charsets = [canonical_charset(c) for c in charsets]
1682
1683    # Note per RFC that 'ISO-8859-1' is special, and is implicitly in the
1684    # accept list with q=1; unless it is already in the list, or '*' is in the list.
1685
1686    best = None
1687    for c, qvalue, _junk in accept_list:
1688        if c == '*':
1689            default = None
1690            if ignore_wildcard:
1691                continue
1692            if not best or qvalue > best[1]:
1693                best = (c, qvalue)
1694        else:
1695            c = canonical_charset(c)
1696            for test_c in charsets:
1697                if c == default:
1698                    default = None
1699                if c == test_c and (not best or best[0]=='*' or qvalue > best[1]):
1700                    best = (c, qvalue)
1701    if default and default in [test_c.upper() for test_c in charsets]:
1702        best = (default, 1)
1703    if best and best[0] == '*':
1704        best = (charsets[0], best[1])
1705    return best

Determines if the given charset is acceptable to the user agent.

The accept_charset_header should be the value present in the HTTP "Accept-Charset:" header. In mod_python this is typically obtained from the req.http_headers table; in WSGI it is environ["Accept-Charset"]; other web frameworks may provide other methods of obtaining it.

Optionally the accept_charset_header parameter can instead be the list returned from the parse_accept_header() function in this module.

The charsets argument should either be a charset identifier string, or a sequence of them.

This function returns the charset identifier string which is the most preferred and is acceptable to both the user agent and the caller. It will return the default value if no charset is negotiable.

Note that the wildcarded charset "*" will be ignored. To override this, call with ignore_wildcard=False.

See also: RFC 2616 section 14.2, and http://www.iana.org/assignments/character-sets
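The negotiation rule, including the implicit ISO-8859-1 fallback from RFC 2616, can be illustrated with a much-simplified stand-in. `pick_charset` is hypothetical, assumes pre-parsed (charset, qvalue) pairs, and omits the "*" wildcard handling of the real function:

```python
def pick_charset(accept_list, server_charsets, default='ISO-8859-1'):
    """Return (charset, qvalue) for the best match, or None.

    Simplified sketch: highest-qvalue server charset wins; ISO-8859-1 is
    implicitly acceptable at q=1 unless the client listed it explicitly.
    """
    best = None
    for charset, qvalue in accept_list:
        c = charset.upper()
        if c == default:
            default = None  # explicitly listed, so no implicit fallback
        if c in (s.upper() for s in server_charsets) and qvalue > 0:
            if not best or qvalue > best[1]:
                best = (c, qvalue)
    if not best and default and default in [s.upper() for s in server_charsets]:
        best = (default, 1)
    return best
```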

class language_tag:
1709class language_tag(object):
1710    """This class represents an RFC 3066 language tag.
1711
1712    Initialize objects of this class with a single string representing
1713    the language tag, such as "en-US".
1714        
1715    Case is insensitive. Wildcarded subtags are ignored or stripped as
1716    they have no significance, so that "en-*" is the same as "en".
1717    However the universal wildcard "*" language tag is kept as-is.
1718
1719    Note that although relational operators such as < are defined,
1720    they only form a partial order based upon specialization.
1721
1722    Thus for example,
1723         "en" <= "en-US"
1724    but,
1725         not "en" <= "de", and
1726         not "de" <= "en".
1727
1728    """
1729
1730    def __init__(self, tagname):
1731        """Initialize objects of this class with a single string representing
1732        the language tag, such as "en-US".  Case is insensitive.
1733
1734        """
1735
1736        self.parts = tagname.lower().split('-')
1737        while len(self.parts) > 1 and self.parts[-1] == '*':
1738            del self.parts[-1]
1739
1740    def __len__(self):
1741        """Number of subtags in this tag."""
1742        if len(self.parts) == 1 and self.parts[0] == '*':
1743            return 0
1744        return len(self.parts)
1745
1746    def __str__(self):
1747        """The standard string form of this language tag."""
1748        a = []
1749        if len(self.parts) >= 1:
1750            a.append(self.parts[0])
1751        if len(self.parts) >= 2:
1752            if len(self.parts[1]) == 2:
1753                a.append( self.parts[1].upper() )
1754            else:
1755                a.append( self.parts[1] )
1756        a.extend( self.parts[2:] )
1757        return '-'.join(a)
1758
1759    def __unicode__(self):
1760        """The unicode string form of this language tag."""
1761        return str(self.__str__())
1762
1763    def __repr__(self):
1764        """The python representation of this language tag."""
1765        s = '%s("%s")' % (self.__class__.__name__, self.__str__())
1766        return s
1767
1768    def superior(self):
1769        """Returns another instance of language_tag which is the superior.
1770
1771        Thus en-US gives en, and en gives *.
1772
1773        """
1774        if len(self) <= 1:
1775            return self.__class__('*')
1776        return self.__class__( '-'.join(self.parts[:-1]) )
1777
1778    def all_superiors(self, include_wildcard=False):
1779        """Returns a list of this language and all its superiors.
1780
1781        If include_wildcard is False, then "*" will not be among the
1782        output list, unless this language is itself "*".
1783
1784        """
1785        langlist = [ self ]
1786        l = self
1787        while not l.is_universal_wildcard():
1788            l = l.superior()
1789            if l.is_universal_wildcard() and not include_wildcard:
1790                continue
1791            langlist.append(l)
1792        return langlist
1793                
1794    def is_universal_wildcard(self):
1795        """Returns True if this language tag represents all possible
1796        languages, by using the reserved tag of "*".
1797
1798        """
1799        return len(self.parts) == 1 and self.parts[0] == '*'
1800
1801    def dialect_of(self, other, ignore_wildcard=True):
1802        """Is this language a dialect (or subset/specialization) of another.
1803
1804        This method returns True if this language is the same as or a
1805        specialization (dialect) of the other language_tag.
1806
1807        If ignore_wildcard is False, then all languages will be
1808        considered to be a dialect of the special language tag of "*".
1809
1810        """
1811        if not ignore_wildcard and self.is_universal_wildcard():
1812            return True
1813        for i in range( min(len(self), len(other)) ):
1814            if self.parts[i] != other.parts[i]:
1815                return False
1816        if len(self) >= len(other):
1817            return True
1818        return False
1819
1820    def __eq__(self, other):
1821        """== operator. Are the two languages the same?"""
1822
1823        return self.parts == other.parts
1824
1825    def __ne__(self, other):
1826        """!= operator. Are the two languages different?"""
1827
1828        return not self.__eq__(other)
1829
1830    def __lt__(self, other):
1831        """< operator. Returns True if the other language is a more
1832        specialized dialect of this one."""
1833
1834        return other.dialect_of(self) and self != other
1835
1836    def __le__(self, other):
1837        """<= operator. Returns True if the other language is the same
1838        as or a more specialized dialect of this one."""
1839        return other.dialect_of(self)
1840
1841    def __gt__(self, other):
1842        """> operator.  Returns True if this language is a more
1843        specialized dialect of the other one."""
1844
1845        return self.dialect_of(other) and self != other
1846
1847    def __ge__(self, other):
1848        """>= operator.  Returns True if this language is the same as
1849        or a more specialized dialect of the other one."""
1850
1851        return self.dialect_of(other)

This class represents an RFC 3066 language tag.

Initialize objects of this class with a single string representing the language tag, such as "en-US".

Case is insensitive. Wildcarded subtags are ignored or stripped as they have no significance, so that "en-*" is the same as "en". However the universal wildcard "*" language tag is kept as-is.

Note that although relational operators such as < are defined, they only form a partial order based upon specialization.

Thus for example, "en" <= "en-US" but, not "en" <= "de", and not "de" <= "en".
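This partial order can be sketched directly on subtag lists; `le` below is a simplified, hypothetical stand-in for the class's `<=` operator:

```python
def subtags(tag):
    """Split a tag into its case-normalized subtags."""
    return tag.lower().split('-')

def le(a, b):
    """a <= b: True iff b is the same as, or a dialect of, a."""
    pa, pb = subtags(a), subtags(b)
    return len(pa) <= len(pb) and pb[:len(pa)] == pa
```

Note that `le("en", "de")` and `le("de", "en")` are both False: unrelated languages are simply incomparable.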

language_tag(tagname)
1730    def __init__(self, tagname):
1731        """Initialize objects of this class with a single string representing
1732        the language tag, such as "en-US".  Case is insensitive.
1733
1734        """
1735
1736        self.parts = tagname.lower().split('-')
1737        while len(self.parts) > 1 and self.parts[-1] == '*':
1738            del self.parts[-1]

Initialize objects of this class with a single string representing the language tag, such as "en-US". Case is insensitive.

parts
def superior(self):
1768    def superior(self):
1769        """Returns another instance of language_tag which is the superior.
1770
1771        Thus en-US gives en, and en gives *.
1772
1773        """
1774        if len(self) <= 1:
1775            return self.__class__('*')
1776        return self.__class__( '-'.join(self.parts[:-1]) )

Returns another instance of language_tag which is the superior.

Thus en-US gives en, and en gives *.

def all_superiors(self, include_wildcard=False):
1778    def all_superiors(self, include_wildcard=False):
1779        """Returns a list of this language and all its superiors.
1780
1781        If include_wildcard is False, then "*" will not be among the
1782        output list, unless this language is itself "*".
1783
1784        """
1785        langlist = [ self ]
1786        l = self
1787        while not l.is_universal_wildcard():
1788            l = l.superior()
1789            if l.is_universal_wildcard() and not include_wildcard:
1790                continue
1791            langlist.append(l)
1792        return langlist

Returns a list of this language and all its superiors.

If include_wildcard is False, then "*" will not be among the output list, unless this language is itself "*".
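The superior chain is built by repeatedly dropping the last subtag. A self-contained sketch (`superiors` is an illustrative name working on plain strings rather than language_tag objects):

```python
def superiors(tag, include_wildcard=False):
    """Return the tag followed by each successive superior, most specific first."""
    parts = tag.lower().split('-')
    chain = ['-'.join(parts)]
    while len(parts) > 1:
        parts = parts[:-1]            # drop the last subtag
        chain.append('-'.join(parts))
    if include_wildcard and chain[-1] != '*':
        chain.append('*')
    return chain
```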

def is_universal_wildcard(self):
1794    def is_universal_wildcard(self):
1795        """Returns True if this language tag represents all possible
1796        languages, by using the reserved tag of "*".
1797
1798        """
1799        return len(self.parts) == 1 and self.parts[0] == '*'

Returns True if this language tag represents all possible languages, by using the reserved tag of "*".

def dialect_of(self, other, ignore_wildcard=True):
1801    def dialect_of(self, other, ignore_wildcard=True):
1802        """Is this language a dialect (or subset/specialization) of another.
1803
1804        This method returns True if this language is the same as or a
1805        specialization (dialect) of the other language_tag.
1806
1807        If ignore_wildcard is False, then all languages will be
1808        considered to be a dialect of the special language tag of "*".
1809
1810        """
1811        if not ignore_wildcard and self.is_universal_wildcard():
1812            return True
1813        for i in range( min(len(self), len(other)) ):
1814            if self.parts[i] != other.parts[i]:
1815                return False
1816        if len(self) >= len(other):
1817            return True
1818        return False

Is this language a dialect (or subset/specialization) of another.

This method returns True if this language is the same as or a specialization (dialect) of the other language_tag.

If ignore_wildcard is False, then all languages will be considered to be a dialect of the special language tag of "*".
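At its core this is a case-insensitive subtag-prefix test. A hedged, self-contained sketch of the same logic on plain strings (mirroring the listing above, where a wildcard tag counts as a dialect of anything when ignore_wildcard is False):

```python
def dialect_of(tag, other, ignore_wildcard=True):
    """True iff tag is the same as, or a specialization of, other."""
    a = tag.lower().split('-')
    b = other.lower().split('-')
    if not ignore_wildcard and a == ['*']:
        return True                   # wildcard counts as dialect of anything
    return len(a) >= len(b) and a[:len(b)] == b
```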

def parse_accept_language_header(header_value):
1854def parse_accept_language_header( header_value ):
1855    """Parses the Accept-Language header.
1856
1857    Returns a list of tuples, each like:
1858
1859        (language_tag, qvalue, accept_parameters)
1860
1861    """
1862    alist, k = parse_qvalue_accept_list( header_value)
1863    if k < len(header_value):
1864        raise ParseError('Accept-Language header is invalid',header_value,k)
1865
1866    langlist = []
1867    for token, langparms, q, acptparms in alist:
1868        if langparms:
1869            raise ParseError('Language tag may not have any parameters',header_value,0)
1870        lang = language_tag( token )
1871        langlist.append( (lang, q, acptparms) )
1872
1873    return langlist

Parses the Accept-Language header.

Returns a list of tuples, each like:

(language_tag, qvalue, accept_parameters)
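For illustration, a much-simplified parser producing similar (tag, qvalue) pairs; unlike the module's function it does no error checking, ignores accept-parameters, and returns plain strings instead of language_tag instances:

```python
def parse_accept_language(value):
    """Parse "en-US, en;q=0.5" into [(tag, qvalue), ...] (simplified sketch)."""
    result = []
    for item in value.split(','):
        parts = [p.strip() for p in item.split(';')]
        tag, q = parts[0], 1.0        # qvalue defaults to 1.0
        for p in parts[1:]:
            if p.lower().startswith('q='):
                q = float(p[2:])
        result.append((tag, q))
    return result
```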
def acceptable_language(accept_header, server_languages, ignore_wildcard=True, assume_superiors=True):
1876def acceptable_language( accept_header, server_languages, ignore_wildcard=True, assume_superiors=True ):
1877    """Determines if the given language is acceptable to the user agent.
1878
1879    The accept_header should be the value present in the HTTP
1880    "Accept-Language:" header.  In mod_python this is typically
1881    obtained from the req.http_headers_in table; in WSGI it is
1882    environ["Accept-Language"]; other web frameworks may provide other
1883    methods of obtaining it.
1884
1885    Optionally the accept_header parameter can be pre-parsed, as
1886    returned by the parse_accept_language_header() function defined in
1887    this module.
1888
1889    The server_languages argument should either be a single language
1890    string, a language_tag object, or a sequence of them.  It
1891    represents the set of languages that the server is willing to
1892    send to the user agent.
1893
1894    Note that the wildcarded language tag "*" will be ignored.  To
1895    override this, call with ignore_wildcard=False, and even then
1896    it will be the lowest-priority choice regardless of its
1897    quality factor (as per HTTP spec).
1898
1899    If assume_superiors is True then the languages that the
1900    browser accepts will automatically include all superior languages.
1901    Any superior languages which must be added are done so with one
1902    half the qvalue of the language which is present.  For example, if
1903    the accept string is "en-US", then it will be treated as if it
1904    were "en-US, en;q=0.5".  Note that although the HTTP 1.1 spec says
1905    that browsers are supposed to encourage users to configure all
1906    acceptable languages, sometimes they don't, thus the ability
1907    for this function to assume this.  But setting assume_superiors
1908    to False will ensure strict adherence to the HTTP 1.1 spec; which
1909    means that if the browser accepts "en-US", then it will not
1910    be acceptable to send just "en" to it.
1911
1912    This function returns the language which is the most preferred and
1913    is acceptable to both the user agent and the caller.  It will
1914    return None if no language is negotiable, otherwise the return
1915    value is always an instance of language_tag.
1916
1917    See also: RFC 3066 <http://www.ietf.org/rfc/rfc3066.txt>, and
1918    ISO 639, links at <http://en.wikipedia.org/wiki/ISO_639>, and
1919    <http://www.iana.org/assignments/language-tags>.
1920    
1921    """
1922    # Note special instructions from RFC 2616 sect. 14.1:
1923    #   "The language quality factor assigned to a language-tag by the
1924    #   Accept-Language field is the quality value of the longest
1925    #   language- range in the field that matches the language-tag."
1926
1927    if _is_string(accept_header):
1928        accept_list = parse_accept_language_header(accept_header)
1929    else:
1930        accept_list = accept_header
1931
1932    # Possibly add in any "missing" languages that the browser may
1933    # have forgotten to include in the list. Ensure the list is sorted so
1934    # more general languages come before more specific ones.
1935
1936    accept_list.sort()
1937    all_tags = [a[0] for a in accept_list]
1938    if assume_superiors:
1939        to_add = []
1940        for langtag, qvalue, _args in accept_list:
1941            if len(langtag) >= 2:
1942                for suptag in langtag.all_superiors( include_wildcard=False ):
1943                    if suptag not in all_tags:
1944                        # Add in superior at half the qvalue
1945                        to_add.append( (suptag, qvalue / 2, '') )
1946                        all_tags.append( suptag )
1947        accept_list.extend( to_add )
1948
1949    # Convert server_languages to a list of language_tags
1950    if _is_string(server_languages):
1951        server_languages = [language_tag(server_languages)]
1952    elif isinstance(server_languages, language_tag):
1953        server_languages = [server_languages]
1954    else:
1955        server_languages = [language_tag(lang) for lang in server_languages]
1956
1957    # Select the best one
1958    best = None  # tuple (langtag, qvalue, matchlen)
1959    
1960    for langtag, qvalue, _args in accept_list:
1961        # aargs is ignored for Accept-Language
1962        if qvalue <= 0:
1963            continue # UA doesn't accept this language
1964
1965        if ignore_wildcard and langtag.is_universal_wildcard():
1966            continue  # "*" being ignored
1967
1968        for svrlang in server_languages:
1969            # The best match is determined first by the quality factor,
1970            # and then by the most specific match.
1971
1972            matchlen = -1 # how specifically this one matches (-1 is a non-match)
1973            if svrlang.dialect_of( langtag, ignore_wildcard=ignore_wildcard ):
1974                matchlen = len(langtag)
1975                if not best \
1976                       or matchlen > best[2] \
1977                       or (matchlen == best[2] and qvalue > best[1]):
1978                    # This match is better
1979                    best = (langtag, qvalue, matchlen)
1980    if not best:
1981        return None
1982    return best[0]

Determines if the given language is acceptable to the user agent.

The accept_header should be the value present in the HTTP "Accept-Language:" header. In mod_python this is typically obtained from the req.http_headers_in table; in WSGI it is environ["Accept-Language"]; other web frameworks may provide other methods of obtaining it.

Optionally the accept_header parameter can be pre-parsed, as returned by the parse_accept_language_header() function defined in this module.

The server_languages argument should either be a single language string, a language_tag object, or a sequence of them. It represents the set of languages that the server is willing to send to the user agent.

Note that the wildcarded language tag "*" will be ignored. To override this, call with ignore_wildcard=False, and even then it will be the lowest-priority choice regardless of its quality factor (as per HTTP spec).

If assume_superiors is True then the languages that the browser accepts will automatically include all superior languages. Any superior languages which must be added are done so with one half the qvalue of the language which is present. For example, if the accept string is "en-US", then it will be treated as if it were "en-US, en;q=0.5". Note that although the HTTP 1.1 spec says that browsers are supposed to encourage users to configure all acceptable languages, sometimes they don't, hence this function's ability to make that assumption. Setting assume_superiors to False will ensure strict adherence to the HTTP 1.1 spec; which means that if the browser accepts "en-US", then it will not be acceptable to send just "en" to it.

This function returns the language which is the most preferred and is acceptable to both the user agent and the caller. It will return None if no language is negotiable, otherwise the return value is always an instance of language_tag.

See also: RFC 3066 http://www.ietf.org/rfc/rfc3066.txt, and ISO 639, links at http://en.wikipedia.org/wiki/ISO_639, and http://www.iana.org/assignments/language-tags.
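The assume_superiors behaviour can be sketched in a self-contained form: each accepted tag implicitly contributes its superior languages at half its qvalue, then the acceptable server language with the highest qvalue wins. `negotiate_language` is a hypothetical, simplified stand-in working on plain strings and (tag, qvalue) pairs:

```python
def negotiate_language(accept, server_languages):
    """Return the best server language for the given [(tag, q), ...], or None."""
    qmap = {}
    for tag, q in accept:
        key = tag.lower()
        qmap[key] = max(q, qmap.get(key, 0))
    # Add superiors at half the qvalue, unless already listed explicitly.
    for tag, q in list(qmap.items()):
        parts = tag.split('-')
        while len(parts) > 1:
            parts = parts[:-1]
            qmap.setdefault('-'.join(parts), q / 2.0)
    best = None
    for lang in server_languages:
        q = qmap.get(lang.lower(), 0)
        if q > 0 and (best is None or q > best[1]):
            best = (lang, q)
    return best[0] if best else None
```

So with an accept list of [("en-US", 1.0)] a server offering only "en" still matches, via the implicit "en";q=0.5 superior.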