pyRdfa.parse

The core parsing function of RDFa. Some details are delegated to other modules so that they are easier to update or modify (e.g., generation of C{@property} values, or managing the current state).

Note that the entry point (L{parse_one_node}) bifurcates into an RDFa 1.0 and an RDFa 1.1 version, i.e., into L{_parse_1_0} and L{_parse_1_1}. Some of the parsing details (management of C{@property}, list facilities, changed behavior of C{@typeof}) have changed between versions, and forcing the two into one function would be counterproductive.

@summary: RDFa core parser processing step
@organization: U{World Wide Web Consortium<http://www.w3.org>}
@author: U{Ivan Herman<http://www.w3.org/People/Ivan/>}
@license: This software is available for use under the U{W3C® SOFTWARE NOTICE AND LICENSE<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}
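
The version switch described in the module summary can be sketched in isolation. This is only an illustration under stated assumptions: `FakeState`, `parse_1_0`, `parse_1_1`, and `dispatch` are hypothetical stand-ins and not part of pyRdfa; only the string comparison on `rdfa_version` mirrors the test used by `parse_one_node`.

```python
class FakeState:
    """Hypothetical stand-in for state.ExecutionContext; only rdfa_version is modelled."""
    def __init__(self, rdfa_version):
        self.rdfa_version = rdfa_version


def parse_1_0(node):
    # Placeholder for the RDFa 1.0 processing steps (assumption, not the real parser)
    return ("1.0", node)


def parse_1_1(node):
    # Placeholder for the RDFa 1.1 processing steps (assumption, not the real parser)
    return ("1.1", node)


def dispatch(node, state):
    # Same test as in parse_one_node: a plain string comparison on the version
    if state.rdfa_version >= "1.1":
        return parse_1_1(node)
    return parse_1_0(node)


print(dispatch("div", FakeState("1.0")))
print(dispatch("div", FakeState("1.1")))
```

The comparison is lexicographic, which is enough as long as the only version strings in play are "1.0" and "1.1".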

# -*- coding: utf-8 -*-
"""
The core parsing function of RDFa. Some details are delegated to other modules so that they are
easier to update or modify (e.g., generation of C{@property} values, or managing the current state).

Note that the entry point (L{parse_one_node}) bifurcates into an RDFa 1.0 and an RDFa 1.1 version, i.e.,
into L{_parse_1_0} and L{_parse_1_1}. Some of the parsing details (management of C{@property}, list
facilities, changed behavior of C{@typeof}) have changed between versions, and forcing the two into one
function would be counterproductive.

@summary: RDFa core parser processing step
@organization: U{World Wide Web Consortium<http://www.w3.org>}
@author: U{Ivan Herman<http://www.w3.org/People/Ivan/>}
@license: This software is available for use under the
U{W3C® SOFTWARE NOTICE AND LICENSE<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}
"""

"""
$Id: parse.py,v 1.19 2013-01-07 12:46:43 ivan Exp $
$Date: 2013-01-07 12:46:43 $
"""

from .state import ExecutionContext
from .property import ProcessProperty
from .embeddedRDF import handle_embeddedRDF
from .host import HostLanguage, host_dom_transforms

from rdflib import URIRef
from rdflib import BNode
from rdflib import RDF as ns_rdf

from . import IncorrectBlankNodeUsage, err_no_blank_node
from .utils import has_one_of_attributes

#######################################################################
def parse_one_node(node, graph, parent_object, incoming_state, parent_incomplete_triples):
    """The (recursive) step of handling a single node.

    This entry point just switches between the RDFa 1.0 and RDFa 1.1 versions of the parser. It is called from
    the top level and again when descending past elements that carry no RDFa attributes; otherwise the
    recursion happens directly in L{_parse_1_0} and L{_parse_1_1} for RDFa 1.0 and RDFa 1.1, respectively.

    @param node: the DOM node to handle
    @param graph: the RDF graph
    @type graph: RDFLib's Graph object instance
    @param parent_object: the parent's object, as an RDFLib URIRef
    @param incoming_state: the inherited state (namespaces, lang, etc.)
    @type incoming_state: L{state.ExecutionContext}
    @param parent_incomplete_triples: list of hanging triples (the missing resource set to None) to be handled
    (or not) by the current node.
    @return: None; the generated triples are added to C{graph} as a side effect
    """
    # Branch according to versions.
    if incoming_state.rdfa_version >= "1.1":
        _parse_1_1(node, graph, parent_object, incoming_state, parent_incomplete_triples)
    else:
        _parse_1_0(node, graph, parent_object, incoming_state, parent_incomplete_triples)

#######################################################################
def _parse_1_1(node, graph, parent_object, incoming_state, parent_incomplete_triples):
    """The (recursive) step of handling a single node. See the
    U{RDFa 1.1 Core document<http://www.w3.org/TR/rdfa-core/>} for further details.

    This is the RDFa 1.1 version.

    @param node: the DOM node to handle
    @param graph: the RDF graph
    @type graph: RDFLib's Graph object instance
    @param parent_object: the parent's object, as an RDFLib URIRef
    @param incoming_state: the inherited state (namespaces, lang, etc.)
    @type incoming_state: L{state.ExecutionContext}
    @param parent_incomplete_triples: list of hanging triples (the missing resource set to None) to be handled
    (or not) by the current node.
    @return: None; the generated triples are added to C{graph} as a side effect
    """
    def header_check(p_obj):
        """Special disposition for the HTML <head> and <body> elements: return C{p_obj} if the element
        has none of the resource setting attributes, None in all other cases."""
        if state.options.host_language in [ HostLanguage.xhtml, HostLanguage.html5, HostLanguage.xhtml5 ]:
            if node.nodeName in ("head", "body"):
                if not has_one_of_attributes(node, "about", "resource", "src", "href"):
                    return p_obj
        return None

    def lite_check():
        """Issue a warning, if RDFa Lite checking is on, for a <link> element whose @rel is a CURIE."""
        if state.options.check_lite and state.options.host_language in [ HostLanguage.html5, HostLanguage.xhtml5, HostLanguage.xhtml ]:
            if node.tagName == "link" and node.hasAttribute("rel") and state.term_or_curie.CURIE_to_URI(node.getAttribute("rel")) is not None:
                state.options.add_warning("In RDFa Lite, attribute @rel in <link> is only used in non-RDFa way (consider using @property)", node=node)

    # Update the state. This means, for example, the possible local settings of
    # namespaces and lang
    state = ExecutionContext(node, graph, inherited_state=incoming_state)

    #---------------------------------------------------------------------------------
    # Extra warning check on RDFa Lite
    lite_check()

    #---------------------------------------------------------------------------------
    # Handling the role attribute is pretty much orthogonal to everything else...
    handle_role_attribute(node, graph, state)

    #---------------------------------------------------------------------------------
    # Handle the special case for embedded RDF, e.g., in SVG 1.2.
    # This may add some triples to the target graph that do not originate from RDFa parsing.
    # If the function returns True, that means that an rdf:RDF has been found. No
    # RDFa parsing should be done on that subtree, so we simply return...
    if state.options.embedded_rdf and node.nodeType == node.ELEMENT_NODE and handle_embeddedRDF(node, graph, state):
        return

    #---------------------------------------------------------------------------------
    # Call the host language specific massaging of the DOM
    if state.options.host_language in host_dom_transforms and node.nodeType == node.ELEMENT_NODE:
        for func in host_dom_transforms[state.options.host_language]:
            func(node, state)

    #---------------------------------------------------------------------------------
    # First, let us check whether there is anything to do at all, i.e.,
    # whether there is any relevant RDFa specific attribute on the element
    #
    if not has_one_of_attributes(node, "href", "resource", "about", "property", "rel", "rev", "typeof", "src", "vocab", "prefix"):
        # nop, there is nothing to do here, just go down the tree and return...
        for n in node.childNodes:
            if n.nodeType == node.ELEMENT_NODE:
                parse_one_node(n, graph, parent_object, state, parent_incomplete_triples)
        return

    #-----------------------------------------------------------------
    # The goal is to establish the subject and object for local processing.
    # The behaviour is slightly different depending on the presence or not
    # of the @rel/@rev attributes
    current_subject = None
    current_object = None
    typed_resource = None

    if has_one_of_attributes(node, "rel", "rev"):
        # in this case there is the notion of 'left' and 'right' of @rel/@rev
        # in establishing the new Subject and the objectResource
        current_subject = header_check(parent_object)

        # set first the subject
        if node.hasAttribute("about"):
            current_subject = state.getURI("about")
            if node.hasAttribute("typeof"):
                typed_resource = current_subject

        # getURI may return None in case of an illegal CURIE, so
        # we have to be careful here, not use only an 'else'
        if current_subject is None:
            current_subject = parent_object
        else:
            state.reset_list_mapping(origin=current_subject)

        # set the object resource
        current_object = state.getResource("resource", "href", "src")

        if node.hasAttribute("typeof") and not node.hasAttribute("about"):
            if current_object is None:
                current_object = BNode()
            typed_resource = current_object

        if not node.hasAttribute("inlist") and current_object is not None:
            # In this case the newly defined object is, in fact, the head of the list
            # just reset the whole thing.
            state.reset_list_mapping(origin=current_object)

    elif node.hasAttribute("property") and not has_one_of_attributes(node, "content", "datatype"):
        current_subject = header_check(parent_object)

        # this is the case when the property may take hold of @src and friends...
        if node.hasAttribute("about"):
            current_subject = state.getURI("about")
            if node.hasAttribute("typeof"):
                typed_resource = current_subject

        # getURI may return None in case of an illegal CURIE, so
        # we have to be careful here, not use only an 'else'
        if current_subject is None:
            current_subject = parent_object
        else:
            state.reset_list_mapping(origin=current_subject)

        if typed_resource is None and node.hasAttribute("typeof"):
            typed_resource = state.getResource("resource", "href", "src")
            if typed_resource is None:
                typed_resource = BNode()
            current_object = typed_resource
        else:
            current_object = current_subject

    else:
        current_subject = header_check(parent_object)

        # in this case all the various 'resource' setting attributes
        # behave identically, though they also have their own priority
        if current_subject is None:
            current_subject = state.getResource("about", "resource", "href", "src")

        # getResource may return None in case of an illegal CURIE, so
        # we have to be careful here, not use only an 'else'
        if current_subject is None:
            if node.hasAttribute("typeof"):
                current_subject = BNode()
                state.reset_list_mapping(origin=current_subject)
            else:
                current_subject = parent_object
        else:
            state.reset_list_mapping(origin=current_subject)

        # in this case no non-literal triples will be generated, so the
        # only role of the current_object Resource is to be transferred to
        # the children node
        current_object = current_subject
        if node.hasAttribute("typeof"):
            typed_resource = current_subject

    # ---------------------------------------------------------------------
    # The possible typeof indicates a number of type statements on the typed resource
    for defined_type in state.getURI("typeof"):
        if typed_resource:
            graph.add((typed_resource, ns_rdf["type"], defined_type))

    # ---------------------------------------------------------------------
    # In case of @rel/@rev, either triples or incomplete triples are generated;
    # the (possible) incomplete triples are collected, to be forwarded to the children
    incomplete_triples = []
    for prop in state.getURI("rel"):
        if not isinstance(prop, BNode):
            if node.hasAttribute("inlist"):
                if current_object is not None:
                    # Add the content to the list. Note that if the same list
                    # was initialized, at some point, by a None, it will be
                    # overwritten by this real content
                    state.add_to_list_mapping(prop, current_object)
                else:
                    # Add a dummy entry to the list... Note that
                    # if that list was initialized already with a real content
                    # this call will have no effect
                    state.add_to_list_mapping(prop, None)

                    # Add a placeholder into the hanging rels
                    incomplete_triples.append((None, prop, None))
            else:
                theTriple = (current_subject, prop, current_object)
                if current_object is not None:
                    graph.add(theTriple)
                else:
                    incomplete_triples.append(theTriple)
        else:
            state.options.add_warning(err_no_blank_node % "rel", warning_type=IncorrectBlankNodeUsage, node=node.nodeName)

    for prop in state.getURI("rev"):
        if not isinstance(prop, BNode):
            theTriple = (current_object, prop, current_subject)
            if current_object is not None:
                graph.add(theTriple)
            else:
                incomplete_triples.append(theTriple)
        else:
            state.options.add_warning(err_no_blank_node % "rev", warning_type=IncorrectBlankNodeUsage, node=node.nodeName)

    # ----------------------------------------------------------------------
    # Generation of the @property values, including literals. current_subject is the subject.
    # A particularity of property is that it stops the parsing down the DOM tree if an XML Literal is generated,
    # because everything down there is part of the generated literal.
    if node.hasAttribute("property"):
        ProcessProperty(node, graph, current_subject, state, typed_resource).generate_1_1()

    # ----------------------------------------------------------------------
    # Setting the current object to a bnode is setting up a possible resource
    # for the incomplete triples downwards
    if current_object is None:
        object_to_children = BNode()
    else:
        object_to_children = current_object

    #-----------------------------------------------------------------------
    # Here is the recursion step for all the children
    for n in node.childNodes:
        if n.nodeType == node.ELEMENT_NODE:
            _parse_1_1(n, graph, object_to_children, state, incomplete_triples)

    # ---------------------------------------------------------------------
    # At this point, the parent's incomplete triples may be completed
    for (s, p, o) in parent_incomplete_triples:
        if s is None and o is None:
            # This is an encoded version of a hanging rel for a collection:
            incoming_state.add_to_list_mapping(p, current_subject)
        else:
            if s is None:
                s = current_subject
            if o is None:
                o = current_subject
            graph.add((s, p, o))

    # Generate the lists, if any and if this is the level where a new list was originally created
    if state.new_list and not state.list_empty():
        for prop in state.get_list_props():
            vals = state.get_list_value(prop)
            if vals is None:
                # This was an empty list, in fact, i.e., the list has been initiated by a <xxx rel="prop" inlist>
                # but no list content has ever been added
                graph.add((state.get_list_origin(), prop, ns_rdf["nil"]))
            else:
                # One fresh blank node per list item, terminated by rdf:nil
                heads = [ BNode() for _r in vals ] + [ ns_rdf["nil"] ]
                for i in range(0, len(vals)):
                    graph.add((heads[i], ns_rdf["first"], vals[i]))
                    graph.add((heads[i], ns_rdf["rest"], heads[i+1]))
                # Anchor the list
                graph.add((state.get_list_origin(), prop, heads[0]))

    # -------------------------------------------------------------------
    # This should be it...
    # -------------------------------------------------------------------
    return


##################################################################################################################
def _parse_1_0(node, graph, parent_object, incoming_state, parent_incomplete_triples):
    """The (recursive) step of handling a single node. See the
    U{RDFa 1.0 syntax document<http://www.w3.org/TR/rdfa-syntax>} for further details.

    This is the RDFa 1.0 version.

    @param node: the DOM node to handle
    @param graph: the RDF graph
    @type graph: RDFLib's Graph object instance
    @param parent_object: the parent's object, as an RDFLib URIRef
    @param incoming_state: the inherited state (namespaces, lang, etc.)
    @type incoming_state: L{state.ExecutionContext}
    @param parent_incomplete_triples: list of hanging triples (the missing resource set to None) to be handled
    (or not) by the current node.
    @return: None; the generated triples are added to C{graph} as a side effect
    """

    # Update the state. This means, for example, the possible local settings of
    # namespaces and lang
    state = ExecutionContext(node, graph, inherited_state=incoming_state)

    #---------------------------------------------------------------------------------
    # Handling the role attribute is pretty much orthogonal to everything else...
    handle_role_attribute(node, graph, state)

    #---------------------------------------------------------------------------------
    # Handle the special case for embedded RDF, e.g., in SVG 1.2.
    # This may add some triples to the target graph that do not originate from RDFa parsing.
    # If the function returns True, that means that an rdf:RDF has been found. No
    # RDFa parsing should be done on that subtree, so we simply return...
    if state.options.embedded_rdf and node.nodeType == node.ELEMENT_NODE and handle_embeddedRDF(node, graph, state):
        return

    #---------------------------------------------------------------------------------
    # Call the host language specific massaging of the DOM
    if state.options.host_language in host_dom_transforms and node.nodeType == node.ELEMENT_NODE:
        for func in host_dom_transforms[state.options.host_language]:
            func(node, state)

    #---------------------------------------------------------------------------------
    # First, let us check whether there is anything to do at all, i.e.,
    # whether there is any relevant RDFa specific attribute on the element
    #
    if not has_one_of_attributes(node, "href", "resource", "about", "property", "rel", "rev", "typeof", "src"):
        # nop, there is nothing to do here, just go down the tree and return...
        for n in node.childNodes:
            if n.nodeType == node.ELEMENT_NODE:
                parse_one_node(n, graph, parent_object, state, parent_incomplete_triples)
        return

    #-----------------------------------------------------------------
    # The goal is to establish the subject and object for local processing.
    # The behaviour is slightly different depending on the presence or not
    # of the @rel/@rev attributes
    current_subject = None
    current_object = None

    if has_one_of_attributes(node, "rel", "rev"):
        # in this case there is the notion of 'left' and 'right' of @rel/@rev
        # in establishing the new Subject and the objectResource
        current_subject = state.getResource("about", "src")

        # getResource may return None in case of an illegal CURIE, so
        # we have to be careful here, not use only an 'else'
        if current_subject is None:
            if node.hasAttribute("typeof"):
                current_subject = BNode()
            else:
                current_subject = parent_object
        else:
            state.reset_list_mapping(origin=current_subject)

        # set the object resource
        current_object = state.getResource("resource", "href")

    else:
        # in this case all the various 'resource' setting attributes
        # behave identically, though they also have their own priority
        current_subject = state.getResource("about", "src", "resource", "href")

        # getResource may return None in case of an illegal CURIE, so
        # we have to be careful here, not use only an 'else'
        if current_subject is None:
            if node.hasAttribute("typeof"):
                # a @typeof without any resource setting attribute creates a new blank node
                current_subject = BNode()
            else:
                current_subject = parent_object
        else:
            state.reset_list_mapping(origin=current_subject)

        # in this case no non-literal triples will be generated, so the
        # only role of the current_object Resource is to be transferred to
        # the children node
        current_object = current_subject

    # ---------------------------------------------------------------------
    # The possible typeof indicates a number of type statements on the new Subject
    for defined_type in state.getURI("typeof"):
        graph.add((current_subject, ns_rdf["type"], defined_type))

    # ---------------------------------------------------------------------
    # In case of @rel/@rev, either triples or incomplete triples are generated;
    # the (possible) incomplete triples are collected, to be forwarded to the children
    incomplete_triples = []
    for prop in state.getURI("rel"):
        if not isinstance(prop, BNode):
            theTriple = (current_subject, prop, current_object)
            if current_object is not None:
                graph.add(theTriple)
            else:
                incomplete_triples.append(theTriple)
        else:
            state.options.add_warning(err_no_blank_node % "rel", warning_type=IncorrectBlankNodeUsage, node=node.nodeName)

    for prop in state.getURI("rev"):
        if not isinstance(prop, BNode):
            theTriple = (current_object, prop, current_subject)
            if current_object is not None:
                graph.add(theTriple)
            else:
                incomplete_triples.append(theTriple)
        else:
            state.options.add_warning(err_no_blank_node % "rev", warning_type=IncorrectBlankNodeUsage, node=node.nodeName)

    # ----------------------------------------------------------------------
    # Generation of the literal values. current_subject is the subject.
    # A particularity of property is that it stops the parsing down the DOM tree if an XML Literal is generated,
    # because everything down there is part of the generated literal.
    if node.hasAttribute("property"):
        ProcessProperty(node, graph, current_subject, state).generate_1_0()

    # ----------------------------------------------------------------------
    # Setting the current object to a bnode is setting up a possible resource
    # for the incomplete triples downwards
    if current_object is None:
        object_to_children = BNode()
    else:
        object_to_children = current_object

    #-----------------------------------------------------------------------
    # Here is the recursion step for all the children
    for n in node.childNodes:
        if n.nodeType == node.ELEMENT_NODE:
            _parse_1_0(n, graph, object_to_children, state, incomplete_triples)

    # ---------------------------------------------------------------------
    # At this point, the parent's incomplete triples may be completed
    for (s, p, o) in parent_incomplete_triples:
        if s is None and o is None:
            # This is an encoded version of a hanging rel for a collection:
            incoming_state.add_to_list_mapping(p, current_subject)
        else:
            if s is None:
                s = current_subject
            if o is None:
                o = current_subject
            graph.add((s, p, o))

    # -------------------------------------------------------------------
    # This should be it...
    # -------------------------------------------------------------------
    return


#######################################################################
# Handle the role attribute
def handle_role_attribute(node, graph, state):
    """
    Handling the role attribute, according to http://www.w3.org/TR/role-attribute/#using-role-in-conjunction-with-rdfa
    @param node: the DOM node to handle
    @param graph: the RDF graph
    @type graph: RDFLib's Graph object instance
    @param state: the inherited state (namespaces, lang, etc.)
    @type state: L{state.ExecutionContext}
    """
    if node.hasAttribute("role"):
        # The subject is the element itself if it has an @id, a blank node otherwise
        if node.hasAttribute("id"):
            i = node.getAttribute("id").strip()
            subject = URIRef(state.base + '#' + i)
        else:
            subject = BNode()
        predicate = URIRef('http://www.w3.org/1999/xhtml/vocab#role')
        for obj in state.getURI("role"):
            graph.add((subject, predicate, obj))
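
The list generation step at the end of `_parse_1_1` builds an `rdf:first`/`rdf:rest` chain that ends in `rdf:nil`. The following is a minimal sketch of that chaining only, under loud assumptions: plain strings stand in for RDFLib terms, the blank node labels `_:b0`, `_:b1`, ... are made up, and `list_triples` is a hypothetical helper that does not exist in pyRdfa.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_triples(origin, prop, values):
    """Return the triples encoding `values` as an RDF collection anchored at (origin, prop).

    Plain strings are used instead of RDFLib terms (assumption for illustration).
    """
    if not values:
        # No content was ever added: the property points straight at rdf:nil
        return [(origin, prop, RDF + "nil")]
    # One made-up blank node label per item, terminated by rdf:nil
    heads = ["_:b%d" % i for i in range(len(values))] + [RDF + "nil"]
    triples = []
    for i, value in enumerate(values):
        triples.append((heads[i], RDF + "first", value))
        triples.append((heads[i], RDF + "rest", heads[i + 1]))
    # Anchor the list at its origin
    triples.append((origin, prop, heads[0]))
    return triples


for triple in list_triples("ex:paper", "ex:authors", ["ex:alice", "ex:bob"]):
    print(triple)
```

Each item contributes two triples and the anchor adds one more, so a list of n items yields 2n + 1 triples in this sketch.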
def parse_one_node(node, graph, parent_object, incoming_state, parent_incomplete_triples):

The (recursive) step of handling a single node.

This entry point just switches between the RDFa 1.0 and RDFa 1.1 versions of the parser. It is called from the top level and again when descending past elements that carry no RDFa attributes; otherwise the recursion happens directly in L{_parse_1_0} and L{_parse_1_1} for RDFa 1.0 and RDFa 1.1, respectively.

@param node: the DOM node to handle
@param graph: the RDF graph
@type graph: RDFLib's Graph object instance
@param parent_object: the parent's object, as an RDFLib URIRef
@param incoming_state: the inherited state (namespaces, lang, etc.)
@type incoming_state: L{state.ExecutionContext}
@param parent_incomplete_triples: list of hanging triples (the missing resource set to None) to be handled (or not) by the current node.
@return: None; the generated triples are added to C{graph} as a side effect
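
The role of `parent_incomplete_triples` may be easier to see in isolation. The sketch below is an illustration under assumptions, not the parser's code: `complete_triples` is a hypothetical helper, plain strings stand in for RDFLib terms, and the list placeholders are returned to the caller instead of being stored in the parser state.

```python
def complete_triples(incomplete, current_subject):
    """Fill the None slot of each hanging triple with current_subject.

    Returns (completed triples, list placeholders); a (None, p, None) entry
    marks a hanging list member and is reported separately.
    """
    completed = []
    list_items = []
    for s, p, o in incomplete:
        if s is None and o is None:
            # Placeholder left by an @inlist element without an object
            list_items.append((p, current_subject))
        else:
            completed.append((
                current_subject if s is None else s,
                p,
                current_subject if o is None else o,
            ))
    return completed, list_items


hanging = [
    ("ex:alice", "foaf:knows", None),    # left behind by a @rel without an object
    (None, "foaf:member", "ex:group"),   # left behind by a @rev without an object
]
done, items = complete_triples(hanging, "ex:bob")
print(done)
print(items)
```

A `@rel` leaves the object slot open and a `@rev` leaves the subject slot open, which is why both positions have to be checked.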

def handle_role_attribute(node, graph, state):

Handling the role attribute, according to http://www.w3.org/TR/role-attribute/#using-role-in-conjunction-with-rdfa

@param node: the DOM node to handle
@param graph: the RDF graph
@type graph: RDFLib's Graph object instance
@param state: the inherited state (namespaces, lang, etc.)
@type state: L{state.ExecutionContext}
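
As a rough illustration of the triples this function produces, here is a self-contained sketch built on the standard library's `xml.dom.minidom`. It rests on several assumptions: `role_triples` is a hypothetical helper, plain strings replace RDFLib terms, `_:role` is a made-up blank node label, and every role value is treated as a plain term in the XHTML vocabulary (the real code resolves values through `state.getURI`, whose behavior is not reproduced here).

```python
from xml.dom.minidom import parseString

XHV = "http://www.w3.org/1999/xhtml/vocab#"


def role_triples(node, base):
    """Return (subject, predicate, object) string tuples for the @role values of node."""
    triples = []
    if node.hasAttribute("role"):
        if node.hasAttribute("id"):
            # The element is identified by a fragment on the base URI
            subject = base + "#" + node.getAttribute("id").strip()
        else:
            # Stand-in for a fresh blank node (assumption)
            subject = "_:role"
        for term in node.getAttribute("role").split():
            # Assumption: each value is a term of the XHTML vocabulary
            triples.append((subject, XHV + "role", XHV + term))
    return triples


doc = parseString('<div id="nav" role="navigation"/>')
print(role_triples(doc.documentElement, "http://example.org/page"))
```

An element without an `id` attribute would get the blank node stand-in as its subject instead of a fragment URI.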