pyRdfa.parse
The core parsing function of RDFa. Some details are kept in other modules to make them easier to update or modify (e.g., the generation of C{@property} values, or the management of the current state).
Note that the entry point (L{parse_one_node}) bifurcates into an RDFa 1.0 and an RDFa 1.1 version, i.e., into L{_parse_1_0} and L{_parse_1_1}. Some of the parsing details (management of C{@property}, list facilities, changed behavior of C{@typeof}) differ between the two versions, and forcing both into one function would be counterproductive.
@summary: RDFa core parser processing step
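@organization: U{World Wide Web Consortium<http://www.w3.org>}
@author: U{Ivan Herman<http://www.w3.org/People/Ivan/>}
@license: This software is available for use under the
U{W3C® SOFTWARE NOTICE AND LICENSE<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}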
# -*- coding: utf-8 -*-
"""
The core parsing function of RDFa. Some details are kept in other modules to make them easier
to update or modify (e.g., the generation of C{@property} values, or the management of the current state).

Note that the entry point (L{parse_one_node}) bifurcates into an RDFa 1.0 and an RDFa 1.1 version, i.e.,
into L{_parse_1_0} and L{_parse_1_1}. Some of the parsing details (management of C{@property}, list
facilities, changed behavior of C{@typeof}) differ between the two versions, and forcing both into one
function would be counterproductive.

@summary: RDFa core parser processing step
@organization: U{World Wide Web Consortium<http://www.w3.org>}
@author: U{Ivan Herman<http://www.w3.org/People/Ivan/>}
@license: This software is available for use under the
U{W3C® SOFTWARE NOTICE AND LICENSE<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}
"""

"""
$Id: parse.py,v 1.19 2013-01-07 12:46:43 ivan Exp $
$Date: 2013-01-07 12:46:43 $
"""

from .state import ExecutionContext
from .property import ProcessProperty
from .embeddedRDF import handle_embeddedRDF
from .host import HostLanguage, host_dom_transforms

from rdflib import URIRef
from rdflib import BNode
from rdflib import RDF as ns_rdf

from . import IncorrectBlankNodeUsage, err_no_blank_node
from .utils import has_one_of_attributes

#######################################################################
def parse_one_node(node, graph, parent_object, incoming_state, parent_incomplete_triples):
    """The (recursive) step of handling a single node.

    This entry just switches between the RDFa 1.0 and RDFa 1.1 versions for parsing. This method is only invoked once,
    actually, from the top level; the recursion then happens in the L{_parse_1_0} and L{_parse_1_1} methods for
    RDFa 1.0 and RDFa 1.1, respectively.

    @param node: the DOM node to handle
    @param graph: the RDF graph
    @type graph: RDFLib's Graph object instance
    @param parent_object: the parent's object, as an RDFLib URIRef
    @param incoming_state: the inherited state (namespaces, lang, etc.)
    @type incoming_state: L{state.ExecutionContext}
    @param parent_incomplete_triples: list of hanging triples (the missing resource set to None) to be handled (or not)
    by the current node.
    @return: whether the caller has to complete its parent's incomplete triples
    @rtype: Boolean
    """
    # Branch according to versions.
    if incoming_state.rdfa_version >= "1.1":
        _parse_1_1(node, graph, parent_object, incoming_state, parent_incomplete_triples)
    else:
        _parse_1_0(node, graph, parent_object, incoming_state, parent_incomplete_triples)

#######################################################################
def _parse_1_1(node, graph, parent_object, incoming_state, parent_incomplete_triples):
    """The (recursive) step of handling a single node. See the
    U{RDFa 1.1 Core document<http://www.w3.org/TR/rdfa-core/>} for further details.

    This is the RDFa 1.1 version.

    @param node: the DOM node to handle
    @param graph: the RDF graph
    @type graph: RDFLib's Graph object instance
    @param parent_object: the parent's object, as an RDFLib URIRef
    @param incoming_state: the inherited state (namespaces, lang, etc.)
    @type incoming_state: L{state.ExecutionContext}
    @param parent_incomplete_triples: list of hanging triples (the missing resource set to None) to be handled (or not)
    by the current node.
    @return: whether the caller has to complete its parent's incomplete triples
    @rtype: Boolean
    """
    def header_check(p_obj):
        """Special disposition for the HTML <head> and <body> elements..."""
        if state.options.host_language in [HostLanguage.xhtml, HostLanguage.html5, HostLanguage.xhtml5]:
            if node.nodeName == "head" or node.nodeName == "body":
                if not has_one_of_attributes(node, "about", "resource", "src", "href"):
                    return p_obj
        else:
            return None

    def lite_check():
        if state.options.check_lite and state.options.host_language in [HostLanguage.html5, HostLanguage.xhtml5, HostLanguage.xhtml]:
            if node.tagName == "link" and node.hasAttribute("rel") and state.term_or_curie.CURIE_to_URI(node.getAttribute("rel")) is not None:
                state.options.add_warning("In RDFa Lite, attribute @rel in <link> is only used in non-RDFa way (consider using @property)", node=node)

    # Update the state. This means, for example, the possible local settings of
    # namespaces and lang
    state = ExecutionContext(node, graph, inherited_state=incoming_state)

    #---------------------------------------------------------------------------------
    # Extra warning check on RDFa Lite
    lite_check()

    #---------------------------------------------------------------------------------
    # Handling the role attribute is pretty much orthogonal to everything else...
    handle_role_attribute(node, graph, state)

    #---------------------------------------------------------------------------------
    # Handle the special case for embedded RDF, e.g., in SVG 1.2.
    # This may add some triples to the target graph that do not originate from RDFa parsing.
    # If the function returns True, an rdf:RDF element has been found. No
    # RDFa parsing should be done on that subtree, so we simply return...
    if state.options.embedded_rdf and node.nodeType == node.ELEMENT_NODE and handle_embeddedRDF(node, graph, state):
        return

    #---------------------------------------------------------------------------------
    # calling the host language specific massaging of the DOM
    if state.options.host_language in host_dom_transforms and node.nodeType == node.ELEMENT_NODE:
        for func in host_dom_transforms[state.options.host_language]:
            func(node, state)

    #---------------------------------------------------------------------------------
    # First, let us check whether there is anything to do at all, i.e.,
    # whether there is any relevant RDFa specific attribute on the element
    #
    if not has_one_of_attributes(node, "href", "resource", "about", "property", "rel", "rev", "typeof", "src", "vocab", "prefix"):
        # nop, there is nothing to do here, just go down the tree and return...
        for n in node.childNodes:
            if n.nodeType == node.ELEMENT_NODE:
                parse_one_node(n, graph, parent_object, state, parent_incomplete_triples)
        return

    #-----------------------------------------------------------------
    # The goal is to establish the subject and object for local processing.
    # The behaviour is slightly different depending on the presence or not
    # of the @rel/@rev attributes
    current_subject = None
    current_object = None
    typed_resource = None

    if has_one_of_attributes(node, "rel", "rev"):
        # in this case there is the notion of 'left' and 'right' of @rel/@rev
        # in establishing the new Subject and the objectResource
        current_subject = header_check(parent_object)

        # set first the subject
        if node.hasAttribute("about"):
            current_subject = state.getURI("about")
            if node.hasAttribute("typeof"):
                typed_resource = current_subject

        # getURI may return None in case of an illegal CURIE, so
        # we have to be careful here, not use only an 'else'
        if current_subject is None:
            current_subject = parent_object
        else:
            state.reset_list_mapping(origin=current_subject)

        # set the object resource
        current_object = state.getResource("resource", "href", "src")

        if node.hasAttribute("typeof") and not node.hasAttribute("about"):
            if current_object is None:
                current_object = BNode()
            typed_resource = current_object

        if not node.hasAttribute("inlist") and current_object is not None:
            # In this case the newly defined object is, in fact, the head of the list;
            # just reset the whole thing.
            state.reset_list_mapping(origin=current_object)

    elif node.hasAttribute("property") and not has_one_of_attributes(node, "content", "datatype"):
        current_subject = header_check(parent_object)

        # this is the case when the property may take hold of @src and friends...
        if node.hasAttribute("about"):
            current_subject = state.getURI("about")
            if node.hasAttribute("typeof"):
                typed_resource = current_subject

        # getURI may return None in case of an illegal CURIE, so
        # we have to be careful here, not use only an 'else'
        if current_subject is None:
            current_subject = parent_object
        else:
            state.reset_list_mapping(origin=current_subject)

        if typed_resource is None and node.hasAttribute("typeof"):
            typed_resource = state.getResource("resource", "href", "src")
            if typed_resource is None:
                typed_resource = BNode()
            current_object = typed_resource
        else:
            current_object = current_subject

    else:
        current_subject = header_check(parent_object)

        # in this case all the various 'resource' setting attributes
        # behave identically, though they also have their own priority
        if current_subject is None:
            current_subject = state.getResource("about", "resource", "href", "src")

        # getResource may return None in case of an illegal CURIE, so
        # we have to be careful here, not use only an 'else'
        if current_subject is None:
            if node.hasAttribute("typeof"):
                current_subject = BNode()
                state.reset_list_mapping(origin=current_subject)
            else:
                current_subject = parent_object
        else:
            state.reset_list_mapping(origin=current_subject)

        # in this case no non-literal triples will be generated, so the
        # only role of the current_object Resource is to be transferred to
        # the children node
        current_object = current_subject
        if node.hasAttribute("typeof"):
            typed_resource = current_subject

    # ---------------------------------------------------------------------
    # The possible typeof indicates a number of type statements on the typed resource
    for defined_type in state.getURI("typeof"):
        if typed_resource:
            graph.add((typed_resource, ns_rdf["type"], defined_type))

    # ---------------------------------------------------------------------
    # In case of @rel/@rev, either triples or incomplete triples are generated;
    # the (possible) incomplete triples are collected, to be forwarded to the children
    incomplete_triples = []
    for prop in state.getURI("rel"):
        if not isinstance(prop, BNode):
            if node.hasAttribute("inlist"):
                if current_object is not None:
                    # Add the content to the list. Note that if the same list
                    # was initialized, at some point, by a None, it will be
                    # overwritten by this real content
                    state.add_to_list_mapping(prop, current_object)
                else:
                    # Add a dummy entry to the list... Note that
                    # if that list was initialized already with a real content
                    # this call will have no effect
                    state.add_to_list_mapping(prop, None)

                    # Add a placeholder into the hanging rels
                    incomplete_triples.append((None, prop, None))
            else:
                theTriple = (current_subject, prop, current_object)
                if current_object is not None:
                    graph.add(theTriple)
                else:
                    incomplete_triples.append(theTriple)
        else:
            state.options.add_warning(err_no_blank_node % "rel", warning_type=IncorrectBlankNodeUsage, node=node.nodeName)

    for prop in state.getURI("rev"):
        if not isinstance(prop, BNode):
            theTriple = (current_object, prop, current_subject)
            if current_object is not None:
                graph.add(theTriple)
            else:
                incomplete_triples.append(theTriple)
        else:
            state.options.add_warning(err_no_blank_node % "rev", warning_type=IncorrectBlankNodeUsage, node=node.nodeName)

    # ----------------------------------------------------------------------
    # Generation of the @property values, including literals. The current_subject is the subject.
    # A particularity of property is that it stops the parsing down the DOM tree if an XML Literal is generated,
    # because everything down there is part of the generated literal.
    if node.hasAttribute("property"):
        ProcessProperty(node, graph, current_subject, state, typed_resource).generate_1_1()

    # ----------------------------------------------------------------------
    # Setting the current object to a bnode is setting up a possible resource
    # for the incomplete triples downwards
    if current_object is None:
        object_to_children = BNode()
    else:
        object_to_children = current_object

    #-----------------------------------------------------------------------
    # Here is the recursion step for all the children
    for n in node.childNodes:
        if n.nodeType == node.ELEMENT_NODE:
            _parse_1_1(n, graph, object_to_children, state, incomplete_triples)

    # ---------------------------------------------------------------------
    # At this point, the parent's incomplete triples may be completed
    for (s, p, o) in parent_incomplete_triples:
        if s is None and o is None:
            # This is an encoded version of a hanging rel for a collection:
            incoming_state.add_to_list_mapping(p, current_subject)
        else:
            if s is None:
                s = current_subject
            if o is None:
                o = current_subject
            graph.add((s, p, o))

    # Generate the lists, if any and if this is the level where a new list was originally created
    if state.new_list and not state.list_empty():
        for prop in state.get_list_props():
            vals = state.get_list_value(prop)
            if vals is None:
                # This was an empty list, in fact, i.e., the list has been initiated by a <xxx rel="prop" inlist>
                # but no list content has ever been added
                graph.add((state.get_list_origin(), prop, ns_rdf["nil"]))
            else:
                heads = [BNode() for _r in vals] + [ns_rdf["nil"]]
                for i in range(len(vals)):
                    graph.add((heads[i], ns_rdf["first"], vals[i]))
                    graph.add((heads[i], ns_rdf["rest"], heads[i + 1]))
                # Anchor the list
                graph.add((state.get_list_origin(), prop, heads[0]))

    # -------------------------------------------------------------------
    # This should be it...
    # -------------------------------------------------------------------
    return


##################################################################################################################
def _parse_1_0(node, graph, parent_object, incoming_state, parent_incomplete_triples):
    """The (recursive) step of handling a single node. See the
    U{RDFa 1.0 syntax document<http://www.w3.org/TR/rdfa-syntax>} for further details.

    This is the RDFa 1.0 version.

    @param node: the DOM node to handle
    @param graph: the RDF graph
    @type graph: RDFLib's Graph object instance
    @param parent_object: the parent's object, as an RDFLib URIRef
    @param incoming_state: the inherited state (namespaces, lang, etc.)
    @type incoming_state: L{state.ExecutionContext}
    @param parent_incomplete_triples: list of hanging triples (the missing resource set to None) to be handled (or not)
    by the current node.
    @return: whether the caller has to complete its parent's incomplete triples
    @rtype: Boolean
    """

    # Update the state. This means, for example, the possible local settings of
    # namespaces and lang
    state = ExecutionContext(node, graph, inherited_state=incoming_state)

    #---------------------------------------------------------------------------------
    # Handling the role attribute is pretty much orthogonal to everything else...
    handle_role_attribute(node, graph, state)

    #---------------------------------------------------------------------------------
    # Handle the special case for embedded RDF, e.g., in SVG 1.2.
    # This may add some triples to the target graph that do not originate from RDFa parsing.
    # If the function returns True, an rdf:RDF element has been found. No
    # RDFa parsing should be done on that subtree, so we simply return...
    if state.options.embedded_rdf and node.nodeType == node.ELEMENT_NODE and handle_embeddedRDF(node, graph, state):
        return

    #---------------------------------------------------------------------------------
    # calling the host language specific massaging of the DOM
    if state.options.host_language in host_dom_transforms and node.nodeType == node.ELEMENT_NODE:
        for func in host_dom_transforms[state.options.host_language]:
            func(node, state)

    #---------------------------------------------------------------------------------
    # First, let us check whether there is anything to do at all, i.e.,
    # whether there is any relevant RDFa specific attribute on the element
    #
    if not has_one_of_attributes(node, "href", "resource", "about", "property", "rel", "rev", "typeof", "src"):
        # nop, there is nothing to do here, just go down the tree and return...
        for n in node.childNodes:
            if n.nodeType == node.ELEMENT_NODE:
                parse_one_node(n, graph, parent_object, state, parent_incomplete_triples)
        return

    #-----------------------------------------------------------------
    # The goal is to establish the subject and object for local processing.
    # The behaviour is slightly different depending on the presence or not
    # of the @rel/@rev attributes
    current_subject = None
    current_object = None

    if has_one_of_attributes(node, "rel", "rev"):
        # in this case there is the notion of 'left' and 'right' of @rel/@rev
        # in establishing the new Subject and the objectResource
        current_subject = state.getResource("about", "src")

        # getResource may return None in case of an illegal CURIE, so
        # we have to be careful here, not use only an 'else'
        if current_subject is None:
            if node.hasAttribute("typeof"):
                current_subject = BNode()
            else:
                current_subject = parent_object
        else:
            state.reset_list_mapping(origin=current_subject)

        # set the object resource
        current_object = state.getResource("resource", "href")

    else:
        # in this case all the various 'resource' setting attributes
        # behave identically, though they also have their own priority
        current_subject = state.getResource("about", "src", "resource", "href")

        # getResource may return None in case of an illegal CURIE, so
        # we have to be careful here, not use only an 'else'
        if current_subject is None:
            if node.hasAttribute("typeof"):
                current_subject = BNode()
            else:
                current_subject = parent_object
        else:
            state.reset_list_mapping(origin=current_subject)

        # in this case no non-literal triples will be generated, so the
        # only role of the current_object Resource is to be transferred to
        # the children node
        current_object = current_subject

    # ---------------------------------------------------------------------
    # The possible typeof indicates a number of type statements on the new Subject
    for defined_type in state.getURI("typeof"):
        graph.add((current_subject, ns_rdf["type"], defined_type))

    # ---------------------------------------------------------------------
    # In case of @rel/@rev, either triples or incomplete triples are generated;
    # the (possible) incomplete triples are collected, to be forwarded to the children
    incomplete_triples = []
    for prop in state.getURI("rel"):
        if not isinstance(prop, BNode):
            theTriple = (current_subject, prop, current_object)
            if current_object is not None:
                graph.add(theTriple)
            else:
                incomplete_triples.append(theTriple)
        else:
            state.options.add_warning(err_no_blank_node % "rel", warning_type=IncorrectBlankNodeUsage, node=node.nodeName)

    for prop in state.getURI("rev"):
        if not isinstance(prop, BNode):
            theTriple = (current_object, prop, current_subject)
            if current_object is not None:
                graph.add(theTriple)
            else:
                incomplete_triples.append(theTriple)
        else:
            state.options.add_warning(err_no_blank_node % "rev", warning_type=IncorrectBlankNodeUsage, node=node.nodeName)

    # ----------------------------------------------------------------------
    # Generation of the literal values. The current_subject is the subject.
    # A particularity of property is that it stops the parsing down the DOM tree if an XML Literal is generated,
    # because everything down there is part of the generated literal.
    if node.hasAttribute("property"):
        ProcessProperty(node, graph, current_subject, state).generate_1_0()

    # ----------------------------------------------------------------------
    # Setting the current object to a bnode is setting up a possible resource
    # for the incomplete triples downwards
    if current_object is None:
        object_to_children = BNode()
    else:
        object_to_children = current_object

    #-----------------------------------------------------------------------
    # Here is the recursion step for all the children
    for n in node.childNodes:
        if n.nodeType == node.ELEMENT_NODE:
            _parse_1_0(n, graph, object_to_children, state, incomplete_triples)

    # ---------------------------------------------------------------------
    # At this point, the parent's incomplete triples may be completed
    for (s, p, o) in parent_incomplete_triples:
        if s is None and o is None:
            # This is an encoded version of a hanging rel for a collection:
            incoming_state.add_to_list_mapping(p, current_subject)
        else:
            if s is None:
                s = current_subject
            if o is None:
                o = current_subject
            graph.add((s, p, o))

    # -------------------------------------------------------------------
    # This should be it...
    # -------------------------------------------------------------------
    return


#######################################################################
# Handle the role attribute
def handle_role_attribute(node, graph, state):
    """
    Handling the role attribute, according to http://www.w3.org/TR/role-attribute/#using-role-in-conjunction-with-rdfa
    @param node: the DOM node to handle
    @param graph: the RDF graph
    @type graph: RDFLib's Graph object instance
    @param state: the inherited state (namespaces, lang, etc.)
    @type state: L{state.ExecutionContext}
    """
    if node.hasAttribute("role"):
        if node.hasAttribute("id"):
            i = node.getAttribute("id").strip()
            subject = URIRef(state.base + '#' + i)
        else:
            subject = BNode()
        predicate = URIRef('http://www.w3.org/1999/xhtml/vocab#role')
        for obj in state.getURI("role"):
            graph.add((subject, predicate, obj))
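The list-generation step at the end of the RDFa 1.1 parser can be hard to follow inside the full function, so here is a minimal standalone sketch of the same chaining idea. It is not the module's code: it uses plain strings instead of RDFLib terms, and `build_list_triples` and `fresh_bnode` are hypothetical helper names introduced only for this illustration.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

_ids = count(1)

def fresh_bnode():
    """Hypothetical stand-in for rdflib.BNode(): returns a new blank node label."""
    return "_:b%d" % next(_ids)

def build_list_triples(origin, prop, vals, new_bnode=fresh_bnode):
    """Return the triples that anchor an RDF collection of `vals` at (origin, prop)."""
    if vals is None:
        # A list that was opened but never received content is the empty list.
        return [(origin, prop, RDF + "nil")]
    # One blank node per item, terminated by rdf:nil.
    heads = [new_bnode() for _ in vals] + [RDF + "nil"]
    triples = []
    for i, val in enumerate(vals):
        triples.append((heads[i], RDF + "first", val))
        triples.append((heads[i], RDF + "rest", heads[i + 1]))
    # Anchor the list on its origin.
    triples.append((origin, prop, heads[0]))
    return triples

triples = build_list_triples(
    "http://example.org/doc", "http://example.org/author", ["Alice", "Bob"]
)
for triple in triples:
    print(triple)
```

Each item gets its own blank node carrying an rdf:first and an rdf:rest statement, and the final statement links the origin to the first blank node.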
parse_one_node(node, graph, parent_object, incoming_state, parent_incomplete_triples)
The (recursive) step of handling a single node.
This entry point only switches between the RDFa 1.0 and RDFa 1.1 versions of the parser. It is invoked once from the top level; the recursion then happens in L{_parse_1_0} and L{_parse_1_1} for RDFa 1.0 and RDFa 1.1, respectively (both call back into this entry point only for elements that carry no RDFa attributes).
@param node: the DOM node to handle
@param graph: the RDF graph
@type graph: RDFLib's Graph object instance
@param parent_object: the parent's object, as an RDFLib URIRef
@param incoming_state: the inherited state (namespaces, lang, etc.)
@type incoming_state: L{state.ExecutionContext}
@param parent_incomplete_triples: list of hanging triples (the missing resource set to None) to be handled (or not) by the current node.
@return: whether the caller has to complete its parent's incomplete triples
@rtype: Boolean
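The "hanging triples" passed in through `parent_incomplete_triples` are easier to picture with a small example. The sketch below is a simplified stand-in, assuming plain strings for resources and a plain dict in place of the list mapping that the real code keeps on the execution context; `complete_triples` is a hypothetical name, not part of the module.

```python
def complete_triples(parent_incomplete_triples, current_subject, list_mapping):
    """Fill the missing end of each hanging triple with current_subject.

    A triple with both ends missing marks a hanging @rel on a list, so the
    subject is appended to that property's list instead of producing a triple.
    """
    completed = []
    for s, p, o in parent_incomplete_triples:
        if s is None and o is None:
            list_mapping.setdefault(p, []).append(current_subject)
        else:
            completed.append((
                current_subject if s is None else s,
                p,
                current_subject if o is None else o,
            ))
    return completed

mapping = {}
hanging = [
    (None, "ex:items", None),        # hanging @rel with @inlist
    ("ex:parent", "ex:knows", None), # hanging @rel: object is missing
    (None, "ex:made", "ex:parent"),  # hanging @rev: subject is missing
]
completed = complete_triples(hanging, "ex:child", mapping)
print(completed)
print(mapping)
```

A `None` in the object position comes from a `@rel` without a resource, and a `None` in the subject position comes from a `@rev`; in both cases the child element's subject fills the gap.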
handle_role_attribute(node, graph, state)
Handling the role attribute, according to http://www.w3.org/TR/role-attribute/#using-role-in-conjunction-with-rdfa
@param node: the DOM node to handle
@param graph: the RDF graph
@type graph: RDFLib's Graph object instance
@param state: the inherited state (namespaces, lang, etc.)
@type state: L{state.ExecutionContext}
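As a rough illustration of what this function produces, the sketch below mimics its subject selection using only the standard library. It makes several assumptions that are not taken from the module: triples are plain string tuples, the blank node is a fixed placeholder label, and each role token is resolved against the XHTML vocabulary (the real code delegates that resolution to `state.getURI("role")`). `role_triples` is a hypothetical name.

```python
from xml.dom.minidom import parseString

XHV = "http://www.w3.org/1999/xhtml/vocab#"
ROLE = XHV + "role"

def role_triples(node, base):
    """Return (subject, predicate, object) tuples for the role attribute of node."""
    if not node.hasAttribute("role"):
        return []
    if node.hasAttribute("id"):
        # An element with an id is identified by the base URI plus a fragment.
        subject = base + "#" + node.getAttribute("id").strip()
    else:
        # Placeholder label; the real code creates a fresh blank node here.
        subject = "_:role"
    # Assumption: bare role tokens resolve against the XHTML vocabulary.
    return [(subject, ROLE, XHV + token) for token in node.getAttribute("role").split()]

doc = parseString('<div id="nav" role="navigation"/>')
triples = role_triples(doc.documentElement, "http://example.org/page")
print(triples)
```

The element's `id`, when present, gives the role statement a stable subject; without it the statement hangs off a blank node.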