pyRdfa.host
Host language sub-package for the pyRdfa package. It contains the variables and, possibly, the modules needed to manage the various RDFa host languages.
This module may have to be modified if a new host language is added to the system. In many cases rdfa_core is enough as a host language, because no special processing is needed. However, some host languages may require an initial context, or their value may control some transformations, in which case additional data has to be added to this module. The module header contains all the tables and arrays to be adapted, and the module content may contain specific transformation methods.
@summary: RDFa Host package
@requires: U{RDFLib package<http://rdflib.net>} version 5 or higher
@requires: Python version 2.7 or 3.8 or higher
@organization: U{World Wide Web Consortium<http://www.w3.org>}
@author: U{Ivan Herman<http://www.w3.org/People/Ivan/>}
@license: This software is available for use under the U{W3C® SOFTWARE NOTICE AND LICENSE<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}
@var content_to_host_language: a dictionary mapping a media type to a host language
@var preferred_suffixes: mapping from preferred suffixes to media types; used if the file is local, i.e., there is no HTTP return value for the media type. It corresponds to the preferred suffix in the media type registration
@var initial_contexts: mapping from host languages to a list of initial contexts
@var accept_xml_base: list of host languages that accept the xml:base attribute for base setting
@var accept_xml_lang: list of host languages that accept the xml:lang attribute for language setting. Note that XHTML and HTML have some special rules, and those are hard coded
@var warn_xmlns_usage: list of host languages that should generate a warning for the usage of @xmlns (for RDFa 1.1)
@var accept_embedded_rdf_xml: list of host languages that might also include RDF data using embedded RDF/XML (e.g., SVG). That RDF data may be merged with the output
@var accept_embedded_turtle: list of host languages that might also include RDF data using a C{script} element. That RDF data may be merged with the output
@var require_embedded_rdf: list of languages that must accept embedded RDF, i.e., the corresponding option is irrelevant
@var host_dom_transforms: dictionary mapping a host language to an array of methods that are invoked at the beginning of the parsing process for a specific node. Such a method can make a last-minute change to that DOM node, e.g., adding or modifying an attribute. The method's signature is (node, state), where node is the DOM node, and state is the L{Execution context<pyRdfa.state.ExecutionContext>}.
@var predefined_1_0_rel: terms that are hardcoded for HTML+RDFa 1.0 and replace the initial context for that version
@var beautifying_prefixes: this only serves to make the output more attractive: for each host language, a dictionary of prefix-URI pairs that can be used to make the terms look better
@var default_vocabulary: as its name suggests, default @vocab value for a specific host language
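The tables described above are plain dictionaries and lists, so consulting them is a chain of lookups. The following is a minimal sketch of how a local file could be mapped to a host language and its initial contexts. The tables are excerpts copied from this module so that the block runs standalone; the helper name `describe_local_file` and the fallback to "RDFa Core" for unknown suffixes are assumptions made for illustration, not part of pyRdfa.

```python
import os.path

# Excerpts of the module-level tables, copied so the sketch runs standalone.
preferred_suffixes = {
    ".html": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".svg": "application/svg+xml",
    ".xml": "application/xml",
}

content_to_host_language = {
    "text/html": "HTML5+RDFa",
    "application/xhtml+xml": "XHTML+RDFa",
    "application/xml": "RDFa Core",
    "application/svg+xml": "SVG+RDFa",
}

_RDFA_11 = "http://www.w3.org/2011/rdfa-context/rdfa-1.1"
initial_contexts = {
    "XHTML+RDFa": [_RDFA_11, "http://www.w3.org/2011/rdfa-context/xhtml-rdfa-1.1"],
    "HTML5+RDFa": [_RDFA_11],
    "RDFa Core": [_RDFA_11],
    "SVG+RDFa": [_RDFA_11],
}


def describe_local_file(path, default_language="RDFa Core"):
    """Hypothetical helper: suffix -> media type -> host language -> contexts."""
    suffix = os.path.splitext(path)[1].lower()
    media_type = preferred_suffixes.get(suffix)
    # Assumption: anything unrecognized falls back to the default host language.
    host_language = content_to_host_language.get(media_type, default_language)
    return media_type, host_language, initial_contexts.get(host_language, [])


for name in ("page.xhtml", "drawing.svg", "data.xml"):
    print(name, describe_local_file(name))
```

Keeping the suffix table and the media type table separate means the same host language lookup serves both local files and HTTP responses that carry a Content-Type header.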
# -*- coding: utf-8 -*-
"""
Host language sub-package for the pyRdfa package. It contains the variables and, possibly, the modules needed to manage
the various RDFa host languages.

This module may have to be modified if a new host language is added to the system. In many cases rdfa_core is enough as a host language, because no special processing is needed. However, some host languages may require an initial context, or their value may control some transformations, in which case additional data has to be added to this module. The module header contains all the tables and arrays to be adapted, and the module content may contain specific transformation methods.


@summary: RDFa Host package
@requires: U{RDFLib package<http://rdflib.net>} version 5 or higher
@requires: Python version 2.7 or 3.8 or higher
@organization: U{World Wide Web Consortium<http://www.w3.org>}
@author: U{Ivan Herman<http://www.w3.org/People/Ivan/>}
@license: This software is available for use under the
U{W3C® SOFTWARE NOTICE AND LICENSE<http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231>}

@var content_to_host_language: a dictionary mapping a media type to a host language
@var preferred_suffixes: mapping from preferred suffixes to media types; used if the file is local, i.e., there is no HTTP return value for the media type. It corresponds to the preferred suffix in the media type registration
@var initial_contexts: mapping from host languages to a list of initial contexts
@var accept_xml_base: list of host languages that accept the xml:base attribute for base setting
@var accept_xml_lang: list of host languages that accept the xml:lang attribute for language setting. Note that XHTML and HTML have some special rules, and those are hard coded
@var warn_xmlns_usage: list of host languages that should generate a warning for the usage of @xmlns (for RDFa 1.1)
@var accept_embedded_rdf_xml: list of host languages that might also include RDF data using embedded RDF/XML (e.g., SVG). That RDF data may be merged with the output
@var accept_embedded_turtle: list of host languages that might also include RDF data using a C{script} element. That RDF data may be merged with the output
@var require_embedded_rdf: list of languages that must accept embedded RDF, i.e., the corresponding option is irrelevant
@var host_dom_transforms: dictionary mapping a host language to an array of methods that are invoked at the beginning of the parsing process for a specific node. Such a method can make a last-minute change to that DOM node, e.g., adding or modifying an attribute. The method's signature is (node, state), where node is the DOM node, and state is the L{Execution context<pyRdfa.state.ExecutionContext>}.
@var predefined_1_0_rel: terms that are hardcoded for HTML+RDFa 1.0 and replace the initial context for that version
@var beautifying_prefixes: this only serves to make the output more attractive: for each host language, a dictionary of prefix-URI pairs that can be used to make the terms look better
@var default_vocabulary: as its name suggests, default @vocab value for a specific host language

"""

__version__ = "3.0"

from .atom import atom_add_entry_type
from .html5 import html5_extra_attributes, remove_rel

class HostLanguage:
    """An enumeration style class: recognized host language types for this processor of RDFa. Some processing details may depend on these host languages. "rdfa_core" is the default Host Language if nothing else is defined."""
    rdfa_core = "RDFa Core"
    xhtml = "XHTML+RDFa"
    xhtml5 = "XHTML5+RDFa"
    html5 = "HTML5+RDFa"
    atom = "Atom+RDFa"
    svg = "SVG+RDFa"

# initial contexts for host languages
initial_contexts = {
    HostLanguage.xhtml: ["http://www.w3.org/2011/rdfa-context/rdfa-1.1",
                         "http://www.w3.org/2011/rdfa-context/xhtml-rdfa-1.1"],
    HostLanguage.xhtml5: ["http://www.w3.org/2011/rdfa-context/rdfa-1.1"],
    HostLanguage.html5: ["http://www.w3.org/2011/rdfa-context/rdfa-1.1"],
    HostLanguage.rdfa_core: ["http://www.w3.org/2011/rdfa-context/rdfa-1.1"],
    HostLanguage.atom: ["http://www.w3.org/2011/rdfa-context/rdfa-1.1"],
    HostLanguage.svg: ["http://www.w3.org/2011/rdfa-context/rdfa-1.1"]
}

beautifying_prefixes = {
    HostLanguage.xhtml: {
        "xhv": "http://www.w3.org/1999/xhtml/vocab#"
    },
    # HostLanguage.html5 : {
    #     "xhv" : "http://www.w3.org/1999/xhtml/vocab#"
    # },
    # HostLanguage.xhtml5 : {
    #     "xhv" : "http://www.w3.org/1999/xhtml/vocab#"
    # },
    HostLanguage.atom: {
        "atomrel": "http://www.iana.org/assignments/relation/"
    }
}


accept_xml_base = [HostLanguage.rdfa_core, HostLanguage.atom, HostLanguage.svg, HostLanguage.xhtml5]
accept_xml_lang = [HostLanguage.rdfa_core, HostLanguage.atom, HostLanguage.svg]

accept_embedded_rdf_xml = [HostLanguage.svg, HostLanguage.rdfa_core]
accept_embedded_turtle = [HostLanguage.svg, HostLanguage.html5, HostLanguage.xhtml5, HostLanguage.xhtml]

# some languages, e.g., SVG, require that embedded content should be combined with the default graph,
# i.e., it cannot be turned off by an option
require_embedded_rdf = [HostLanguage.svg]

warn_xmlns_usage = [HostLanguage.html5, HostLanguage.xhtml5, HostLanguage.xhtml]

host_dom_transforms = {
    HostLanguage.atom: [atom_add_entry_type],
    HostLanguage.html5: [html5_extra_attributes, remove_rel],
    HostLanguage.xhtml5: [html5_extra_attributes, remove_rel]
}

default_vocabulary = {
    HostLanguage.atom: "http://www.iana.org/assignments/relation/"
}

predefined_1_0_rel = ['alternate', 'appendix', 'cite', 'bookmark', 'chapter', 'contents',
'copyright', 'glossary', 'help', 'icon', 'index', 'meta', 'next', 'p3pv1', 'prev', 'previous',
'role', 'section', 'subsection', 'start', 'license', 'up', 'last', 'stylesheet', 'first', 'top']

# ----------------------------------------------------------------------------------------------------------

class MediaTypes:
    """An enumeration style class: some common media types (better to have them in one place to avoid mistyping...)"""
    rdfxml = 'application/rdf+xml'
    turtle = 'text/turtle'
    html = 'text/html'
    xhtml = 'application/xhtml+xml'
    svg = 'application/svg+xml'
    svgi = 'image/svg+xml'
    smil = 'application/smil+xml'
    atom = 'application/atom+xml'
    xml = 'application/xml'
    xmlt = 'text/xml'
    nt = 'text/plain'

# mapping from (some) content types to RDFa host languages. This may control the exact processing or at least the initial context (see above)...
content_to_host_language = {
    MediaTypes.html: HostLanguage.html5,
    MediaTypes.xhtml: HostLanguage.xhtml,
    MediaTypes.xml: HostLanguage.rdfa_core,
    MediaTypes.xmlt: HostLanguage.rdfa_core,
    MediaTypes.smil: HostLanguage.rdfa_core,
    MediaTypes.svg: HostLanguage.svg,
    MediaTypes.svgi: HostLanguage.svg,
    MediaTypes.atom: HostLanguage.atom
}

# mapping preferred suffixes to media types...
preferred_suffixes = {
    ".rdf": MediaTypes.rdfxml,
    ".ttl": MediaTypes.turtle,
    ".n3": MediaTypes.turtle,
    ".owl": MediaTypes.rdfxml,
    ".html": MediaTypes.html,
    ".shtml": MediaTypes.html,
    ".xhtml": MediaTypes.xhtml,
    ".svg": MediaTypes.svg,
    ".smil": MediaTypes.smil,
    ".xml": MediaTypes.xml,
    ".nt": MediaTypes.nt,
    ".atom": MediaTypes.atom
}

# DTD combinations that may determine the host language and the rdfa version
_XHTML_1_0 = [
    ("-//W3C//DTD XHTML+RDFa 1.0//EN", "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd")
]

_XHTML_1_1 = [
    ("-//W3C//DTD XHTML+RDFa 1.1//EN", "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd"),
    ("-//W3C//DTD HTML 4.01+RDFa 1.1//EN", "http://www.w3.org/MarkUp/DTD/html401-rdfa11-1.dtd")
]

_XHTML = [
    ("-//W3C//DTD XHTML 1.0 Strict//EN", "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    ("-//W3C//DTD XHTML 1.0 Transitional//EN", "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"),
    ("-//W3C//DTD XHTML 1.1//EN", "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd")
]

def adjust_html_version(stream, rdfa_version):
    """
    Adjust the rdfa_version based on the (possible) DTD
    @param stream: the data stream that has to be parsed by an XML parser
    @param rdfa_version: the current rdfa_version; will be returned if nothing else is found
    @return: the rdfa_version, either "1.0" or "1.1" if the DTD says so, otherwise the input rdfa_version value
    """
    import xml.dom.minidom
    parse = xml.dom.minidom.parse
    dom = parse(stream)

    # Only the version is of interest here; the (possibly adjusted) host language is discarded
    _hl, version = adjust_xhtml_and_version(dom, HostLanguage.xhtml, rdfa_version)
    return version

def adjust_xhtml_and_version(dom, incoming_language, rdfa_version):
    """
    Check if the XHTML+RDFa is really XHTML 1.0 or 1.1, or whether it should be considered as XHTML5. This is done
    by looking at the DTD. Furthermore, checks whether the system id signals RDFa 1.0, in which case the
    version is also set.

    @param dom: top level DOM node
    @param incoming_language: host language to be checked; the whole check is relevant for XHTML only.
    @param rdfa_version: rdfa_version as known by the caller
    @return: a tuple of the possibly modified host language (i.e., set to XHTML5) and the possibly modified rdfa version (i.e., set to "1.0", "1.1", or the incoming rdfa_version if nothing is found)
    """
    if incoming_language == HostLanguage.xhtml:
        try:
            # There may not be any doctype set in the first place...
            publicId = dom.doctype.publicId
            systemId = dom.doctype.systemId

            if (publicId, systemId) in _XHTML_1_0:
                return (HostLanguage.xhtml, "1.0")
            elif (publicId, systemId) in _XHTML_1_1:
                return (HostLanguage.xhtml, "1.1")
            elif (publicId, systemId) in _XHTML:
                return (HostLanguage.xhtml, rdfa_version)
            else:
                return (HostLanguage.xhtml5, rdfa_version)
        except Exception:
            # If any of those are missing (e.g., dom.doctype is None), forget it...
            return (HostLanguage.xhtml5, rdfa_version)
    else:
        return (incoming_language, rdfa_version)
class HostLanguage:
    """An enumeration style class: recognized host language types for this processor of RDFa. Some processing details may depend on these host languages. "rdfa_core" is the default Host Language if nothing else is defined."""
    rdfa_core = "RDFa Core"
    xhtml = "XHTML+RDFa"
    xhtml5 = "XHTML5+RDFa"
    html5 = "HTML5+RDFa"
    atom = "Atom+RDFa"
    svg = "SVG+RDFa"
An enumeration style class: recognized host language types for this processor of RDFa. Some processing details may depend on these host languages. "rdfa_core" is the default Host Language if nothing else is defined.
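To illustrate how processing details can depend on the host language, here is a minimal sketch of a dispatch step that runs the registered DOM transforms for a node and then checks one of the per-language feature lists. The class and tables are reduced excerpts so that the block runs standalone. The transform `mark_visited`, the dispatcher `prepare_node`, and the use of `None` as a stand-in for the execution context are all hypothetical and are not part of pyRdfa.

```python
import xml.dom.minidom


class HostLanguage:
    # Reduced excerpt of the enumeration documented above.
    rdfa_core = "RDFa Core"
    html5 = "HTML5+RDFa"
    svg = "SVG+RDFa"


# Excerpt: host languages that honour xml:base.
accept_xml_base = [HostLanguage.rdfa_core, HostLanguage.svg]


def mark_visited(node, state):
    """Hypothetical transform using the documented (node, state) signature."""
    node.setAttribute("data-visited", "true")


# Hypothetical registration; the real table lists pyRdfa's own transforms.
host_dom_transforms = {HostLanguage.html5: [mark_visited]}


def prepare_node(node, state, host_language):
    """Run the transforms for this host language, then report xml:base support."""
    for transform in host_dom_transforms.get(host_language, []):
        transform(node, state)
    return host_language in accept_xml_base


root = xml.dom.minidom.parseString("<div/>").documentElement
honours_xml_base = prepare_node(root, None, HostLanguage.html5)
print(root.toxml(), honours_xml_base)
```

Because the values are plain strings, membership tests and dictionary lookups are all that is needed; no subclassing or instance creation is involved.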
class MediaTypes:
    """An enumeration style class: some common media types (better to have them in one place to avoid mistyping...)"""
    rdfxml = 'application/rdf+xml'
    turtle = 'text/turtle'
    html = 'text/html'
    xhtml = 'application/xhtml+xml'
    svg = 'application/svg+xml'
    svgi = 'image/svg+xml'
    smil = 'application/smil+xml'
    atom = 'application/atom+xml'
    xml = 'application/xml'
    xmlt = 'text/xml'
    nt = 'text/plain'
An enumeration style class: some common media types (better to have them in one place to avoid mistyping...)
def adjust_html_version(stream, rdfa_version):
    """
    Adjust the rdfa_version based on the (possible) DTD
    @param stream: the data stream that has to be parsed by an XML parser
    @param rdfa_version: the current rdfa_version; will be returned if nothing else is found
    @return: the rdfa_version, either "1.0" or "1.1" if the DTD says so, otherwise the input rdfa_version value
    """
    import xml.dom.minidom
    parse = xml.dom.minidom.parse
    dom = parse(stream)

    # Only the version is of interest here; the (possibly adjusted) host language is discarded
    _hl, version = adjust_xhtml_and_version(dom, HostLanguage.xhtml, rdfa_version)
    return version
Adjust the rdfa_version based on the (possible) DTD
@param stream: the data stream that has to be parsed by an XML parser
@param rdfa_version: the current rdfa_version; will be returned if nothing else is found
@return: the rdfa_version, either "1.0" or "1.1" if the DTD says so, otherwise the input rdfa_version value
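The version detection amounts to parsing the stream and looking up the document type's (publicId, systemId) pair. Below is a minimal standalone sketch of that idea using only the standard library. The function name `sniff_rdfa_version` is hypothetical, the DTD table is an excerpt, and the sketch leaves out the host language handling that the real function delegates to adjust_xhtml_and_version.

```python
import io
import xml.dom.minidom

# Excerpt of the DTD pairs that pin down an RDFa version.
DTD_TO_VERSION = {
    ("-//W3C//DTD XHTML+RDFa 1.0//EN",
     "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"): "1.0",
    ("-//W3C//DTD XHTML+RDFa 1.1//EN",
     "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd"): "1.1",
}


def sniff_rdfa_version(stream, rdfa_version):
    """Hypothetical stand-in: return the DTD's version, else the caller's."""
    dom = xml.dom.minidom.parse(stream)
    doctype = dom.doctype
    if doctype is None:
        # No DOCTYPE declaration at all: keep what the caller supplied.
        return rdfa_version
    key = (doctype.publicId, doctype.systemId)
    return DTD_TO_VERSION.get(key, rdfa_version)


with_dtd = io.BytesIO(
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" '
    b'"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">'
    b'<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>'
)
without_dtd = io.BytesIO(b'<html xmlns="http://www.w3.org/1999/xhtml"/>')

detected = sniff_rdfa_version(with_dtd, "1.1")
fallback = sniff_rdfa_version(without_dtd, "1.1")
print(detected, fallback)
```

Note that parsing consumes the stream, so a caller that wants to process the same document afterwards needs a seekable or re-openable source.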
def adjust_xhtml_and_version(dom, incoming_language, rdfa_version):
    """
    Check if the XHTML+RDFa is really XHTML 1.0 or 1.1, or whether it should be considered as XHTML5. This is done
    by looking at the DTD. Furthermore, checks whether the system id signals RDFa 1.0, in which case the
    version is also set.

    @param dom: top level DOM node
    @param incoming_language: host language to be checked; the whole check is relevant for XHTML only.
    @param rdfa_version: rdfa_version as known by the caller
    @return: a tuple of the possibly modified host language (i.e., set to XHTML5) and the possibly modified rdfa version (i.e., set to "1.0", "1.1", or the incoming rdfa_version if nothing is found)
    """
    if incoming_language == HostLanguage.xhtml:
        try:
            # There may not be any doctype set in the first place...
            publicId = dom.doctype.publicId
            systemId = dom.doctype.systemId

            if (publicId, systemId) in _XHTML_1_0:
                return (HostLanguage.xhtml, "1.0")
            elif (publicId, systemId) in _XHTML_1_1:
                return (HostLanguage.xhtml, "1.1")
            elif (publicId, systemId) in _XHTML:
                return (HostLanguage.xhtml, rdfa_version)
            else:
                return (HostLanguage.xhtml5, rdfa_version)
        except Exception:
            # If any of those are missing (e.g., dom.doctype is None), forget it...
            return (HostLanguage.xhtml5, rdfa_version)
    else:
        return (incoming_language, rdfa_version)
Check if the XHTML+RDFa is really XHTML 1.0 or 1.1, or whether it should be considered as XHTML5. This is done by looking at the DTD. Furthermore, checks whether the system id signals RDFa 1.0, in which case the version is also set.
@param dom: top level DOM node
@param incoming_language: host language to be checked; the whole check is relevant for XHTML only.
@param rdfa_version: rdfa_version as known by the caller
@return: a tuple of the possibly modified host language (i.e., set to XHTML5) and the possibly modified rdfa version (i.e., set to "1.0", "1.1", or the incoming rdfa_version if nothing is found)
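The four possible outcomes of this check can be seen by feeding it documents with different DOCTYPE declarations. The sketch below is a condensed, table-driven restatement of the logic, written so that it runs without pyRdfa installed. The names `classify` and `parse_with`, the string constants, and the reduced DTD table are assumptions made for illustration; the real function uses the HostLanguage class and three separate DTD lists.

```python
import xml.dom.minidom

XHTML, XHTML5, SVG = "XHTML+RDFa", "XHTML5+RDFa", "SVG+RDFa"

# (publicId, systemId) -> version forced by the DTD; None keeps the caller's.
KNOWN_DTDS = {
    ("-//W3C//DTD XHTML+RDFa 1.0//EN",
     "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"): "1.0",
    ("-//W3C//DTD XHTML+RDFa 1.1//EN",
     "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd"): "1.1",
    ("-//W3C//DTD XHTML 1.0 Strict//EN",
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"): None,
}


def classify(dom, incoming_language, rdfa_version):
    """Hypothetical restatement of the DTD check as a single table lookup."""
    if incoming_language != XHTML:
        # The check only applies to XHTML; everything else passes through.
        return (incoming_language, rdfa_version)
    doctype = dom.doctype
    key = (doctype.publicId, doctype.systemId) if doctype is not None else None
    if key not in KNOWN_DTDS:
        # Missing or unrecognized DOCTYPE: treat the document as XHTML5.
        return (XHTML5, rdfa_version)
    return (XHTML, KNOWN_DTDS[key] or rdfa_version)


def parse_with(doctype_decl):
    """Hypothetical helper: build a tiny document with the given DOCTYPE."""
    body = '<html xmlns="http://www.w3.org/1999/xhtml"/>'
    return xml.dom.minidom.parseString(doctype_decl + body)


strict = parse_with('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
                    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
rdfa10 = parse_with('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" '
                    '"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">')
bare = parse_with("")

for dom in (strict, rdfa10, bare):
    print(classify(dom, XHTML, "1.1"))
```

Collapsing the three lists into one dictionary is a design choice of this sketch only; it makes the "keep the caller's version" case explicit as a `None` entry.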