Model Inference#
Note
The purpose of this tutorial is to show how SHACL inference applied to a 223 model can make models easier to write and query. Specifically, this tutorial will teach the following:
How to load SHACL inference rules defined in the 223 ontology into memory
How to apply SHACL inference rules to a 223 model to add all “implied” triples
For this tutorial, we’ll use an existing equipment model of a cooling-only variable air volume (VAV) terminal unit from Section 4.1 of ASHRAE Guideline 36-2021. This and other example models are available from Open223 Models.
What are SHACL Rules?#
SHACL rules add implied information to a graph when certain conditions are met, i.e. when certain triples exist in the source graph. The process of applying rules to an input model to generate new information (triples) is called inference. Inference makes models easier to write because the model author does not have to manually include all the triples necessary to support the desired queries; instead, some of those useful triples can be added “automatically” to the model through inferencing. One way to think of inference is as a way of normalizing a 223 model: inference ensures that the expected properties, types, and other annotations are present, so that consumers of the model can make assumptions about what information the graph will contain.
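To make this concrete, here is a minimal sketch of a SHACL rule and its effect, applied with PySHACL. Everything in the ex: namespace below is purely illustrative and is not part of the 223 ontology; it only demonstrates the mechanism: when the rule’s condition is met (a resource has type ex:Fan), the engine adds an implied triple.

import pyshacl
from rdflib import Graph

# Illustrative shapes graph (hypothetical ex: namespace, not part of 223):
# every ex:Fan is inferred to also be an ex:Equipment
shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <urn:example#> .

ex:FanShape a sh:NodeShape ;
    sh:targetClass ex:Fan ;
    sh:rule [
        a sh:TripleRule ;
        sh:subject sh:this ;
        sh:predicate rdf:type ;
        sh:object ex:Equipment ;
    ] .
""", format="ttl")

# Input data: one fan, with no ex:Equipment triple yet
data = Graph().parse(data="""
@prefix ex: <urn:example#> .
ex:fan1 a ex:Fan .
""", format="ttl")

# advanced=True enables SHACL rule execution; inplace=True writes the
# inferred triples back into the data graph
pyshacl.validate(data, shacl_graph=shapes, advanced=True, inplace=True)

# The implied triple (ex:fan1 a ex:Equipment) is now present
print(data.serialize())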
Model Parsing#
First, we’ll create a new, empty graph and then parse (load) an existing model into it using the Python RDFLib library.
from rdflib import Graph
# Create a Graph
model = Graph()
# Parse in an RDF file hosted on the Internet
model.parse("https://models.open223.info/guideline36-2021-4-1.ttl", format="ttl")
print(f"The model contains {len(model)} triples")
Turtle representation of the model (pre-inference)
print(model.serialize())
Testing the Model (Failed Query)#
Below, we try to run a simple query on our model which asks what the damper in the terminal unit is connected to. The s223:connected relationship does not exist in the pre-inference model, so this query will not return any results.
parts_query = """
PREFIX bldg: <urn:ex/>
PREFIX s223: <http://data.ashrae.org/standard223#>
SELECT ?parts WHERE {
bldg:vav-cooling-only s223:contains ?damper .
?damper s223:connected ?parts
}"""
for row in model.query(parts_query):
print('\t'.join(row))
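As a sanity check, we can also confirm that no triple in the pre-inference graph uses the s223:connected predicate at all. A small sketch using RDFLib’s triple-pattern matching:

from rdflib import URIRef

connected = URIRef("http://data.ashrae.org/standard223#connected")
# any() is True if at least one triple uses s223:connected as its predicate;
# before inference we expect this to print False
print(any(model.triples((None, connected, None))))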
Loading the 223 Ontology#
The 223 ontology contains the rules we will use for inference. We will load the 223 ontology into a separate graph from the model. This is mostly for maintenance: it is easy enough to merge two graphs together into one, but it is much harder to factor them out again. By keeping the ontology graph separate from the model graph, we can more easily maintain and version those graphs individually.
from rdflib import Graph
# Create a Graph
s223 = Graph()
# Parse in a recent copy of the 223 ontology
s223.parse("https://query.open223.info/ontologies/223p.ttl")
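As an aside, merging really is the easy direction: RDFLib can union two graphs in a single expression. A quick sketch:

# The + operator returns a new graph holding the union of both inputs;
# neither `model` nor `s223` is modified
combined = model + s223
print(len(combined))

Once merged, though, there is no general way to tell which triple came from which source, which is why we keep the model and ontology graphs separate.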
Applying Inference Rules#
To apply inference rules, we need an inference engine. This is a piece of software which knows how to properly interpret and apply the SHACL rules defined in an ontology. There are multiple closed-source and open-source implementations of SHACL available; some of these are listed on this page. Below, we will be using the open-source PySHACL library.
We import the PySHACL library and then invoke the validate function on our model graph (the data graph in PySHACL parlance) and our 223 graph (the shape graph in PySHACL parlance).
import pyshacl
# skolemizing the s223 graph lets us remove blank nodes after inference
skolemized_s223 = s223.skolemize()
pyshacl.validate(model,
shacl_graph=skolemized_s223, # pass in the 223 graph object here
ont_graph=skolemized_s223, # pass in the 223 graph object here
allow_infos=True, # don't fail if we get an INFO message
allow_warnings=True, # don't fail if we get a WARNING message
abort_on_first=False, # allow errors to happen during execution
advanced=True, # allow SHACL rules to execute
inplace=True # update the 'model' object with the inferred triples
)
# remove the skolemized s223 graph from the model
model -= skolemized_s223
# de-skolemize the rest of the model
model = model.de_skolemize()
print(f"The model now contains {len(model)} triples")
This may take a few minutes to run, depending on the size of your model. If the PySHACL library is too slow, we recommend looking at alternative open-source SHACL engines, such as TopQuadrant’s Java-based implementation.
We can see from the print statement that several triples have been added to the model.
Note
The pyshacl.validate function actually returns a SHACL validation report which can be used to fix the model and make it compatible with the 223 ontology. See the Model Validation tutorial for how to access and interpret this report.
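For reference, pyshacl.validate returns a 3-tuple of (conforms, report graph, report text). A minimal sketch of capturing it, using the same arguments as the call above:

conforms, report_graph, report_text = pyshacl.validate(
    model,
    shacl_graph=skolemized_s223,
    ont_graph=skolemized_s223,
    advanced=True,
    inplace=True,
)
print(conforms)     # True only if the model conforms to the 223 shapes
print(report_text)  # human-readable summary of any violations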
Turtle representation of the model (post-inference)
print(model.serialize())
Using the New Model#
To demonstrate that the model contains new triples, we can try re-running the query from before. We can see that the query returns results this time.
query = """
PREFIX bldg: <urn:ex/>
PREFIX s223: <http://data.ashrae.org/standard223#>
SELECT ?damper ?part WHERE {
bldg:vav-cooling-only s223:contains ?damper .
?damper s223:connected ?part
}"""
for row in model.query(query):
print(f"{row.get('damper')} connected to {row.get('part')}")