Model Inference#

Note

The purpose of this tutorial is to show how SHACL inference applied to a 223 model can make models easier to write and query. Specifically, this tutorial will teach the following:

  1. How to load SHACL inference rules defined in the 223 ontology into memory

  2. How to apply SHACL inference rules to a 223 model to add all “implied” triples

For this tutorial, we’ll use an existing equipment model of a variable air volume (VAV) terminal unit with cooling only from section 4.1 of ASHRAE Guideline 36-2021. This and other example models are available from Open223 Models.

What are SHACL Rules?#

SHACL rules add implied information to graphs when certain conditions are met, i.e., when certain triples exist in the source graph. The process of applying rules to an input model to generate new information (triples) is called inference. Inference makes models easier to write because the model author does not have to manually include all the triples necessary to support the desired queries; instead, some of those useful triples can be added “automatically” through inferencing. One way to think of inference is as a way of normalizing a 223 model: it ensures that the expected properties, types, and other annotations are present, so that consumers of the model can make assumptions about what information the graph will contain.
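To make this concrete, here is a minimal sketch of a SHACL rule in action. The ex: namespace, the ex:feeds and ex:fedBy properties, and the shape below are invented for illustration and are not part of the 223 ontology; the real rules live in the 223 ontology loaded later in this tutorial. The rule asserts an inverse ex:fedBy triple wherever an ex:feeds triple exists.

import pyshacl
from rdflib import Graph

# Toy data graph: a single triple stating that ex:a feeds ex:b.
# The ex: namespace is made up for this sketch.
data = Graph()
data.parse(data="""
@prefix ex: <urn:example#> .
ex:a ex:feeds ex:b .
""", format="ttl")

# A node shape carrying a SHACL triple rule: for every subject of
# ex:feeds, assert the inverse triple (?object ex:fedBy ?subject).
shapes = Graph()
shapes.parse(data="""
@prefix ex: <urn:example#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
ex:FeedsShape a sh:NodeShape ;
    sh:targetSubjectsOf ex:feeds ;
    sh:rule [
        a sh:TripleRule ;
        sh:subject [ sh:path ex:feeds ] ;  # the node being fed
        sh:predicate ex:fedBy ;
        sh:object sh:this ;                # the focus node
    ] .
""", format="ttl")

# advanced=True enables SHACL rules; inplace=True writes the inferred
# triples back into the data graph.
pyshacl.validate(data, shacl_graph=shapes, advanced=True, inplace=True)
print(data.serialize())  # the output now includes: ex:b ex:fedBy ex:a .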

Model Parsing#

First, we’ll create a new, empty graph, then parse (load) an existing model into it using the Python RDFLib library.

from rdflib import Graph
# Create a Graph
model = Graph()
# Parse in an RDF file hosted on the Internet
model.parse("https://models.open223.info/guideline36-2021-4-1.ttl", format="ttl")
print(f"The model contains {len(model)} triples")
Turtle representation of the model (pre-inference)
print(model.serialize())

Testing the Model (Failed Query)#

Below, we run a simple query against our model asking what the damper in the terminal unit is connected to. The s223:connected relationship does not exist in the pre-inference model, so this query returns no results.

parts_query = """
PREFIX bldg: <urn:ex/>
PREFIX s223: <http://data.ashrae.org/standard223#>
SELECT ?parts WHERE {
    bldg:vav-cooling-only s223:contains ?damper .
    ?damper s223:connected ?parts
}"""

for row in model.query(parts_query):
    print('\t'.join(row))
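We can also confirm directly that the relationship is absent before inference. rdflib supports a triple membership test in which None acts as a wildcard for any term:

from rdflib import Namespace

S223 = Namespace("http://data.ashrae.org/standard223#")
# True only if at least one s223:connected triple exists in the graph.
print((None, S223["connected"], None) in model)  # False before inference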

Loading the 223 Ontology#

The 223 ontology contains the rules we will use for inference. We will load the 223 ontology into a separate graph from the model. This is mostly for maintenance: it is easy enough to merge two graphs together into one, but it is much harder to factor them out again. By keeping the ontology graph separate from the model graph, we can more easily maintain and version those graphs individually.

from rdflib import Graph
# Create a Graph
s223 = Graph()
# Parse in a recent copy of the 223 ontology
s223.parse("https://query.open223.info/ontologies/223p.ttl")
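If you do need to query across the model and the ontology at once, the two graphs can be combined non-destructively; a quick sketch:

# The + operator returns a new graph holding the union of both
# operands' triples, leaving 'model' and 's223' untouched.
combined = model + s223
print(f"model: {len(model)}, s223: {len(s223)}, combined: {len(combined)}")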

Applying Inference Rules#

To apply inference rules, we need an inference engine. This is a piece of software which knows how to properly interpret and apply the SHACL rules defined in an ontology. There are multiple closed-source and open-source implementations of SHACL available; some of these are listed on this page. Below, we will be using the open-source PySHACL library.

We import the PySHACL library and then invoke the validate function on our model graph (data graph in PySHACL parlance) and our 223 graph (shape graph in PySHACL parlance).

import pyshacl

# skolemizing the s223 graph lets us remove blank nodes after inference
skolemized_s223 = s223.skolemize()

pyshacl.validate(model,
    shacl_graph=skolemized_s223,     # pass in the 223 graph object here
    ont_graph=skolemized_s223,       # pass in the 223 graph object here
    allow_infos=True,     # don't fail if we get an INFO message
    allow_warnings=True,  # don't fail if we get a WARNING message
    abort_on_first=False, # allow errors to happen during execution
    advanced=True,        # allow SHACL rules to execute
    inplace=True          # update the 'model' object with the inferred triples
)
# remove the skolemized s223 graph from the model
model -= skolemized_s223
# de-skolemize the rest of the model
model = model.de_skolemize()
print(f"The model now contains {len(model)} triples")

This may take a few minutes to run, depending on the size of your model. If the PySHACL library is too slow, we recommend looking at alternative open-source implementations such as TopQuadrant’s Java-based SHACL engine.

We can see from the print statement that several triples have been added to the model.

Note

The pyshacl.validate function actually returns a SHACL validation report which can be used to fix the model and make it compatible with the 223 ontology. See the Model Validation tutorial for how to access and interpret this report.
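As a quick sketch, pyshacl.validate returns a (conforms, report_graph, report_text) tuple which can be unpacked directly:

# Re-run validation without inplace=True so 'model' is left untouched.
conforms, report_graph, report_text = pyshacl.validate(
    model,
    shacl_graph=skolemized_s223,
    ont_graph=skolemized_s223,
    allow_infos=True,
    allow_warnings=True,
    advanced=True,
)
print(f"Model conforms: {conforms}")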

Turtle representation of the model (post-inference)
print(model.serialize())

Using the New Model#

To demonstrate that the model contains new triples, we re-run the query from before. This time, the query returns results.

query = """
PREFIX bldg: <urn:ex/>
PREFIX s223: <http://data.ashrae.org/standard223#>
SELECT ?damper ?part WHERE {
    bldg:vav-cooling-only s223:contains ?damper .
    ?damper s223:connected ?part
}"""

for row in model.query(query):
    print(f"{row.get('damper')} connected to {row.get('part')}")