Model Inference#

Note

The purpose of this tutorial is to show how SHACL inference applied to a 223 model can make models easier to write and query. Specifically, this tutorial will teach the following:

  1. How to load SHACL inference rules defined in the 223 ontology into memory

  2. How to apply SHACL inference rules to a 223 model to add all “implied” triples

For this tutorial, we’ll use an existing equipment model of a variable air volume (VAV) terminal unit with cooling only from section 4.1 of ASHRAE Guideline 36-2021. This and other example models are available from Open223 Models.

What are SHACL Rules?#

SHACL rules add implied information to graphs when certain conditions are met, i.e., when certain triples exist in the source graph. The process of applying rules to an input model to generate new information (triples) is called inference. Inference makes models easier to write because the model author does not have to manually include all the triples necessary to support the desired queries; instead, some of those useful triples can be added “automatically” to the model through inferencing. One way to think of inference is as a way of normalizing a 223 model: inference ensures that the expected properties, types, and other annotations are present, so that consumers of the model can make assumptions about what information the graph will contain.

Model Parsing#

First, we’ll create a new empty graph and then parse (load) an existing model into it using the Python RDFLib library.

from rdflib import Graph
# Create a Graph
model = Graph()
# Parse in an RDF file hosted on the Internet
model.parse("https://models.open223.info/guideline36-2021-4-1.ttl", format="ttl")
print(f"The model contains {len(model)} triples")
Turtle representation of the model (pre-inference)
print(model.serialize())

Testing the Model (Failed Query)#

Below, we try to run a simple query on our model which asks what the damper in the terminal unit is connected to. The s223:connected relationship does not exist in the pre-inference model, so this query will not return results.

parts_query = """
PREFIX bldg: <urn:ex/>
PREFIX s223: <http://data.ashrae.org/standard223#>
SELECT ?parts WHERE {
    bldg:vav-cooling-only s223:contains ?damper .
    ?damper s223:connected ?parts
}"""

for row in model.query(parts_query):
    print('\t'.join(row))

Loading the 223 Ontology#

The 223 ontology contains the rules we will use for inference. We will load the 223 ontology into a separate graph from the model. This is mostly for maintenance: it is easy enough to merge two graphs together into one, but it is much harder to factor them out again. By keeping the ontology graph separate from the model graph, we can more easily maintain and version those graphs individually.

from rdflib import Graph
# Create a Graph
s223 = Graph()
# Parse in a recent copy of the 223 ontology
s223.parse("https://query.open223.info/ontologies/223p.ttl")

Applying Inference Rules#

To apply inference rules, we need an inference engine. This is a piece of software which knows how to properly interpret and apply the SHACL rules defined in an ontology. There are multiple closed-source and open-source implementations of SHACL available; some of these are listed on this page. Below, we will be using the open-source PySHACL library.

We import the PySHACL library and then invoke the validate function on our model graph (data graph in PySHACL parlance) and our 223 graph (shape graph in PySHACL parlance).

import pyshacl

# skolemizing the s223 graph lets us remove blank nodes after inference
skolemized_s223 = s223.skolemize()

pyshacl.validate(model,
    shacl_graph=skolemized_s223,     # pass in the 223 graph object here
    ont_graph=skolemized_s223,       # pass in the 223 graph object here
    allow_infos=True,     # don't fail if we get an INFO message
    allow_warnings=True,  # don't fail if we get a WARNING message
    abort_on_first=False, # allow errors to happen during execution
    advanced=True,        # allow SHACL rules to execute
    inplace=True          # update the 'model' object with the inferred triples
)
# remove the skolemized s223 graph from the model
model -= skolemized_s223
# de-skolemize the rest of the model
model = model.de_skolemize()
print(f"The model now contains {len(model)} triples")

This may take a few minutes to run, depending on the size of your model. If the PySHACL library is too slow, we recommend looking at alternate open-source implementations like TopQuadrant’s Java-based implementation.

We can see from the print statement that several triples have been added to the model.

Note

The pyshacl.validate function actually returns a SHACL validation report which can be used to fix the model and make it compatible with the 223 ontology. See the Model Validation tutorial for how to access and interpret this report.

Turtle representation of the model (post-inference)
print(model.serialize())

Using the New Model#

To demonstrate that the model contains new triples, we can try re-running the query from before. We can see that the query returns results this time.

query = """
PREFIX bldg: <urn:ex/>
PREFIX s223: <http://data.ashrae.org/standard223#>
SELECT ?damper ?part WHERE {
    bldg:vav-cooling-only s223:contains ?damper .
    ?damper s223:connected ?part
}"""

for row in model.query(query):
    print(f"{row.get('damper')} connected to {row.get('part')}")