Passage's Homepage

Backends: Storing RDF Data, creating Knowledge Graphs.

With Passage, we want to provide end-users with correct and complete results without having to modify the already stored data. This becomes crucial when the dataset comprises billions of triples/quads that could take days to ingest.

The same Passage Java engine supports any backend, granted the latter implements our provided interfaces.

Blazegraph

With our first release, we focused on Blazegraph [2]. Despite its terminated support, it presents a unique feature that we can take advantage of, to pause and continue query executions:

It uses an augmented balanced tree for each index, i.e., a counter is associated with each tree node that allows Passage to efficiently skip large parts of triple/quad patterns. Simple SPARQL queries like the following are efficiently computed:

The time complexity to create range query iterators is logarithmic compared to the number of elements matching the pattern.

SELECT * WHERE {         # A SELECT query or sub-query
  BIND (rdf:type AS ?p)  # Var(s) assignment(s)
  ?s ?p ?o               # One triple/quad pattern only
} OFFSET 10000000        # May need to skip a lot of elements

Which is equivalent to a single triple pattern whose predicate is bounded: ?s rdf:type ?o. This example shows that simple BIND are handled.

It is important since Passage extensively produces such simple subqueries as basis for more complex continuation queries [1].

Contributing

To improve adoption, one major aspect of continuation queries concerns the support for databases such as Apache Jena, RDF4J, or HDT.

Adding Support

Our aim is to support many databases with minimal effort, without modifying the data. To that end, one must implement the interface Backend. This interface requires three templates <ID, VALUE, SKIP> since most data structures rely on identifiers <ID> for storing the triples/quads but on their own <VALUE> to store their actual value (often in a separated dictionary).

Then, backends must provide a range query search that instantiate an iterator:

BackendIterator<ID, VALUE, SKIP> search(final ID s, final ID p, final ID o);

Among other, the iterators should provide a skipping mechanism to efficiently compute the simple OFFSET subqueries:

void skip(final SKIP to);

Although highly recommended, implementing a logarithmic skip is not mandatory if the call to skip is fast enough: One should only make sure that the query as enough time to progress over pause/resume cycles.


Testing

As per usual, the code should be tested. The requirements are not huge though. We mostly need backends to output values that they will be able to use as input later on.

References

Ⓒ 2017–2025 GDD Team, LS2N, University of Nantes