SPARQL, RDF Datasets, FROM, FROM NAMED, and GRAPH


Bob DuCharme suggested that I share this explanation about the role of FROM, FROM NAMED, and GRAPH within a SPARQL query. So here it is…


A SPARQL query goes against an RDF dataset. An RDF dataset has two parts:

  • A single default graph -- a set of triples with no name attached to them
  • Zero or more named graphs -- each named graph is a pair of a name and a set of triples

The FROM and FROM NAMED clauses are used to specify the RDF dataset.

The statement "FROM u" instructs the SPARQL processor to take the graph that it knows as "u", take all the triples from it, and add them to the single default graph. If the query also has "FROM v", the processor takes the triples from the graph known as "v" and adds them to the default graph as well.

The statement "FROM NAMED x" instructs the SPARQL processor to take the graph that it knows as "x", take all the triples from it, pair them with the name "x", and add that pair (x, triples from x) as a named graph in the RDF dataset.

Note that what "known as" means is purposefully left unspecified -- some implementations dereference the URI to get the triples that make up that graph; others just use a graph store that maps names to triples.
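
As a minimal sketch of how these clauses build the dataset, consider the query below. The three example.org graph IRIs are hypothetical names chosen for illustration; they do not refer to any real store.

```sparql
# Hypothetical graph IRIs, for illustration only.
SELECT ?s ?p ?o
FROM <http://example.org/u>        # triples of u are added to the default graph
FROM <http://example.org/v>        # triples of v are added to the same default graph
FROM NAMED <http://example.org/x>  # the pair (x, triples of x) becomes a named graph
WHERE {
  ?s ?p ?o .
}
```

Under these assumptions, the RDF dataset has a default graph holding the triples of u and v together, plus one named graph called x.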

All the parts of the query that are outside a GRAPH clause are matched against the single default graph.

All the parts of the query that are inside a GRAPH clause are matched individually against the named graphs.
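
Here is a sketch of a query that mixes both kinds of pattern. The graph IRIs are hypothetical; the foaf: vocabulary is used only as a familiar example.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Hypothetical graph IRIs, for illustration only.
SELECT ?person ?name ?g ?mbox
FROM <http://example.org/people>
FROM NAMED <http://example.org/contacts-a>
FROM NAMED <http://example.org/contacts-b>
WHERE {
  # Outside GRAPH: matched against the default graph (the triples of "people").
  ?person foaf:name ?name .

  # Inside GRAPH: matched against each named graph in turn.
  # ?g is bound to the name of the graph in which the match was found.
  GRAPH ?g {
    ?person foaf:mbox ?mbox .
  }
}
```

Writing GRAPH <http://example.org/contacts-a> { ... } instead of GRAPH ?g { ... } would restrict the inner pattern to that one named graph.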

This is why it sometimes makes sense to specify the same graph for both FROM and FROM NAMED:

FROM x
FROM NAMED x

...puts the triples from x in the default graph and also includes x as a named graph. That way, later in the query, triple patterns outside of a GRAPH clause can match triples from x, and so can triple patterns inside a GRAPH clause.
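
A sketch of such a query, with a hypothetical IRI standing in for x:

```sparql
# <http://example.org/x> is a hypothetical graph IRI, for illustration only.
SELECT ?s ?type ?g ?p ?o
FROM <http://example.org/x>
FROM NAMED <http://example.org/x>
WHERE {
  # Outside GRAPH: matches triples of x through the default graph.
  ?s a ?type .

  # Inside GRAPH: matches triples of x as a named graph.
  # Any binding of ?g here is <http://example.org/x>, the only named graph.
  GRAPH ?g {
    ?s ?p ?o .
  }
}
```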

There's a visual picture of this on slide 13 of my SPARQL Cheat Sheet slides.

2 Comments

But wouldn't this lead to zero results in

SELECT *
FROM NAMED x
WHERE {
?s ?p ?o .
}

even if x had triples? This behaviour would seem odd(ly designed) to me, tbh.

[ That understanding is correct. ?s ?p ?o in the above query matches against the default graph, which is empty here because the query has a FROM NAMED clause but no FROM clause. --Lee ]
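
For comparison, here is a sketch of a variant of the query above that does reach the triples of x, again with a hypothetical IRI standing in for x:

```sparql
# <http://example.org/x> is a hypothetical graph IRI, for illustration only.
SELECT ?g ?s ?p ?o
FROM NAMED <http://example.org/x>
WHERE {
  # The pattern now sits inside GRAPH, so it is matched against the named graph x.
  GRAPH ?g {
    ?s ?p ?o .
  }
}
```

Alternatively, keeping the original WHERE clause and changing FROM NAMED to FROM would place the triples of x in the default graph, where the bare ?s ?p ?o pattern can match them.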

(FWIW, I still haven't come across a real-world use case that required the distinction between FROM and FROM NAMED, and that would justify the confusion and complexity caused by having both language features.)

[ We use it all the time in our Anzo software, though the question of what justifies an unknown level of confusion and complexity is a topic so fraught with subjectivity that I would not choose to touch it. We use the default graph (FROM) to query large swaths of our named graph store when we have no way or desire to know where the boundaries between graphs are. We use named graphs (FROM NAMED) when we know the storage of our data and wish to carefully scope the queries being asked (e.g. to the graphs that comprise an application's user-data storage). --Lee ]

But your explanation is very clear. Thank you very much for it. I think this is the first time I understand the supposed difference between the two.

Just me once again ;)

The slide 13 link is awesome! Thanks for including it. Definitely clarifying.

Are there examples of best practices when you'd use FROM vs. FROM NAMED? Are there general performance implications? Curious.

[ There are lots of different usage scenarios for them, and I'd imagine that performance differences depend very much on the nature of the underlying store. For one thing, multiple FROM clauses require an RDF merge of graphs (a union in which no blank nodes are shared between graphs), which could potentially be expensive. GRAPH clauses within queries constrain results to a single graph at a time, which is less expressive but potentially faster to execute. Still, that's all speculative. --Lee ]