SPARQL Calendar Demo: Growing Our RDF Dataset

This is the sixth in a series of entries about the SPARQL calendar demo. If you haven't already, you can read the previous entry.

When the "discover more people" link is clicked, the calendar demo uses this SPARQL query to expand its dataset before rerunning the search for people:

PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX ical: <http://www.w3.org/2002/12/cal/icaltzd#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
SELECT DISTINCT ?url ?named_url 
WHERE {
  {
    _:x rdfs:seeAlso ?named_url .
    ?named_url rdf:type ical:Vcalendar
  } UNION {
    ?someone foaf:knows ?known .
    OPTIONAL {
      ?known rdfs:seeAlso ?url
    } .
  }
}

This query asks:

Give me all URIs in the current dataset for the FOAF files of people known by people in the dataset; also give me all sources of additional-information that are typed as calendars.
In doing so, it takes advantage of the two breadcrumbs protocols I wrote about previously. What can we learn from this query?

  • How does the UNION combine two query patterns that don't share variables? The SQL relational union operator requires that the two resultsets being unioned together share the same columns. SPARQL, on the other hand (in the RDF spirit of ragged datasets), allows differently shaped sets of bindings to be unioned together; variables not appearing in one operand of the union are simply unbound in the solutions contributed by that operand. Thus, this is effectively a way to combine two simple queries in one. We end up with a resultset in which some rows only contain bindings for ?url and some only bind ?named_url.
  • What's up with the calendar gunk? A key feature of the calendar demo is connecting people with their calendar events in RDF (and hence in SPARQL). Doing this required choosing an approach to the event discovery problem. There's no widely accepted predicate that links a foaf:Person to a resource of rdf:type ical:Vcalendar. Dan Connolly, for example, uses foaf:currentProject to link his own foaf:Person with the events in his calendar. With RDF calendar work still in development, many people store their calendar data in iCal format, devoid of any links with their RDF FOAF data.

    Because we were also demonstrating the ability to wrap .ics files as RDF, we decided to adopt a convention that treated calendars as documents. Following the lead of the breadcrumbs protocol for discovering new FOAF files, we chose to use rdfs:seeAlso to relate a foaf:Person to a URI at which events can be found that belong to that person's calendar. To add a bit more semantics to this convention, we also required that the URI object of rdfs:seeAlso be explicitly typed as an ical:Vcalendar. When we find such a URI, we add it to our dataset as a named URL.

  • Why do we distinguish between ?url and ?named_url? The convention described above relies on semantics implicit in where calendar-event triples are found. That is, we associate events with a person based upon the document (named graph) in which we find the calendar-event triples. To query across this link successfully, then, we need to be able to model this link in our queries using the SPARQL GRAPH keyword, and doing this requires that we include the calendar graphs as named graphs in our dataset. In short, the calendar graphs are included as named graphs because our conventions impose semantics on the source of calendar-event triples.
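
The UNION behavior and the ?url / ?named_url split described in the list above can be sketched in a few lines of Python. This is only an illustration, not the demo's actual code: solution rows are modeled as plain dictionaries, the example.org URLs are made up, and merging ?url values into the default graph (while loading ?named_url values as named graphs) is my assumption about how the results are consumed.

```python
# Hypothetical solution rows, as each UNION branch might produce them.
# The first branch binds only ?named_url; the second binds only ?url,
# or nothing at all when the OPTIONAL finds no rdfs:seeAlso.
calendar_branch = [
    {"named_url": "http://example.org/alice/calendar.ics"},
]
foaf_branch = [
    {"url": "http://example.org/bob/foaf.rdf"},
    {},  # ?known had no rdfs:seeAlso, so ?url stays unbound
    {"url": "http://example.org/bob/foaf.rdf"},  # duplicate row
]


def union_distinct(*branches):
    """Concatenate solution sequences and drop duplicate rows,
    roughly what SELECT DISTINCT over a UNION does."""
    seen = set()
    result = []
    for branch in branches:
        for row in branch:
            key = frozenset(row.items())
            if key not in seen:
                seen.add(key)
                result.append(row)
    return result


def split_sources(rows):
    """Collect bound ?url values (assumed default-graph sources) and
    bound ?named_url values (named-graph sources)."""
    default_sources = [row["url"] for row in rows if "url" in row]
    named_sources = [row["named_url"] for row in rows if "named_url" in row]
    return default_sources, named_sources


rows = union_distinct(calendar_branch, foaf_branch)
default_sources, named_sources = split_sources(rows)
```

Note that no row ever binds both variables, and that the OPTIONAL can contribute a row that binds neither; a consumer of the results simply skips whatever is unbound.
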
While this convention that we adopted for finding calendars has been successful in the context of this demo, it has several drawbacks. First, it is not widely used (or used at all!), which requires that people massage their data into this format before taking advantage of the calendar demo with their own data.

Second, I feel that it is semantically dodgy. I'd much prefer the cleaner semantics of a chain of triples such as:

lee:LDF ex:calendar lee:LDFcalendar .
lee:LDFcalendar ical:component _:c  .
_:c ical:Vevent _:ev1 .
_:ev1 ...
Of course, as with any other semantic data, these triples can be published in multiple documents and distributed throughout the (semantic or world wide) web. As long as the dataset contains all of these triples, we could query calendar events without relying on the source-document semantics imposed by our use of the GRAPH keyword. Does anything exist that could take the place of the ex:calendar predicate? Are people using this construct anywhere? Of course, this would make it difficult, if not impossible, to point to an iCal file from within your FOAF data.
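
To make the difference concrete, here is a rough Python sketch of following that chain using nothing but the triples themselves, with no reference to which document each triple came from. The data is hypothetical: ex:calendar is the placeholder predicate from the chain above, and the second event and the lee:Other resource are invented for illustration.

```python
# Hypothetical triples shaped like the chain sketched above. They could have
# been gathered from any number of documents; only their content matters here.
triples = {
    ("lee:LDF", "ex:calendar", "lee:LDFcalendar"),
    ("lee:LDFcalendar", "ical:component", "_:c"),
    ("_:c", "ical:Vevent", "_:ev1"),
    ("_:c", "ical:Vevent", "_:ev2"),
    ("lee:Other", "foaf:name", "Someone Else"),
}


def objects(subject, predicate):
    """Return, in sorted order, every object o with (subject, predicate, o) in the data."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def events_for(person):
    """Follow person -> calendar -> component -> event through the triples alone."""
    events = []
    for calendar in objects(person, "ex:calendar"):
        for component in objects(calendar, "ical:component"):
            events.extend(objects(component, "ical:Vevent"))
    return events


lee_events = events_for("lee:LDF")
```

The equivalent SPARQL query would be a plain basic graph pattern over the same three predicates, with no GRAPH clause needed.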

Finally, I think that our convention may be conflating the graph URL with the contents at the graph URL. Our convention effectively says, "See this URI, u, for more information. Oh yeah, u is a calendar, by the way." But that's not really the case. The URI is a graph that contains a calendar for the person in question, so we'd really want something like u foaf:primaryTopic ical:Vcalendar. I'm not convinced that that's particularly accurate, either. Again, none of this is new, but I'm a firm believer that it never hurts to reiterate how easy it is to model the world incorrectly. These things are important to get right, and therefore important to write about when you think you've gotten them wrong!