TechnicaLee Speaking: April 2006 Archives


April 26, 2006

SPARQL Calendar Demo: Retrieving Calendar Events

This is the seventh in a series of entries about the SPARQL calendar demo. If you haven't already, you can read the previous entry.

After one or more people have been discovered, the calendar demo allows us to select one or more people and click the refresh link in the Calendars section of the righthand panel in order to retrieve calendar events for the selected person(s). Clicking refresh with the Alice demo user selected runs this SPARQL query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX dc: <http://purl.org/dc/elements/1.1/> 
PREFIX ical: <http://www.w3.org/2002/12/cal/icaltzd#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
SELECT DISTINCT ?title ?start ?end ?name ?location ?g 
WHERE {
  {
    {
      <http://thefigtrees.net/lee/sw/demos/calendar/data/alice#alice> rdfs:seeAlso ?g .
       ?g rdf:type ical:Vcalendar .
       OPTIONAL {
        <http://thefigtrees.net/lee/sw/demos/calendar/data/alice#alice> foaf:name ?name 
      } .
       OPTIONAL {
        <http://thefigtrees.net/lee/sw/demos/calendar/data/alice#alice> rdfs:label ?name 
      } .
    }
  } .
  GRAPH ?g_wrapped {
    _:ev ical:summary ?title ; 
            ical:dtstart ?start ; 
            ical:dtend ?end ; 
            ical:location ?location .
    FILTER (regex(str(?start), '2006-04.*') ||
            regex(str(?end), '2006-04.*')).
  }.
  FILTER regex(str(?g_wrapped), str(?g)) .
}

In English—and amidst some devilish hackery—this query says:

Fetch Alice's name. Also, for every event in the graph containing her calendar entries, fetch the event's title, start time, end time, and location.

At first glance, we see that this query finds a person's calendar events using the same breadcrumbs protocol we've discussed previously. But as with the other SPARQL queries we've looked at so far, there are several other points worthy of note and discussion in this query:

How does this query deal with multiple people? With this query, we are retrieving individual people's calendar events. Because one person's calendar, in this context, is independent of another's, we can fetch multiple people's calendars simultaneously by using the SPARQL UNION keyword to retrieve different rows (events) for different people.
Why does this query retrieve the name of the person, when we've already seen a query that retrieves names? Much of what is done in the SPARQL calendar demo could be enhanced by saving client-side state. When Elias and I developed the demo, we made a point of avoiding state as often as we could, in favor of using SPARQL queries to do as much heavy-lifting as possible. Since we wish to display a person's name alongside their calendar in the righthand panel, we retrieve that information along with the calendar events in this query.
What's the purpose of the filter expressions on ?start and ?end? There's no need to retrieve any calendar events which we can't render on the current calendar control on the main part of the webpage. To limit the events returned, we require that either the start date or the end date of each event match a regular expression which specifies the current month shown on the main calendar. (We know that the dates will conform to this format given that the RDF Calendar note specifies that the range of these properties is the XML Schema dateTime type.) This filter expression uses the SPARQL regex filter function to perform the matching.
What's the difference between ?g and ?g_wrapped? How are they related? Very little published calendar data on the web is represented in RDF; much is published in iCalendar format. Thanks to DanC's Python wizardry and Elias's hosting, though, we have URLs that resolve to iCalendar data represented as RDF. So, we can include a URL like http://torrez.us/services/ics2rdf/?ical=http://example/myCalendar.ics as a named graph in the RDF dataset of our query, and then match events within that calendar by sticking a pattern inside the SPARQL GRAPH keyword. But it's unrealistic and semantically incorrect for someone to publish these triples
```
  <http://example/me> a foaf:Person .
  <http://example/me> rdfs:seeAlso <http://torrez.us/services/ics2rdf/?ical=http://example/myCalendar.ics> .
  <http://torrez.us/services/ics2rdf/?ical=http://example/myCalendar.ics> rdf:type ical:Vcalendar .
```
rather than
```
  <http://example/me> a foaf:Person .
  <http://example/me> rdfs:seeAlso <http://example/myCalendar.ics> .
  <http://example/myCalendar.ics> rdf:type ical:Vcalendar .
```
If in our query, then, we used the same variable for both the object of rdfs:seeAlso and also for the graph name in the GRAPH clause, we wouldn't get back any rows, because the former's value is <http://example/myCalendar.ics> while the latter's value is <http://torrez.us/services/ics2rdf/?ical=http://example/myCalendar.ics>. So rather than sharing a variable, we use two different variables and use the built-in regex function as a poor man's (and unreliable) String.subStringOf substitute.
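To see why the regex trick is unreliable, here is a minimal JavaScript sketch. JavaScript regular expressions are not the same dialect that SPARQL's regex function uses, but both treat characters such as . and ? as metacharacters, which is the point being illustrated. The helper names and the myCalendarXics URL are hypothetical, made up for this example.

```javascript
// Hypothetical helper: uses the plain URL as a regex pattern, mirroring
// FILTER regex(str(?g_wrapped), str(?g)) in the query above.
function matchesAsRegex(wrapped, plain) {
  return new RegExp(plain).test(wrapped);
}

// Hypothetical helper: a literal substring test, which is what we really meant.
function containsLiteral(wrapped, plain) {
  return wrapped.indexOf(plain) !== -1;
}

var plain = "http://example/myCalendar.ics";
var wrapped = "http://torrez.us/services/ics2rdf/?ical=http://example/myCalendar.ics";
// A different (made-up) calendar whose URL differs only where the pattern has a ".".
var other = "http://torrez.us/services/ics2rdf/?ical=http://example/myCalendarXics";

console.log(matchesAsRegex(wrapped, plain));  // true, as intended
console.log(matchesAsRegex(other, plain));    // true, a false positive
console.log(containsLiteral(other, plain));   // false
```

The false positive comes from the unescaped "." in the pattern matching any character. A calendar URL with its own query string would fare worse, since "?" would also be read as a regex operator.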

This is an ugly hack. Anyone reading this should be outraged. I was, but sometimes deadlines beckon. This is an ugly hack not least of all because all of those ics2rdf URLs use invalid URL syntax (the reserved characters in the ical query parameter should be URL escaped). A possible solution would be to extend our calendar convention to require that a person's FOAF data contain a triple along the lines of: <http://example/myCalendar.ics> ex:asRDF <http://torrez.us/services/ics2rdf/?ical=http%3A//example/myCalendar.ics>. That's ugly, also, but perhaps a bit better. When we look at the queries that deal with people's interests and make use of data from upcoming.org, we'll see that this is a more general problem: when we access non-RDF data as RDF, how do we semantically associate the (different) URLs of (different) representations of the same data in a manner which is relatable within a SPARQL query? Elias wrote about this dilemma and solicited opinions ranging from handling this at an application level (but how do we do that with the current SPARQL Protocol over the web?) to extending the expressibility of SPARQL FROM NAMED clauses.
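As a sketch of the escaping that the ics2rdf URLs are missing, JavaScript's built-in encodeURIComponent escapes the reserved characters in a query parameter value. The wrapIcs function name is hypothetical; the service URL is the one quoted above.

```javascript
// Hypothetical helper: wraps an iCalendar URL in the ics2rdf service URL,
// escaping the parameter value so the result is a valid URL.
function wrapIcs(icsUrl) {
  return "http://torrez.us/services/ics2rdf/?ical=" + encodeURIComponent(icsUrl);
}

console.log(wrapIcs("http://example/myCalendar.ics"));
// http://torrez.us/services/ics2rdf/?ical=http%3A%2F%2Fexample%2FmyCalendar.ics
```

Note that encodeURIComponent escapes the slashes as well as the colon, so its output is stricter than the hand-escaped URL in the ex:asRDF triple.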

It's not always pretty when one takes a peek behind the curtain. As I've said before, these aren't new issues, but as far as I can tell they're still unsolved issues, and as such deserve whatever attention we can give to them.

Posted by Lee Feigenbaum at 2:14 AM | Permalink

April 17, 2006

AJAX callback function signatures in prototype

In the wake of using the Yahoo! User Interface Library to wrap AJAX¹ requests for the SPARQL calendar demo, I decided to try out the Prototype JavaScript framework for my latest round of (unrelated) web hacking.

Very early on, I was struck by an incongruency between the callback signatures of the onSuccess, onFailure, and on### events and those of the onLoading, onLoaded, onComplete, etc. events. Event handlers for the former collection of events—which only occur after a request is completed—receive two arguments: the XMLHttpRequest object itself, and, if applicable, a JavaScript object formed by parsing the JSON serialized value of any X-JSON response header.

On the other hand, event handlers for the latter set of events (events that correspond to ready-state changes), receive the above two arguments and in addition receive the Ajax.Request wrapper object that was used in the creation of the AJAX request. In turn, this object gives access to the options dictionary, which can contain arbitrary bits of state useful in routing callbacks to their eventual final destination(s).

Why don't the other callbacks contain the Ajax.Request objects? Got me. Maybe for compatibility with some other (unknown) API? In any case, I hacked around this in my application by duplicating the functionality from prototype.js that translates generic onComplete events into specific onSuccess and onFailure events:

  ...
  onComplete: function(request, xhr, xjson) {
    if (request.responseIsSuccess())
      this._onSuccess(request, xhr, xjson);
    else
      this._onFailure(request, xhr, xjson);
  },
  ...

_onSuccess and _onFailure act just as regular ol' onSuccess and onFailure would, except that they now have access to the original Ajax.Request object. Good enough for me, but curious nonetheless.
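For anyone who wants to poke at this routing without loading Prototype, here is a self-contained sketch. The fakeRequest stand-in, the tag option, and the log array are all hypothetical; only responseIsSuccess and the options dictionary come from the description above.

```javascript
// Hypothetical handler set: onComplete re-dispatches to _onSuccess/_onFailure,
// passing along the request wrapper so its options remain reachable.
function makeHandlers(log) {
  return {
    onComplete: function (request, xhr, xjson) {
      if (request.responseIsSuccess())
        this._onSuccess(request, xhr, xjson);
      else
        this._onFailure(request, xhr, xjson);
    },
    _onSuccess: function (request) { log.push("success:" + request.options.tag); },
    _onFailure: function (request) { log.push("failure:" + request.options.tag); }
  };
}

// Hypothetical stand-in for an Ajax.Request wrapper; it is not Prototype's object.
function fakeRequest(status, tag) {
  return {
    options: { tag: tag },
    responseIsSuccess: function () { return status >= 200 && status < 300; }
  };
}

var log = [];
var handlers = makeHandlers(log);
handlers.onComplete(fakeRequest(200, "a"), null, null);
handlers.onComplete(fakeRequest(500, "b"), null, null);
console.log(log);  // [ 'success:a', 'failure:b' ]
```

The sketch relies on onComplete being invoked as a method of the object that also holds _onSuccess and _onFailure, so that this resolves to it.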

¹ AJAX sure has used its snazzy-name status to propel a not-so-novel idea to stratospheric levels of buzzwordiness. The lack of a consistent capitalization for it (AJAX vs. Ajax) bugs the heck out of me though. Of course, it really should be AJaX, but I doubt that will ever catch on. Alas.

Posted by Lee Feigenbaum at 12:39 AM | Permalink

April 15, 2006

Google Calendar + SPARQL = Baseball??

Earlier today, Elias showed me the demo he had whipped up, which shows one way of leveraging SPARQL as a query interface to Google Calendar data. As he wrote, when he showed me the demo he asked me to think about some interesting queries we could do with it. Here's the first one I came up with:

Navigate to the demo
On the Search tab, enter mets and click Go
In the results table, select 2006 Mets Schedule and click Add to Calendars
On the Calendars tab, ensure that only the 2006 Mets Schedule calendar is selected

On the Query tab, enter this query:

SELECT ?when ?matchup ?broadcast 
WHERE {
  GRAPH ?g {
    _:game ical:dtstart ?when ; 
                 ical:summary ?matchup; 
                 ical:description ?broadcast.
     FILTER (?broadcast != "")
  }
}

Click Get Results

The calendar in question is set up to use the description field to list information on Mets games that will be broadcast on national TV this season. Thus, the filter ensures that the query returns only those Mets games which can be viewed nationwide this year, particularly useful for displaced fans like myself!¹ Pretty cool, eh?

For the record, I do feel a minor case of the willies that I'm using generic calendar predicates to retrieve semantic data on baseball-game schedules and broadcasting. But, hey, we've got to start somewhere. Let's go Mets, and let's go SPARQL!

¹ (Of course, in reality, I subscribe to MLB.TV and miss nary a minute of the season, heavens forbid!)

Posted by Lee Feigenbaum at 12:15 AM | Permalink

April 14, 2006

SPARQL Calendar Demo: Growing Our RDF Dataset

This is the sixth in a series of entries about the SPARQL calendar demo. If you haven't already, you can read the previous entry.

When the discover more people link is clicked, the calendar demo uses this SPARQL query to expand its dataset before rerunning the search for people:

PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX ical: <http://www.w3.org/2002/12/cal/icaltzd#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
SELECT DISTINCT ?url ?named_url 
WHERE {
  {
     _:x rdfs:seeAlso ?named_url .
     ?named_url rdf:type ical:Vcalendar 
  } UNION {
    ?someone foaf:knows ?known .
     OPTIONAL {
      ?known rdfs:seeAlso ?url 
    } .
     
  } 
}

This query asks:

Give me all URIs in the current dataset for the FOAF files of people known by people in the dataset; also give me all sources of additional-information that are typed as calendars.

In doing so, it takes advantage of the two breadcrumbs protocols I wrote about previously. What can we learn from this query?

How does the UNION combine two query patterns that don't share variables? The SQL relational union operator requires that the two resultsets being unioned together share the same columns. SPARQL, on the other hand (in the RDF spirit of ragged datasets), allows for differently shaped sets of bindings to be unioned together; variables not appearing in one operand of the union are simply unbound in the solutions contributed by that operand. Thus, this is effectively a way to combine two simple queries in one. We end up with a resultset which has some rows that only contain bindings for ?url and some that only bind ?named_url.
What's up with the calendar gunk? A key feature of the calendar demo is connecting people with their calendar events in RDF (and hence in SPARQL). To do this required choosing an approach to the event discovery problem. There's no widely accepted predicate that links a foaf:Person to a resource of rdf:type ical:Vcalendar. Dan Connolly, for example, uses foaf:currentProject to link his own foaf:Person with the events in his calendar. With RDF calendar work still in development, many people store their calendar data in iCal format, devoid of any links with their RDF FOAF data.
Because we were also demonstrating the ability to wrap .ics files as RDF, we decided to adopt a convention that treated calendars as documents. Following the lead of the breadcrumbs protocol for discovering new FOAF files, we chose to use rdfs:seeAlso to relate a foaf:Person to a URI at which events can be found that belong to that person's calendar. To add a bit more semantics to this convention, we also required that the URI object of rdfs:seeAlso be explicitly typed as an ical:Vcalendar. When we find such a URI, we add it to our dataset as a named URL.
Why do we distinguish between ?url and ?named_url? The convention described above relies on semantics implicit in where calendar-event triples are found. That is, we associate events with a person based upon the document (named graph) in which we find the calendar-event triples. To query across this link successfully, then, we need to be able to model this link in our queries using the SPARQL GRAPH keyword, and to do this requires that we include the calendar graphs as named graphs in our dataset. In short, the calendar graphs are included as named graphs because our conventions impose semantics on the source of calendar-events triples.
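To make the ragged result set concrete, here is a small JavaScript sketch that splits the rows into graphs to add to the dataset. It assumes the results arrive in the standard SPARQL JSON layout (head.vars plus results.bindings); the graphsToAdd name and the example.org-style URLs are hypothetical, and treating ?url values as default graphs is my shorthand for "FOAF files to merge in", not a claim about the demo's exact code.

```javascript
// Hypothetical helper: sorts each row's binding into the matching bucket.
// Rows may bind ?url, ?named_url, or neither (a known person with no seeAlso).
function graphsToAdd(json) {
  var result = { defaultGraphs: [], namedGraphs: [] };
  json.results.bindings.forEach(function (row) {
    if (row.url) result.defaultGraphs.push(row.url.value);
    if (row.named_url) result.namedGraphs.push(row.named_url.value);
  });
  return result;
}

// Made-up sample data in the SPARQL JSON results layout.
var sample = {
  head: { vars: ["url", "named_url"] },
  results: { bindings: [
    { url: { type: "uri", value: "http://example/bob/foaf.rdf" } },
    { named_url: { type: "uri", value: "http://example/alice/calendar" } },
    {}
  ] }
};

var graphs = graphsToAdd(sample);
console.log(graphs.defaultGraphs);  // [ 'http://example/bob/foaf.rdf' ]
console.log(graphs.namedGraphs);    // [ 'http://example/alice/calendar' ]
```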

While this convention that we adopted for finding calendars has been successful in the context of this demo, it has several drawbacks. First, it is not widely used (or used at all!), which requires that people massage their data into this format before taking advantage of the calendar demo with their own data.

Second, I feel that it is semantically dodgy. I'd much prefer the cleaner semantics of a chain of triples such as:

lee:LDF ex:calendar lee:LDFcalendar .
lee:LDFcalendar ical:component _:c  .
_:c ical:Vevent _:ev1 .
_:ev1 ...

Of course, as with any other semantic data, these triples can be published in multiple documents and distributed throughout the (semantic or world wide) web. As long as the dataset contains all of these triples, we could query calendar events without relying on the source-document semantics imposed by our use of the GRAPH keyword. Does anything exist that could take the place of the ex:calendar predicate? Are people using this construct anywhere? Of course, this would make it difficult, if not impossible, to point to an iCal file from within your FOAF data.

Finally, I think that our convention may be conflating the graph URL with the contents at the graph URL. Our convention effectively says, "See this URI, u, for more information. Oh yeah, u is a calendar, by the way." But that's not really the case. The URI is a graph that contains a calendar for the person in question, so we'd really want something like u foaf:primaryTopic ical:Vcalendar. I'm not convinced that that's particularly accurate, either. Again, none of this is new, but I'm a firm believer that it never hurts to reiterate how easy it is to model the world incorrectly. These things are important to get right, and therefore important to write about when you think you've gotten them wrong!

Posted by Lee Feigenbaum at 5:26 PM | Permalink

April 12, 2006

SPARQL Calendar Demo: Using SPARQL to Find, Identify, and Name People

This is the fifth in a series of entries about the SPARQL calendar demo. If you haven't already, you can read the previous entry.

This entry is the first of a few entries that will examine the specific SPARQL queries used in the calendar demo. While SPARQL bears surface resemblances to SQL, querying an RDF graph is a distinct approach from querying a relational data store, and there are several idioms and subtleties that are unique to the SPARQL language. (None of these ideas are new, of course! But as SPARQL has just moved to Candidate Recommendation status, I thought it might be useful to throw some real SPARQL queries out into the wild.)

This query is issued against the current dataset every time a new URI is added to the dataset (either manually or via the discover more people link):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ical: <http://www.w3.org/2002/12/cal/icaltzd#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?who ?name ?id ?cal 
WHERE {
  ?who rdf:type foaf:Person .
  OPTIONAL { ?who foaf:name ?name }
  OPTIONAL { ?who rdfs:label ?name }
  OPTIONAL {
    { ?who foaf:mbox ?id } 
      UNION 
    { ?who foaf:mbox_sha1sum ?id } 
  }
  OPTIONAL {
    ?who rdfs:seeAlso ?cal .
     ?cal rdf:type ical:Vcalendar 
  }
} ORDER BY ?name

In English, this query asks:

Show me all people along with their names (if found), unique IDs (if found), and calendar URLs (if found) in my current RDF dataset.

There are a few interesting observations that we can take away from this query:

Why all the OPTIONALs? We want to build as exhaustive a list of people as we can given our current dataset. When people reference their friends in their FOAF files, the amount of information that they include about them ranges from a URI-only to an IFP-only to a full suite of URI, name, and IFP information. Because we do not know the shape of the data we are querying, we take advantage of the SPARQL OPTIONAL keyword which allows us to include triple patterns which are allowed to not match the data being queried. That is, OPTIONAL ensures that if a person has a name but not an id (an IFP) that we'll receive the name and vice versa; the query will return all the information it can find without failing due to shaggy data.
Why are there two different OPTIONAL blocks that can bind the ?name variable? This idiom takes advantage of the fact that the OPTIONAL keyword is left-associative to express an ordered preference between predicates within our SPARQL query¹. That is:
```
  OPTIONAL { ?who foaf:name ?name }
  OPTIONAL { ?who rdfs:label ?name }
```
can be read as (given that ?who is already bound by the first (non-optional) triple pattern in the query):
Bind ?name to the object of either the foaf:name or rdfs:label predicates; but if both such bindings exist, we prefer the object of foaf:name.
It's a very useful idiom for sure, especially in the absence of a rules-enabled datastore that could map one predicate to another in the absence of a triple with a more-desirable predicate.
Why don't we use the same trick for finding bindings to ?id? This SPARQL query uses the ?id variable to bind to the values of inverse-functional properties (?ifp would likely have been a better name for the variable). Each such property uniquely identifies a person, and the calendar demo uses them to smush together seemingly distinct foaf:Person URIs or bnodes that actually refer to the same person. Because of this, we want to learn about as many IDs as we can and therefore we use the SPARQL UNION keyword to disjunctively include all possible bindings for ?id. (Of course, we wrap the UNION in an OPTIONAL because we want the query pattern to match a person even if no IFPs are found for that person.)
What's that oddness with the calendar gunk in the query? And why is that in this query? OK, you got me there. This bit of functionality doesn't belong here, and in fact is duplicated in the SPARQL query which mines the current RDF dataset to discover new default and named graphs to add to the dataset. I'll discuss that query next time, and explain what this bit of SPARQL is saying. Until then, happy SPARQLing...

¹ The nitty gritty: SPARQL defines A OPTIONAL B OPTIONAL C as (A OPTIONAL B) OPTIONAL C. In the case in question, A is our required triple pattern which binds ?who to the resource or bnode representing a foaf:Person. As per the definition of OPTIONAL then, the parenthesized portion of (A OPTIONAL B) OPTIONAL C will match successfully no matter what (since we're assuming A has already matched a foaf:Person), but will include bindings for B (that is, bindings of ?name to the object of foaf:name) if they exist. In either case, we then examine C. If B matched, then C can only match if it shares the same binding for ?name, so any other value as the object of rdfs:label gets ignored. If B failed to match then ?name remains unbound, and any object of rdfs:label will be bound to ?name. Voila—we have the desired behavior of expressing an ordered preference.
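The left-associative behavior in the footnote can be simulated in a few lines of JavaScript. This is only a sketch of the left-join idea over in-memory solutions, not a SPARQL engine; the function names and the sample people are hypothetical.

```javascript
// A solution is a plain object mapping variable names to values.
// Two solutions are compatible if every variable they share has the same value.
function compatible(a, b) {
  for (var k in b) {
    if (a.hasOwnProperty(k) && a[k] !== b[k]) return false;
  }
  return true;
}

function merge(a, b) {
  var out = {}, k;
  for (k in a) out[k] = a[k];
  for (k in b) out[k] = b[k];
  return out;
}

// OPTIONAL as a left join: extend each left solution with every compatible
// right solution, or keep it unchanged when none is compatible.
function leftJoin(left, right) {
  var out = [];
  left.forEach(function (l) {
    var extended = right.filter(function (r) { return compatible(l, r); })
                        .map(function (r) { return merge(l, r); });
    if (extended.length) out = out.concat(extended); else out.push(l);
  });
  return out;
}

var A = [{ who: "alice" }, { who: "bob" }, { who: "carol" }];     // ?who a foaf:Person
var B = [{ who: "alice", name: "Alice Smith" }];                  // foaf:name matches
var C = [{ who: "alice", name: "Ali" }, { who: "bob", name: "Bobby" }];  // rdfs:label matches

// (A OPTIONAL B) OPTIONAL C
var rows = leftJoin(leftJoin(A, B), C);
console.log(rows);
// alice keeps "Alice Smith" (foaf:name wins), bob gets "Bobby", carol has no name
```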

Posted by Lee Feigenbaum at 2:11 AM | Permalink

April 5, 2006

SPARQL Calendar Demo: A SPARQL JavaScript Library

This is the fourth in a series of entries about the SPARQL calendar demo. If you haven't already, you can read the previous entry.

A key component of the calendar demo is our SPARQL JavaScript library. Leigh Dodds blogged about his SPARQL AJAX client a few months back. As one of our motivations for the calendar demo was to explore the JSON serialization of SPARQL query results, though, we whipped up our own library for SPARQL queries. This library features:

...issuing SPARQL SELECT or ASK queries using the SPARQL Protocol for RDF extended with a parameter named output. Joseki as deployed on SPARQLer currently supports:

No output specified; results are returned in the SPARQL Query Results XML Format with a MIME type of application/sparql-results+xml.
output=xml or output=sparql; results are returned in the SPARQL Query Results XML Format with a MIME type of text/plain.
output=json; results are returned via the JSON serialization with a MIME type of text/javascript.
output=any-other-value; results are returned in RDF/XML with MIME type text/plain as a graph using the DAWG's result-set vocabulary for test cases.

...automatically validating and parsing JSON return values into JavaScript objects.
...providing several query wrapper methods and accompanying result transformations to enable direct access to single-valued query results, vectors of query results, and boolean results (for ASK queries). This mechanism could be easily extended to support parsing the XML result format.
...allowing either HTTP GET or HTTP POST to be used when sending queries.
...providing distinct service and query objects such that dataset graphs, prefixes, and other settings can be set service-wide or on a per-query basis.

The library currently has an unmotivated dependency on the Yahoo! connection manager, but this dependency could (and likely will) be easily removed.
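For a rough idea of what a GET request to such a service looks like, here is a sketch that assembles the request URL. It is not the library's code: buildQueryUrl and its options object are hypothetical. The query, default-graph-uri, and named-graph-uri parameter names are those of the SPARQL Protocol, and output is the extension parameter described above.

```javascript
// Hypothetical helper: builds a SPARQL Protocol GET URL with escaped parameters.
function buildQueryUrl(endpoint, query, options) {
  var params = ["query=" + encodeURIComponent(query)];
  (options.defaultGraphs || []).forEach(function (g) {
    params.push("default-graph-uri=" + encodeURIComponent(g));
  });
  (options.namedGraphs || []).forEach(function (g) {
    params.push("named-graph-uri=" + encodeURIComponent(g));
  });
  if (options.output) params.push("output=" + encodeURIComponent(options.output));
  return endpoint + "?" + params.join("&");
}

var requestUrl = buildQueryUrl("http://sparql.org/sparql", "ASK { ?s ?p ?o }", {
  defaultGraphs: ["http://thefigtrees.net/lee/ldf-card"],
  output: "json"
});
console.log(requestUrl);
```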

Finally, some example usages of the library's API:

var sparqler = new SPARQL.Service("http://sparql.org/sparql");

// graphs and prefixes defined here
// are inherited by all future queries
sparqler.addDefaultGraph("http://thefigtrees.net/lee/ldf-card");
sparqler.addNamedGraph("http://torrez.us/elias/foaf.rdf");
sparqler.setPrefix("foaf", "http://xmlns.com/foaf/0.1/"); 
sparqler.setPrefix("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
	
// "json" is the default output format
sparqler.setOutput("json");

var query = sparqler.createQuery();

// these settings are for this query only
query.addDefaultGraph(...);
query.addNamedGraph(...);
query.setPrefix(...);

// query wrappers:

// passes standard JSON results object to success callback
query.setPrefix("ldf", "http://thefigtrees.net/lee/ldf-card#");
query.query(
  "SELECT ?who ?mbox WHERE { ldf:LDF foaf:knows ?who . ?who foaf:mbox ?mbox }",
  {failure: onFailure, success: function(json) { for (var x in json.head.vars) { ... } ...}}
);

// passes boolean value to success callback
query.ask(
  "ASK WHERE { ?person foaf:knows [ foaf:name 'Dan Connolly' ] }",
  {failure: onFailure, success: function(bool) { if (bool) ... }}
); 

// passes a single vector (array) of values 
// representing a single column of results 
// to success callback
query.setPrefix("ldf", "http://thefigtrees.net/lee/ldf-card#");
var addresses = query.selectValues(
  "SELECT ?mbox WHERE { _:someone foaf:mbox ?mbox }",
  {failure: onFailure, success: function(values) { for (var i = 0; i < values.length; i++) { ... values[i] ...} } }
); 

// passes a single value representing a single 
// row of a single column (variable) to success callback
query.setPrefix("ldf", "http://thefigtrees.net/lee/ldf-card#");
var myAddress = query.selectSingleValue(
  "SELECT ?mbox WHERE {ldf:LDF foaf:mbox ?mbox }",
  {failure: onFailure, success: function(value) { alert("value is: " + value); } }
); 
	
// shortcuts for all of the above 
// (w/o ability to set any query-specific graphs or prefixes)
sparqler.query(...);
sparqler.ask(...);
sparqler.selectValues(...);
sparqler.selectSingleValue(...);
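To give a feel for what result transformations like selectValues and selectSingleValue have to do, here is a sketch that works directly on a SPARQL JSON results object. It is not the library's implementation; columnValues, singleValue, and the sample data are hypothetical.

```javascript
// Hypothetical helper: returns the values of the first variable as an array,
// skipping rows in which that variable is unbound.
function columnValues(json) {
  var name = json.head.vars[0];
  return json.results.bindings
    .filter(function (row) { return row[name] !== undefined; })
    .map(function (row) { return row[name].value; });
}

// Hypothetical helper: returns the first value of the first variable, or null.
function singleValue(json) {
  var values = columnValues(json);
  return values.length ? values[0] : null;
}

// Made-up sample data in the SPARQL JSON results layout.
var mboxResults = {
  head: { vars: ["mbox"] },
  results: { bindings: [
    { mbox: { type: "uri", value: "mailto:alice@example.org" } },
    { mbox: { type: "uri", value: "mailto:bob@example.org" } }
  ] }
};

console.log(columnValues(mboxResults));  // [ 'mailto:alice@example.org', 'mailto:bob@example.org' ]
console.log(singleValue(mboxResults));   // mailto:alice@example.org
```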

Feel free to download and use the library as you see fit. I'll post here when there are any substantive updates to it. In the next entry, I'll start delving into the specific SPARQL queries that drive the calendar demo.

Posted by Lee Feigenbaum at 10:30 PM | Permalink | Comments (1) | TrackBacks (2)

TechnicaLee Speaking

Software designs, implementations, solutions, and musings by Lee Feigenbaum