" /> TechnicaLee Speaking: January 2008 Archives


January 25, 2008

Why SPARQL?

I'm quite pleased to have played a part in helping SPARQL become a W3C Recommendation. As we were putting together the press release that accompanied the publication of the SPARQL recommendations, Ian Jacobs, Ivan Herman, Tim Berners-Lee, and I drafted some comments (in bullet point form) explaining some of the benefits of SPARQL. They do a good job of capturing a lot of what I find appealing about SPARQL, and I wanted to share them with other people. I don't think these are the best examples of SPARQL's value or the most eloquently expressed, but I do think they capture a lot of the essence of SPARQL. (While some of the text is attributable to me, parts are attributable to Ian, Ivan, and Tim.)


  • SPARQL is to the Semantic Web (and, really, the Web in general) what SQL is to relational databases. (This is effectively Tim's quotation from the press release.)
  • If we view the Semantic Web as a global collection of databases, SPARQL can make the collection look like one big database. SPARQL enables us to reap the benefits of federation. Examples:
    • Federating information from multiple Web sites (mashups)
    • Federating information from multiple enterprise databases (e.g. manufacturing, customer-order, and shipping systems)
    • Federating information between internal and external systems (e.g. for outsourcing, public Web databases such as NCBI, or supply-chain partners)
  • There are many distinct database technologies in use, and it's of course impossible to dictate a single database technology at the scale of the Web. RDF (the Semantic Web data model), though, serves as a standard lingua franca (least common denominator) in which data from disparate database systems can be represented. SPARQL, then, is the query language for that data. As such, SPARQL hides the details of a server's particular data management and structure. This reduces the cost and increases the robustness of software that issues queries.
  • SPARQL saves development time and cost by allowing client applications to work with only the data they're interested in. (This is as opposed to bringing it all down and spending time and money writing software to extract the relevant bits of information.)
    • Example: Find US cities' population, area, and mass transit (bus) fare, in order to determine if there is a relationship between population density and public transportation costs.
    • Without SPARQL, you might tackle this by writing a first query to pull information from cities' pages on Wikipedia, a second query to retrieve mass transit data from another source, and then code to extract the population, area, and bus fare data for each city.
    • With SPARQL, this application can be accomplished by writing a single SPARQL query that federates the appropriate data sources. The application developer need only write a single query and no additional code. (A sketch of what such a query might look like appears just after this list.)
  • SPARQL builds on other standards including RDF, XML, HTTP, and WSDL. This allows reuse of existing software tooling and promotes good interoperability with other software systems. Examples:
    • SPARQL results are expressed in XML: XSLT can be used to generate friendly query result displays for the Web
    • It's easy to issue SPARQL queries, given the abundance of HTTP library support in Perl, Python, PHP, Ruby, etc.
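
To make the city example above a bit more concrete, here's a rough sketch of what that single federated query might look like. The graph URIs and the vocabulary terms (:City, :population, :area, :busFare) are made up for illustration; a real query would use whatever graphs and vocabularies the underlying data sources actually publish.

PREFIX : <http://example.com/vocab/cities/>
SELECT ?city ?population ?area ?busFare
FROM <http://example.com/data/wikipedia-cities>
FROM <http://example.com/data/transit-fares>
WHERE {
  # Population and area, as extracted from each city's Wikipedia page
  ?city a :City ;
    :country "US" ;
    :population ?population ;
    :area ?area .
  # Bus fare, as published by a separate mass transit data source
  ?city :busFare ?busFare .
}

From the single table of results, the application can compute population density (population divided by area) and compare it against the bus fare directly, with no per-source extraction code.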

Finally, I scribbled down some of my own thoughts on how SPARQL takes the appealing principles of a Service Oriented Architecture (SOA) one step further:

  • With SOA, the idea is to move away from tightly-coupled client-server applications in which all of the client code needs to be written specifically for the server code and vice versa. SOA says that if instead we just agree on service interfaces (contracts) then we can develop and maintain services and clients that adhere to these interfaces separately (and therefore more cheaply, scalably, and robustly).
  • SPARQL takes this one step further. For SOA to work, service providers (people publishing data) still have to define a service: a set of operations that they'll use to let others get at their information. And someone writing a client application against such a service needs to adhere to the operations in that service. If a service has 5 operations that return various bits of related data, and a client application needs some data from several of those operations but not most of what they return, the developer still must invoke all 5 operations and then write the logic to extract and join the data relevant for her application. This makes for needlessly complex software development (and complex == costly, of course).
  • With SPARQL, a service-provider/data-publisher simply provides one service: SPARQL. Since it's a query language accessible over a standard protocol (HTTP), SPARQL can be considered a 'universal service'. Instead of the data publisher choosing a limited number of operations to support a priori and client applications being forced to conform to these operations, the client application can ask precisely the questions it wants to retrieve precisely the information it needs. Instead of 5 service invocations + extra logic to extract and join data, the client developer need only author a single SPARQL query. This makes for a simpler application (and, of course, less costly).

As an example, consider an online book merchant. Suppose I want to create a Web site that finds books by my favorite author that are selling for less than $15, including shipping. The merchant supplies three relevant services:

  1. Search. Includes search by author. Returns book identifiers.
  2. Book lookup. Takes a book identifier and returns the title, price, abstract, shipping weight, etc.
  3. Shipping lookup. Takes total order weight, shipping method, and zip code, and returns a shipping cost.

To create my Web site without SPARQL, I'd need to:

  1. Invoke the search service. (Query 1)
  2. Write code to extract the result identifiers and, for each one, invoke the book lookup service. (Code 1, Query 2 (issued multiple times))
  3. Write code to extract each book's price and weight and, for each book, invoke the shipping lookup service with that weight. (Code 2, Query 3 (issued multiple times))
  4. Write code to add each book's price and shipping cost and check if it's less than $15. (Code 3)

Now, suppose the book merchant exposed this same data via a SPARQL endpoint. The new approach is:

  1. Use the SPARQL protocol to issue a single SPARQL query with all the relevant parameters. (Query 1 (issued once))

For the record, the query might look something like:

PREFIX : <http://example.com/service/sparql/>
SELECT ?book ?title
FROM :inventory
WHERE {
  # Books by the author I care about, along with the properties
  # needed to evaluate the total cost
  ?book a :book ;
    :author ?author ;
    :title ?title ;
    :price ?price ;
    :weight ?weight .
  ?author :name "My favorite Author" .
  # Keep only books whose price plus shipping comes in under $15
  FILTER(?price + :shipping(?weight) < 15)
}

(This example also illustrates another feature of SPARQL: it is extensible via new FILTER functions, which allow a query to invoke operations defined by the SPARQL endpoint; in this case, :shipping is a function supplied by the endpoint that returns the shipping cost for a given order weight.)
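
As a final aside, here's a rough sketch of the HTTP exchange that the SPARQL protocol defines for carrying a query like the one above to the merchant's endpoint. The endpoint path is made up; the essential pieces are the query parameter from the protocol's HTTP binding and the XML results document that comes back (which is exactly the sort of thing an XSLT stylesheet could turn into a friendly Web page).

GET /service/sparql?query=<the query above, URL-encoded> HTTP/1.1
Host: example.com
Accept: application/sparql-results+xml

HTTP/1.1 200 OK
Content-Type: application/sparql-results+xml

<sparql xmlns="http://www.w3.org/2005/sparql-results#"> ... </sparql>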