Why SPARQL?
I'm quite pleased to have played a part in helping SPARQL become a W3C Recommendation. As we were preparing the press release that accompanied the publication of the SPARQL recommendations, Ian Jacobs, Ivan Herman, Tim Berners-Lee, and I put together some comments (in bullet-point form) explaining some of the benefits of SPARQL. They do a good job of capturing a lot of what I find appealing about SPARQL, and I wanted to share them with other people. I don't think these are the best examples of SPARQL's value or the most eloquently expressed, but I do think they capture a lot of the essence of SPARQL. (While some of the text is attributable to me, parts are attributable to Ian, Ivan, and Tim.)
- SPARQL is to the Semantic Web (and, really, the Web in general) what SQL is to relational databases. (This is effectively Tim's quotation from the press release.)
- If we view the Semantic Web as a global collection of databases, SPARQL can make the collection look like one big database. SPARQL enables us to reap the benefits of federation. Examples:
- Federating information from multiple Web sites (mashups)
- Federating information from multiple enterprise databases (e.g. manufacturing and customer orders and shipping systems)
- Federating information between internal and external systems (e.g. for outsourcing, public Web databases (e.g. NCBI), supply-chain partners)
- There are many distinct database technologies in use, and it's of course impossible to dictate a single database technology at the scale of the Web. RDF (the Semantic Web data model), though, serves as a standard lingua franca (least common denominator) in which data from disparate database systems can be represented. SPARQL, then, is the query language for that data. As such, SPARQL hides the details of a server's particular data management and structure. This reduces costs and increases the robustness of software that issues queries.
- SPARQL saves development time and cost by allowing client applications to work with only the data they're interested in. (This is as opposed to bringing it all down and spending time and money writing software to extract the relevant bits of information.)
- Example: Find US cities' population, area, and mass transit (bus) fare, in order to determine if there is a relationship between population density and public transportation costs.
- Without SPARQL, you might tackle this by writing a first query to pull information from cities' pages on Wikipedia, a second query to retrieve mass transit data from another source, and then code to extract the population and area and bus fare data for each city.
- With SPARQL, this application can be accomplished by writing a single SPARQL query that federates the appropriate data source. The application developer need only write a single query and no additional code.
- SPARQL builds on other standards including RDF, XML, HTTP, and WSDL. This allows reuse of existing software tooling and promotes good interoperability with other software systems. Examples:
- SPARQL results are expressed in XML: XSLT can be used to generate friendly query result displays for the Web
- It's easy to issue SPARQL queries, given the abundance of HTTP library support in Perl, Python, PHP, Ruby, etc.
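To make the last two points concrete, here is a minimal Python sketch using only the standard library. It assumes the endpoint accepts the query as a `query` parameter on an HTTP GET (as the SPARQL protocol describes) and returns results in the SPARQL XML results format. The endpoint URL, the query, and the sample response are made up for illustration, and the request is built but never sent.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hypothetical endpoint and query, for illustration only.
ENDPOINT = "http://example.com/service/sparql/"
QUERY = "SELECT ?title WHERE { ?book <http://example.com/service/sparql/title> ?title }"


def build_sparql_request(endpoint, query):
    """Build (but do not send) an HTTP GET request carrying a SPARQL query."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+xml"}
    )


def parse_bindings(xml_text):
    """Turn a SPARQL XML result document into a list of {variable: value} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = {}
        for binding in result.findall("sr:binding", SPARQL_NS):
            # Each binding holds a single child: <uri>, <literal>, or <bnode>.
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    return rows


# Hand-written sample response; a real endpoint would return this over HTTP.
SAMPLE_RESPONSE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="title"/></head>
  <results>
    <result>
      <binding name="title"><literal>An Example Book</literal></binding>
    </result>
  </results>
</sparql>"""

request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
print(parse_bindings(SAMPLE_RESPONSE))
```

Against a live endpoint, passing `request` to `urllib.request.urlopen` and handing the response body to `parse_bindings` would complete the round trip.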
Finally, I scribbled down some of my own thoughts on how SPARQL takes the appealing principles of a Service Oriented Architecture (SOA) one step further:
- With SOA, the idea is to move away from tightly-coupled client-server applications in which all of the client code needs to be written specifically for the server code and vice versa. SOA says that if instead we just agree on service interfaces (contracts) then we can develop and maintain services and clients that adhere to these interfaces separately (and therefore more cheaply, scalably, and robustly).
- SPARQL takes some of this one step further. For SOA to work, services (people publishing data) still have to define a service, a set of operations that they'll use to let others get at their information. And someone writing a client application against such a service needs to adhere to the operations in the service. If a service has 5 operations that return various bits of related data, and a client application wants some of the data from each but doesn't want most of it, the developer still must invoke all 5 operations and then write the logic to extract and join the data relevant for her application. This makes for more complex software development (and complex == costly, of course).
- With SPARQL, a service-provider/data-publisher simply provides one service: SPARQL. Since it's a query language accessible over a standard protocol (HTTP), SPARQL can be considered a 'universal service'. Instead of the data publisher choosing a limited number of operations to support a priori and client applications being forced to conform to these operations, the client application can ask precisely the questions it wants to retrieve precisely the information it needs. Instead of 5 service invocations + extra logic to extract and join data, the client developer need only author a single SPARQL query. This makes for a simpler application (and, of course, less costly).
As an example, consider an online book merchant. Suppose I want to create a Web site that finds books by my favorite author that are selling for less than $15, including shipping. The merchant supplies three relevant services:
- Search. Includes search by author. Returns book identifiers.
- Book lookup. Takes a book identifier and returns the title, price, abstract, shipping weight, etc.
- Shipping lookup. Takes total order weight, shipping method, and zip code, and returns a shipping cost.
To create my Web site without SPARQL, I'd need to:
- Invoke the search service. (Query 1)
- Write code to extract the result identifiers and, for each one, invoke the book lookup service. (Code 1, Query 2 (issued multiple times))
- Write code to extract the price and weight and, for each book, invoke the shipping lookup service with that book's weight. (Code 2, Query 3 (issued multiple times))
- Write code to add each book's price and shipping cost and check if it's less than $15. (Code 3)
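The steps above can be sketched in Python. This is only an illustration: the three functions are in-memory stand-ins for the merchant's services, and the book data, the flat shipping rate, and the default shipping method and zip code are all invented.

```python
# In-memory stand-ins for the merchant's three services (all data hypothetical).
BOOKS = {
    "b1": {"author": "My favorite Author", "title": "Book One", "price": 9.00, "weight": 1.0},
    "b2": {"author": "My favorite Author", "title": "Book Two", "price": 13.50, "weight": 2.0},
    "b3": {"author": "Someone Else", "title": "Book Three", "price": 5.00, "weight": 1.0},
}


def search_by_author(author):
    """Search service: returns the identifiers of books by the given author."""
    return [book_id for book_id, book in BOOKS.items() if book["author"] == author]


def book_lookup(book_id):
    """Book lookup service: returns title, price, weight, etc. for one book."""
    return BOOKS[book_id]


def shipping_lookup(weight, method="ground", zip_code="00000"):
    """Shipping lookup service: a made-up flat rate of $2 per unit of weight."""
    return 2.00 * weight


def affordable_books(author, limit=15.00):
    """Client-side glue: three kinds of service calls plus the joining logic."""
    matches = []
    for book_id in search_by_author(author):        # Query 1, then Code 1
        book = book_lookup(book_id)                 # Query 2, once per book
        shipping = shipping_lookup(book["weight"])  # Code 2, Query 3, once per book
        if book["price"] + shipping < limit:        # Code 3
            matches.append(book["title"])
    return matches


print(affordable_books("My favorite Author"))
```

Even in this toy form, the client issues one search call plus two further calls per book, and the filtering logic lives entirely in client code.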
Now, suppose the book merchant exposed this same data via a SPARQL endpoint. The new approach is:
- Use the SPARQL protocol to ask a SPARQL query with all the relevant parameters (Query 1 (issued once))
For the record, the query might look something like:
PREFIX : <http://example.com/service/sparql/>
SELECT ?book ?title
FROM :inventory
WHERE {
  ?book a :book ;
        :author ?author ;
        :title ?title ;
        :price ?price ;
        :weight ?weight .
  ?author :name "My favorite Author" .
  FILTER(?price + :shipping(?weight) < 15) .
}
(This example also illustrates another feature of SPARQL: it is extensible through new FILTER functions, which let a query invoke operations defined by the SPARQL endpoint. In this case, the function :shipping gives the shipping cost for a particular order weight.)
Comments
And I'm very pleased with the role you played in getting SPARQL into shape! :-)
I also agree that one of SPARQL's strong points is that "With SPARQL, this application can be accomplished by writing a single SPARQL query that federates the appropriate data source."
But can we prove that case by writing that query and seeing the results returned right now? We have the endpoints, but can we actually write that single query with the endpoints we have right now?
Posted by: Kjetil Kjernsmo | January 28, 2008 09:08 AM
It's worth checking out, though I actually don't know offhand if the data needed is in the data extracted from Wikipedia or elsewhere.
It's now on the TODO list, though if anyone else gets to it first, be my guest :)
Lee
Posted by: Lee | January 28, 2008 10:07 AM
Kudos for the work done in shaping up SPARQL!
Just trying to understand your post better: the example about the US cities and their population and the one you gave about the bookshop, do they highlight the same feature of SPARQL (working only on the data/operations of interest), or are there some differences? Just trying to understand the benefit I can achieve. If possible, please clarify!
Thanks...
Posted by: Prateek | January 28, 2008 03:11 PM
Prateek,
Good point. The two examples are really getting at the same point. (That SPARQL often allows a single query to replace what would otherwise require multiple queries and code to extract relevant information.)
The reason they're both there is just because the two examples happened to be created at different times in slightly different contexts. :)
Lee
Posted by: Lee | January 28, 2008 03:16 PM
Hello. Interesting post. So you say "SPARQL saves development time and cost". Can you actually prove this?
Posted by: Duncan Hull | January 29, 2008 04:37 AM
to Duncan Hull:
Actually I would say that Lee needs to prove absolutely nothing. Because companies using SPARQL - and in particular our product, The Semantic Discovery System - are independently proving the extraordinary value of SPARQL. It's almost too obvious to be believable. Here's my explanation:
Suppose you are some bod in a large company wanting to query across multiple distributed, heterogeneous data silos (i.e. a pervasive need). You know there is valuable knowledge embedded/hidden in the data - implicit relationships between data points you want to make explicit. OK so far?
Well now, with SPARQL, instead of having to write SQL, build a data warehouse, or spend ages having programmers write some application - with SPARQL you simply state what columns (i.e. as you find in database tables or spreadsheets) you want to have in your result set - and any filters on those columns (i.e. column values) - and boom! SPARQL automatically knows everything that would have to be painstakingly specified in SQL, i.e. no more complex table joins etc etc... Any records that contain values in all the specified set of columns/filters are presented.
As an end user it is now a total simplicity to express your query in terms of the *shape* of the result you want back (essentially the columns/value filters)...
This is a literal marvel, something that has never been possible before - and if Duncan takes the time to look into this he will see it is already fully proven.
Ian Goldsmid
http://www.Meaning2Go.com
Posted by: Ian Goldsmid | February 27, 2008 10:13 PM