<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
   <channel>
      <title>TechnicaLee Speaking</title>
      <link>http://www.thefigtrees.net/lee/blog/</link>
      <description>Software designs, implementations, solutions, and musings by Lee Feigenbaum</description>
      <language>en</language>
      <copyright>Copyright 2008</copyright>
      <lastBuildDate>Thu, 05 Jun 2008 12:48:35 -0500</lastBuildDate>
      <generator>http://www.sixapart.com/movabletype/?v=3.2</generator>
      <docs>http://blogs.law.harvard.edu/tech/rss</docs> 

            <item>
         <title>The Open Anzo Command Line Interface</title>
         <description><![CDATA[<p>We're continuing to work feverishly at <a href="http://www.cambridgesemantics.com">Cambridge Semantics</a>, and one of the main focal points of our efforts is the upcoming (later this year) release of Open Anzo 3.0. In February I <a href="http://www.thefigtrees.net/lee/blog/2008/02/anzo_building_semantic_applica.html">wrote</a> a bit about the core client APIs that we've stabilized for this release. Today, I wanted to share a huge development-productivity aid that uses the Anzo.java client implementation: a feature-rich command-line client.</p> <p>Joe Betz, who added and <a href="http://groups.google.com/group/openanzo/browse_thread/thread/0e1f9c3ccda053a5#">announced</a> the new command line interface a few weeks ago, also wrote an excellent <a href="http://www.openanzo.org/projects/openanzo/wiki/CommandLineInterface">guide to getting set up and using the client</a>. I heartily recommend the guide, but to whet your appetite, here's an example interaction with the CLI client. (This interaction takes place after installing the client and configuring its default settings, as described in the guide. It also assumes a running Anzo server (as per the "Quick Start" section in the guide).)</p>  <center><embed src="http://www.youtube.com/v/pBeDYCA8oDk" width="425" height="350" type="application/x-shockwave-flash"> </embed></center>]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2008/06/the_open_anzo_command_line_int.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2008/06/the_open_anzo_command_line_int.html</guid>
         <category>semantic web</category>
         <pubDate>Thu, 05 Jun 2008 12:48:35 -0500</pubDate>
      </item>
            <item>
         <title>Lee @ SemTech next week</title>
         <description><![CDATA[<p>I'll be heading to <a href="http://semantic-conference.com/">SemTech</a> this weekend and am looking forward to meeting a lot of new people and seeing a lot of familiar, friendly faces. I'm particularly excited about the <a href="http://semantic-conference.com/session/588/">presentation</a> that I'll be giving on Wednesday morning. In conjunction with Brand Niemann of the <a href="http://www.epa.gov/">U.S. EPA</a>, I'll be demonstrating some of the work that Cambridge Semantics has been doing to work with spreadsheets as a first-class source of semantic data. Our team has done a fantastic job building a user experience that's tightly integrated into Excel, and in doing so has provided a very easy way to free information from the confines of the spreadsheet. </p> <p>I'm going to show a few different scenarios that involve linking data between different spreadsheets, reusing spreadsheet data on the Web, keeping live data updated in real-time, and more. Much of the presentation and demonstration is in the context of the U.S. Census Bureau's <a href="http://www.census.gov/compendia/statab/">Statistical Abstract</a>, and I'll also be showing how the same software can be applied to conference data from SemTech itself.</p> <p>If you're planning to be at SemTech next week, please <a href="mailto:lee@thefigtrees.net">drop me a note</a> so that I can come and say hi there. And if you are there, please come and see my presentation:</p> <p><strong>Title: </strong>Getting to Web Semantics for Spreadsheets in the U.S. Government<br><strong>Day: </strong>Wednesday, May 21, 2008<br style="margin: 0px"><strong>Time: </strong>08:30 AM - 09:30 AM </p>]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2008/05/lee_semtech_next_week.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2008/05/lee_semtech_next_week.html</guid>
         <category>semantic web</category>
         <pubDate>Thu, 15 May 2008 11:42:59 -0500</pubDate>
      </item>
            <item>
         <title><![CDATA[Now available online - Scientific American: &quot;The Semantic Web in Action&quot;]]></title>
         <description><![CDATA[<p>I <a href="http://thefigtrees.net/lee/blog/2007/12/scientific_american_the_semant.html">blogged previously</a> about my experience co-authoring an article on the Semantic Web for <em>Scientific American</em>. Since then, <em>Scientific American</em> has granted me permission to publish the text of the article on my Web site. So please feel free to enjoy the article and share it with others: <a href="http://thefigtrees.net/lee/sw/sciam/semantic-web-in-action">"The Semantic Web In Action"</a></p> <p>A few notes:</p> <ul> <li>The default view of the article breaks it into multiple pages to make it more easily digestible and bookmarkable. There is a link at the top and bottom to a single-page version suitable for printing and reading offline. Or if you just happen to prefer reading it like that.</li> <li>The article text is followed by the text of the article's sidebars. There are links back and forth between the main text and the relevant sidebars. Most of the sidebars in the article included artwork which I do not have permission to reproduce online at this time.</li> <li>At the end of the article I've gathered links to the various companies, projects, and technologies referenced in the article. (The terms of the reproduction rights from <em>Scientific American</em> prohibit adding links within the main content of the article.)</li></ul> <p>Please let me know what you think. Also, if you have any trouble reading or printing the article, let me know as well. (I whipped together some JavaScript to do the pagination while maintaining the browser's back button and internal anchors and things like that, so there may be some bugs. I'll write more about the JavaScript some other time.)</p>]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2008/03/now_available_online_scientifi.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2008/03/now_available_online_scientifi.html</guid>
         <category>semantic web</category>
         <pubDate>Wed, 26 Mar 2008 10:26:26 -0500</pubDate>
      </item>
            <item>
         <title>Gathering SPARQL Extensions</title>
         <description><![CDATA[<p>I realized that I hadn't blogged a pointer to the compilation of SPARQL extensions that I've created on the ESW wiki. Quoting <a href="http://lists.w3.org/Archives/Public/public-sparql-dev/2008JanMar/0020.html">myself</a>:</p> <blockquote> <p>Over the DAWG's lifetime (and since publication of the SPARQL Recommendations in January), there have been many important features that have been discussed but did not get included in the SPARQL specifications. I -- and many others -- hope that many of these topics will be addressed by a future working group, though there are no concrete plans for such a group at this time.</p> <p>In the interest of cataloging these extensions and encouraging SPARQL developers to seek interoperable implementations of SPARQL extensions, I've created:</p> <p><br>&nbsp;&nbsp; <a href="http://esw.w3.org/topic/SPARQL/Extensions">http://esw.w3.org/topic/SPARQL/Extensions</a></p> <p><br>That page links to individual pages for (currently) 13 categories of SPARQL extensions. Each of those pages, in turn, discusses the relevant type of SPARQL extension and attempts to provide links to research, discussion, and implementations of the extension.</p> <p><br>I also plan to use this list to help encourage user- and implementor-driven discussion of these extensions over the coming months. Again, the goal is to allow SPARQL users to make known what features are most important to them and also to allow implementations to seek common syntaxes and semantics for SPARQL extensions. (All of this, in the end, should help a future working group charter a new version of SPARQL and produce a specification that allows for interoperable SPARQL v2 implementations.)<br><br>It's a wiki. Please add references that are not there, new topics, or discussions of existing topics. 
(I've tried to reuse existing ESW Wiki pages for some topics that already had discussion.)</p></blockquote> <p>Where I say "this list" above, I mean <a href="mailto:public-sparql-dev@w3.org">public-sparql-dev@w3.org</a>. Please subscribe if you're interested in discussing any or all of these potential SPARQL extensions.</p>]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2008/03/gathering_sparql_extensions.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2008/03/gathering_sparql_extensions.html</guid>
         <category>semantic web</category>
         <pubDate>Tue, 18 Mar 2008 19:30:07 -0500</pubDate>
      </item>
            <item>
         <title>Semantic Web tutorial</title>
         <description><![CDATA[<p>Last week, <a href="http://www.w3.org/People/Eric/">Eric Prud'hommeaux</a> and I presented a tutorial on Semantic Web technologies at the <a href="http://www.iscb.org/cshals2008/">Conference on Semantics in Healthcare &amp; Life Sciences (C-SHALS)</a>. It was a four-hour session covering an intro to RDF, SPARQL, GRDDL, RDFa, RDFS, and OWL, mostly in the context of health care (patients' clinical examination records) and life sciences (pyramidal neurons in Alzheimer's Disease, as per the <a href="http://www.w3.org/2001/sw/hcls/notes/kb/#usecase">W3C HCLS interest group's knowledgebase use case</a>). We reprised the GRDDL and RDFa sections in a whirlwind 15-20 minute talk at yesterday's <a href="http://esw.w3.org/topic/CambridgeSemanticWebGatherings/Meeting/2008-03-11_Gathering">Cambridge Semantic Web gathering</a>.</p> <p>Enjoy the <a href="http://www.w3.org/2008/Talks/0305-C-SHALS/">slides</a>. I'd welcome any suggestions so that the slides can be enhanced and reused (by myself and others) in the future.</p>]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2008/03/semantic_web_tutorial.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2008/03/semantic_web_tutorial.html</guid>
         <category>semantic web</category>
         <pubDate>Wed, 12 Mar 2008 19:29:04 -0500</pubDate>
      </item>
            <item>
         <title>Modeling Statistics in RDF - A Survey and Discussion</title>
         <description><![CDATA[<a id="top" name="top"></a> <p>At the Semantic Technologies Conference in San Jose in May, <a href="http://semanticommunity.wik.is/People/Brand_Niemann">Brand Niemann</a> of the <a href="http://epa.gov/">U.S. EPA</a> and I are presenting <a href="http://semantic-conference.com/session/588/"><i>Getting to Web Semantics for Spreadsheets in the U.S. Government</i></a>. In particular, Brand and I are working to exploit the semantics implicit in the nearly 1,500 spreadsheets that are in the U.S. Census Bureau's annual <a href="http://www.census.gov/compendia/statab/">Statistical Abstract of the United States</a>. (The rest of this post discusses various strategies for modeling this sort of statistical data in RDF; for more information on the background of this work, please see <a href="http://semanticommunity.wik.is/@api/deki/files/285/=LFeigenbaum02052008.ppt">my presentation from the February 5, 2008, SICoP Special Conference</a>.)</p> <p>The data for the <i>Statistical Abstract</i> is effectively time-based statistics. There are a variety of ways that this information can be modeled as semantic data. The approaches differ in simplicity/complexity, semantic expressivity, and verbosity. At least as interestingly, they vary in precisely <em>what</em> they are modeling: statistical data or a particular domain of discourse. The goal of this effort is to examine the potential approaches to modeling this information in terms of ease of reuse, ease of query, ability to integrate with information from all 1,500 spreadsheets (and other sources), and the ability to enhance the model incrementally with richer semantics. 
There are surely other approaches to modeling this information as well: <b>I'd love to hear any ideas or suggestions for other approaches to consider.</b></p> <table class="toc" id="toc" summary="Contents"> <tbody> <tr> <td> <div id="toctitle"> <h2>Contents</h2><span class="toctoggle">[<a class="internal" id="togglelink" href="javascript:toggleToc()">hide</a>]</span></div> <ul> <li class="toclevel-1"><a href="#D2R_Server_for_Eurostat"><span class="tocnumber">1</span> <span class="toctext">D2R Server for Eurostat</span></a>  <li class="toclevel-1"><a href="#The_2000_U.S._Census"><span class="tocnumber">2</span> <span class="toctext">The 2000 U.S. Census</span></a>  <li class="toclevel-1"><a href="#Riese:_RDFizing_and_Interlinking_the_EuroStat_Data_Set_Effort"><span class="tocnumber">3</span> <span class="toctext">Riese: RDFizing and Interlinking the EuroStat Data Set Effort</span></a>  <li class="toclevel-1"><a href="#Summary"><span class="tocnumber">4</span> <span class="toctext">Summary</span></a>  <li class="toclevel-1"><a href="#Statistical_Abstract_data"><span class="tocnumber">5</span> <span class="toctext">Statistical Abstract data</span></a>  <ul> <li class="toclevel-2"><a href="#Simple_with_time"><span class="tocnumber">5.1</span> <span class="toctext">Simple with time</span></a>  <ul> <li class="toclevel-3"><a href="#Simple_point-in-time_approach"><span class="tocnumber">5.1.1</span> <span class="toctext">Simple point-in-time approach</span></a>  <li class="toclevel-3"><a href="#Complex_point-in-time_approach"><span class="tocnumber">5.1.2</span> <span class="toctext">Complex point-in-time approach</span></a>  <li class="toclevel-3"><a href="#Complex_Statistics_Over_Time"><span class="tocnumber">5.1.3</span> <span class="toctext">Complex Statistics Over Time</span></a> </li></ul></li></ul> <li class="toclevel-1"><a href="#Conclusion"><span class="tocnumber">6</span> <span class="toctext">Conclusion</span></a> </li></ul></td></tr></tbody></table> <script 
type="text/javascript"> if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } </script> <a name="D2R_Server_for_Eurostat"></a> <h2><span class="mw-headline">D2R Server for Eurostat </span></h2> <p>The <a class="external text" title="http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/" href="http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/" rel="nofollow">D2R server</a> guys host an <a class="external text" title="http://www4.wiwiss.fu-berlin.de/eurostat/" href="http://www4.wiwiss.fu-berlin.de/eurostat/" rel="nofollow">RDF copy</a> of the <a class="external text" title="http://ec.europa.eu/eurostat" href="http://ec.europa.eu/eurostat" rel="nofollow">Eurostat</a> collection of European economic, demographic, political, and geographic data. From the start, they make the simplifying assumption that: </p> <blockquote>Most statistical data are time series, therefore only the latest availabe value is provided here.</blockquote> <p>In other words, they do not try to capture historic statistics at all. The disclaimer also notes that what is modeled in RDF is a small subset of the available data tables. 
</p> <p><a class="external text" title="http://www4.wiwiss.fu-berlin.de/eurostat/snorql/?query=SELECT+DISTINCT+%3Fp+WHERE+%7B%0D%0A+%3Fs+%3Fp+%3Fo+.%0D%0A%7D&amp;prefixes=PREFIX+owl%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0D%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+map%3A+%3Cfile%3A%2F%2F%2FC%3A%2Fapps%2Feurostat%2Fd2r-server-0.3.2%2F..%2Feurostat.n3%23%3E%0D%0APREFIX+xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0APREFIX+db%3A+%3Chttp%3A%2F%2Fwww4.wiwiss.fu-berlin.de%2Feurostat%2Fresource%2F%3E%0D%0APREFIX+d2r%3A+%3Chttp%3A%2F%2Fsites.wiwiss.fu-berlin.de%2Fsuhl%2Fbizer%2Fd2r-server%2Fconfig.rdf%23%3E%0D%0APREFIX+eurostat%3A+%3Chttp%3A%2F%2Fwww4.wiwiss.fu-berlin.de%2Feurostat%2Fresource%2Feurostat%2F%3E%0D%0A" href="http://www4.wiwiss.fu-berlin.de/eurostat/snorql/?query=SELECT+DISTINCT+%3Fp+WHERE+%7B%0D%0A+%3Fs+%3Fp+%3Fo+.%0D%0A%7D&amp;prefixes=PREFIX+owl%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0D%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+map%3A+%3Cfile%3A%2F%2F%2FC%3A%2Fapps%2Feurostat%2Fd2r-server-0.3.2%2F..%2Feurostat.n3%23%3E%0D%0APREFIX+xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0APREFIX+db%3A+%3Chttp%3A%2F%2Fwww4.wiwiss.fu-berlin.de%2Feurostat%2Fresource%2F%3E%0D%0APREFIX+d2r%3A+%3Chttp%3A%2F%2Fsites.wiwiss.fu-berlin.de%2Fsuhl%2Fbizer%2Fd2r-server%2Fconfig.rdf%23%3E%0D%0APREFIX+eurostat%3A+%3Chttp%3A%2F%2Fwww4.wiwiss.fu-berlin.de%2Feurostat%2Fresource%2Feurostat%2F%3E%0D%0A" rel="nofollow">Executing</a> a <tt>SELECT DISTINCT ?p { ?s ?p ?o }</tt> to learn more about this dataset tells us: </p><pre>   db:eurostat/population_total
   db:eurostat/electricity_consumption_GWh
   db:eurostat/killed_in_road_accidents
   db:eurostat/RnD_exp_mio_euro
   db:eurostat/parentcountry
   db:eurostat/population_male
   rdfs:label
   db:eurostat/RnD_personel_percent_of_act_pop
   db:eurostat/total_average_population
   db:eurostat/population_female
   db:eurostat/unemployment_rate_total
   db:eurostat/avg_annual_population_growth
   db:eurostat/total_area_km2
   db:eurostat/name_encoded
   db:eurostat/disposable_income
   db:eurostat/injured_in_road_accidents
   db:eurostat/electricity_production_capacity_MWh
   db:eurostat/hospital_beds_per100000hab
   db:eurostat/name
   db:eurostat/landuse_total
   db:eurostat/GDP
   db:eurostat/geocode
   owl:sameAs
   rdf:type
   db:eurostat/level_of_internetaccess_households
   db:eurostat/death_rate
   db:eurostat/fertility_rate_total
   db:eurostat/level_of_internet_access
   db:eurostat/marriages
   db:eurostat/ecommerce_via_internet
   db:eurostat/pupils_and_students
   db:eurostat/inflation_rate
   db:eurostat/employment_rate_total
   db:eurostat/average_exit_age_from_laborforce
   db:eurostat/comparative_price_levels
   db:eurostat/GDP_current_prices
   db:eurostat/GDP_per_capita_PPP
   db:eurostat/monthly_labour_costs
</pre>
<p>I make a few observations from this: </p>
<ul>
<li>Most of these are predicates that correspond to a statistical category. I'm curious what the types of the subjects are. The query here is (the filter is added to limit the question to resources that use the Eurostat predicates): <pre> SELECT DISTINCT ?t WHERE {
  ?s rdf:type ?t .
  ?s ?p ?o .
  FILTER(regex(str(?p), 'eurostat'))
 }
</pre>The result is two types: regions and countries. Simple enough. 
<li>I'm also curious as to the types of the objects. Let's see if there are any resources (URIs) as objects. We do the <tt>?s ?p ?o</tt> query from before but add in <tt>FILTER(isURI(?o))</tt>. The result shows that, aside from <tt>rdf:type</tt> and <tt>owl:sameAs</tt> (which we expected), only the predicate <tt>db:eurostat/parentcountry</tt> points to other resources. Doing a query on this predicate, we see that it relates regions (e.g. <tt>db:regions/Lorraine</tt>) to countries (e.g. <tt>db:countries/France</tt>). 
<li>I'd expect that, especially in the absence of time-based data, they don't have object structures with blank nodes. Changing the previous filter to use <tt>isBlank</tt> confirms that this is true. 
<li>So what are the types of the other data? Strings? Numbers? Let's find out. Poking around with various values for <tt>XXX</tt> in the filter <tt>FILTER(isLiteral(?o) &amp;&amp; datatype(?o) = XXX)</tt> we see that some data uses <tt>xsd:string</tt>s while other data uses <tt>xsd:double</tt>. Poking around at the remaining predicates, we discover that they use <tt>xsd:long</tt> for non-decimal numbers. 
<li>What are they using <tt>owl:sameAs</tt> for? Executing <tt>SELECT ?s ?o { ?s owl:sameAs ?o }</tt> shows what I suspected: they're equating URIs that they've minted under a Eurostat namespace (<tt>http://www4.wiwiss.fu-berlin.de/eurostat/resource/</tt>) to <a class="external text" href="http://dbpedia.org">DBPedia</a> URIs (to broaden the linked data Web). Let's see if they use <tt>owl:sameAs</tt> for anything else. We add <tt>FILTER(!regex(str(?o), 'dbpedia'))</tt> and the query now returns no results. </li></ul><a name="The_2000_U.S._Census"></a>
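<p>Spelled out in full, the exploratory queries described in the list above look roughly like the following. This is only a sketch: the endpoint may have changed since, <tt>xsd:double</tt> stands in for whichever datatype is being probed, and each query is meant to be run on its own (with the prefix declarations repeated). </p><pre> PREFIX owl: &lt;http://www.w3.org/2002/07/owl#&gt;
 PREFIX xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt;

 # Which predicates point at resources (URIs) rather than literals?
 SELECT DISTINCT ?p WHERE {
   ?s ?p ?o .
   FILTER(isURI(?o))
 }

 # Which predicates carry values of a given datatype? (swap in xsd:string, xsd:long, ...)
 SELECT DISTINCT ?p WHERE {
   ?s ?p ?o .
   FILTER(isLiteral(?o) &amp;&amp; datatype(?o) = xsd:double)
 }

 # Is owl:sameAs used for anything other than links to DBPedia?
 SELECT ?s ?o WHERE {
   ?s owl:sameAs ?o .
   FILTER(!regex(str(?o), 'dbpedia'))
 }
</pre>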
<h2><span class="mw-headline">The 2000 U.S. Census </span></h2>
<p><a class="external text" title="http://razor.occams.info/" href="http://razor.occams.info/" rel="nofollow">Joshua Tauberer</a> converted the 2000 U.S. Census Data into <a class="external text" title="http://www.rdfabout.com/demo/census/" href="http://www.rdfabout.com/demo/census/" rel="nofollow">1 billion RDF triples</a>. He provides a well-documented <a class="external text" title="http://razor.occams.info/code/repo/?action=download&amp;url=/govtrack/census/census.pl" href="http://razor.occams.info/code/repo/?action=download&amp;url=/govtrack/census/census.pl" rel="nofollow">Perl script</a> that can convert various subsets of the census data into N3. In one of its modes, the script outputs the schema from SAS table layout files. Joshua's <a class="external text" title="http://www.rdfabout.com/demo/census/#aboutthedata" href="http://www.rdfabout.com/demo/census/#aboutthedata" rel="nofollow">about</a> section provides an overview of the data. In particular, I note that he is working with tables that are multiple levels deep (e.g. population by sex and then by age). </p>
<p>The most useful part of the writeup, though, is the section specifically about <a class="external text" title="http://www.rdfabout.com/demo/census/#modeling" href="http://www.rdfabout.com/demo/census/#modeling" rel="nofollow">modeling the census data in RDF</a>. In general, Joshua models nested levels of statistical tables (representing multiple facets of the data) as a chain of predicates (with the interim nodes as blank nodes). If a particular criterion is further subdivided, then the aggregate total at that level is linked with <tt>rdf:value</tt>. Otherwise, the value is given as the object itself. Note that the subjects are not real-world entities ("the U.S.") but instead are data tables ("the U.S. census tables"). The entities themselves are related to the data tables via a <tt>details</tt> predicate. The excerpt below combines both types of information (the entity itself followed by the data tables about the entity): </p><pre> @prefix rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
 @prefix dc: &lt;http://purl.org/dc/elements/1.1/&gt; .
 @prefix dcterms: &lt;http://purl.org/dc/terms/&gt; .
 @prefix : &lt;tag:govshare.info,2005:rdf/census/details/100pct&gt; .
 @prefix politico: &lt;http://www.rdfabout.com/rdf/schema/politico/&gt; .
 @prefix census: &lt;http://www.rdfabout.com/rdf/schema/census/&gt; .

 &lt;http://www.rdfabout.com/rdf/usgov/geo/us&gt;
   a politico:country ;
   dc:title "United States" ;
   census:households 115904641 ;
   census:waterarea "664706489036 m^2" ;
   census:population 281421906 ;
   census:details &lt;http://www.rdfabout.com/rdf/usgov/geo/us/censustables&gt; ;
   dcterms:hasPart &lt;http://www.rdfabout.com/rdf/usgov/geo/us/al&gt;, &lt;http://www.rdfabout.com/rdf/usgov/geo/us/az&gt;, ...
 .

 &lt;http://www.rdfabout.com/rdf/usgov/geo/us/censustables&gt;
   :totalPopulation 281421906 ;     # P001001
   :totalPopulation [
      dc:title "URBAN AND RURAL (P002001)";
      rdf:value 281421906 ;   # P002001
      :urban [
         rdf:value 222360539 ;  # P002002
         :insideUrbanizedAreas 192323824 ;   # P002003
         :insideUrbanClusters 30036715 ;     # P002004
      ] ;
      :rural 59061367 ;   # P002005
   ] ;
   :totalPopulation [
      dc:title "RACE (P003001)";
      rdf:value 281421906 ;   # P003001
      :populationOfOneRace [
         rdf:value 274595678 ;    # P003002
         :whiteAlone 211460626 ;     # P003003
         :blackOrAfricanAmericanAlone 34658190 ;     # P003004
         :americanIndianAndAlaskaNativeAlone 2475956 ;   # P003005
      ]
 ...
</pre>
<p>This modeling is inconsistent (which Joshua admits himself in the description). Note for instance how <tt>:totalPopulation &gt; :urban</tt> has an <tt>rdf:value</tt> link to the aggregate US urban population. When you go one level deeper though, <tt>:totalPopulation &gt; :urban &gt; :insideUrbanizedAreas</tt> has an object which is itself the value of that statistic. </p>
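<p>To make the inconsistency concrete, here is a sketch of a single query that retrieves both numbers. It assumes the data is loaded exactly as in the excerpt above, including the excerpt's empty-prefix namespace; the variable names are mine. Note how the two statistics need differently shaped patterns: </p><pre> PREFIX rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
 PREFIX : &lt;tag:govshare.info,2005:rdf/census/details/100pct&gt;

 SELECT ?urban ?insideUrbanizedAreas WHERE {
   &lt;http://www.rdfabout.com/rdf/usgov/geo/us/censustables&gt; :totalPopulation ?breakdown .
   ?breakdown :urban ?urbanNode .
   # :urban is subdivided, so its total hangs off rdf:value ...
   ?urbanNode rdf:value ?urban .
   # ... but :insideUrbanizedAreas is a leaf, so the number is the object itself
   ?urbanNode :insideUrbanizedAreas ?insideUrbanizedAreas .
 }
</pre>
<p>Against the excerpt, this should bind <tt>?urban</tt> to 222360539 and <tt>?insideUrbanizedAreas</tt> to 192323824. </p>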
<p>As I see it, this inconsistency could be avoided in two ways: </p>
<ol>
<li>Always insist that a statistic hangs off of a resource (URI or blank node) via the <tt>rdf:value</tt> predicate. 
<li>Allow a criterion/classification predicate to point both to a literal (aggregate) value, and also to further subdivisions. This would allow the above example to have a triple which was <tt>:totalPopulation &gt; :urban &gt; 222360539</tt> in addition to the further nested <tt>:totalPopulation &gt; :urban &gt; :insideUrbanizedAreas &gt; 192323824</tt>. </li></ol>
<p>The second approach seems simpler to me (fewer triples). It can be queried with an <tt>isLiteral</tt> filter restriction. The first approach might be a slightly simpler query, as it would always just query for <tt>rdf:value</tt>. (The queries would be about the same size, but the <tt>rdf:value</tt> approach is a bit clearer to read than the <tt>isLiteral</tt> filter approach.) </p>
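<p>As a rough sketch of that comparison, here is how a query for the aggregate urban population might look under each option. Both queries are hypothetical: they assume the census data had been remodeled as described, and they reuse the prefixes from the excerpt above. Each is a separate query: </p><pre> PREFIX rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
 PREFIX : &lt;tag:govshare.info,2005:rdf/census/details/100pct&gt;

 # Option 1: every statistic hangs off a node via rdf:value
 SELECT ?urban WHERE {
   &lt;http://www.rdfabout.com/rdf/usgov/geo/us/censustables&gt; :totalPopulation ?breakdown .
   ?breakdown :urban ?node .
   ?node rdf:value ?urban .
 }

 # Option 2: :urban points at both the literal total and the nested breakdown
 SELECT ?urban WHERE {
   &lt;http://www.rdfabout.com/rdf/usgov/geo/us/censustables&gt; :totalPopulation ?breakdown .
   ?breakdown :urban ?urban .
   FILTER(isLiteral(?urban))
 }
</pre>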
<p>As an aside, this statement from Joshua is a telling factor on the value of what we are doing with the <i>U.S. Statistical Abstract</i> data: </p>
<blockquote>(If you followed <tt>Region &gt; households &gt; nonFamilyHouseholds</tt> you would get the number of households, not people, that are <tt>nonFamilyHouseHolds</tt>. To know what a "non-family household" is, you would have to consult the PDFs published by the Census.)</blockquote><a name="Riese:_RDFizing_and_Interlinking_the_EuroStat_Data_Set_Effort"></a>
<h2><span class="mw-headline">Riese: RDFizing and Interlinking the EuroStat Data Set Effort </span></h2>
<p><a class="external text" title="http://riese.joanneum.at/" href="http://riese.joanneum.at/" rel="nofollow">Riese</a> is another effort to convert the <a class="external text" title="http://epp.eurostat.ec.europa.eu/portal/page?_pageid=1090,30070682,1090_33076576&amp;_dad=portal&amp;_schema=PORTAL" href="http://epp.eurostat.ec.europa.eu/portal/page?_pageid=1090,30070682,1090_33076576&amp;_dad=portal&amp;_schema=PORTAL" rel="nofollow">EuroStat</a> data to RDF. It seeks to expand on the coverage of the D2R effort. Project discussion is available on <a class="external text" title="http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/EuroStat" href="http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/EuroStat" rel="nofollow">an ESW wiki page</a>, but the main details of the effort are on the project's <a class="external text" title="http://riese.joanneum.at/about.html" href="http://riese.joanneum.at/about.html" rel="nofollow">about page</a>. Currently, riese only provides five million out of the three <i>billion</i> triples that it seeks to provide. </p>
<p>The <a class="external text" title="http://riese.joanneum.at/about.html#Under_the_hood" href="http://riese.joanneum.at/about.html#Under_the_hood" rel="nofollow">under the hood</a> section of the about page links to the <a class="external text" title="http://riese.joanneum.at/schema/core" href="http://riese.joanneum.at/schema/core" rel="nofollow">riese schema</a>. (Note: this is a simple RDF schema; no OWL in sight.) The schema models statistics as <tt>item</tt>s that link to <tt>time</tt>s, <tt>dataset</tt>s, <tt>dimension</tt>s, <tt>geo</tt> information, and a value (using <tt>rdf:value</tt>). </p>
<p>Every statistical data item is a <tt>riese:item</tt>. <tt>riese:item</tt>s are qualified with <tt>riese:dimension</tt>s, one of which is, in particular, <tt>dimension:Time</tt>. </p>
<p>The <a class="external text" title="http://riese.joanneum.at/ask.html" href="http://riese.joanneum.at/ask.html" rel="nofollow">"ask" page</a> gives two sample queries over the EuroStat RDF data, but those only deal in the datasets. RDF can be retrieved for the various Riese tables and data items by appending <tt>/content.rdf</tt> to the items' URIs and doing an HTTP <tt>GET</tt>. Here's an example of some of the RDF for a particular data item (this is not strictly legal Turtle, but you'll get the point): </p><pre>@prefix : &lt;http://riese.joanneum.at/data/&gt; .
@prefix riese: &lt;http://riese.joanneum.at/schema/core#&gt; .
@prefix dim: &lt;http://riese.joanneum.at/dimension/&gt; .
@prefix dim-schema: &lt;http://riese.joanneum.at/schema/dimension/&gt; .

:bp010 a riese:dataset ;
  # all dc:title's repeated as rdfs:label
  dc:title "Current account - monthly: Total" ;
  riese:data_start "2002m10" ; # proprietary format?
  riese:data_end   "2007m09" ;
  riese:structure  "geo\time" ; # not sure of this format
  riese:datasetOf :bp010/2007m03_ea .

:bp010/2007m03_ea a riese:Item ;
  dc:title "Table: bp010, dimensions: ea, time: 2007m03" ;
  rdf:value "7093" ; # not typed
  riese:dimension dim:geo/ea ;
  riese:dimension dim:time/2007m03 ;
  riese:dataset :bp010 .

dim:geo/ea a dim-schema:Geo ;
  dc:title "Euro area (EA11-2000, EA12-2006, EA13-2007, EA15)" .

dim:time/2007m03 a dim-schema:Time ;
  dc:title "" . # oops

dim-schema:Geo rdfs:subClassOf riese:Dimension ; dc:title "Geo" .
dim-schema:Time rdfs:subClassOf riese:Dimension ; dc:title "Time" .
</pre>
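<p>For illustration, a query over this model for every value in the <tt>bp010</tt> table at time <tt>2007m03</tt>, along with the title of its geographic dimension, might look like the sketch below. The URIs are written out in full because the slashes in names like <tt>dim:time/2007m03</tt> aren't legal in SPARQL prefixed names, and the <tt>dc:</tt> namespace is my assumption (the excerpt above doesn't declare it): </p><pre>PREFIX rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
PREFIX dc: &lt;http://purl.org/dc/elements/1.1/&gt;  # assumed namespace
PREFIX riese: &lt;http://riese.joanneum.at/schema/core#&gt;
PREFIX dim-schema: &lt;http://riese.joanneum.at/schema/dimension/&gt;

SELECT ?geoTitle ?value WHERE {
  ?item riese:dataset &lt;http://riese.joanneum.at/data/bp010&gt; ;
        riese:dimension &lt;http://riese.joanneum.at/dimension/time/2007m03&gt; ;
        riese:dimension ?geo ;
        rdf:value ?value .
  # keep only the geographic dimension of each item
  ?geo a dim-schema:Geo ;
       dc:title ?geoTitle .
}
</pre>
<p>Since the values are untyped literals, any numeric comparison or sorting would presumably need a cast first (e.g. <tt>xsd:decimal(?value)</tt>). </p>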
<p>(A lot of this is available in <a class="external text" title="http://riese.joanneum.at/dump/dic.nt" href="http://riese.joanneum.at/dump/dic.nt" rel="nofollow">dic.nt</a> (39 MB).) </p><a name="Summary"></a>
<h2><span class="mw-headline">Summary </span></h2>
<p>In summary, these three examples show three distinct approaches for modeling statistics: </p>
<ol>
<li>Simple, point-in-time statistics. Predicates that fully describe each statistic relate a (geographic, in this case) entity to the statistic's value. There's no way to represent time (or other dimensions) in this model other than to create a new predicate for every combination of dimensions (e.g. <tt>country:bolivia stat:1990population18-30male 123456</tt>). Queries are flat and rely on knowledge of or metadata (e.g. <tt>rdfs:label</tt>) about the predicates. There is no easy way to generate tables of related values. Observation: this approach effectively builds <i>a model of the real world</i>, ignoring statistical artifacts such as time, tables, and subtables. 
<li>Complex, point-in-time statistics. An initial predicate relates a (geographic, in this case) entity to both an aggregate value for the statistic, as well as to (via blank nodes) other predicates that represent dimensions. Aggregate values are available off of any point in the predicate chain. Applications need to be aware of the hierarchical predicate structure of the statistics for queries, but can reuse (and therefore link) some predicates amongst different statistics. Nested tables can easily be constructed from this model. Observation: this approach effectively builds <i>a model of the statistical domain in question (demographics, geography, economics, etc. as broken into statistical tables).</i> 
<li>Complex statistics over time. Each statistic (each number) is represented as an item with a value. Dimensions (including time) are also described as resources with values, titles, etc. In this approach, the entire model is described by a small number of predicates. Applications can flexibly query for different combinations of time and other dimensions, though they still must know the identifying information for the dimensions in which they are interested. Applications can fairly easily construct nested tables from this model. Observation: this approach effectively uses <i>a model of statistics (in general)</i> which in turn is used to express statistics about the domains in question. </li></ol><a name="Statistical_Abstract_data"></a>
<h2><span class="mw-headline">Statistical Abstract data </span></h2><a name="Simple_with_time"></a>
<h3><span class="mw-headline">Simple with time </span></h3>
<p>One of the simplest data tables in the Statistical Abstract gives <a class="external text" href="http://www.census.gov/compendia/statab/tables/08s1047.xls" rel="nofollow">statistics for airline on-time arrivals and departures</a>. A sample of how this table is laid out is: </p>
<table style="padding-right: 6px">
<tbody>
<tr>
<th>Airport </th>
<td colspan="2"><b>On-time Arrivals</b> </td>
<td colspan="2"><b>On-time Departures</b> </td></tr>
<tr>
<th></th>
<th>2006 Q1 </th>
<th>2006 Q2 </th>
<th>2006 Q1 </th>
<th>2006 Q2 </th></tr>
<tr>
<td><b>Total major airports</b> </td>
<td>77.0 </td>
<td>76.7 </td>
<td>79.0 </td>
<td>78.5 </td></tr>
<tr>
<td>Atlanta, Hartsfield </td>
<td>73.9 </td>
<td>75.5 </td>
<td>76.0 </td>
<td>74.3 </td></tr>
<tr>
<td>Boston, Logan International </td>
<td>75.6 </td>
<td>66.8 </td>
<td>80.5 </td>
<td>74.8 </td></tr></tbody></table>
<p>Overall, this is fairly simple. Every airport, for each time period has an on-time arrival percentage and an on-time departure percentage. If we simplified it even further by removing the use of multiple times, then it's just a simple grid spreadsheet (relating airports to arrival % and departure %). This does have the interesting (?) twist that the aggregate data (total major airports) is not simply a sum of the constituent data items (since we're dealing in percentages). </p><a name="Simple_point-in-time_approach"></a>
<h4><span class="mw-headline">Simple point-in-time approach </span></h4>
<p>If we ignore time (and choose 2006 Q1 as our point in time), then this data models as: </p><pre> ex:ATL ex:ontime-arrivals 73.9 ; ex:ontime-departures 76.0 .
 ex:BOS ex:ontime-arrivals 75.6 ; ex:ontime-departures 80.5 .
 ex:us-major-airports ex:ontime-arrivals 77.0 ; ex:ontime-departures 79.0 .
</pre>
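<p>(As a rough sketch of what "flat" querying means here, a SPARQL query over these triples might look like the following. The <tt>ex:</tt> namespace URI is a placeholder of my own choosing, not something defined by the data.) </p><pre> PREFIX ex: &lt;http://example.org/&gt;
 SELECT ?entity ?arrivals ?departures
 WHERE {
   ?entity ex:ontime-arrivals ?arrivals ;
           ex:ontime-departures ?departures .
 }
 ORDER BY ?entity
</pre>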
<p>This is simple, but ignores time. It also doesn't give any hint that <tt>ex:us-major-airports</tt> is a total/aggregate of the other data. We could encode time in the predicates themselves (<tt>ex:ontime-arrivals-2006-q1</tt>), but I think everyone would agree that that's a bad idea. We could also let each time range be a blank node off the subjects, but that assumes all subjects have data conforming to the same time increments. Any such approach starts to get close to the complex point-in-time approach, so let's look at that. </p><a name="Complex_point-in-time_approach"></a>
<h4><span class="mw-headline">Complex point-in-time approach </span></h4>
<p>If we ignore time and view the "total major airports" as unrelated to the individual airports, then we have no "nested tables" and this approach degenerates to the simple point-in-time approach, effectively: </p><pre> ex:ATL a ex:Airport ;
   dcterms:isPartOf ex:us-major-airports ;
   stat:details [
     ex:on-time-arrivals 73.9 ;
     ex:on-time-departures 76.0
   ] .
 ex:BOS a ex:Airport ;
   dcterms:isPartOf ex:us-major-airports ;
   stat:details [
     ex:on-time-arrivals 75.6 ;
     ex:on-time-departures 80.5
   ] .
 ex:us-major-airports
   dcterms:hasPart ex:ATL, ex:BOS ;
   stat:details [
     ex:on-time-arrivals 77.0 ;
     ex:on-time-departures 79.0 ;
   ] .    
</pre>
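<p>(To give a feel for how an application would have to follow the predicate structure, here is a sketch of a SPARQL query that pulls each member airport's figures out from under <tt>stat:details</tt>. The <tt>ex:</tt> and <tt>stat:</tt> namespace URIs are placeholders I've assumed for illustration; only the <tt>dcterms:</tt> one is the real Dublin Core terms namespace.) </p><pre> PREFIX ex:      &lt;http://example.org/&gt;
 PREFIX stat:    &lt;http://example.org/stat#&gt;
 PREFIX dcterms: &lt;http://purl.org/dc/terms/&gt;
 SELECT ?airport ?arrivals ?departures
 WHERE {
   ?airport dcterms:isPartOf ex:us-major-airports ;
            stat:details [
              ex:on-time-arrivals ?arrivals ;
              ex:on-time-departures ?departures
            ] .
 }
</pre>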
<p>We could treat time as a special-case that conditionalizes the statistics (<tt>stat:details</tt>) for any particular subject, such as: </p><pre> ex:ATL a ex:Airport ;
   dcterms:isPartOf ex:us-major-airports ;
   stat:details [
     stat:start "2006-01-01"^^xsd:date ;
     stat:end   "2006-03-31"^^xsd:date ;
     stat:details [
       ex:on-time-arrivals 73.9 ;
       ex:on-time-departures 76.0
     ]
   ] .
</pre>
<p>If we ignore time but view the "total major airports" statistics as an aggregate of the individual airports (which are subtables, then), we get this RDF structure: </p><pre> ex:us-major-airports
   ex:on-time-arrivals 77.0 ;
   ex:on-time-departures 79.0 ;
   ex:ATL [
     ex:on-time-arrivals 73.9 ;
     ex:on-time-departures 76.0
   ] ;
   ex:BOS [
     ex:on-time-arrivals 75.6 ;
     ex:on-time-departures 80.5
   ] .
</pre>
<p>This is interesting because it treats the individual airports as subtables of the dataset. I don't think it's really a great way to model the data, however. </p><a name="Complex_Statistics_Over_Time"></a>
<h4><span class="mw-headline">Complex Statistics Over Time </span></h4><pre> ex:ontime-flights a stat:Dataset ;
   dc:title "On-time Flight Arrivals and Departures at Major U.S. Airports: 2006" ;
   stat:date_start "2006-01-01"^^xsd:date ;
   stat:date_end "2006-12-31"^^xsd:date ;
   stat:structure "... something that explains how to display the stats ? ..." ;
   stat:datasetOf ex:atl-arr-2006q1, ex:atl-dep-2006q1, ... .
 
 ex:atl-arr-2006q1 a stat:Item ;
   rdf:value 73.9 ;
   stat:dataset ex:ontime-flights ;
   stat:dimension ex:Q12006 ;
   stat:dimension ex:arrivals ;
   stat:dimension ex:ATL .
 
 ex:atl-dep-2006q1 a stat:Item ;
   rdf:value 76.0 ;
   stat:dataset ex:ontime-flights ;
   stat:dimension ex:Q12006 ;
   stat:dimension ex:departures ;
   stat:dimension ex:ATL .
 
 ... more data items ...
 
 ex:Q12006 a stat:TimePeriod ;
   dc:title "2006 Q1" ;
   stat:date_start "2006-01-01"^^xsd:date ;
   stat:date_end "2006-03-31"^^xsd:date .
 
 ex:arrivals a stat:ScheduledFlightTime ;
   dc:title "Arrival time" .
 
 ex:departures a stat:ScheduledFlightTime ;
   dc:title "Departure time" .
 
 ex:ATL a stat:Airport ;
   dc:title "Atlanta, Hartsfield" .
 
 ... more dimension values ...
 
 stat:TimePeriod rdfs:subClassOf stat:Dimension ; dc:title "time period" .
 stat:ScheduledFlightTime rdfs:subClassOf stat:Dimension ; dc:title "arrival or departure" .
 stat:Airport rdfs:subClassOf stat:Dimension ; dc:title "airport" .
</pre>
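<p>(As a sketch of the kind of query this model allows, the following SPARQL asks for Atlanta's on-time arrival figure in every available time period, ordered chronologically. It assumes placeholder URIs for the <tt>ex:</tt> and <tt>stat:</tt> namespaces, and it still requires the application to know the identifiers <tt>ex:ATL</tt> and <tt>ex:arrivals</tt> up front.) </p><pre> PREFIX ex:   &lt;http://example.org/&gt;
 PREFIX stat: &lt;http://example.org/stat#&gt;
 PREFIX dc:   &lt;http://purl.org/dc/elements/1.1/&gt;
 PREFIX rdf:  &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
 SELECT ?period ?value
 WHERE {
   ?item a stat:Item ;
         stat:dataset ex:ontime-flights ;
         stat:dimension ex:ATL, ex:arrivals, ?p ;
         rdf:value ?value .
   ?p a stat:TimePeriod ;
      dc:title ?period ;
      stat:date_start ?start .
 }
 ORDER BY ?start
</pre>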
<p>First, this seems to be the most verbose. It also seems to give the greatest flexibility in terms of modeling time and querying the resulting data. One related alternative to this approach would replace dimension objects with dimension predicates, as in: </p><pre> ex:atl-arr-2006q1 a stat:Item ;
   rdf:value 73.9 ;
   stat:dataset ex:ontime-flights ;
   stat:date_start "2006-01-01"^^xsd:date ;
   stat:date_end "2006-03-31"^^xsd:date ;
   stat:airport ex:ATL ;
   stat:scheduled-flight-time ex:arrivals .
 
 stat:airport rdfs:subPropertyOf stat:dimension ; dc:title "airport" .
</pre>
<p>This may be a bit less verbose, but loses the ability to have multivalued dimensions such as <tt>stat:TimePeriod</tt> in the first example. </p><a name="Conclusion"></a>
<h2><span class="mw-headline">Conclusion </span></h2>
<p>The riese approach seems the best combination of flexibility and usability. It should allow us to recreate the data-table structures with a reasonable degree of fidelity in another environment (e.g. on the Web), as well as to construct a basic semantic repository by attaching definitions to the various statistical entities, facets, and properties. All that said, the proof is in the pudding, and I'm quite open to other suggestions. </p>]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2008/03/modeling_statistics_in_rdf_a_s.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2008/03/modeling_statistics_in_rdf_a_s.html</guid>
         <category>semantic web</category>
         <pubDate>Sat, 08 Mar 2008 03:38:25 -0500</pubDate>
      </item>
            <item>
         <title>Anzo.*: Building Semantic Applications in Heterogeneous Environments</title>
         <description><![CDATA[<p>At <a href="http://www.cambridgesemantics.com">Cambridge Semantics</a> we're busy working on what will become version 3 of <a href="http://openanzo.org">Open Anzo</a>. As I've written about <a href="http://thefigtrees.net/lee/blog/2006/11/semantic_web_technologies_in_t.html">before</a>, our interest in Semantic Web technologies lies in the powerful applications that can be built by taking advantage of RDF's data model. To this end, we've continually sought RDF programming models that contain features necessary to building these applications:</p> <ul> <li>Named graphs (quads) support, for modularizing applications' data  <li>Replication, for offline applications and snappy user experience  <li>Notification, for real-time collaborative updates  <li>Role-based access control, to facilitate a multi-user environment  <li>Versioning, to maintain an auditable history of data changes</li></ul> <p>To promote a consistent development experience between the various environments that we support--Java development, Web development, Windows development--we've worked to define a core set of <a href="http://openanzo.org/projects/openanzo/wiki/AnzoClientDesign">abstract, client-side APIs</a> (documentation is currently sound but not complete) for building semantic applications that can take advantage of these enterprise features. Currently, we have three concrete instantiations of this API: Anzo.java, Anzo.js, and Anzo.NET. Version 3 of Anzo includes <a href="http://www.openanzo.org/projects/openanzo/wiki/V3DesignDocuments">many other architectural improvements</a> intended to help us realize Anzo's status as an open-source semantic middleware platform, and we're not done yet. We do our best to keep the latest version of the code in subversion stable, however, so feel free to check it out. The <a href="http://groups.google.com/group/openanzo">mailing list</a> is a great place to ask questions. 
As we get closer to a formal release of Anzo 3, we'll have more code samples, tutorials, and demos to share, so stay tuned...</p>]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2008/02/anzo_building_semantic_applica.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2008/02/anzo_building_semantic_applica.html</guid>
         <category>semantic web</category>
         <pubDate>Wed, 27 Feb 2008 13:03:26 -0500</pubDate>
      </item>
            <item>
         <title>Why SPARQL?</title>
         <description><![CDATA[<p>I'm quite pleased to have played a part in helping <a href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a> become a W3C Recommendation. As we were putting together the <a href="http://www.w3.org/2007/12/sparql-pressrelease">press release</a> that accompanied the publication of the SPARQL recommendations, <a href="http://www.w3.org/People/Jacobs/">Ian Jacobs</a>, <a href="http://www.w3.org/People/Ivan/">Ivan Herman</a>, <a href="http://www.w3.org/People/Berners-Lee/">Tim Berners-Lee</a>, and I put together some comments (in bullet point form) explaining some of the benefits of SPARQL. They do a good job of capturing a lot of what I find appealing about SPARQL, and I wanted to share them with other people. I don't think these are the best examples of SPARQL's value or the most eloquently expressed, but I do think they capture a lot of the essence of SPARQL. (While some of the text is attributable to me, parts are attributable to Ian, Ivan, and Tim.)</p> <hr>  <ul> <li>SPARQL is to the Semantic Web (and, really, the Web in general) what SQL is to relational databases. (This is effectively Tim's quotation from the press release.)  <li>If we view the Semantic Web as a global collection of databases, SPARQL can make the collection look like one big database. SPARQL enables us to reap the benefits of federation. Examples:  <ul> <li>Federating information from multiple Web sites (mashups)  <li>Federating information from multiple enterprise databases (e.g. manufacturing and customer orders and shipping systems)  <li>Federating information between internal and external systems (e.g. for outsourcing, public Web databases (e.g. <a href="http://www.ncbi.nlm.nih.gov/">NCBI</a>), supply-chain partners)</li></ul> <li>There are many distinct database technologies in use, and it's of course impossible to dictate a single database technology at the scale of the Web. 
RDF (the Semantic Web data model), though, serves as a standard <em>lingua franca</em> (least common denominator) in which data from disparate database systems can be represented. SPARQL, then, is the query language for that data. As such, SPARQL hides the details of a server's particular data management and structure. This reduces costs and increases robustness of software that issues queries.  <li>SPARQL saves development time and cost by allowing client applications to work with only the data they're interested in. (This is as opposed to bringing it all down and spending time and money writing software to extract the relevant bits of information.)  <ul> <li>Example: Find US cities' population, area, and mass transit (bus) fare, in order to determine if there is a relationship between population density and public transportation costs.  <li>Without SPARQL, you might tackle this by writing a first query to pull information from cities' pages on Wikipedia, a second query to retrieve mass transit data from another source, and then code to extract the population and area and bus fare data for each city.  <li>With SPARQL, this application can be accomplished by writing a single SPARQL query that federates the appropriate data sources. The application developer need only write a single query and no additional code.</li></ul> <li>SPARQL builds on other standards including RDF, XML, HTTP, and WSDL. This allows reuse of existing software tooling and promotes good interoperability with other software systems. 
Examples:  <ul> <li>SPARQL results are expressed in XML: XSLT can be used to generate friendly query result displays for the Web  <li>It's easy to issue SPARQL queries, given the abundance of HTTP library support in Perl, Python, PHP, Ruby, etc.</li></ul></li></ul> <p>Finally, I scribbled down some of my own thoughts on how SPARQL takes the appealing principles of a Service Oriented Architecture (SOA) one step further:</p> <ul> <li>With SOA, the idea is to move away from tightly-coupled client-server applications in which all of the client code needs to be written specifically for the server code and vice versa. SOA says that if instead we just agree on service interfaces (contracts) then we can develop and maintain services and clients that adhere to these interfaces separately (and therefore more cheaply, scalably, and robustly).  <li>SPARQL takes some of this one step further. For SOA to work, services (people publishing data) still have to define a service, a set of operations that they'll use to let others get at their information. And someone writing a client application against such a service needs to adhere to the operations in the service. If a service has 5 operations that return various bits of related data and a client application wants some data from a few operations but doesn't want most of it, the developer still must invoke all 5 operations and then write the logic to extract and join the data relevant for her application. This makes for marginally complex software development (and complex == costly, of course).  <li>With SPARQL, a service-provider/data-publisher simply provides one service: SPARQL. Since it's a query language accessible over a standard protocol (HTTP), SPARQL can be considered a 'universal service'. 
Instead of the data publisher choosing a limited number of operations to support a priori and client applications being forced to conform to these operations, the client application can ask precisely the questions it wants to retrieve precisely the information it needs. Instead of 5 service invocations + extra logic to extract and join data, the client developer need only author a single SPARQL query. This makes for a simpler application (and, of course, less costly).</li></ul> <p>As an example, consider an online book merchant. Suppose I want to create a Web site that finds books by my favorite author that are selling for less than $15, including shipping. The merchant supplies three relevant services:</p> <ol> <li><em>Search. </em>Includes search by author. Returns book identifiers.  <li><em>Book lookup. </em>Takes a book identifier and returns the title, price, abstract, shipping weight, etc.  <li><em>Shipping lookup. </em>Takes total order weight, shipping method, and zip code, and returns a shipping cost.</li></ol> <p>To create my Web site without SPARQL, I'd need to:</p> <ol> <li>Invoke the search service. (Query 1)  <li>Write code to extract the result identifiers and, for each one, invoke the book lookup service. (Code 1, Query 2 (issued multiple times))  <li>Write code to extract the price and weight and, for each book, invoke the shipping lookup service with that book's weight. (Code 2, Query 3 (issued multiple times))  <li>Write code to add each book's price and shipping cost and check if it's less than $15. (Code 3)</li></ol> <p>Now, suppose the book merchant exposed this same data via a SPARQL endpoint. The new approach is:</p> <ol> <li>Use the SPARQL protocol to ask a SPARQL query with all the relevant parameters (Query 1 (issued once))</li></ol> <p>For the record, the query might look something like:</p><pre>PREFIX : &lt;http://example.com/service/sparql/&gt;
SELECT ?book ?title
  FROM :inventory
 WHERE {
  ?book 
    a :book ; :author ?author ; 
    :title ?title ; :price ?price ;
    :weight ?weight .
  ?author :name "My favorite Author" .
  FILTER(?price + :shipping(?weight) &lt; 15) .
}
</pre>
<p>(This example also illustrates another feature of SPARQL: SPARQL is extensible via the use of new FILTER functions that can allow a query to invoke operations (in this case, a function (<tt>:shipping</tt>) that gives shipping cost for a particular order weight) defined by the SPARQL endpoint.)</p>]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2008/01/why_sparql.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2008/01/why_sparql.html</guid>
         <category>semantic web</category>
         <pubDate>Fri, 25 Jan 2008 01:00:28 -0500</pubDate>
      </item>
            <item>
         <title><![CDATA[Scientific American: &quot;The Semantic Web in Action&quot;]]></title>
         <description><![CDATA[<p>I'm pleased to write that the December 2007 <a href="http://www.sciam.com/sciammag/?contents=2007-12">issue</a> of <a href="http://www.sciam.com/sciammag"><em>Scientific American</em></a> contains an article titled <a href="http://www.sciam.com/article.cfm?id=the-semantic-web-in-action">"The Semantic Web in Action"</a>, coauthored by <a href="http://www.w3.org/People/Ivan/">Ivan Herman</a>, <a href="http://www.partners.org/cird/AboutUs.asp?cBox=Staff&amp;stAb=toh">Tonya Hongsermeier</a>, <a href="http://eneumann.org/">Eric Neumann</a>, <a href="http://esw.w3.org/topic/SusieStephens">Susie Stephens</a>, and me.</p> <p>We were invited to write the article as a follow-up to the original 2001 <em>Scientific American</em> Semantic Web <a href="http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21">article</a> by Tim Berners-Lee, Jim Hendler, and Ora Lassila. We wanted to share some practical examples of problems currently being solved with Semantic Web technologies, particularly in health care and life sciences. The article presents two detailed case studies. The first is the work of a team at Cincinnati Children's Hospital Medical Center who use RDF in conjunction with PageRank-esque algorithms to prioritize potential drug targets for cardiovascular diseases. The second case focuses on the University of Texas Health Science Center's SAPPHIRE system. SAPPHIRE integrates information from various health care providers to allow public health officials to better assess potential emerging public health risks and disease epidemics. 
The article also talks about the potential for Semantic Web technologies and the work of companies such as Agfa and Partners to help health care providers deal with the rate of knowledge acquisition and change in their clinical decision support (CDS) systems.</p> <p>Aside from these case studies, the article takes somewhat of a whirlwind tour across the current landscape of Semantic Web applications. Along the way, <a href="http://www.w3.org/RDF/">RDF</a>, <a href="http://www.w3.org/TR/owl-features/">OWL</a>, <a href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a>, <a href="http://www.w3.org/TR/grddl/">GRDDL</a>, and <a href="http://www.foaf-project.org/">FOAF</a> all get mentions. <a href="http://sciencecommons.org/">Science Commons</a> and <a href="http://dbpedia.org/">DBpedia</a> are briefly touched on, and the article acknowledges a variety of companies that are engaged in Semantic Web application research, prototyping, or deployment: British Telecom, Boeing, Chevron, MITRE, Ordnance Survey, Vodafone, Harper's Magazine, Joost, IBM, Hewlett-Packard, Nokia, Oracle, Adobe, Aduna, Altova, @semantics, Talis, OpenLink, TopQuadrant, Software AG, Eli Lilly, Pfizer, Garlik. And there were loads that couldn't be included in the end due to space restrictions, all of which is a testament to the continued growth in adoption of these technologies. </p> <p>Unfortunately, the article is not currently available for free online. An electronic version is available (along with the rest of the December 2007 issue) from <em>Scientific American's</em> Web site for US$7.95, and the issue should also be available at newsstands in the US for a bit longer. I'm not sure when/if the article is available on newsstands across the rest of the world. 
I've been working with the copyright editors at <em>Scientific American</em> in an attempt to procure the rights to publish the article on my own Web site (and/or possibly on the W3C's site), but they haven't yet responded to my application.</p> <p>In any case, it was a fantastic experience working with my colleagues to bring some information on the progress of the Semantic Web to the readers of <em>Scientific American. </em>I've gotten some great feedback from family, friends, and colleagues who have read the article. Several people in the Semantic Web community have let me know that they've found the article to be useful material for helping introduce people to the ideas and applications behind Semantic Web technologies. So please check out the article if you're so inclined, and I'd love to hear what you think. I'll also be sure to update this space if I'm able to secure the rights to publish the full text of the article here.</p> <p>26-Mar-2008 Update: I've since <a href="http://www.thefigtrees.net/lee/blog/2008/03/now_available_online_scientifi.html">received permission</a> to publish the article. Enjoy!</p>]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2007/12/scientific_american_the_semant.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2007/12/scientific_american_the_semant.html</guid>
         <category>semantic web</category>
         <pubDate>Thu, 20 Dec 2007 02:54:28 -0500</pubDate>
      </item>
            <item>
         <title>Announcing: Open Anzo 2.5 released</title>
         <description><![CDATA[<p>As <a href="http://www.thefigtrees.net/lee/blog/2007/10/introducing_cambridge_semantic.html">promised</a>, the <a href="http://openanzo.org">Open Anzo project</a>&nbsp;has released version 2.5 of the Anzo enterprise RDF store. Version 2.5 is a stable release with&nbsp;a collection of&nbsp;bug fixes and new features since the fork from Boca. The <a href="http://www.openanzo.org/projects/openanzo/browser/branches/openanzo-2.5/anzo-maven/VERSION.txt?rev=576">release notes</a>&nbsp;enumerate the additions, improvements, and changes, but here are some of the more significant ones:</p> <ul> <li>Add Oracle database support  <li>Add GROUP BY clause and COUNT(*) to Glitter SPARQL engine (more on this in a separate post, but along the lines of what exists in ARQ, Virtuoso, and RAP)  <li>Query performance improvements against both named graphs and metadata graphs  <li>Extensive Javadocs for all public classes, interfaces, methods, and member variables</li></ul> <p>Things you can do:</p> <ul> <li><a href="http://www.openanzo.org/projects/openanzo/downloads">Download and install</a> Open Anzo: release&nbsp;2.5, nightly snapshots, or the source from SVN  <li><a href="http://www.openanzo.org/projects/openanzo/wiki">Learn</a> from&nbsp;the Open Anzo wiki  <li><a href="http://www.openanzo.org/projects/openanzo/report/1">View</a> open tickets showing some of what's coming  <li><a href="http://groups.google.com/group/openanzo">Join</a> the Open Anzo development community  <li><a href="http://www.openanzo.org/javadoc/openanzo/2.5/">Peruse</a> the Anzo 2.5 Javadocs</li></ul>]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2007/10/announcing_open_anzo_25_releas.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2007/10/announcing_open_anzo_25_releas.html</guid>
         <category>semantic web</category>
         <pubDate>Wed, 24 Oct 2007 11:54:52 -0500</pubDate>
      </item>
            <item>
         <title>Introducing: Cambridge Semantics and the Open Anzo project</title>
         <description><![CDATA[<p>It's been a while since I last posted here to <em><a href="http://www.thefigtrees.net/lee/blog/2007/04/qotd_word_choice.html">muse about the differences between "the Semantic Web" and "Semantic Web technologies"</a>. </em>Since then, I've been quite pleased to see the <a href="http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData">Linking Open Data project</a>&nbsp;continue to soar, including an extremely successful BoF and panel at <a href="http://www2007.org/">WWW 2007</a> in Banff. New data sources continue to be linked in to the Semantic Web, including data from <a href="http://wikicompany.org/wiki/Main_Page">Wikicompany</a>, <a href="http://flickr.com">flickr</a>, and <a href="http://govtrack.us">GovTrack</a>. The project maintains a <a href="http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets">list</a>&nbsp;and a <a href="http://richard.cyganiak.de/2007/10/lod/lod-datasets_2007-10-08.png">picture</a>&nbsp;of the growing Web of linked open data.</p> <p>Meanwhile, I have not been idle in my work to advance Semantic Web technologies inside enterprises. In July, I left <a href="http://www.ibm.com">IBM</a>&nbsp;and co-founded <a href="http://www.cambridgesemantics.com/">Cambridge Semantics, Inc.</a>&nbsp;Building upon the work that began with the open-source <a href="http://ibm-slrp.sf.net">IBM Semantic Layered Research Platform</a>, Cambridge Semantics is dedicated to building&nbsp;feature-rich semantic middleware that can power a vast breadth of semantic applications that realize the potential of the full stack of Semantic Web technologies.</p> <p>One of the first things that we've done at Cambridge Semantics is set up <a href="http://www.openanzo.org/">the Open Anzo project</a>. Anzo is an open-source fork of Boca, an enterprise RDF store. 
Anzo&nbsp;starts with the same rich feature set of Boca, including named graphs, replication, notification, access controls, and full revision histories. To this, Anzo (so far) adds a number of bug fixes and support for running on top of an Oracle RDBMS. There's a new release of Anzo coming quite soon, and we're quite excited about some of the current and future development going on for Anzo. To learn more, feel free to join the <a href="http://groups.google.com/group/openanzo">Open Anzo discussion group</a>, check out the <a href="http://www.openanzo.org/projects/openanzo/wiki">wiki</a>, or <a href="http://www.openanzo.org/projects/openanzo/downloads">download the source or a nightly build</a>. We're also actively looking for like-minded folk to work with us to enhance and improve Anzo and to expand the scope of the project. <a href="mailto:lee@cambridgesemantics.com">Let me know</a>&nbsp;if you might be interested in sponsoring, using, or contributing to Anzo.</p> <p>I'll have a lot more to share about our team, our vision, and our software in the coming weeks and months. It's an exciting time, both for me personally, but more so for the promise of the Semantic Web and Semantic Web technologies. I'm glad to be blogging once more, and look forward to having more to say.</p>]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2007/10/introducing_cambridge_semantic.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2007/10/introducing_cambridge_semantic.html</guid>
         <category>semantic web</category>
         <pubDate>Sun, 14 Oct 2007 20:42:00 -0500</pubDate>
      </item>
            <item>
         <title>QotD: Word Choice</title>
         <description><![CDATA[<p>Danny <a href="http://dannyayers.com/2007/04/17/qotd--slamming">picked up</a> an interesting take on the foes of the Semantic Web from <a href="http://www.wasab.dk/morten/blog/archives/2007/04/14/arguments-against-the-semantic-web-getsemantic">Morten Frederiksen</a>. I was surfing that way today and noticed this gem in the latest comment from <a href="http://semwebdev.keithalexander.co.uk/blog/">Keith Alexander</a>:</p>

<blockquote>Perhaps the word that causes the trouble isn&#8217;t <i>Semantic</i>, but <i>The</i>?</blockquote>

<p>I believe in an ultimate goal similar to that of Tim Berners-Lee and also that of the <a href="http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData">Linking Open Data</a> SWEO community project. But I also see tremendous value in the adoption of Semantic Web technologies within enterprise applications and in limited, narrowly-scoped corners of the Internet and intranets. To me, it's clear that these goals are not incompatible with each other. But I <i>do</i> find myself constantly juggling the appropriate use of the phrases <i>the Semantic Web</i> and <i>Semantic Web technologies</i> depending on my audience. There's a lot of significance and (dare I say?) semantics in that innocent-looking three-letter word...</p>
]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2007/04/qotd_word_choice.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2007/04/qotd_word_choice.html</guid>
         <category>semantic web</category>
         <pubDate>Sun, 22 Apr 2007 17:16:07 -0500</pubDate>
      </item>
            <item>
         <title>Updates to sparql.js</title>
         <description><![CDATA[<p>I'm not sure if anyone is using Elias and my <a href="http://thefigtrees.net/lee/sw/sparql.js">sparql.js</a> JavaScript library for issuing SPARQL queries. (Probably not, given its Firefox-and-friends-only orientation and the standard cross-site XMLHttpRequest security restrictions.) Since I <a href="http://thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html">first blogged about the library</a> last year, we've made a few changes to the library. Most notably, we've removed the dependency on the Yahoo! connection manager (or on any other third-party libraries, for that matter). Additionally, we've added a <tt>setRequestHeader</tt> method which passes the given headers and values along to the underlying HTTP request object. We use this functionality, for example, to provide user credentials (via HTTP Basic Auth) when SPARQLing against a <a href="http://ibm-slrp.sourceforge.net/2006/11/20/boca-the-rdf-repository-component-of-the-ibm-semantic-layered-research-platform/">Boca</a> server.</p>

<p>The update should be transparent to any current uses of the library. Please let me know if you try it out and experience any problems.</p>

]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2007/02/updates_to_sparqljs.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2007/02/updates_to_sparqljs.html</guid>
         <category>semantic web</category>
         <pubDate>Wed, 07 Feb 2007 01:04:22 -0500</pubDate>
      </item>
            <item>
         <title>Announcing: Boca 1.8 - new database support</title>
         <description><![CDATA[<p>While I've been writing <a href="http://www.thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_survey.html">dense</a> <a href="http://www.thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_vision.html">treatises</a> on Semantic Web development, Matt's been hard at work on the latest release of Boca. Matt's <a href="http://ibm-slrp.sourceforge.net/2007/01/19/boca-version-18/">announcement of Boca 1.8</a> carries all the details as well as a look at what Boca 2.0 will bring. Amidst the usual slew of bug fixes, usability improvements, and performance fixes, the major addition to Boca is support for three new databases beyond <a href="http://www.ibm.com/db2">DB2</a>. Boca now also runs on <a href="http://www.mysql.com/">MySQL</a>, <a href="http://www.postgresql.org/">PostgreSQL</a>, and <a href="http://hsqldb.org/">HSQLDB</a>. Cool stuff.</p>

<p>In other <a href="http://ibm-slrp.sourceforge.net/">Semantic Layered Research Platform</a> news, we're working towards pushing out stable releases (with documentation and installation packaging) of two more of our components: Queso (Atom-driven Web interface to Boca) and DDR (binary data repository with metadata-extractor infrastructure to store metadata within Boca). We're hoping to get these out by the middle of February, so stay tuned.</p>
]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2007/01/announcing_boca_18_new_databas.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2007/01/announcing_boca_18_new_databas.html</guid>
         <category>semantic web</category>
         <pubDate>Fri, 19 Jan 2007 17:57:04 -0500</pubDate>
      </item>
            <item>
         <title>Using RDF on the Web: A Vision</title>
         <description><![CDATA[<p>(This is the second of two posts about using RDF on the Web. The first post was <a href="http://www.thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_survey.html">a survey of approaches for creating RDF-data-driven Web applications</a>.) All existing implementations referred to in this post are discussed in more detail and linked to in part one.</p>

<p>Here's what I would like to see, along with some thoughts on what is or is not implemented. It's by no means a complete solution and there are plenty of unanswered questions. I'd also never claim that it's the right solution for all or most applications. But I think it has a certain elegance and power that would make developing certain types of Web applications straightforward, quick, and enjoyable. Whenever I refer to "the application" or "the app", I'm talking about a browser-based Web application implemented in JavaScript.</p>

<ul>
<li><p>To begin with, I imagine servers around the Web storing domain-specific RDF data. This could be actual, materialized RDF data or virtual RDF views of underlying data in other formats. This first piece of the vision is, of course, widely implemented (e.g. Jena, Sesame, Boca, Oracle, and Virtuoso).</p></li>


<li><p>The application fetches RDF from such a server. This may be done in a variety of ways:</p>
  <ul>
    <li>An HTTP <tt>GET</tt> request for a particular RDF/XML or Turtle document</li>
    <li>An HTTP <tt>GET</tt> request for a particular named graph within a quad store (a la Boca or Sesame)</li>
    <li>A SPARQL <tt>CONSTRUCT</tt> query extracting and transforming the pieces of the domain-specific data that are most relevant to the application</li>
    <li>A SPARQL <tt>DESCRIBE</tt> query requesting RDF about a particular resource (URI)</li>
  </ul>
  <p>In my mind, the <tt>CONSTRUCT</tt> approach is the most appealing method here: it allows the application to massage data which it may be receiving from multiple data sources into a single domain-specific RDF model that can be as close as possible to the application's own view of the world. In other words, reading the RDF via a query effectively allows the application to <i>define its own API</i>.</p>
  <p>Once again, the software for this step already exists via traditional Web servers and SPARQL protocol endpoints.</p>
</li>

<li><p>Second, the application must parse the RDF into a client-side model. Precisely how this is done depends on the form taken by the RDF received from the server:</p>
  <ul>
    <li><i>The server returns RDF/XML.</i> In this case, the client can use Jim Ley's parser to end up with a list of triples representing the RDF graph. The software to do this is already implemented.</li>
    <li><i>The server returns Turtle.</i> In this case, the client can use Masahide Kanzaki's parser to end up with a list of triples representing the RDF graph. The software to do this is already implemented.</li>
    <li><i>The server returns RDF/JSON.</i> In this case, the client can use Douglas Crockford's <a href="http://www.json.org/json.js">JSON parsing library</a> (effectively a regular expression security check followed by a call to <tt>eval(...)</tt>). While the software is implemented here, the RDF/JSON standard which I've cavalierly tossed about so far does not yet exist. Here, I'm imagining a specification which defines RDF/JSON based on the common JavaScript data structure used by the above two parsers. (A bit of work probably still needs to be done if this were to become a full RDF/JSON specification, as I do not believe the current format used by the two parsers can distinguish blank node subjects from subjects with URIs.)</li>
  </ul>
<p>In any case, we now have on the client a simple RDF graph of data specific to the domain of our application. Yet as I've said before, we'd like to make application development easier by moving away from triples at this point into data structures which more closely represent the concepts being manipulated by the application.</p>
</li>

<li><p>The next step, then, is to map the RDF model into an application-friendly JavaScript object model. If I understand ActiveRDF correctly (and in all fairness I've only had the chance to play with it a very limited amount), it will examine either the ontological statements or instance data within an RDF model and will generate a Ruby class hierarchy accordingly. The <a href="http://wiki.activerdf.org/IntroRDF">introduction to ActiveRDF</a> explains the dirty-but-well-appreciated trick that is used: "<i>Just use the part of the URI behind the last &#8221;/&#8221; or &#8221;#&#8221; and Active RDF will figure out what property you mean on its own.</i>" Of course, sometimes there will be ambiguities, clashes, or properties written to which did not already exist (with full URIs) in the instance data received; in these cases, manual intervention will be necessary. But I'd suggest that in many, many cases, applying this sort of best-effort heuristic to a domain-specific RDF model (especially one which the application has selected specifically via a <tt>CONSTRUCT</tt> query) will result in extremely natural object hierarchies.</p>
<p>None of this piece is implemented at all. I'd imagine that it would not be too difficult, following the model set forth by the ActiveRDF folks.</p>
<p><i>Late-breaking news:</i> <a href="http://dustfeed.blogspot.com/">Niklas Lindstr&#246;m</a>, developer of the Python RDF ORM system <a href="http://oort.to/">Oort</a>, followed up on my last post and said (among other interesting things):</p>
<blockquote>
<q>I use an approach of "removing dimensions": namespaces, I18N (optionally), RDF-specific distinctions (collections vs. multiple properties) and other forms of graph traversing.</q>
</blockquote>
<p>Sounds like there would be some more simplification processes that could be adapted from Oort in addition to those adapted from ActiveRDF.</p>
</li>

<li><p>The main logic of the Web application (and the work of the application developer) goes here. The developer receives a domain model and can render it and attach logic to it in any way he or she sees fit. Often this will be via a traditional model-view-controller approach: this approach is facilitated by toolkits such as <a href="http://dojotoolkit.org/">dojo</a> or even via a system such as <a href="http://nike-templates.org/">nike templates</a> (n&#233;e microtemplates). Thus, the software to enable this meat-and-potatoes part of application development already exists.</p>

<p>In the course of the user interacting with the application, certain data values change, new data values are added, and/or some data items are deleted. The application controller handles these mutations via the domain-specific object structures, without regard to any RDF model.</p></li>


<li>
<p>When it comes time to commit the changes (this could happen as changes occur or once the user saves/commits his or her work), standard JavaScript (i.e. a reusable library, rather than application-specific code) recognizes what has changed and maps (inverts) the objects back to the RDF model (as before, represented as arrays of triples). This inversion is probably performed by the same library that automatically generated the object structure from the RDF model in the first place. As with that piece of this puzzle, this library does not yet exist.</p>

<p>Reversing the RDF ORM mapping is clearly challenging, especially when new data is added which has not been previously seen by the library. In some cases--perhaps even in most?--the application will need to provide hints to the library to help the inversion. I imagine that the system probably needs to keep an untouched deep copy of the original domain objects to allow it to find new, removed, and dirty data at this point. (An alternative would be requiring adds, deletes, and mutations to be performed via methods, but this constrains the natural use of the domain objects.)
</p>
</li>

<li>
<p>Next, we determine the RDF difference between our original model and our updated model. The canonical work on RDF deltas is <a href="http://www.w3.org/DesignIssues/Diff">a design note</a> by <a href="http://www.w3.org/People/Berners-Lee/">Tim Berners-Lee</a> and <a href="http://www.w3.org/People/Connolly/">Dan Connolly</a>. Basically, though, an RDF diff amounts simply to a collection of triples to remove and a collection of triples to add to a graph. No (JavaScript) code yet exists to calculate RDF graph diffs, though the algorithms are widely implemented in other environments including <a href="http://www.w3.org/2000/10/swap/doc/cwm.html">cwm</a>, <a href="http://wymiwyg.org/2005/12/22/announicing-rdf-utils?appendLang=en">rdf-utils</a>, and <a href="http://ontoware.org/projects/semversion/">SemVersion</a>. We also work often with RDF diffs in Boca (when the Boca client replicates changes to a Boca server). I'd hope that this implementation experience would translate easily to a JavaScript implementation.
</p>
</li>

<li><p>
Finally, we serialize the RDF diffs and send them back to the data source. This requires two components that are not yet well-defined:
<ul>
  <li><i>A serialization format for the RDF diffs.</i> Tim and Dan's note uses the ability to quote graphs within <a href="http://www.w3.org/DesignIssues/Notation3">N3</a> combined with a handful of predicates (<tt>diff:replacement</tt>, <tt>diff:deletion</tt>, and <tt>diff:insertion</tt>). I can also imagine a simple extension of (whatever ends up being) the RDF/JSON format to specify the triples to remove and add:
<pre>
  {
    'add' : [ <i>RDF/JSON triple structures go here</i> ],
    'remove' : [ <i>RDF/JSON triple structures go here</i> ]
  }
</pre>
 </li>
  <li><i>An endpoint or protocol which accepts this RDF diff serialization.</i> Once we've expressed the changes to our source data, of course, we need somewhere to send them. Preferably, there would be a standard protocol (&#224; la the <a href="http://www.w3.org/TR/rdf-sparql-protocol/">SPARQL Protocol</a>) for sending these changes to a server. To my knowledge, endpoints that accept RDF diffs to update RDF data are not currently implemented. (Late-breaking addition: on my first post, Chris and Richard both pointed me to Mark Baker's work on <a href="http://www.markbaker.ca/2003/05/RDF-Forms/">RDF forms</a>. While I'm not very familiar with any existing uses of this work, it looks like it might be an interesting way to describe the capabilities of an RDF update endpoint.)
  </li>
</ul>
</p>

<p>As an alternative for this step, the entire client-side RDF model could be serialized (to RDF/XML or to N-Triples or to RDF/JSON) and HTTP <tt>PUT</tt> back to an origin server. This strategy seems to make the most sense in a document-oriented system; to my knowledge this is also not currently implemented.</p>
</li>

</ul>
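
<p>The unimplemented middle steps above (mapping triples to objects, inverting that mapping, and computing the diff) can be sketched roughly as follows. This is only an illustration under stated assumptions: the triple shape <tt>{ s, p, o }</tt> of plain strings is made up rather than taken from any existing parser or RDF/JSON format, all function names are hypothetical, and blank nodes, datatypes, and language tags are ignored. Comparing against the original triples stands in for the untouched deep copy mentioned above.</p>
<pre>
```javascript
// Assumed triple shape: { s, p, o }, all plain strings (hypothetical).
const triples = [
  { s: 'http://example.org/people/lee', p: 'http://xmlns.com/foaf/0.1/name', o: 'Lee' },
  { s: 'http://example.org/people/lee', p: 'http://xmlns.com/foaf/0.1/nick', o: 'lee' }
];

// ActiveRDF-style heuristic: the part of the URI after the last "/" or "#".
function localName(uri) {
  return uri.slice(Math.max(uri.lastIndexOf('/'), uri.lastIndexOf('#')) + 1);
}

// Map triples to one object per subject, remembering the full predicate URI
// behind each short property name so the mapping can be inverted later.
function toObjects(triples) {
  const objects = Object.create(null);
  const predicates = new Map();
  for (const t of triples) {
    const name = localName(t.p);
    const known = predicates.get(name);
    if (known !== undefined) {
      // Two different URIs share a local name: manual intervention is needed.
      if (known !== t.p) throw new Error('Ambiguous property name: ' + name);
    }
    predicates.set(name, t.p);
    if (!objects[t.s]) objects[t.s] = Object.create(null);
    if (!objects[t.s][name]) objects[t.s][name] = [];
    objects[t.s][name].push(t.o);
  }
  return { objects: objects, predicates: predicates };
}

// Invert the mapping: objects back to triples. A property name never seen in
// the original data has no known URI, so the application must supply a hint.
function toTriples(objects, predicates) {
  const out = [];
  for (const s of Object.keys(objects)) {
    for (const name of Object.keys(objects[s])) {
      if (!predicates.has(name)) throw new Error('No URI known for property: ' + name);
      for (const o of objects[s][name]) out.push({ s: s, p: predicates.get(name), o: o });
    }
  }
  return out;
}

// An RDF diff: a collection of triples to add and a collection to remove.
function diff(before, after) {
  const key = (t) => JSON.stringify([t.s, t.p, t.o]);
  const oldKeys = new Set(before.map(key));
  const newKeys = new Set(after.map(key));
  return {
    add: after.filter((t) => !oldKeys.has(key(t))),
    remove: before.filter((t) => !newKeys.has(key(t)))
  };
}

const model = toObjects(triples);
const lee = model.objects['http://example.org/people/lee'];
lee.nick = ['technicalee'];  // the application edits plain objects, not triples
const changes = diff(triples, toTriples(model.objects, model.predicates));
```
</pre>
<p>Here <tt>changes.add</tt> holds the new <tt>nick</tt> triple and <tt>changes.remove</tt> holds the old one, which is exactly the add/remove pair that would be serialized and sent back to the data source.</p>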

<p>That's my vision, as raw and underdeveloped as it may be. There are a large number of extensions, challenges and related work that I have not yet mentioned, but which will need to be addressed when creating or working with this type of Web application. Some discussion of these is also in order.</p>

<h4>Handling Multiple Sources of Data</h4>

<p>To use the above Web-application-development environment to create Web 2.0-style mash-ups, most of the steps would need to be performed once per data source being integrated. This adds to the system a provenance requirement, whereby the libraries could offer the application a unified view of the domain-specific data while still maintaining links between individual data elements and their source graphs/servers/endpoints to facilitate update. When the RDF diffs are computed, they would need to be sent back to the proper origins. Also, the sample JavaScript structures that I've mentioned as a base for RDF/JSON and the RDF/JSON diff serialization would likely need to be augmented with a URI identifying the source graph of each triple. (That is, we'd end up working with a quad system, though we'd probably be able to ignore that in the object hierarchy that the application deals with.)  In many cases, though, an application that reads from many data sources will write only to a single source; it does not seem particularly onerous for the application to specify a default "write-back" endpoint.</p>
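
<p>One way to picture the provenance requirement: if each triple in a diff carries the URI of its source graph, the diff can be split per origin before it is sent, with a default "write-back" endpoint for data created on the client. The quad shape <tt>{ s, p, o, g }</tt>, the function name, and all URIs below are assumptions made for illustration.</p>
<pre>
```javascript
// Assumed quad shape: { s, p, o, g }, where g is the URI of the source graph
// (hypothetical). Triples created on the client have no g yet.
function routeBySource(quads, defaultSource) {
  const bySource = new Map();
  for (const q of quads) {
    const source = q.g || defaultSource;
    if (!bySource.has(source)) bySource.set(source, []);
    // Each origin receives plain triples; the graph URI only picks the target.
    bySource.get(source).push({ s: q.s, p: q.p, o: q.o });
  }
  return bySource;
}

const toAdd = [
  { s: 'http://example.org/a', p: 'http://example.org/p', o: '1', g: 'http://one.example/graph' },
  { s: 'http://example.org/b', p: 'http://example.org/p', o: '2' }
];
const routed = routeBySource(toAdd, 'http://two.example/default');
```
</pre>
<p>The same grouping would be applied to the triples to remove, giving one add/remove pair per origin.</p>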

<h4>Inverting SPARQL <tt>CONSTRUCT</tt> Queries</h4>

<p>An appealing part of the above system (to me, at least) is the use of <tt>CONSTRUCT</tt> queries to map origin data to a common RDF model before merging it on the client and then mapping it into a domain-specific JavaScript object structure. Such transformations, however, would make it quite difficult--if not impossible--to automatically send the proper updates back to the origin servers. We'd need a way of inverting the <tt>CONSTRUCT</tt> query which generated the triples the application has (indirectly) worked with, and while I have not given it much thought, I imagine that that is quite difficult, if not impossible. </p>
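
<p>To make the difficulty concrete, here is a sketch of the kind of <tt>CONSTRUCT</tt> query I have in mind, built as a JavaScript string and wrapped in a SPARQL Protocol request URL. The endpoint and the <tt>displayName</tt> property are made up for illustration. Both <tt>foaf:name</tt> and <tt>rdfs:label</tt> are folded into a single application-specific property, so the constructed triples no longer record which source property each value came from.</p>
<pre>
```javascript
// "\u003C" and "\u003E" are the angle brackets that SPARQL puts around IRIs;
// they are written as escapes so this listing stays safe inside an HTML page.
const iri = (uri) => '\u003C' + uri + '\u003E';

// Two source properties are mapped onto one application-specific property.
const query = [
  'CONSTRUCT { ?person ' + iri('http://example.org/app#displayName') + ' ?name }',
  'WHERE {',
  '  { ?person ' + iri('http://xmlns.com/foaf/0.1/name') + ' ?name }',
  '  UNION',
  '  { ?person ' + iri('http://www.w3.org/2000/01/rdf-schema#label') + ' ?name }',
  '}'
].join('\n');

// SPARQL Protocol: the query travels in the "query" parameter of a GET request.
function queryUrl(endpoint, sparql) {
  return endpoint + '?query=' + encodeURIComponent(sparql);
}

// Hypothetical endpoint, for illustration only.
const url = queryUrl('http://example.org/sparql', query);
```
</pre>
<p>Given a diff that replaces one <tt>displayName</tt> triple, nothing in the query result says whether the origin should update <tt>foaf:name</tt> or <tt>rdfs:label</tt>.</p>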

<h4>SPARQL <tt>UPDATE</tt></h4>

<p>The DAWG has <a href="http://www.w3.org/2001/sw/DataAccess/issues#update">postponed</a> any work on updating graphs for the initial version of SPARQL, but <a href="http://wiki.ontoworld.org/wiki/Max_V%C3%B6lkel">Max V&#246;lkel</a> and <a href="http://dowhatimean.net/">Richard Cyganiak</a> have started a bit of <a href="http://esw.w3.org/topic/SparqlUpdateLanguage">discussion</a> on what update in SPARQL might look like (though Richard has apparently <a href="http://www.thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_survey.html#comment-17617">soured on the idea</a> a bit since then). At first blush, using SPARQL to update data seems like a natural counterpart to using SPARQL to retrieve the data. However, in the vision I describe above, the application would likely need to craft a corresponding SPARQL <tt>UPDATE</tt> query for each SPARQL <tt>CONSTRUCT</tt> query that is used to retrieve the data in the first place. This would be a larger burden on the application developer, so it should probably be avoided.</p>

<h4>Related Work</h4>

<p>I wanted to acknowledge that in several ways this whole pattern is closely related to but (in some mindset, at least) the inverse of a paradigm that Danny Ayers has floated in the past. Danny has <a href="http://dannyayers.com/2006/12/24/merging-results-from">suggested</a> using SPARQL <tt>CONSTRUCT</tt> queries to transition from domain-specific models to domain-independent models (for example, a reporting model). Data from various sources (and disparate domains) can be merged at the domain-independent level and then (perhaps via XSLT) used to generate Web pages summarizing and analyzing the data in question. In my thoughts above, we're also using the <tt>CONSTRUCT</tt> queries to generate an agreed-upon model, but in this case we're seeking an extremely domain-specific model to make it easier for the Web-application developer to deal with RDF data (and related data from multiple sources).</p>

<p>Danny also wrote some <a href="http://lists.w3.org/Archives/Public/www-archive/2005Oct/0001.html">related material</a> to www-archive. It's not the same vision, but parts of it sound familiar.</p>

<h4>Other Caveats</h4>

<p>Updating data has security implications, of course. I haven't even begun to think about them.</p>

<p>Blank nodes complicate almost everything; this may be sacrilege in some circles, but in most cases I'm willing to pretend that blank nodes don't exist for my data-integration needs. Incorporating blank nodes makes the RDF/JSON structures (slightly) more complicated; it raises the question of smushing together nodes when joining various models; and it significantly complicates the process of specifying which triples to remove when serializing the RDF diffs. I'd guess that it's all doable using functional and inverse-functional properties and/or with told bnodes, but it probably requires more help from the application developer.</p>

<p>I have some worries about concurrency issues for update. Again, I haven't thought about that much and I know that the Queso guys have already tackled some of those problems (as have many, many other people I'm sure), so I'm willing to assert that these issues could be overcome.</p>

<p>In many rich-client applications, data is retrieved incrementally in response to user-initiated actions. I don't <i>think</i> that this presents a problem for the above scheme, but we'd need to ensure that newly arriving data could be seamlessly incorporated not only into the RDF models but also into the object hierarchies that the application works with.</p>

<p>Bill de h&#211;ra raised some questions about the feasibility of <a href="http://www.dehora.net/journal/2005/08/automated_mapping_between_rdf_and_forms_part_i.html">roundtripping RDF data with HTML forms</a> a while back. There's some interesting conversation in the comments there which ties into what I've written here. That said, I don't think the problems he illustrates apply here--there's power above and beyond HTML forms in putting an extra JavaScript-based layer of code between the data entry interface (whether it be an HTML form or a more specialized Web UI) and the data update endpoint(s).</p>

<hr/>

<p>OK, that's more than enough for now. These are still ideas clearly in progress, and none of the ideas are particularly new. That said, the environment I envision doesn't exist, and I suppose I'm claiming that if it did exist it would demonstrate some utility of Semantic Web technologies via ease of development of data- and integration-driven Web applications. As always, I'd enjoy feedback on these thoughts and also any pointers to work I might not know about.</p>



]]></description>
         <link>http://www.thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_vision.html</link>
         <guid>http://www.thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_vision.html</guid>
         <category>semantic web</category>
         <pubDate>Thu, 18 Jan 2007 19:32:06 -0500</pubDate>
      </item>
      
   </channel>
</rss>
