TechnicaLee Speaking: July 2006 Archives

July 7, 2006

I'm a SPARQL Junkie

My coworker Wing and I wanted to send out evites to all of our immediate coworkers and the various interns working with us this summer. I told Wing that if he wrote the text of the evite, I would gather the email addresses. At IBM, we have a corporate directory called BluePages, and I wanted to avoid manually searching for each person and copying their (internet) email address.

Over the years, IBMers have developed a slew of APIs to access the information in BluePages programmatically, but as I'm unfamiliar with most of them, I turned to Elias for help. Elias said:

Why don't you use SPARQL?

In the ensuing conversation, I learned that Elias had spent some time last week setting up SquirrelRDF to map SPARQL queries to BluePages, as suggested on #swig. He whipped open a browser window with the corporate LDAP schema and a terminal window with the (RDF) configuration file mapping LDAP attributes to RDF predicates.

A few minutes later, we had achieved our goal:

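PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ibm: <http://w3.ibm.com/bluepages#>
SELECT ?mbox
WHERE {
   # Blank node labels are scoped to one basic graph pattern,
   # so each UNION branch uses its own label for the matched person.
   {
           # Everyone in the same department as Elias
           _:elias foaf:name "Elias Torres" ; ibm:department ?dept .
           _:person1 ibm:department ?dept ; foaf:mbox ?mbox .
   } UNION {
           # Everyone in the same department as Wing
           _:wing foaf:name "Wing C. Yung" ; ibm:department ?dept .
           _:person2 ibm:department ?dept ; foaf:mbox ?mbox .
   } UNION {
           # Everyone in the same department and the same location as Alex
           _:alex foaf:name "ALEX H. CHAO" ; ibm:department ?dept ; ibm:city _:location .
           _:person3 ibm:department ?dept ; foaf:mbox ?mbox ; ibm:city _:location .
   }
} ORDER BY ?mbox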

(More info: Our lab in Cambridge is composed organizationally of two different departments, and some of our interns report to yet a third department. The third department also contains people not in Cambridge, so we used seed people from each department, grabbed the information that identifies their department (and location), and found all other people matching the same criteria.) We used this SquirrelRDF config file:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix lmap: <http://jena.hpl.hp.com/schemas/ldapmap#> .
@prefix ibm: <http://w3.ibm.com/bluepages#> .
           
<> a lmap:Map ;
        lmap:server <ldap://localhost/ou=bluepages,o=ibm.com> ;
        lmap:mapsProp [ lmap:property foaf:name ; lmap:attribute "cn" ; ] ;
        lmap:mapsProp [ lmap:property ibm:department ; lmap:attribute "dept" ; ] ;
        lmap:mapsProp [ lmap:property foaf:mbox ; lmap:attribute "mail" ; ] ;
        lmap:mapsProp [ lmap:property ibm:city ; lmap:attribute "workLoc" ; ] ;
.
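
For readers without SquirrelRDF or an LDAP server at hand, here is a minimal Python sketch of the idea behind the config file and query above. It is only an illustration under stated assumptions: the directory entries are invented, the function names (to_triples, colleague_mailboxes, evite_list) are hypothetical, and nothing here uses SquirrelRDF's or BluePages' actual APIs. It renames LDAP attributes to RDF predicates using the same four mappings, then finds the mailboxes of everyone who shares a seed person's department (and, optionally, location).

```python
# Hypothetical sketch: plain Python stand-in for the attribute mapping and
# the seed-person query. None of these names come from SquirrelRDF.

# LDAP attribute -> RDF predicate, mirroring the lmap:mapsProp entries.
ATTRIBUTE_MAP = {
    "cn": "foaf:name",
    "dept": "ibm:department",
    "mail": "foaf:mbox",
    "workLoc": "ibm:city",
}

# Invented directory entries standing in for real records.
ENTRIES = [
    {"cn": "Seed A", "dept": "AAA", "mail": "a@example.com", "workLoc": "Cambridge"},
    {"cn": "Colleague B", "dept": "AAA", "mail": "b@example.com", "workLoc": "Cambridge"},
    {"cn": "Seed C", "dept": "CCC", "mail": "c@example.com", "workLoc": "Cambridge"},
    {"cn": "Remote D", "dept": "CCC", "mail": "d@example.com", "workLoc": "Elsewhere"},
    {"cn": "Intern E", "dept": "CCC", "mail": "e@example.com", "workLoc": "Cambridge"},
    {"cn": "Outsider F", "dept": "ZZZ", "mail": "f@example.com", "workLoc": "Cambridge",
     "title": "unmapped attribute"},
]


def to_triples(entries, attribute_map):
    """Turn each entry into (subject, predicate, value) triples."""
    triples = []
    for index, entry in enumerate(entries):
        subject = f"_:entry{index}"
        for attribute, value in entry.items():
            predicate = attribute_map.get(attribute)
            if predicate is not None:  # unmapped attributes are dropped
                triples.append((subject, predicate, value))
    return triples


def values(triples, subject, predicate):
    """All values of a predicate for one subject."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, value):
    """All subjects that have the given predicate/value pair."""
    return {s for s, p, o in triples if p == predicate and o == value}


def colleague_mailboxes(triples, seed_name, match_city=False):
    """Mailboxes of everyone sharing the seed's department (and optionally city)."""
    mailboxes = set()
    for seed in subjects(triples, "foaf:name", seed_name):
        cities = values(triples, seed, "ibm:city")
        for dept in values(triples, seed, "ibm:department"):
            people = subjects(triples, "ibm:department", dept)
            if match_city:
                people = {p for p in people
                          if values(triples, p, "ibm:city") & cities}
            for person in people:
                mailboxes |= values(triples, person, "foaf:mbox")
    return mailboxes


def evite_list(triples, seeds):
    """Union of the per-seed results, deduplicated and sorted by mailbox."""
    result = set()
    for name, match_city in seeds:
        result |= colleague_mailboxes(triples, name, match_city)
    return sorted(result)


if __name__ == "__main__":
    triples = to_triples(ENTRIES, ATTRIBUTE_MAP)
    for mbox in evite_list(triples, [("Seed A", False), ("Seed C", True)]):
        print(mbox)
```

Each (name, match_city) pair plays the role of one UNION branch: the first seed pulls in a whole department, while the second also requires a matching location, which is how the remote member of the third department gets left off the list. Unlike the SPARQL query, this sketch deduplicates the results.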

(This post co-authored by Elias, Wing, and myself.)

July 4, 2006

I see the Semantic Web everywhere

Several weeks ago, Elias mused about one way in which semantic web technologies could improve his day-to-day life. Even though I've been working with semantic web technologies myself for a couple of years now, it's only recently that I've found that I'm seeing the Semantic Web all around me. In the past month, I've had more conversations about the Semantic Web with (technical and non-technical) friends of mine, and more and more potential benefits of the Semantic Web seem to crystallize around me constantly.

Yesterday, I took advantage of my day off to watch Lynn do her job (with great aplomb and skill, I must say) over at Roxbury District Court. As I sat for a few hours in the courthouse watching arraignments, bail arguments, default removals, and probation restrictions, I couldn't help but see the massive quantities of data flying around the room in the form of reams and reams of printed and handwritten paper. Instead of blonde, brunette, redhead, I was seeing criminal complaints, police reports, and suspects' records moving rapidly from clerk to court officer to defense attorney to copy machine to district attorney to probation officer and beyond.

The system functions, but it functions with massive duplication of effort, misplaced data, and needless inefficiencies. Any attempt to analyze past precedents requires expensive, painstaking research into the paper files that record every stage of our justice system. The creation and installation of an electronic system for these records would be invaluable. And while such a system would bring enormous benefits whether its technical foundation were relational, XML, or proprietary, semantic web technologies would really make it shine.

  • Mountains of data. The amount of data generated by such mundane activities as scheduling court dates for a single criminal charge is staggering (but routine!).
  • Semi-structured data. The data is a mixture of well-structured form fields (the crime charged, location info, bail amounts, court dates, etc.) and unmined free text (e.g. the text of a complaint).
  • Ragged, open-world data. The data on a particular suspect is an open-world amalgamation of past charges, convictions, and current open cases from (possibly) multiple districts. A particular charge includes data generated by the district attorney's office, the court, one or more defense attorneys, the legislature, the department of correction, and more, and is often incomplete at any given moment in time. Furthermore, different charges mandate differently shaped data, as do different special bail conditions, sentences, and probation restrictions.
  • Organizational data interchange. Of course, the legal system is not populated entirely by Luddites. Parts of the system exist on top of electronic silos, with legacy applications providing access to the data. To realize the full potential of an agile and efficient electronic system, however, data interchange between the organizations that take part in the legal system is paramount.

Yes, all of this can be accomplished with technologies other than RDF and friends. But add in the ability to search and analyze precedents and to define rules and policies (e.g. for sentencing guidelines or indigency determination), and the complete story told by RDF, RDF-S, OWL, SPARQL, and RIF is compelling.

There are social, inertial, and monetary reasons why this sort of systemic revolution is unlikely to happen anytime soon in the (American) legal system. But as the technologies continue to evolve and standardize and the infrastructure continues to mature, we'll discover more and more arenas that will benefit from the promise of semantic web technologies. And eventually the confluence of technological capabilities, infrastructure availability, and the awareness of decision makers will reach a point where we can do far more than just talk about bringing new industries into the semantic web fold.