I see the Semantic Web everywhere

Several weeks ago, Elias mused about one way in which semantic web technologies could improve his day-to-day life. Even though I've been working with semantic web technologies myself for a couple of years now, it's only recently that I've started seeing the Semantic Web all around me. In the past month, I've had more and more conversations about the Semantic Web with friends of mine (technical and non-technical alike), and new potential benefits of the Semantic Web seem to crystallize around me constantly.

Yesterday, I took advantage of my day off to watch Lynn do her job (with great aplomb and skill, I must say) over at Roxbury District Court. As I sat for a few hours in the courthouse watching arraignments, bail arguments, default removals, and probation restrictions, I couldn't help but see the massive quantities of data flying around the room in the form of reams and reams of printed and handwritten paper materials. Instead of blonde, brunette, redhead, I was seeing criminal complaints, police reports, and suspects' records moving rapidly from clerk to court officer to defense attorney to copy machine to district attorney to probation officer and beyond.

The system functions, but it functions with massive amounts of duplicated effort, misplaced data, and needless inefficiency. Any attempt to analyze precedents requires expensive, painstaking research into the paper files that record all the stages of our justice system. The creation and installation of an electronic system for these records would be invaluable. And while such a system would have gigantic benefits whether its technical foundation were relational, XML, or proprietary, semantic web technologies would really make it shine.

  • Mountains of data. The amount of data generated by such mundane (but routine!) activities as scheduling court dates for a single criminal charge is staggering.
  • Semi-structured data. The data is a mixture of well-structured form fields (the crime charged, location info, bail amounts, court dates, etc.) and unmined free text (e.g. the text of a complaint).
  • Ragged, open-world data. The data on a particular suspect is an open-world amalgamation of past charges, convictions, and current open cases from (possibly) multiple districts. A particular charge includes data generated by the district attorney's office, the court, one or more defense attorneys, the legislature, the department of correction, and more, and is often incomplete at any given moment in time. Furthermore, different charges mandate differently shaped data, as do different special bail conditions, sentences, and probation restrictions.
  • Organizational data interchange. Of course, the entire legal system is not populated by luddites. Parts of the system exist on top of electronic silos with legacy applications providing access to the data. To realize the full potential of an agile and efficient electronic system, however, data interchange between the organizations that take part in the legal system is paramount.
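
To make the "ragged, open-world" and "data interchange" points a bit more concrete, here's a minimal sketch in plain Python. It is not a real RDF library, just sets of (subject, predicate, object) tuples standing in for triples. Every identifier and value in it (ex:charge/123, ex:bailAmount, the two record sources, and so on) is made up for illustration and is not drawn from any actual court system.

```python
# Hypothetical sketch: triples as plain Python tuples, not a real RDF store.
# All subjects, predicates, and values below are invented for illustration.

# Records as one organization (say, the court) might hold them.
court_records = {
    ("ex:charge/123", "ex:offense", "larceny"),
    ("ex:charge/123", "ex:bailAmount", "500"),
    ("ex:charge/123", "ex:nextCourtDate", "to be scheduled"),
}

# Records held by a different organization (say, the probation office).
probation_records = {
    ("ex:charge/123", "ex:probationCondition", "weekly check-in"),
    ("ex:charge/456", "ex:offense", "trespassing"),
}


def merge(*sources):
    """Combine any number of triple sets; merging graphs is just set union."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def describe(graph, subject):
    """Collect whatever is currently known about one subject.

    Missing predicates are simply absent (open world), and different
    subjects may carry differently shaped data (ragged).
    """
    known = {}
    for s, p, o in graph:
        if s == subject:
            known.setdefault(p, set()).add(o)
    return known


graph = merge(court_records, probation_records)
print(describe(graph, "ex:charge/123"))
print(describe(graph, "ex:charge/456"))
```

The second charge has no bail amount or probation condition at all, and nothing breaks: there is no fixed schema to violate, and a later source can add those facts with another union.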

Yes, all of this can be accomplished with technologies other than RDF and friends. But add in the ability to search and analyze precedents and to define rules and policies (e.g. for sentencing guidelines or indigency determination), and the complete story told by RDF, RDF-S, OWL, SPARQL, and RIF is compelling.
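
As a rough illustration of what "search" and "rules" mean here, the sketch below uses plain Python as a stand-in for what SPARQL (pattern matching) and RIF (rules) would do over real RDF data. The predicates, the defendants, and especially the income threshold are hypothetical. This is not an actual indigency standard, just a toy rule of the same general shape.

```python
# Hypothetical sketch: a toy pattern query and a toy rule over triples.
# Plain Python stands in for SPARQL and RIF; all names and numbers are invented.


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


def apply_indigency_rule(graph, threshold):
    """Add an 'indigent' status triple for each income below the threshold.

    The threshold is a made-up parameter, not a real legal standard.
    """
    inferred = set(graph)
    for subject, _, income in match(graph, p="ex:annualIncome"):
        if income < threshold:
            inferred.add((subject, "ex:status", "indigent"))
    return inferred


facts = {
    ("ex:defendant/a", "ex:annualIncome", 9000),
    ("ex:defendant/b", "ex:annualIncome", 50000),
}

with_inferences = apply_indigency_rule(facts, threshold=12000)
print(match(with_inferences, p="ex:status"))
```

The point of the design is that the policy lives in one small, declarative-looking rule over the data rather than being buried inside each legacy application, so changing the policy means changing the rule, not the systems.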

There are social, inertial, and monetary reasons why this sort of systemic revolution is unlikely to happen anytime soon in the (American) legal system. But as the technologies continue to evolve and standardize and the infrastructure continues to mature, we'll discover more and more arenas that will benefit from the promise of semantic web technologies. And eventually the confluence of technological capabilities, infrastructure availability, and the awareness of decision makers will reach a point where we can do far more than just talk about bringing new industries into the semantic web fold.