Here's just a taste of the content we've published in the first two months of the blog:
What it comes down to is this: for decades we've had one standard way to store and query important data, and today there are new choices. As with any choice there are tradeoffs, and for some applications NoSQL databases, including Semantic Web databases, can enable organizations to get more done in less time and with less hardware than relational databases can. The trick is knowing when and how to deploy these new tools.
What matters most: Big Data or Right Data? One look at the IT headlines these days would suggest that Big Data is the most important data issue today. After all, with lots of computing power and better database storage techniques, it is now practical to analyze petabytes of data. But is that really the most compelling need that end users have? I don't think so. I would claim instead that the issue most end users face is assembling the right data to help them do their jobs better, not analyzing billions of individual transactions.
Analog photography went through many phases of dramatic improvement on its way to becoming a mass-market technology. But no matter how far it went, it was limited in its flexibility. Every picture was pretty much as you took it. Any modification required real experts with specialist equipment, working in a darkroom. With the advent of digital photography we have achieved extreme flexibility. The picture you take is simply the starting point for creating the picture you want, and end users themselves can make the changes with easy-to-use tools.
Semantic Web technology represents the same kind of dramatic shift away from traditional technologies.
In short, if Semantic Web software is hard to use, then many of the benefits of using these technologies in the first place are immediately lost. If, on the other hand, Semantic Web software is easy to use, then the flexibility of Semantic Web technologies is brought directly to the end user, the business user. The business manager can bring together new data sets for analysis today, rather than a week from now. An analyst can set up triggers and alerts to monitor key business indicators today, rather than waiting three months. A senior scientist can begin looking for correlations within ad hoc sets of data today, rather than next year.
There is a new data model called RDF—the data model of the Semantic Web—which combines the best of both worlds: the flexibility of a spreadsheet and the manageability and data integrity of a relational database. Based on standards set by the World Wide Web Consortium (W3C) to enable combining data on the Web, RDF defines each data cell by the entity it applies to (the row) and the attribute it represents (the column). Each cell is self-describing and not locked into a grid; in other words, the data doesn't have to be "regular". Further, RDF has formal operations that can be performed on it, much like relational algebra, but at a more atomic level.
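As a rough sketch of what this looks like in practice, here are two entities in Turtle syntax (the ex: names are invented for illustration) that share some attributes but not others, with no schema change required:

@prefix ex: <http://example.org/> .

ex:alice  ex:name   "Alice" ;
          ex:email  "alice@example.org" .

ex:bob    ex:name   "Bob" ;
          ex:phone  "+1-555-0100" ;   # Bob has a phone number instead of an email address...
          ex:title  "Analyst" .       # ...plus an attribute that Alice doesn't have at all.

Each triple stands on its own, so adding ex:title to Bob doesn't require touching Alice's data or altering any table definition.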
Anyway, on Wednesday I gave a talk on the patterns we use to segment data within Anzo, as well as some of our other uses of Semantic Web technologies and where we see gaps in the standards world (frankly, more in adoption than in specification). I recorded a screencast of the talk—it's not the most polished, but if you weren't able to attend the workshop you might be interested in it. I've also posted the slides themselves online. Here's the video:
There were a couple of discussions in the middle of the talk that I had to cut out because they involved too much cross-talk far away from the mic and were hard to understand. One was a discussion of the way that we (by default) break data into graphs, how that privileges RDF subjects over objects, and whether it affects access control decisions (our experience: no). Another, around the 9-minute mark, was about the use of the same URI to identify both a graph and the subject of data within that graph. A third concerned ongoing efforts to extend VoID to provide additional descriptions of linked data endpoints.
The bottom line is this: The Semantic Web lets you do things fast. And because you can do things fast, you can do lots more things than you could before. You can afford to do things that fail (fail fast); you can afford to do things that are unproven and speculative (exploratory analysis); you can afford to do things that are only relevant this week or today (on-demand or situational applications); and you can afford to do things that change rapidly. Of course, you can also do things that you would have done with other technology stacks, only you can have them up and running (and ready to be improved, refined, extended, and leveraged) in a fraction of the time that you otherwise would have spent.
The word "fast" can be a bit deceptive when talking about technology. We can all get a bit obsessed with what I call stopwatch time. Stopwatch time is speed measured in seconds (or less). It's raw performance: How much quicker does my laptop boot up with an SSD? How long does it take to load 100 million records into a database? How many queries per second does your SPARQL implementation achieve on the Berlin benchmark, with and without a recent round of optimizations?
We always talk about stopwatch time. Stopwatch time is impressive. Stopwatch time is sexy. But stopwatch time is often far less important than calendar time.
Calendar time is measured in hours and days or in weeks and months and years. Calendar time is the actual time it takes to get an answer to a question. Not just the time it takes to push the "Go" button and let some software application do a calculation, but all of the time necessary to get to an answer: to install, configure, design, deploy, test, and use an application.
Calendar time is what matters. If my relational database application renders a sales forecast report in 500 milliseconds while my Semantic Web application takes 5 seconds, you might hear people say that the relational approach is 10 times faster than the Semantic Web approach. But if it took six months to design and build the relational solution versus two weeks for the Semantic Web solution, Semantic Sam will be adjusting his supply chain and improving his efficiencies long before Relational Randy has even seen his first report. The Semantic Web lets you do things fast, in calendar time.
Why is this? Ultimately, it's because of the inherent flexibility of the Semantic Web data model (RDF). This flexibility has been described in many different ways: RDF relies on an adaptive, resilient schema (from Mike Bergman); it enables cooperation without coordination (from David Wood via Kendall Clark); it can be incrementally evolved; changes to one part of a system don't require redesigns of the rest of the system. These are all dimensions of the same core flexibility of Semantic Web technologies, and it is this flexibility that lets you do things fast with the Semantic Web.
It's as true with Semantic Web technologies as with anything else—tuples are straightforward, ontologies build on schema languages and description logics that have been around for ages, URIs have been baked into the Web for twenty years, and so on. But while the technologies are not new, the circumstances are. In particular, the W3C's Semantic Web technologies are especially valuable for having been brought together as a common, coherent set of standards.
The technologies are not novel, and they are not perfect. But they are common, coherent, and standard, and that sets them apart from much of what's come before and from many of the other options currently out there.
To me, the magic crank is the unrealized holy grail of the Semantic Web in the pharma industry. And it's an extremely powerful and valuable goal. But it's a bit dangerous as well: every time someone new to the Semantic Web learns that the magic crank is what the Semantic Web is all about, they end up trying to tackle large, unsolved problems. They end up asking "What can I do with Semantic Web technologies that I can't do otherwise?". Once you've latched onto the potential of the magic crank, it's very hard to ratchet your questions back down to the less-impressive-but-practical-and-still-very-valuable, "What can I do with Semantic Web technologies that I wouldn't do otherwise?".
Credit for the image goes to Trey Ideker of UCSD. I first saw the image in a presentation by Enoch Huang at CSHALS a few years ago.
I'd say that at least once a week, when talking to prospective customers, I get asked the following:
What can I do with Semantic Web technologies that I can't do otherwise?
It's a question that's asked in good faith: enterprise software buyers have heard tales of rapid data integration, automated data inference, business-rules engines, etc. time and time again. By now, any corporate IT department likely owns several software packages that purport to accomplish the same things that Semantic Web vendors are selling them. And so a potential buyer learns about Semantic Web technologies and searches for what's new:
What can I do with Semantic Web technologies that I can't do otherwise?
The real answer to this question is distressingly simple: not much. IT staff around the world are constantly doing data integration, data inference, data classification, data visualization, etc. using the traditional tools of the trade: Java, RDBMSes, XML…
But the real answer to the question misses the fact that this is the wrong question. We ought instead to ask:
What can I do with Semantic Web technologies that I wouldn't do otherwise?
Enterprise projects are proposed all the time, and all eventually reach a go/no-go decision point. Businesses regularly consider and reject valuable projects not because the projects require revolutionary new magic, but because they're simply too expensive for the benefit they'd deliver, or because they'd take too long to fix the situation at hand. You don't need brand-new technology to make dramatic changes to your business.
The point of Semantic Web technology is not that it's revolutionary; it's not cold fusion, interstellar flight, or quantum computing. It's an evolutionary advantage: you could do these projects with traditional technologies, but they're just hard enough to be impractical, so IT shops don't do them. That's what's changing here. Once the technologies and tools are good enough to turn "no-go" into "go", you can start pulling together the data in your department's 3 key databases; you can start automating data exchange between your group and a key supply-chain partner; you can start letting your line-of-business managers define their own visualizations, reports, and alerts that change on a daily basis. And when you start solving enough of these sorts of problems, you derive value that can fundamentally affect the way your company does business.
I'll write more in the future about what changes with Semantic Web technologies to let us cross this threshold. But for now, when you're looking for the next "killer application" for Semantic Web in the enterprise, you don't need to look for the impossible, just the not (previously) practical.
We took this video of the lightning demo from the audience, but it gives some idea of what Anzo Connect is all about. If you're interested in learning more about Anzo Connect or any of our other software, please drop me a note.
I've now placed the presentation online. It's broken down into three basic parts:
I found the last of the three sections particularly interesting, and I hope you do too.
The presentation includes speaker's notes that add significant commentary to the slides. You can view them by clicking on the "Speaker Notes" tab below the slides. Please let me know what you think: Evolution Towards Web 3.0: The Semantic Web.
How do I write down metadata about the return type of a SPARQL function that returns a URI?
Since "returns a URI" can be a bit ambiguous in the face of things like xsd:anyURI typed literals, we can be a bit more precise:
How do I write down metadata about the return type of a SPARQL function that returns a term for which the isURI function returns true?
Functions like this have all sorts of uses. We use them all the time in conjunction with CONSTRUCT queries and the SPARQL 1.1 BIND clause to generate URIs for new resources.
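As a minimal sketch of that pattern (the fn: prefix URI and the GenerateURI function are stand-ins for whatever URI-generating extension function a given implementation provides, and the foaf: data is invented for the example):

PREFIX fn:   <http://example.org/functions#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT { ?newPerson foaf:name ?name }
WHERE {
  ?person foaf:name ?name .
  # BIND evaluates the extension function and binds the freshly
  # minted URI to ?newPerson for use in the CONSTRUCT template.
  BIND (fn:GenerateURI(?person) AS ?newPerson)
}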
So, when describing this function, how do I write down the return type of one of these URI-generating functions? I want to write something like:
fn:GenerateURI fn:returns ??
If I had a function that returned an integer, I'd expect to be able to write something like:
fn:Floor fn:returns xsd:integer
But in that case, I'm taking advantage of the fact that datatyped literals denote themselves. (Thanks to Andy Seaborne for pointing this out to me.) I can't say this:
fn:GenerateURI fn:returns xsd:anyURI
This seems to tell me that my function returns something that denotes a URI. (One such thing that denotes a URI is an xsd:anyURI literal.) But, again, that's not what I want to say here. I want to say that my function returns something that is syntactically a URI. That is, it returns something that is named by a URI. I considered something like:
fn:GenerateURI fn:returns rdfs:Resource
But rdfs:Resource is the class of everything, and as far as I can tell this would mean that my function could return a URI, a literal, or a blank node.
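For illustration only, one direction I could imagine is modeling the syntactic category of the returned term separately from its denoted value; the fn:returnsTermType, fn:IRI, and fn:Literal terms below are invented and are not part of any standard vocabulary:

fn:GenerateURI fn:returnsTermType fn:IRI
fn:Floor fn:returnsTermType fn:Literal ; fn:returns xsd:integer

Under that sketch, fn:returns would keep its datatype role for literal-returning functions, while fn:returnsTermType would answer the syntactic question directly.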
So any suggestions for how to approach this sort of modeling of the return type (and parameter types) for SPARQL functions?
It's really easy to do. Let me know if you have any questions!
If you're interested in applying for any of these positions, please send your resume to jobs@cambridgesemantics.com. If you know anyone who might be interested, please send them our way!
A SPARQL query goes against an RDF dataset. An RDF dataset has two parts: a single default graph, and zero or more named graphs, each of which pairs a name (a URI) with a graph.
The FROM and FROM NAMED clauses are used to specify the RDF dataset.
The statement "FROM u" instructs the SPARQL processor to take the graph that it knows as "u", take all the triples from it, and add them to the single default graph. If you then also have "FROM v", then you take the triples from the graph known as v and also add them to the default graph.
The statement "FROM NAMED x" instructs the SPARQL processor to take the graph that it knows as "x", take all the triples from it, pair it up with the name "x", and add that pair (x, triples from x) as a named graph in the RDF dataset.
Note that "known as" is purposefully not specified -- some implementations dereference the URI to get the triples that make up that graph; others just use a graph store that maps names to triples.
All the parts of the query that are outside a GRAPH clause are matched against the single default graph.
All the parts of the query that are inside a GRAPH clause are matched individually against the named graphs.
This is why it sometimes makes sense to specify the same graph for both FROM and FROM NAMED:
FROM x
FROM NAMED x
...puts the triples from x in the default graph and also includes x as a named graph. That way, later in the query, triple patterns outside of a GRAPH clause can match parts of x, and so can triple patterns inside a GRAPH clause.
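Here's a minimal sketch of such a query (the graph URI and the ex: predicates are invented for the example):

PREFIX ex: <http://example.org/>

SELECT ?s ?label ?g
FROM <http://example.org/graphs/x>
FROM NAMED <http://example.org/graphs/x>
WHERE {
  ?s ex:label ?label .               # matched against the default graph, which now includes x's triples
  GRAPH ?g { ?s ex:status "open" }   # matched against each named graph, including x
}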
There's a visual picture of this on slide 13 of my SPARQL Cheat Sheet slides.
When I got back to Boston, I made a recording of the same lightning demo for posterity. Please enjoy it here and drop me a note if you have any questions or would like to learn more.
(Best viewed in full screen, 720p.)
While we will of course do this (solicit as widespread a review of our Last Call drafts as possible), I'd like to put out a call for reviews of our current set of Working Drafts. If you can only do one review, you're probably best off waiting for Last Call; but if you have the inclination and time, it would be great to receive reviews of the current Working Drafts at our comments list, public-rdf-dawg-comments@w3.org. The Working Group has committed to responding formally to all comments received from here on out.
Here is our current set of documents, along with a few explicit areas and issues that the Working Group and editors would love to receive feedback about (of course, all reviews and all feedback are welcome):