Enterprise Semantics Blog

We (Cambridge Semantics) have recently launched a new blog, Enterprise Semantics. The blog covers a mix of technical and business topics related to the use of semantic technologies inside large enterprises. I'm writing some posts on that blog, and I'll be continuing to put posts here as well. You can sign up to follow the blog in an RSS reader via its feed, or you can receive emails when there are new posts by subscribing on the blog itself, or you can just follow us @CamSemantics. (The feed is not currently syndicated by Planet RDF, so if you read my blog via Planet RDF and are interested in enterprise semantics, you should probably still sign up separately.)

Here's just a taste of some of the content we've published in the first two months of the blog:

What Happened to NoSQL for the Enterprise?

So what it comes down to is that for decades we’ve had one standard way to store and query important data, and today there are new choices.  As with any choice, there are tradeoffs, and for some applications NoSQL databases, including Semantic Web databases, can enable organizations to get more done in less time and with less hardware than relational databases.  The trick is to know when and how to deploy these new tools.

Big Data... or Right Data?

What matters most, Big Data or Right Data? One look at all the IT headlines these days would suggest that Big Data is the most important data issue today. After all, with lots of computing power and better database storage techniques it is now practical to analyze petabytes of data. However, is that really the most compelling need that end users have? I don’t think so. Instead, I would claim that the issue most end users have is getting together the right data to help them do their jobs better, not analyzing billions of individual transactions.

What the Semantic Web and Digital Cameras have in Common

Analog photography went through lots of phases of dramatic improvement, becoming a mass-market technology. But...no matter how far it went it was limited in its flexibility. Every picture was pretty much as you took it. Any modification required real experts, with specialist equipment and working in a dark room. With the advent of digital photography we have achieved extreme flexibility. The picture you take is simply the starting point to create the picture you want, and the end users themselves can make the changes with easy to use tools.

Semantic Web technology represents the same dramatic shift from the traditional technologies.

Why Semantic Web Software Must Be Easy(er) to Use

In short, if Semantic Web software is hard to use, then many of the benefits of using these technologies in the first place are immediately lost. Conversely, if Semantic Web software is easy to use, then the benefits of Semantic Web technologies' flexibility are brought directly to the end user, the business user. The business manager can bring together new data sets for analysis today, rather than a week from now. An analyst can set up triggers and alerts to monitor for key business indicators today, rather than waiting 3 months. A senior scientist can begin looking for correlations within ad-hoc sets of data today, rather than next year.

It's All About the Data Model

There is a new data model called RDF, the data model of the Semantic Web, which combines the best of both worlds: the flexibility of a spreadsheet and the manageability and data integrity of a relational database. Based on standards set by the World Wide Web Consortium (W3C) to enable data combination on the Web, RDF defines each data cell by the entity it applies to (row) and the attribute it represents (column). Each cell is self-describing and not locked into a grid; in other words, the data doesn't have to be "regular". Further, it has formal operations that can be performed on it, much like relational algebra, but clearly at a more atomic level.
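To make the "self-describing cell" idea concrete, here is a minimal sketch in plain Python rather than an RDF library. The entity names, attribute names, and the two helper functions are made up for illustration, and the helpers assume at most one value per entity and attribute; real RDF identifies entities and attributes with URIs and allows multiple values.

```python
# Each "cell" is a self-describing triple: (entity, attribute, value).
# Hypothetical example data; real RDF would identify entities and
# attributes with URIs rather than short strings.
triples = {
    ("customer:1", "name", "Acme Corp"),
    ("customer:1", "region", "Northeast"),
    ("customer:2", "name", "Globex"),
    # customer:2 has no region and customer:1 has no twitter handle:
    # the data does not have to be "regular".
    ("customer:2", "twitter", "@globex"),
}

def cells_for(entity, data):
    """Return the attribute -> value pairs recorded for one entity (a "row")."""
    return {attr: value for (ent, attr, value) in data if ent == entity}

def column(attribute, data):
    """Return the entity -> value pairs recorded for one attribute (a "column")."""
    return {ent: value for (ent, attr, value) in data if attr == attribute}

print(cells_for("customer:1", triples))
print(column("name", triples))
```

Because every cell carries its own row and column, a missing or extra attribute never forces a change to a shared grid.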

I spent Tuesday and Wednesday this week at the W3C Linked Enterprise Data Patterns workshop at MIT (#LEDP). After all, we do linked data and we work with large enterprise customers, so it seemed like a natural fit. The workshop was an interesting two days hearing folks share their experiences using linked data (and sometimes not using linked data) in enterprise situations (and sometimes not in enterprise situations). The main consensus that emerged from the workshop was a desire for a set of profiles of conformance criteria for what constitutes interoperable linked data implementations. I'm personally pretty certain though that the consensus ends there: people continue to have very different views of what pieces of the Semantic Web technology stack (or related technologies like REST and Atom) are most important for a linked data deployment. Eric Prud'hommeaux tried to classify the linked data camps into those doing data integration and storage and query and those doing HTTPy resource linking, but I'm guessing the distinctions are even more nuanced than that.

Anyways, on Wednesday I gave a talk on the patterns we use to segment data within Anzo, as well as some of our other usages of Semantic Web technologies and where we see gaps in the standards world (frankly, more in adoption than in specification). I recorded a screencast of the talk—it's not the most polished, but if you weren't able to attend the workshop you might be interested in the talk. I've also posted the slides themselves online. Here's the video:


There were a few discussions in the middle of the talk that I had to cut out because they involved too much cross-talk taking place far away from the mic and were hard to understand. One was a discussion around the way that we (by default) break data into graphs and how it privileges RDF subjects over objects, and whether that affects access control decisions (our experience: no). Another discussion, around the 9-minute mark, was about the use of the same URI to identify a graph and the subject of data within that graph. A third discussion surrounded ongoing efforts to extend VoID to do additional descriptions of linked data endpoints.

When I suggested that we're often asking the wrong question about why we should use Semantic Web technologies, I promised that I'd write more about what it is about these technologies that lowers the barrier to entry enough to let us do (lots of) things that we otherwise wouldn't. In the meantime, some other people have done a great job of anticipating and echoing my own thoughts on the topic, so I'm going to summarize them here.

The bottom line is this: The Semantic Web lets you do things fast. And because you can do things fast, you can do lots more things than you could before. You can afford to do things that fail (fail fast); you can afford to do things that are unproven and speculative (exploratory analysis); you can afford to do things that are only relevant this week or today (on-demand or situational applications); and you can afford to do things that change rapidly. Of course, you can also do things that you would have done with other technology stacks, only you can have them up and running (& ready to be improved, refined, extended, and leveraged) in a fraction of the time that you otherwise would have spent.

The word "fast" can be a bit deceptive when talking about technology. We can all be a bit obsessed with what I call stopwatch time. Stopwatch time is speed measured in seconds (or less). It's raw performance: How much quicker does my laptop boot up with an SSD? How long does it take to load 100 million records into a database? How many queries per second does your SPARQL implementation do on the Berlin benchmark with and without a recent round of optimizations?

We always talk about stopwatch time. Stopwatch time is impressive. Stopwatch time is sexy. But stopwatch time is often far less important than calendar time.

Calendar time is measured in hours and days or in weeks and months and years. Calendar time is the actual time it takes to get an answer to a question. Not just the time it takes to push the "Go" button and let some software application do a calculation, but all of the time necessary to get to an answer: to install, configure, design, deploy, test, and use an application.

Calendar time is what matters. If my relational database application renders a sales forecast report in 500 milliseconds while my Semantic Web application takes 5 seconds, you might hear people say that the relational approach is 10 times faster than the Semantic Web approach. But if it took six months to design and build the relational solution versus two weeks for the Semantic Web solution, Semantic Sam will be adjusting his supply chain and improving his efficiencies long before Relational Randy has even seen his first report. The Semantic Web lets you do things fast, in calendar time.
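The arithmetic behind that comparison can be sketched as follows. The render and build times are the ones from the example above; approximating "six months" as 26 weeks is my own simplification.

```python
from datetime import timedelta

# Figures from the example above; "six months" is approximated as 26 weeks.
relational = {"build": timedelta(weeks=26), "render": timedelta(milliseconds=500)}
semantic = {"build": timedelta(weeks=2), "render": timedelta(seconds=5)}

def time_to_first_answer(solution):
    """Calendar time until the first report: build time plus one render."""
    return solution["build"] + solution["render"]

# Stopwatch view: how much faster each individual report renders.
stopwatch_ratio = semantic["render"] / relational["render"]

# Calendar view: how much sooner the first report arrives.
calendar_ratio = time_to_first_answer(relational) / time_to_first_answer(semantic)

print(f"Relational renders each report {stopwatch_ratio:.0f}x faster")
print(f"Semantic Web delivers the first report {calendar_ratio:.0f}x sooner")
```

The render time all but vanishes next to the build time, which is why the stopwatch difference barely registers in the calendar comparison.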

Why is this? Ultimately, it's because of the inherent flexibility of the Semantic Web data model (RDF). This flexibility has been described in many different ways. RDF relies on an adaptive, resilient schema (from Mike Bergman); it enables cooperation without coordination (from David Wood via Kendall Clark); it can be incrementally evolved; changes to one part of a system don't require re-designs to the rest of the system. These are all dimensions of the same core flexibility of Semantic Web technologies, and it is this flexibility that lets you do things fast with the Semantic Web.
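One way to picture "cooperation without coordination" and incremental evolution is the following minimal sketch in plain Python, with triples represented as tuples. The two teams, their data, and the attribute names are hypothetical; a real deployment would use an RDF store and URIs.

```python
# Two teams publish data independently, each as (entity, attribute, value)
# triples. All names here are hypothetical illustrations.
sales_team = {
    ("product:42", "name", "Widget"),
    ("product:42", "unitsSold", 1200),
}
support_team = {
    ("product:42", "openTickets", 7),
    ("product:43", "openTickets", 2),
}

# Combining the data is a set union: no shared table definition has to be
# agreed on first, and neither team changes what it already publishes.
combined = sales_team | support_team

# Evolving the model is just adding triples with a new attribute; the
# existing triples are left untouched.
combined.add(("product:42", "discontinued", False))

def describe(entity, data):
    """Collect everything known about one entity, whoever published it."""
    return {attr: value for (ent, attr, value) in data if ent == entity}

print(describe("product:42", combined))
```

In a fixed-schema setting, the same two steps would typically involve agreeing on a joint table layout and running a migration to add the new column.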

(There is a bit of nuance here: if stopwatch performance is below a minimum threshold of acceptability, then no one will use a solution in the first place. Semantic Web technologies have had a bit of a reputation for this in the past, but that reputation has long since stopped being deserved. I'll write more about that in a future post.)

To paraphrase both Ecclesiastes and Michael Stonebraker & Joseph Hellerstein, there is nothing new under the sun.

It's as true with Semantic Web technologies as with anything else: tuples are straightforward, ontologies build on schema languages and description logics that have been around for ages, URIs have been baked into the Web for twenty years, etc. But while the technologies are not new, the circumstances are. In particular, the W3C set of Semantic Web technologies is especially valuable for having been brought together as a common, coherent set of standards.

  • Common. Semantic Web technologies are broadly applicable to many, many different use cases. People use them to publish pricing data online, to uncover market opportunities, to integrate data in the bowels of corporate IT, to open government data, to promote structured scientific discourse, to build open social networks, to reform supply chain inefficiencies, to search employee skill sets, and to accomplish about ten thousand other tasks. This makes a one-size-fits-all elevator pitch challenging, but it also means that there's a large audience of practitioners who are benefitting from these technologies and so are coming together to create standards, build tool sets, and implement solutions. These are not niche technologies with limited resources for ongoing development or at risk of being hijacked for a purpose at odds with your own.
  • Coherent. Semantic Web technologies are designed to work together. The infamous layer cake diagram may have many shortcomings, but it does demonstrate that these technologies fit together like jigsaw puzzle pieces. This means that I can build an application using the RDF data model, and then incrementally bring new functionality online by adopting other Semantic Web technologies. Without a coherent set of technologies, I'd have to either roll my own solutions for new functionality (expensive, error-prone) or try to overcome impedance mismatches in connecting together multiple unrelated technologies (expensive, error-prone).
  • Standard. Semantic Web technologies are developed in collaborative working groups under the auspices of the World Wide Web Consortium (W3C). The specifications are free (both as in beer and as in not constrained by intellectual property) and are backed by test suites and implementation reports that go a long way toward encouraging interoperable tools.

The technologies are not novel and are not perfect. But they are common, coherent, and standard, and that sets them apart from a lot of what's come before and a lot of other options that are currently out there.

The Magic Crank

As a brief addendum to my previous post: I've been using this image for a few years now to illustrate what the Semantic Web is not. I call it the magic crank. I imagine that it sits in the corner of the office of some senior pharma executive, and every time their drug development pipeline gets a bit thin or patent protection for the big blockbuster drugs wears off, the executive pulls it out. She dusts off the crank and plugs in the latest databases full of data on genomics, protein interactions, efficacy and safety studies, etc. A few turns of the magic crank later, and she's rewarded with a little card that tells her exactly what drug to invest in next.

To me, the magic crank is the unrealized holy grail of the Semantic Web in the pharma industry. And it's an extremely powerful and valuable goal. But it's a bit dangerous as well: every time someone new to the Semantic Web learns that the magic crank is what the Semantic Web is all about, they end up trying to tackle large, unsolved problems. They end up asking "What can I do with Semantic Web technologies that I can't do otherwise?". Once you've latched onto the potential of the magic crank, it's very hard to ratchet your questions back down to the less-impressive-but-practical-and-still-very-valuable, "What can I do with Semantic Web technologies that I wouldn't do otherwise?".


Credit for the image goes to Trey Ideker of UCSD. I first saw the image in a presentation by Enoch Huang at CSHALS a few years ago.

I haven't written much lately. I've been busy building things. And while I've been building things, I've been learning things. I'd like to start writing and start sharing some of the things I've been learning.

I'd say that at least once a week, when talking to prospective customers, I get asked the following:

What can I do with Semantic Web technologies that I can't do otherwise?

It's a question that's asked in good faith: enterprise software buyers have heard tales of rapid data integration, automated data inference, business-rules engines, etc. time and time again. By now, any corporate IT department likely owns several software packages that purport to accomplish the same things that Semantic Web vendors are selling them. And so a potential buyer learns about Semantic Web technologies and searches for what's new:

What can I do with Semantic Web technologies that I can't do otherwise?

The real answer to this question is distressingly simple: not much. IT staff around the world are constantly doing data integration, data inference, data classification, data visualization, etc. using the traditional tools of the trade: Java, RDBMSes, XML…

But the real answer to the question misses the fact that this is the wrong question. We ought instead to ask:

What can I do with Semantic Web technologies that I wouldn't do otherwise?

Enterprise projects are proposed all the time, and all eventually reach a go/no-go decision point. Businesses regularly consider and reject valuable projects not because they require revolutionary new magic, but because they're simply too expensive for the benefit or they'd take too long to fix the situation that's at hand now. You don't need brand new technology to make dramatic changes to your business.

The point of Semantic Web technology is not that it's revolutionary: it's not cold fusion, interstellar flight, or quantum computing. It's an evolutionary advantage. You could do these projects with traditional technologies, but they're just hard enough to be impractical, so IT shops don't. That's what's changing here. Once the technologies and tools are good enough to turn "no-go" into "go", you can start pulling together the data in your department's 3 key databases; you can start automating data exchange between your group and a key supply-chain partner; you can start letting your line-of-business managers define their own visualizations, reports, and alerts that change on a daily basis. And when you start solving enough of these sorts of problems, you derive value that can fundamentally affect the way your company does business.

I'll write more in the future about what changes with Semantic Web technologies to let us cross this threshold. But for now, when you're looking for the next "killer application" for Semantic Web in the enterprise, you don't need to look for the impossible, just the not (previously) practical.

At last week's SemTech conference, my colleague Ben Szekely kicked off the business track of the lightning talks by debuting our new product, Anzo Connect. Ben showed how Anzo Connect can be used in just a few minutes (4.5 to be precise) to pull data from a relational database, map it to an ontology, integrate the data into an existing RDF store, and visualize the results in a Web-based dashboard.
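The general shape of that pipeline (pull rows from a relational database, map columns to ontology terms, merge the result into existing RDF data) can be sketched in a few lines of Python. To be clear, this is not Anzo Connect's API or a description of how it is implemented; the table, the `ex:` terms, the mapping dictionary, and the helper name are all assumptions for illustration, and the visualization step is left out.

```python
import sqlite3

# A stand-in relational source (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, full_name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (id, full_name, dept) VALUES (?, ?, ?)",
    [(1, "Ada Lovelace", "R&D"), (2, "Grace Hopper", "IT")],
)

# Hypothetical mapping from column names to ontology properties.
COLUMN_TO_PROPERTY = {"full_name": "ex:name", "dept": "ex:department"}

def rows_to_triples(connection):
    """Turn each row into (subject, property, value) triples via the mapping."""
    cursor = connection.execute("SELECT id, full_name, dept FROM employee ORDER BY id")
    columns = [description[0] for description in cursor.description]
    triples = set()
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"ex:employee/{record['id']}"
        triples.add((subject, "rdf:type", "ex:Employee"))
        for column, prop in COLUMN_TO_PROPERTY.items():
            triples.add((subject, prop, record[column]))
    return triples

# "Integrate" by merging into an existing set of triples, which stands in
# for an RDF store that already holds other data about the same entities.
existing_store = {("ex:employee/1", "ex:skill", "analytical engines")}
existing_store |= rows_to_triples(conn)

for triple in sorted(existing_store):
    print(triple)
```

The mapping lives in data rather than in code, so pointing the same function at another column only means adding an entry to the dictionary.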

We took this video of the lightning demo from the audience, but it gives some idea of what Anzo Connect is all about. If you're interested in learning more about Anzo Connect or any of our other software, please drop me a note.

(5 Minute ETL with Anzo Connect)

On April 21, 2011, I had the pleasure of speaking to Professor Stuart Madnick's "Evolution Towards Web 3.0" class at the MIT Sloan School of Management. The topic of the lecture was, unsurprisingly, the Semantic Web. I had a great time putting together the material and discussing it with the students, who seemed to be very engaged in the topic. It was a less technical audience than I often speak with, and so I tried to focus on some of the motivating trends, use cases, and challenges involved with Semantic Web technologies and the vision of the Semantic Web.

I've now placed the presentation online. It's broken down into three basic parts:

  • What about the development of the Web and enterprise IT motivates the Semantic Web?
  • How is it being used today?
  • What are some of the challenges facing the Semantic Web, both on the World Wide Web and within enterprises?

I found the last of the three sections particularly interesting, and I hope you do too.

The presentation includes speaker's notes that add significant commentary to the slides. You can view them by clicking on the "Speaker Notes" tab below the slides. Please let me know what you think: Evolution Towards Web 3.0: The Semantic Web.
