Here's just a taste of the content we've published in the first two months of the blog:
What it comes down to is this: for decades we've had one standard way to store and query important data, and today there are new choices. As with any choice there are tradeoffs, and for some applications NoSQL databases, including Semantic Web databases, can enable organizations to get more done in less time and with less hardware than relational databases can. The trick is knowing when and how to deploy these new tools.
What matters most: Big Data or Right Data? One look at the IT headlines these days would suggest that Big Data is the most important data issue today. After all, with lots of computing power and better database storage techniques, it is now practical to analyze petabytes of data. But is that really the most compelling need that end users have? I don't think so. I would claim instead that the issue most end users face is assembling the right data to help them do their jobs better, not analyzing billions of individual transactions.
Analog photography went through many phases of dramatic improvement on its way to becoming a mass-market technology. But no matter how far it went, it was limited in its flexibility. Every picture was pretty much as you took it. Any modification required real experts with specialist equipment, working in a darkroom. With the advent of digital photography we have achieved extreme flexibility. The picture you take is simply the starting point for creating the picture you want, and end users themselves can make the changes with easy-to-use tools.
Semantic Web technology represents the same kind of dramatic shift away from traditional technologies.
In short, if Semantic Web software is hard to use, then many of the benefits of using these technologies in the first place are immediately lost. If, on the other hand, Semantic Web software is easy to use, then the flexibility of Semantic Web technologies is brought directly to the end user, the business user. The business manager can bring together new data sets for analysis today, rather than a week from now. An analyst can set up triggers and alerts to monitor key business indicators today, rather than waiting three months. A senior scientist can begin looking for correlations within ad hoc sets of data today, rather than next year.
There is a new data model called RDF—the data model of the Semantic Web—which combines the best of both worlds: the flexibility of a spreadsheet and the manageability and data integrity of a relational database. Based on standards set by the World Wide Web Consortium (W3C) to enable combining data on the Web, RDF defines each data cell by the entity it applies to (the row) and the attribute it represents (the column). Each cell is self-describing and not locked into a grid; in other words, the data doesn't have to be "regular". Further, RDF has formal operations that can be performed on it, much like relational algebra, but at a more atomic level.
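As a rough sketch of what this looks like in practice, here are two entities in Turtle syntax (the ex: names are invented for illustration) that share some attributes but not others, with no schema change required:

@prefix ex: <http://example.org/> .

ex:alice  ex:name   "Alice" ;
          ex:email  "alice@example.org" .

ex:bob    ex:name   "Bob" ;
          ex:phone  "+1-555-0100" ;   # Bob has a phone number instead of an email address...
          ex:title  "Analyst" .       # ...plus an attribute that Alice doesn't have at all.

Each triple stands on its own, so adding ex:title to Bob doesn't require touching Alice's data or altering any table definition.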
Anyway, on Wednesday I gave a talk on the patterns we use to segment data within Anzo, as well as some of our other uses of Semantic Web technologies and where we see gaps in the standards world (frankly, more in adoption than in specification). I recorded a screencast of the talk—it's not the most polished, but if you weren't able to attend the workshop you might be interested in it. I've also posted the slides themselves online. Here's the video:
There were a couple of discussions in the middle of the talk that I had to cut out because they involved too much cross-talk far away from the mic and were hard to understand. One was a discussion of the way that we (by default) break data into graphs, how that privileges RDF subjects over objects, and whether it affects access control decisions (our experience: no). Another, around the 9-minute mark, was about the use of the same URI to identify both a graph and the subject of data within that graph. A third concerned ongoing efforts to extend VoID to provide additional descriptions of linked data endpoints.
The bottom line is this: The Semantic Web lets you do things fast. And because you can do things fast, you can do lots more things than you could before. You can afford to do things that fail (fail fast); you can afford to do things that are unproven and speculative (exploratory analysis); you can afford to do things that are only relevant this week or today (on-demand or situational applications); and you can afford to do things that change rapidly. Of course, you can also do things that you would have done with other technology stacks, only you can have them up and running (and ready to be improved, refined, extended, and leveraged) in a fraction of the time that you otherwise would have spent.
The word "fast" can be a bit deceptive when talking about technology. We can all get a bit obsessed with what I call stopwatch time. Stopwatch time is speed measured in seconds (or less). It's raw performance: How much quicker does my laptop boot up with an SSD? How long does it take to load 100 million records into a database? How many queries per second does your SPARQL implementation achieve on the Berlin benchmark, with and without a recent round of optimizations?
We always talk about stopwatch time. Stopwatch time is impressive. Stopwatch time is sexy. But stopwatch time is often far less important than calendar time.
Calendar time is measured in hours and days or in weeks and months and years. Calendar time is the actual time it takes to get an answer to a question. Not just the time it takes to push the "Go" button and let some software application do a calculation, but all of the time necessary to get to an answer: to install, configure, design, deploy, test, and use an application.
Calendar time is what matters. If my relational database application renders a sales forecast report in 500 milliseconds while my Semantic Web application takes 5 seconds, you might hear people say that the relational approach is 10 times faster than the Semantic Web approach. But if it took six months to design and build the relational solution versus two weeks for the Semantic Web solution, Semantic Sam will be adjusting his supply chain and improving his efficiencies long before Relational Randy has even seen his first report. The Semantic Web lets you do things fast, in calendar time.
Why is this? Ultimately, it's because of the inherent flexibility of the Semantic Web data model (RDF). This flexibility has been described in many different ways: RDF relies on an adaptive, resilient schema (from Mike Bergman); it enables cooperation without coordination (from David Wood via Kendall Clark); it can be incrementally evolved; changes to one part of a system don't require redesigns of the rest of the system. These are all dimensions of the same core flexibility of Semantic Web technologies, and it is this flexibility that lets you do things fast with the Semantic Web.
It's as true with Semantic Web technologies as with anything else—tuples are straightforward, ontologies build on schema languages and description logics that have been around for ages, URIs have been baked into the Web for twenty years, and so on. But while the technologies are not new, the circumstances are. In particular, the W3C's Semantic Web technologies are especially valuable for having been brought together as a common, coherent set of standards.
The technologies are not novel, and they are not perfect. But they are common, coherent, and standard, and that sets them apart from much of what's come before and from many of the other options currently out there.
To me, the magic crank is the unrealized holy grail of the Semantic Web in the pharma industry. And it's an extremely powerful and valuable goal. But it's a bit dangerous as well: every time someone new to the Semantic Web learns that the magic crank is what the Semantic Web is all about, they end up trying to tackle large, unsolved problems. They end up asking "What can I do with Semantic Web technologies that I can't do otherwise?". Once you've latched onto the potential of the magic crank, it's very hard to ratchet your questions back down to the less-impressive-but-practical-and-still-very-valuable, "What can I do with Semantic Web technologies that I wouldn't do otherwise?".
Credit for the image goes to Trey Ideker of UCSD. I first saw the image in a presentation by Enoch Huang at CSHALS a few years ago.
I'd say that at least once a week, when talking to prospective customers, I get asked the following:
What can I do with Semantic Web technologies that I can't do otherwise?
It's a question that's asked in good faith: enterprise software buyers have heard tales of rapid data integration, automated data inference, business-rules engines, etc. time and time again. By now, any corporate IT department likely owns several software packages that purport to accomplish the same things that Semantic Web vendors are selling them. And so a potential buyer learns about Semantic Web technologies and searches for what's new:
What can I do with Semantic Web technologies that I can't do otherwise?
The real answer to this question is distressingly simple: not much. IT staff around the world are constantly doing data integration, data inference, data classification, data visualization, etc. using the traditional tools of the trade: Java, RDBMSes, XML…
But the real answer to the question misses the fact that this is the wrong question. We ought instead to ask:
What can I do with Semantic Web technologies that I wouldn't do otherwise?
Enterprise projects are proposed all the time, and all eventually reach a go/no-go decision point. Businesses regularly consider and reject valuable projects not because the projects require revolutionary new magic, but because they're simply too expensive for the benefit they'd deliver, or because they'd take too long to fix the situation at hand. You don't need brand-new technology to make dramatic changes to your business.
The point of Semantic Web technology is not that it's revolutionary; it's not cold fusion, interstellar flight, or quantum computing. It's an evolutionary advantage: you could do these projects with traditional technologies, but they're just hard enough to be impractical, so IT shops don't do them. That's what's changing here. Once the technologies and tools are good enough to turn "no-go" into "go", you can start pulling together the data in your department's 3 key databases; you can start automating data exchange between your group and a key supply-chain partner; you can start letting your line-of-business managers define their own visualizations, reports, and alerts that change on a daily basis. And when you start solving enough of these sorts of problems, you derive value that can fundamentally affect the way your company does business.
I'll write more in the future about what changes with Semantic Web technologies to let us cross this threshold. But for now, when you're looking for the next "killer application" for Semantic Web in the enterprise, you don't need to look for the impossible, just the not (previously) practical.
We took this video of the lightning demo from the audience, but it gives some idea of what Anzo Connect is all about. If you're interested in learning more about Anzo Connect or any of our other software, please drop me a note.
I've now placed the presentation online. It's broken down into three basic parts:
I found the last of the three sections particularly interesting, and I hope you do too.
The presentation includes speaker's notes that add significant commentary to the slides. You can view them by clicking on the "Speaker Notes" tab below the slides. Please let me know what you think: Evolution Towards Web 3.0: The Semantic Web.
How do I write down metadata about the return type of a SPARQL function that returns a URI?
Since "returns a URI" can be a bit ambiguous in the face of things like xsd:anyURI typed literals, we can be a bit more precise:
How do I write down metadata about the return type of a SPARQL function that returns a term for which the isURI function returns true?
Functions like this have all sorts of uses. We use them all the time in conjunction with CONSTRUCT queries and the SPARQL 1.1 BIND clause to generate URIs for new resources.
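As a minimal sketch of that pattern (the fn: prefix URI and the GenerateURI function are stand-ins for whatever URI-generating extension function a given implementation provides, and the foaf: data is invented for the example):

PREFIX fn:   <http://example.org/functions#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT { ?newPerson foaf:name ?name }
WHERE {
  ?person foaf:name ?name .
  # BIND evaluates the extension function and binds the freshly
  # minted URI to ?newPerson for use in the CONSTRUCT template.
  BIND (fn:GenerateURI(?person) AS ?newPerson)
}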
So, when describing this function, how do I write down the return type of one of these URI-generating functions? I want to write something like:
fn:GenerateURI fn:returns ??
If I had a function that returned an integer, I'd expect to be able to write something like:
fn:Floor fn:returns xsd:integer
But in that case, I'm taking advantage of the fact that datatyped literals denote themselves. (Thanks to Andy Seaborne for pointing this out to me.) I can't say this:
fn:GenerateURI fn:returns xsd:anyURI
This seems to tell me that my function returns something that denotes a URI. (One such thing that denotes a URI is an xsd:anyURI literal.) But, again, that's not what I want to say here. I want to say that my function returns something that is syntactically a URI. That is, it returns something that is named by a URI. I considered something like:
fn:GenerateURI fn:returns rdfs:Resource
But rdfs:Resource is the class of everything, and as far as I can tell this would mean that my function could return a URI, a literal, or a blank node.
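For illustration only, one direction I could imagine is modeling the syntactic category of the returned term separately from its denoted value; the fn:returnsTermType, fn:IRI, and fn:Literal terms below are invented and are not part of any standard vocabulary:

fn:GenerateURI fn:returnsTermType fn:IRI
fn:Floor fn:returnsTermType fn:Literal ; fn:returns xsd:integer

Under that sketch, fn:returns would keep its datatype role for literal-returning functions, while fn:returnsTermType would answer the syntactic question directly.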
So any suggestions for how to approach this sort of modeling of the return type (and parameter types) for SPARQL functions?
It's really easy to do. Let me know if you have any questions!
If you're interested in applying for any of these positions, please send your resume to jobs@cambridgesemantics.com. If you know anyone who might be interested, please send them our way!
A SPARQL query goes against an RDF dataset. An RDF dataset has two parts: a single default graph, and zero or more named graphs, each of which pairs a name (a URI) with a graph.
The FROM and FROM NAMED clauses are used to specify the RDF dataset.
The statement "FROM u" instructs the SPARQL processor to take the graph that it knows as "u", take all the triples from it, and add them to the single default graph. If you then also have "FROM v", then you take the triples from the graph known as v and also add them to the default graph.
The statement "FROM NAMED x" instructs the SPARQL processor to take the graph that it knows as "x", take all the triples from it, pair it up with the name "x", and add that pair (x, triples from x) as a named graph in the RDF dataset.
Note that "known as" is purposefully not specified -- some implementations dereference the URI to get the triples that make up that graph; others just use a graph store that maps names to triples.
All the parts of the query that are outside a GRAPH clause are matched against the single default graph.
All the parts of the query that are inside a GRAPH clause are matched individually against the named graphs.
This is why it sometimes makes sense to specify the same graph for both FROM and FROM NAMED:
FROM x
FROM NAMED x
...puts the triples from x in the default graph and also includes x as a named graph. That way, later in the query, triple patterns outside of a GRAPH clause can match parts of x, and so can triple patterns inside a GRAPH clause.
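Here's a minimal sketch of such a query (the graph URI and the ex: predicates are invented for the example):

PREFIX ex: <http://example.org/>

SELECT ?s ?label ?g
FROM <http://example.org/graphs/x>
FROM NAMED <http://example.org/graphs/x>
WHERE {
  ?s ex:label ?label .               # matched against the default graph, which now includes x's triples
  GRAPH ?g { ?s ex:status "open" }   # matched against each named graph, including x
}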
There's a visual picture of this on slide 13 of my SPARQL Cheat Sheet slides.
When I got back to Boston, I made a recording of the same lightning demo for posterity. Please enjoy it here and drop me a note if you have any questions or would like to learn more.
(Best viewed in full screen, 720p.)
While we will of course do this (solicit as widespread a review of our Last Call drafts as possible), I'd like to put out a call for reviews of our current set of Working Drafts. If you can only do one review, you're probably best off waiting for Last Call; but if you have the inclination and time, it would be great to receive reviews of the current Working Drafts at our comments list, public-rdf-dawg-comments@w3.org. The Working Group has committed to responding formally to all comments received from here on out.
Here is our current set of documents, along with a few explicit areas and issues that the Working Group and editors would love to receive feedback about (of course, all reviews and all feedback are welcome):