Life sciences on the web with SPARQL

I've been meaning to write up my experiences from WWW2006 in Edinburgh since getting back at the end of May, but the arrival of heaps of summer interns and the projects that accompany them (including Queso, a semantic-web-powered web-application framework) seem to have defeated that desire.

At the least, though, I wanted to mention the SPARQL/RDF life-sciences web mashup that I demoed at the Advancements in Semantic Web session of the W3C track on Friday. (And in doing so, follow my own example.) In this demo, we use RDF representations and SPARQL queries to integrate protein data from the NCBI with antibody information from the Alzheimer Research Forum Antibody Database. The presentation that I gave in Edinburgh has some more information on how the demo is put together and what Elias and I learned from our work on it.

How to use the demo

Navigate to the demo at http://thefigtrees.net/lee/sw/demos/antibodies/.
Enter a search term to find related proteins. For the purposes of trying out the demo, enter p53 and click Find Proteins.¹
Up to twelve proteins found in the search are rendered on the display, along with the protein's species, description, and NCBI number. Click on a protein to search for antibodies that target that protein. For the purposes of trying out the demo, click on NP_000537.²
Any antibodies found are displayed in a column on the right half of the page. The information displayed includes the distributor of the antibody, the distributor's catalog number, the immunogen used to generate the antibody, the specificity of the antibody, and the uses for which the antibody is appropriate.

This demo makes use of some of the early work being done by Alan Ruttenberg in conjunction with the BioRDF subgroup of the W3C's Semantic Web Health Care and Life Sciences Interest Group.

Behind the scenes

¹ Two SPARQL queries are used to do this initial search. First, we use a service written by Ben Szekely which performs an NCBI Entrez search and returns the LSIDs of the resulting objects within a simple RDF graph. For each of these LSIDs, we make use of a second one of Ben's services which allows us to resolve the metadata for an LSID via a simple HTTP GET. We use the URLs to this service as the graphs for a second SPARQL query which retrieves the details of the proteins. We take the results of this second SPARQL query as JSON and bind them to a microtemplate to render the protein information.
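
As a rough illustration of the second query described above, the sketch below builds a SPARQL query with one FROM clause per LSID metadata URL, encodes it as an HTTP GET against a SPARQL endpoint, and binds each row of a SPARQL JSON result to a small text template. The resolver URL, the `ex:` predicates, the sample LSID, and the sample result row are hypothetical stand-ins, not what the demo actually uses. Only the endpoint address http://sparql.org/sparql appears on this page, in the error message quoted in the comments.

```python
from string import Template
from typing import List
from urllib.parse import quote, urlencode

# Hypothetical stand-in: the demo's real resolver URL is not given in this post.
RESOLVER = "http://example.org/lsid-metadata?lsid="
# Endpoint address taken from the error message quoted in the comments.
ENDPOINT = "http://sparql.org/sparql"


def metadata_url(lsid: str) -> str:
    """URL that would return the RDF metadata for one LSID via HTTP GET."""
    return RESOLVER + quote(lsid, safe="")


def protein_query(lsids: List[str]) -> str:
    """Build a SELECT query with one FROM clause per LSID metadata graph."""
    from_clauses = "\n".join(f"FROM <{metadata_url(lsid)}>" for lsid in lsids)
    return (
        "PREFIX ex: <http://example.org/protein#>\n"  # made-up vocabulary
        "SELECT ?protein ?species ?description\n"
        f"{from_clauses}\n"
        "WHERE {\n"
        "  ?protein ex:species ?species ;\n"
        "           ex:description ?description .\n"
        "}"
    )


def request_url(query: str) -> str:
    """Encode the query using the SPARQL protocol's `query` GET parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


# A plain-text stand-in for the microtemplate that renders each protein.
ROW = Template("$protein ($species): $description")


def render(results: dict) -> List[str]:
    """Bind each row of a SPARQL JSON result to the text template."""
    rows = results["results"]["bindings"]
    return [ROW.substitute({k: v["value"] for k, v in row.items()}) for row in rows]


if __name__ == "__main__":
    # Hypothetical LSID; the scheme the demo uses is not shown in this post.
    print(request_url(protein_query(["urn:lsid:example.org:protein:NP_000537"])))
    # Made-up result row in the SPARQL JSON results layout.
    sample = {
        "head": {"vars": ["protein", "species", "description"]},
        "results": {"bindings": [{
            "protein": {"type": "literal", "value": "NP_000537"},
            "species": {"type": "literal", "value": "Homo sapiens"},
            "description": {"type": "literal", "value": "tumor protein p53"},
        }]},
    }
    print("\n".join(render(sample)))
```

Naming the graphs in FROM clauses leaves the fetching and parsing of each metadata document to the SPARQL endpoint, so the client only has to send one request and render the JSON that comes back.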

² Retrieving the antibodies for the selected protein involves two more SPARQL queries. First, we query against a map created by Alan Ruttenberg in order to find AlzForum antibody IDs that correspond to the target protein. We need the results of this query to generate HTTP URLs which search the AlzForum antibody database for the proper antibodies. (If we had a full RDF representation of the antibody database, this query would be unnecessary.) We wrap these search URLs in a service we created that scrapes the HTML from the antibody search results Web page and generates RDF (how I yearn for RDFa adoption), and we then use the wrapped URLs as the graphs for a second SPARQL query. This query joins the NCBI data with Alan's mapping and the antibody details to retrieve the information that is rendered for each antibody of the target protein.
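
The wrapping step can be sketched roughly as follows. Each antibody search URL is percent-encoded and passed as a parameter to an HTML-to-RDF scraping service, and the wrapped URLs then serve as FROM graphs, next to the mapping graph, in the final query. Every URL, the `url` and `id` parameter names, and the `ex:` predicates are hypothetical placeholders, since the post does not give the real service addresses or vocabulary. The query is also simplified: it joins only the mapping and the antibody details and leaves out the NCBI data.

```python
from typing import List
from urllib.parse import quote

# All three addresses are hypothetical placeholders, not the demo's services.
SCRAPER = "http://example.org/html2rdf?url="
SEARCH = "http://example.org/antibody-search?id="
MAPPING_GRAPH = "http://example.org/protein-antibody-map.rdf"


def wrapped_search_url(antibody_id: str) -> str:
    """Wrap an antibody search URL in the scraping service's URL.

    The inner URL is percent-encoded in full so that its own '?' and '='
    are not mistaken for part of the outer query string.
    """
    search_url = SEARCH + quote(antibody_id, safe="")
    return SCRAPER + quote(search_url, safe="")


def antibody_query(protein_uri: str, antibody_ids: List[str]) -> str:
    """Build a query over the mapping graph plus one scraped graph per ID."""
    graphs = [MAPPING_GRAPH] + [wrapped_search_url(a) for a in antibody_ids]
    from_clauses = "\n".join(f"FROM <{g}>" for g in graphs)
    return (
        "PREFIX ex: <http://example.org/antibody#>\n"  # made-up vocabulary
        "SELECT ?antibody ?distributor ?catalogNumber\n"
        f"{from_clauses}\n"
        "WHERE {\n"
        f"  <{protein_uri}> ex:targetedBy ?antibody .\n"  # from the mapping
        "  ?antibody ex:distributor ?distributor ;\n"     # from scraped RDF
        "            ex:catalogNumber ?catalogNumber .\n"
        "}"
    )


if __name__ == "__main__":
    # Hypothetical protein URI and antibody IDs, for illustration only.
    print(antibody_query("http://example.org/protein/NP_000537", ["101", "102"]))
```

Because all of the graphs are merged into a single default graph, the `?antibody` variable joins the mapping triples with the scraped details without any client-side stitching.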

4 Comments

Stian Soiland | August 17, 2006 9:15 AM

The demo doesn't work with Firefox 1.5:

Failed to get privilege UniversalBrowserRead to open http://sparql.org/sparql. A script from "http://thefigtrees.net" was denied UniversalBrowserRead privileges.

Couldn't get xp connect

Lee | August 17, 2006 10:17 AM

Hi Stian,

My apologies about that. I've added a note about that problem to the demo, and am reproducing it here:

"""
Please note that the demo works best using Firefox, and may not work at all with other browsers. In addition, if you receive an error along the lines of Failed to get privilege UniversalBrowserRead, please follow the instructions at the link below and then re-try the demo:

http://esw.w3.org/topic/SparqlCalendarDemoUsage#FAQ
"""

thanks,
Lee

Francisco | September 27, 2006 6:10 AM

Would you consider using Flash crossdomain.xml on your top-level domain? We can then use Flash XMLHttp to query your processor and not worry about UniversalBrowserRead. For example, I am prototyping some semantic stuff for finance, and I only have a .NET server and would like to tap into the work done with your SPARQL parser.

Lee | September 29, 2006 1:40 AM

Hi Francisco,

I'm not particularly familiar with Flash scripting and how crossdomain.xml works.

If I'm reading http://www.adobe.com/cfusion/knowledgebase/index.cfm?id=tn_14213#policy correctly, though, the crossdomain.xml file needs to go on the *remote* server which is being read via XMLHttpRequest.

In the antibodies demo, we make use of http://sparql.org (HP Labs SPARQLer service) to perform SPARQL queries of NCBI and AlzForum data. Unfortunately, http://sparql.org is not under my control and so I cannot place any files on that web server.

My regrets -- I do agree that this is a problem without a happy solution at the moment.

On my TODO list is work to use the "script" tag callback hack to avoid the privilege request, but even that solution fails in the face of larger SPARQL queries.

Lee
