
SPARQL: Some Hands-On examples using Twinkle and Uniprot

Eric Neumann, eric@clinicalsemantics.com

SPARQL Query: Components

(Thanks to Lee Feigenbaum for input here)
# prefix declarations
	PREFIX namespace: 
	...
# result clause
	SELECT ?prot...
# query pattern
	WHERE {
	    ...
	}
# query modifiers
	ORDER BY ...

SPARQL Endpoint

Initial SPARQL Query: Periodic Table

Launch Twinkle: java -jar twinkle.jar
Select 'Periodic Table'
Paste in the query below
Press 'Run'
PREFIX table: <http://www.daml.org/2003/01/periodictable/PeriodicTable#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?element ?an ?clr ?state
WHERE {
?element rdf:type table:Element .
?element table:atomicWeight ?aw .
?element table:atomicNumber ?an .
?element table:color ?clr .
?element table:standardState ?state .
FILTER ( ?aw > 140.0 && ?aw < 160.0)
}
In SPARQL, it's all about 'triples'!
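
To make the 'triples' idea concrete, here is a minimal Python sketch of how a query pattern is matched against a set of triples. It is a toy evaluator, not how Twinkle or any real SPARQL engine is implemented; the `ex:` element names and the helper names `match` and `query` are assumptions made up for illustration.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# The ex: names and values are hypothetical, chosen only for illustration.
TRIPLES = [
    ("ex:Sm", "rdf:type", "table:Element"),
    ("ex:Sm", "table:atomicNumber", 62),
    ("ex:Eu", "rdf:type", "table:Element"),
    ("ex:Eu", "table:atomicNumber", 63),
]


def is_var(term):
    """A term is a variable if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on failure."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None  # variable already bound to a different value
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result


def query(patterns, triples=TRIPLES):
    """Evaluate a list of triple patterns as a conjunction (all must match)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for triple in triples:
                b2 = match(pattern, triple, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings


rows = query([
    ("?element", "rdf:type", "table:Element"),
    ("?element", "table:atomicNumber", "?an"),
])
```

Each entry in `rows` is one solution: a mapping from variable names to the values that made every pattern match at once. Reusing `?element` in both patterns is what joins them.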


1) Query for Things of a Kind

Now select 'Write Simple Query'
Choose File 'uniprot.rdf'
Paste in the query below
Press 'Run'
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX core: <http://purl.uniprot.org/core/>
SELECT ?prot ?name
WHERE {
?prot rdf:type core:Protein .
?prot core:mnemonic ?name .
}

1a) Query for kinds (Classes)

Items are linked to their 'Classes' by rdf:type
Here's how you can discover them...
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?cl
WHERE {
?s rdf:type ?cl .
}

1b) Query for relations of things of a Kind (Protein)

Each Class has a set of Predicate types
Here's how you can discover them...
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX core: <http://purl.uniprot.org/core/>
SELECT DISTINCT ?pred
WHERE {
?prot rdf:type core:Protein .
?prot ?pred ?o .
}
These are the things stated about "Protein" in Uniprot

2) Limiting number of results

What if we don't want everything?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
SELECT ?prot ?name
WHERE {
?prot rdf:type :Protein .
?prot :mnemonic ?name .
}
LIMIT 10

2a) Limiting number of results with a start offset

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
SELECT ?prot ?name
WHERE {
?prot rdf:type :Protein .
?prot :mnemonic ?name .
}
LIMIT 10
OFFSET 20
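
As a rough mental model, LIMIT and OFFSET behave like a list slice over the solutions: skip OFFSET rows first, then keep at most LIMIT rows. The Python sketch below assumes the solutions are already in a fixed order; the protein names and the helper `limit_offset` are hypothetical.

```python
def limit_offset(rows, limit=None, offset=0):
    """Skip `offset` solutions, then keep at most `limit` of the rest."""
    end = None if limit is None else offset + limit
    return rows[offset:end]


# Hypothetical, already-ordered ?name values standing in for query solutions.
names = [f"PROT{i}_HUMAN" for i in range(35)]

page = limit_offset(names, limit=10, offset=20)   # like LIMIT 10 OFFSET 20
tail = limit_offset(names, limit=10, offset=30)   # fewer than 10 rows remain
```

Without an ORDER BY clause the engine is free to return solutions in any order, so paging through results with OFFSET is only repeatable when the query is also ordered.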

3) Finding other relations with shorter syntax

Use ';' for compound statements having the same Subject
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
SELECT ?prot ?cit
WHERE {
?prot rdf:type :Protein ;
   :citation ?cit .
}

4) Traversing the graph

Node-Edge-Node-Edge-Node...
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
SELECT ?prot ?cit ?prot2
WHERE {
?prot rdf:type :Protein ;
   :citation ?cit .
?prot2 :citation ?cit .
FILTER ( ?prot != ?prot2)
}
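
The traversal above is a join of the :citation edges with themselves. The Python sketch below mimics it on made-up data to show one consequence: every pair comes back twice, once in each order, because the FILTER only removes a protein paired with itself. All identifiers in `CITATIONS` are hypothetical.

```python
# Hypothetical (protein, citation) pairs standing in for :citation triples.
CITATIONS = [
    ("P1", "cit:100"),
    ("P2", "cit:100"),
    ("P3", "cit:200"),
    ("P1", "cit:300"),
    ("P3", "cit:300"),
]


def co_cited(citations):
    """Return (prot, cit, prot2) rows where two different proteins share a citation."""
    rows = []
    for prot, cit in citations:
        for prot2, cit2 in citations:
            # Same ?cit on both edges is the join; prot != prot2 mirrors the FILTER.
            if cit == cit2 and prot != prot2:
                rows.append((prot, cit, prot2))
    return rows


rows = co_cited(CITATIONS)

# Keeping only rows with prot < prot2 reports each pair once.
once = [r for r in rows if r[0] < r[2]]
```

In SPARQL the same de-duplication is commonly written as FILTER ( str(?prot) < str(?prot2) ), since URIs are compared as strings there.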

5) Finding proteins using restrictions 2 steps away

Using the larger Graph for constraints
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?prot ?name ?res
WHERE {
?prot rdf:type :Protein ;
   :mnemonic ?name ;
   rdfs:seeAlso ?res .
?res :database "Pfam" .
}

6) Negation: Finding proteins that do not have relations

Negation requires a "closed-world assumption" about the dataset: a triple that is not present is treated as false

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?prot ?name ?res
WHERE {
?prot rdf:type :Protein ;
   :mnemonic ?name .
OPTIONAL { ?prot rdfs:seeAlso ?res . ?res :database "Pfam"}.
FILTER (!bound(?res) ).
}
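
The OPTIONAL plus !bound idiom can be read as "try to match, then keep only the rows where the match failed". The Python sketch below walks through that logic on made-up data; `MNEMONICS`, `SEE_ALSO` and `without_database` are hypothetical names, and `None` stands in for an unbound variable.

```python
# Hypothetical data: protein -> mnemonic, and protein -> cross-referenced databases.
MNEMONICS = {"P1": "AAA_HUMAN", "P2": "BBB_HUMAN", "P3": "CCC_HUMAN"}
SEE_ALSO = {"P1": ["Pfam", "IntAct"], "P2": ["IntAct"]}


def without_database(database):
    """Return (prot, name, res) rows for proteins with no link to `database`."""
    rows = []
    for prot, name in MNEMONICS.items():
        # OPTIONAL: try to bind ?res; if nothing matches, keep the row with ?res unbound.
        matches = [db for db in SEE_ALSO.get(prot, []) if db == database]
        optional_rows = [(prot, name, m) for m in matches] or [(prot, name, None)]
        # FILTER (!bound(?res)): keep only rows where the optional part did not match.
        rows.extend(r for r in optional_rows if r[2] is None)
    return rows


rows = without_database("Pfam")
```

Note that the third column is always empty in the result, which is also what the ?res column of the query above shows: only rows where ?res stayed unbound survive the FILTER.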

7) Proteins with specific classifications (GO)

GO:0006954 => inflammatory response
GO terms as URIs... try following the link

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?prot ?name
WHERE {
?prot rdf:type :Protein ;
   :mnemonic ?name ;
   :classifiedWith <http://purl.uniprot.org/go/0006954> .
}

8) Statements about statements

Why do some Uniprot predicates have 3 items?
The extra rdf:ID names the statement itself, so it can be the subject of other statements
<citation rdf:resource="http://purl.uniprot.org/citations/9575201" rdf:ID="_1"/>

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?p ?o
WHERE {
<http://HoustonSemWeb#_2DB2> ?p ?o .
}

Be sure to add "http://HoustonSemWeb" as your Base URI

9) Reification Statements

 Decomposing Triples into S-P-O Statements

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?p ?o ?stat
WHERE {
?stmt rdf:subject ?s ; rdf:predicate ?p ; rdf:object ?o ; :status ?stat .
}
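
For readers new to reification, the Python sketch below shows what "decomposing" one triple looks like: the statement gets an identifier of its own, and rdf:subject, rdf:predicate and rdf:object point back at its three parts. The helper `reify`, the protein identifier and the status value are hypothetical; only the rdf: property names come from the standard RDF vocabulary.

```python
def reify(stmt_id, s, p, o, **annotations):
    """Return triples that describe the statement (s, p, o) as a node of its own."""
    triples = [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]
    # Anything else said about the statement hangs off the same identifier.
    triples += [(stmt_id, f":{key}", value) for key, value in annotations.items()]
    return triples


# Hypothetical protein and status value; the citation number is the one shown above.
stmt_triples = reify(
    "_:stmt1",
    "uniprot:P00001",
    ":citation",
    "citations:9575201",
    status="Hypothetical status",
)
```

The query above runs this in reverse: it finds statement nodes by their rdf:subject, rdf:predicate and rdf:object, then reads the :status attached to each.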

10) Statements of evidence for Inflammatory proteins

Links to GO that have evidence statements
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?prot ?name ?db ?cit ?stat
WHERE {
?prot rdf:type :Protein ;
   :mnemonic ?name ;
   :classifiedWith <http://purl.uniprot.org/go/0006954> .
?stmt rdf:subject ?prot ; rdf:object <http://purl.uniprot.org/go/0006954> ;
   :database ?db ; :citation ?cit ; :status ?stat .
}

11) Finding Proteins with IntAct associations

Uniprot contains Interaction Graphs
Here's how you can begin to mine them

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?prot ?res ?part ?intact
WHERE {
?prot rdf:type :Protein ;
   rdfs:seeAlso ?res .
?res :database "IntAct" .
?part owl:sameAs ?prot .
?intact :participant ?part .
}

12) Finding all interactions between proteins

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?IntAct ?exp ?part ?name
WHERE {
?IntAct rdf:type :Interaction ; :experiments ?exp ;
   :participant ?part .
OPTIONAL { ?part rdfs:label ?name }.
}
ORDER BY DESC(?exp) ?IntAct

Try following the participant links
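
ORDER BY DESC(?exp) ?IntAct sorts on two keys: experiment count from high to low, then the interaction identifier ascending to break ties. A Python sketch of the same ordering, assuming ?exp is numeric; the rows are made up.

```python
# Hypothetical solutions: interaction identifier and number of experiments.
rows = [
    {"intact": "EBI-3", "exp": 2},
    {"intact": "EBI-1", "exp": 5},
    {"intact": "EBI-2", "exp": 5},
]

# Negating the count gives descending order on exp; intact stays ascending.
ordered = sorted(rows, key=lambda r: (-r["exp"], r["intact"]))
order_of_ids = [r["intact"] for r in ordered]
```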


13) Finding instances linked by owl:sameAs

 owl:sameAs is symmetric, but most SPARQL engines do not infer the reverse direction unless reasoning is enabled


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?prot ?t ?prot2 ?t2
WHERE {
?prot rdf:type ?t .
?prot owl:sameAs ?prot2 .
?prot2 rdf:type ?t2 .
}
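
The Python sketch below illustrates why the direction matters. The data asserts owl:sameAs in one direction only, so a plain lookup from the other side finds nothing until the reverse pairs are added. The identifiers and the helpers `symmetric` and `same_as_objects` are hypothetical.

```python
# Hypothetical data: owl:sameAs asserted in one direction only.
SAME_AS = [("intact:A", "uniprot:P1")]


def symmetric(pairs):
    """Add the reverse of every pair, as a reasoner would for a symmetric property."""
    return set(pairs) | {(b, a) for a, b in pairs}


def same_as_objects(subject, pairs):
    """Objects reachable from `subject` by following the pairs as written."""
    return sorted(o for s, o in pairs if s == subject)


plain = same_as_objects("uniprot:P1", SAME_AS)
inferred = same_as_objects("uniprot:P1", symmetric(SAME_AS))
```

Without a reasoner, a query can cover both directions itself, for example with { ?prot owl:sameAs ?prot2 } UNION { ?prot2 owl:sameAs ?prot }.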

14) Finding Proteins with Reactome associations

 Uniprot uses rdfs:seeAlso for links to Pathways and such


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?prot ?see ?prot2
WHERE {
?prot rdf:type :Protein ;
   rdfs:seeAlso ?see .
?prot2 rdfs:seeAlso ?see .
?see :database "Reactome" .
FILTER ( ?prot != ?prot2)
}

15) Finding Proteins with both Reactome and Citation associations to each other

 Selection of Proteins 'doubly linked'


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?prot ?see ?prot2 ?cit
WHERE {
?prot rdf:type :Protein ;
   rdfs:seeAlso ?see .
?prot :citation ?cit .
?prot2 rdfs:seeAlso ?see ;
   :citation ?cit .
?see :database "Reactome" .
FILTER ( ?prot != ?prot2)
}

16) Using DESCRIBE

 DESCRIBE means "Get me everything you know about that item"


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
DESCRIBE ?prot
WHERE {
?prot rdf:type :Protein .
?prot :mnemonic "ARL6_HUMAN" .
}

17) Using ASK

 ASK means "Do you have any matches?"


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
ASK
WHERE {
?prot rdf:type :Protein .
?prot :mnemonic "ARL6_HUMAN" .
}

18) Using CONSTRUCT

 CONSTRUCT means "Build these triples for anything found"


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
CONSTRUCT { ?prot :selected "true" . ?prot :citedBy ?cit }
WHERE {
?prot rdf:type :Protein ;
   :citation ?cit ;
   :mnemonic "ARL6_HUMAN" .
}
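
CONSTRUCT fills in its template once per solution and collects the results into a new graph. The Python sketch below mimics that step under simple assumptions: solutions are dictionaries, the graph is a set of tuples, and the protein and citation identifiers are made up.

```python
def is_var(term):
    """A term is a variable if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def construct(template, solutions):
    """Instantiate every template triple for every solution; return a set of triples."""
    graph = set()
    for solution in solutions:
        for triple in template:
            new = tuple(solution.get(term, term) for term in triple)
            # A triple that still contains an unbound variable is left out.
            if any(is_var(t) for t in new):
                continue
            graph.add(new)
    return graph


template = [("?prot", ":selected", "true"), ("?prot", ":citedBy", "?cit")]

# Hypothetical solutions: one protein found with two citations.
solutions = [
    {"?prot": "uniprot:P1", "?cit": "citations:1"},
    {"?prot": "uniprot:P1", "?cit": "citations:2"},
]

graph = construct(template, solutions)
```

Because the result is a graph (a set of triples), the :selected triple appears once even though two solutions produced it.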

19) Finding proteins by text matching the fullname

 REGEX for regular expression matching


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
SELECT ?prot ?name
WHERE {
?prot rdf:type :Protein ;
   :recommendedName ?rn .
?rn :fullName ?name .
FILTER regex( ?name, "receptor", "i")
}
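
The regex FILTER matches anywhere inside the string, and the "i" flag makes it case-insensitive. A rough Python equivalent for simple patterns like this one is sketched below; the names in `FULL_NAMES` and the helper `regex_filter` are hypothetical.

```python
import re

# Hypothetical fullName values keyed by protein.
FULL_NAMES = {
    "uniprot:P1": "Insulin receptor",
    "uniprot:P2": "ADP-ribosylation factor-like protein 6",
    "uniprot:P3": "RECEPTOR-type tyrosine-protein phosphatase",
}


def regex_filter(names, pattern, flags=""):
    """Keep entries whose name contains a match for `pattern`."""
    py_flags = re.IGNORECASE if "i" in flags else 0
    return {k: v for k, v in names.items() if re.search(pattern, v, py_flags)}


hits = regex_filter(FULL_NAMES, "receptor", "i")   # case-insensitive
strict = regex_filter(FULL_NAMES, "receptor")      # case-sensitive
```

Dropping the "i" flag loses the upper-case match, which is why the query above includes it.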

20) Finding proteins with Helix Annotations, and displaying the range

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://purl.uniprot.org/core/>
SELECT ?prot ?name ?b ?e
WHERE {
?prot rdf:type :Protein ;
   :mnemonic ?name ;
   :annotation ?anno .
?anno rdf:type :Helix_Annotation ;
   :range ?rng .
?rng :begin ?b ;
   :end ?e .
}