SPARQL and suggester queries

From ScienceSource
Revision as of 12:20, 28 November 2018 by Tagishsimon (talk | contribs) (Tables of example queries: typo)

Use the "Query Service" link in the sidebar to the left to run SPARQL queries here.

For a cheatsheet on query building, see the data schema. The SPARQL here is currently the best way to explore and understand the site. It is designed to be used for auxiliary text-mining functions. The technical side of text-mining can seem opaque, but writing code for it in SPARQL means the relational side is in the foreground, which is helpful for conceptual understanding.

Basic ideas

ScienceSource rests on treating search terms, as found in the text of articles, as forming a co-occurrence network. The properties P8 and P9, applied to anchor points, measure the distances between them. Limiting the distance between adjacent anchor points to 100 sets up a proximity criterion for what we mean by co-occurrence. That implies a signal/noise distinction: pairs at distance at most 100 are treated as "signal", and the remaining pairs are discarded as "noise".
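
The signal/noise split can be illustrated outside SPARQL with a minimal Python sketch. The function name split_signal_noise, the (position, term) tuple layout and the toy data are assumptions made for illustration; only the threshold of 100 comes from the text above. For simplicity the sketch compares all pairs of anchor points in one article, not only adjacent ones.

```python
from itertools import combinations

def split_signal_noise(anchors, max_distance=100):
    """Partition pairs of anchor points in one article by distance apart.

    anchors: list of (position, term) tuples (hypothetical layout).
    Returns (signal, noise): pairs at most max_distance apart, and the rest.
    """
    signal, noise = [], []
    # Sorting by position makes every pair (earlier, later).
    for first, second in combinations(sorted(anchors), 2):
        distance = second[0] - first[0]
        bucket = signal if distance <= max_distance else noise
        bucket.append((first[1], second[1], distance))
    return signal, noise

# Toy data, not taken from any ScienceSource article.
anchors = [(120, "malaria"), (180, "chloroquine"), (900, "artemisinin")]
signal, noise = split_signal_noise(anchors)
print(signal)      # [('malaria', 'chloroquine', 60)]
print(len(noise))  # 2
```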

Setting up a query that detects this signal is not hard in SPARQL, and side conditions on the annotations attached to those anchor points can be included. Co-occurrence found this way lets human checking, of whether a pair of terms forms a useful "triple" in the article, be applied to carefully chosen parts of the text-mining data.
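
As a rough illustration of side conditions, the Python sketch below keeps only pairs where a disease term is followed by a drug term within the distance limit. The dictionary names "infectiousdiseases" and "infectiousdiseasesdrugs" are the ones used in the queries on this page; the function name suggest_pairs, the dict keys and the toy annotations are hypothetical.

```python
def suggest_pairs(annotations, max_distance=100):
    """Return (disease, drug, distance) candidates for human checking.

    annotations: list of dicts with keys "term", "dictionary", "position"
    (a hypothetical layout, all taken from a single article).
    Side conditions: the disease term comes first, and the drug term
    follows it at a distance of at most max_distance.
    """
    diseases = [a for a in annotations if a["dictionary"] == "infectiousdiseases"]
    drugs = [a for a in annotations if a["dictionary"] == "infectiousdiseasesdrugs"]
    candidates = []
    for disease in diseases:
        for drug in drugs:
            distance = drug["position"] - disease["position"]
            if 0 < distance <= max_distance:
                candidates.append((disease["term"], drug["term"], distance))
    # Closest pairs first, so the most promising candidates are checked first.
    return sorted(candidates, key=lambda c: c[2])

# Toy annotations, not taken from the ScienceSource data.
annotations = [
    {"term": "tuberculosis", "dictionary": "infectiousdiseases", "position": 40},
    {"term": "rifampicin", "dictionary": "infectiousdiseasesdrugs", "position": 95},
    {"term": "isoniazid", "dictionary": "infectiousdiseasesdrugs", "position": 10},
    {"term": "malaria", "dictionary": "infectiousdiseases", "position": 500},
]
print(suggest_pairs(annotations))  # [('tuberculosis', 'rifampicin', 55)]
```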

Using these ideas, "suggester queries" are provided here, to play the initial part in the process of human fact-checking.

Tables of example queries

The tables below are a black-and-white version of https://www.wikidata.org/wiki/User:Charles_Matthews/ScienceSource_queries, a page that can be used for running the queries and adding comments. Resources on SPARQL can be found by following the links on that page.

With each link to the Query Service SPARQL, you have to click the white-on-blue arrow to run the query.

Visualizations

For readers without any SPARQL background, these are the easiest way in.

Query reference name | Run from this link | Comments | SPARQL code
Counting drug annotations | link | Bubble chart breakdown of drug annotations here.
#Counting drug annotations
#defaultView:BubbleChart
SELECT ?drug ?drugLabel ?count
WHERE 
{
  {
    SELECT ?drug (COUNT(?annotation) AS ?count)
    WHERE {
      ?annotation wdt:P15 ?drug.
      ?annotation wdt:P16 "infectiousdiseasesdrugs".
    }
    GROUP BY ?drug
    HAVING (?count > 1)
  }
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY DESC(?count)
LIMIT 500
Counting disease annotations | link | Bubble chart breakdown of disease annotations here.
#Counting disease annotations
#defaultView:BubbleChart
SELECT ?disease ?diseaseLabel ?count
WHERE 
{
  {
    SELECT ?disease (COUNT(?annotation) AS ?count)
    WHERE {
      ?annotation wdt:P15 ?disease.
      ?annotation wdt:P16 "infectiousdiseases".
    }
    GROUP BY ?disease
    HAVING (?count > 1)
  }
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY DESC(?count)
LIMIT 500
Co-occurrence (version 3) | link | Close pairs of drug terms and disease terms in a paper, displayed on vertical lines representing text positions. In this case, a single drug (left) is found next to two disease terms. The article used here is "Africa's 32 Cents Solution for HIV/AIDS".
#Co-occurrence (filtered) displayed for single article Q6679
#defaultView:Dimensions
SELECT  ?drugLabel ?charnumber2 ?charnumber1 ?diseaseLabel
WHERE {
         ?anchor1 wdt:P12 wd:Q6679;
                  wdt:P10 ?charnumber1.
         ?anchor2 wdt:P12 wd:Q6679;
                  wdt:P10 ?charnumber2.
         ?term1 wdt:P19 ?anchor1.
         ?term2 wdt:P19 ?anchor2.
         ?term1 wdt:P15 ?disease.
         ?term2 wdt:P15 ?drug.
         ?term1 wdt:P16 "infectiousdiseases".
         ?term2 wdt:P16 "infectiousdiseasesdrugs".     
         FILTER (?charnumber2 > ?charnumber1)
         FILTER (?charnumber2 - ?charnumber1 < 200)
 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LineChart by decile for given dictionary and paper | link | Divides the paper into ten parts and shows the distribution of annotations.
#defaultView:LineChart
SELECT ?decile (COUNT(?annotation) AS ?count)
WHERE
{
    ?annotation wdt:P3 wd:Q5 ;
                wdt:P19 ?anchor;
                wdt:P16 "infectiousdiseasesdrugs".
    ?anchor wdt:P12 wd:Q6679.
    ?anchor wdt:P10 ?charnumber.
    ?annotationZ wdt:P19 ?anchorZ.
    ?anchorZ wdt:P7 wd:Q6;
             wdt:P12 wd:Q6679;
             wdt:P10 ?length.
   BIND (floor(10*?charnumber/?length) AS ?decile)
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }
 GROUP BY ?decile
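
The decile binning in the LineChart query above, floor(10*?charnumber/?length), can be mirrored in a few lines of Python. This is only a sketch on toy numbers: the function name annotations_by_decile and the input layout are assumptions, not part of the site.

```python
from collections import Counter
from math import floor

def annotations_by_decile(positions, length):
    """Count annotations per tenth of a paper.

    positions: character numbers of the annotations (toy values here).
    length: notional length of the paper, taken in the query from the
            final anchor point.
    Mirrors floor(10 * charnumber / length) followed by GROUP BY.
    """
    counts = Counter(floor(10 * p / length) for p in positions)
    return sorted(counts.items())

print(annotations_by_decile([50, 120, 130, 990], 1000))
# [(0, 1), (1, 2), (9, 1)]
```

With this formula a position equal to the length itself falls into bin 10, so up to eleven bins can appear.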

Other queries

Query reference name | Run from this link | Comments | SPARQL code
Articles by ascending length | link | This query locates the final anchor point in each article, identified by the fact that its notional following anchor point is "terminus" (Q6). The "Snakebite in South Asia" paper is a dummy.
#Articles by ascending notional length
SELECT ?annotation ?title ?length ?article

WHERE 
   {?annotation wdt:P19 ?anchor.
    ?anchor wdt:P7 wd:Q6.
    ?anchor wdt:P12 ?article.
    ?anchor wdt:P10 ?length.
    ?article wdt:P20 ?title
   }
ORDER BY ASC (?length)
Annotations for a particular term | link |
#Find all annotations with term "toxocariasis", showing the article in which the term is found.
SELECT ?item ?article ?term 
  WHERE {?item wdt:P3 wd:Q5.
         ?item wdt:P15 ?term.
         ?item wdt:P20 ?article.
        FILTER (?term = "toxocariasis")
         }
Distinct terms found | link |
#Show all distinct terms found (currently disease and drug terms)
SELECT DISTINCT ?term
  WHERE {?item wdt:P3 wd:Q5.
         ?item wdt:P15 ?term}
Co-occurrence 1, duplications removed | link |
#Co-occurrence of drug and disease terms (duplications removed)
SELECT DISTINCT ?articletitle ?disease ?drug
  WHERE {?term1 wdt:P19 ?anchor1.
         ?term2 wdt:P19 ?anchor2.
         ?term1 wdt:P15 ?disease.
         ?term2 wdt:P15 ?drug.
         ?term1 wdt:P16 "infectiousdiseases".
         ?term2 wdt:P16 "infectiousdiseasesdrugs".
         ?anchor1 wdt:P12 ?article.
         ?anchor2 wdt:P12 ?article.
         ?article wdt:P20 ?articletitle.
         SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
         }
Co-occurrence 1, disease term before drug term | link |
#Co-occurrence filtered by the condition "disease term before drug term"
SELECT ?articletitle ?disease ?drug ?charnumber1 ?charnumber2
  WHERE {?term1 wdt:P19 ?anchor1.
         ?term2 wdt:P19 ?anchor2.
         ?term1 wdt:P15 ?disease.
         ?term2 wdt:P15 ?drug.
         ?term1 wdt:P16 "infectiousdiseases".
         ?term2 wdt:P16 "infectiousdiseasesdrugs".
         ?anchor1 wdt:P10 ?charnumber1.
         ?anchor2 wdt:P10 ?charnumber2.
         ?anchor1 wdt:P12 ?article.
         ?anchor2 wdt:P12 ?article.
         ?article wdt:P20 ?articletitle.
         FILTER (?charnumber2 > ?charnumber1)
         SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
         }
Co-occurrence 2 | link |
#Co-occurrence filtered by the condition "disease term before drug term" and character distance apart < 100
SELECT ?articletitle ?disease ?drug ?charnumber1 ?charnumber2
  WHERE {?term1 wdt:P19 ?anchor1.
         ?term2 wdt:P19 ?anchor2.
         ?term1 wdt:P15 ?disease.
         ?term2 wdt:P15 ?drug.
         ?term1 wdt:P16 "infectiousdiseases".
         ?term2 wdt:P16 "infectiousdiseasesdrugs".
         ?anchor1 wdt:P10 ?charnumber1.
         ?anchor2 wdt:P10 ?charnumber2.
         ?anchor1 wdt:P12 ?article.
         ?anchor2 wdt:P12 ?article.
         ?article wdt:P20 ?articletitle.
         FILTER (?charnumber2 > ?charnumber1)
         FILTER (?charnumber2 - ?charnumber1 < 100)
         SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
         }
Counting annotations in a time period | link |
#Counting annotations in a time period: publication date 2013 earliest.
SELECT ?annotation ?term ?title ?date1
WHERE 
   {?annotation wdt:P19 ?anchor.
    ?annotation wdt:P15 ?term.
    ?anchor wdt:P12 ?article.
    ?article wdt:P17 ?date.
    ?article wdt:P20 ?title.
   BIND (YEAR(?date) AS ?date1)
   FILTER (?date1 > 2012)
   }
Annotations and quantiles | link |
#Annotations in notional first 10% of paper, ordered by absolute distance
#Good correlation with terms in title.
SELECT ?annotation ?title ?term ?length1 ?length
WHERE 
   {?annotationZ wdt:P19 ?anchorZ.
    ?anchorZ wdt:P7 wd:Q6.
    ?anchorZ wdt:P12 ?article.
    ?anchorZ wdt:P10 ?length.
    ?article wdt:P20 ?title.
    ?annotation wdt:P19 ?anchor.
    ?anchor wdt:P12 ?article.
    ?annotation wdt:P15 ?term.
    ?anchor wdt:P10 ?length1.
   FILTER(10*?length1 < ?length)
   }
ORDER BY ASC (?length1)
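
The "notional first 10%" test in the last query, 10*?length1 < ?length, can be sketched in Python as follows. The function name first_tenth, the (term, position) tuples and the toy values are assumptions for illustration only.

```python
def first_tenth(annotations, length):
    """Keep annotations in the notional first 10% of a paper.

    annotations: list of (term, position) tuples (hypothetical layout).
    length: notional length of the paper (position of its final anchor point).
    Uses the same test as the query, 10 * position < length, and orders
    the result by distance from the start of the paper.
    """
    early = [(term, pos) for term, pos in annotations if 10 * pos < length]
    return sorted(early, key=lambda a: a[1])

# Toy data, not taken from any ScienceSource article.
sample = [("malaria", 300), ("quinine", 20), ("cholera", 99), ("dengue", 100)]
print(first_tenth(sample, 1000))  # [('quinine', 20), ('cholera', 99)]
```

Multiplying the position by 10, as the query does, avoids a division and keeps the comparison in whole numbers.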