Main Page

From ScienceSource
Revision as of 14:10, 22 July 2019 by Charles Matthews (talk | contribs) (update)

Welcome to the ScienceSource wiki! This is the wiki site of the ScienceSource project funded by the Wikimedia Foundation, and will host text mining and annotation of biomedical papers.

This wiki is the place to participate in ScienceSource's activities, and a forum to discuss them. For the licensing of content, refer to the details given in Help:Licenses.

  • The main technical page here is data schema.
  • The "Review tool" link on the sidebar is a good way to see the content here. Select a Q-number link by a paper title, and you'll be given instructions on how to participate in the fact mining here.

NB two points about the boxed SPARQL queries on the review pages:

  1. The service used announces them as "Wikidata"; but they are in fact run on the query service available on the sidebar.
  2. The link in the bottom left of the box allows you to get the code, and so to modify the separation of 200 characters, but shows you the code at (where it doesn't work). It means that at present you need to copy the SPARQL into to run it.

For assistance, leave a message on User talk:Charles Matthews.

Quick introduction

The purpose of this site is to annotate biomedical papers, with a view to finding facts in them.

You can think of our annotations on a paper as like a comb: the teeth of the comb are the annotations themselves, which correspond to search terms in our dictionaries (compiled from Wikidata). There is also a spine of "anchor points", the places where the teeth of the comb join onto it. The anchor points therefore can be thought of as lying in a row, as well as being associated to places where you could put your cursor in the text of the paper. How far into the text an anchor point is to be found (its "offset" from the beginning) is what we call its "character number".

So there is a system here of connections between items: annotation to anchor point (and back); anchor point to next anchor point; and anchor point to the item that represents the underlying paper. There are also definite distances between those anchor points: think of it as a road map. This also allows us to think of distances between annotations.
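
As a rough illustration of the comb picture, the sketch below models anchor points and annotations in Python. The class and field names (AnchorPoint, Annotation, char_number and so on) and the sample values are hypothetical, chosen for this example; they are not taken from the ScienceSource data schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchorPoint:
    """A position on the spine of the comb (hypothetical model)."""
    paper: str        # ID of the item representing the paper (made-up below)
    char_number: int  # offset from the beginning of the paper's text


@dataclass(frozen=True)
class Annotation:
    """A tooth of the comb: a dictionary term found at an anchor point."""
    term: str
    dictionary: str
    anchor: AnchorPoint


def distance(a: Annotation, b: Annotation) -> int:
    """Distance in characters between two annotations on the same paper."""
    if a.anchor.paper != b.anchor.paper:
        raise ValueError("annotations belong to different papers")
    return abs(a.anchor.char_number - b.anchor.char_number)


def spine(annotations):
    """Distinct anchor points in reading order, i.e. lying in a row."""
    return sorted({a.anchor for a in annotations}, key=lambda p: p.char_number)


# Made-up sample data, for illustration only.
drug = Annotation("aspirin", "drug", AnchorPoint("Q1", 120))
disease = Annotation("headache", "disease", AnchorPoint("Q1", 185))
print(distance(drug, disease))  # 65
```

Keeping the character number on the anchor point, not on the annotation, mirrors the description above: the distance between two annotations is just the distance between the anchor points they hang from.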

Using the Query Service here, anchor points that are relatively close together can be found in a paper. When such a co-occurrence happens, a reader can decide whether the associated language in the paper states a fact of medical interest. Property P26 here can be used to record that, on the "subject item". The project is looking for places where the subject item is an annotation from a drug dictionary, the identified "object item" is an annotation from a disease dictionary, and the paper could be used to reference a corresponding statement on Wikidata constructed with P2175, "medical condition treated".
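
The co-occurrence search itself is done on this site with SPARQL on the Query Service. Purely to show the idea, here is a minimal Python sketch that pairs drug and disease annotations lying within a given separation. The function name, the tuple layout and the sample terms are assumptions made for this example; the default of 200 characters echoes the separation mentioned for the boxed queries on the review pages.

```python
from itertools import product

MAX_SEPARATION = 200  # characters; assumed default for this sketch


def cooccurrences(annotations, max_separation=MAX_SEPARATION):
    """Return (drug, disease, separation) triples that lie close together.

    `annotations` is a list of (term, dictionary, char_number) tuples for
    a single paper; this layout is hypothetical, not the site's schema.
    """
    drugs = [a for a in annotations if a[1] == "drug"]
    diseases = [a for a in annotations if a[1] == "disease"]
    pairs = []
    for drug, disease in product(drugs, diseases):
        separation = abs(drug[2] - disease[2])
        if separation <= max_separation:
            pairs.append((drug[0], disease[0], separation))
    # Closest pairs first, so a human reviewer sees the likeliest candidates.
    return sorted(pairs, key=lambda pair: pair[2])


# Made-up annotations for one paper.
sample = [
    ("metformin", "drug", 1040),
    ("type 2 diabetes", "disease", 1105),
    ("obesity", "disease", 1900),
]
print(cooccurrences(sample))  # [('metformin', 'type 2 diabetes', 65)]
```

Each pair returned is only a candidate: a reader still has to check whether the surrounding language in the paper actually states the fact.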

Therefore the process of fact extraction is semi-automated: humans examine the language, bearing in mind that in papers related to medicine careful language is used, often qualified in some way. The role of software is to process papers at scale, using thousands of search terms, and to make the quest for candidate facts thorough, and something that can be supported by tools and visualizations.

Later in the project, these facts will be "real data" inputs to an automated process designed to filter the paper, in terms of its acceptability under the referencing guideline MEDRS used on English Wikipedia.


This wiki has now been opened for account creation. Please open an account and support our work.


The top-level category here is Category:Content, and all categories added should belong to the category tree under it.

Search is not working properly here. To navigate to articles, we suggest using the special page

Manual pages

Project pages