Main Page

Revision as of 09:58, 21 February 2019 by Charles Matthews

Welcome to the ScienceSource wiki! This is the wiki of the ScienceSource project, funded by the Wikimedia Foundation, and it will host text mining and annotation of biomedical papers.

This wiki is the place to participate in ScienceSource's activities, and a forum to discuss them. For the licensing of content, refer to the details given in Help:Licenses.

For assistance, leave a message on User talk:Charles Matthews.


21 February 2019: The latest upload is a test batch of otolaryngology reviews, found via the NCBI2wikidata tool.

Editing on ScienceSource is currently possible only via an admin account. Leave a message at for any enquiries.

Quick introduction

The purpose of this site is to annotate biomedical papers, with a view to finding facts in them.

You can think of our annotations on a paper as being like a comb. The teeth of the comb are the annotations themselves, which correspond to search terms in our dictionaries (compiled from Wikidata). There is also a spine of "anchor points", the places where the teeth join onto the comb. The anchor points can therefore be thought of as lying in a row, and each one is also associated with a place where you could put your cursor in the text of the paper. How far into the text an anchor point is found (its "offset" from the beginning) is what we call its "character number".
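As an illustration only, here is a toy Python sketch that finds dictionary terms in a text and records each match's character number, so that the anchor points line up in a row. The names `Annotation` and `annotate`, the whole-word regular-expression matching, and the sample sentence are all hypothetical; this is not the project's actual text-mining pipeline.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One 'tooth' of the comb: a dictionary term found in the paper (hypothetical model)."""
    term: str              # the search term that matched
    dictionary: str        # hypothetical label for the source dictionary, e.g. "drug"
    character_number: int  # offset of the anchor point from the start of the text


def annotate(text: str, dictionary: str, terms: list[str]) -> list[Annotation]:
    """Return an annotation for every case-insensitive whole-word match of each term,
    sorted by character number so the anchor points form the 'spine'."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append(Annotation(term, dictionary, match.start()))
    return sorted(found, key=lambda a: a.character_number)


text = "Aspirin is often used for headache; aspirin may also reduce fever."
for a in annotate(text, "drug", ["aspirin"]):
    print(a.term, a.character_number)
# prints:
# aspirin 0
# aspirin 36
```

Each annotation points back to exactly one anchor point, and sorting by character number gives the row of anchor points along the spine.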

So there is a system here of connections between items: annotation to anchor point (and back); anchor point to next anchor point; and anchor point to the item that represents the underlying paper. There are also definite distances between those anchor points: think of it as a road map. This also allows us to think of distances between annotations.
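A minimal sketch of the road-map idea, assuming anchor points are identified only by their character numbers (the function names `spine_links` and `distance` are hypothetical): consecutive anchor points are linked, each link carries a distance, and walking along the spine between two anchor points adds those distances up.

```python
def spine_links(character_numbers: list[int]) -> list[tuple[int, int, int]]:
    """Link each anchor point to the next one along the spine, recording the
    length in characters of each step, like road segments on a map."""
    spine = sorted(character_numbers)
    return [(a, b, b - a) for a, b in zip(spine, spine[1:])]


def distance(links: list[tuple[int, int, int]], start: int, end: int) -> int:
    """Walk the spine between two anchor points, summing the segment lengths.
    Assumes both start and end are anchor points that appear in the links."""
    lo, hi = min(start, end), max(start, end)
    return sum(step for a, b, step in links if lo <= a and b <= hi)


links = spine_links([36, 0, 120, 75])
print(links)                     # prints: [(0, 36, 36), (36, 75, 39), (75, 120, 45)]
print(distance(links, 120, 36))  # prints: 84
```

Because the segment lengths telescope, the walked distance always equals the difference between the two character numbers, so the distance between two annotations can be read directly from their anchor points.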

Using the Query Service here, anchor points that are relatively close together in a paper can be found. When such a co-occurrence happens, a reader can decide whether the associated language in the paper states a medically interesting fact. Property P26 here can be used to record that fact on the "subject item". The project is looking for cases where the subject item is an annotation from a drug dictionary, the identified "object item" is an annotation from a disease dictionary, and the paper could be used to reference a corresponding statement on Wikidata constructed with P2175, "medical condition treated".
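On this wiki the search is done with the Query Service; the in-memory Python sketch here only illustrates the proximity test itself. The `Annotation` model, the "drug"/"disease" dictionary labels, the sample offsets, and the 40-character window are all hypothetical. Each pair it returns is only a candidate, which a human reader would still need to confirm before anything is recorded with P26.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    term: str
    dictionary: str        # hypothetical labels: "drug" or "disease"
    character_number: int  # offset of the annotation's anchor point


def candidate_pairs(annotations, max_distance: int):
    """Yield (drug, disease) annotation pairs whose anchor points lie within
    max_distance characters of each other, for a human reader to review."""
    drugs = [a for a in annotations if a.dictionary == "drug"]
    diseases = [a for a in annotations if a.dictionary == "disease"]
    for drug in drugs:
        for disease in diseases:
            if abs(drug.character_number - disease.character_number) <= max_distance:
                yield drug, disease


annotations = [
    Annotation("aspirin", "drug", 0),
    Annotation("headache", "disease", 26),
    Annotation("fever", "disease", 60),
]
pairs = list(candidate_pairs(annotations, max_distance=40))
for drug, disease in pairs:
    print(drug.term, "->", disease.term)  # prints: aspirin -> headache
```

The nested loop is the simplest correct form; for papers with many annotations, sorting by character number and scanning a sliding window would avoid comparing every drug with every disease.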

The process of fact extraction is therefore semi-automated: humans examine the language, bearing in mind that papers related to medicine use careful language, often qualified in some way. The role of software is to process papers at scale, using thousands of search terms, so that the search for candidate facts is thorough and can be supported by tools and visualizations.

Later in the project, these facts will serve as "real data" inputs to an automated process designed to filter papers by their acceptability under MEDRS, the referencing guideline used on English Wikipedia.


This wiki has now been opened for account creation. Please open an account and support our work. Spam issues mean we'll have to return to restricted access later.


The top-level category here is Category:Content, and all categories added should belong to the category tree under it.

Manual pages

Project pages