Welcome to the ScienceSource wiki! This is the wiki site of the ScienceSource project funded by the Wikimedia Foundation, and will host text mining and annotation of biomedical papers.
This wiki is the place to participate in ScienceSource's activities, and a forum to discuss them. For the licensing of content, refer to the details given in Help:Licenses.
- The main technical page here is data schema.
- The first phase of the project has been taking place on Wikidata, at https://www.wikidata.org/wiki/Wikidata:ScienceSource_focus_list (shortcut on Wikidata is WD:SSFL). There you can help select the 30,000 papers ScienceSource will download. For a basic introduction to the project and its context, see https://en.wikiversity.org/wiki/Wikiversity:ScienceSource_mentoring.
For assistance, leave a message on User talk:Charles Matthews.
Coming soon: federated queries that add Wikidata conditions to searches over annotations!
The purpose of this site is to annotate biomedical papers, with a view to finding facts in them.
You can think of our annotations on a paper as a comb: the teeth of the comb are the annotations themselves, which correspond to search terms in our dictionaries (compiled from Wikidata). There is also a spine of "anchor points", the places where the teeth of the comb join onto it. The anchor points can therefore be thought of as lying in a row, as well as being associated with places where you could put your cursor in the text of the paper. How far into the text an anchor point is to be found (its "offset" from the beginning) is what we call its "character number".
So there is a system here of connections between items: annotation to anchor point (and back); anchor point to next anchor point; and anchor point to the item that represents the underlying paper. There are also definite distances between those anchor points: think of it as a road map. This also allows us to think of distances between annotations.
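The comb picture can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the data schema used on this wiki: the class names, field names, paper identifier and sample offsets are all hypothetical. It shows anchor points ordered along the spine by character number, annotations attached to them, and the distance between two annotations taken as the difference of their character numbers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnchorPoint:
    paper_id: str                                # hypothetical identifier of the paper item
    char_number: int                             # offset from the beginning of the text
    next_anchor: Optional["AnchorPoint"] = None  # following anchor point on the spine


@dataclass
class Annotation:
    term: str            # search term matched from a dictionary
    dictionary: str      # e.g. "drug" or "disease" (assumed labels)
    anchor: AnchorPoint  # where this tooth joins the spine


def link_spine(anchors: List[AnchorPoint]) -> List[AnchorPoint]:
    """Sort anchor points by character number and link each to the next."""
    ordered = sorted(anchors, key=lambda a: a.char_number)
    for current, following in zip(ordered, ordered[1:]):
        current.next_anchor = following
    return ordered


def distance(a: Annotation, b: Annotation) -> int:
    """Distance in characters between two annotations on the same paper."""
    if a.anchor.paper_id != b.anchor.paper_id:
        raise ValueError("annotations belong to different papers")
    return abs(a.anchor.char_number - b.anchor.char_number)


# Made-up sample data: two anchor points in one paper.
spine = link_spine([AnchorPoint("paper-1", 480), AnchorPoint("paper-1", 120)])
drug = Annotation("aspirin", "drug", spine[0])
disease = Annotation("headache", "disease", spine[1])
print(distance(drug, disease))  # 360
```

Storing only the link to the next anchor point keeps the spine a simple chain, and any distance can still be recovered from the character numbers alone.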
Using the Query Service here, anchor points that are relatively close together can be found in a paper. When that co-occurrence happens, a reader can decide whether the associated language in the paper states a fact of medical interest. Property P26 here can be used to record that, on the "subject item". The project is looking for places where the subject item is an annotation from a drug dictionary, the identified "object item" is an annotation from a disease dictionary, and the paper could be used to reference a corresponding statement on Wikidata constructed with P2175, "medical condition treated".
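The co-occurrence search described above might look roughly like the following Python sketch. It does not use the Query Service; it works on an in-memory list, and the `Hit` record, the dictionary labels, the sample terms and offsets, and the `max_distance` threshold are all assumptions made for illustration. It pairs each drug annotation with each disease annotation in the same paper whose character numbers lie within the chosen distance, yielding candidates for a human reader to check.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Hit(NamedTuple):
    paper_id: str     # hypothetical identifier of the paper item
    dictionary: str   # "drug" or "disease" (assumed labels)
    term: str         # matched search term
    char_number: int  # offset of the anchor point in the paper text


def co_occurrences(hits: List[Hit], max_distance: int) -> Iterator[Tuple[Hit, Hit]]:
    """Yield (drug, disease) pairs in the same paper within max_distance characters."""
    drugs = [h for h in hits if h.dictionary == "drug"]
    diseases = [h for h in hits if h.dictionary == "disease"]
    for drug in drugs:
        for disease in diseases:
            same_paper = drug.paper_id == disease.paper_id
            if same_paper and abs(drug.char_number - disease.char_number) <= max_distance:
                yield drug, disease


# Made-up sample annotations for one paper.
hits = [
    Hit("paper-1", "drug", "metformin", 1040),
    Hit("paper-1", "disease", "type 2 diabetes", 1095),
    Hit("paper-1", "disease", "hypertension", 5200),
]

candidates = list(co_occurrences(hits, max_distance=200))
for drug, disease in candidates:
    print(drug.term, "near", disease.term)
```

Each pair is only a candidate: whether the surrounding language actually states that the drug treats the condition remains a decision for the reader.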
Therefore the process of fact extraction is semi-automated: humans examine the language, bearing in mind that in papers related to medicine careful language is used, often qualified in some way. The role of software is to process papers at scale, using thousands of search terms, and to make the quest for candidate facts thorough, and something that can be supported by tools and visualizations.
Later in the project, these facts will be "real data" inputs to an automated process designed to filter the paper, in terms of its acceptability under the referencing guideline MEDRS used on English Wikipedia.
This wiki has now been opened for account creation. Please open an account and support our work. Spam issues mean we'll have to return to restricted access later.
The top-level category here is Category:Content, and all categories added should belong to the category tree under it.
- Help:Troubleshooting - please add maintenance issues here
- Help:User-generated content