Data schema
Revision as of 16:51, 20 December 2018 by Charles Matthews (Property table: update per fallout of discussion of December 13)
This is the basic schema that will be used to store text-mining annotations in ScienceSource. See notes below.
Property table
Property code | Property label | Type of field | Notes |
---|---|---|---|
P2 | Wikidata item code | External identifier | Needs formatter URL set as https://www.wikidata.org/wiki/$. Check the documentation; see the formatter URL string property needed for this, listed below. |
P3 | instance of | Item | Equivalent of P31 on Wikidata |
P4 | subclass of | Item | Equivalent of P279 on Wikidata |
P6 | preceding anchor point | Item | |
P7 | following anchor point | Item | |
P8 | distance to preceding | Quantity | |
P9 | distance to following | Quantity | |
P10 | character number | Quantity | Offset from the initial annotation point in the article. Though not a robust figure, it is adequate for the project. The text version saved as the article here will be the "SI standard". |
P11 | article text title | String | Not disambiguated (see P20). |
P12 | anchor point in | Item | Refers back to underlying article. |
P13 | preceding phrase | String | (1) Subject to a character limit (2) Initially may have some tags (3) Initially not constrained by spaces; certainly, though, from the point of view of human readability, should not have breaks in the middle of words |
P14 | following phrase | String | As for P13 |
P15 | term found | String | Word or phrase from dictionary, mentioned in text and starting at anchor point |
P16 | dictionary name | String | Name of the ScienceSource dictionary; its version is tracked by date, using P17 |
P17 | publication date | Point in time | For articles |
P18 | length of term found | Quantity | String length of term, pre-computed for use in offsets and constraint checking |
P19 | based on | Item | For an item that is instance of annotation, "based on" has as object the anchor point or annotation it is based on. Therefore this is a child-parent type of property, defining the tree of annotations growing out of a given anchor point. As a constraint, every annotation is required to have such a statement. |
P20 | ScienceSource article title | String | Identifies article, by disambiguated title, for human readability. Because of disambiguation, this title will not always coincide with other versions of the title, such as given by P11. There could also be a cross-namespace version of this property, with "external identifier" as its underlying type. |
P22 | time code | Point in time | MediaWiki UTC code, set by creation time for batch (approximate, needed for batch tagging) |
P24 | anchors | Item | Partial inverse property of P19. |
P25 | Page ID | Quantity | MediaWiki page identifier. |
P26 | is subject of a Wikidata triple with object | Item | Drug annotations can be linked to disease annotations when the text states that a P2175 statement on Wikidata holds. (Such statements may then be converted into annotations.) |
P27 | Wikidata property in a claimed triple | String | Identifies the Wikidata property in a claimed triple (default P2175) |
P28 | Wikidata subject item | String | Identifies the Wikidata subject in a claimed triple |
P29 | Wikidata object item | String | Identifies the Wikidata object in a claimed triple |
P30 | human has checked | Item | For fact-checking, could be used as a bot intermediate to an annotation, depending on implementation |
P31 | supersedes | Item | Expresses the dominance relation between reviews, for clinical purposes |
P32 | include only after | Quantity | For filtering annotations by restriction to part of an article, by offset |
P33 | include only before | Quantity | For filtering annotations by restriction to part of an article, by offset |
P34 | deprecation for reason | Item | For fine-grained analysis of reasoning that a review should fail MEDRS, in terms of publication type ontology |
P35 | passed MEDRS | Point in time | For recording with a time-stamp the acceptance of a "fact found" annotation by the MEDRS algorithm |
P36 | failed MEDRS | Point in time | For recording with a time-stamp the rejection of a "fact found" annotation by the MEDRS algorithm |
P? | formatter URL | String | Equivalent of P1630 on Wikidata. See mail thread on configuration https://lists.gt.net/wiki/mediawiki/887858 |
P? | (not yet defined) | | |
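The anchor point properties in the table (P6 to P10, P15 and P18) lend themselves to simple constraint checks. The following is a minimal sketch in Python, not the project's implementation. The `AnchorPoint` class, the item codes Q101 to Q103 and the sample sentence are hypothetical. It also assumes that offsets are zero-based, that the distances P8 and P9 are differences of character numbers, and that the P32/P33 window is inclusive below and exclusive above; none of this is fixed by the schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnchorPoint:
    """Hypothetical in-memory stand-in for an anchor point item."""
    item_id: str                          # local item code (made up here)
    char_number: int                      # P10
    term_found: str                       # P15
    term_length: int                      # P18
    preceding: Optional[str] = None       # P6
    following: Optional[str] = None       # P7 ("Q6" is the terminus)
    dist_preceding: Optional[int] = None  # P8 (assumed: difference of P10 values)
    dist_following: Optional[int] = None  # P9 (assumed: difference of P10 values)


def check_chain(text: str, anchors: List[AnchorPoint]) -> List[str]:
    """Return human-readable constraint violations (empty list if none)."""
    problems = []
    ordered = sorted(anchors, key=lambda a: a.char_number)
    for i, a in enumerate(ordered):
        if a.term_length != len(a.term_found):
            problems.append(f"{a.item_id}: P18 differs from length of P15")
        if text[a.char_number:a.char_number + a.term_length] != a.term_found:
            problems.append(f"{a.item_id}: P15 not found at offset P10")
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        # The final anchor point must link to the terminus item Q6 via P7.
        expected_following = nxt.item_id if nxt else "Q6"
        if a.following != expected_following:
            problems.append(f"{a.item_id}: P7 should be {expected_following}")
        if nxt is not None:
            gap = nxt.char_number - a.char_number
            if a.dist_following != gap:
                problems.append(f"{a.item_id}: P9 should be {gap}")
            if nxt.dist_preceding != gap:
                problems.append(f"{nxt.item_id}: P8 should be {gap}")
            if nxt.preceding != a.item_id:
                problems.append(f"{nxt.item_id}: P6 should be {a.item_id}")
    return problems


def within_window(anchors, after=None, before=None):
    """Filter anchors by offset, in the spirit of P32 and P33."""
    return [a for a in anchors
            if (after is None or a.char_number >= after)
            and (before is None or a.char_number < before)]


# Hypothetical sample data: three dictionary terms in one sentence.
text = "Aspirin is used to treat fever and pain."
s1, s2, s3 = (text.index(t) for t in ("Aspirin", "fever", "pain"))
anchors = [
    AnchorPoint("Q101", s1, "Aspirin", 7, following="Q102",
                dist_following=s2 - s1),
    AnchorPoint("Q102", s2, "fever", 5, preceding="Q101", following="Q103",
                dist_preceding=s2 - s1, dist_following=s3 - s2),
    AnchorPoint("Q103", s3, "pain", 4, preceding="Q102", following="Q6",
                dist_preceding=s3 - s2),
]
violations = check_chain(text, anchors)
```

Pre-computing P18, as the table suggests, means a checker of this kind never has to re-derive the term length before comparing offsets.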
Notes
- For "type of field" see https://www.wikidata.org/wiki/Help:Data_type. For data types available on ScienceSource, see http://sciencesource.wmflabs.org/wiki/Special:ListDatatypes.
- This table now uses "P" for the property prefix. As of October 2018, a Phabricator thread on using another prefix is still active (T202676).
- The schema will be extended, in particular for checking annotations.
- The project will comply with the W3C Web Annotation Data Model of February 2017. The annotations will be stored here in a Wikibase site, so that inherently they are available in RDF. In principle several dumps will be available from the site, such as an RDF dump of all annotations and other data here, and a dump just of the annotations directly pointing to the articles (i.e. none of the community-added annotations, or of the auxiliary data). What we mean by compliance with the standard will be the availability in principle of the annotations, in the W3C-recommended JSON format.
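To illustrate the last note, here is a hedged sketch of how one anchor point might be serialised in the W3C Web Annotation JSON format. It is not the project's export code. The mapping of P10 and P18 to a `TextPositionSelector`, and of P13, P14 and P15 to a `TextQuoteSelector`, is an assumption made for this example, as are the function name `to_web_annotation`, the `example.org` URLs and the placeholder Wikidata item code. The formatter URL is the one given for P2, with `$` treated as the placeholder for the item code.

```python
import json

# Formatter URL as stated for P2; "$" is assumed to stand for the item code.
WIKIDATA_FORMATTER = "https://www.wikidata.org/wiki/$"


def to_web_annotation(anno_id, article_url, char_number, term_found,
                      preceding_phrase, following_phrase, wikidata_code):
    """Build a Web Annotation dict from anchor point values (sketch only)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "identifying",
        # Body: the Wikidata item that the dictionary term was matched to (P2).
        "body": WIKIDATA_FORMATTER.replace("$", wikidata_code),
        "target": {
            "source": article_url,
            "selector": [
                {   # P10 gives the start; P18 (length of P15) gives the end.
                    "type": "TextPositionSelector",
                    "start": char_number,
                    "end": char_number + len(term_found),
                },
                {   # P15 with its surrounding context from P13 and P14.
                    "type": "TextQuoteSelector",
                    "exact": term_found,
                    "prefix": preceding_phrase,
                    "suffix": following_phrase,
                },
            ],
        },
    }


# Hypothetical values throughout; the URLs and item code are placeholders.
annotation = to_web_annotation(
    anno_id="https://example.org/annotation/Q102",
    article_url="https://example.org/article/demo",
    char_number=25,
    term_found="fever",
    preceding_phrase="is used to treat ",
    following_phrase=" and pain.",
    wikidata_code="Q18216",
)
serialised = json.dumps(annotation, indent=2)
```

Carrying both selectors is a design choice: the position selector is cheap to resolve against the saved text version, while the quote selector still identifies the passage if offsets drift.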
Item table
Item code | Item label | Comments |
---|---|---|
Q2 | anchor point | Anchor points are where initial annotations hang off articles, and are first-class entities in the ScienceSource ontology. We use stand-off annotation, so that nothing is actually inserted into articles. Notionally an anchor point is a place you could find in the article text with your cursor, for example any place between two letters. In practice anchor points will typically be between a space and an alphanumeric character. |
Q4 | article | Wikibase indexation of the Article: namespace; also serves as initial anchor point for each article. |
Q5 | annotation | An annotation must hang off (a) an anchor point, or (b) another annotation. |
Q6 | terminus | Terminal marker defined uniformly for all articles. The actual final anchor point in an article will be the one linking to Q6, i.e. having a P7 statement with object Q6. |
Q7 | demo article item | |
Q8 | demo anchor point item | |
Q9 | demo annotation item | |
Q6818 | dictionary item | |
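The rule that an annotation (Q5) must hang off an anchor point or another annotation, expressed through P19 "based on", can be checked by walking up the tree. Below is a minimal sketch of that check. It assumes items are held as plain dictionaries keyed by item code, that class membership is read from P3, and that an article item (Q4) also counts as a valid root because it serves as the initial anchor point. The item codes Q100 to Q105 and the function `root_anchor` are hypothetical.

```python
from typing import Dict, Optional

ANCHOR_POINT, ARTICLE, ANNOTATION = "Q2", "Q4", "Q5"


def root_anchor(item_id: str, items: Dict[str, dict]) -> Optional[str]:
    """Follow P19 ("based on") upwards until an anchor point is reached.

    Returns the item code of the root, or None if the chain is broken:
    a missing P19 statement, an unknown item, an unexpected class, or a cycle.
    """
    seen = set()
    current = item_id
    while current in items and current not in seen:
        seen.add(current)
        entry = items[current]
        if entry["P3"] in (ANCHOR_POINT, ARTICLE):
            return current
        if entry["P3"] != ANNOTATION:
            return None
        current = entry.get("P19")  # None when the statement is missing
    return None


# Hypothetical items: one anchor point and a small tree of annotations.
items = {
    "Q100": {"P3": ANCHOR_POINT},
    "Q101": {"P3": ANNOTATION, "P19": "Q100"},
    "Q102": {"P3": ANNOTATION, "P19": "Q101"},
    "Q103": {"P3": ANNOTATION},                 # violates the P19 constraint
    "Q104": {"P3": ANNOTATION, "P19": "Q105"},  # cycle with Q105
    "Q105": {"P3": ANNOTATION, "P19": "Q104"},
}

# Annotations that fail to trace back to an anchor point.
orphans = [q for q, e in items.items()
           if e["P3"] == ANNOTATION and root_anchor(q, items) is None]
```

A check of this kind could back the constraint stated for P19, that every annotation is required to have a "based on" statement.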