This is a list of real-world identifier issues we have encountered; it aims to be representative rather than exhaustive. This list could be used to:

- Convince funders of the problem
- Provide a set of references for a paper or specification
- Identify ways to improve informatics and tooling around identifiers

We warmly welcome anyone to contribute.

| Reported by | Reported about | Problems referenced | Problem category |
| --- | --- | --- | --- |
| EBI Ontology Lookup Service (OLS) | Various ontologies | Underscore-delimited vs colon-delimited forms; case sensitivity | search, delimiters |
| Not clear | Darwin Core Triples | Institutional code collisions amongst Darwin Core triples | collisions, institution identifiers |
| PrefixCommons | NCBI | Number of short-form and HTTP URI permutations found in the wild for a single identifier in NCBI Gene | data integration, text mining |
| General (Wikipedia entry) | Web at large | 17 different ways in which URLs could be determined to be equivalent; some of these are lossy | data integration |
| biostars | HGNC | Mapping between similar entities across databases | mapping |
| Human Phenotype Ontology | OMIM | Prefix heterogeneity: OMIM vs MIM. Special processors have to be built to collapse them | prefix variation, data integration |
| Monarch Initiative | TAIR | TAIR prefix variation is difficult to resolve | type-specificity |
| Stian | EU grants | No obvious documentation for permalinks to EU grants, nor any correlation between the destination URL and the project ID | documentation |
| H. pylori paper | HP protein identifiers | Naming problems that result from meaning embedded in identifiers combined with evolving scientific knowledge | embedded meaning |
| PrefixCommons | HGNC | Co-occurring identifier complexities in HGNC (multiple entity types, multiple identifier types, prefixed/unprefixed versions, type-specific URLs without type-specific determinism in local IDs) | type-specificity |
| WebProNews | EBAY | Need for location-independent IDs | data integration |
| PrefixCommons | ZENODO | No roll-up of impact across all versions of a DOI | DOI versions |
| Monarch Initiative | Monarch's ingest of FlyBase | A faulty ingest process resulted in fly and human genes being treated as equivalents instead of orthologs | data integration |
| Monarch Initiative | EBI-OLS | Identifier searches are tricky to support because of Solr's standard query-parsing behavior | data applications |
| Ziemann et al. | Several journals | Gene name corruption in supplementary data affects about 20% of papers with supplementary Excel gene lists | data quality |
| D. Natale | NCBI's Gene database | A large number of identifiers went stale because their strains were declared "out of scope", among other reasons. In some cases no alternative is offered. Example 1: https://www.ncbi.nlm.nih.gov/gene/?term=5203950. Example 2: https://www.ncbi.nlm.nih.gov/gene/?term=1165308 | data stability |
| Monarch Initiative | MassIVE DB | Hashed links like http://massive.ucsd.edu/ProteoSAFe/result.jsp?task=f847302a49e34ab89ebf3ecc2250be96&view=advanced_view, especially when surrounded by a lot of implementation-specific cruft, do not inspire confidence; they even look as though they may be session-specific. Local IDs are supported in more deterministic URIs, but these are virtually unfindable except through trial and error, e.g. https://gnps.ucsd.edu/ProteoSAFe/dataset_id_redirect.jsp?massiveid=MSV000079621 | persistence, documentation |
| Monarch Initiative | Incoming links | Other sites link to us using different conventions about leading zeros; e.g. https://monarchinitiative.org/disease/DOID:0050202 isn't correctly formed and leads to a 404 | persistence, integration |
| Gene Ontology | Duplicated prefixes in EBI RDF platform | Prefixes for GO IDs are double-encoded and return 404 (EBISPOT/RDF-platform#3) | persistence, integration |
| Monarch Initiative | OMIM links to ClinicalTrials.gov | The lack of identifier "hooks" into ClinicalTrials.gov means that searching for an entity yields false positives | integration |
| Monarch Initiative | | Link Rot Ruins Halloween | persistence |
| Gene Ontology | Gene Ontology xrefs | Russian-doll nesting of ID-minting authorities | integration |
| Monarch Initiative | Prefix collision | FB is used as a prefix for both FaceBase and FlyBase | integration |
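Several rows above (the OLS delimiter issue, the OMIM vs MIM prefix heterogeneity, the FB prefix collision) describe the kind of "special processor" consumers end up writing to collapse identifier variants. The following is a minimal Python sketch of such a processor, assuming a small hand-maintained synonym table. `PREFIX_SYNONYMS` and `normalize_curie` are hypothetical names for illustration only; they are not part of any registry's or library's API, and a real table would be far larger.

```python
import re

# Hypothetical synonym table: lowercased prefix variants -> one preferred prefix.
# "fb" is deliberately absent: FB is ambiguous (FaceBase vs FlyBase), and no
# string rule can decide which one was meant.
PREFIX_SYNONYMS = {
    "mim": "OMIM",
    "omim": "OMIM",
    "hp": "HP",
    "go": "GO",
}

# A prefix, then ':' or '_' as the delimiter, then the local ID.
# The prefix character class excludes both delimiters, so the split
# happens at the first delimiter only.
CURIE_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9.]*)[:_](.+)$")


def normalize_curie(raw: str) -> str:
    """Collapse delimiter, prefix-case, and prefix-synonym variants to PREFIX:local_id.

    Only the prefix is case-folded; local IDs are left untouched because
    some sources treat them as case-sensitive.
    Raises ValueError for strings that are not prefixed identifiers or whose
    prefix is not in the (hypothetical) synonym table.
    """
    match = CURIE_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not a prefixed identifier: {raw!r}")
    prefix, local_id = match.groups()
    preferred = PREFIX_SYNONYMS.get(prefix.lower())
    if preferred is None:
        raise ValueError(f"unknown or ambiguous prefix: {prefix!r}")
    return f"{preferred}:{local_id}"


if __name__ == "__main__":
    # Three spellings of the same OMIM entry collapse to one form.
    variants = ["MIM:100100", "OMIM:100100", "omim_100100"]
    print({normalize_curie(v) for v in variants})
```

Rejecting unknown or ambiguous prefixes, rather than passing them through, is a design choice here: silently guessing is how errors like the FlyBase/human gene conflation above creep into integrated data.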