Bioschemas: Community Adoption and Training

Bioschemas (http://bioschemas.org) is a community initiative that aims to improve data discoverability in the life sciences and to give our data repositories, including the ELIXIR Core and Node Data Resources, better exposure to generic search engines such as Google and to domain-specific repositories such as Identifiers.org, FAIRsharing.org, and DataMed. It does this by encouraging content providers in the life sciences to use Schema.org markup to expose consistent structured data on their websites.
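As an illustration of what such markup can look like, the sketch below builds a Schema.org Dataset description as JSON-LD and wraps it in the script tag that web pages commonly use to embed structured data. It is a minimal example under stated assumptions, not a Bioschemas profile: the helper names (dataset_jsonld, as_script_tag), the choice of properties, and all example.org values are hypothetical and chosen only for illustration.

```python
import json


def dataset_jsonld(name, description, url, identifier):
    """Return a Schema.org Dataset description as a JSON-LD dictionary.

    The property selection here is an assumption made for illustration;
    a real resource would follow the relevant Bioschemas profile.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
    }


def as_script_tag(doc):
    """Wrap a JSON-LD dictionary in a script tag for embedding in HTML."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical record; none of these values refer to a real resource.
example = dataset_jsonld(
    name="Example protein dataset",
    description="A placeholder dataset used to illustrate the markup.",
    url="https://example.org/datasets/1",
    identifier="EXAMPLE:0001",
)
tag = as_script_tag(example)
print(tag)
```

A content provider would place the printed tag in the HTML of the page that describes the dataset, where a crawler can read it alongside the human-readable content.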

Integrating reference taxonomic databases for metabarcoding and metagenomics identification

Comparison of environmental sequences to reference sets from curated marker loci is a mainstay of taxonomic analysis of microbial communities. Microbial eukaryotic sequencing requires many distinct reference sets to cover diversity adequately. The groups producing reference sets follow different curation workflows, but share the need to pass their data on to a common set of tools and services, such as EMG, Megan, MetaPIPE and BioMaS.

There are multiple inefficiencies:

Extending open proteomics data analysis pipelines in the cloud: Additional tools and focus on scalability, supporting the dramatic growth of public proteomics data

An ELIXIR implementation study started in February 2017 as a collaboration between EMBL-EBI and ELIXIR-DE. Its main objective is to develop open, robust, scalable and reproducible proteomics data analysis workflows based on OpenMS and directly connected to the PRIDE database (an ELIXIR Core Data Resource), and to deploy these pipelines in the EMBL-EBI "Embassy Cloud" as a proof of concept.

Building on this work, we propose a follow-up project with three objectives:

Integration and standardization of intrinsically disordered protein data (2018-IDPs)

Intrinsically disordered proteins (IDPs), characterized by high conformational variability, cover almost a third of the residues in eukaryotic proteomes. As major players in cellular regulation, IDPs are involved in numerous diseases.

Specialized IDP databases provide a starting point for analysis, yet their integration into core databases remains very limited. Here, we propose to start integrating IDP information into ELIXIR Core Data Resources.

FAIRness of the current ELIXIR Core resources: Application (and test) of newly available FAIR metrics, and identification of steps to increase interoperability (2018-FAIRCDR)

The FAIR (Findable, Accessible, Interoperable and Reusable) principles aim to maximize the discovery and reusability of digital resources. While the principles have enjoyed rapid uptake across communities (ELIXIR, G20, EOSC, H2020, NIH), the implementation details remain unclear.

Apple as a Model for Genomic Information Exchange

Apple is one of the most famous fruits globally and occupies a central position in folklore, culture, and art. Apple cultivars have retained high genetic and phenotypic diversity, evidenced by the high number of apple varieties cultivated today. The economic and cultural importance of apple has driven efforts to catalogue and exploit this genetic diversity, but few of these data are currently integrated into ELIXIR resources.