Galaxy services

Name Description ELIXIR Node
ELIXIR Germany
ELIXIR Belgium, ELIXIR Germany

Our aim is to extend myFAIR Analysis into a cloud-based service that can be executed using the advanced INDIGO PaaS services on top of any ELIXIR Compute Platform (ECP) cloud resource. This approach will make the advanced features provided by INDIGO accessible to the whole of ELIXIR by porting them to the standard ECP cloud resource. The utility of this myFAIR cloud will be demonstrated using existing validated test case scenarios (e.g. Mothur-SOP and/or EGA), building towards providing myFAIR Analysis as a research cloud service (myFAIR CLOUD Analysis) that supports single- and multi-user, as well as single- and multi-centre, FAIR data management and analysis.

Impact of the Study: 

  • Extend myFAIR Analysis into a cloud-based service, executed using advanced INDIGO PaaS services on top of any ECP cloud resource.
  • Make INDIGO features accessible to the whole of ELIXIR by porting them to the standard ECP cloud resource.
  • Demonstrate the myFAIR cloud using existing validated test case scenarios (e.g. Mothur-SOP and/or EGA).
  • Build towards providing myFAIR Analysis as a research cloud service (myFAIR CLOUD Analysis) supporting single- and multi-user, as well as single- and multi-centre, FAIR data management and analysis.
ELIXIR Netherlands, ELIXIR Italy, EMBL-EBI

Over the coming decade, Europe will face critical challenges in maintaining biodiversity, ensuring food security and combating pathogens. Our 2024–28 Programme will address these issues by mobilising and integrating molecular data, using successful coordination models from human genomics. Through strategic investments and collaboration in externally funded projects, ELIXIR will enhance scientific services and support transnational research in these essential areas.

The following projects have been selected as part of the ELIXIR 2024–28 Programme’s Biodiversity, food security and pathogens Science Tier:

  • E-PAN: Enhancing pan-genome analysis in plants
  • FAIRyMAGs: Optimising Metagenomics Assembled Genomes building: workflow finalisation, training material development, real data evaluation and resource allocation tool creation
  • HARVEST: Handling and alignment of plant research FAIRification – value through the use of ELIXIR data Standards and Tools
  • Odyssey: Connecting molecular and geographical biodiversity data

With the declining cost of genome sequencing, plant researchers are shifting their focus towards characterising the wide genomic diversity present within a species. Crop pan-genomes are built by sequencing, comparing and integrating multiple genomes from the same agriculturally important species, such as wheat, rice or potato. Exploiting the information encoded within these pan-genomes can lead to the development of new cultivars that are more resilient to upcoming challenges such as increased drought and heat stress.

Multiple consortia are independently generating and integrating these pan-genomes, but little progress has been made in streamlining and harmonising these efforts. While sequence quality is no longer a major issue, the completeness of both the assembly and the subsequent gene annotation is much harder to quantify correctly, even though it is a major driver in explaining adaptive differences between genotypes. Where efforts exist to visualise and browse pan-genomes, for example using graph representations, easy retrieval of gene presence/absence variation (PAV) or structural rearrangements is currently lacking, which hampers knowledge discovery.

E-PAN aims to streamline the efforts of different research groups within the ELIXIR Plant Science Community. This encompasses the development of effective standards, computational pipelines and tutorials to assess the quality of pan-genomes and provide solutions to identified problems. We will also evaluate and integrate different approaches for data visualisation and browsing, which will be used by different partners sharing pan-genomics results. A one-day meeting and an online workshop will be organised to disseminate results and initiate new collaborative projects. These concerted efforts will lead to a standardised approach for future pan-genome projects, reduced duplication of effort across consortia, and a set of tools to visualise and mine pan-genomics results.

Nodes involved: ELIXIR Belgium, ELIXIR Germany, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR UK
Communities: Plant sciences

Metagenomics Assembled Genomes (MAGs) are crucial for understanding biodiversity, enhancing food security and combating pathogens, because they provide insight into uncultured and unexplored genomes. This proposal outlines a comprehensive project aimed at advancing metagenomics research through the development, optimisation, evaluation and dissemination of robust FAIR workflows for building MAGs.

Leveraging the Galaxy platform, our primary objectives include finalising a user-friendly state-of-the-art Galaxy workflow tailored for MAG construction, and ensuring its accessibility and reusability through integration with WorkflowHub. To support user adoption and proficiency, we will create FAIR educational materials hosted on the Galaxy Training Network (GTN), empowering researchers with the skills necessary to use the workflow effectively. 

The efficacy of the developed workflow will be rigorously evaluated by analysing MAGs generated from simulated and real-world data spanning diverse environments: atmospheric, marine and cow gut microbiomes. This evaluation will provide valuable insights into the workflow's performance and its applicability across different sample types, complexities and ecosystems.

We will also investigate the computational resources required to execute the assembly step of the workflow on various input datasets, using data provided by several Galaxy servers and the MGnify team. The aim is to optimise resource allocation to ensure efficient and cost-effective MAG construction. A novel tool will be developed to facilitate this process, allowing researchers to accurately estimate and allocate resources for each step of the assembly pipeline.

By addressing these objectives, our project aims to accelerate metagenomics research by providing researchers with a comprehensive and accessible framework for MAG construction. This framework will not only streamline the workflow for building MAGs but also facilitate reproducibility, collaboration and innovation within the ELIXIR Microbiome Community.

Nodes involved: ELIXIR France, ELIXIR Germany, ELIXIR Italy, EMBL-EBI
Communities: Galaxy, Microbiome

The standardisation and accessibility of plant data is a major challenge for agricultural research. MIAPPE, which was developed as part of the transPLANT and ELIXIR-EXCELERATE projects, has made a decisive contribution to unifying data capture. The FONDUE Implementation Study further facilitated the integration of phenotypic and genotypic data.

Nevertheless, challenges persist in achieving full FAIRness of plant data. The guidelines and best practice documents developed within the Commissioned Service INCREASING have improved the situation, but further enhancements are required, such as additional documentation and reference datasets.

To address these needs, it is important to assess the practical effort required to FAIRify datasets using the MIAPPE, ISA, ARC and RO-Crate standards. The idea is to provide biologist-friendly data documentation while at the same time introducing machine-actionable formats for bioinformaticians. A further challenge is that the information is scattered: there is no single resource in which it is all collated.

In HARVEST, we aim to address these challenges by FAIRifying datasets (DROPS, AGENT) using as a basis the latest version of MIAPPE, which now covers more diverse and complex use cases. This process will include enriching the MIAPPE documentation, in particular with example datasets, updating training material and refining mappings to other interoperable formats such as BrAPI, Bioschemas and ISA-Tab/JSON. We will also use FAIDARE to establish links to repositories such as EMBL-EBI EVA, e!DAL-PGP, recherche.data.gouv and Zenodo, to enhance data sharing and reuse. An extension of the RDMkit Plant Sciences pages will serve as the primary hub for information on FAIRification of plant data. Furthermore, we will consolidate resources and improve accessibility by linking directly to the original web resources and recipes, and by adding Jupyter notebooks to the FAIR Cookbook where possible.

Nodes involved: ELIXIR Germany, ELIXIR France, ELIXIR Netherlands, ELIXIR UK, EMBL-EBI
Communities: Plant Sciences

Understanding molecular biodiversity is essential for ecological conservation and sustainable development. While a vast array of molecular data awaits exploration, its lack of connectivity with other sources of data and metadata, such as geographical reference, habitat, population size and phenotypic data, often poses significant barriers to biodiversity research.

This project will develop Odyssey, a web portal with a user-friendly interface that will allow researchers, educators and citizens to navigate the world of molecular biodiversity, using Greece and Norway as case studies: two countries with a characteristic and unique wealth of biodiversity, representative of Mediterranean and Nordic ecosystems respectively.

Building on existing sources of information and prototype applications available for specific regions and taxa, this project aims to link current efforts and develop a new interface offering diverse functionalities for data exploration and analysis, such as descriptive statistics, graphs, maps, customisable data filters and dynamic visualisations. Through modular design, the application will ensure flexibility and scalability, enabling easy integration of new data sets and analytical tools in the future. This approach will also be used for training and communication, inviting traditional biodiversity research groups to use new information on the spatial patterns of biodiversity and their connection with features that are important for designing conservation measures, such as habitat connectivity, representativity, population demographics, and the dynamics of adaptation and migration.

Odyssey’s outcome will be a valuable tool for studying and, ultimately, offering a basis for managing and conserving the rich molecular biodiversity of Greece and Norway, as well as supporting the activities of the ELIXIR Biodiversity Community in the two Nodes and in Europe. This will promote collaboration, innovation and knowledge exchange in biodiversity research and beyond. 

This new tool will be developed and offered under an open-source licence, encouraging community participation and contribution to further enhance its capabilities and broaden its applications, fostering a robust network for biodiversity research in Greece and Norway.

Nodes involved: ELIXIR Greece, ELIXIR Norway
Communities: Biodiversity

ELIXIR Belgium, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR UK, EMBL-EBI, ELIXIR Italy

Cellular and molecular biology are fundamental to ELIXIR's mission. As part of our 2024–28 Programme, we are committed to advancing data services and software for research on nucleic acids, proteins and other biomolecules. This initiative will address new demands for multi-omics and multi-modal analyses, including imaging, by developing methods and partnerships. We will also expand expertise in reusable data and software to incorporate FAIR models, ensuring robust solutions for modelling at all scales. 

The following projects are key to connecting the latest developments with established data resources, unlocking the potential of cellular and molecular biology:

  • Advancing structural and functional ontologies of disordered proteins 
  • DBTLHub: Towards a one-stop shop for connecting databases, datasets and tools for the Design-Build-Test-Learn cycle in biotechnology 
  • Spatial2Galaxy: There is no Galaxy without Space 
  • Next level of reproducible, comparable and integrable Metabolomics

This project addresses the limitations of current ontologies in capturing the dynamic nature of disordered protein regions by pursuing several primary objectives. Firstly, novel structural and functional ontologies will be developed to accurately represent the structural heterogeneity and dynamic functional annotations of proteins. These ontologies will incorporate timescales, annotating the kinetics of structural transformations to elucidate molecular mechanisms and regulatory pathways governing protein dynamics. 

Collaborating with existing databases and consortia will ensure seamless integration of ontological resources and experimental data, fostering interoperability and accelerating discoveries. A standardised file format specification will also be developed in collaboration with the Human Proteome Organisation Proteomics Standards Initiative, facilitating the encoding of structural state transitions within disordered protein regions. This specification will enhance data interoperability and exchange among research groups and databases, providing a common language for describing structural transitions and advancing our understanding of the functional implications of protein dynamics in biological systems.

Nodes involved: ELIXIR Belgium, ELIXIR Hungary, ELIXIR Italy, EMBL-EBI
Communities: 3D BioInfo, Intrinsically Disordered Proteins

This project aims to strengthen the basis for a one-stop shop connecting databases, datasets and tools for the deployment of the engineering Design-Build-Test-Learn (DBTL) framework in biotechnology. It will do so by surveying the tools and data landscape, pinpointing gaps and opportunities, and establishing design patterns for task-specific workflows for analysis, integration and sharing of multimodal data. 

It will provide a resource that will allow users to navigate the complex landscape of biotechnology tooling and data, as well as to establish solutions that fit their specific DBTL requirements. Use cases from ongoing programmes in various communities will be used to ascertain and establish the pragmatic value of the solutions. 

The work will be carried out through hands-on activities, dedicated workshops and hackathons, providing training and resources, as well as fostering industrial engagement. The experience of the communities and platforms involved in systems biology, industrial biotechnology, metabolic modelling, metabolomics, enzymes, bioprospecting and data management will be particularly valuable in this respect, as well as their respective industrial relations. Accordingly, the project engages participants from seven ELIXIR nodes and connects researchers and their activities from six communities. 

The project outcomes will contribute to advancing the ambition of connecting the latest developments and established data resources across ELIXIR to realise the potential of cellular and molecular biology, particularly in the fields of industrial biotechnology and biomanufacturing.

Nodes involved: ELIXIR Spain, ELIXIR Greece, ELIXIR France, ELIXIR Netherlands, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR UK
Communities: Biodiversity, Microbiome, Metabolomics, Microbial Biotechnology, Research Data Management, Systems Biology

Spatial transcriptomics (ST) was named ‘Method of the Year 2020’ by Nature Methods and more recently featured in Nature’s ‘Seven technologies to watch in 2024’. ST is now a prerequisite for researching transcriptional pathology at the cellular and molecular levels. It is applied across many areas of pathology, including neurodegenerative disease, cancer, cardiomyopathy and nephrology, and is also emerging in plant and microbiome research. While there is a plethora of spatial analysis applications, these are neither unified nor easily manageable by research scientists, and they offer little prospect of delivering FAIR and reproducible results.

To address this challenge, we will implement Spatial2Galaxy (S2G), a self-contained, reproducible, scalable FAIR spatial transcriptomics analysis platform for researchers and bioinformaticians alike. We will develop S2G building on our success in developing Galaxy workflows, training materials, and ST and single-cell analysis pipelines.

S2G will provide state-of-the-art ST tools and workflows with proven high performance in benchmarking studies, ensuring the uptake of best practices. These tools will be demonstrated on datasets that connect various ST databases, which will consolidate community guidelines for integrative multi-modal single-cell omics and imaging analysis. For comparison, non-spatial single-cell sequencing was named Nature Methods’ ‘Method of the Year 2013’, yet it took six years, until 2019, for practical training and workflows for its analysis to be FAIRified and made available in Galaxy. S2G aims to shorten this gap between a technology becoming relevant and the provision of FAIR ST resources to the life science community.

Nodes involved: ELIXIR Germany, ELIXIR France, ELIXIR Netherlands, ELIXIR UK
Communities: Cancer Data, Galaxy, Human Copy Number Variation, Single-Cell Omics

The ELIXIR Metabolomics Community relies on the development and adoption of standards, formats and data treatment solutions, but it remains challenging to ensure high-quality reported metadata, sufficiently contextualised results and interoperable, reusable datasets, and to integrate metabolomics data with other omics data or studies.

This project is designed to address these issues by connecting key international standards with ELIXIR resources and by creating associated community guidelines and training materials.

Based on the FAIRification framework, activities in the project will: i) increase interoperability and reuse of public metabolomics datasets and workflows through enhanced and extended open data standards, resources and new semantic annotations, ii) define, ensure and establish quality control for study baselines in Metabolomics and Exposomics, and iii) facilitate metabolomic data interpretation and meta-analysis integration with multi-omics and systems biology studies. 

As a first necessary step, the project will create a Semantic Metabolomics Data Model to standardise metadata, ensuring unambiguous reuse of metabolomics projects. This model will focus on integrating key ontologies, providing open training initiatives and enhancing the interoperability of metabolomics data by producing open guidelines for annotation steps. By linking with ELIXIR’s Deposition Databases, the ISA Framework and other services, the project seeks to strengthen interconnection with ELIXIR Platforms, other ELIXIR Communities (Systems Biology, Food and Nutrition, Galaxy, Proteomics, Toxicology, Research Data Alliance Focus Group ...), the FAIR Cookbook and the Bioschemas.org communities. Project outcomes are expected to promote the emergence of ambitious and innovative semantic-based solutions for inter-comparison of studies in the healthcare, clinical and plant domains.

Nodes involved: ELIXIR Czech Republic, ELIXIR Germany, ELIXIR Italy, ELIXIR Spain, ELIXIR France, ELIXIR Netherlands, ELIXIR Sweden, ELIXIR UK, EMBL-EBI
Communities: Food and Nutrition, Galaxy, Metabolomics, Proteomics, Research Data Management, Single-Cell Omics, Systems Biology, Toxicology

ELIXIR Belgium, ELIXIR Czech Republic, ELIXIR France, ELIXIR Greece, ELIXIR Hungary, ELIXIR Italy, ELIXIR Netherlands, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR UK, EMBL-EBI

The aim of this Implementation Study is to determine the requirements for validation with ELIXIR partners and to build prototype open validation services for archetypal archival databases and knowledge bases, in particular:

  • Content validation according to minimum information checklists.
  • Syntactic format validation according to a standard format in conjunction with the GA4GH file formats team as part of the Large Scale Genomics Workstream.
  • Syntactic format validation for phenotyping data.
  • Semantic validation according to a publicly available ontology.
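To make the content and semantic validation steps concrete, here is a minimal Python sketch of how a validator might check a metadata record. It is illustrative only: the checklist fields, the single "organism" rule and the small set of accepted NCBI Taxonomy term IDs are hypothetical stand-ins for a real minimum information checklist and a real public ontology, and syntactic format validation is not covered.

```python
from dataclasses import dataclass, field

# Hypothetical minimum-information checklist: field name -> mandatory?
CHECKLIST = {
    "sample_id": True,
    "organism": True,
    "collection_date": True,
    "tissue": False,
}

# Hypothetical stand-in for a public ontology: accepted term IDs for "organism".
# NCBITaxon:9606 is Homo sapiens and NCBITaxon:3702 is Arabidopsis thaliana.
ORGANISM_TERMS = {"NCBITaxon:9606", "NCBITaxon:3702"}


@dataclass
class ValidationReport:
    errors: list[str] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        return not self.errors


def validate_record(record: dict[str, str]) -> ValidationReport:
    report = ValidationReport()
    # Content validation: every mandatory checklist field must be present and non-empty.
    for name, mandatory in CHECKLIST.items():
        if mandatory and not record.get(name):
            report.errors.append(f"missing required field: {name}")
    # Semantic validation: a supplied organism value must be a known ontology term.
    organism = record.get("organism")
    if organism and organism not in ORGANISM_TERMS:
        report.errors.append(f"unknown ontology term for organism: {organism}")
    return report


ok = validate_record({"sample_id": "S1", "organism": "NCBITaxon:9606", "collection_date": "2020-01-01"})
bad = validate_record({"sample_id": "S2", "organism": "human"})
print(ok.valid)    # True
print(bad.errors)  # ['missing required field: collection_date', 'unknown ontology term for organism: human']
```

Returning a report that collects every error, rather than stopping at the first failure, lets a submitter fix all problems with a record in one pass.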
ELIXIR Belgium, ELIXIR France, EMBL-EBI, ELIXIR UK
ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Estonia, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Ireland, ELIXIR Israel, ELIXIR Italy, ELIXIR Luxembourg, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR Belgium, ELIXIR Cyprus, ELIXIR Czech Republic, ELIXIR Denmark, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Greece, ELIXIR Hungary, ELIXIR Israel, ELIXIR Italy, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Portugal, ELIXIR Slovenia, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI

ELIXIR is about integration of diverse resources including tools, training materials and technical services. Within EXCELERATE, ELIXIR is building portals to collate information on tools and data services (bio.tools), training events and material (TeSS, WP11 e-learning environment), compute resources (WP4 technical service registry) and cross-linked policy, standards and databases (FAIRsharing, WP4). A focus of EXCELERATE is to set up these portals such that they can interoperate.

Currently, a scientist can use TeSS to find training events and materials and then, in separate searches, use bio.tools to find relevant tools and FAIRsharing to find standards and databases. These ELIXIR portals therefore provide a useful but fragmented service. Ideally, linking TeSS and bio.tools to ELIXIR’s compute resources via common workflow diagrams would enable end users to discover and learn about the prevalent bioinformatics workflows. In this Implementation Study, we want to take the first step by linking TeSS and bio.tools via the most prevalent bioinformatics workflows, laying the foundation for later incorporating other ELIXIR platforms, such as the compute resources, to provide an even more useful service for researchers.

The goal of this implementation study is to provide the life-scientist end-user with a powerful tool to find and use ELIXIR resources - across the spectrum - based on intuitive graphical diagrams of the most prevalent scientific workflows.

ELIXIR UK, ELIXIR Estonia, ELIXIR Belgium, ELIXIR Denmark, ELIXIR Switzerland, EMBL-EBI, ELIXIR Norway, ELIXIR France

Part of the EOSC-Hub project proposal is to establish an ELIXIR Competency Centre (ECC). The EOSC-Hub proposal is currently under review by the EC, and a funding decision is expected around late summer 2017, with a tentative project start date of January 2018. The focus of the ECC is the distribution of reference data sets within the EOSC environment, and the Compute Platform ExCo proposes to kick-start the work within the ECC through a proof of concept study (funded by the ELIXIR-Hub) on making big data sets available on remote compute infrastructures.

Therefore the overall purpose of this Proof of Concept Study is to bring together funded work already taking place within ELIXIR-Excelerate, EUDAT2020 and the ELIXIR nodes into an integrated activity:

  • Deployment of FTS3 by CESNET for use by ELIXIR, and its integration into the ELIXIR AAI for use by RDSDS and other activities.
  • Deployment and testing, in a wide-area testbed, of the first pre-releases of the Reference Data Set Distribution Service (RDSDS), currently being developed by EMBL-EBI within the EUDAT 2020 project.
  • Expansion of the current data transfer testbed, and assessment of its performance, to explore FTS3 and RDSDS by identifying ELIXIR nodes (and others) capable of hosting large reference data sets, and the reference data sets that will be transferred to test the system.

This study has been completed and the work is described in the end report. The outcome of this study is summarised in a webinar:

This study is associated with: The use of Cloud & VM for training.

ELIXIR Sweden, ELIXIR Germany, ELIXIR Czech Republic, EMBL-EBI, ELIXIR Finland

The Marine Metagenomics Community has adopted the use of the Common Workflow Language (CWL) as an interoperable way to describe their analysis pipelines. One of the most complex and fully developed CWL workflows implements the EBI metagenomics analysis pipeline.

In coordination with MG-RAST, a US-based metagenomics analysis pipeline, there are now two different large-scale metagenomics CWL workflows. Each uses a different CWL execution framework (namely Toil and AWE) and runs on different compute infrastructure. During the coming year, the Marine Use Case expects META-pipe (the ELIXIR-NO marine-specific metagenomics pipeline) and other metagenomics-related tools (e.g. ITS1 analysis from ELIXIR-IT) to adopt CWL. These additional tools can be used as alternatives to pre-existing tools or to extend the functionality of the current workflows.

This Implementation Study aims to:

  1. demonstrate the benefits of using CWL by combining different workflows components to make new workflows;
  2. extend the current CWL workflows to enable greater reuse;
  3. enhance the execution frameworks to improve both deployment and scalability;
  4. deploy a single CWL workflow on different ELIXIR cloud environments to enable parallel processing and reproducibility.
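The first aim, building new workflows by combining components from existing ones, depends on matching the outputs of one component to the inputs of another. The Python sketch below illustrates that idea under simplifying assumptions. It is not CWL syntax or the API of any CWL runner: `Step`, `connect` and the two example components are hypothetical, and data types are reduced to plain strings. The only borrowed convention is the "step/output" form CWL uses to refer to a step's output.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical stand-in for a workflow component with typed ports."""
    name: str
    inputs: dict[str, str]   # input port name -> data type
    outputs: dict[str, str]  # output port name -> data type


def connect(upstream: Step, downstream: Step) -> dict[str, str]:
    """Wire each downstream input to the first upstream output of the same type.

    Sources are written as "step/output". Raises ValueError if any downstream
    input has no type-compatible source.
    """
    links: dict[str, str] = {}
    for in_port, in_type in downstream.inputs.items():
        source = next((out for out, t in upstream.outputs.items() if t == in_type), None)
        if source is None:
            raise ValueError(
                f"{downstream.name}.{in_port} ({in_type}) has no matching output in {upstream.name}"
            )
        links[in_port] = f"{upstream.name}/{source}"
    return links


# Hypothetical components taken from two different workflows.
qc = Step("quality_control", {"reads": "fastq"}, {"trimmed": "fastq"})
assembly = Step("assembly", {"reads": "fastq"}, {"contigs": "fasta"})

print(connect(qc, assembly))  # {'reads': 'quality_control/trimmed'}
```

Connecting the steps in the opposite order raises a `ValueError`, because the assembly step produces FASTA while quality control expects FASTQ. Declared, typed ports are what let such a mismatch be caught before anything is executed.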

To provide an exemplar for both the ELIXIR and the broader scientific communities, we will work through a community case study and ensure that the data, analysis and results form a bona fide Research Object (RO) that complies with the FAIR principles. We will develop appropriate training materials for two key target audiences: producers and consumers of workflows and ROs.

This study is closely linked with the work of the Bioschemas Community.

ELIXIR France, ELIXIR UK, EMBL-EBI, ELIXIR Finland
ELIXIR Germany

As data analysis is now commonplace in the life sciences, we need scalable ways to develop and share analysis workflows and to train researchers to use them. The latter requires an end-to-end approach: from data access, through selection and proper use of the appropriate workflow, to deploying it on available (cloud) resources.

The ELIXIR Communities bring together domain experts, which makes them ideal for identifying and developing standard workflows for commonly used analyses in their specific domain. Since summer 2016, the Galaxy Training Network has been collaboratively collecting and further developing training material on analysis with Galaxy, as well as on Galaxy development and administration (https://training.galaxyproject.org).

This project has three main goals:

  1. Expand the portfolio of Community workflows, including training material to describe them
  2. Facilitate access to data in Core Data Resources and Deposition Databases
  3. Improve the user experience of the Galaxy platform

Background

Galaxy is a workflow management system that 1) supports reproducible science, 2) facilitates sharing of data and results and 3) removes the need for users to compile and install tools. Galaxy offers a user interface, accessed through a web browser, into which virtually any command-line tool can be integrated. This is done by defining the tool's inputs, outputs and parameters in a wrapper. As analyses usually consist of multiple steps, tools can be composed into workflows, which facilitates the processing of multiple samples and the reproduction of analyses. Galaxy is available worldwide as a free-to-use online portal, is developed as open source and can be freely downloaded for local installation.
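The wrapper idea, declaring a tool's inputs, outputs and parameters once and reusing that declaration for every sample, can be illustrated with a short sketch. In Galaxy itself, wrappers are XML tool definition files; the Python below only models the concept and is not Galaxy code. `ToolWrapper`, `batch_commands`, the program name `mytool` and its `--threads` option are all hypothetical, and the sketch only builds command lines without running anything.

```python
from dataclasses import dataclass, field


@dataclass
class ToolWrapper:
    """Hypothetical model of a tool wrapper: declared inputs, outputs and parameters."""
    command: str
    inputs: list[str]
    outputs: list[str]
    parameters: dict[str, str] = field(default_factory=dict)

    def build_command(self, files: dict[str, str]) -> list[str]:
        """Turn bound files and parameter values into a command line (argument list)."""
        ports = self.inputs + self.outputs
        unbound = [p for p in ports if p not in files]
        if unbound:
            raise KeyError(f"unbound ports: {unbound}")
        argv = [self.command]
        for key, value in self.parameters.items():
            argv += [f"--{key}", value]
        argv += [files[p] for p in ports]
        return argv


def batch_commands(tool: ToolWrapper, samples: list[str]) -> list[list[str]]:
    """Reuse one wrapper for many samples; assumes ports named "input" and "output"."""
    return [tool.build_command({"input": s, "output": s + ".out"}) for s in samples]


# "mytool" and its --threads option are placeholders, not a real program.
tool = ToolWrapper("mytool", ["input"], ["output"], {"threads": "4"})
for argv in batch_commands(tool, ["sample1.fastq", "sample2.fastq"]):
    print(argv)
# ['mytool', '--threads', '4', 'sample1.fastq', 'sample1.fastq.out']
# ['mytool', '--threads', '4', 'sample2.fastq', 'sample2.fastq.out']
```

Because the wrapper is declared separately from any particular dataset, the same definition can be applied unchanged to many samples. This mirrors, in simplified form, why wrapped tools make batch processing and reproduction of analyses straightforward.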

The Galaxy workflow system is used extensively as part of national infrastructures in several ELIXIR Nodes. Many bioinformatics researchers and core facility groups consider Galaxy an integral part of bioinformatics infrastructure because it provides simplified access to data and analysis tools under a single “intuitive” interface. Education and training are an integral part of the Galaxy community. The Galaxy Training Network (GTN) has been working for several years with GOBLET and the ELIXIR Training Platform to enhance and deliver first-class training to the scientific community, targeting not only scientists but also developers and administrators.

Goals

In this project we will engage with ELIXIR Communities that are not yet well represented in the Galaxy ecosystem. The aim is to deliver deployable workflows for commonly used analysis in these scientific domains. This encompasses the whole stack needed: availability of tools in Galaxy and exemplar workflows as well as access to data, both reference data and published research data.

In collaboration with the Training Platform and the Community involved, we will carry out a gap analysis to identify which components of the stack are missing or could be improved. We will bring together experts in the scientific discipline, training and Galaxy in a hackathon to address these gaps, and develop training materials that document the resulting workflows and support training events. This training material will be included in the Galaxy Training Network initiative https://training.galaxyproject.org (which is indexed by TeSS). As these events bring together established and potential new trainers, we will combine them with a Train-the-Trainer event.

We will also organise training events for researchers within this community, using both existing and newly developed material, and will use these events to assess the usability of Galaxy.

WP1: Enable commonly used workflows for Communities

The bulk of the funding in this project will be allocated to organising events that bring together experts from a scientific Community, Galaxy and the Training Platform. We have selected five Communities (Plant, Metagenomics, Metabolomics, Proteomics and 3D BioInfo), and for each we will organise three events: a hackathon, a training event for researchers and a Train-the-Trainer event. To reduce costs and travel, we envisage organising these events co-located and back-to-back for each Community.

The aim of the hackathon is to address selected issues, identified through a gap analysis, that prevent researchers from performing standard analyses in Galaxy. The gap analysis is carried out in preparation for the hackathon, in collaboration with all stakeholders involved. These issues can range from wrapping tools for Galaxy to building workflows and providing visualisation plugins. Access to data is also in scope, in collaboration with WP2 of this proposal. To disseminate the work done, training materials will be developed based on these developments.

We will ensure all developments are appropriately referenced in ELIXIR registries, building on the expertise available in ELIXIR, in the Tools and Interoperability Platform: tools will be added to bio.tools and, if applicable, containers will be made available in BioContainers, and workflows will be registered in MyExperiment. This will be done in alignment with and complementary to the development of this infrastructure in EOSC-Life (WP Tools Collaboratory).

We will combine this hackathon with a Train-the-Trainer event, building on the expertise of the Training Platform. This aims to improve the teaching skills of the trainers as well as make them more familiar with the Galaxy platform and how it can support trainers and training events.

WP2: Access to data

Access to data is currently a major bottleneck for many users. In collaboration with data providers, we will incorporate ELIXIR Deposition Databases and confer with the communities what additional resources are of interest.

Access to data from UCSC has been integrated in Galaxy through dedicated additions to the web pages that allow searching this resource. Building on work started in the PhenoMeNal project, a Galaxy tool is being developed to communicate with MetaboLights. Dedicated tools can provide the ability both to retrieve data from and to submit data to Deposition Databases. However, these approaches are very labour-intensive to scale to all ELIXIR Core Data Resources and Deposition Databases, and they are difficult to maintain.

The Omics Discovery Index (OmicsDI, http://omicsdi.org) provides an integrated metadata resource for 20 different databases, currently comprising 450,000 datasets (October 2018), including four ELIXIR Core Data Resources (ArrayExpress, EGA, ENA, PRIDE) and two additional ELIXIR Deposition Databases (BioModels, MetaboLights). OmicsDI already provides web service access for search and metadata retrieval across all integrated resources. To facilitate data access in Galaxy workflows, we will add a method to OmicsDI that provides the direct data download URLs for any selected dataset. This method will be integrated into Galaxy workflows to automatically download and process relevant datasets selected on the basis of standardised metadata criteria.
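As a minimal sketch of the intended workflow step, assuming OmicsDI search results have already been retrieved and converted into simple records, selecting datasets by standardised metadata and collecting their download URLs could look like the Python below. The `DatasetRecord` fields, the `select_download_urls` function and the sample records are hypothetical; the real OmicsDI response structure and the download-URL method proposed above may differ.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical, simplified view of one OmicsDI search hit (not the real schema)."""
    accession: str
    database: str
    omics_type: str
    organism: str
    download_urls: list = field(default_factory=list)


def select_download_urls(records, omics_type, organism):
    """Return (accession, url) pairs for records matching both metadata criteria."""
    selected = []
    for rec in records:
        if rec.omics_type == omics_type and rec.organism == organism:
            for url in rec.download_urls:
                selected.append((rec.accession, url))
    return selected


if __name__ == "__main__":
    # Made-up records; accessions and URLs are placeholders.
    records = [
        DatasetRecord("DS-0001", "pride", "Proteomics", "Homo sapiens",
                      ["https://example.org/DS-0001/file1.raw"]),
        DatasetRecord("DS-0002", "metabolights", "Metabolomics", "Homo sapiens",
                      ["https://example.org/DS-0002/data.zip"]),
    ]
    for accession, url in select_download_urls(records, "Proteomics", "Homo sapiens"):
        print(accession, url)
```

Keeping selection separate from downloading mirrors the point made in the text: the same selection logic could feed a Galaxy data-source tool, another workflow system or a plain script.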

Integrating OmicsDI into Galaxy as a data source enables access to datasets from different data sources through a common entry point. The data access method developed in this project will be independent of Galaxy, which makes integration into other workflow systems or scripts possible and broadens the impact beyond the Galaxy community. We envision this work as a step towards a common, programmatic way to access ELIXIR Core Data Resources and Deposition Databases, based both on keyword search and on commonly used identifiers.

This aligns with the objectives of the ELIXIR Interoperability Platform and the Galaxy Community. It will result in improved access to ELIXIR Core Data Resources and Deposition Databases for the whole life science community, as well as a specific integration in Galaxy. The work will be disseminated through, for example, its use in the training materials developed in WP1. In consultation with the ELIXIR Hub, a dedicated webinar can be organised.

ELIXIR Spain, ELIXIR Portugal, ELIXIR Germany, ELIXIR France, ELIXIR Netherlands, EMBL-EBI, ELIXIR UK, ELIXIR Belgium, ELIXIR Greece, ELIXIR Norway, ELIXIR Czech Republic

UseGalaxy.eu (https://usegalaxy.eu) is the main European public Galaxy instance, providing more than 2000 tools from a range of bioinformatics research fields. Dozens of smaller Galaxy instances are also available across Europe, and ELIXIR-IT is launching its own first Galaxy on-demand cloud service at the ReCaS Cloud and GARR Cloud facilities. It is therefore critical for ELIXIR Nodes to quickly build expertise in Galaxy server administration and to continue developing the technological infrastructure needed to sustain the European Galaxy network.

The ongoing ELIXIR-IT "Laniakea@GARRCloud" programme aims to deploy a Laniakea Galaxy on-demand service at GARR Cloud, following the one already available at ReCaS Cloud. It also covers the installation of a Pulsar endpoint to allow UseGalaxy.eu jobs to run on GARR Cloud, and the deployment of a mirror (Stratum 1) of the Galaxy CVMFS server. Development and maintenance of both UseGalaxy.eu and the Pulsar network are driven by the Freiburg Galaxy Team (ELIXIR-DE).

The purpose of this Staff Exchange is to allow the "Laniakea@GARRCloud" developer and the Laniakea lead developer to spend a couple of months in Freiburg, where the Galaxy Europe Freiburg Team (ELIXIR-DE) is based, in order to quickly build expertise and receive advanced training on Pulsar, CVMFS and their deployment on cloud infrastructures.

We expect that this Staff Exchange will foster a smoother and faster integration of ELIXIR-IT activities within the ELIXIR Galaxy community, establishing more direct links between the Italian and the German ELIXIR Nodes and allowing the creation of a robust and lasting collaboration between the two Nodes on Galaxy-related topics.

Blog post describing the project: https://usegalaxy-eu.github.io/posts/2022/02/22/ELIXIR-staff-exchange/

ELIXIR Italy, ELIXIR Germany

This Implementation Study aims to establish the Galaxy ELIXIR Community to promote exchanges between the various European initiatives built around the Galaxy framework. Its objectives are to:

  • Build a comprehensive community among developers and administrators.
  • Collaborate with trainers and other ELIXIR Communities and Platforms.
  • Hire a dedicated Community Manager to facilitate interactions and the dissemination of information.

It will also develop a network of Galaxy instances based on:

  • national Galaxy instances already available or planned in the Nodes
  • best practices
  • shared data and unified authentication

The study will also identify technical developments that will help integration with ELIXIR Services (ISA-TAB and selected Core Data Resources) and improve the visualisation infrastructure.


ELIXIR Belgium, ELIXIR France, ELIXIR Germany, ELIXIR Norway, ELIXIR Italy, ELIXIR Czech Republic, ELIXIR Netherlands, ELIXIR Spain, ELIXIR UK, ELIXIR Switzerland, ELIXIR Slovenia
ELIXIR Germany
ELIXIR Belgium, ELIXIR Germany, ELIXIR UK
ELIXIR Germany
ELIXIR Germany
ELIXIR France

Human data and translational research is a high priority for ELIXIR and builds on the progress made in the previous programmes by the Human Data Communities. Within the Science Tier of the ELIXIR 2024–2028 Programme, advances will be focussed on enabling researchers (including research clinicians) to use ELIXIR’s infrastructure for human genomic, phenotypic, imaging and demographic data, to support discovery, analysis, innovation and integration of research findings into the clinic and healthcare. More specifically, through these projects we will ensure that millions of human genomes are discoverable and exploited in a biomedical setting through ELIXIR-supported infrastructure and community-endorsed standards, software, workflows and analysis environments across ELIXIR Nodes.

On Data Deposition:

  • FAIR-FEGA: Accelerating high quality FAIR data deposition in Federated EGA
  • FHDportal: Open National Submission and Access Portal for Federated Human Data

On Federated Data Analysis:

  • Leveraging federated learning and RO-Crates for human genomic data analysis and provenance tracking
  • Empowering Users: Orchestrating Sensitive Data Access for Interactive Federated Analysis in Virtual Research Environments 

On Linking Data:

  • FEGA-Connect: Linking European human multi-omic data deposition databases, biobanks and derived knowledge resources

Theme: Data Deposition

The Federated European Genome-Phenome Archive (FEGA) network is an ELIXIR-supported infrastructure for making human genomic data discoverable and accessible across ELIXIR Nodes. This project seeks to accelerate data depositions into FEGA, which will significantly increase the data flow in and from FEGA nodes. 

In alignment with the goals of the Human data and translational research Tier of the ELIXIR 2024–2028 programme, this project will promote seamless data integration and increase global researchers’ confidence in the data stored within FEGA, thus strengthening the network's position as a trusted resource for genomic data. It will build capacity within the FEGA Nodes and increase awareness among a wide range of stakeholders, thus altogether achieving the ultimate goal of enhancing data reuse. 

The project will be carried out by a strategic consortium comprising seven ELIXIR Nodes and two ELIXIR Communities. Partners represent four FEGA nodes at different levels of maturity, a member of the Cancer Data Community and both institutions managing Central EGA. The proposal is formulated around five timely coordinated tasks where all partners contribute their expertise to the final outcomes, converging in the deposition of several datasets to different nodes, testing the new tools and metadata model and blueprinting deposition of high-quality FAIR data in the future.

Nodes involved: ELIXIR Switzerland, ELIXIR Spain, ELIXIR France, ELIXIR Norway, ELIXIR Portugal, EMBL-EBI
Communities: Cancer Data, Federated Human Data

Theme: Data Deposition

Human data, especially genomic data, is increasingly being federated across borders and institutions, with many stakeholders participating in multinational and global biomedical and health data networks, fostering collaborations and partnerships. While such international efforts are essential for the compilation and reuse of data, regulatory constraints often hinder the movement of certain data beyond organisational or national boundaries. Centralised approaches such as the Central European Genome-Phenome Archive (CEGA) are valuable, but not all data can be centralised. 

The Federated European Genome-phenome Archive network (FEGA) addresses this, with early work concentrated on local collection of data with central archiving of metadata. FHDportal aims to support both federated and central submission of metadata. It will do this by providing a reusable portal for gathering and storing metadata at a national level, and submitting required metadata centrally to enable discovery of datasets via the CEGA. FHDportal complements the existing system by providing a way to explore richer metadata (for example, including detailed information on specific datasets or local funding information), while enabling a core set of metadata to be queried centrally. 

FHDportal will be deployed and tested on FEGA nodes, and should be of interest to the many other countries seeking to join FEGA. The need for FHDportal is based on experience during onboarding and in moving to production nodes. It will offer a common solution for local mobilisation of data and metadata, which can be adapted to local situations. During development, it will be tested on both new and well-established nodes using different technical platforms and infrastructures. The resulting software will be provided to the whole community and will, we hope, become part of the emerging toolkit for new FEGA nodes wishing to establish themselves, helping them ensure that their nodes meet local needs while bringing European-scale benefits.

Nodes involved: ELIXIR Switzerland, ELIXIR Finland, ELIXIR Luxembourg, ELIXIR UK
Communities: Federated Human Data, Human Copy Number Variation

Theme: Federated Data Analysis

Federated analysis (FA) revolutionises genomics research by enabling collaborative analysis across distributed datasets while safeguarding data privacy and facilitating comprehensive insights into genetic diseases. Federated access and analysis of human datasets are part of the ELIXIR scientific programme. ELIXIR is also involved in the EUCAIM project (European Cancer Imaging Initiative) and coordinates the European Genomic Data Infrastructure (GDI) project, which aims to provide federated access to 1+ million whole genome sequences (WGS). While the GDI project explores federated solutions to analyse its data, it does not foresee deploying FA solutions for evaluation.

This project seeks to implement FA across four ELIXIR Nodes, using synthetic and real, publicly accessible Genome-Wide Association Study (GWAS) data. To maximise the impact of this proposal, we plan to leverage developments already made in the EUCAIM project, specifically the orchestration solution built around the Flower Framework, and ongoing FA developments in the BRIDGE Staff Exchange between ELIXIR and DCEG/NIH, where the Yjs framework is the chosen solution.
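The core idea of federated analysis described above, sharing only aggregate statistics instead of individual-level genotypes, can be illustrated with a minimal Python sketch that pools per-node allele counts for a single variant. This is not the Flower or Yjs API; the node names, genotype codes and counts are made up, and a real federated GWAS would exchange richer summary statistics under additional privacy safeguards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalSummary:
    """Aggregate counts a node shares for one variant; genotypes never leave the node."""
    node: str
    alt_allele_count: int
    total_alleles: int


def summarise_locally(node, genotypes):
    """Run at each node: genotypes are diploid ALT-allele counts coded 0, 1 or 2."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    return LocalSummary(node, sum(genotypes), 2 * len(genotypes))


def pooled_alt_frequency(summaries):
    """Run at the coordinator: combine node summaries into one pooled ALT allele frequency."""
    alt = sum(s.alt_allele_count for s in summaries)
    total = sum(s.total_alleles for s in summaries)
    if total == 0:
        raise ValueError("no alleles observed")
    return alt / total


if __name__ == "__main__":
    node_a = summarise_locally("node-A", [0, 1, 2, 1])  # 4 ALT alleles out of 8
    node_b = summarise_locally("node-B", [0, 0, 1])     # 1 ALT allele out of 6
    print(f"pooled ALT frequency: {pooled_alt_frequency([node_a, node_b]):.3f}")
    # pooled ALT frequency: 0.357
```

The pooled result equals what a centralised computation would give (5 of 14 alleles), which is why simple additive statistics are a natural first target for federation.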

We also aim to represent the analysis using RO-Crates to track the provenance of the analysis, following the Five Safes Framework. The proposal is built around ongoing collaborations on deploying and testing FA solutions for analysing sensitive data across different projects like GDI, EUCAIM, BY-COVID and TRE-FX. 
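To illustrate how an analysis run could be recorded in an RO-Crate, the Python sketch below builds a minimal `ro-crate-metadata.json` document following the RO-Crate 1.1 structure: a metadata descriptor, a root `Dataset`, and a `CreateAction` linking the workflow (`instrument`) to its inputs (`object`) and outputs (`result`). The file names, run name, date and licence are placeholders, and the sketch does not implement the Five Safes checks or the full requirements of any specific RO-Crate profile.

```python
import json


def build_crate(workflow, inputs, outputs, run_name, date_published, license_url):
    """Return a minimal RO-Crate 1.1 metadata document describing one analysis run."""
    # Every file referenced by the run becomes a File data entity in the crate.
    files = [{"@id": path, "@type": "File"} for path in [workflow, *inputs, *outputs]]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            # Metadata descriptor: declares the RO-Crate version and points at the root.
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            # Root data entity: the crate as a whole, listing its parts.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": run_name,
                "description": f"Provenance record for {run_name}",
                "datePublished": date_published,
                "license": {"@id": license_url},
                "hasPart": [{"@id": f["@id"]} for f in files],
                "mentions": {"@id": "#run-1"},
            },
            # The run itself: which workflow turned which inputs into which outputs.
            {
                "@id": "#run-1",
                "@type": "CreateAction",
                "name": run_name,
                "instrument": {"@id": workflow},
                "object": [{"@id": path} for path in inputs],
                "result": [{"@id": path} for path in outputs],
            },
            *files,
        ],
    }


if __name__ == "__main__":
    crate = build_crate(
        "gwas_workflow.ga",            # hypothetical workflow file
        ["synthetic_cohort.vcf"],      # hypothetical input
        ["association_results.tsv"],   # hypothetical output
        "Federated GWAS run",
        "2024-01-01",
        "https://spdx.org/licenses/CC-BY-4.0",
    )
    print(json.dumps(crate, indent=2))
```

In practice the printed document would be saved as `ro-crate-metadata.json` at the root of the crate directory alongside (or referencing) the listed files.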

This project aims not only to boost this interaction using the Flower Framework for FA, but also to strengthen the connection to NIH/DCEG through dataset sharing and comparing different FA frameworks. All Nodes involved in this project are active members of the ELIXIR Human Data Communities, especially the Federated Human Data and Cancer Data ones. The outcomes derived from this project will be disseminated not only to these Communities but also to all ELIXIR projects where this topic is relevant.

Nodes involved: ELIXIR Belgium, ELIXIR Spain, ELIXIR France, ELIXIR Portugal, ELIXIR UK
Communities: Cancer Data, Federated Human Data

Theme: Federated Data Analysis 

Through the 1+Million Genomes (1+MG) initiative, Europe is scaling up efforts to build a shared framework and infrastructure to safely access and integrate clinical human data across borders, following regulatory efforts like the General Data Protection Regulation (GDPR) and the European Health Data Space (EHDS). These are pivotal in safeguarding sensitive information, while enabling authorised access for researchers, healthcare professionals and other actors. 

The European Genome-Phenome Archive (EGA), in both its Central and Federated forms, is integral to biomedical data security considerations and is recognised as the predominant European repository for the secure storage of pheno-clinical and genomic data. Mobilising data for secure analysis in Virtual Research Environments (VREs) remains challenging; indeed, it is an active focus of ongoing projects such as the European Genomic Data Infrastructure (GDI), EOSC-ENTRUST and EOSC4Cancer.

Galaxy is a popular open-source, community-driven VRE for bioinformatics analysis that represents a unique platform for developing and testing novel strategies for data analysis. A prototyping strategy for the access and processing of sensitive data was demonstrated in a previous ELIXIR implementation study (2021–2023). By adopting GA4GH Crypt4GH encryption standard features, we enabled Galaxy users within Trusted Research Environments (TREs) to decrypt sensitive data for workflow execution without sharing private encryption keys. 

We propose expanding this prototype into a comprehensive solution for secure data analysis in Galaxy, facilitating encrypted data access and transfer from FEGA/EGA repositories to designated TREs, all interactively orchestrated by users on a public Galaxy server. The proposed solution offers flexibility, with different levels of enforced restrictions ranging from scenarios with no limitations on encrypted data transfer and storage to fully federated analysis scenarios, where analysis occurs near the data. Most of the required infrastructure can also be deployed independently of Galaxy, simplifying the potential implementation of these concepts in other VREs.

Nodes involved: ELIXIR Belgium, ELIXIR Germany, ELIXIR Spain, ELIXIR Norway
Communities: Cancer Data, Federated Human Data, Galaxy

Theme: Linking Data

Today, research generates more data than ever, and a multitude of experimental data types. Such data types are often connected at source: perhaps generated from the same samples or as part of the same study. It is important that different data types are made available for re-use in a linked and coordinated manner, enabling full reuse of all the data in integrated analysis. Experimental data types are often siloed in varied specialised repositories, using different metadata models, so linking them is not straightforward. Also, data obtained from living humans is sensitive and shared under a controlled access model, adding an extra layer of complexity.

In this project, partners will establish a strong foundation for developing solutions to integrate multi-omic sensitive data effectively among FEGA nodes, biobanks and ELIXIR Core Data Resources such as PRIDE and GWAS Catalog. Five ELIXIR Nodes and the Polish FEGA node (as an in-kind contribution), from two ELIXIR Communities (Federated Human Data and Proteomics), will be involved, spanning three diverse data use cases to address the challenges of this open call.

The project will start by developing a comprehensive landscape analysis of current human data linkage challenges and solutions (Task 1). Based on this, concrete models and prototypes will be proposed to link sensitive proteomics data (Task 2), cohort and biobank data (Task 3), and population cohort-derived data (Task 4) to genomics data. Results from Tasks 2 to 4 will be used to improve the FEGA metadata model. The project will result in more coherent data deposition, discoverability and retrieval of multi-omics datasets, providing FAIRer data and accelerating research. To ensure broad engagement, the project will involve the ELIXIR Communities through dedicated online and in-person events, where both interim and final results will be disseminated.

Nodes involved: ELIXIR Finland, ELIXIR Germany, ELIXIR Spain, ELIXIR Sweden, EMBL-EBI
Communities: Federated Human Data, Proteomics

ELIXIR Belgium, ELIXIR Finland, ELIXIR France, ELIXIR Germany, ELIXIR Luxembourg, ELIXIR Norway, ELIXIR Portugal, ELIXIR Spain, ELIXIR Sweden, ELIXIR Switzerland, ELIXIR UK, EMBL-EBI
ELIXIR Belgium, ELIXIR Czech Republic, ELIXIR Germany, ELIXIR Netherlands, ELIXIR UK
ELIXIR Belgium, ELIXIR Germany

This study will build on recent work on software containers, which, as a key element of Open Science, Open Data and Open Source, are strongly supported and advocated by ELIXIR. Software containers cut across most of the strategic lines of the ELIXIR Tools Platform for the 2019–2023 Scientific Programme.

There will be three Work Packages: 

1. To maintain and extend the work initiated in the 2018 ELIXIR implementation study on BioContainers. That study helped unify various initiatives in ELIXIR Nodes around software containers and bring them under a common infrastructure. This infrastructure will be consolidated by incorporating new technologies for software containerisation, and federation of the platform will be explored to support its long-term sustainability.

2. To implement the evolution of the ELIXIR Tools Platform ecosystem by creating a central repository that provides metadata-rich, technology-agnostic software containers for use and deployment across sites and platforms. Initially this will integrate content from bio.tools, BioContainers, OpenEBench and Galaxy, and in time facilitate the inclusion of new data and metadata producers (e.g. Bioconda, Bioconductor) and/or new data and metadata consumers (e.g. GA4GH TRS, myExperiment).

3. To engage with existing and newly created communities of users (within and outside ELIXIR), whose involvement is essential to ensure that the chosen standards and technologies respond to users' needs. Software containers will play an important role here, ensuring that users benefit from the ongoing efforts in the evolved Tools Platform ecosystem and in other ELIXIR Platforms such as Training, Interoperability and/or Compute.

This study will provide containerised tools and state-of-the-art benchmarked workflows available in Galaxy for scientific communities. For long-term sustainability and impact, we will ensure that all workflows and tools are curated to a high standard, rendered FAIR, and follow agreed standards within ELIXIR and by initiatives like GA4GH and EOSC.

ELIXIR Belgium, ELIXIR France, ELIXIR Italy, ELIXIR Norway, ELIXIR Spain, EMBL-EBI, ELIXIR Germany, ELIXIR Denmark

This proposal focuses on the enhancement of Galaxy's data management features to provide additional provenance information and improve the integration of Galaxy in the existing data management ecosystem. We will leverage existing technologies and services in ELIXIR and complement ongoing international projects (ELIXIR-CONVERGE, the COVID-19 Data portal, EOSC Life, etc.) while building on national initiatives (German NFDI, ELIXIR Belgium strategy, UK BioFAIR, etc.).

Among the goals, we aim to make the Galaxy Data Libraries more scalable and further improve the reusability features of the platform by metadata enrichment. The Galaxy metadata system will be extended to enable the export of analysis records together with their provenance to relevant ELIXIR Core Data Resources and registries (e.g. WorkflowHub).

A strong emphasis will be on the integration of EGA, FAIRtracks, and the GA4GH Beacon network into Galaxy to support analyses of human data. Therefore, support for user-level encrypted data processing will also be added to allow for the analysis of sensitive data. To this end, we will include an encryption layer into the Pulsar network and enhance performance by increasing the data locality of distributed Galaxy analyses through a prototype data caching network.

These data management-related features and improvements aim to tackle concrete current worldwide needs, like the ones related to COVID-19 (meta-)analyses. The Galaxy Community has demonstrated the ability to sustain a fast rollout of novel fit-for-purpose features for the needs of European researchers, a trend we intend to continue with this proposal.

ELIXIR Belgium, ELIXIR Switzerland, ELIXIR Czech Republic, ELIXIR Germany, ELIXIR Spain, ELIXIR France, ELIXIR Israel, ELIXIR Italy, ELIXIR Netherlands, ELIXIR Norway, ELIXIR Slovenia, ELIXIR UK

This collaborative project focuses on two issues related to bioinformatics.

The first issue concerns the availability of the virAnnot pipeline (doi:10.1094/PBIOMES-07-19-0037-A), developed by the Virology team (INRAE UMR 1332) for the CATI BARIC community and the team's collaborators. Although intended for everyone, the pipeline is in practice used only by bioinformaticians, because it can only be run from the command line.

The first objective of this project is therefore to integrate this pipeline into the Galaxy environment, which is widely used by the scientific community for the analysis of HTS data in viral metagenomics. The Bioinformatics group of Wageningen University is a major hub of the European programme for the provision of bioinformatics resources in the life sciences and is involved in Galaxy training. It will be possible to attend the training given on site and to interact with the trainers.

At the end of the mission, the pipeline will be implemented in Galaxy and included in the analysis offering of the Galaxy platform hosted on the GenOuest cluster in Rennes, France, which is part of the French Institute of Bioinformatics (IFB), the French ELIXIR Node. This will make viral metagenomics analyses easily accessible to the CATI BARIC community and to all Galaxy users.

The second issue concerns the accurate identification of QTL regions. A QTL region typically spans a large genomic interval containing many genes, which makes it difficult to find the true causal gene. This is often done by experimental fine mapping, but that method is laborious.

The second objective of this project is therefore to develop a method to prioritize the genes of a QTL region in order to identify more precisely the causal gene affecting the trait. An in silico method will be developed to improve the resolution of a QTL analysis. This method will be based on prior knowledge from literature/databases via semantic web technologies and will take the form of a data query tool.
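As a hedged illustration of in silico prioritisation, the Python sketch below ranks the genes inside a QTL interval by how many trait-related annotation terms they carry. In the proposed tool, this prior knowledge would come from literature and databases through semantic web queries; here the gene records, positions and terms are made-up placeholders, and simple term counting stands in for whatever scoring the project eventually adopts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: position plus annotation terms gathered from prior knowledge."""
    gene_id: str
    chrom: str
    start: int
    end: int
    annotations: frozenset


def genes_in_interval(genes, chrom, start, end):
    """Return genes that overlap the QTL interval [start, end] on the given chromosome."""
    return [g for g in genes if g.chrom == chrom and g.start <= end and g.end >= start]


def prioritise(genes, chrom, start, end, trait_terms):
    """Rank candidate genes by the number of trait-related terms in their annotations."""
    scored = [(len(g.annotations & trait_terms), g.gene_id)
              for g in genes_in_interval(genes, chrom, start, end)]
    # Highest score first; ties broken alphabetically by gene identifier.
    return sorted(scored, key=lambda item: (-item[0], item[1]))


if __name__ == "__main__":
    genes = [
        Gene("gene1", "chr1", 100, 200, frozenset({"flowering time", "cold response"})),
        Gene("gene2", "chr1", 300, 400, frozenset({"flowering time"})),
        Gene("gene3", "chr1", 500, 600, frozenset()),
        Gene("gene4", "chr2", 100, 200, frozenset({"flowering time"})),
    ]
    print(prioritise(genes, "chr1", 150, 550, {"flowering time", "cold response"}))
    # [(2, 'gene1'), (1, 'gene2'), (0, 'gene3')]
```

`gene4` is excluded because it lies on another chromosome, even though its annotation matches; the interval filter and the knowledge-based score are deliberately kept as separate steps.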

Dr. Harm Nijveen's workgroup develops tools based on Semantic Web and Linked Open Data (LOD) technologies. The tool developments will build on two existing platforms, WormQTL2 (doi:10.1093/database/baz149) and AraQTL (doi:10.1111/tpj.13457), populated with data on C. elegans and A. thaliana. These databases can then be used and enriched with data on fruit trees available in the Virology team.

This work will enhance interactions between the Virology team and the Bioinformatics group and strengthen skills in the semantic web and data interoperability for both partners.

ELIXIR France, ELIXIR Netherlands

Bioinformatics analysis typically involves a large number of software packages and reference datasets, making installation a time-consuming task. This problem is aggravated in a course setting, where every participant needs an identical installation, sufficient hardware to run it and, ideally, access to an identical set-up after the course.

Ready-to-run virtual machine (VM) images containing an operating system and pre-installed analysis software, as well as containers, are gaining momentum in bioinformatics. These images and containers can be run on cloud platforms, which allows easy scaling for running tens or hundreds of simultaneous jobs in a course setting.

Several ELIXIR Nodes already provide cloud resources for national use. To enable other Nodes to also use the cloud for training, it is necessary to investigate which of these cloud providers would be willing to provide cloud resources for international use in a sustainable manner.

As cloud, VMs and containers are new topics to many bioinformatics trainers, it is important that technical help is available. ELIXIR needs to have a streamlined process for requesting cloud resources and technical help, so that a suitable cloud is found promptly for a course and there is a clear mechanism for reimbursing the technical personnel and computing resource costs for the provider.

This study is now complete; the final report and other documents will be added as they become available.

ELIXIR Finland, ELIXIR Netherlands, ELIXIR Switzerland, ELIXIR France, ELIXIR UK, ELIXIR Belgium, ELIXIR Spain, ELIXIR Slovenia, ELIXIR Germany
