Strategic Development of DOME Recommendation for Machine Learning Focus Group

Machine Learning (ML) enables computers to help humans make sense of large and complex data sets. As the cost of high-throughput technologies falls, large amounts of omics data are being generated and made accessible to researchers. Analysing these complex, high-volume data is not trivial, and classical statistics alone cannot exploit their full potential.

Tool integration in Galaxy and tool development for the integration and querying of heterogeneous QTL data

This collaborative project focuses on two issues related to bioinformatics.

The first issue concerns the accessibility of the virAnnot pipeline (doi:10.1094/PBIOMES-07-19-0037-A), developed by the Virology team (INRAE UMR 1332) for the CATI BARIC community and the team's collaborators. Although the pipeline is intended for everyone, it is currently used only by bioinformaticians, because it can only be run from the command line.

ELIXIR Portugal as a case study for the deployment of local EGA/Beacon v2 instances

The European Genome-phenome Archive (EGA) is a repository for all types of sequence and genotype experiments, including case-control, population, and family studies. The EGA serves as a permanent archive for several levels of data, including the raw data (which could, for example, be re-analysed in the future with other algorithms) as well as the genotype calls provided by the submitters. Although the EGA accepts data from across Europe, data regulations and other constraints make it desirable that ELIXIR Nodes deploy

Towards Analysis of SMiLE-seq raw data, with the ultimate goal of identifying binding sites of poorly characterized transcription factors

SMiLE-seq is a new and effective experimental method for inferring transcription factor (TF) binding site sequences. However, some TFs remain challenging to analyze. We aim to improve the method by applying modern statistical and deep learning approaches to both the experimental design and the subsequent data analysis.

Deliverables:

  • a tool for inferring binding motifs that representatively cover the sequence space
  • a GUI for analysis and analysis improvement
  • a “denoisifier”: a tool to be applied before the HMM-based analysis

Milestones:

Open access tools for effective management of ELIXIR Nodes based on collaborative work developed in ELIXIR-CONVERGE, RITRAIN, RItrainPlus and EMMRI

The OATEN staff exchange aims to familiarise Nodes with open access tools for effective ELIXIR Node management. Several workshops covering strategic, financial and project management are planned for the spring, in conjunction with the CONVERGE WP2 ELITMa program.