Forked from elixir-europe/biohackathon-projects-2021

Commit fe50c59 (parent 6631eb4), authored by Martin Cook on Jul 22, 2021: 38 changed files with 1,822 additions and 0 deletions.
# Project 1: Improve BioHackrXiv

## Abstract

BioHackrXiv is a low-threshold, citable publishing platform for biohackathon projects and is now listed on PMC. https://biohackrxiv.org/ is a scholarly publication service for biohackathons and codefests: papers are generated from Markdown templates whose header is a YAML/JSON record containing the title, authors, affiliations and tags. As part of the ELIXIR BioHackathon 2020, we created a metadata resource for BioHackrXiv, a preprint service hosted on OSF.io that makes BioHackathon reports citable through unique identifiers. We added metadata in RDF describing the biohackathons, papers, repositories, contributors and tags. This metadata can easily be extended by modifying the source code in the online GitHub repository. During the ELIXIR BioHackathon 2021 we will add functionality that builds on our work at the online ELIXIR BioHackathon 2020.
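
As an illustration of turning a paper header into RDF, here is a minimal Python sketch; it is not the BioHackrXiv implementation. It assumes the header is supplied as JSON (also valid YAML, which keeps the example dependency-free), that each author is an object with a `name` key, and that Dublin Core terms (`http://purl.org/dc/terms/`) are an acceptable stand-in vocabulary. The paper IRI and the `header_to_ntriples` helper are hypothetical, and affiliations are ignored.

```python
import json

# Assumption: Dublin Core terms stand in for whatever vocabulary the
# BioHackrXiv metadata service actually uses.
DCT = "http://purl.org/dc/terms/"


def literal(text: str) -> str:
    """Render an N-Triples string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def header_to_ntriples(paper_iri: str, header_json: str) -> list[str]:
    """Turn a JSON paper header into N-Triples lines (hypothetical mapping)."""
    header = json.loads(header_json)
    subject = f"<{paper_iri}>"
    triples = [f"{subject} <{DCT}title> {literal(header['title'])} ."]
    for author in header.get("authors", []):
        triples.append(f"{subject} <{DCT}creator> {literal(author['name'])} .")
    for tag in header.get("tags", []):
        triples.append(f"{subject} <{DCT}subject> {literal(tag)} .")
    return triples


if __name__ == "__main__":
    example_header = json.dumps({
        "title": "Example BioHackathon report",
        "authors": [{"name": "A. Author", "affiliation": 1}],
        "tags": ["RDF", "metadata"],
    })
    for line in header_to_ntriples("https://example.org/paper/1", example_header):
        print(line)
```

Emitting plain N-Triples strings keeps the sketch self-contained; a real pipeline would more likely build the graph with an RDF library, use IRIs rather than literals for people, and let the library handle serialisation and escaping.
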
## Topics

Tools Platform

**Project Number:** 1

**EasyChair Number:** 2

## Team

### Lead(s)

Pjotr Prins [email protected]

## Expected outcomes

Enhanced submission and search for BioHackrXiv.

## Expected audience

2

**Number of expected hacking days**: 4
# Project 10: Development of training modules for Gallantries

## Abstract

The Gallantries project is a collaboration between five European universities, members of Software Carpentry, and members of the Galaxy Project. The goal of the project is to increase bioinformatics and core data analysis skills in the life sciences across Europe.

A pilot effort in 2019, started during the BioHackathon 2018, developed Hybrid Training: a single instructor broadcasts to learners in distributed classrooms supported by on-site helpers. This significantly improved scalability and reduced the environmental impact of instructors travelling around Europe. With the COVID-19 pandemic, hybrid and fully virtual training events have become the norm. To support this teaching format, training materials must be adapted and instructors need to be trained.

The main focus of the BioHackathon project will be to discuss and draft training modules on microbial analysis, machine learning, and Train-the-Trainer (TtT), tailored to a remote/hybrid training format. The project will include the following activities:

- Review existing ELIXIR TrP and GTN TtT material
- Define a learning plan and a rough draft of the three modules
- Implement a template repository that organizers of Galaxy-based training events can readily reuse to create a course website (based on the GTN Smorgasbord event)
## Topics

Galaxy
Machine learning
Training Platform

**Project Number:** 10

**EasyChair Number:** 15

## Team

### Lead(s)

Bérénice Batut <[email protected]>

## Expected outcomes

- Day 1: Review existing ELIXIR TrP and GTN TtT material
- Days 2-4: Define a learning plan and a rough draft of the three modules
- Day 4: Implement a template repository that organizers of Galaxy-based training events can readily reuse to create a course website (based on the GTN Smorgasbord event)

## Expected audience

Researchers with knowledge of training development, microbial data analysis, machine learning and Train-the-Trainer. Expected participants include:

- Bérénice Batut (NL)
- Anthony Bretaudeau (FR)
- Coline Royaux (FR)
- Fotis Psomopoulos (GR)
- Saskia Hiltemann (NL)
- Helena Rasche (NL)

**Number of expected hacking days**: 4
# Project 11: Improve FAIR sharing for workflow systems using WorkflowHub and RO-Crate

## Abstract

WorkflowHub.eu is being established as a global, workflow-language-agnostic registry of life science computational workflows [https://doi.org/10.5281/zenodo.4605654]. Its pre-launch in early 2020 was accelerated by the COVID-19 BioHackathon and by close collaboration with the EOSC-Life research infrastructure, and the registry, now in public beta, supports workflows from more than 30 research groups and initiatives. We have co-evolved the community-developed data packaging standard RO-Crate [https://w3id.org/ro/crate/] and Bioschemas to support the exchange and registration of complex workflows together with rich metadata and provenance, as well as their test and execution details via LifeMonitor.

In this hackathon we aim to extend RO-Crate integration to other workflow systems such as Nextflow and Snakemake, matching what we have already achieved with Galaxy and CWL. We also aim to expand the collection of workflows registered in WorkflowHub by collaborating with repository managers such as nf-core [https://nf-co.re/] and Australian BioCommons [https://www.biocommons.org.au/], and by helping individual users package their workflows as RO-Crates and register them as WorkflowHub entries.

We will also build a tighter integration with the Tools Platform to detect bio.tools and Bioconda/BioContainers usage within registered workflows, adding reverse registration for related workflows.
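
To make the RO-Crate packaging step concrete, the following minimal Python sketch builds an `ro-crate-metadata.json` document for a single workflow file, following the general RO-Crate 1.1 layout (a metadata descriptor, a root `Dataset`, and a `ComputationalWorkflow` main entity). It is not code from WorkflowHub or any workflow engine: the helper name, the `#nextflow` language identifier and all example values are assumptions, and a crate intended for WorkflowHub may need further properties required by the Workflow RO-Crate profile.

```python
import json


def workflow_crate_metadata(
    workflow_path: str,
    name: str,
    description: str,
    date_published: str,
    license_url: str,
    language_id: str,
    language_name: str,
) -> dict:
    """Build a minimal RO-Crate 1.1 metadata document for one workflow file (sketch)."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: declares which entity this file describes.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root data entity: the crate as a whole, pointing at its main workflow.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
                "hasPart": [{"@id": workflow_path}],
                "mainEntity": {"@id": workflow_path},
            },
            {
                # The workflow file itself, typed as a computational workflow.
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": language_id},
            },
            {
                # Contextual entity for the workflow language (identifier is hypothetical).
                "@id": language_id,
                "@type": "ComputerLanguage",
                "name": language_name,
            },
        ],
    }


if __name__ == "__main__":
    crate = workflow_crate_metadata(
        workflow_path="main.nf",
        name="Example pipeline",
        description="Hypothetical pipeline used only to illustrate the layout.",
        date_published="2021-01-01",
        license_url="https://spdx.org/licenses/MIT",
        language_id="#nextflow",
        language_name="Nextflow",
    )
    # Saved next to main.nf as ro-crate-metadata.json, this describes the directory as a crate.
    print(json.dumps(crate, indent=2))
```

Every cross-reference is an `{"@id": ...}` object, which is what lets the flat `@graph` act as a linked description; engine integrations would generate the same structure automatically, likely via existing RO-Crate tooling rather than hand-built dictionaries.
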
## Topics

Bioschemas
Compute Platform
Containers
EOSC-Life
GA4GH partnership
Galaxy
Interoperability Platform
Tools Platform

**Project Number:** 11

**EasyChair Number:** 16

## Team

### Lead(s)

Ignacio Eguinoa ([email protected])

## Expected outcomes

- New prototypes of workflow engine integrations with WorkflowHub (e.g. Snakemake, Nextflow)
- Matured versions of the existing integration prototypes (Galaxy, CWL)
- A draft shared metadata model for workflow repositories (WorkflowHub, nf-core, Dockstore, Bioschemas)
- A larger and more diverse collection of workflow entries registered in WorkflowHub

## Expected audience

- Workflow users (e.g. CWL, Nextflow, Galaxy, Snakemake)
- Workflow engine developers
- Platform developers (e.g. bio.tools)
- Tool maintainers/packagers
- Metadata/ontology experts (e.g. Bioschemas)
- Python developers (to extend RO-Crate manipulation tools)
- Ruby developers (WorkflowHub backend)

**Number of expected hacking days**: 4
# Project 12: Join the dots: Making sense out of biodiversity data with a human focus

## Abstract

People’s identities are among the most solid, indivisible entities in biodiversity data. People collect, observe, identify, experiment and publish. People are also idiosyncratic: their interests, and therefore their scientific data, are like a fingerprint, unique to them. Yet it is no secret that biodiversity data are full of errors. Can we use the idiosyncrasies of people’s data to find these errors and correct them? During the BioHackathon 2021 we will use people’s biodiversity observation data and connect those data to their biographies and other research outputs through Wikidata. In particular, we will characterize their spatial observation patterns in order to identify outliers. For example, we can identify errors where a person is purported to be in two places at the same time. We can extend this further by calculating the properties of a person’s observing patterns: people observe in very different ways depending on the target species, the landscape, and their own preferences and abilities. Improved models of the data collection process would further help us disentangle the artifacts of data collection from the biological patterns we are trying to detect.
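
The "two places at the same time" check can be expressed as an implied-speed test over each person’s time-ordered observations. The Python sketch below is one way to do it, not the project’s software: the `Observation` record, the 900 km/h ceiling (roughly airliner speed) and the 1 km tolerance for simultaneous records are all assumptions to be tuned.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


@dataclass
class Observation:
    person: str        # hypothetical person identifier, e.g. a Wikidata item ID
    when: datetime
    lat: float         # decimal degrees
    lon: float


def haversine_km(a: Observation, b: Observation) -> float:
    """Great-circle distance between two observations, in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))


def impossible_pairs(
    observations: list[Observation],
    max_speed_kmh: float = 900.0,
    same_time_tolerance_km: float = 1.0,
) -> list[tuple[Observation, Observation]]:
    """Flag consecutive records by one person that imply implausibly fast travel."""
    by_person: dict[str, list[Observation]] = {}
    for obs in observations:
        by_person.setdefault(obs.person, []).append(obs)

    flagged = []
    for records in by_person.values():
        records.sort(key=lambda o: o.when)
        for prev, curr in zip(records, records[1:]):
            hours = (curr.when - prev.when).total_seconds() / 3600
            distance = haversine_km(prev, curr)
            if hours == 0:
                # Same timestamp: anything beyond the tolerance means two places at once.
                suspicious = distance > same_time_tolerance_km
            else:
                suspicious = distance / hours > max_speed_kmh
            if suspicious:
                flagged.append((prev, curr))
    return flagged
```

Many occurrence records carry only a date rather than a full timestamp, so the comparison would need to respect that precision (for example by reasoning in whole days); otherwise ordinary same-day fieldwork at two nearby sites would be flagged.
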
## Topics

Biodiversity

**Project Number:** 12

**EasyChair Number:** 19

## Team

### Lead(s)

Quentin Groom, [email protected]

## Expected outcomes

Software that takes data containing person and geographic information from online databases (GBIF, iNaturalist, ENA) and generates maps, spatial statistics and outliers from those data.

## Expected audience

- Python and/or R programming
- GIS knowledge
- PostgreSQL with PostGIS
- Knowledge of GBIF and other biodiversity observation data
- Experience with Wikidata (i.e. SPARQL knowledge)
- Experience with spatial statistics

**Number of expected hacking days**: 4
# Project 13: Integration of visualisation tools for disease mechanisms and their annotations

## Abstract

We will integrate resources that enable visual exploration of disease mechanisms at different levels: gene/protein annotation, protein-protein interaction, pathways and genomic variation. Disease maps (disease-maps.org) provide a standardised, diagrammatic way to encode the mechanisms of human diseases (https://biohackrxiv.org/gmbjv/), with COVID-19 as a prime example (https://fairdomhub.org/projects/190). We aim to integrate these maps with data from the recently developed UniProt Alzheimer’s disease portal and COVID-19 platform (https://diseases.uniprot.org, https://covid-19.uniprot.org).

We will work with UniProt and the MINERVA Platform (minerva-web.lcsb.uni.lu), ELIXIR resources that we have already started to bring together (https://github.com/xwatkins/disease-map-portal). In this project, we will use the Nightingale library (https://ebi-webcomponents.github.io/nightingale/#/), a suite of standardised, modular data visualisation components that includes the protein feature annotation viewer ProtVista, a protein interaction visualisation and the 3D viewer Mol* (https://molstar.org). We will embed diagrams visualised by MINERVA alongside the corresponding protein-level visualisations, and explore adding sequence annotation visualisation to MINERVA via its plugin architecture. Finally, this work will allow us to define data exchange standards for Nightingale components, making them easy to use for other ELIXIR resources.
## Topics

Covid-19
Data Platform
Interoperability Platform
Rare Disease
Tools Platform

**Project Number:** 13

**EasyChair Number:** 20

## Team

### Lead(s)

Xavier Watkins, [email protected], corresponding author
Marek Ostaszewski, [email protected]

## Expected outcomes

- A MINERVA-based disease map visualisation embedded in the Disease Maps portal (during the BioHackathon)
- Protein structure visualisation coupled to the visualised diagram (during the BioHackathon)
- Protein sequence visualisation for the MINERVA Platform as a plugin (draft: during the BioHackathon; stable version: 3 months later)
- Standardised representation of information exchange between the components (draft: during the BioHackathon; stable version: 6 months later)

## Expected audience

We invite participants with knowledge of:

- web development, in particular JavaScript/TypeScript
- graphic design and/or visual analytics

We expect at least six participants, from ELIXIR-Hub (3), ELIXIR-LU (2) and ELIXIR-CZ (1).

**Number of expected hacking days**: 4
# Project 14: Nightingale – modular web components for visualisation of biological data

## Abstract

Nightingale is a set of open-source, reusable, composable and extensible components for visualising biological data, implemented following web standards. Their main focus is the visualisation of biological sequence information.

Nightingale was born out of the visualisation needs of UniProt, InterPro and PDBe (all ELIXIR Core Data Resources) and has expanded to accommodate the requirements of other resources such as Open Targets. The BioHackathon would provide fertile ground for collecting new use cases and would allow us to refine the definition of our component APIs, adhering to ubiquitous web standards, so that they can be used by the broader community.

While the Nightingale project covers the data visualisation aspect, the Bioschemas community is already defining common standards across the life science ecosystem. A current Nightingale goal is therefore to make these components consume data marked up with the Bioschemas Protein profile, using existing scrapers such as BMUSE.

The open-source nature of the project makes it challenging to find common time and space for its development, but the BioHackathon has provided that space for Nightingale before, and we hope to return in 2021 to improve Nightingale even further.
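
Consuming Bioschemas markup starts with extracting the JSON-LD embedded in a web page, the kind of job a scraper such as BMUSE performs. The Python sketch below, using only the standard library, illustrates that first step. It is not BMUSE: it ignores microdata and RDFa, and it only filters nodes whose `@type` includes `Protein` without validating them against the Bioschemas Protein profile. The `protein_entities` helper and any example markup are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._buffer))
            self._inside = False


def protein_entities(html: str) -> list[dict]:
    """Return JSON-LD nodes whose @type includes "Protein" (no profile validation)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    proteins = []
    for block in parser.blocks:
        data = json.loads(block)
        # A block may hold a single node, an @graph, or a top-level list of nodes.
        nodes = data.get("@graph", [data]) if isinstance(data, dict) else data
        for node in nodes:
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Protein" in types:
                proteins.append(node)
    return proteins
```

A Nightingale component would then need a mapping from these nodes onto its own input format; that mapping is outside the scope of this sketch.
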
## Topics

Bioschemas
Data Platform
Interoperability Platform
Proteomics
Tools Platform

**Project Number:** 14

**EasyChair Number:** 21

## Team

### Lead(s)

Aurélien Luciani [email protected]

## Expected outcomes

- Better interoperability by consuming FAIR resources
- Make the library more FAIR
- Discover new requirements from the community and provide better onboarding to the tool
- Feed back to the community on standards and visualisation libraries

## Expected audience

- Web developers (throughout the project, for development)
- Data visualisation users (defining requirements)
- Resource owners (defining data sources for visualisation)
- BMUSE developers (input on using the tool)

**Number of expected hacking days**: 4
# Project 15: CAB2: A step towards Biodiversity data enrichment

## Abstract

Linking molecular data to taxonomic names and their extensive taxonomic treatments is a fundamental component of biodiversity assessment. Voucher specimens of sequenced material can be the key nodes for making these connections. During the BioHackathon 2020, several projects investigated how sequence (meta)data could be retrieved from ENA and connected to taxonomic treatment or specimen databases such as TreatmentBank and GBIF.

With this proposal, we aim to link more voucher specimens to sequences by applying machine learning techniques to specimen images to retrieve sequencing metadata physically present on the specimen, which can facilitate and maximise the linking process. We will then use these metadata to improve the linking to ENA, enabling wider data discovery and enhancement. We also aim to develop a standard module to compare ENA, GBIF and TreatmentBank (TB) geographical data for specific taxa and return the results in an interactive data exploration dashboard. The improvements will also address filling gaps in the gene names associated with accession numbers cited in scientific papers.

Results obtained in this project will demonstrate the importance of integrating different data sources to deliver consistent and complete biodiversity data to the scientific community, and will feed into European biodiversity projects such as Bioscan, BiCIKL and ERGA.
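
One way to use label text recovered from specimen images is to normalise it and match it against the voucher strings recorded with sequences, which in INSDC records follow the `/specimen_voucher` convention `[institution-code:[collection-code:]]specimen_id`. The Python sketch below is a deliberately crude matcher, not the project’s workflow: the `voucher_key` and `link_vouchers` helpers, the OCR-style input format and all identifiers are hypothetical.

```python
import re


def voucher_key(raw: str) -> str:
    """Reduce a voucher string to 'INSTITUTION:SPECIMENID' for exact matching."""
    text = raw.strip()
    if ":" in text:
        # INSDC style: institution[:collection]:specimen_id -> keep first and last parts.
        parts = text.split(":")
        institution, specimen = parts[0], parts[-1]
    else:
        # Hypothetical OCR text such as "XYZ 000123.45": leading letters = institution.
        match = re.match(r"([A-Za-z]+)[\s_-]*(.*)", text)
        if match is None:
            return ""
        institution, specimen = match.group(1), match.group(2)
    specimen = re.sub(r"[^0-9A-Za-z]", "", specimen)
    if not institution.strip() or not specimen:
        return ""
    return f"{institution.strip().upper()}:{specimen.upper()}"


def link_vouchers(specimen_labels: dict[str, str], sequence_vouchers: dict[str, str]) -> dict[str, str]:
    """Map specimen record IDs to sequence accessions sharing an unambiguous voucher key."""
    index: dict[str, list[str]] = {}
    for accession, voucher in sequence_vouchers.items():
        key = voucher_key(voucher)
        if key:
            index.setdefault(key, []).append(accession)

    links = {}
    for record_id, label in specimen_labels.items():
        candidates = index.get(voucher_key(label), [])
        if len(candidates) == 1:  # leave ambiguous matches for manual review
            links[record_id] = candidates[0]
    return links
```

Only one-to-one matches are kept because an ambiguous key is exactly the case a curator should review; real labels (for example codes printed without separators) would need institution-specific parsing rules.
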
## Topics

Biodiversity
Data Platform
Interoperability Platform
Machine learning
Plant Sciences

**Project Number:** 15

**EasyChair Number:** 22

## Team

### Lead(s)

Mathias Dillen [email protected]
Bachir Balech [email protected]

## Expected outcomes

- An adaptable workflow that finds sequenced specimens, captures their sequencing data and uses this information to find the corresponding sequences
- Voucher specimen records with explicit connections to DNA sequence records
- A publication in BioHackrXiv

## Expected audience

Participants:

- Maarten Trekels
- Steven Verstockt
- Sofie Meeus
- Kenzo Milleville
- Krishna Kumar Thirukokaranam Chandrasekar
- Bachir Balech
- Donat Agosti
- Alberto Brusati
- Anna Sandionigi
- Dario Pescini
- Marcus Guidoti

Skillsets:

- sequence and specimen databases
- image analysis
- text detection (OCR, HTR)
- text mining and matching
- scientific literature mining

**Number of expected hacking days**: 4
# Project 16: Wine (Ontology) Tasting: testing technicality in practicality for the food industry

## Abstract

Data mining with machine learning is urgently needed for the study of dietary intolerances and allergies, especially where accurate labeling is required by law. Large-scale databases also create a demand for structuring manufacturers’ data. The lack of semantic structure in food data is a resolvable challenge: existing technologies, combined with knowledge dissemination, can support food-science research and advanced databasing. As use cases emerge in industry and in food allergy studies that exploit ontologies such as the Wine Ontology for data classification, this BioHackathon topic proposes, as a proof of concept, an exploration of the semantic capabilities of related ontologies, including the Wine Ontology and the Food Ontology (FoodOn). The plan for this topic is as follows:

1) Technical tutorial examining the two existing wine ontologies for pragmatic reuse

2) Wine tasting session(s) with an expert

3) Application testing: curation of a donated test dataset from a wine merchant, and alignment of the wine ontologies with FoodOn

This proof of concept can be extended to other areas of food science, such as cultures for fermented products and food allergy studies. This is relevant to the ELIXIR Food & Nutrition Community and its industry partners.
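
A common first pass when aligning two ontologies, such as the wine ontologies and FoodOn, is lexical matching on normalised labels and synonyms. The Python sketch below illustrates only that baseline: the term identifiers are hypothetical, it does not read OWL files, and every candidate would still need expert review (or structural and logical checks) before being accepted as a mapping.

```python
import re
import unicodedata


def normalise_label(label: str) -> str:
    """Lower-case, strip accents and punctuation so 'Rosé wine' matches 'rose wine'."""
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", ascii_only.lower()).strip()


def lexical_alignment(
    source: dict[str, list[str]], target: dict[str, list[str]]
) -> list[tuple[str, str, str]]:
    """Propose (source_id, target_id, shared_label) candidates where normalised labels coincide."""
    index: dict[str, set[str]] = {}
    for target_id, labels in target.items():
        for label in labels:
            index.setdefault(normalise_label(label), set()).add(target_id)

    candidates = []
    for source_id, labels in source.items():
        for label in labels:
            key = normalise_label(label)
            for target_id in sorted(index.get(key, ())):
                candidates.append((source_id, target_id, key))
    return sorted(set(candidates))
```

Exact matching on normalised strings is high-precision but low-recall; it gives curators a quick list of obvious correspondences, leaving harder cases (different granularity, regional wine terms) to manual alignment.
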
## Topics

Data Platform
industry
Interoperability Platform
Machine learning

**Project Number:** 16

**EasyChair Number:** 24

## Team

### Lead(s)

Sirarat Sarntivijai, [email protected]

## Expected outcomes

1) A summary of findings: the pros and cons of each wine ontology examined, and its capability for data integration and interoperability.

2) The test dataset, curated and mapped to the existing wine ontology (or ontologies), with a follow-up plan for technical implementation in the wine merchant’s database.

3) A report on interoperability between the wine ontologies and FoodOn, towards reusing FoodOn as an application ontology in practice.

## Expected audience

Participants are expected to include the ontology-expert topic leads and anyone interested in learning about ontologies through a hands-on tutorial exploring the wine ontologies.

**Number of expected hacking days**: 4