Description
Why this is important?
Bioconda is a specialized package repository for bioinformatics and genomics. Since 2020, there has been notable growth in multi-core Arm-based laptops and desktops running Linux and MacOS. In the coming years, Arm anticipates an increase in available OEM (original equipment manufacturer) devices. These machines facilitate the execution of computationally intensive bioinformatics and statistics tasks locally. Potential downstream applications include faster, more affordable diagnoses that can be conducted closer to hospital patients, exemplified by the pilot ROBIN software. While many leading Bioconda packages now support Linux/Arm, there still emulated components that can be the bottleneck.
Project summary
This project aims to benchmark specific Bioconda packages that have been built for Arm64 using the nf-core-arm-discovery repository. The participant will utilize public genomic datasets from databases such as NCBI, select appropriate datasets, and execute bioinformatics workflows on Arm-based infrastructure. The candidate will evaluate the performance, compatibility, and efficiency of these packages, document errors and failures, and investigate the reasons behind package build failures. The final deliverable will be a detailed report with performance metrics, identified issues, and recommended improvements to enhance package support on Arm64.
The deliverables of the project are as follows:
- Selection and justification of public genomic datasets.
- Execution of bioinformatics workflows using Bioconda packages on Arm64.
- Performance benchmarking and comparison with x86 architectures.
- Documentation of failed package builds and proposed fixes.
- Comprehensive report with results, analysis, and recommendations.
Prequisites
- Intermediate understanding of Python, Bash and nextflow
- Basic experience with nf-core pipelines, Conda, Docker/Singularity, Snakemake
- Access to Arm64-based cloud instances (e.g., AWS Graviton) with plenty of memory and storage
- IP access to Public genomic databases (NCBI, ENA, etc.)
Resources from Arm and our partners
-
External Documentation: nf-core documentation
-
External Documentation: AWS Graviton documentation
-
Repository: Arm64 nf-core pipelines
-
Repository: Bioconda package repository
-
Dataset: NCBI Datasets
Support Level
This project is designed to be self-serve but comes with opportunity of some community support from Arm Ambassadors, who are part of the Arm Developer program. If you are not already part of our program, click here to join.
Benefits
Standout project contributions will result in preferential internal referrals to Arm Talent Acquisition (with digital badges for CV building). And we are currently discussing with national agencies the potential for funding streams for Arm Developer Labs projects, which would flow to you, not us.
To receive the benefits, you must show us your project through our online form. Please do not include any confidential information in your contribution. Additionally if you are affiliated with an academic institution, please ensure you have the right to share your material.