Description
This project benchmarks selected Bioconda packages that have been built for Arm64 using the nf-core-arm-discovery repository (GitHub link). The PhD candidate will select suitable public genomic datasets from databases such as NCBI and run bioinformatics workflows on Arm-based infrastructure. The candidate will evaluate the performance, compatibility, and efficiency of these packages, document errors and failures, and investigate why package builds fail. The final deliverable is a detailed report covering performance metrics, identified issues, and recommended improvements to package support on Arm64.
Deliverables:
Selection and justification of public genomic datasets.
Execution of bioinformatics workflows using Bioconda packages on Arm64.
Performance benchmarking and comparison with x86 architectures.
Documentation of failed package builds and proposed fixes.
Comprehensive report with results, analysis, and recommendations.
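The benchmarking and failure-documentation deliverables above could start from a small harness like the one below. It is a minimal sketch, not the project's prescribed method: the function names `run_benchmark` and `speedup` and the shape of the result record are assumptions made for illustration, and the example command is a placeholder for a real Bioconda tool invocation. It assumes a Unix host, because the `resource` module is not available on Windows.

```python
import platform
import resource
import subprocess
import sys
import time


def run_benchmark(cmd):
    """Run one command and return timing, memory, and failure details."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    wall_seconds = time.perf_counter() - start
    # ru_maxrss is the largest peak RSS among all child processes waited
    # for so far, so run one benchmark per Python process for a clean
    # reading. The unit is kilobytes on Linux.
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "arch": platform.machine(),  # e.g. "aarch64" or "x86_64"
        "command": " ".join(cmd),
        "wall_seconds": wall_seconds,
        "peak_rss": peak_rss,
        "returncode": proc.returncode,
        # Keep the last few stderr lines so failures can be documented.
        "stderr_tail": proc.stderr.strip().splitlines()[-5:],
    }


def speedup(x86_seconds, arm_seconds):
    """Ratio above 1.0 means the Arm64 run was faster than the x86 run."""
    return x86_seconds / arm_seconds


if __name__ == "__main__":
    # Placeholder command; substitute the tool under test.
    print(run_benchmark([sys.executable, "-c", "print('hello')"]))
```

Running the same command on an Arm64 instance and on an x86 instance, then passing the two `wall_seconds` values to `speedup`, gives one comparable number per package. A non-zero `returncode` together with `stderr_tail` is a starting point for the failure log.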
Hardware / Software Requirements
Languages: Python, Bash, Nextflow
Tooling: nf-core pipelines, Conda, Docker/Singularity, Snakemake
Hardware: Access to Arm64-based cloud instances (e.g., AWS Graviton) with enough memory and storage for the selected datasets and workflows
IP Access: Public genomic databases (NCBI, ENA, etc.), Bioconda repository
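Before running any workflow, it can help to confirm that the host really is Arm64 and that the listed tooling is installed. The sketch below is one possible pre-flight check. The helper names `is_arm64` and `check_environment` are hypothetical, and the executable names in `TOOLS` are assumptions about how the tools above appear on `PATH` (newer Singularity installs, for example, may expose a different binary name).

```python
import platform
import shutil

# Executable names assumed for the tools listed above.
TOOLS = ["nextflow", "conda", "docker", "singularity", "snakemake"]


def is_arm64(machine):
    """Linux reports 64-bit Arm as 'aarch64'; macOS reports 'arm64'."""
    return machine.lower() in {"aarch64", "arm64"}


def check_environment(tools=TOOLS):
    """Map each tool to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    machine = platform.machine()
    print("machine:", machine, "arm64:", is_arm64(machine))
    for tool, path in check_environment().items():
        print(f"{tool}: {path or 'not found'}")
```

Recording this output alongside each benchmark run makes it easier to tell an architecture problem from a missing-tool problem when a workflow fails.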
Resources
AWS Graviton documentation
Benefits
- Standout projects could be internally referred for relevant positions at Arm!
- If your submission is approved, you will receive a recognised badge that you can list on your CV and share on LinkedIn. A great way to stand out from the crowd!
- Problem-Solving Experience: Opportunity to debug and optimize bioinformatics software for emerging computing architectures.
- Industry Relevance: Hands-on experience with Arm-based architectures, applicable to genomics research and cloud computing.