mGWAS: reimplementation of treeWAS algorithms in Python
Fabian Gribi
Microbial genome-wide association studies (mGWAS) require specialized computational tools for data analysis. One such tool is Scoary2, developed by Thomas Roder during his PhD at the IBU. Scoary2 significantly improves runtime compared to the original Scoary. Moreover, it includes a data exploration app, which, combined with the improved performance, allows Scoary2 to be applied to datasets with many phenotypes. However, Scoary2 inherits certain issues from its predecessor, such as the lack of support for numeric data and the computationally expensive permutation test for post-hoc pairwise comparisons, which must be applied to every phenotype. Another algorithm, treeWAS, does not have these issues, but is implemented in R. My goal is therefore to reimplement the treeWAS algorithm in Python and, if possible, to incorporate it into Scoary2.
Integrating KeggMapWizard into a Public RNAseq Pipeline
Fammi Maria Parokkaran
RNAseq pipelines are fundamental tools for analyzing gene expression data, and they typically visualize which genes are over- or underexpressed using static KEGG pathway maps. However, static maps limit user interactivity and data exploration.
Aparna Pandey has developed KeggMapWizard, software that converts static KEGG pathway maps into interactive SVGs. These interactive maps can dynamically display relevant RNAseq data, such as p-values and fold-change values, provide links to more information about genes of interest, and show other pathways the genes are involved in. The aim of this project is to extend this functionality by integrating KeggMapWizard into a publicly accessible pipeline, such as the nf-core RNAseq pipeline, which is implemented in Nextflow. This would make the tool accessible to a broader scientific audience and streamline RNAseq data interpretation.
Automated bacterial assembly pipeline
Federico Silva Gutierrez
Current bacterial genome assembly pipelines using long-read HiFi sequencing face several challenges. One major problem is that different assemblers often produce different results. Therefore, pipelines usually involve running multiple assemblers and manually selecting the best outcome. This project aims to automate this process by developing a tool that provides side-by-side summaries of all assemblies, making it easier to decide which one is optimal. We will use Nextflow to consolidate existing scripts and experience into a streamlined pipeline. This work builds on Thomas Roder’s expertise and complements his ongoing development of an assembly curation tool.
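As a rough illustration of what a side-by-side summary could contain, the following Python sketch compares assemblies by contig count, total length, longest contig, and N50. It is not the planned tool: the function names (n50, summarize, format_table), the choice of metrics, the assembler labels, and the contig lengths are all assumptions made up for this example. A real pipeline would read the contig lengths from each assembler's FASTA output.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the contig length at which the running sum of
    contigs, sorted longest first, reaches half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def summarize(assemblies: Dict[str, List[int]]) -> List[dict]:
    """Build one summary row per assembly (hypothetical metric set)."""
    rows = []
    for name, lengths in assemblies.items():
        rows.append({
            "assembler": name,
            "contigs": len(lengths),
            "total_bp": sum(lengths),
            "longest_bp": max(lengths, default=0),
            "n50_bp": n50(lengths),
        })
    return rows


def format_table(rows: List[dict]) -> str:
    """Render the summary rows as a tab-separated table."""
    headers = ["assembler", "contigs", "total_bp", "longest_bp", "n50_bp"]
    lines = ["\t".join(headers)]
    for row in rows:
        lines.append("\t".join(str(row[h]) for h in headers))
    return "\n".join(lines)


# Made-up contig lengths for two hypothetical assemblers.
example = {
    "assembler_a": [4_600_000, 110_000],
    "assembler_b": [2_500_000, 2_100_000, 110_000, 5_000],
}
print(format_table(summarize(example)))
```

In this made-up example, both assemblies have a similar total length, but the first one is far less fragmented. A table like this makes that kind of difference visible at a glance.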