Authors: Aliaksei Shauchenka, Michele De Cillis
The objective of this project is to identify biomarkers associated with Amyotrophic Lateral Sclerosis (ALS). To do so, we use RNA-Seq data from post-mortem brain cortex biopsies of individuals diagnosed with ALS and of individuals without the disease.
The data comes from the study "Postmortem Cortex Samples Identify Distinct Molecular Subtypes of ALS: Retrotransposon Activation, Oxidative Stress, and Activated Glia" by Oliver H. Tam et al.
The provided data comprises two types of information:
Our first step was to load and parse the data. Because there are many files and loading them can take a long time, we developed a utility that saves the parsed dataset to a CSV file so it does not have to be rebuilt on every run. The default output path is /outputs/.
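The caching idea can be sketched as follows. This is only a minimal illustration and not the project's actual utility: the function names save_table and load_or_build, the column names, and the file name used in the usage note are all hypothetical.

```python
import csv
from pathlib import Path


def save_table(rows, columns, path):
    """Write rows (a list of dicts) to a CSV file, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return path


def load_or_build(path, build, columns):
    """Return the cached rows if the CSV exists; otherwise build and cache them.

    Note: values read back from a CSV are always strings, so callers
    should convert counts to numbers themselves.
    """
    path = Path(path)
    if path.exists():
        with path.open(newline="") as handle:
            return list(csv.DictReader(handle))
    rows = build()  # the slow step: parse all raw files
    save_table(rows, columns, path)
    return rows
```

A call such as load_or_build("outputs/dataset.csv", parse_all_files, ["gene", "count"]) would run the slow parsing function (here a hypothetical parse_all_files) only on the first run and read the CSV on later runs.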
This step was implemented using an object-oriented approach with two distinct classes: one for the Samples and one for the Annotations. We kept the classes separate so that each kind of data can be handled and edited independently.
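A rough sketch of this separation is shown here. The attributes and methods are assumptions made for illustration (the source does not describe the class internals); only the class names Samples and Annotations come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Samples:
    """Expression data per sample, assumed shape: {sample_id: {gene: count}}."""
    counts: dict = field(default_factory=dict)

    def add(self, sample_id, gene_counts):
        # Copy the mapping so later edits to the input do not leak in.
        self.counts[sample_id] = dict(gene_counts)


@dataclass
class Annotations:
    """Metadata per sample, assumed shape: {sample_id: {field_name: value}}."""
    metadata: dict = field(default_factory=dict)

    def add(self, sample_id, fields):
        self.metadata[sample_id] = dict(fields)

    def group_of(self, sample_id):
        # "group" is a hypothetical metadata key (for example ALS vs. control).
        return self.metadata[sample_id].get("group")
```

Keeping the two containers apart means the metadata can be corrected or extended without touching the expression values, and the two are joined only through the shared sample identifier.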

This module processes annotations derived from XML data, wherein each snippet describes metadata associated with a biological sample.
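As an illustration of parsing one such snippet, the sketch below uses Python's standard xml.etree.ElementTree module. The XML layout (a Sample element with an iid attribute and Characteristics children carrying a tag attribute) and the values in SNIPPET are assumptions for the example, not the project's actual schema or data.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet; the real tag and attribute names may differ.
SNIPPET = """
<Sample iid="SAMPLE_1">
  <Characteristics tag="sample group">ALS</Characteristics>
  <Characteristics tag="tissue">Frontal Cortex</Characteristics>
</Sample>
"""


def parse_annotation(xml_text):
    """Turn one XML snippet into a flat dict of metadata fields."""
    root = ET.fromstring(xml_text.strip())
    record = {"sample_id": root.get("iid")}
    for node in root.iter("Characteristics"):
        # The tag attribute names the field; the element text is its value.
        record[node.get("tag")] = (node.text or "").strip()
    return record


print(parse_annotation(SNIPPET))
# {'sample_id': 'SAMPLE_1', 'sample group': 'ALS', 'tissue': 'Frontal Cortex'}
```

Each parsed record can then be stored under its sample identifier, which is the key that links annotations back to the expression data.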