Authors: Aliaksei Shauchenka, Michele De Cillis

Objective


The objective of this project is to identify biomarkers associated with Amyotrophic Lateral Sclerosis (ALS). To do so, we use RNA-Seq data from post-mortem brain cortex biopsies of individuals diagnosed with ALS and of individuals without the disease.

The data for this project originates from the study "Postmortem Cortex Samples Identify Distinct Molecular Subtypes of ALS: Retrotransposon Activation, Oxidative Stress, and Activated Glia" by Oliver H. Tam et al.

1. Data Preprocessing


The provided data comprises two types of information: the samples themselves and their annotations (metadata describing each sample).

Our first step was to load and parse the data. Because there are many files and loading them can take a long time, we developed a utility that saves the parsed dataset to a CSV file. The default output path is /outputs/.

This step was implemented using an object-oriented approach with two distinct classes: one for the Samples and one for the Annotations. We kept the classes separate so that each kind of data can be handled and edited efficiently on its own.
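
As a rough illustration of the caching utility and the two-class split described above, the sketch below uses only the Python standard library. The class names `Samples` and `Annotations` mirror the text, but their fields, the `save_csv` and `load_csv` helpers, and the column layout are assumptions made for this example, not the project's actual code. The demo writes to a temporary folder rather than /outputs/ so that it runs anywhere.

```python
import csv
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


def save_csv(rows, path):
    """Write a list of dicts to a CSV file, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    if not rows:
        path.write_text("")
        return path
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def load_csv(path):
    """Read a CSV file back into a list of dicts (all values come back as strings)."""
    with Path(path).open(newline="") as handle:
        return list(csv.DictReader(handle))


@dataclass
class Samples:
    """Hypothetical container for per-sample expression rows."""
    rows: list = field(default_factory=list)

    def add(self, sample_id, gene, count):
        self.rows.append({"sample_id": sample_id, "gene": gene, "count": count})


@dataclass
class Annotations:
    """Hypothetical container for per-sample metadata rows."""
    rows: list = field(default_factory=list)

    def add(self, sample_id, **metadata):
        self.rows.append({"sample_id": sample_id, **metadata})


# Usage: cache the parsed samples once, then reload them on later runs.
samples = Samples()
samples.add("S1", "GENE_A", 5)  # placeholder values, not real data
cache = save_csv(samples.rows, Path(tempfile.gettempdir()) / "outputs" / "samples.csv")
reloaded = load_csv(cache)
```

Keeping the CSV helpers outside the classes lets both containers share the same caching code. Note that values reloaded from CSV are strings, so numeric columns would need to be converted after loading.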


1.1 Annotations


This module processes annotations derived from XML data, where each snippet describes the metadata associated with one biological sample.
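
A minimal sketch of how one such XML snippet could be turned into a flat metadata record, using `xml.etree.ElementTree` from the standard library. The element and attribute names (`Sample`, `iid`, `Title`, `Characteristics`, `tag`) and the example values are assumptions chosen for illustration; the actual annotation files may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet layout; the values are placeholders, not real data.
SNIPPET = """
<Sample iid="S1">
  <Title>example sample</Title>
  <Characteristics tag="group">control</Characteristics>
  <Characteristics tag="tissue">cortex</Characteristics>
</Sample>
"""


def parse_annotation(xml_text):
    """Convert one XML snippet into a flat dict of sample metadata."""
    root = ET.fromstring(xml_text.strip())
    record = {
        "sample_id": root.get("iid"),
        "title": root.findtext("Title", default="").strip(),
    }
    # Each Characteristics element contributes one key/value pair.
    for node in root.iter("Characteristics"):
        tag = node.get("tag")
        if tag:
            record[tag.strip()] = (node.text or "").strip()
    return record


annotation = parse_annotation(SNIPPET)
```

If the real files declare XML namespaces, the tag names passed to `findtext` and `iter` would need to include the namespace.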