Internal Seminar - Saadat Hussain
EXTERNAL SEMINAR - Daniel Rico : Network approaches to understand chromatin interactions and gene regulation
Seminar of Daniel Rico from the Institute of Cellular Medicine, Newcastle University (UK).
Title: Network approaches to understand chromatin interactions and gene regulation
Place: Campus of Luminy. Seminar room TPR2, Bloc 5. November 20th @ 11h00
Our research focuses on understanding how to read the genome so we can identify the instructions for building different cell types. We develop and apply new computational methods to integrate genomic, epigenomic and transcriptomic data to decode these instructions. We collaborate with colleagues at Newcastle University and the Great North Children's Hospital to understand which mutations can cause “typos” in these instructions, leading to diseases. We are particularly interested in the immune system, where we are investigating how chromatin contributes to sex-specific immune responses. In addition, we have exciting collaborations where we are trying to understand the functional role of chromatin dynamics during mitosis and the interplay of chromatin and genomic translocations in leukemias.
* Rigau M, Juan D, Valencia A, Rico D. Widespread population variability of intron size in evolutionary old genes: implications for gene expression variability. BioRxiv 2017, Epub ahead of print.
* Pancaldi V, Carrillo-de-Santa-Pau E, Javierre BM, Juan D, Fraser P, Valencia A, Rico D. Integrating epigenomic data and 3D genomic structure with a new measure of chromatin assortativity. Genome Biology 2016, 17, 152.
* Carrillo-de-Santa-Pau E, Juan D, Pancaldi V, Were F, Martin-Subero I, Rico D, Valencia A, The BLUEPRINT Consortium. Automatic identification of informative regions with epigenomic changes associated to hematopoiesis. Nucleic Acids Research 2017, 45(16), 9244-9259.
Internal Seminar - Andreas Zanzoni
Internal Seminar - Jeanne Cheneby
PhD Thesis defense - Diogo Ribeiro - Discovery of the role of protein-RNA interactions in protein multifunctionality and cellular complexity
December 5th, 2018 - 14h, MIO amphitheatre
Over time, life has evolved to produce remarkably complex organisms. To cope with this complexity, organisms have evolved a plethora of regulatory mechanisms. For instance, thousands of long non-coding RNAs (lncRNAs) are transcribed by mammalian genomes, presumably expanding their regulatory capacity. An emerging concept is that lncRNAs can serve as protein scaffolds, bringing proteins in proximity, but the prevalence of this mechanism is yet to be demonstrated. In addition, for every messenger RNA encoding a protein, regulatory 3’ untranslated regions (3’UTRs) are also present. Recently, 3’UTRs were shown to form protein complexes during translation, affecting the function of the protein under synthesis. However, the extent and importance of these 3’UTR-protein complexes in cells remain to be assessed.
This thesis aims to systematically discover and provide insights into two poorly understood regulatory mechanisms involving the non-coding portion of the human transcriptome. Concretely, the assembly of protein complexes promoted by lncRNAs and 3’UTRs is investigated using large-scale datasets of protein-protein and protein-RNA interactions. This enabled us to (i) predict hundreds of lncRNAs as possible scaffolding molecules for more than half of the known protein complexes, and (ii) infer more than a thousand distinct 3’UTR-protein complexes, including cases likely to post-translationally regulate moonlighting proteins, that is, proteins that perform multiple unrelated functions. These results indicate that a high proportion of lncRNAs and 3’UTRs may be employed in regulating protein function, potentially playing a role both as regulators and as components of complexity.
Internal Seminar - Guillaume Charbonnier
Internal Seminar - Christophe Chevillard
PhD Thesis defense - Mustafa Abuelqumsan - Assessment of Supervised Classification Methods for the Analysis of RNA-seq Data
December 20th, 2018 -
In recent years, the advent of next-generation sequencing (NGS) technology has been revolutionizing how genomic studies are conducted. An important and widely used application of NGS technology is the study of the transcriptome through sequencing of cDNA obtained from RNA (RNA-seq). Compared with previous technologies like microarrays, RNA-seq has many advantages, such as a wider dynamic range of measurements, increased precision, higher throughput, and the discovery of novel RNA species and splice forms. Hence, RNA-seq has become a suitable alternative to the microarray approach as the main platform for transcriptome studies. NGS technologies produce huge amounts of data, which calls for the development of effective multivariate analysis methods adapted to the particular nature of the data (discrete counts, huge dynamic range, outliers, …). In this dissertation, we focus on the use of machine learning methods to perform supervised classification, assigning samples to groups based on their RNA-seq gene expression profiles.
First, we briefly review the state of the art in genomics and in statistical methods for NGS data, in order to draw lessons from the latest developments in NGS data analysis and to evaluate what our research contributes to multivariate analysis of NGS data.
We perform a comparative assessment of supervised classification methods, based on published data downloaded from the recount2 warehouse, which contains around 2000 RNA-seq experiments. From this database, we selected seven study cases that are representative of typical RNA-seq studies with different types of categories (classes): disease states (cancer types, leukemia, psoriasis) or cell types (nervous cells). We assessed the impact of pre-processing on classifiers: filtering procedures (discarding unsuited genes and/or samples), normalization, and PCA transformation. We also studied the impact of feature selection, used to circumvent the problem of over-dimensionality of the feature space and to find the subset of genes or components that optimizes the accuracy of classifiers. The feature selection relied on variable ordering based either on differential expression analysis or on variable importance returned by a Random Forest classifier.
We pay particular attention to the metadata and we explore the structure of the datasets, in order to interpret the behavior of each tested classifier (Support Vector Machines, Random Forest, and K Nearest Neighbours) in light of the specificities of each study case (number of samples, number of classes, distribution of the count values, bulk or single-cell RNA-seq, …).
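As a rough illustration of the kind of pipeline the abstract describes (gene filtering, normalization, feature selection by variable ordering, then a classifier), here is a minimal sketch in plain Python. It is not the thesis code: the data are simulated toy counts rather than recount2 studies, the log-CPM normalization, the two-class t-like ranking statistic (a crude stand-in for differential expression analysis) and all function names (`log_cpm`, `filter_genes`, `rank_genes`, `knn_predict`, `loo_accuracy`) are assumptions made for the example. Only a k-nearest-neighbours classifier is shown.

```python
import math
import random

def log_cpm(counts):
    """Scale one sample's raw counts to counts per million, then log2-transform."""
    total = sum(counts) or 1
    return [math.log2(c / total * 1e6 + 1) for c in counts]

def filter_genes(samples, min_var=1e-9):
    """Return indices of genes whose variance across samples exceeds min_var."""
    n = len(samples)
    keep = []
    for g in range(len(samples[0])):
        col = [s[g] for s in samples]
        mean = sum(col) / n
        if sum((x - mean) ** 2 for x in col) / n > min_var:
            keep.append(g)
    return keep

def rank_genes(samples, labels, genes):
    """Order genes by a Welch-like t statistic (assumes exactly two classes)."""
    classes = sorted(set(labels))
    def score(g):
        groups = [[s[g] for s, l in zip(samples, labels) if l == c] for c in classes]
        means = [sum(v) / len(v) for v in groups]
        variances = [sum((x - m) ** 2 for x in v) / max(len(v) - 1, 1)
                     for v, m in zip(groups, means)]
        se = math.sqrt(sum(var / len(v) for var, v in zip(variances, groups))) or 1e-12
        return abs(means[0] - means[1]) / se
    return sorted(genes, key=score, reverse=True)

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted((math.dist(t, x), y) for t, y in zip(train_x, train_y))[:k]
    votes = [y for _, y in nearest]
    return max(set(votes), key=votes.count)

def loo_accuracy(samples, labels, n_features=5, k=3):
    """Leave-one-out accuracy; genes are re-selected inside each fold to avoid leakage."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        genes = rank_genes(train_x, train_y, filter_genes(train_x))[:n_features]
        project = lambda s: [s[g] for g in genes]
        pred = knn_predict([project(s) for s in train_x], train_y, project(samples[i]), k)
        hits += pred == labels[i]
    return hits / len(samples)

# Simulated toy data (hypothetical): 20 samples, 50 genes, two classes.
rng = random.Random(0)
N_GENES = 50
raw, labels = [], []
for i in range(20):
    label = "A" if i % 2 == 0 else "B"
    counts = [rng.randint(80, 120) for _ in range(N_GENES)]
    if label == "B":
        for g in range(5):          # five genes strongly over-expressed in class B
            counts[g] = rng.randint(1900, 2100)
    counts[-1] = 0                  # never-expressed gene, removed by the filter
    raw.append(counts)
    labels.append(label)

samples = [log_cpm(c) for c in raw]
kept = filter_genes(samples)
accuracy = loo_accuracy(samples, labels, n_features=5, k=3)
print(f"{len(kept)} genes kept, leave-one-out accuracy = {accuracy:.2f}")
```

Selecting features inside each cross-validation fold, rather than once on the full dataset, keeps the held-out sample from influencing which genes are chosen; real count data would of course be far noisier than this toy example.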