Analysis of Region-Specific Transcriptomic Changes in the Autistic Brain


University of Miami Scholarly Repository
Open Access Dissertations, Electronic Theses and Dissertations

Analysis of Region-Specific Transcriptomic Changes in the Autistic Brain
Dmitry Velmeshev, University of Miami

Recommended Citation
Velmeshev, Dmitry, "Analysis of Region-Specific Transcriptomic Changes in the Autistic Brain" (2016). Open Access Dissertations.

This Embargoed is brought to you for free and open access by the Electronic Theses and Dissertations at Scholarly Repository. It has been accepted for inclusion in Open Access Dissertations by an authorized administrator of Scholarly Repository.

UNIVERSITY OF MIAMI

ANALYSIS OF REGION-SPECIFIC TRANSCRIPTOMIC CHANGES IN THE AUTISTIC BRAIN

By Dmitry Velmeshev

A DISSERTATION
Submitted to the Faculty of the University of Miami in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Coral Gables, Florida
May 2016

©2016 Dmitry Velmeshev
All Rights Reserved

UNIVERSITY OF MIAMI

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

ANALYSIS OF REGION-SPECIFIC TRANSCRIPTOMIC CHANGES IN THE AUTISTIC BRAIN

Dmitry Velmeshev

Approved:
Mohammad Faghihi, M.D., Ph.D., Assistant Professor, Department of Psychiatry and Behavioral Sciences
Eleonore Beurel, Ph.D., Associate Professor, Department of Biochemistry and Molecular Biology
Vance Lemmon, Ph.D., Professor, Department of Neurological Surgery and Cell Biology and Anatomy
Gavin Rumbaugh, Ph.D., Associate Professor, Department of Neuroscience, The Scripps Research Institute, Jupiter, Florida
Pantelis Tsoulfas, M.D., Associate Professor, Department of Neurological Surgery
Guillermo Prado, Ph.D., Dean of Graduate School

VELMESHEV, DMITRY (Ph.D., Biochemistry and Molecular Biology)
Analysis of Region-Specific Transcriptomic Changes in the Autistic Brain (May 2016)

Abstract of a dissertation at the University of Miami.
Dissertation supervised by Professor Mohammad Faghihi.
No. of pages in text. (88)

Autism is a highly prevalent neurodevelopmental disorder that affects 1 in every 68 children in the US. Autism comprises a spectrum of highly heterogeneous disorders defined by characteristic deficits of communication and by the presence of repetitive and stereotyped behavior. At the same time, the disease is highly heritable and genetically heterogeneous, with hundreds of genetic variants associated with it. Previous systems biology studies have provided the first insights into the convergence of various disease-associated genetic variants onto specific pathways, developmental windows and cell types in the brain. Additionally, transcriptomic and epigenetic studies have identified genes and pathways affected by common molecular pathology in autism. However, previous studies lacked spatial resolution, have not investigated commonalities and differences in transcriptomic and epigenetic changes across specific brain regions in autism, have not linked epigenetic changes to transcriptional regulation in the disease, and have not looked at levels of transcriptional control other than gene expression, such as alternative splicing and expression of long noncoding RNAs. In my dissertation I attempt to fill in these gaps by performing analysis of antisense RNA expression in the autistic brain and by performing RNA sequencing and DNA methylation analysis of 6 different cortical regions of autism patients and controls. I demonstrate that many autism-associated genomic loci express antisense noncoding RNAs, with some of them upregulated in the cortex of autism patients. I have also developed an automated, user-friendly tool to perform RNA sequencing data analysis and applied it to a large cohort of RNA-seq samples to identify differentially expressed protein-coding genes and long noncoding RNAs, with many genes and pathways recurring in multiple regions and patient cohorts. Finally, I identify convergence of transcriptomic changes of the frontal cortical regions in autism on common genes and pathways, and a divergent effect of the pathology on gene expression and alternative splicing in the prefrontal and frontoinsular cortices in autism. The results of the study prompt investigation of dysfunction of the insular cortex in autism and provide high-confidence autism gene candidates for further functional studies.

Table of Contents

LIST OF FIGURES
LIST OF TABLES

Chapter 1: Introduction

Chapter 2: Expression of Non-Protein-Coding Antisense RNAs in Genomic Regions Linked to Autism Spectrum Disorders
    Bioinformatic identification of ASD-associated noncoding RNAs
    ASD-associated NATs are expressed in human brain tissues
    ASD-NATs are differentially expressed in human brain regions
    ASD-NATs show characteristic patterns of expressions with respect to their sense protein-coding partner
    SYNGAP1 antisense RNA (SYNGAP1-AS) is differentially expressed in ASD brain tissues in comparison to age-matched controls
    Subcellular localization of ASD-NATs
    Conclusions

Chapter 3: Development of CANEapp: An Automated User-Friendly Application for Comprehensive Analysis of Next-generation Sequencing Experiments
    CANEapp: flexible and user-friendly multiplatform framework for integrated transcriptome analysis
    Automated scalable RNA-seq analysis pipeline for accurate and comprehensive transcriptome analysis
    CANEapp functionality on various Linux server architectures and its performance and accuracy in identifying differentially expressed genes from real datasets
    Discovery of novel long noncoding RNAs using CANEapp and their experimental validation
    Conclusions

Chapter 4: Meta-Analysis of Transcriptomic Changes in the Cortex of Autistic Patients Identifies Region-Specific Patterns of Molecular Pathology
    Assessing regional-specific transcriptomic and epigenetic changes in the cortex of autism patients
    Autism-associated transcriptomic changes converge on frontal lobe regions and are fundamentally different in the frontoinsular cortex
    Cross-region analysis of genes differentially expressed in the frontal lobe reveals high-confidence molecular targets underlying autism pathology
    Divergence of molecular changes in the prefrontal and frontoinsular cortices can be observed on the level of individual genes
    Autism pathology causes widespread changes in alternative splicing in the PFC and FIC, with common changes being discordant between the two regions
    Discordant alternative splicing impacts regulatory protein domain structure of synaptotagmin I and ankyrin 2 in the PFC and FIC
    Conclusions

Chapter 5: Final Conclusions

REFERENCES

LIST OF FIGURES

Figure 1. Bioinformatics pipeline used for the identification of noncoding antisense RNAs.
Figure 2. ASD-associated NATs and mRNA expression in different human brain regions.
Figure 3. ASD-associated NATs and mRNA expression in different human brain regions (continued).
Figure 4. Schematic representation of Antisense and Sense RNA partners.
Figure 5. Differential expression of SYNGAP1-AS in the non-ASD postmortem brain and in the brain of patients affected by ASD.
Figure 6. Expression of FOXG1-AS, VPS13B-AS and NHS-AS in the non-ASD brain and in the brain of patients affected by ASD.
Figure 7. Subcellular localization of antisense transcripts overlapping ASD-associated genes.
Figure 8. CANEapp and the graphical user interface.
Figure 9. Server-side RNA-seq analysis pipeline.
Figure 10. Detection of novel long noncoding RNAs by CANEapp and their validation by real-time PCR.
Figure 11. Detection of novel long noncoding RNAs by CANEapp and their validation by real-time PCR.
Figure 12. Overview of the cortical regions and analysis workflows used in the current study.
Figure 13. Gene expression analysis demonstrates convergence of autism-related transcriptional changes between different cortical regions.
Figure 14. Overlap between genes differentially expressed in different cortical regions in autism.
Figure 15. Differential splicing analysis of the PFC and FIC in autism.
Figure 16. Differential splicing in the PFC and FIC has opposite effects on synaptotagmin I and ankyrin-2 activities in autism through alternative splicing of exons overlapping regulatory protein domains.

LIST OF TABLES

Table 1. Antisense RNAs to ASD-associated genes expressed in the human brain.
Table 2. List of software packages and scripts used in CANEapp.
Table 3. Comparison of CANEapp with previously developed tools for RNA-seq analysis.
Table 4. Description of datasets used to validate CANEapp performance to estimate differential gene expression.
Table 5. Fold changes of gene expression in three datasets reanalyzed by CANEapp and compared to qRT-PCR results.
Table 6. Summary of the experimental group sizes and sample demographics.
Table 7. Summary of genes differentially expressed in autism in different cortical regions.
Table 8. Differential splicing events common between the PFC and FIC.

Chapter 1: Introduction

Autism or autism spectrum disorders (ASD) are heterogeneous neurodevelopmental disorders, both in terms of clinical manifestations and genetic risk factors (1). Disease frequency among siblings of affected children is approximately 2%-8%, which is much higher than the prevalence in the general population, and monozygotic twins have 60% concordance for classic autism and 92% for broader autistic phenotypes, indicating that genetic inheritance is the predominant causative factor (2). Autism is highly heterogeneous in its behavioral manifestation, with different cases spanning a spectrum of behavioral abnormalities, hence the name autism spectrum disorder. A high proportion of autism cases show a delay in language development and comorbidity with epilepsy, intellectual disability and other disorders. However, in spite of the profound heterogeneity in symptoms and severity, autism is defined by the combination of two clinical manifestations: communication deficits and repetitive or stereotyped behavior (3).

This signature behavioral dysfunction of autistic patients points to a neurological basis of the disease that is shared across its entire spectrum. Indeed, multiple studies have demonstrated that specific neurodevelopmental and neurological processes are dysfunctional in autism, such as establishment and maintenance of excitatory/inhibitory balance in the neocortex (4), functional local versus distal connectivity in certain cortical regions (5, 6) and production of neurons in frontal lobe regions (7, 8). Moreover, several specific cortical and brain regions have been implicated in the disorder through functional MRI studies (9). Altogether, these findings support the hypothesis that autism pathogenesis affects specific aspects of the development of particular brain regions, which results in characteristic behavioral abnormalities.

Different types of genetic variation have been associated with the disorder: rare de novo variants, which contribute to a minority (~3%) of cases but have large effect sizes, and common variants, which contribute to a much larger share of cases (~40%) but whose individual effects are small and have to be combined in order to produce the phenotype (10). Such mutations may contribute to ASD etiology by affecting conventional genes directly, or indirectly by altering the function of non-protein-coding RNAs (ncRNAs) expressed in the same genomic loci. Recent evidence has implicated such ncRNAs in neurodevelopmental and neurodegenerative disorders including autism (11-20). Large transcriptomics consortia such as ENCODE (21) and FANTOM (22, 23) have demonstrated that the human genome is pervasively transcribed and that the primary output is ncRNAs. Through diverse mechanisms, these ncRNAs control protein production and function at multiple levels, including epigenetic control of their corresponding or distant loci (24, 25), alteration of localization, stability or processing of targets (25, 26), and modulation of translational efficiency by binding to the 3′ UTR of transcripts, as in the case of microRNAs (27, 28).

Natural Antisense Transcripts (NATs) are a conserved class of long (> 200 nt in length) ncRNA molecules that are transcribed from the opposite DNA strand of a sense RNA partner with which they have sequence complementarity (23, 29). Such antisense RNAs can exert cis-regulatory functions to increase (concordant) or decrease (discordant) expression levels of their corresponding sense mRNA (26). These regulatory functions can also work in trans by affecting genes from distant genomic loci.

In addition to the profound heterogeneity in the behavioral manifestation of autism, it is also characterized by an unprecedented genetic heterogeneity, with more than a hundred genetic risk factors already known and thousands of variants predicted to be linked to the disorder in the future (30, 31). A small number of these variants are highly deleterious mutations with high penetrance, which makes them well suited for studying the mechanisms of the disorder in model organisms. However, the majority of the variants are predicted to be common variants with small effect sizes that produce the disorder when combined, and the traditional genetic approach is impractical for investigating the combined effect of multiple pathogenic gene variants. Recently, systems biology approaches have been used to demonstrate the convergence of genetic variants associated with autism on specific biological pathways, neuronal cell types and developmental windows, which provided important insights into the mechanisms of the disease (32, 33). These studies indicated the enrichment of different types of autism-associated genetic variants in different biological pathways and developmental time windows. In particular, rare de novo variants were shown to be enriched in genes involved in transcriptional regulation that are crucial during prenatal brain development, whereas common variants were shown to be enriched in genes involved in synaptic development and function during early postnatal development. In addition, glutamatergic projection neurons in the forebrain were shown to be primarily affected. Therefore, systems biology studies offered specific insights into the pathways, cell types and time windows of brain development affected by different genetic variants in autism.

Alternatively, high-throughput genomics techniques profiling transcriptional changes in autism brain tissue compared to neurologically normal controls were employed to identify common biological pathways and individual genes whose expression is altered in multiple autism cases (34, 35). These studies revealed remarkable convergence of the molecular pathology of autism across different patient cohorts and provided a complementary basis for dissecting common mechanisms of the disease. Specifically, previous studies have identified misregulation of genes involved in synapse formation and function, neuronal projection growth and neurotransmitter release, as well as markers of astrocytes and activated microglia. Additionally, many immune system-related genes were also shown to be affected. Therefore, gene expression studies provide insight complementary to systems biology analysis of genetic variants and identify common pathways dysregulated in the brains of individuals with diverse genetic backgrounds.

However, previous studies focused on profiling genome-wide transcriptional changes in large brain areas (such as the frontal cortex) and thus have not investigated the impact of molecular pathology on the development and function of specific areas of the brain whose function is altered in autism, or the convergence or divergence of these changes in different regions. Additionally, due to the limitations of previously used high-throughput technologies, such as microarrays, the analysis was largely restricted to annotated genes and to measuring changes in gene expression, thus missing changes in RNA processing. Lastly, the interplay between epigenetic changes, regulation of transcription and RNA splicing in specific brain regions has not been investigated.

In my thesis work I tried to fill in these gaps by investigating long noncoding RNAs expressed from ASD-associated genetic loci and by studying region-specific epigenetic and transcriptomic changes taking place in the autistic brain. First, I developed an algorithm to mine existing public transcriptomic repositories for the presence of NATs that are produced from ASD-associated genomic regions. I believe that ncRNA information processing systems involving such transcripts represent a critical but under-appreciated dimension of the cell machinery that must be considered in order to identify pathological genetic events and facilitate novel therapeutic development strategies for ASD.

Next generation sequencing (NGS) technologies are indispensable for molecular biology research, but data analysis represents the bottleneck in their application. Users need to be familiar with computer terminal commands, the Linux environment, and various software tools and scripts. Analysis workflows have to be optimized and experimentally validated to extract biologically meaningful data. Moreover, as larger datasets are being generated, their analysis requires the use of high-performance servers. To address these needs and to handle RNA-seq analysis of a large autism dataset during my thesis work, I developed CANEapp (application for Comprehensive automated Analysis of Next-generation sequencing Experiments), a unique suite that combines a Graphical User Interface (GUI) and an automated server-side analysis pipeline that is platform-independent, making it suitable for any server architecture. The GUI runs on a PC or Mac and seamlessly connects to the server to provide full GUI control of RNA-seq project analysis. The server-side analysis pipeline contains a framework that is implemented on a Linux server through completely automated installation of software components and reference files. Analysis with CANEapp is also fully automated and performs differential gene expression analysis through alternative workflows, as well as novel noncoding RNA discovery. CANEapp is available free of charge.

I then perform a meta-analysis of transcriptional changes in 4 association and 2 sensory cortical regions of autistic patients using RNA sequencing. The study includes three patient cohorts and 122 brain tissue samples from 92 individuals. Additionally, I perform alternative splicing analysis of samples from the prefrontal and frontoinsular cortices of patients from two sample cohorts. I report a surprising degree of convergence of the molecular pathology affecting cellular proliferation in the three cortical regions from the frontal lobe, a striking discordant pattern of gene expression and splicing changes in the prefrontal and frontoinsular cortices, and a drastically reduced degree of molecular pathology in the sensory cortical areas. This study is the first to describe the molecular bases of the differential effects of autism pathology on the development and function of different cortical systems, such as the frontal cortex, limbic system and sensory areas. For the first time, I provide a link between region-specific molecular changes in the autistic brain and recurrent behavioral phenotypes observed in the disease.

Chapter 2: Expression of Non-Protein-Coding Antisense RNAs in Genomic Regions Linked to Autism Spectrum Disorders

Bioinformatic identification of ASD-associated noncoding RNAs.

AceView is a transcriptome database created and supported by the National Center for Biotechnology Information (NCBI) that represents a curated, nonredundant collection of RNA transcripts derived from public cDNA collections (mRNAs from GenBank or RefSeq, and single-pass cDNA sequences from dbEST and Trace) (36). AceView also includes information on tissue-specific expression of transcripts and is an excellent source of transcriptomic data for high-throughput genome-wide studies.

Figure 1. a) Bioinformatics pipeline used for the identification of noncoding antisense RNAs. b) Antisense noncoding RNAs to 103 ASD-associated genes derived with our bioinformatics pipeline. Seventy-one noncoding antisense RNAs were identified overlapping 38 of 103 analyzed genes; thus, each sense gene with an antisense partner has ~2 antisense RNAs on average. c) Distribution of noncoding antisense RNAs to ASD-associated genes based on the type of sense-antisense overlap. The majority of the antisense RNAs overlap an intron of the sense gene but do not overlap the mature sense transcript (genomic overlap). Approximately equal numbers of antisense RNAs overlap an exon of the sense transcripts (exonic overlap) or gene promoters (promoter overlap); some of the antisense transcripts with genomic or exonic overlap have mixed classifications and can also overlap the promoter regions of their sense partners.

I have developed a bioinformatics pipeline to mine AceView and to perform high-throughput searches for noncoding RNAs in the antisense orientation to genes of interest (Figure 1A). This pipeline uses the information on the genomic coordinates of transcripts, their exonic structure and their coding potential that is contained in the gene transfer format (GTF) files downloaded from the AceView website to perform a simultaneous search for non-protein-coding antisense RNAs. The program first retrieves, from the AceView database, the exonic coordinates of all alternative transcripts corresponding to a user-provided list of genes. The information on the exonic structure of the sense transcripts is then used to obtain AceView transcripts that are in the antisense orientation and to determine the type of sense-antisense overlap. Next, coding antisense transcripts are filtered out, preserving only non-protein-coding transcripts.

I utilized this pipeline to investigate the presence of antisense transcripts overlapping 103 genes whose mutations have been causally implicated in ASD (37). The gene list I selected for the bioinformatics analysis was manually curated by examining all existing medical literature and was published in a peer-reviewed journal. Thus, using this gene list as a reference provided both scope and confidence in the quality of the analysis. I was able to identify at least one noncoding antisense RNA partner for 38 of the examined genes (37%). Overall, 71 noncoding RNA loci were identified, yielding approximately 2 antisense partners per sense gene on average (Figure 1B). These antisense RNAs represent 3 structural classes based on the position of the antisense transcript with respect to the sense gene: genomic overlap, exonic overlap and promoter overlap (Figure 1A and 1C). Therefore, a significant number of gene loci associated with ASD have one or more noncoding antisense transcripts that may contribute to ASD pathophysiology; these transcripts are henceforth referred to as ASD-NATs.

ASD-associated NATs are expressed in human brain tissues.

Of the 71 identified ASD-NATs, 18 were selected for quantitative RT-PCR (qRT-PCR) validation studies using commercially available RNAs from total brain extract, frontal cortex, and cerebellum. For qRT-PCR, total RNA was reverse transcribed using the High-Capacity cDNA Reverse Transcription Kit (Life Technologies). The cDNA was then diluted 1:5 and used as a template for both SYBR Green (Life Technologies) and TaqMan qPCR using the ABI 7900 (Life Technologies). TaqMan probes for human PGK1 from Life Technologies (Hs _g1) were used to measure gene expression of the endogenous control. Three technical replicates were performed for each reaction. No-template controls were included in each reaction, and the melting curve was analyzed to assess the specificity of each primer. When the primers were designed for a single exon and did not span a splice junction, appropriate no-RT controls were used to avoid including samples contaminated with DNA. The results of the quantitative real-time RT-PCR were analyzed with SDS 2.3 software from Life Technologies.

To perform strand-specific measurement of antisense transcript expression, I designed primers for a region of the antisense transcript that overlaps an intron or the promoter of the sense gene. Next, I used the one-step RNA-to-Ct SYBR Green Kit (Life Technologies). I performed the reverse transcription (RT) step in a 384-well optical plate using reverse primers to specifically reverse-transcribe antisense RNA and to exclude the possibility of measuring the expression of the sense pre-mRNA. Samples were then incubated at 95 °C for 5 minutes to inactivate the reverse transcriptase enzyme. Forward primers were then added to the reaction and quantitative PCR was performed on the same plate. I included no-RT and no-template controls for each set of primers to control for non-specific binding.
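The antisense search described at the start of this chapter (opposite-strand transcripts, classified as genomic, exonic or promoter overlap, with coding transcripts filtered out) can be illustrated with a minimal sketch. This is not the pipeline used in the dissertation: the `Transcript` record, the `classify_overlap` function and the 1,000 bp `PROMOTER_WINDOW` are hypothetical choices made for illustration, and the source does not state how the promoter region was defined.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end), as in GTF files

# Assumed promoter size upstream of the transcription start site.
# The dissertation does not give this value; it is illustrative only.
PROMOTER_WINDOW = 1000


@dataclass
class Transcript:
    """Hypothetical minimal record built from GTF exon lines."""
    name: str
    chrom: str
    strand: str            # "+" or "-"
    exons: List[Interval]
    coding: bool

    @property
    def start(self) -> int:
        return min(s for s, _ in self.exons)

    @property
    def end(self) -> int:
        return max(e for _, e in self.exons)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def promoter(t: Transcript) -> Interval:
    """Window immediately upstream of the transcription start site."""
    if t.strand == "+":
        return (max(1, t.start - PROMOTER_WINDOW), t.start - 1)
    return (t.end + 1, t.end + PROMOTER_WINDOW)


def classify_overlap(sense: Transcript, candidate: Transcript) -> Set[str]:
    """Return overlap labels, or an empty set if the candidate is not a
    noncoding antisense partner of the sense transcript."""
    if (candidate.coding
            or candidate.chrom != sense.chrom
            or candidate.strand == sense.strand):
        return set()
    labels: Set[str] = set()
    span = (candidate.start, candidate.end)
    if overlaps(span, promoter(sense)):
        labels.add("promoter")
    if any(overlaps(ce, se) for ce in candidate.exons for se in sense.exons):
        labels.add("exonic")      # touches the mature sense transcript
    elif overlaps(span, (sense.start, sense.end)):
        labels.add("genomic")     # inside the gene body, exons untouched
    return labels


sense = Transcript("GENE", "6", "+", [(1000, 1200), (2000, 2200)], True)
intronic = Transcript("as1", "6", "-", [(1500, 1700)], False)
mixed = Transcript("as2", "6", "-", [(900, 1100)], False)
print(classify_overlap(sense, intronic))
print(sorted(classify_overlap(sense, mixed)))
```

Returning a set of labels rather than a single class mirrors the mixed classifications reported in Figure 1C and Table 1, where an antisense transcript can be, for example, both exonic and promoter-overlapping.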

The expression of 12 transcripts was confirmed in at least one brain region or in the total brain extract (Table 1). Notably, the antisense transcript of FOXG1 (FOXG1-AS) was found in the cortex but not in the cerebellum, implying a certain level of region-specificity in the expression of some ASD-NATs. FOXG1 encodes a transcription factor thought to play a role in the development of the cortex (38), and mutations in this gene have been linked to a variety of neurodevelopmental disorders and to higher-order brain function (39).

Table 1. Antisense RNAs to ASD-associated genes expressed in the human brain

Sense gene name    Antisense gene coordinates    Antisense AceView name    Antisense type
FOXP1              3: ,1                         chyrarbu                  exonic;promoter
ZNF81              X: ,-1                        zoyfoy                    genomic
SYNGAP1            6: ,-1                        kleefloybu                exonic;promoter
CACNA1C            12: ,-1                       kirare                    exonic
NIPBL              5: ,-1                        LOC                       promoter
VPS13B             8: ,-1                        speeshor                  promoter
NHS                X: ,-1                        kiro                      exonic
DHCR7              11: ,1                        steymor                   genomic;promoter
LAMP2              X: ,1                         werkoy                    exonic
PTEN               10: ,-1                       kloloy                    genomic
FOXG1              14: ,-1                       sachawbu                  promoter
PQBP1              X: ,-1                        foyker                    exonic;promoter

To expand these studies, I compared expression levels of ASD-NATs across multiple brain regions using a cohort of human postmortem brain samples provided by the National Institute of Child Health and Development (NICHD) at the University of Maryland. RNA was extracted from 3 brain regions: the prefrontal cortex (PFC), superior temporal gyrus (STG) and cerebellum. I observed that 9 out of the 10 ASD-NATs were detectable in all brain regions; the exception was FOXG1-AS, which was detected in all PFC and STG samples but in none of the cerebellar samples (Figure 2A, Figure 3). This finding corroborates our initial analysis using commercial RNA from the frontal cortex, and confirms region-selective expression of FOXG1-AS.

Figure 2. ASD-associated NATs and mRNA expression in different human brain regions. qRT-PCR analysis of ASD-associated NATs (A) and corresponding mRNA (B) in the prefrontal cortex (PFC), superior temporal gyrus (STG) and cerebellum of non-ASD human postmortem brain. Transcript expression is normalized to PGK1. Strand-specific qRT-PCR was used to measure expression of SYNGAP1-AS and PQBP1-AS. *p<0.05, **p<0.01, Tukey's post-hoc test; NS, not significant; NE, not expressed.
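Normalization of transcript expression to PGK1 can be illustrated with a short sketch. The text states only that expression is normalized to PGK1 and that three technical replicates were run; the 2^(-ΔCt) formula below is a common convention assumed here for illustration, not a method confirmed by the source, and the function name `relative_expression` and the Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_ct: Sequence[float],
                        reference_ct: Sequence[float]) -> float:
    """Expression of a target transcript relative to a reference gene.

    Assumes the 2^(-delta Ct) convention with ~100% primer efficiency:
    delta Ct = mean Ct(target) - mean Ct(reference).
    Each argument holds the technical replicates of one sample.
    """
    if not target_ct or not reference_ct:
        raise ValueError("at least one Ct value is required per gene")
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2.0 ** (-delta_ct)


# Made-up Ct values: an antisense transcript versus the PGK1 control.
antisense_ct = [25.0, 25.0, 25.0]
pgk1_ct = [22.0, 22.0, 22.0]
print(relative_expression(antisense_ct, pgk1_ct))  # 0.125
```

A target that crosses the threshold three cycles later than the reference is reported at one eighth of the reference level, which is why low-abundance antisense transcripts yield small normalized values.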

25 15 Figure 3 ASD-associated NATs and mrna expression in different human brain regions. qrt-pcr analysis of ASD-associated NATs in the prefrontal cortex (PFC), superior temporal gyrus (STG) and cerebellum of non-asd human postmortem brain samples. Transcripts expression is normalized to PGK1. Strand-specific qrt-pcr was used to measure expression of ZNF81-AS and NHS-AS These data demonstrate that a large portion of noncoding antisense RNAs from ASD-associated loci are expressed in the human brain and suggest the possibility that certain ASD-NATs may have region-dependent patterns of expression reflecting their biological functions. ASD-NATs are differentially expressed in human brain regions. Many ncrnas are dynamically regulated during differentiation and exhibit tissue- and cell type-specific patterns of expression with proposed functions and

26 16 mechanisms far more complex than originally anticipated (40-44). Temporal and spatial expression of many long ncrnas appears to be crucial for proper CNS development and neurological functioning through the precise regulation of a variety of biological processes (45-47). Figure 4 Schematic representation of Antisense and Sense RNA partners. Diagram showing the genomic location of Antisense (in blue) and Sense (in black) RNA partners. The primers used to measure antisense RNAs expression by qrt-pcr are shown as red arrows. Here, I investigated the expression patterns of ASD-NATs in the 9 PFC, 9 STG and 7 cerebella of non-asd young individuals with average age of years ± 4.05 years. Among the 11 selected NATs, I found 6 to be differentially expressed

27 17 within the examined brain regions. The structure, position in respect to the sense gene and location of the primers for these transcripts is shown in Figure 4. Three of these transcripts, SYNGAP1-AS, CACNA1C-AS and NIBPL-AS have higher expression levels in the cerebellum as compared to the PFC and STG, while PQBP1-AS is more abundantly expressed in the PFC compared to both the STG and cerebellum and LAMP2-AS was expressed at a higher level in PFC compared to cerebellum (Figure 3A). The region-specific expression of the above antisense transcripts suggests a possible role in the development and function of the PFC, STG and cerebellum. The region-specificity of these transcripts shows evidence that NATs are not a product of random spurious transcription and provides a basis for future therapeutic approaches that could be tailored to specific regions of the brain, targeting non-protein-coding antisense targets instead of protein-coding genes. ASD-NATs show characteristic patterns of expressions with respect to their sense protein-coding partner. NATs can exert regulatory functions in cis by modulating the expression of neighboring genes (11, 24, 26). In order to determine if ASD-NATs modulate the expression levels of protein-coding ASD associated genes through cis-regulation, I examined their respective expression levels. I observed discordant patterns of expression for two sense-antisense pairs: SYNGAP1/SYNGAP1-AS and PQBP1/PQBP1-AS. SYNGAP1 was more highly expressed in the cortex compared to the cerebellum, whereas PQBP1 is more abundant in the

28 18 cerebellum (Figure 3B). Discordant expression of these sense/antisense pairs suggests possible regulation of the protein-coding gene by its noncoding counterpart, a phenomenon already described for other loci (20, 24, 48). Two other protein-coding genes, NIBPL and FOXG1, showed a pattern of regionspecific expression similar to their antisense partners (Figure 3B). These NATs may have a positive regulatory effect on the sense partner (20), or the senseantisense pairs might be co-regulated (49). It is noteworthy that these two ASD- NATs overlap the promoter of their protein-coding sense partner, thus potentially sharing the same regulatory elements. Overall, these data show that ASD-NATs show regional expression pattern in the brain and further show discordant or concordant expression in regards to their sense partners, suggesting these noncoding antisense transcripts may perform highly specialized region-specific functions by affecting the expression of their sense partners. SYNGAP1 antisense RNA (SYNGAP1-AS) is differentially expressed in ASD brain tissues in comparison to age-matched controls. The differential expression of ASD-NATs observed across brain regions suggests a tissue-specific function for these RNA transcripts. Thus, I hypothesized that expression of ASD-NATs may be altered in the brain of patients with autism compared to non-asd cases. To test this hypothesis, I used qrt-pcr to measure the expression of the 10 ASD-NATs that I could detect in in the PFC

and STG of 18 individuals (9 autistic and 9 age-matched controls) and in the cerebella of 13 individuals (7 autistic and 6 age-matched controls).

Figure 5 Differential expression of SYNGAP1-AS in the non-ASD postmortem brain and in the brain of patients affected by ASD. Strand-specific qRT-PCR analysis of SYNGAP1-AS expression in: A) Prefrontal cortex (PFC, N=9), B) Superior temporal gyrus (STG, N=9) and C) Cerebellum (CER, N=7) of ASD patients and age-matched non-ASD individuals. * p<0.05, ** p<0.01, Student's t-test.

D) Linear regression plot demonstrating negative correlation of SYNGAP1-AS with SYNGAP1 gene expression in the prefrontal cortex of control individuals. R= , p<0.05, Pearson correlation. SYNGAP1-AS expression is normalized to PGK1.

I found SYNGAP1-AS to be significantly upregulated (p<0.05) in the PFC and STG of autistic patients (Figure 5A and 5B), but not in the cerebellum (Figure 5C). The SYNGAP1 gene codes for Synaptic Ras GTPase activating protein 1, which is critical for synapse function and is involved in cognition (50). SYNGAP1 plays a role in brain development as well as higher-order brain function, as mutations in this gene lead to mental retardation (51-53). Although the differences were not statistically significant, three other ASD-NATs (FOXG1-AS, VPS13B-AS, NHS-AS) showed an appreciable trend of differential expression in ASD (Figure 6). The finding that SYNGAP1-AS expression is affected only in the PFC and STG and not in the cerebellum of autistic patients suggests that dysregulation of this non-protein-coding antisense transcript may be cortex-specific, leading to possible impairment of cortical function.
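The two calculations behind Figure 5A-C, normalization to a reference gene and a two-group Student's t-test, can be illustrated with a minimal Python sketch. The text states only that expression was normalized to PGK1; the 2^-ΔCt formula used here is a common convention and is an assumption, as are the function names. The sketch returns the t statistic and degrees of freedom only; obtaining a p value additionally requires the t distribution from a statistics library.

```python
import math
import statistics


def relative_expression(ct_target, ct_reference):
    """Expression of a target relative to a reference gene (e.g. PGK1).

    Assumes the 2^-dCt convention; the source does not state the exact formula.
    """
    return 2.0 ** -(ct_target - ct_reference)


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Each group needs at least two values.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    if std_error == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    return (mean_a - mean_b) / std_error, df
```

In use, each sample's Ct values for the antisense transcript and for PGK1 would first be converted with `relative_expression`, and the resulting values for ASD and control groups passed to `student_t`.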

Figure 6 Expression of FOXG1-AS, VPS13B-AS and NHS-AS in the non-ASD brain and in the brain of patients affected by ASD. qRT-PCR analysis of antisense RNA expression in the non-ASD brain and in the brain affected by ASD pathology: A) Expression of FOXG1-AS in the prefrontal cortex (PFC), B) Expression of VPS13B-AS in the PFC and C) Strand-specific qPCR analysis of NHS-AS in the superior temporal gyrus (STG). Antisense RNA expression is normalized to PGK1. p values: Student's t-test.

Interestingly, I found that the expression of SYNGAP1-AS negatively correlates with the expression of the SYNGAP1 sense gene in the prefrontal cortex of non-ASD individuals (Figure 5D). The correlation was statistically significant, and I observed the same trend of negative correlation in the PFC of ASD patients, though that correlation did not reach statistical significance (data not shown).

Subcellular localization of ASD-NATs.

Of the proposed functions of NATs (26), regulation of chromatin structure and epigenetic memory have received the most experimental support. Antisense transcripts have been shown to provide a scaffold by which proteins can interact with DNA and histones in a locus-specific manner (24, 54). Thus it is not surprising that ncRNAs are predominantly localized to the nucleus or associated with chromatin, while protein-coding RNAs are more abundant in the cytosol (21). In order to assess the subcellular localization of ASD-NATs, I isolated RNA from three cellular fractions (cytoplasm, nucleus, chromatin) of SH-SY5Y neuroblastoma cells and performed RNA sequencing (RNA-seq). SH-SY5Y cells were fractionated using a modified NE-PER Kit (Pierce) to isolate RNA from the cytosol, nucleoplasm and chromatin. RNA samples were prepared for directional RNA sequencing using a modified version of the Illumina sample preparation protocol. Briefly, 1 µg of total RNA was processed using the Ribo-Zero rRNA Removal Kit to remove ribosomal RNAs. Ribosome-depleted RNA was treated with phosphatase before being treated with T4 polynucleotide kinase (PNK). PNK-treated RNA was then purified with the QIAGEN RNeasy

column purification kit, and 3' and 5' RNA adapters were ligated to the ends of the RNA in separate reactions. Next, the RNA was reverse transcribed and PCR amplified. PCR products were purified using AMPure beads. RNA sequencing libraries were validated using the Agilent Bioanalyzer High Sensitivity DNA kit and sequenced using the Illumina HiSeq2000 platform at the Genomics sequencing core at the University of Miami. Each sample was run in a single flowcell to increase depth of sequencing. This allowed the generation of 62,500,000 reads per sample on average, which provided acceptable coverage and sequencing depth. The sequencing reads were preprocessed with a custom Python script to trim library adapters. The trimmed reads were then aligned to the human transcriptome assembly GRCh37 from ENSEMBL using TopHat version (55). TopHat was run with default parameters, and Samtools (56) was used to calculate the alignment statistics for each sample. The BAM files generated with TopHat were used as input for Cufflinks (57) to perform ab initio transcriptome assembly. The assembled fragments were then annotated using the Cuffcompare module of Cufflinks with the AceView database file as a reference. The fragments that originated from introns and incompletely spliced RNAs were filtered out, and Fragments Per Kilobase of transcript per Million reads Mapped (FPKM) values for fragments transcribed from each locus were added to obtain locus expression. These locus-level FPKM values were used to compare the expression of antisense RNAs between different compartments. I found that three out of 10

ASD-associated NATs could be detected in SH-SY5Y cells using RNA-seq: SYNGAP1-AS, VPS13B-AS and NIPBL-AS. All three antisense transcripts were expressed predominantly in the nucleoplasm or chromatin compartments, while little or no expression was observed in the cytoplasm (Figure 7A-C). The pattern of subcellular localization of these ncRNAs is different from that of protein-coding genes such as beta-actin, which is largely localized to the cytoplasm (Figure 7D). The nuclear localization of these NATs is consistent with a function in nuclear-associated processes and suggests that ASD-NATs might play a role in chromatin modifications or in transcriptional regulation. Overall, these data suggest that antisense RNAs overlapping ASD-associated genes represent functional elements that may influence brain function and development by regulating transcription of other genes.
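The locus-level quantification used for this comparison can be sketched in a few lines of Python. The first function is the standard FPKM definition; the second adds the FPKM values of all assembled fragments assigned to the same locus, as described above. The function names and the `(locus_id, fpkm)` input layout are illustrative assumptions and do not reproduce the scripts used in this work, in which FPKM values were produced by Cufflinks.

```python
from collections import defaultdict


def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def locus_expression(fragments):
    """Sum FPKM values of all assembled fragments belonging to each locus.

    `fragments` is an iterable of (locus_id, fpkm_value) pairs (hypothetical layout).
    """
    totals = defaultdict(float)
    for locus_id, value in fragments:
        totals[locus_id] += value
    return dict(totals)
```

Running `locus_expression` separately on the cytoplasmic, nucleoplasmic and chromatin samples gives one value per locus and compartment, which is the quantity plotted in Figure 7.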

Figure 7 Subcellular localization of antisense transcripts overlapping ASD-associated genes. RNA-seq analysis of RNA extracted from the cytoplasm, nucleoplasm and chromatin of SH-SY5Y cells. Expression of SYNGAP1-AS (a), NIPBL-AS (b), VPS13B-AS (c) and beta-actin (ACTB) (d) is shown as fragments per kilobase of transcript per million reads mapped (FPKM).

Conclusions.

The data presented here provide evidence that the molecular network underlying ASD pathology is far more complex than anticipated and may involve

dysregulation of ncRNAs. These factors should be taken into account in order to obtain a more holistic view of the interplay of factors that lead to the disease state. Abundant transcription of regulatory ncRNAs in ASD-related genomic regions indicates that, in addition to conventional protein-coding genes, disruption of RNA regulatory elements may contribute to the pathogenesis of ASD. Identification of disease-specific RNAs (20), as well as novel technologies that enable targeting of these regulatory RNA molecules (48), add a new dimension to current efforts investigating novel therapeutic targets for ASD.

Chapter 3: Development of CANEapp: An Automated User-Friendly Application for Comprehensive Analysis of Next-generation Sequencing Experiments

CANEapp: flexible and user-friendly multiplatform framework for integrated transcriptome analysis

Figure 8 CANEapp and the graphical user interface.

A) General structure of CANEapp. The Java application component is the only user-accessible component and operates on a personal computer to provide a point-and-click interface to configure RNA-seq analysis. The interface establishes a connection either with an Amazon Cloud instance (1) created using the preconfigured CANEapp Amazon Machine Image (AMI) or with a Unix server, in which case server-side pipeline components are automatically transferred to the server through the GUI. After a project is configured, the GUI communicates with the server side to transfer the raw data files and options file and to initiate the analysis. B) Design of CANEapp's graphical user interface. C) Capabilities of the CANEapp GUI and project design steps. The Manage Projects tab allows creating, deleting or loading projects from a file. Additionally, the user can see the status of the selected project on this tab. The next two tabs allow adding experimental groups and samples. On the Add Samples tab the user can specify the library preparation that was used before sequencing and define parameters such as single or paired-end sequencing, strand selection and adapter sequences. The Analysis Settings tab is used to set up parameters of separate analysis steps, such as alignment, reconstruction and differential expression analysis. Finally, the last tab is used to specify the server address and user credentials and to initiate the analysis on the server side.

CANEapp is an installation-free analysis framework (Fig. 8A) that allows designing, managing and monitoring of RNA-seq analysis experiments on a personal computer. CANEapp takes advantage of a Java-based graphical user

interface (GUI) to run our Python-based automated RNA-seq analysis pipeline on a Linux server, where the resource-demanding analysis is performed. The framework of CANEapp is highly flexible and can be implemented on a variety of server types, including standard Linux servers, Linux servers that use IBM Platform LSF Session Scheduler and Amazon Cloud servers. CANEapp was tested on a number of Linux operating systems, including Ubuntu, CentOS, RedHat Enterprise and Fedora, as well as Amazon Cloud Linux and a CentOS server utilizing IBM Platform LSF Session Scheduler. The only component that the user needs in order to utilize CANEapp is the GUI (Fig. 8B). The GUI was written in Java with the help of the NetBeans Integrated Development Environment and JavaFX Scene Builder. Scene Builder was utilized to design most of the GUI's graphical components, whereas the working scripts were written using NetBeans. The Java Secure Channel (JSch) library served as the foundation for establishing a connection with the server, uploading data to the server and downloading data from the server to the local machine. The GUI allows easy step-by-step design of an RNA-seq analysis project, setup of the analysis configuration, data transfer, project management and status monitoring (Fig. 8C). The GUI interacts with the server side to engage the analysis pipeline, which is in essence a black box hidden from the user but containing the components to perform all the required steps of the analysis. The black box model ensures that the user does not have to interact directly with the server or any of the software at any stage of the analysis. This makes CANEapp immediately accessible to any user with little to no background in bioinformatics or

computational science. Moreover, all project configurations are automatically stored in the GUI's memory, which allows managing running projects on different servers and getting instant access to project design and settings. Automation considerably reduces both computational and hands-on time and removes the need for detailed knowledge of computational tools; together with the point-and-click interface, it allows users without a bioinformatics background to perform RNA-seq analysis.

Automated scalable RNA-seq analysis pipeline for accurate and comprehensive transcriptome analysis.

Once the project has been designed and analysis settings have been specified, the server address and credentials need to be provided in order to submit the project. The GUI will connect to the server and copy the pipeline components and raw data files together with the project design and settings. After the data transfer is completed, a notification window will appear and the analysis will be initiated on the server side through the computational pipeline. Once the analysis is initiated, the GUI can be closed and reopened at any time to check the status of a particular project. The analysis pipeline was written in Python and consists of several interacting scripts that perform automated analysis of RNA-seq experiments. The pipeline also generates a status file that is used to communicate with the GUI and keep track of the progress of each project. The GUI invokes the main pipeline script after the raw data files have been transferred to the server. The main script then guides the construction of the analysis framework, if this has not been performed before, and

invokes child processes that perform parallel analysis of samples using other pipeline scripts and software tools. The pipeline automatically passes the analysis settings specified in the GUI to the appropriate pipeline script or software tool. The main script monitors resource usage and completion of analysis of individual samples. After all samples are analyzed, the main script invokes a series of secondary scripts to combine the data and perform filtering and differential gene expression analysis. Finally, the data are formatted into an easy-to-view format and can be downloaded through the GUI.
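The control flow described above, a main script that dispatches per-sample work in parallel and records progress in a status file read by the GUI, can be sketched as follows. This is an illustration under stated assumptions and not the CANEapp source: the function names, the JSON status format and the use of a thread pool in place of child processes are all hypothetical choices made to keep the example self-contained.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyze_sample(sample, settings):
    """Placeholder for per-sample work (trimming, alignment, reconstruction)."""
    return {"sample": sample, "aligner": settings.get("aligner", "STAR")}


def run_project(samples, settings, status_path, max_workers=2):
    """Analyze samples in parallel and keep a status file up to date.

    The status file (hypothetical JSON layout) maps each sample to
    "queued" or "done" so that a client can poll project progress.
    """
    status = {sample: "queued" for sample in samples}

    def write_status():
        with open(status_path, "w") as handle:
            json.dump(status, handle)

    write_status()
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze_sample, s, settings): s for s in samples}
        # The status file is only written from the main thread, as each job finishes.
        for future in as_completed(futures):
            sample = futures[future]
            results[sample] = future.result()
            status[sample] = "done"
            write_status()
    return results
```

Once every sample is marked as done, a real pipeline would continue with the combining, filtering and differential expression steps; `max_workers` stands in for the limit that a resource monitor would set.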

Figure 9 Server-side RNA-seq analysis pipeline. A) Installation and configuration. First the GUI transfers the pipeline scripts to the server or utilizes pre-installed scripts if an Amazon Cloud instance is being used. Then the pipeline detects installed software and downloads and installs all the analysis tools required for the workflow using an update file on our website, which is linked to the current version of CANEapp. After that the pipeline downloads the required reference files from ENSEMBL. Reference indexes for STAR and TopHat, as well as gene classification files, are prepared in the next step. B) Parallel alignment and reconstruction module. Samples are analyzed in parallel; first the reads go through an optional trimming step and are aligned to the genome with either TopHat or STAR. Aligned reads are used to reconstruct transcripts with Cufflinks. The module includes a resource monitor that optimally distributes available resources between subprocesses. C) Transcript filtering and classification module. The ENSEMBL reference is used to classify genes generated by combining transcript files from all samples. Then the transcripts are filtered to remove potentially spurious single-exon transcripts, and unannotated transcripts and loci are analyzed to predict their ability to code for proteins. D) Gene expression and results formatting module. Cuffdiff, edgeR and DESeq2 are used to quantify gene expression and identify differentially expressed genes. The pipeline converts output files into fully annotated tab-delimited files, as well as GTF files containing differentially expressed genes. The module also contains

primer design scripts that automate primer design for qRT-PCR validation of gene expression.

Table 2. List of software packages and scripts used in CANEapp.
Software name | Function | CANE module
SRA tools | FASTQ extraction from the SRA file format | Alignment and reconstruction
TopHat | Read alignment | Alignment and reconstruction
STAR | Read alignment | Alignment and reconstruction
Cufflinks | Ab initio transcript reconstruction | Alignment and reconstruction
Cuffcompare | Merging transcripts | Transcript filtering and classification
Samtools | Nucleotide sequence extraction | Transcript filtering and classification
CNCI | Coding potential prediction | Transcript filtering and classification
Cuffdiff | Differential expression testing | Gene expression and results formatting
HTSeq | Counting reads in loci | Gene expression and results formatting
edgeR (R package) | Differential expression testing | Gene expression and results formatting
DESeq2 (R package) | Differential expression testing | Gene expression and results formatting
Primer 3 | Primer sequence retrieval | Primer design

The pipeline consists of several modules, the first of which is the installation module (Fig. 9A). This module downloads and installs all the required software (Table 2) and downloads the reference genome and transcriptome files from ENSEMBL according to the species and assembly specified for the project. The installation module also builds indexes for TopHat (55) and STAR (58) alignment and prepares the reference annotation for gene classification and coding potential calculation. The next pipeline module, engaged after the installation module, is the parallel alignment and reconstruction module (Fig. 9B), which first performs optional preprocessing of reads. Accepted raw data formats are FASTQ, FASTQ files compressed as bz2, tar, gz or tar.gz archives, and the NIH Sequence Read Archive (SRA) format. This step includes optional extraction of archived files or files in the SRA format and library adapter trimming with our custom Python script, which removes adapter sequences to improve read alignment. The script also calculates the mean and standard deviation of the insert sizes from the supplied mean and standard deviation of fragment length and the library adapter length. The module will then proceed to perform alignment of RNA-seq reads

using TopHat or STAR. TopHat and STAR are used with default parameters, but the user can specify custom parameters in the GUI. Aligned reads are then used to perform ab initio (59) reconstruction of transcripts using Cufflinks (60), which allows identification of novel, previously unannotated transcriptome features, such as novel long noncoding RNAs. As with TopHat and STAR, the user can specify parameters for Cufflinks in the GUI. Importantly, the alignment and reconstruction module includes a real-time resource monitor that keeps track of the amount of available memory and the number of available cores to protect the system from memory or processor overload and to ensure optimal resource usage for the fastest performance. Once all the individual samples have been processed, aligned and reconstructed, the data are passed to the transcript filtering and classification module (Fig. 9C). The module first combines transcripts from individual samples using Cuffcompare and then performs optional transcript filtering, described in detail below and in Fig. 10A. Transcript filtering is performed using our custom scripts and in general improves the accuracy of abundance estimation and detection of novel transcripts by removing spurious transcripts such as intronic and pre-mRNA species. The transcripts are then classified into annotated and unannotated transcripts. Annotated transcripts are assigned a gene biotype according to the ENSEMBL reference, whereas the protein-coding potential of the unannotated transcripts is predicted using the Coding-NonCoding Index (CNCI) software (61), and these transcripts are further subclassified into novel noncoding RNAs and potentially novel protein-coding genes.

The final step of the analysis is differential gene expression analysis, which is performed by Cuffdiff (57) or by the R packages edgeR (62) and DESeq2 (63), using HTSeq (64) to count reads prior to processing the data in R. The user can choose either to perform the analysis with one of the three workflows for differential gene expression analysis (Cuffdiff, edgeR and DESeq2) or to run all three of them in parallel. The results of the entire analysis are formatted to create a single tab-delimited file for each differential gene expression analysis method containing expression values, fold changes, statistics and metadata such as gene classification and chromosomal location (Fig. 9D). The filtered and annotated Gene Transfer Format (GTF) file, as well as aligned reads for individual samples, serve as the input for Cuffdiff and HTSeq. Cuffdiff performs differential gene expression analysis and generates normalized abundance estimates, fold changes of gene expression, and p and FDR-corrected p values. For analysis with R packages, individual count files generated with HTSeq are combined into one count file, which is supplied to edgeR and DESeq2 together with the parameters defined by the user in the GUI. Cuffdiff, edgeR and DESeq2 output files, together with the GTF files, are processed by the pipeline scripts to generate unified data tables containing the gene ID, name, biotype, read count and abundance (for Cuffdiff) in each sample and group, as well as the fold change of expression between groups and p and FDR values. Separate GTF files containing only the differentially expressed genes are also generated. Data tables can be opened in Excel to interpret the results and rank genes depending on the project goals. GTF files can be displayed in

third-party software such as the Integrative Genomics Viewer (65) to visualize gene and transcript structures and genomic locations. The final results files can be downloaded through the GUI as soon as the analysis is completed. After the analysis is finished, qRT-PCR validation primers for specific genes can be designed through the GUI's Primer Design tab. I automated the design of primers for validating sequencing results with qRT-PCR. Our primer design tool searches for a common splice junction that exists in all isoforms of a gene. If there are no common junctions, the program looks for an exonic region shared by all the isoforms. After that, Samtools is used to extract the nucleotide sequence of the exons spanning the junction or of the exonic region where primers will be designed. Finally, the sequences are supplied to the Primer 3 software, which designs the primers. All the intermediate files are stored on the server and can be retrieved by the user through the terminal in case they are required for any downstream applications.
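The target selection logic of the primer design tool (a splice junction common to all isoforms, with a fallback to a shared exonic region) can be sketched as follows. This is a simplified illustration rather than the CANEapp script: isoforms are assumed to be lists of `(start, end)` exon coordinates on the same strand, all names are hypothetical, and the sequence extraction with Samtools and the call to Primer 3 are omitted.

```python
def junctions(exons):
    """Return the set of splice junctions (donor end, acceptor start) of one isoform."""
    ordered = sorted(exons)
    return {(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)}


def intersect_intervals(a, b):
    """Intersect two sorted lists of closed intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start <= end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def primer_target(isoforms):
    """Pick a junction shared by all isoforms, else the longest shared exonic region."""
    if not isoforms:
        return (None, None)
    shared = set.intersection(*(junctions(exons) for exons in isoforms))
    if shared:
        return ("junction", min(shared))
    region = sorted(isoforms[0])
    for exons in isoforms[1:]:
        region = intersect_intervals(region, sorted(exons))
    if region:
        return ("exonic_region", max(region, key=lambda iv: iv[1] - iv[0]))
    return (None, None)
```

Preferring a shared junction means that a single primer pair reports on every isoform of the gene, and a junction-spanning amplicon is less likely to be amplified from genomic DNA.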

Table 3. Comparison of CANEapp with previously developed tools for RNA-seq analysis.

Comparison of CANEapp to other applications for RNA-seq data analysis.

In order to comprehensively compare CANEapp to previously developed and published applications aiming to simplify RNA-seq analysis by providing a graphical user interface, I considered a number of features and contrasted them between the software packages (Table 3). The compared features included:
1. The ability to perform automated analysis of multiple samples and groups through a complete pipeline without the need to perform analysis of each sample at each step of the pipeline.
2. Automated installation of the application and its components without cumbersome command line installation procedures.
3. The possibility to utilize the software on different operating systems and server architectures, including the cloud.
4. The availability of alternative analysis workflows.
5. The ability to use computational resources efficiently and to adapt to the amount of data analyzed, so that the software is suitable for analysis of large datasets and efficient implementation on high-performance systems.

I compared CANEapp to six other published applications for RNA-seq analysis that I am aware of. Only free software packages were included in this list. As can be seen from Table 3, CANEapp possesses all of the abovementioned features, which makes it a powerful but easy-to-use tool for comprehensive RNA-seq data analysis that can be ported to a variety of server architectures and applied to large datasets without the need for step-by-step analysis or concerns about sufficiency of computational resources (which is handled by CANEapp's resource monitor). Although some previously developed and published tools have some of these features, none combine them in one package, which limits their performance and scope of application. For instance, Galaxy offers a number of next-generation sequencing data analysis tools that can be operated through a graphical user interface. However, Galaxy does not offer automation of the analysis, and every step has to be performed manually. Moreover, the scale and speed of analysis through the Galaxy server are limited, and if Galaxy is to be installed on a local server or cloud it requires installation by a person with computer science skills. Other tools such as RNA Compass offer automation of analysis and work on the cloud in addition to a local server but again require

cumbersome installation and lack other important features highlighted in Table 3. Overall, I believe that CANEapp presents significant improvements over previously developed user-friendly applications for RNA-seq analysis and will offer biologists a powerful analysis framework that can be easily ported to their favorite system and used without the need to manually perform any of the installation or analysis steps.

CANEapp functionality on various Linux server architectures and its performance and accuracy in identifying differentially expressed genes from real datasets.

Table 4. Description of datasets used to validate CANEapp performance to estimate differential gene expression.
Name | Organism | Experimental groups | N of samples | RNA selection | Library preparation protocol | Single or paired-end | GEO
Transcriptomic changes in hippocampi of Alzheimer's disease patients | Homo sapiens | Alzheimer's disease vs age- and sex-matched neurologically normal controls | 4 vs 4 | Ribo-depletion | Illumina directional small RNA prep | single | GSE67333
Transcriptomic changes in embryonic and adult mouse cortex | Mus musculus | E17 cortex vs adult cortex | 4 vs 3 | Poly-A selection | Illumina mRNA-Seq prep | paired | GSE39866
SEQC Rat liver toxicogenomics study | Rattus norvegicus | N-Nitrosodimethylamine, Aflatoxin B1 vs Vehicle treatments | 3 vs 3, 3 vs 4 | Poly-A selection | Illumina TruSeq RNA | paired | GSE55347

In order to test CANEapp performance and accuracy in estimating gene expression changes in different biological systems and experimental paradigms, I

used publicly available RNA-seq data from three published studies (Table 4) together with qRT-PCR validation of gene expression changes for several genes for each study. RNA-seq data for testing the performance of CANEapp were downloaded from the NIH Sequence Read Archive (SRA) as SRA files and were used directly as input files for CANEapp. The datasets included RNA-seq of hippocampi of Alzheimer's disease patients and controls (4 AD vs 4 controls), RNA-seq of developing mouse cortex (4 embryonic cortical samples vs 3 adult) (66), and RNA-seq of rat liver from the SEQC Toxicogenomics Study for treatment with two chemical compounds causing DNA damage (N-Nitrosodimethylamine, NIT, and Aflatoxin B1, AFL; N=3 for each treatment group, compared to a corresponding control group, N=3 and 4) (67). The human dataset was generated by sequencing RNA depleted of ribosomal RNAs, whereas mouse and rat RNA-seq data were derived from sequencing of polyA-selected RNA. All three datasets were generated using different library preparation techniques. RNA in the human and rat experiments was sequenced on the Illumina HiSeq 2000 machine, whereas mouse RNA-seq data were produced on the Illumina GA-IIx sequencer. This diversity of experimental paradigms, organisms, RNA preparation, library generation and sequencing techniques allowed a comprehensive assessment of the robustness of CANEapp. To analyze these datasets, raw data were downloaded from SRA and CANEapp was used to perform the analysis on the High-Performance Computing cluster Pegasus2 at the University of Miami and on Amazon Elastic Compute Cloud (EC2). In order

to comprehensively test the functionality of CANEapp on various Linux architectures, Amazon Machine Images containing distributions of CentOS, Ubuntu and RedHat Linux, as well as Amazon Linux, were used to create instances running these different Linux platforms. All three datasets were analyzed on these instances and on the Pegasus2 system using solely the CANEapp application to perform the analysis. After the completion of the analysis, the generation of functional software binaries from source, as well as the reference files, intermediary analysis files and final result files, were validated to confirm the stability of the pipeline independent of the server architecture.

Table 5. Fold changes of gene expression in three datasets reanalyzed by CANEapp and compared to qRT-PCR results.
Gene Name | Cuffdiff | edgeR_g | edgeR_et | DESeq2 | qRT-PCR
Alzheimer's disease dataset: SERPINE, TAC, ID, GRM, LINC, RP
Mouse cortex dataset: Vax, Caly, Igf2bp, Draxin, Nrp, Ttr, Mobp, Wipf
Rat liver dataset: Bax-AFL, Cdkn1a-AFL, Myc-AFL, Met-AFL, Bax-NIT, Cdkn1a-NIT, Figf-NIT, Fzd4-NIT

Figure 10 Detection of novel long noncoding RNAs by CANEapp and their validation by real-time PCR. A) Filtering strategies and protein-coding potential prediction. (Right) CANEapp preserves any transcripts that contain a splice junction (a) or single-exon

transcripts expressed in a majority of samples (c), whereas single-exon transcripts detected in a minority of samples are filtered out (b). (Center) Loci that have insufficient read coverage are not considered for differential expression testing. (Left) In order to differentiate between novel noncoding RNAs and potential protein-coding genes, each isoform from a novel locus is tested for the presence of a significant open reading frame. Loci that contain at least one isoform with an open reading frame are not considered novel noncoding RNAs. B) Gel electrophoresis image of PCR amplification products for experimentally validated novel long noncoding RNAs. 5 novel antisense RNAs and 3 long intergenic noncoding RNAs (lincRNAs) predicted from the human RNA-seq dataset analysis were amplified with real-time PCR. For the mouse cortex dataset, real-time PCR was performed on RNA extracted from adult mouse cortex. 3 antisense RNAs and 5 lincRNAs were successfully validated. C) and D) Novel long noncoding RNAs span a wide range of expression levels in human and mouse tissues. Relative expression of validated long noncoding RNAs was calculated by normalizing it to the Ct value of the endogenous control beta-actin.

Raw data were uploaded through CANEapp by selecting the Upload From Computer option, and the STAR aligner was selected to perform alignment to the latest genome assembly available. For the rest of the CANEapp options, the default settings were used. Real-time PCR results for candidate genes from each analyzed dataset were either retrieved from supplementary material for the original publication or received from the authors upon request. Once the analysis

was completed, data were downloaded from the server through the GUI, and fold changes in gene expression between experimental groups generated by Cuffdiff, by edgeR using either the Generalized Linear Model (GLM) or the exact test approach, or by DESeq2 were compared with qRT-PCR results for the same gene. For all three datasets, I found perfect correspondence between qRT-PCR and the direction of gene expression changes estimated from RNA-seq data analyzed with CANEapp using the 4 different approaches for differential gene expression analysis. All the genes upregulated in RNA-seq were upregulated in the qRT-PCR data, and the same was true for downregulated genes (Fig. 10, Table 5). For the human RNA-seq data from hippocampi of Alzheimer's disease patients and controls, I compared fold changes in gene expression for 6 genes (4 upregulated and 2 downregulated) (Fig. 10A). The R-squared coefficient between the fold changes in RNA-seq and qRT-PCR for these 6 genes was 0.84 for Cuffdiff, 0.88 for both analysis approaches with edgeR (GLM or exact test) and 0.68 for DESeq2, indicating that CANEapp performs accurately and robustly on real biological RNA-seq data, although the degree of agreement depends on the analysis approach used. Analysis of gene expression changes in mouse embryonic versus adult cortex with CANEapp and their comparison with qRT-PCR results produced a similar result (Fig. 10B). For the 8 genes validated with qRT-PCR (4 upregulated and 4 downregulated), the R-squared coefficient between RNA-seq and qRT-PCR data was 0.96 for Cuffdiff and edgeR and 0.97 for DESeq2. In the case of the rat liver toxicology experiment, expression of all 8 tested genes (6 upregulated and 2 downregulated) was successfully

57 47 validated with qrt-pcr (Fig 10C). R squared coefficient between RNA-seq and qrt-pcr data was 0.98 for Cuffdiff, 0.73 for edger using GLM, 0.79 for edger using exact test and 0.67 for DESeq2. In all three datasets and with all 4 approaches to differential gene expression analysis, correlation of fold changes produced from RNA-seq by CANEapp and qrt-pcr was statistically significant (p<0.05) using two-tailed T test. Therefore, CANEapp demonstrates excellent performance in estimating gene expression changes in a variety of experimental designs and using RNA-seq data produced with different experimental and sequencing protocols, as well as alternative analysis approaches. This indicates that CANEapp is not only userfriendly and adaptable to different computational platforms, but is also a robust and accurate tool to perform differential gene expression analysis. Moreover, the ability to perform analysis of differential gene expression with alternative tools in parallel makes CANEapp useful for performing benchmarking experiments.
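The comparison between RNA-seq and qRT-PCR fold changes described above can be illustrated with a minimal sketch. It assumes that the R-squared value is the square of the Pearson correlation coefficient between log2 fold changes and that the significance test is the usual t-test on that coefficient with n - 2 degrees of freedom; the text does not spell out either detail. The function names and the example values are hypothetical and are not data from this study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared_and_t(x, y):
    """Return (R^2, t statistic) for the correlation between x and y.

    The t statistic is meant to be compared against a t distribution
    with len(x) - 2 degrees of freedom (two-tailed).
    """
    r = pearson_r(x, y)
    denom = 1.0 - r * r
    if denom <= 0.0:  # perfect correlation: the statistic is unbounded
        return 1.0, math.copysign(math.inf, r)
    t = r * math.sqrt((len(x) - 2) / denom)
    return r * r, t

def same_direction(x, y):
    """True if every gene changes in the same direction in both assays."""
    return all((a > 0) == (b > 0) for a, b in zip(x, y))

# Hypothetical log2 fold changes for 6 genes (illustration only).
rnaseq_log2fc = [1.8, 2.4, 0.9, 1.2, -1.5, -0.7]
qpcr_log2fc = [1.5, 2.9, 0.6, 1.6, -1.1, -1.0]

r2, t = r_squared_and_t(rnaseq_log2fc, qpcr_log2fc)
print(f"R^2 = {r2:.2f}, t = {t:.2f}, df = {len(rnaseq_log2fc) - 2}")
print("directions agree:", same_direction(rnaseq_log2fc, qpcr_log2fc))
```

With only 6 to 8 genes per dataset, the t statistic has few degrees of freedom, so a high R-squared value is needed to reach p<0.05.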

Discovery of novel long noncoding RNAs using CANEapp and their experimental validation.

It is becoming increasingly evident that the ability to extend transcriptome analysis beyond expression changes in annotated gene loci and transcripts is indispensable for elucidating normal cellular processes and pathological states (68-72). For instance, a recent study analyzing thousands of RNA-seq datasets from normal tissues and cancers annotated ~50,000 novel long noncoding RNA transcripts and implicated these transcripts as important markers of cancer subtypes and normal tissue types (73). Therefore, a cutting-edge RNA-seq analysis package must include the functionality to perform accurate discovery of novel transcripts. CANEapp performs ab initio assembly of transcripts that does not depend on previous transcriptome annotations and allows discovery of unannotated transcripts (59, 74). It includes a workflow (Fig. 11A) that filters out single-exon transcripts that potentially originate from transcriptional noise or sequencing artifacts, filters out lowly expressed loci and classifies novel loci as noncoding RNAs or potential novel protein-coding genes.

In order to experimentally validate the expression of predicted novel long noncoding RNAs from the human and mouse datasets, I used CANEapp's primer design feature to design exon-junction-spanning primers and performed RT-qPCR experiments on RNA extracted either from human hippocampus or from mouse cortex. Expression validation of novel lncRNAs identified in mouse cortex and human hippocampus was performed by SYBR Green-based qRT-PCR analysis. 1 mg of mouse cortex and human hippocampus RNA was converted to cDNA using the High Capacity cDNA synthesis kit from Life Technologies. 1 µL of diluted cDNA was used for SYBR Green-based real-time PCR analysis. Expression of each gene was normalized to the Ct value of beta-actin. Amplification specificity was assessed by the presence of a single peak in the melting curve analysis and by checking the size of the amplified products by 2% agarose gel electrophoresis.

Overall, I designed primers for 20 novel long noncoding RNAs, 10 for each dataset. 10 of those were antisense RNAs and 10 were long intergenic noncoding RNAs. I could detect expression of 15 (75%) of the 20 predicted transcripts, as is evident from the gel electrophoresis image of real-time PCR reaction products (Fig. 11B). Therefore, the novel RNA prediction workflow and primer design software were accurate and robust in two different datasets. Since I used only one primer set per transcript, using a second set of primers would probably increase the rate of successfully detected transcripts. Novel long noncoding RNAs identified with CANEapp span a wide range of expression levels (Fig. 11C), suggesting that the software is accurate in detecting both lowly and highly expressed transcripts.

Conclusions.

CANEapp potentially represents a novel platform for integrating next-generation sequencing analysis pipelines and tools into a user-friendly suite that can be immediately accessed by scientists. One of the main challenges of high-throughput biology is integrating data from different sources and experiments. CANEapp utilizes a standardized analysis pipeline and internally generated experimental design templates and can be run on any Linux architecture by a non-expert user. The use of a standardized pipeline together with a pre-defined, software-generated design template that includes all specifications of the biological experiment and technical protocols can serve as a starting point for developing a standard way to analyze next-generation sequencing data and high-throughput data in general. This will create an opportunity to integrate data into global databases for sharing and meta-analyses. I believe that CANEapp will not only benefit biologists in performing their RNA-seq experiments, but will also inspire bioinformaticians and provide them with source code to develop user-friendly analysis tools for various applications in genomics, such as analysis of gene fusions, RNA editing, circular RNAs and simultaneous analysis of genome and transcriptome.
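As an illustration of the novel long noncoding RNA filtering workflow summarized in Fig. 11A earlier in this chapter, the following minimal sketch applies two of its rules: single-exon transcripts are kept only when detected in a majority of samples, and any locus with at least one ORF-containing isoform is excluded from the noncoding set. This is not CANEapp's actual code. The `Transcript` record, the function names and the "more than half of the samples" threshold are assumptions made for the example, and the read-coverage filter and the ORF prediction itself are left out.

```python
from dataclasses import dataclass

@dataclass
class Transcript:
    locus: str             # identifier of the novel locus (hypothetical)
    n_exons: int           # number of exons in the assembled transcript
    samples_detected: int  # samples in which the transcript is expressed
    has_orf: bool          # outcome of a separate ORF prediction step

def keep_transcript(t, n_samples):
    """Keep spliced transcripts; keep single-exon ones only if they are
    detected in a majority (assumed: more than half) of the samples."""
    if t.n_exons > 1:
        return True
    return t.samples_detected > n_samples / 2

def novel_noncoding_loci(transcripts, n_samples):
    """Return loci whose retained isoforms all lack an open reading frame."""
    kept = [t for t in transcripts if keep_transcript(t, n_samples)]
    coding = {t.locus for t in kept if t.has_orf}
    return sorted({t.locus for t in kept} - coding)

# Hypothetical transcripts from 10 samples (illustration only).
example = [
    Transcript("locus1", 3, 4, False),  # spliced, no ORF: kept as noncoding
    Transcript("locus2", 1, 2, False),  # single exon, minority: filtered out
    Transcript("locus3", 1, 8, False),  # single exon, majority: kept
    Transcript("locus4", 2, 9, False),  # locus4 has one isoform with an ORF,
    Transcript("locus4", 4, 9, True),   # so the whole locus is excluded
]
print(novel_noncoding_loci(example, n_samples=10))
```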

Chapter 4: Meta-Analysis of Transcriptomic Changes in the Cortex of Autistic Patients Identifies Region-Specific Patterns of Molecular Pathology

Assessing region-specific transcriptomic and epigenetic changes in the cortex of autism patients.

In order to gain insight into gene expression changes that take place in a region-specific manner in different cortical regions in autism, I performed RNA sequencing (RNA-seq) of prefrontal cortex samples from 9 autism patients and 9 matched neurologically normal controls. In addition, I retrieved raw RNA-seq data generated from 5 additional cortical regions and 2 autism patient cohorts deposited in the National Database for Autism Research (NDAR) (Figure 12A, Table 6). The samples were obtained from the Maryland Brain Bank (PFC, FIC, V1C, BA10, BA44, BA19) and the Harvard Brain Bank (BA10, BA44, BA19). All the autism cases were sporadic, with no known genetic cause of the disease. There were no significant differences in the ages of the subjects, the gender compositions or RNA integrity numbers between the control and autism groups within each area (p>0.05, Table S1).

I profiled molecular changes in cortical regions that have been functionally implicated in autism and autism-related neuronal functions through fMRI and cytological studies: prefrontal cortex (PFC), involved in executive function, inhibition of irrelevant activations and emotion; frontopolar prefrontal cortex or Brodmann area 10 (BA10), involved in integrating multiple cognitive operations; the left side of Brodmann area 44 (BA44), which is a part of Broca's language area; frontoinsular cortex (FIC), involved in processing of social emotions and language; Brodmann area 19 (BA19), a part of the peristriate cortex and a visual association area; and primary visual cortex (V1). Overall, I analyzed 122 cortical samples (66 control and 56 autism) from 92 individuals (52 control and 40 autism) (Table 6, Table S1). I then utilized CANEapp, an automated application for RNA-seq analysis I previously developed (75), to perform differential gene expression analysis and novel long noncoding RNA discovery. Additionally, I performed alternative splicing analysis for the PFC and FIC datasets and isolated DNA from the same PFC samples used for RNA-seq to profile DNA methylation changes with bisulfite sequencing and the Illumina 450K Bead Array (Figure 12B).

Autism-associated transcriptomic changes converge on frontal lobe regions and are fundamentally different in the frontoinsular cortex.

To prepare RNA-seq libraries for the PFC samples, I used the NEB Ultra directional kit (NEB #E7420). The libraries were then sequenced on the Illumina HiSeq 2000 machine to generate paired-end sequencing data. I utilized CANEapp, a user-friendly RNA-seq analysis tool I previously published (75), to analyze raw RNA-seq data either generated in our lab or downloaded from the NIH National Database for Autism Research (NDAR). The ENSEMBL GRCh37 genome and transcriptome reference was used without performing novel transcript reconstruction.

Figure 12 Overview of the cortical regions and analysis workflows used in the current study. A) Graphical representation of the cortical regions analyzed. B) Schematic of the analysis approaches utilized.

Table 6. Summary of the experimental group sizes and sample demographics.

Differential gene expression analysis of the six cortical regions identified genes differentially expressed (FDR<0.05) in the frontal lobe regions (PFC, BA10, BA44) and FIC (Table 7, Table S2), whereas I identified only 51 differentially expressed genes (DEGs) in BA19 and none in V1. This result suggests that the molecular pathology caused by autism might primarily affect association and limbic areas of the cortex, and that primary sensory areas might be relatively resistant to the pathological changes caused by the disease.

In order to assess global similarities and differences of gene expression changes caused by autism in different cortical areas, I performed hierarchical clustering of the 6 cortical regions based on fold change of gene expression in autism versus control using the dChip tool (76) (Figure 13A). I used 634 genes that are differentially expressed in at least one region and expressed at a detectable level in all the others. Importantly, the clustering grouped the frontal lobe regions PFC, BA10 and BA44 together, with PFC and its subregion BA10 demonstrating close similarity of autism-associated gene expression signatures, even though these two regions were sampled from two different cohorts and using different sequencing protocols. FIC demonstrated a profile of gene expression changes in autism drastically different from that of the frontal lobe regions and clustered closer to the V1 region, which did not display any differential gene expression. This suggests that the pathogenesis of autism affects FIC in a fashion fundamentally different from that in the frontal lobe.

In order to perform qRT-PCR validation of RNA-seq data, I utilized CANEapp to perform automated design of gene-specific primers. I then performed reverse transcription using the same RNA from the PFC samples used for RNA sequencing and the qScript cDNA Synthesis Kit (Quanta Biosciences, 95047). cDNA was used to run SYBR Green or TaqMan qRT-PCR reactions on the QuantStudio 6 Flex Real-Time PCR System. Each sample was run in three technical replicates. Ct values of 18S, GAPDH, ACTB and PGK1 were averaged using the geometric mean and used as the endogenous control. The relative expression values were calculated by first taking the difference between the average Ct value across the technical replicates and the Ct value of the endogenous control, and then raising 2 to the power of the negative of that difference. Differential gene expression estimates were highly correlated with qRT-PCR validation for 5 up- and 5 downregulated genes from the PFC (Figure 13B), supporting the accuracy of the analysis.

Gene set enrichment analysis (GSEA) (77, 78) using separate lists of genes differentially expressed in each cortical region revealed enrichment in several processes related to neurodevelopment and previously reported to be involved in autism pathology (Figure 13C, Table S3). Control of cell proliferation, cell development and immune system process activation were detected independently in the three frontal lobe regions and FIC, which is in line with multiple reports of changes in neuronal numbers, immune system activation and abnormal neural development in the cortex of autistic patients. Changes in apoptosis were detected exclusively in the frontal lobe regions, suggesting that apoptosis might further affect neuronal numbers in these regions, but not in FIC, in autism. I also detected enrichment in nervous system development and vesicle-mediated transport in isolated cortical regions. I did not detect enrichment in any specific processes using the list of genes differentially expressed in BA19 (Table S3). Interestingly, genes misregulated in the FIC in autism were enriched in multiple pathways related to specific neurodevelopmental processes and previously implicated in the disease (Figure 13D). These processes were not detected in any of the other brain regions, highlighting the importance of FIC in autism pathology.

Table 7. Summary of genes differentially expressed in autism in different cortical regions.
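The relative expression calculation described in the qRT-PCR validation paragraph above can be written out as a short sketch. It assumes one mean Ct value per reference gene (18S, GAPDH, ACTB, PGK1) for each sample; the function names and the Ct values are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_expression(target_cts, reference_cts):
    """Relative expression of a target gene in one sample.

    target_cts: Ct values of the technical replicates of the target gene.
    reference_cts: one Ct value per reference gene in the same sample.
    Returns 2 ** (-delta_ct), where delta_ct is the mean target Ct minus
    the geometric mean of the reference Ct values.
    """
    mean_ct = sum(target_cts) / len(target_cts)
    delta_ct = mean_ct - geometric_mean(reference_cts)
    return 2.0 ** (-delta_ct)

# Hypothetical Ct values (illustration only).
target = [26.1, 26.3, 26.2]             # three technical replicates
references = [12.0, 19.5, 18.0, 21.0]   # 18S, GAPDH, ACTB, PGK1
print(relative_expression(target, references))
```

A target whose Ct is one cycle higher than the endogenous control yields a relative expression of 0.5, since each PCR cycle is assumed to double the product.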

Figure 13 Gene expression analysis demonstrates convergence of autism-related transcriptional changes between different cortical regions. A) Hierarchical clustering based on fold change of gene expression between the control and autism group in each cortical region. B) qRT-PCR validation of RNA-seq gene expression estimates for 5 up- and 5 downregulated genes. C) Gene set enrichment analysis using individual lists of genes differentially expressed in each region. Only pathways related to autism and neurodevelopment and detected in multiple regions are displayed. D) Neurodevelopmental pathways enriched exclusively in the FIC gene set.

Cross-region analysis of genes differentially expressed in the frontal lobe reveals high-confidence molecular targets underlying autism pathology.

Next, in order to identify genes shared between different cortical regions affected by autism, I intersected individual sets of genes differentially expressed in each of the 5 regions analyzed. First, I looked at genes shared between PFC, BA10 and BA44 (Figure 14A). Interestingly, I found 34 genes that were differentially expressed in these three regions (Table S4). All of these genes had the same direction of change in all three regions. GSEA of this set of 34 genes revealed that cellular proliferation was the most enriched pathway (Figure 14B), suggesting that abnormal molecular control over neural proliferation is a common feature of the three frontal lobe regions. Closer investigation of the individual genes commonly dysregulated in the frontal lobe revealed candidate genes previously implicated in neurodevelopment and autism. S100A11 codes for a calcium-binding protein that has been shown to regulate cell cycle progression and differentiation (79). Translocator protein 18kDa (TSPO) is a marker of reactive gliosis in the brain and has been suggested to activate proliferation of microglia and activated astrocytes (80).

Therefore, molecular pathology linked to cellular proliferation in the frontal cortex of autism subjects may not be restricted to neuronal cells and can also impact activation and proliferation of glia. Among other important genes previously linked to autism are moesin (MSN), a protein crucial for the migration of neuroblasts from the ventricular zone in the cortex and for axon outgrowth (81, 82), CD44, a marker of astrocyte precursor cells (83), and VAMP5, which is important for synaptic vesicle docking and fusion (84). Therefore, autism-related transcriptional changes in the frontal lobe converge on pathways and genes controlling neural proliferation, microglial activation, neuronal precursor migration and synaptic transmission. Interestingly, downregulation of parvalbumin, a marker of inhibitory GABAergic interneurons, was observed in the PFC and BA10 but not in BA44. At the same time, many immune system genes, such as CD99, IL1R1, IL4R and IRF7, were upregulated exclusively in the PFC and BA10. These observations point to the possibility that changes in inhibitory balance and immune system activation are restricted to the prefrontal areas in autism.
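The cross-region comparison used in this section, intersecting the lists of differentially expressed genes and checking whether shared genes change in the same direction, can be sketched as follows. This is an illustrative sketch rather than the code used in the study: the function name, the placeholder gene names and the fold-change values are all hypothetical.

```python
def shared_degs(region_to_fc):
    """Find genes differentially expressed in every region.

    region_to_fc maps a region name to {gene: log2 fold change}, listing
    only the genes called differentially expressed in that region.
    Returns (shared, concordant): genes present in all regions, and the
    subset whose fold change has the same sign in all regions.
    """
    regions = list(region_to_fc)
    shared = set(region_to_fc[regions[0]])
    for region in regions[1:]:
        shared &= set(region_to_fc[region])
    concordant = {
        gene for gene in shared
        if len({region_to_fc[region][gene] > 0 for region in regions}) == 1
    }
    return shared, concordant

# Hypothetical log2 fold changes (autism versus control), illustration only.
degs = {
    "PFC":  {"GENE_A": 1.2, "GENE_B": -0.8, "GENE_C": 0.5, "GENE_D": 0.9},
    "BA10": {"GENE_A": 0.9, "GENE_B": -1.1, "GENE_C": -0.4},
    "BA44": {"GENE_A": 1.5, "GENE_B": -0.6, "GENE_C": 0.7, "GENE_E": 2.0},
}
shared, concordant = shared_degs(degs)
print(sorted(shared))      # genes called in all three regions
print(sorted(concordant))  # subset with a consistent direction of change
```

The same sign check, applied to two regions only, separates concordant genes from those with opposite fold changes, which is the pattern reported for the genes shared between PFC and FIC in Figure 14.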

Figure 14 Overlap between genes differentially expressed in different cortical regions in autism. A) (top) Overlap between frontal lobe regions PFC, BA10 and BA44. (bottom) 34 genes common between all three frontal lobe regions are highly enriched in the cell proliferation Gene Ontology process. B) (top) Overlap between the PFC and FIC. (bottom) Anticorrelation between fold changes of expression in control versus autism for the 128 genes common between PFC and FIC.


More information

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017 Epigenetics Jenny van Dongen Vrije Universiteit (VU) Amsterdam j.van.dongen@vu.nl Boulder, Friday march 10, 2017 Epigenetics Epigenetics= The study of molecular mechanisms that influence the activity of

More information

Transcript reconstruction

Transcript reconstruction Transcript reconstruction Summary I Data types, file formats and utilities Annotation: Genomic regions Genes Peaks bedtools Alignment: Map reads BAM/SAM Samtools Aggregation: Summary files Wig (UCSC) TDF

More information

Golden Helix s End-to-End Solution for Clinical Labs

Golden Helix s End-to-End Solution for Clinical Labs Golden Helix s End-to-End Solution for Clinical Labs Steven Hystad - Field Application Scientist Nathan Fortier Senior Software Engineer 20 most promising Biotech Technology Providers Top 10 Analytics

More information

Introduction to Systems Biology of Cancer Lecture 2

Introduction to Systems Biology of Cancer Lecture 2 Introduction to Systems Biology of Cancer Lecture 2 Gustavo Stolovitzky IBM Research Icahn School of Medicine at Mt Sinai DREAM Challenges High throughput measurements: The age of omics Systems Biology

More information

Genetics and Genomics in Medicine Chapter 6 Questions

Genetics and Genomics in Medicine Chapter 6 Questions Genetics and Genomics in Medicine Chapter 6 Questions Multiple Choice Questions Question 6.1 With respect to the interconversion between open and condensed chromatin shown below: Which of the directions

More information

User Guide. Association analysis. Input

User Guide. Association analysis. Input User Guide TFEA.ChIP is a tool to estimate transcription factor enrichment in a set of differentially expressed genes using data from ChIP-Seq experiments performed in different tissues and conditions.

More information

PERSONALIZED GENETIC REPORT CLIENT-REPORTED DATA PURPOSE OF THE X-SCREEN TEST

PERSONALIZED GENETIC REPORT CLIENT-REPORTED DATA PURPOSE OF THE X-SCREEN TEST INCLUDED IN THIS REPORT: REVIEW OF YOUR GENETIC INFORMATION RELEVANT TO ENDOMETRIOSIS PERSONAL EDUCATIONAL INFORMATION RELEVANT TO YOUR GENES INFORMATION FOR OBTAINING YOUR ENTIRE X-SCREEN DATA FILE PERSONALIZED

More information
