Characterizing of functional human coding RNA editing from evolutionary, structural, and dynamic perspectives


Proteins: Structure, Function, and Bioinformatics

Oz Solomon (1,2), Lily Bazak (2), Erez Y. Levanon (2), Ninette Amariglio (1), Ron Unger (2), Gideon Rechavi (1,3), and Eran Eyal (1)*
(1) Cancer Research Center, Chaim Sheba Medical Center, Tel Hashomer 52621, Ramat Gan, Israel
(2) The Everard & Mina Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 52900, Israel
(3) Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel

ABSTRACT
A-to-I RNA editing has recently been shown to be a widespread phenomenon, with millions of sites spread across the human transcriptome. However, only a few are known to be located in coding sequences and to modify the amino acid sequence of the protein product. Here, we used high-throughput data, variant prediction tools, and protein structural information to find structural and functional preferences for coding RNA editing. We show that RNA editing has a unique pattern of amino acid changes, characterized by enriched stop-to-tryptophan changes and by positive-to-neutral and neutral-to-positive charge changes. RNA editing tends to have a stronger structural effect than equivalent A-to-G SNPs but a weaker effect than random A-to-G mutagenesis events. Sites edited at low levels tend to be located at conserved positions and to have stronger predicted deleterious effects on proteins than sites edited at high frequencies. Lowly edited sites tend to destabilize the protein structure and to affect amino acids with a larger number of intramolecular contacts. Still, some highly edited sites are also predicted to affect structure prominently; these tend to be located at critical positions of the protein matrix and are likely to be functionally important. Using our pipeline, we identify and discuss several novel, putatively functional, coding-changing editing sites in the genes COPA (I164V), GIPC1 (T62A), ZN358 (K382R), and CCNI (R75G).
Proteins 2014; 82. © 2014 Wiley Periodicals, Inc.

Key words: RNA editing; ADAR; thermostability; RNA modification; RNA-seq; protein structure analysis.

Additional Supporting Information may be found in the online version of this article.

Abbreviations: AA, amino acid; ANM, anisotropic network model; CDS, coding sequence; DB, database; dbNSFP, database for nonsynonymous SNVs' functional predictions; MPS, massively parallel sequencing; PDB, Protein Data Bank; PPI, protein-protein interaction; RNA-seq, RNA sequencing; SNP, single nucleotide polymorphism; WTS, whole transcriptome sequencing.

Grant sponsor: Flight Attendant Medical Research Institute (FAMRI); Israeli Centers of Research Excellence (I-CORE); ISF Gene Regulation in Complex Human Disease Center; ISF Chromatin and RNA Gene Regulation Center.

G.R. holds the Djerassi Chair in Oncology, Tel Aviv University, Israel.

O.S. and E.E. performed the bioinformatics and statistical analyses, analyzed the data, and wrote the manuscript. L.B. and E.Y.L. helped analyze the human bodymap data, conceived ideas, and helped write the manuscript. R.U., N.A., and G.R. conceived ideas and helped write the article.

*Correspondence to: Eran Eyal, Cancer Research Center, Sheba Medical Center, 2 Sheba Rd., Ramat Gan, Israel. E-mail: eran.eyal@sheba.health.gov.il

Received 15 April 2014; Revised 28 July 2014; Accepted 11 August 2014. Published online 18 August 2014 in Wiley Online Library (wileyonlinelibrary.com). DOI: /prot

INTRODUCTION
A-to-I RNA editing (deamination of adenosine to inosine) is a widespread post-transcriptional modification 1–10 catalyzed by the ADAR (adenosine deaminase acting on RNA) family of enzymes. In humans, the family includes ADAR, ADARB1, and ADARB2 (synonyms: ADAR1, ADAR2, and ADAR3, respectively). 6,11–14 RNA editing is essential for normal development. Adar knockout mice die at the embryonic stage with disseminated apoptosis, reflecting the important role of Adar in early hematopoiesis in the embryonic liver. 15,16 Adarb1 knockout mice suffer from epileptic seizures and die at a very early age. 17,18 Editing has been connected with several human disorders and diseases. 19 Abnormal editing of the 5-hydroxytryptamine (serotonin) receptor 2C (HTR2C) has been linked with depression, schizophrenia, and suicide. 19 Defective editing of GRIA2 mRNA has been connected with the etiology of amyotrophic lateral sclerosis (ALS). 20 Mutations in ADAR were associated with up-regulation of the interferon pathway and found to cause Aicardi-Goutières syndrome. 21 Altered editing levels were also detected in different types of cancer. 22,23 The role of abnormal editing has been studied mostly in brain tumors, in particular in glioblastoma multiforme (GBM). 24 ADAR was also found to play an important role in chronic myeloid leukemia (CML) tumorigenesis. 25 RNA editing in the coding region of AZIN1 was found to contribute to the pathogenesis of human hepatocellular carcinoma (HCC). 26

Inosine is recognized as guanosine by cellular machineries such as splicing and translation. Current sequencing methods also read inosine as guanosine. This makes the identification of editing sites an appealing bioinformatics problem, and a body of recent studies has been devoted to the detection of additional editing sites. 1–5,8–10,27,28 It has recently been estimated that the human transcriptome contains millions of RNA editing sites. 1,27 Most RNA editing sites in humans are located in Alu repeats in noncoding regions (3′ UTRs and introns). ADAR and its RNA editing activity have been shown to be connected with splicing and with the regulation of gene expression. Recently, we detected transcriptome-wide changes in splicing patterns following knockdown (KD) of ADAR. 29 According to current estimates (RADAR database 36), there are 2411 editing sites located in coding sequences (CDS), 36,37 only a few of which are well studied and known to affect the function of the protein products (such as the sites in GRIA2, 17 KCNA1, 39 or AZIN1 26). Most of these sites, however, are poorly studied and have not been carefully examined in the context of the protein.
The gap between characterizing editing sites at the RNA level and understanding their implications for the protein product is clearly widening in the massively parallel sequencing (MPS) era. In the current study, we examined the evolutionary, structural, and functional effects of coding RNA editing using MPS data from various sources, including the Illumina human bodymap. 40 For the first time, we combined the editing data with available protein structural data to investigate common structural and functional preferences of human coding RNA editing sites. We found statistically significant preferences for the types of amino acid changes induced by editing sites within coding regions. Most editing sites are edited at low levels, and we show a correlation between the editing level and both the structural significance of the amino acid change and the predicted effect on thermostability. Finally, we point to several new cases in which highly edited sites are predicted by computational tools to have a significant effect on protein structure and function. Further experimental work is still needed to explore editing-dependent regulation in these examples.

METHODS
Construction of a dataset of coding RNA editing sites in human
A-to-I RNA editing sites were obtained from recent studies that identified editing events based on RNA-seq. 1–5 In the current study, RNA editing sites were not identified ab initio; we associated structural and functional features only with previously characterized editing sites, gathered from recent publications and updated RNA editing databases (RADAR 36 and DARNED 41). These RNA editing sites were intersected using bedtools 42 with the coding regions of RefSeq transcripts 43 downloaded from the UCSC Table Browser. 44 Nonsynonymous (NS) coding RNA editing sites were retained for downstream analysis and intersected with the database for nonsynonymous SNVs' functional predictions (dbNSFP), 45 which gathers information about NS amino acid (AA) changes, evolutionary conservation (phyloP score), and predictions of variant effect from different tools (SIFT, PolyPhen-2, LRT, MutationTaster, MutationAssessor, and FATHMM). For each editing site, the number of methods predicting it to be deleterious was recorded. Sites predicted to be deleterious by more methods were considered more likely to have a deleterious effect (see Supporting Information Table S7 for CDS editing sites mapped to dbNSFP information). Equivalent dbNSFP data were collected for single nucleotide polymorphisms (SNPs) and for random sites. All coding RNA editing sites were manually intersected with dbSNP. Sites reported as SNPs were excluded unless the reported source data for the SNP was cDNA, because such SNPs may in fact be RNA editing events. 46,47 All NS RNA editing sites reported in this manuscript were also examined in the Exome Sequencing Project data (NHLBI ESP, 48 which includes variants from exome sequencing of 6500 individuals) to verify that they do not represent genomic polymorphisms. For editing sites with two or more reported editing levels in the Illumina human bodymap (e.g., when the editing level is reported for more than one tissue sample), the editing level of the sample with the maximal coverage (maximal A + G read count) was taken for downstream analysis. We used only high-quality reads (edited position with Phred quality score >30); therefore, our calculated editing levels may differ somewhat from the editing levels reported in the RADAR database (DB) based on the same raw data. RNA editing sites with their published editing levels were also downloaded directly from RADAR 36 (see Supporting Information Table S5). RADAR draws on other sources of editing sites in addition to the human bodymap.

Table I
The Number of Changes for Each NS Type Resulting from RNA Editing, A-to-G SNPs, or Random A-to-G Replacements in the CDS of the Same Genes

Type      Ranking (a)
* > W     1
D > G     100
E > G     100
H > R     66
I > M     38
I > V     99
K > E     69
K > R     3
M > V     98
N > D     99
N > S     81
Q > R     1
R > G     1
S > G     1
T > A     5
Y > C     80

* denotes a stop codon. Dark gray background: editing is significantly enriched (P < 0.01); light gray background: editing is significantly depleted (P < 0.01).
(a) The ranking of each type of change resulting from the original RNA editing compared with 100 random selection trials of adenines, presented as a score between 1 and 100 (1: most frequent; 100: least frequent).

Calculating editing level from Illumina human bodymap
Illumina human bodymap data were downloaded from SRA 49 and aligned to the human genome (hg19) using Bowtie 76 (with the options -n 3, -l 20, -k 20, -e 140, and --best). All previously identified coding RNA editing sites (as described above) were called from the Illumina human bodymap data 40 by counting the number of Gs found at these sites. A position was considered edited in the human bodymap data if it had reads showing G where A is the corresponding base in the human reference genome (hg19), and the variant mean quality score was higher than 30 (representing a sequencing error rate of 0.001). Editing levels in the human bodymap were attached to each coding editing site. A site was considered lowly edited if fewer than 5% of the reads covering it showed G. A site was considered highly edited if more than 5% of its covering reads, and at least two reads, showed G. In addition, we required a minimal coverage of two reads in at least one bodymap sample to consider a site in the analysis.
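The per-site rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes base calls at a site have already been extracted per sample as (base, Phred quality) pairs, and the names `SampleCounts`, `count_bases`, `editing_level`, and `classify_site` are hypothetical. It keeps calls with Phred quality above 30 (Q30 corresponds to an error probability of 10^(−30/10) = 0.001), takes the level G/(A + G) from the sample with the largest A + G coverage, and applies the 5% threshold together with the two-G-read requirement and the five-read minimal coverage that the Methods require for a highly edited site. A level of exactly 5% is left unclassified, since the text defines "low" as below 5% and "high" as above it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

PHRED_MIN = 30          # keep base calls with quality > 30 (error rate < 0.001)
LEVEL_THRESHOLD = 0.05  # 5% of A + G reads showing G
MIN_COVERAGE = 2        # minimal A + G coverage for a sample to be considered
MIN_HIGH_COVERAGE = 5   # minimal coverage to call a site highly edited
MIN_G_READS_HIGH = 2    # at least two G reads for a highly edited site


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


@dataclass
class SampleCounts:
    """High-quality A and G read counts at one site in one sample (hypothetical)."""
    sample: str
    a_reads: int
    g_reads: int

    @property
    def coverage(self) -> int:
        return self.a_reads + self.g_reads


def count_bases(calls: Iterable[Tuple[str, int]], sample: str) -> SampleCounts:
    """Count A and G calls whose Phred quality exceeds PHRED_MIN."""
    a = g = 0
    for base, qual in calls:
        if qual <= PHRED_MIN:
            continue
        if base == "A":
            a += 1
        elif base == "G":
            g += 1
    return SampleCounts(sample, a, g)


def editing_level(samples: List[SampleCounts]) -> Optional[Tuple[float, SampleCounts]]:
    """G/(A+G) from the sample with maximal A+G coverage, or None if no sample qualifies."""
    eligible = [s for s in samples if s.coverage >= MIN_COVERAGE]
    if not eligible:
        return None
    best = max(eligible, key=lambda s: s.coverage)
    return best.g_reads / best.coverage, best


def classify_site(samples: List[SampleCounts]) -> str:
    """Return 'low', 'high', 'unclassified', or 'excluded' for one editing site."""
    result = editing_level(samples)
    if result is None:
        return "excluded"
    level, best = result
    if level < LEVEL_THRESHOLD:
        return "low"
    if (level > LEVEL_THRESHOLD and best.g_reads >= MIN_G_READS_HIGH
            and best.coverage >= MIN_HIGH_COVERAGE):
        return "high"
    return "unclassified"
```

For example, a site with 10 G and 90 A high-quality reads in its best-covered sample is classified as "high", whereas 1 G out of 100 reads gives "low".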
A minimal coverage of at least five reads was required to consider a site highly edited (see Supporting Information 1, Table S6). For the analysis of correlation between editing levels and thermostability predictions or amino acid intramolecular contacts (see below), we used editing levels based on the above criteria and also editing levels from RADAR. 36 This allowed us to increase the number of editing sites under investigation. For sites with editing levels reported in both sets, the maximal value was taken for the analysis. However, for sites with very low coverage in the human bodymap, we used the editing level reported in RADAR rather than deducing it from the few bodymap reads. This maximal editing level is the one reported in the examples of COPA, GIPC1, ZN358, and CCNI.

Random A-to-G substitutions control and SNP control
Adenosines within the CDS mRNA of the edited proteins were randomly changed to guanosine. These in silico edited sequences were translated into proteins, and their properties were compared with those of the original coding editing sites. A-to-G SNPs in the CDS of the same edited proteins were taken from dbSNP and tested against the original coding editing sites (see Table I for details of the SNP and random sites used).

Secondary structure prediction and disorder prediction
Protein secondary structure prediction was done using PROFphd 51 and SSpro. 52 Disorder prediction was done using IUPred 53 and DISOPRED. 54

Protein-protein interaction (PPI) information
PPI information tables for human were taken from Reactome, 55 IBIS, 56 MINT, 57 and PrePPI. 58

Structural modeling
All edited proteins were aligned to protein sequences of known structures in the PDB (taken from the PISCES 59 server) using BLASTp 60 with default parameters. Structures of proteins with sequence identity >25% and an aligned region >80 AA were downloaded from the PDB and used for downstream analysis. To locate precisely the editing sites that code for amino acids present in the atomic coordinates, we used S2C ( fccc.edu/guoli/s2c/). For the FoldX analysis and the intramolecular contacts analysis (described below), we included only structures with >50% identity. Structural models for edited proteins presented in the section on novel examples were downloaded from ModBase. 61 We used only models that cover the editing sites and have >25% sequence identity with their template over a region longer than 80 AA. These structural models were compared with models in SWISS-MODEL 62 using the structural genomics portal ( Models with the higher sequence identity to their template were eventually taken for further analysis. These models were also visualized using Jmol ( jmol.sourceforge.net/). Models that did not include the edited amino acid, or in which the region was part of a long unstructured fragment, were removed. Biological assemblies of the templates were downloaded from PISA. 63 To model the edited version of the protein (as shown in Figs. 6 and 7), we used the side-chain modeling of SCCOMP 64 and SCWRL4. 65 TM-align 66 was used for structural alignment of model to template. Structure-derived statistics were mostly based on known structures in the PDB. The thermostability analysis for individual examples (COPA, GIPC1, ZN358, and CCNI), the side-chain prediction (as in Figs. 6 and 7), and the calculation of the theoretical b-factor (as in Fig. 5) were done based on the theoretical models. In the Supporting Information files of this manuscript, we provide tables with the structural details of CDS editing sites (Supporting Information, Tables S1–S8, with a detailed explanation in Sup2).
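The template-selection cut-offs can be written as a small filter. The sketch below is hypothetical: `Hit` stands in for one parsed BLASTp alignment (its fields are assumptions, not a BLAST output format), and the `resolved` set is a simplified stand-in for the S2C check that the edited residue actually has atomic coordinates. The default thresholds (>25% identity, >80 aligned residues) follow the text; passing `min_identity=50.0` mimics the stricter set used for the FoldX and contact analyses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One query-to-template alignment (hypothetical parsed BLASTp record)."""
    template_id: str
    identity: float      # percent identity over the aligned region
    q_start: int         # first aligned query residue (1-based)
    q_end: int           # last aligned query residue (1-based, inclusive)
    resolved: frozenset  # query positions whose template residues have coordinates

    @property
    def aligned_length(self) -> int:
        return self.q_end - self.q_start + 1

    def covers_site(self, site: int) -> bool:
        """True if the edited residue is aligned and present in the coordinates."""
        return self.q_start <= site <= self.q_end and site in self.resolved


def usable_templates(hits: List[Hit], site: int, min_identity: float = 25.0,
                     min_length: int = 80) -> List[Hit]:
    """Templates above the identity and length cut-offs that resolve the site."""
    return [h for h in hits
            if h.identity > min_identity
            and h.aligned_length > min_length
            and h.covers_site(site)]


def best_template(hits: List[Hit], site: int,
                  min_identity: float = 25.0) -> Optional[Hit]:
    """The usable template with the highest sequence identity, if any."""
    usable = usable_templates(hits, site, min_identity)
    return max(usable, key=lambda h: h.identity, default=None)
```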
We also constructed a simple website ( which provides a visualization of the edited positions on the best template structures using Jmol.

Calculation of intramolecular contacts
Intramolecular contacts for each edited AA covered by a known protein structure were determined by calculating the common contact surface using the Voronoi procedure, as described in McConkey et al. 67 Structures with >50% sequence identity to the edited proteins were used for the analysis of correlation between AA intramolecular contacts and editing level.

Anisotropic network model (ANM) analysis
Theoretical b-factors were calculated for the structures using the inverse of the Hessian matrix of the ANM 68 and were plotted to identify edited amino acids located at local minima of the b-factor. Such sites are often related to cooperative functional motion of the protein or participate directly in catalytic activity and functional interactions. Profiles of the three slowest (lowest-frequency) modes for each protein under investigation were taken from the ANM website. For CCNI R75G, we cross-validated the ANM results with an analysis of the residue contact network obtained using both SARIG 69 and Jamming. 70

Thermostability predictions
FoldX 71 was used to predict thermostability changes from structural information (see Supporting Information Tables S1 and S2). Known structures with >50% sequence identity to the edited proteins were included in the analysis of correlation between FoldX-predicted ΔΔG and editing level. In addition, for selected proteins discussed in the manuscript (see novel examples below), the theoretical models were optimized and mutated using FoldX.

Statistics
All statistical tests were done using the R statistical programming language. 72

RESULTS
Construction of a dataset of human coding RNA editing sites
To draw common structural and functional patterns related to human coding RNA editing, we first constructed a dataset of coding RNA editing sites.
Editing sites were taken from recently published studies, mainly based on whole-transcriptome sequencing (WTS, or RNA-seq), and from RNA editing databases, 36,41 and intersected with the coding sequence (CDS) annotation of RefSeq genes. 43 These regions were then mapped to RefSeq and UniProt proteins. The protein sequences were then aligned against the Protein Data Bank (PDB) 73 and ModBase. 61 These steps yielded 1066 sites in the CDS of genes (well-annotated sites according to RefSeq), 708 of which are nonsynonymous (NS) (Table I). The number of known RNA editing sites, and in particular of CDS RNA editing sites in humans, has increased greatly following recent RNA-seq studies. Until several years ago, only dozens of such sites were assumed to exist. Li et al. 5 carefully verified 55 sites in CDS using padlock capture (the most stringent set, class 1, in that publication). Peng et al. 3 identified 63 sites in CDS and Bahn et al. 4 identified 53. More recently, Ramaswami et al. 1 identified 441 editing sites in CDS, and, in two useful efforts for the editing community, Ramaswami and Li 36 and Kiran et al. 41 gathered all known RNA editing sites into the RADAR and DARNED databases (DB), respectively; 2411 sites in RADAR and 710 sites in DARNED are annotated as coding RNA editing. Xu et al. 74 used 1783 CDS RNA editing sites in their analysis. The overall numbers depend heavily on the detection approach and the exact gene annotation used. In the current study, we gathered coding RNA editing sites from several sources (see Methods) and used the RefSeq 43 annotation for coding regions, which is considered conservative and carefully curated. When other, more speculative gene annotations are used, the number of coding RNA editing sites can be significantly higher, as is the case in the RADAR DB (2411 editing sites annotated as CDS RNA editing 36).

We used the human bodymap data (see Methods) and the RADAR database to obtain editing frequencies. Of the 708 editing sites gathered and detected as NS coding RNA editing, 140 are associated with structural information in the PDB. In 60 of them, the coordinates of the amino acid affected by editing are present in the file (for further details see Methods and Supporting Information, Tables S1–S8). We also provide a simple website ( to assist visualization of the modified amino acid mapped onto the best template structures using Jmol.

Out of the 1066 RNA editing sites located in the CDS of RefSeq genes (see Supporting Information 1, Table S8), 321 are found in repeat regions (305 in Alu repeats), whereas out of the 708 RNA editing sites that result in NS changes, 226 are located in repeat regions (217 in Alu repeats). The fraction of editing sites within repeats among CDS RNA editing sites (226/ ) is significantly lower than the fraction among all editing sites (98% out of sites, all sites with known editing level according to RADAR DB; Fisher, P < ). We cannot rule out that part of this trend originates from false-positive editing sites in CDS, as the false identification rate of non-Alu editing sites has previously been shown to be larger. 2 The fraction of coding RNA editing sites in repeats is higher than the fraction of CDS A-to-G SNPs in the same genes located in repeats (20/2046 A-to-G SNPs are found in repeats, only two of them in Alu repeats). When looking only at NS SNPs, 14 out of 1391 (0.01) are located in repeats (two in Alu repeats).

RNA editing results in a unique pattern of amino acid changes
The above set identified editing sites that overlap with the CDS of annotated RefSeq genes and cause NS amino acid changes. We found that there are fewer NS editing sites than expected by chance [empirical P < 0.01 as determined by 100 random trials of A-to-G substitutions as described in Methods; Fig. 1(A)]. This is explained by the observation that 35% of the coding RNA editing sites are located at the wobble (third) codon position. This ratio is similar to that of A-to-G SNPs (33% at the wobble position) but significantly larger than that of random A-to-G replacements, of which only 26% occur at the wobble position (Fisher, P < ). This observation suggests that coding RNA editing is subject to evolutionary pressure, and the ADAR motif may have developed accordingly. We next examined the number of NS changes of each type. RNA editing resulted in enriched numbers of Q-to-R, R-to-G, S-to-G, and stop-to-W substitutions compared with random A-to-G replacements. These substitutions are also enriched compared with A-to-G changes observed in SNP databases within the CDS of the same genes (Fisher, P < 0.001; Table I).

Figure 1. RNA editing results in a unique pattern of amino acid changes. A. Density plot of the NS AA changes resulting from random A-to-G replacements (100 random trials). The number of original NS changes in RefSeq proteins resulting from RNA editing is marked with an arrow. B. Changes by amino acid charge. POS: positive charge; NEG: negative charge; NEU: neutral charge; STOP: stop codon. **P <
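The bookkeeping behind the random A-to-G control, the wobble-position fractions, and the per-type counts of Table I can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes an in-frame CDS string that starts at the first base of a codon, uses the standard genetic code, draws as many random adenosines per trial as there are observed sites (an assumption about how trials were sampled), and computes the empirical P as the fraction of trials with at most the observed NS count. A two-sided Fisher exact test is included because the wobble-fraction comparisons above are reported as Fisher tests. All function names are hypothetical.

```python
import random
from collections import Counter
from math import comb
from typing import Dict, List, Tuple

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def a_to_g_effect(cds: str, pos: int) -> Tuple[str, str, int]:
    """(ref AA, alt AA, codon position 1-3) for an A-to-G change at 0-based pos."""
    if cds[pos] != "A":
        raise ValueError(f"position {pos} is not an adenosine")
    start, offset = pos - pos % 3, pos % 3
    codon = cds[start:start + 3]
    edited = codon[:offset] + "G" + codon[offset + 1:]
    return CODON_TABLE[codon], CODON_TABLE[edited], offset + 1


def summarize(cds: str, positions: List[int]) -> Counter:
    """Counts of NS and synonymous changes, wobble (third-position) sites,
    and each NS change type such as 'Q>R' or '*>W'."""
    counts: Counter = Counter()
    for pos in positions:
        ref, alt, codon_pos = a_to_g_effect(cds, pos)
        if codon_pos == 3:
            counts["wobble"] += 1
        if ref == alt:
            counts["synonymous"] += 1
        else:
            counts["NS"] += 1
            counts[f"{ref}>{alt}"] += 1
    return counts


def random_ns_counts(cds: str, n_sites: int, trials: int = 100, seed: int = 0) -> List[int]:
    """NS counts obtained by changing n_sites randomly chosen adenosines to G."""
    rng = random.Random(seed)
    adenosines = [i for i, base in enumerate(cds) if base == "A"]
    return [summarize(cds, rng.sample(adenosines, n_sites))["NS"] for _ in range(trials)]


def empirical_p_fewer(observed: int, null: List[int]) -> float:
    """Fraction of random trials with at most the observed number of NS changes."""
    return sum(x <= observed for x in null) / len(null)


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for [[a, b], [c, d]], e.g. wobble and
    non-wobble counts for editing sites (row 1) and random sites (row 2)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / total
             for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1)}
    p_obs = probs[a]
    # Sum all tables no more probable than the observed one (small tolerance).
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))
```

With the toy sequence `"ATGAAATAGCAG"` (Met-Lys-stop-Gln), editing the A of the stop codon `TAG` gives `TGG`, so `summarize` records it as a `*>W` change, the same stop-to-tryptophan type that is enriched among editing sites.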
Overall, RNA editing produces relatively high numbers of charge changes from neutral to positive and from positive to neutral [Fig. 1(B)], as well as stop-to-neutral changes. Similar trends for both NS changes and AA charge changes were found for a subset of coding RNA editing sites recorded as frequently edited (edited at high frequency) in the human bodymap data (see below). The trend for this subset is less significant, mainly because of the smaller number of sites (Supporting Information 2, Tables S9–S10). Interestingly, most of the stop-to-tryptophan changes (5/8) were found to be frequent in the human bodymap data (see Supporting Information 1, Table S7). Editing-dependent AA changes are depleted in D-to-G and I-to-V (Fisher, P < Table I). The distribution of AA changes derived from coding RNA editing is significantly different from that caused by A-to-G SNPs or random A-to-G replacements in the CDS of the same genes (χ2 P < for both comparisons). These significant differences may be partially explained by the ADAR preference for a downstream G and an upstream U, and by a depletion of upstream G relative to the edited A (the ADAR motif). 1,4,5,8 Indeed, in the most frequent codon changes, G is downstream of the edited A. Still, even after taking these preferences into account and excluding all codons with an upstream G or U or a downstream G, we observe that the distribution of NS changes derived from RNA editing differs from that of SNPs or random A-to-G replacements (χ2 P < 0.05 for both comparisons).

Figure 2. Comparison between RNA editing sites, genomic polymorphic sites, and random A-to-G replacements. A. Density plot for the distribution of conservation score (phyloP score). Editing: editing sites. SNP: A-to-G SNPs in the CDS of the same genes. Random: random A-to-G replacements in the CDS of the same genes. B. Ratio of changes predicted to have a deleterious effect. x-axis: number of prediction methods agreeing on the deleterious effect (for details see Methods). White: editing sites. Gray: A-to-G SNPs in the CDS of the same genes. Black: random A-to-G replacements in the CDS of the same genes. **P <

RNA editing sites are not enriched in specific protein 2D structure or disordered regions
We next examined whether the AAs translated from codons modified by RNA editing have any preference for a specific protein secondary structure or for disordered regions. We found that editing sites do not have clear preferences for specific protein 2D elements (using PROFphd 75). Similarly, no preference was found for editing sites to affect protein segments predicted to be disordered (using IUPred 53) (Supporting Information Fig. S1). Other prediction methods (DISOPRED and SSpro 52,54) show similar results (data not shown). In addition, the edited proteins are not involved in more protein-protein interactions (PPI) than other, unedited proteins (using data from Reactome, IBIS, MINT, and PrePPI). Interestingly, we found that coding Alu repeats (exonized Alu) are frequently located in disordered regions of the protein. This observation complements previous analyses that found enrichment of Alu repeats in alternatively spliced (AS) exons 30,77,78 and that tissue-specific AS exons are enriched in disordered regions of the protein. 79 It is also in agreement with our recent finding that reducing ADAR levels changes the splicing pattern of genes that are enriched in Alu repeats. 29 It is possible that disordered regions better tolerate alterations such as transposition and editing events. Overall, RNA editing sites are not statistically enriched within particular structural motifs or disordered regions. However, Alu repeats, the preferred substrates for ADAR editing, may play a role in the regulation or creation of protein disordered structures, 80 so a relation between editing and disorder may still exist.

Infrequent RNA editing sites tend to have stronger predicted deleterious effect
It is of interest to examine the level of evolutionary conservation at genomic sites that undergo RNA editing. We found that RNA editing sites tend in general to be located at positions with lower conservation scores than A-to-G SNPs or random A-to-G replacements in the CDS of the same genes [Wilcoxon P < for both comparisons; Fig. 2(A)].
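The conservation comparisons above are Wilcoxon rank-sum tests, which the authors ran in R (see Statistics). For readers who want to run such a comparison without R, a minimal stdlib Python version using the normal approximation with a tie correction is sketched below. It is not a drop-in replacement for R's `wilcox.test`: it omits the continuity correction that R applies by default and never computes exact p-values, so results for small samples will differ slightly. The function names are hypothetical.

```python
from statistics import NormalDist
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.
    Returns (U statistic for x, p-value)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: subtract sum(t^3 - t) / (n(n - 1)) inside the variance.
    tie_counts: Dict[float, int] = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a shift
    z = (u - n1 * n2 / 2) / var_u ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, min(p, 1.0)
```

Here `x` and `y` would be, for example, the phyloP scores of editing sites and of A-to-G SNPs in the same genes.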
Consistent with their lower conservation, coding RNA editing sites are predicted to have a weaker damaging effect than random A-to-G replacements in the CDS of the same genes [P < ; Fig. 2(B)], based on six different methods that predict the deleterious effect of NS variants (see Methods). Random A-to-G replacements have a higher probability of being damaging than A-to-G SNPs (P < ). The fraction of A-to-G SNPs in CDS not predicted to be deleterious by any program is higher than that of coding editing sites or random A-to-G sites in the CDS of the same genes (Fisher P < 0.02 for both comparisons). A similar trend was seen when we considered only frequent NS RNA editing sites (see below). Random A-to-G replacements tend to be located at more conserved positions than A-to-G SNPs or frequent RNA editing sites and are predicted to have a stronger deleterious effect (Supporting Information 2, Tables S11–S12). Taken together, these results imply that the functional effect of coding RNA editing is weaker than that of in silico random mutations, suggesting again that coding editing sites undergo evolutionary selection.

Figure 3. Lower-frequency coding RNA editing sites are located at more conserved positions and cause a stronger deleterious effect. A. Density plot for the distribution of phyloP score. Gray vertical fill: editing sites with editing level (G/(A + G)) ≥5% (for details see Methods). Horizontal fill: editing sites with editing level <5%. B. Ratio of changes predicted to have a deleterious effect. x-axis: number of prediction methods agreeing on the deleterious effect (for details see Methods). Gray bars: editing sites with editing level ≥5%. White bars: editing sites with editing level <5%. **P < *P <

We next compared features of sites edited at high frequency (102 coding RNA editing sites in RefSeq genes with G in more than 5% of the reads in at least one sample of the Illumina human bodymap, 40,81 where the editing levels are based on high-quality reads; see Methods) with those of sites with lower editing levels (less than 5%; see Methods).
Lower-frequency editing sites tend to be located at more conserved positions [Wilcoxon P ; Fig. 3(A)]. The distribution of conservation scores for sites with lower-frequency editing does not differ from that of random A-to-G replacements (Wilcoxon P ). However, the distribution of conservation scores for sites with high-frequency editing is different (Wilcoxon P < ). Lower-frequency editing appears to have a significantly stronger deleterious effect than high-frequency editing [P , Fig. 3(B)]. Comparing the BLOSUM62 values of AA changes resulting from lowly edited sites with those resulting from highly edited sites, we found that the lowly edited sites tend to have significantly lower values (Wilcoxon P ). In other words, changes at highly edited sites tend to be more conservative. Using the structural data, we examined the correlation between the free energy change resulting from the amino acid substitution and the editing level. We used FoldX 71 to model the amino acid introduced by the editing event and to predict the resulting change in thermostability. The results suggest that lowly edited sites tend to give rise to less stable proteins and to have a stronger deleterious effect (Fig. 4). In general, the vast majority of editing events tend to destabilize the proteins, but this tendency is more apparent for sites edited at low frequency than for sites edited at high frequency. A statistically significant anticorrelation exists between the editing level and the change in free energy (R , P ; FoldX-predicted ΔΔG values for the edited structures are detailed in Supporting Information 1, Tables S1–S2). When we repeated the thermostability analysis with a nonredundant set of structures having more than 90% identity to the edited protein and a single structure per editing site, a similar trend was observed (as can be seen in and in Supporting Information Fig. S2).
The trend was not significant in this analysis because of the much smaller number of structures (R , P ).
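The anticorrelation between editing level and FoldX ΔΔG (and, in the next paragraph, between editing level and the number of contacts) can be illustrated with a short correlation sketch. The text does not state which correlation coefficient or p-value method was used in R, so this is an assumption: Pearson's r with a two-sided permutation p-value, which shuffles one variable to build the null distribution and is a swapped-in alternative to the t-based p-value of R's `cor.test`. Function names are hypothetical.

```python
import random
from math import sqrt
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def permutation_test(x: Sequence[float], y: Sequence[float],
                     n_perm: int = 10000, seed: int = 0) -> Tuple[float, float]:
    """Observed r and a two-sided permutation p-value obtained by shuffling y."""
    r_obs = pearson_r(x, y)
    rng = random.Random(seed)
    y_perm: List[float] = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= abs(r_obs) - 1e-12:
            extreme += 1
    # Add-one correction so that the p-value is never exactly zero.
    return r_obs, (extreme + 1) / (n_perm + 1)
```

In this setting `x` would hold the editing levels and `y` the FoldX ΔΔG values (or contact counts) of the same structures; a negative r corresponds to the anticorrelation described above.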

Figure 4. FoldX prediction of the free energy change resulting from the editing events. Each point represents a structure. The x-axis shows the editing level; the y-axis shows the free energy change.

A related trend was also observed in the number of contacts formed by the amino acids affected by editing. We found that the number of contacts in the unedited version of the protein structures is anticorrelated with the editing level at these positions (R , P < 0.01). The amino acids encoded by the unedited transcript tend to have more intramolecular contacts at lowly edited sites and fewer intramolecular contacts at highly edited sites. We also examined how often editing sites are located within protein domains, as defined in the InterPro database, and found that lowly edited sites are located in functional domains more often than expected (P < 0.05).

Examples of frequently edited CDS sites with functional implications
The section above suggests that most coding editing sites have modest structural effects. Some of them are simply functionally neutral, while others are deleterious but affect only a tiny fraction of the transcripts. There are, however, some frequently edited sites predicted to be highly deleterious. In fact, most CDS editing sites found to be conserved in mammals 82 are also predicted by various methods to have a stronger deleterious effect than other CDS editing sites (Wilcoxon P < ). It is likely that this collection of editing sites represents the handful of CDS sites with functional implications, and the deleterious annotation should be read more broadly as marking sites that likely influence function. The high editing frequency in these cases reflects the real importance of the editing event in cellular regulation. To examine these cases further, we gathered structural information on them.
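The theoretical b-factor profiles used for these sites come from the ANM (see Methods). As a self-contained illustration, the sketch below substitutes the simpler Gaussian network model (GNM), the isotropic relative of the ANM, because its N×N Kirchhoff matrix Γ is easy to handle without external libraries. For a connected contact network, Γ has a single zero mode, so its pseudo-inverse equals (Γ + J/N)^(−1) − J/N, where J is the all-ones matrix, and the relative theoretical b-factor of residue i is the i-th diagonal element. The 7.3 Å cutoff is a common GNM choice and an assumption here, as is the strict-minimum rule used to decide whether a site is lower than its ±3-residue vicinity. Dense Gauss-Jordan inversion is practical only for small proteins; real analyses would use a numerical library.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def kirchhoff(coords: Sequence[Coord], cutoff: float = 7.3) -> List[List[float]]:
    """GNM Kirchhoff matrix from C-alpha coordinates (angstroms)."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: is the contact network connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gnm_bfactors(coords: Sequence[Coord], cutoff: float = 7.3) -> List[float]:
    """Relative theoretical b-factors: diagonal of the Kirchhoff pseudo-inverse,
    computed as inv(G + J/n) - J/n for a connected network."""
    n = len(coords)
    gamma = kirchhoff(coords, cutoff)
    inv = invert([[g + 1.0 / n for g in row] for row in gamma])
    return [inv[i][i] - 1.0 / n for i in range(n)]


def is_window_minimum(profile: Sequence[float], idx: int, half_width: int = 3) -> bool:
    """True if profile[idx] is strictly below all other values within
    +/- half_width positions (window truncated at chain ends)."""
    lo, hi = max(0, idx - half_width), min(len(profile), idx + half_width + 1)
    return all(profile[idx] < profile[j] for j in range(lo, hi) if j != idx)
```

For a straight five-residue chain with 3.8 Å spacing, only consecutive residues are in contact, and the predicted fluctuations are largest at the termini and smallest at the central residue, which is therefore the window minimum.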
We mapped the sequences surrounding the frequently edited sites only (editing level threshold of 5%) that at least two different prediction methods considered deleterious to available structural data (either experimental structures or reliable theoretical models). Only 16 coding editing sites met all of these conditions (see Methods). Protein dynamics is one source of structure-derived information that can hint at the functional role of a particular site. Important sites, crucial for stability and catalysis, are usually located at relatively fixed and rigid positions, 83 which appear as minima in dynamic profiles. This tendency can be examined, for example, by looking at X-ray-derived temperature factors (b-factors). Using the Anisotropic Network Model (ANM), 68 we calculated theoretical b-factors for the proteins with significant editing. Fourteen of the sixteen sites have a relatively low theoretical b-factor compared with their vicinity ([−3, +3] amino acids). Figure 5 shows theoretical b-factor mobility profiles for six of these proteins and demonstrates the general tendency of editing sites to be located at theoretical b-factor minima. This highlights the importance of these editing sites and their strategic location. The most functionally relevant normal modes are known to be the low-frequency modes ("slow modes"), which are also the most collective. We examined the positions of the editing sites along the slow-mode profiles and obtained a trend similar to the one found with the theoretical b-factors: the editing sites are located almost exclusively at minima of the slow-mode profiles (Supporting Information Fig. S3). We next present examples of frequently edited sites that are predicted to have a strong deleterious effect. Some of these changes, although predicted to be deleterious at the protein level, might in fact be beneficial at the organism level.
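NumPy can compute ANM theoretical b-factors and slow-mode profiles in a few lines. This is a minimal sketch and not the authors' pipeline. It assumes C-alpha coordinates only, a uniform spring constant, and a 15 Å cutoff (a common ANM choice). It returns b-factors only up to the constant factor 8π²kT/(3γ). The window test compares each site with its sequence neighbors, and the three-residue flank on each side is an assumption. Helper names are hypothetical.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """3N x 3N Anisotropic Network Model Hessian for N C-alpha coordinates."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue  # no spring between residues beyond the cutoff
            block = -gamma * np.outer(d, d) / r2  # off-diagonal 3x3 super-element
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            hess[3*i:3*i+3, 3*i:3*i+3] -= block  # diagonal = minus sum of off-diagonals
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    return hess


def anm_modes(coords, cutoff=15.0):
    """Non-trivial normal modes, sorted from slowest (lowest eigenvalue) up."""
    vals, vecs = np.linalg.eigh(anm_hessian(coords, cutoff))
    return vals[6:], vecs[:, 6:]  # drop the six rigid-body (zero-eigenvalue) modes


def theoretical_bfactors(coords, cutoff=15.0):
    """Trace of each residue's 3x3 block of the Hessian pseudo-inverse
    (proportional to the theoretical b-factor)."""
    vals, vecs = anm_modes(coords, cutoff)
    inv = (vecs / vals) @ vecs.T  # sum over modes of v v^T / lambda
    return np.array([np.trace(inv[3*i:3*i+3, 3*i:3*i+3]) for i in range(len(coords))])


def slow_mode_profile(coords, n_modes=3, cutoff=15.0):
    """Per-residue squared displacement summed over the n slowest modes,
    each mode weighted by 1/eigenvalue."""
    vals, vecs = anm_modes(coords, cutoff)
    weighted = vecs[:, :n_modes] ** 2 / vals[:n_modes]
    return weighted.reshape(len(coords), 3, n_modes).sum(axis=(1, 2))


def is_window_minimum(profile, idx, flank=3):
    """True if residue idx has the lowest value within +/- flank residues."""
    lo, hi = max(0, idx - flank), min(len(profile), idx + flank + 1)
    return bool(profile[idx] <= np.min(profile[lo:hi]))
```

As a sanity check, an ideal α-helix (radius 2.3 Å, rise 1.5 Å, 100° per residue) gives the highest theoretical b-factors at the termini, as expected for an elastic network. An edited residue would count as "rigid" when `is_window_minimum(bfactors, site_index)` is true.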
Sites that programs classify as deleterious on the basis of single, isolated protein structures might therefore have a functional role that is not necessarily harmful at the system level.

Glutamate receptors (GRIA2/GRIA3 R-to-G, Q-to-R and GRIK2 Q-to-R)

We were reassured to find the known glutamate receptor 2/3 (GRIA2/GRIA3) R-to-G and Q-to-R sites in our data set of functional editing sites (GRIA3 R775G, chrX:, Uniprot id: P; GRIA2 Q607R, chr4:, Uniprot id: P; GRIA2 R764G, chr4:; hg19). These sites have low theoretical b-factors [Fig. 5(A,B)]. Four out of six prediction programs agree on the "deleterious" effect of GRIA2 R764G and Q607R and of GRIA3 R775G. As previously shown, the GRIA2 Q607R site is crucial to early postnatal development. 17,18 Similarly, the structural model of glutamate receptor ionotropic, kainate 2 (GRIK2; Uniprot id Q13002) shows a lower theoretical b-factor at Q621 [Fig. 5(C)], and four out of six prediction programs predict a "deleterious" effect of this editing. GRIK2 is known to be almost completely edited in gray matter (90% of the receptors) but much less edited in white matter (10% of the receptors). As in GRIA2, editing of GRIK2 at the Q621R site determines its permeability to Ca2+ ions. 84,85

Figure 5. Theoretical b-factor profiles of edited proteins. The amino acids modified by the editing events are marked by arrows. A. GRIA2, Q607R. B. GRIA3, R775G. C. GRIK2, Q621R. D. KCNA1, I400V (in this model, I400 corresponds to amino acid position 75). E. GIPC1, T62A. F. CCNI, R75G. In all cases the edited residue is located at a local minimum.

Potassium voltage-gated channel (KCNA1) I400V

Another fascinating example is the I400V site (chr12: /hg19) of the potassium voltage-gated channel subfamily A member 1 (KCNA1 or Kv1.1; Uniprot id: Q09470). This known editing site has been found to be conserved from human to Drosophila. 39 Five out of six prediction programs agree on the "deleterious" effect of this editing site. The site indeed has a low theoretical b-factor [Fig. 5(D), shown here as I75V], which supports its importance. KCNA1 I400V allows faster kinetics. 86 Moreover, a recent study has shown that I-to-V editing in a related potassium channel of the octopus is responsible for adaptation to cold water 87 by changing the channel closing speed.

Novel examples in which editing in the coding sequence is likely to have a functional role

Zinc finger protein 358 (ZN358) K382R

ZN358 (Zinc finger protein 358; Uniprot id: Q96SR6; reported editing level in RADAR DB %; edited in thyroid, testes, prostate, lymph nodes, lung, adipose, breast, and colon tissues) is a member of the zinc finger protein family. It is a DNA-binding protein involved in transcription regulation. It was found to be a pro-apoptotic tumor suppressor that is commonly silenced in cancer. 88 Its editing (chr19: /hg19) results in a K382R change at a conserved position within the DNA-binding domain [ZINC_FINGER_C2H2_1, Prosite id: PS00028, Fig. 6(A)]. The structural model shows that the post-edited arginine is farther from the DNA than the pre-edited lysine [from 3 Å to 7.7 Å, Fig. 6(B)]. Three out of six prediction programs consider this editing site deleterious.

Figure 6. Coding-changing editing sites at ZN358 K382R and COPA I164V. A. Sequence logo of the ZN358 ZINC_FINGER_C2H2_1 motif (Prosite ID: PS00028); K382 is marked with an arrow. B. Structural alignment of the ZN358 model (gray) with the template (pink; pdb id: 1mey). Left: pre-edited structure (K382). Right: post-edited structure (R382). K382 and R382 are colored green. C. Sequence logo of the COPA WD motif (Prosite id: PS00678); the edited I-to-V position is marked with an arrow. D. COPA structural model. Dark green: Ile. Light green: Val.

PDZ domain containing protein (GIPC1) T62A

GIPC1 (PDZ domain containing family, member 1; Uniprot id: O; editing level in Human BodyMap %; edited in prostate tissue) belongs to the GIPC protein family. This protein regulates cell surface receptor expression and protein trafficking. 89 RNA editing at chr19: /hg19 results in a T62A change, and the site has a low theoretical b-factor [Fig. 5(E)]. Two different prediction programs consider this editing site deleterious. Interestingly, modeling the edited amino acid with FoldX 71 produced a slightly more stable structure (ΔΔG , from the GIPC1 theoretical model).
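Site selection combines three conditions: an editing-level threshold of 5%, agreement among prediction programs (reported above as "two", "three", "four" or "five out of six"), and the availability of structural data. A minimal sketch of that filter follows. The data class, field names, and the use of >= for the threshold are assumptions. The example sites used to check it are hypothetical and are not entries from the study.

```python
from dataclasses import dataclass


@dataclass
class EditingSite:
    name: str                # hypothetical label for a coding editing site
    editing_level: float     # fraction of edited transcripts (0..1)
    deleterious_calls: int   # prediction programs calling the change deleterious
    has_structure: bool      # maps onto an experimental structure or reliable model


def prioritize(sites, min_level=0.05, min_calls=2):
    """Keep frequently edited, structurally mapped sites that at least
    `min_calls` predictors call deleterious; strongest consensus first."""
    kept = [
        s for s in sites
        if s.editing_level >= min_level
        and s.deleterious_calls >= min_calls
        and s.has_structure
    ]
    return sorted(kept, key=lambda s: (-s.deleterious_calls, -s.editing_level))
```

Ranking by the number of agreeing predictors first, and only then by editing level, reflects the emphasis the text places on consensus among independent tools.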

Figure 7. CCNI coding RNA editing. A. Sequence logo of the Cyclin motif (Prosite id: PS00292); the edited R75 is marked with an arrow. B. Structural model of CCNI aligned to its template (pdb ID: 1w98). The CCNI model appears in pink, the template CCNE in gray, and CDK2 in off-white. The edited R (R75 of CCNI) is shown as space fill and colored dark gray. C. Left: CCNI pre-editing model. Dark gray: R75. Green: L138. Blue: K137. Right: CCNI post-edited at R75G and K137R. Dark gray: G75. Green: L138. Blue: R137.

Coatomer subunit alpha (COPA) I164V

COPA (Coatomer subunit alpha; Uniprot id: P; reported editing level in RADAR DB %; edited in lung, lymph node, prostate, and ovary tissues) is a member of the non-clathrin-coated vesicular coat proteins (COPs), which mediate protein transport from the endoplasmic reticulum to the Golgi compartments in eukaryotic cells. 90 RNA editing at chr1: /hg19 results in an I164V change. The editing site is located in the WD40 repeat-like-containing domain [Prosite id: PS00678, Fig. 6(C,D)]. Despite the conservative amino acid change, four out of six prediction programs consider it deleterious. Interestingly, we also detected this editing site in zebrafish (Danio rerio). This position is generally conserved in evolution, 82 which supports its apparent importance, given that human and zebrafish diverged 450 million years ago. 91 The editing level of this site changes significantly during zebrafish development: 82% of the transcripts are edited at two days compared with 23% at six days of development (Wilcoxon P ; Supporting Information Fig. S4), based on zebrafish RNA-seq data 92 downloaded from SRA (SRA id: SRP013987).

Cyclin-I (CCNI) R75G

CCNI (Cyclin-I; Uniprot id: Q; editing level in Human BodyMap %; edited in lung, lymph node, prostate, skeletal muscle, white blood cell, ovary, testis, and thyroid tissues) is a member of the Cyclin protein family.
It has been shown to be highly expressed in post-differentiated cells. 93 Editing at its R75G site (chr4: /hg19) is predicted to be deleterious, with five out of six prediction methods agreeing on this call. According to the CCNI structural model [Fig. 7(B)], the modified amino acid is completely buried, so the change is likely to reduce thermostability. Inspection of the structure shows that R75G lies in the protein interior at a highly conserved position and, importantly, within the known functional Cyclin motif [Prosite id: PS00292, Fig. 7(A)]. The site occupies a central position in the protein's amino acid contact network, as reflected by its high closeness value (upper 4% of all residues in the structure, computed with SARIG 69 and Jamming 70). Both the theoretical b-factors and the experimental b-factors of the template (pdb id: 1w98) show that this site is less mobile and contributes to the cooperative motion of CCNI [Fig. 5(F)]. A conserved contact of R75 with L138 is abolished when R75 is replaced with Gly [Fig. 7(C)]. The amino acid encoded at a second editing site (K137R; chr4: /hg19; editing level in Human BodyMap = 2.8%) is spatially close to R75G. FoldX 71 predicts that the two editing sites together destabilize the protein structure (ΔΔG , based on the CCNI theoretical model). This energetic change is lower than the sum obtained by modeling the two sites independently (ΔΔG for K137R, ΔΔG for R75G, both based on the CCNI theoretical model), which supports a cooperative function of the two sites. High-throughput sequencing, validated by Sanger sequencing, showed that both editing events in CCNI co-occur in the same tissue. 94 Both editing sites were also reported in a cDNA from the same source (GenBank: CR541783), which further supports their probable cooperative function. Interestingly, CCNI is known to have constitutive mRNA expression, unlike most other cyclins, whose RNA levels fluctuate with the cell-cycle stage. Its expression level was reported to be elevated in post-differentiated cells. 93 Recently, it was shown that although CCNI mRNA expression does not change with the cell cycle, its protein level does. 95 In this regard, it will be interesting to examine whether and how RNA editing of the CCNI transcript contributes to its protein expression level along the cell cycle.

DISCUSSION

In the current study we used recent data on A-to-I RNA editing of coding regions in humans to identify common structural and functional preferences at the protein level. We suggest that only a few coding editing sites in humans are edited in a significant portion of the transcripts, change an amino acid, and significantly affect protein function.
In many cases, RNA editing events at conserved positions have deleterious effects and are selected against, so these sites are edited in only a tiny fraction of the transcripts. Under normal conditions, the modified protein products are then present only in marginal amounts. Other editing sites may have a neutral or small effect on protein function and are therefore not subject to strong negative selection. The editing level at such sites may be significantly higher, and these sites tend to be located at less conserved positions. Our results complement a recent study 74 which showed, mostly through evolutionary analyses, that there is selection against coding-changing editing sites within CDS and that nonsynonymous (NS) editing sites within CDS occur at a lower frequency than synonymous ones. Our results generally agree with this study, but we also show that, within the NS editing sites, the predicted deleterious level is anti-correlated with the editing level. Most importantly, our study demonstrates this trend through structural analysis of the modified proteins. CDS editing events represent a small minority of the editing events in the entire transcriptome. Low-level editing with a slight negative functional effect may be an inevitable byproduct of necessary editing events elsewhere, or even of editing-independent functions of the ADAR enzymes. Recently, it was shown that ADAR1 has an important editing-independent function in regulating miRNA expression in melanoma. 96 In addition, some editing sites predicted to be deleterious might play a beneficial role under specific conditions or during developmental changes. Alleles that are inferior under a given evolutionary pressure are often kept in the population because they may provide an advantage under a different selective pressure (examples are the sickle cell anemia mutation 97 and cystic fibrosis 98).
Editing levels can change quickly in response to environmental cues. 99 Such changes can give rise to "adaptive evolution" and allow the best-fitted transcript to be used at each condition or developmental stage. Still, this study shows that there is a relatively small set of sites whose editing level is high and that are nevertheless predicted to have a strong deleterious effect. This raises the question of why these deleterious events are common. We cannot rule out false-positive predictions of deleterious sites, but we believe they are, by and large, unlikely, because we included only sites for which two or more different tools indicated a functional effect. A more likely explanation is that the deleterious effect at the protein level contributes to normal regulation at the system level. Our procedure independently detected the known cases of GRIA2/3 R-to-G and Q-to-R and of KCNA1 I-to-V, 17,18,38,39 and predicted them to be deleterious, although editing in these genes is in fact vital for normal development. For the cases with high editing levels, analysis of protein structural models and protein dynamics enabled us to identify CCNI R75G, ZN358 K382R, GIPC1 T62A, and COPA I164V as new, interesting, and likely important coding editing sites. These sites are located both in evolutionarily conserved motifs and at structurally critical positions. Our study highlights the importance of structural and functional information as a supplement to the transcriptome data usually used in RNA editing research. More generally, it illustrates the need to incorporate structural information when interpreting and prioritizing specific single-nucleotide variants and when directing future experimental studies.

ACKNOWLEDGMENTS

The authors thank Ariel Azia and Limor Ziv-Strasser for helpful advice, Naa'ma Elefant for providing the script for mapping genomic coordinates into transcript coordinates, Ami Haviv for providing the script for analysis of the Human BodyMap data, and Jin Billy Li for sharing RADAR data with us before publication. The work of O.S. was done in partial fulfillment of the requirements of the Faculty of Life Sciences, Bar-Ilan University, Israel.

REFERENCES

1. Ramaswami G, Zhang R, Piskol R, Keegan LP, Deng P, O'Connell MA, Li JB. Identifying RNA editing sites using RNA sequencing data alone. Nat Methods 2013;10:
2. Ramaswami G, Lin W, Piskol R, Tan MH, Davis C, Li JB. Accurate identification of human Alu and non-Alu RNA editing sites. Nat Methods 2012;9:
3. Peng Z, Cheng Y, Tan BC, Kang L, Tian Z, Zhu Y, Zhang W, Liang Y, Hu X, Tan X, Guo J, Dong Z, Bao L, Wang J. Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome. Nat Biotechnol 2012;30:
4. Bahn JH, Lee JH, Li G, Greer C, Peng G, Xiao X. Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res 2012;22:
5. Li JB, Levanon EY, Yoon JK, Aach J, Xie B, Leproust E, Zhang K, Gao Y, Church GM. Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science 2009;324:
6. Maydanovych O, Beal PA. Breaking the central dogma by RNA editing. Chem Rev 2006;106:
7. Levanon K, Eisenberg E, Rechavi G, Levanon EY. Letter from the editor: adenosine-to-inosine RNA editing in Alu repeats in the human genome. EMBO Rep 2005;6:
8. Levanon EY, Eisenberg E, Yelin R, Nemzer S, Hallegger M, Shemesh R, Fligelman ZY, Shoshan A, Pollock SR, Sztybel D, Olshansky M, Rechavi G, Jantsch MF. Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat Biotechnol 2004;22:
9. Kim DDY, Kim TTY, Walsh T, Kobayashi Y, Matise TC, Buyske S, Gabriel A. Widespread RNA editing of embedded Alu elements in the human transcriptome. Genome Res 2004;14:
10. Athanasiadis A, Rich A, Maas S. Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol 2004;2:e
11. Bass BL. RNA editing and hypermutation by adenosine deamination. Trends Biochem Sci 1997;22:
12. Bass BL. RNA editing by adenosine deaminases that act on RNA. Annu Rev Biochem 2002;71:
13. Valente L, Nishikura K. ADAR gene family and A-to-I RNA editing: diverse roles in posttranscriptional gene regulation. Prog Nucleic Acid Res Mol Biol 2005;79:
14. Nishikura K. Functions and regulation of RNA editing by ADAR deaminases. Annu Rev Biochem 2009;79:
15. Hartner JC, Walkley CR, Lu J, Orkin SH. ADAR1 is essential for the maintenance of hematopoiesis and suppression of interferon signaling. Nat Immunol 2009;10:
16. Wang Q, Khillan J, Gadue P, Nishikura K. Requirement of the RNA editing deaminase ADAR1 gene for embryonic erythropoiesis. Science 2000;290:
17. Higuchi M, Maas S, Single FN, Hartner J, Rozov A, Burnashev N, Feldmeyer D, Sprengel R, Seeburg PH. Point mutation in an AMPA receptor gene rescues lethality in mice deficient in the RNA-editing enzyme ADAR2. Nature 2000;406:
18. Brusa R, Zimmermann F, Koh DS, Feldmeyer D, Gass P, Seeburg PH, Sprengel R. Early-onset epilepsy and postnatal lethality associated with an editing-deficient GluR-B allele in mice. Science 1995;270:
19. Maas S, Kawahara Y, Tamburro KM, Nishikura K. A-to-I RNA editing and human disease. RNA Biol 2006;3:
20. Kawahara Y, Ito K, Sun H, Aizawa H, Kanazawa I, Kwak S. Glutamate receptors: RNA editing and death of motor neurons. Nature 2004;427:
21. Rice GI, Kasher PR, Forte GM, Mannion NM, Greenwood SM, Szynkiewicz M, Dickerson JE, Bhaskar SS, Zampini M, Briggs TA, Jenkinson EM, Bacino CA, Battini R, Bertini E, Brogan PA, Brueton LA, Carpanelli M, De Laet C, de Lonlay P, Del Toro M, Desguerre I, Fazzi E, Garcia-Cazorla A, Heiberg A, Kawaguchi M, Kumar R, Lin JP, Lourenco CM, Male AM, Marques W Jr, Mignot C, Olivieri I, Orcesi S, Prabhakar P, Rasmussen M, Robinson RA, Rozenberg F, Schmidt JL, Steindl K, Tan TY, van der Merwe WG, Vanderver A, Vassallo G, Wakeling EL, Wassmer E, Whittaker E, Livingston JH, Lebon P, Suzuki T, McLaughlin PJ, Keegan LP, O'Connell MA, Lovell SC, Crow YJ. Mutations in ADAR1 cause Aicardi-Goutières syndrome associated with a type I interferon signature. Nat Genet 2012;44:
22. Paz N, Levanon EY, Amariglio N, Heimberger AB, Ram Z, Constantini S, Barbash ZS, Adamsky K, Safran M, Hirschberg A, Krupsky M, Ben-Dov I, Cazacu S, Mikkelsen T, Brodie C, Eisenberg E, Rechavi G. Altered adenosine-to-inosine RNA editing in human cancer. Genome Res 2007;17:
23. Dominissini D, Moshitch-Moshkovitz S, Amariglio N, Rechavi G. Adenosine-to-inosine RNA editing meets cancer. Carcinogenesis 2011;32:
24. Maas S, Patt S, Schrey M, Rich A. Underediting of glutamate receptor GluR-B mRNA in malignant gliomas. Proc Natl Acad Sci USA 2001;98:
25. Steinman RA, Yang Q, Gasparetto M, Robinson LJ, Liu X, Lenzner DE, Hou J, Smith C, Wang Q. Deletion of the RNA-editing enzyme ADAR1 causes regression of established chronic myelogenous leukemia in mice. Int J Cancer 2013;132:
26. Chen L, Li Y, Lin CH, Chan TH, Chow RK, Song Y, Liu M, Yuan YF, Fu L, Kong KL, Qi L, Zhang N, Tong AH, Kwong DL, Man K, Lo CM, Lok S, Tenen DG, Guan XY. Recoding RNA editing of AZIN1 predisposes to hepatocellular carcinoma. Nat Med 2013;19:
27. Bazak L, Haviv A, Barak M, Jacob-Hirsch J, Deng P, Zhang R, Isaacs FJ, Rechavi G, Li JB, Eisenberg E, Levanon EY. A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes. Genome Res 2013;24:
28. Alon S, Mor E, Vigneault F, Church GM, Locatelli F, Galeano F, Gallo A, Shomron N, Eisenberg E. Systematic identification of edited microRNAs in the human brain. Genome Res 2012;22:
29. Solomon O, Oren S, Safran M, Deshet-Unger N, Akiva P, Jacob-Hirsch J, Cesarkas K, Kabesa R, Amariglio N, Unger R, Rechavi G, Eyal E. Global regulation of alternative splicing by adenosine deaminase acting on RNA (ADAR). RNA 2013;19:
30. Lev-Maor G, Ram O, Kim E, Sela N, Goren A, Levanon EY, Ast G. Intronic Alus influence alternative splicing. PLoS Genet 2008;4:e
31. Lev-Maor G, Sorek R, Levanon EY, Paz N, Eisenberg E, Ast G. RNA-editing-mediated exon evolution. Genome Biol 2007;8:R
32. Beghini A, Ripamonti CB, Peterlongo P, Roversi G, Cairoli R, Morra E, Larizza L. RNA hyperediting and alternative splicing of hematopoietic cell phosphatase (PTPN6) gene in acute myeloid leukemia. Hum Mol Genet 2000;9:
33. Rueter SM, Dawson TR, Emeson RB. Regulation of alternative splicing by RNA editing. Nature 1999;399:
34. Scadden ADJ. The RISC subunit Tudor-SN binds to hyper-edited double-stranded RNA and promotes its cleavage. Nat Struct Mol Biol 2005;12:


More information

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc. Variant Classification Author: Mike Thiesen, Golden Helix, Inc. Overview Sequencing pipelines are able to identify rare variants not found in catalogs such as dbsnp. As a result, variants in these datasets

More information

RECAP (1)! In eukaryotes, large primary transcripts are processed to smaller, mature mrnas.! What was first evidence for this precursorproduct

RECAP (1)! In eukaryotes, large primary transcripts are processed to smaller, mature mrnas.! What was first evidence for this precursorproduct RECAP (1) In eukaryotes, large primary transcripts are processed to smaller, mature mrnas. What was first evidence for this precursorproduct relationship? DNA Observation: Nuclear RNA pool consists of

More information

DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mrna-seq data

DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mrna-seq data DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mrna-seq data Ting Gong, Joseph D. Szustakowski April 30, 2018 1 Introduction Heterogeneous tissues are frequently

More information

Supplemental Information For: The genetics of splicing in neuroblastoma

Supplemental Information For: The genetics of splicing in neuroblastoma Supplemental Information For: The genetics of splicing in neuroblastoma Justin Chen, Christopher S. Hackett, Shile Zhang, Young K. Song, Robert J.A. Bell, Annette M. Molinaro, David A. Quigley, Allan Balmain,

More information

Chapter 11 Gene Expression

Chapter 11 Gene Expression Chapter 11 Gene Expression 11-1 Control of Gene Expression Gene Expression- the activation of a gene to form a protein -a gene is on or expressed when it is transcribed. -cells do not always need to produce

More information

Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009

Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009 Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009 1 Abstract A stretch of chimpanzee DNA was annotated using tools including BLAST, BLAT, and Genscan. Analysis of Genscan predicted genes revealed

More information

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. Supplementary Figure 1 Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types. (a) Pearson correlation heatmap among open chromatin profiles of different

More information

High AU content: a signature of upregulated mirna in cardiac diseases

High AU content: a signature of upregulated mirna in cardiac diseases https://helda.helsinki.fi High AU content: a signature of upregulated mirna in cardiac diseases Gupta, Richa 2010-09-20 Gupta, R, Soni, N, Patnaik, P, Sood, I, Singh, R, Rawal, K & Rani, V 2010, ' High

More information

Nature Getetics: doi: /ng.3471

Nature Getetics: doi: /ng.3471 Supplementary Figure 1 Summary of exome sequencing data. ( a ) Exome tumor normal sample sizes for bladder cancer (BLCA), breast cancer (BRCA), carcinoid (CARC), chronic lymphocytic leukemia (CLLX), colorectal

More information

Variant Annotation and Functional Prediction

Variant Annotation and Functional Prediction Variant Annotation and Functional Prediction Copyrighted 2018 Isabelle Schrauwen and Suzanne M. Leal This exercise touches on several functionalities of the program ANNOVAR to annotate and interpret candidate

More information

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer Genetic alterations of histone lysine methyltransferases and their significance in breast cancer Supplementary Materials and Methods Phylogenetic tree of the HMT superfamily The phylogeny outlined in the

More information

EXPression ANalyzer and DisplayER

EXPression ANalyzer and DisplayER EXPression ANalyzer and DisplayER Tom Hait Aviv Steiner Igor Ulitsky Chaim Linhart Amos Tanay Seagull Shavit Rani Elkon Adi Maron-Katz Dorit Sagir Eyal David Roded Sharan Israel Steinfeld Yossi Shiloh

More information

Bioinformation Volume 5

Bioinformation Volume 5 Do N-glycoproteins have preference for specific sequons? R Shyama Prasad Rao 1, 2, *, Bernd Wollenweber 1 1 Aarhus University, Department of Genetics and Biotechnology, Forsøgsvej 1, Slagelse 4200, Denmark;

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality. Supplementary Figure 1 Assessment of sample purity and quality. (a) Hematoxylin and eosin staining of formaldehyde-fixed, paraffin-embedded sections from a human testis biopsy collected concurrently with

More information

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data Breast cancer Inferring Transcriptional Module from Breast Cancer Profile Data Breast Cancer and Targeted Therapy Microarray Profile Data Inferring Transcriptional Module Methods CSC 177 Data Warehousing

More information

The Emergence of Alternative 39 and 59 Splice Site Exons from Constitutive Exons

The Emergence of Alternative 39 and 59 Splice Site Exons from Constitutive Exons The Emergence of Alternative 39 and 59 Splice Site Exons from Constitutive Exons Eli Koren, Galit Lev-Maor, Gil Ast * Department of Human Molecular Genetics, Sackler Faculty of Medicine, Tel Aviv University,

More information

Mechanisms of alternative splicing regulation

Mechanisms of alternative splicing regulation Mechanisms of alternative splicing regulation The number of mechanisms that are known to be involved in splicing regulation approximates the number of splicing decisions that have been analyzed in detail.

More information

Ch. 18 Regulation of Gene Expression

Ch. 18 Regulation of Gene Expression Ch. 18 Regulation of Gene Expression 1 Human genome has around 23,688 genes (Scientific American 2/2006) Essential Questions: How is transcription regulated? How are genes expressed? 2 Bacteria regulate

More information

SUPPLEMENTARY FIGURE LEGENDS

SUPPLEMENTARY FIGURE LEGENDS SUPPLEMENTARY FIGURE LEGENDS Supplementary Figure 1 Negative correlation between mir-375 and its predicted target genes, as demonstrated by gene set enrichment analysis (GSEA). 1 The correlation between

More information

LESSON 3.2 WORKBOOK. How do normal cells become cancer cells? Workbook Lesson 3.2

LESSON 3.2 WORKBOOK. How do normal cells become cancer cells? Workbook Lesson 3.2 For a complete list of defined terms, see the Glossary. Transformation the process by which a cell acquires characteristics of a tumor cell. LESSON 3.2 WORKBOOK How do normal cells become cancer cells?

More information

Finding subtle mutations with the Shannon human mrna splicing pipeline

Finding subtle mutations with the Shannon human mrna splicing pipeline Finding subtle mutations with the Shannon human mrna splicing pipeline Presentation at the CLC bio Medical Genomics Workshop American Society of Human Genetics Annual Meeting November 9, 2012 Peter K Rogan

More information

Deploying the full transcriptome using RNA sequencing. Jo Vandesompele, CSO and co-founder The Non-Coding Genome May 12, 2016, Leuven

Deploying the full transcriptome using RNA sequencing. Jo Vandesompele, CSO and co-founder The Non-Coding Genome May 12, 2016, Leuven Deploying the full transcriptome using RNA sequencing Jo Vandesompele, CSO and co-founder The Non-Coding Genome May 12, 2016, Leuven Roadmap Biogazelle the power of RNA reasons to study non-coding RNA

More information

Identifying Mutations Responsible for Rare Disorders Using New Technologies

Identifying Mutations Responsible for Rare Disorders Using New Technologies Identifying Mutations Responsible for Rare Disorders Using New Technologies Jacek Majewski, Department of Human Genetics, McGill University, Montreal, QC Canada Mendelian Diseases Clear mode of inheritance

More information

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis Jian Xu, Ph.D. Children s Research Institute, UTSW Introduction Outline Overview of genomic and next-gen sequencing technologies

More information

Biochemistry of Carcinogenesis. Lecture # 35 Alexander N. Koval

Biochemistry of Carcinogenesis. Lecture # 35 Alexander N. Koval Biochemistry of Carcinogenesis Lecture # 35 Alexander N. Koval What is Cancer? The term "cancer" refers to a group of diseases in which cells grow and spread unrestrained throughout the body. It is difficult

More information

Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first

Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first Supplementary Figure 1: Features of IGLL5 Mutations in CLL: a) Representative IGV screenshot of first intron IGLL5 mutation depicting biallelic mutations. Red arrows highlight the presence of out of phase

More information

Molecular Biology (BIOL 4320) Exam #2 May 3, 2004

Molecular Biology (BIOL 4320) Exam #2 May 3, 2004 Molecular Biology (BIOL 4320) Exam #2 May 3, 2004 Name SS# This exam is worth a total of 100 points. The number of points each question is worth is shown in parentheses after the question number. Good

More information

micrornas (mirna) and Biomarkers

micrornas (mirna) and Biomarkers micrornas (mirna) and Biomarkers Small RNAs Make Big Splash mirnas & Genome Function Biomarkers in Cancer Future Prospects Javed Khan M.D. National Cancer Institute EORTC-NCI-ASCO November 2007 The Human

More information

Assignment 5: Integrative epigenomics analysis

Assignment 5: Integrative epigenomics analysis Assignment 5: Integrative epigenomics analysis Due date: Friday, 2/24 10am. Note: no late assignments will be accepted. Introduction CpG islands (CGIs) are important regulatory regions in the genome. What

More information

REGULATED SPLICING AND THE UNSOLVED MYSTERY OF SPLICEOSOME MUTATIONS IN CANCER

REGULATED SPLICING AND THE UNSOLVED MYSTERY OF SPLICEOSOME MUTATIONS IN CANCER REGULATED SPLICING AND THE UNSOLVED MYSTERY OF SPLICEOSOME MUTATIONS IN CANCER RNA Splicing Lecture 3, Biological Regulatory Mechanisms, H. Madhani Dept. of Biochemistry and Biophysics MAJOR MESSAGES Splice

More information

HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA LEO TUNKLE *

HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA   LEO TUNKLE * CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA Email:

More information

Table S1: Kinetic parameters of drug and substrate binding to wild type and HIV-1 protease variants. Data adapted from Ref. 6 in main text.

Table S1: Kinetic parameters of drug and substrate binding to wild type and HIV-1 protease variants. Data adapted from Ref. 6 in main text. Dynamical Network of HIV-1 Protease Mutants Reveals the Mechanism of Drug Resistance and Unhindered Activity Rajeswari Appadurai and Sanjib Senapati* BJM School of Biosciences and Department of Biotechnology,

More information

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies 2017 Contents Datasets... 2 Protein-protein interaction dataset... 2 Set of known PPIs... 3 Domain-domain interactions...

More information

A Quick-Start Guide for rseqdiff

A Quick-Start Guide for rseqdiff A Quick-Start Guide for rseqdiff Yang Shi (email: shyboy@umich.edu) and Hui Jiang (email: jianghui@umich.edu) 09/05/2013 Introduction rseqdiff is an R package that can detect differential gene and isoform

More information

Long non-coding RNAs

Long non-coding RNAs Long non-coding RNAs Dominic Rose Bioinformatics Group, University of Freiburg Bled, Feb. 2011 Outline De novo prediction of long non-coding RNAs (lncrnas) Genome-wide RNA gene-finding Intrinsic properties

More information

Module 3: Pathway and Drug Development

Module 3: Pathway and Drug Development Module 3: Pathway and Drug Development Table of Contents 1.1 Getting Started... 6 1.2 Identifying a Dasatinib sensitive cancer signature... 7 1.2.1 Identifying and validating a Dasatinib Signature... 7

More information

AD (Leave blank) TITLE: Genomic Characterization of Brain Metastasis in Non-Small Cell Lung Cancer Patients

AD (Leave blank) TITLE: Genomic Characterization of Brain Metastasis in Non-Small Cell Lung Cancer Patients AD (Leave blank) Award Number: W81XWH-12-1-0444 TITLE: Genomic Characterization of Brain Metastasis in Non-Small Cell Lung Cancer Patients PRINCIPAL INVESTIGATOR: Mark A. Watson, MD PhD CONTRACTING ORGANIZATION:

More information

Supplementary Tables. Supplementary Figures

Supplementary Tables. Supplementary Figures Supplementary Files for Zehir, Benayed et al. Mutational Landscape of Metastatic Cancer Revealed from Prospective Clinical Sequencing of 10,000 Patients Supplementary Tables Supplementary Table 1: Sample

More information

Peptide hydrolysis uncatalyzed half-life = ~450 years HIV protease-catalyzed half-life = ~3 seconds

Peptide hydrolysis uncatalyzed half-life = ~450 years HIV protease-catalyzed half-life = ~3 seconds Uncatalyzed half-life Peptide hydrolysis uncatalyzed half-life = ~450 years IV protease-catalyzed half-life = ~3 seconds Life Sciences 1a Lecture Slides Set 9 Fall 2006-2007 Prof. David R. Liu In the absence

More information

Supplementary Figure 1

Supplementary Figure 1 Count Count Supplementary Figure 1 Coverage per amplicon for error-corrected sequencing experiments. Errorcorrected consensus sequence (ECCS) coverage was calculated for each of the 568 amplicons in the

More information

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16 PGAR: ASD Candidate Gene Prioritization System Using Expression Patterns Steven Cogill and Liangjiang Wang Department of Genetics and

More information

RNA Processing in Eukaryotes *

RNA Processing in Eukaryotes * OpenStax-CNX module: m44532 1 RNA Processing in Eukaryotes * OpenStax This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 By the end of this section, you

More information

mirna Dr. S Hosseini-Asl

mirna Dr. S Hosseini-Asl mirna Dr. S Hosseini-Asl 1 2 MicroRNAs (mirnas) are small noncoding RNAs which enhance the cleavage or translational repression of specific mrna with recognition site(s) in the 3 - untranslated region

More information

High Throughput Sequence (HTS) data analysis. Lei Zhou

High Throughput Sequence (HTS) data analysis. Lei Zhou High Throughput Sequence (HTS) data analysis Lei Zhou (leizhou@ufl.edu) High Throughput Sequence (HTS) data analysis 1. Representation of HTS data. 2. Visualization of HTS data. 3. Discovering genomic

More information

WHOLE EXOME SEQUENCING PIPELINE EVALUATION AND MUTATION DETECTION IN ESOPHAGEAL CANCER PATIENTS

WHOLE EXOME SEQUENCING PIPELINE EVALUATION AND MUTATION DETECTION IN ESOPHAGEAL CANCER PATIENTS WHOLE EXOME SEQUENCING PIPELINE EVALUATION AND MUTATION DETECTION IN ESOPHAGEAL CANCER PATIENTS SUMMARY Tran Thi Bich Ngoc 1 ; Ho Viet Hoanh 2 ; Vu Phuong Nhung 1 ; Nguyen Hai Ha 1 Nguyen Van Ba 2 ; Nguyen

More information

The Basics: A general review of molecular biology:

The Basics: A general review of molecular biology: The Basics: A general review of molecular biology: DNA Transcription RNA Translation Proteins DNA (deoxy-ribonucleic acid) is the genetic material It is an informational super polymer -think of it as the

More information

fl/+ KRas;Atg5 fl/+ KRas;Atg5 fl/fl KRas;Atg5 fl/fl KRas;Atg5 Supplementary Figure 1. Gene set enrichment analyses. (a) (b)

fl/+ KRas;Atg5 fl/+ KRas;Atg5 fl/fl KRas;Atg5 fl/fl KRas;Atg5 Supplementary Figure 1. Gene set enrichment analyses. (a) (b) KRas;At KRas;At KRas;At KRas;At a b Supplementary Figure 1. Gene set enrichment analyses. (a) GO gene sets (MSigDB v3. c5) enriched in KRas;Atg5 fl/+ as compared to KRas;Atg5 fl/fl tumors using gene set

More information

Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc.

Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Complete Genomics, Inc. Dr Rick Tearle Senior Applications Specialist, EMEA Complete Genomics Topics Overview of Data Processing Pipeline Overview of Data Files 2 DNA Nano-Ball (DNB) Read Structure Genome : acgtacatgcattcacacatgcttagctatctctcgccag

More information

SUPPLEMENTARY INFORMATION. Intron retention is a widespread mechanism of tumor suppressor inactivation.

SUPPLEMENTARY INFORMATION. Intron retention is a widespread mechanism of tumor suppressor inactivation. SUPPLEMENTARY INFORMATION Intron retention is a widespread mechanism of tumor suppressor inactivation. Hyunchul Jung 1,2,3, Donghoon Lee 1,4, Jongkeun Lee 1,5, Donghyun Park 2,6, Yeon Jeong Kim 2,6, Woong-Yang

More information

Alternative RNA processing: Two examples of complex eukaryotic transcription units and the effect of mutations on expression of the encoded proteins.

Alternative RNA processing: Two examples of complex eukaryotic transcription units and the effect of mutations on expression of the encoded proteins. Alternative RNA processing: Two examples of complex eukaryotic transcription units and the effect of mutations on expression of the encoded proteins. The RNA transcribed from a complex transcription unit

More information

Introduction to Genetics

Introduction to Genetics Introduction to Genetics Table of contents Chromosome DNA Protein synthesis Mutation Genetic disorder Relationship between genes and cancer Genetic testing Technical concern 2 All living organisms consist

More information

Histones modifications and variants

Histones modifications and variants Histones modifications and variants Dr. Institute of Molecular Biology, Johannes Gutenberg University, Mainz www.imb.de Lecture Objectives 1. Chromatin structure and function Chromatin and cell state Nucleosome

More information

The process of RNA editing is a widespread phenomenon in

The process of RNA editing is a widespread phenomenon in Underediting of glutamate receptor GluR-B mrna in malignant gliomas Stefan Maas*, Stephan Patt, Michael Schrey, and Alexander Rich* *Department of Biology, Massachusetts Institute of Technology, 77 Massachusetts

More information

MutationTaster & RegulationSpotter

MutationTaster & RegulationSpotter MutationTaster & RegulationSpotter Pathogenicity Prediction of Sequence Variants: Past, Present and Future Dr. rer. nat. Jana Marie Schwarz Klinik für Pädiatrie m. S. Neurologie Exzellenzcluster NeuroCure

More information

Multifactorial Interplay Controls the Splicing Profile of Alu-Derived Exons

Multifactorial Interplay Controls the Splicing Profile of Alu-Derived Exons MOLECULAR AND CELLULAR BIOLOGY, May 2008, p. 3513 3525 Vol. 28, No. 10 0270-7306/08/$08.00 0 doi:10.1128/mcb.02279-07 Copyright 2008, American Society for Microbiology. All Rights Reserved. Multifactorial

More information

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014 Not IN Our Genes - A Different Kind of Inheritance! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014 Epigenetics in Mainstream Media Epigenetics *Current definition:

More information

Deciphering the Role of micrornas in BRD4-NUT Fusion Gene Induced NUT Midline Carcinoma

Deciphering the Role of micrornas in BRD4-NUT Fusion Gene Induced NUT Midline Carcinoma www.bioinformation.net Volume 13(6) Hypothesis Deciphering the Role of micrornas in BRD4-NUT Fusion Gene Induced NUT Midline Carcinoma Ekta Pathak 1, Bhavya 1, Divya Mishra 1, Neelam Atri 1, 2, Rajeev

More information

Supplementary Figure 1. Estimation of tumour content

Supplementary Figure 1. Estimation of tumour content Supplementary Figure 1. Estimation of tumour content a, Approach used to estimate the tumour content in S13T1/T2, S6T1/T2, S3T1/T2 and S12T1/T2. Tissue and tumour areas were evaluated by two independent

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Rates of different mutation types in CRC.

Nature Genetics: doi: /ng Supplementary Figure 1. Rates of different mutation types in CRC. Supplementary Figure 1 Rates of different mutation types in CRC. (a) Stratification by mutation type indicates that C>T mutations occur at a significantly greater rate than other types. (b) As for the

More information

Cross species analysis of genomics data. Computational Prediction of mirnas and their targets

Cross species analysis of genomics data. Computational Prediction of mirnas and their targets 02-716 Cross species analysis of genomics data Computational Prediction of mirnas and their targets Outline Introduction Brief history mirna Biogenesis Why Computational Methods? Computational Methods

More information

Annotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W

Annotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W Annotation of Drosophila mojavensis fosmid 8 Priya Srikanth Bio 434W 5.1.2007 Overview High-quality finished sequence is much more useful for research once it is annotated. Annotation is a fundamental

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION doi:10.1038/nature12864 Supplementary Table 1 1 2 3 4 5 6 7 Peak Gene code Screen Function or Read analysis AMP reads camp annotation reads minor Tb927.2.1810 AMP ISWI Confirmed

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature14008 Supplementary Figure 1. Sequence alignment of A/little yellow-shouldered bat/guatemala/060/2010 (H17N10) polymerase with that of human strain A/Victoria/3/75(H3N2). The secondary

More information

Regulation of Gene Expression in Eukaryotes

Regulation of Gene Expression in Eukaryotes Ch. 19 Regulation of Gene Expression in Eukaryotes BIOL 222 Differential Gene Expression in Eukaryotes Signal Cells in a multicellular eukaryotic organism genetically identical differential gene expression

More information

Circular RNAs (circrnas) act a stable mirna sponges

Circular RNAs (circrnas) act a stable mirna sponges Circular RNAs (circrnas) act a stable mirna sponges cernas compete for mirnas Ancestal mrna (+3 UTR) Pseudogene RNA (+3 UTR homolgy region) The model holds true for all RNAs that share a mirna binding

More information

TITLE: The Role Of Alternative Splicing In Breast Cancer Progression

TITLE: The Role Of Alternative Splicing In Breast Cancer Progression AD Award Number: W81XWH-06-1-0598 TITLE: The Role Of Alternative Splicing In Breast Cancer Progression PRINCIPAL INVESTIGATOR: Klemens J. Hertel, Ph.D. CONTRACTING ORGANIZATION: University of California,

More information

MIR retrotransposon sequences provide insulators to the human genome

MIR retrotransposon sequences provide insulators to the human genome Supplementary Information: MIR retrotransposon sequences provide insulators to the human genome Jianrong Wang, Cristina Vicente-García, Davide Seruggia, Eduardo Moltó, Ana Fernandez- Miñán, Ana Neto, Elbert

More information