Using expression profiling data to identify human microrna targets

Similar documents
Bayesian Learning of MicroRNA Targets from Sequence and Expression Data

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

Comparison of open chromatin regions between dentate granule cells and other tissues and neural cell types.

High AU content: a signature of upregulated mirna in cardiac diseases

Prediction of micrornas and their targets

Cross species analysis of genomics data. Computational Prediction of mirnas and their targets

Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities

Profiles of gene expression & diagnosis/prognosis of cancer. MCs in Advanced Genetics Ainoa Planas Riverola

SUPPLEMENTARY FIGURE LEGENDS

Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project

Circular RNAs (circrnas) act a stable mirna sponges

a) List of KMTs targeted in the shrna screen. The official symbol, KMT designation,

Bi 8 Lecture 17. interference. Ellen Rothenberg 1 March 2016

mirna Dr. S Hosseini-Asl

Package TargetScoreData

micrornas (mirna) and Biomarkers

he micrornas of Caenorhabditis elegans (Lim et al. Genes & Development 2003)

Analysis of paired mirna-mrna microarray expression data using a stepwise multiple linear regression model

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

MicroRNA expression profiling and functional analysis in prostate cancer. Marco Folini s.c. Ricerca Traslazionale DOSL

MicroRNA and Male Infertility: A Potential for Diagnosis

Computational Analysis of mirna and Target mrna Interactions: Combined Effects of The Quantity and Quality of Their Binding Sites *

CRS4 Seminar series. Inferring the functional role of micrornas from gene expression data CRS4. Biomedicine. Bioinformatics. Paolo Uva July 11, 2012

Research Article Base Composition Characteristics of Mammalian mirnas

Analysis of small RNAs from Drosophila Schneider cells using the Small RNA assay on the Agilent 2100 bioanalyzer. Application Note

Soft Agar Assay. For each cell pool, 100,000 cells were resuspended in 0.35% (w/v)

On the Reproducibility of TCGA Ovarian Cancer MicroRNA Profiles

Systematic exploration of autonomous modules in noisy microrna-target networks for testing the generality of the cerna hypothesis

Inferring condition-specific mirna activity from matched mirna and mrna expression data

Impact of mirna Sequence on mirna Expression and Correlation between mirna Expression and Cell Cycle Regulation in Breast Cancer Cells

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor suppressor genes

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

MicroRNAs, RNA Modifications, RNA Editing. Bora E. Baysal MD, PhD Oncology for Scientists Lecture Tue, Oct 17, 2017, 3:30 PM - 5:00 PM

38 Int'l Conf. Bioinformatics and Computational Biology BIOCOMP'16

Small RNAs and how to analyze them using sequencing

Supporting Information

Chapter 2. Investigation into mir-346 Regulation of the nachr α5 Subunit

IPA Advanced Training Course

Supplementary Figure 1

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015

microrna Presented for: Presented by: Date:

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Paternal exposure and effects on microrna and mrna expression in developing embryo. Department of Chemical and Radiation Nur Duale

SUPPLEMENTARY INFORMATION. Supp. Fig. 1. Autoimmunity. Tolerance APC APC. T cell. T cell. doi: /nature06253 ICOS ICOS TCR CD28 TCR CD28

Supplementary Figure 1

Concordant Regulation of Translation and mrna Abundance for Hundreds of Targets of a Human microrna

Micro RNA Research. Ken Kosik. Harriman Professor, Department of Molecular, Cellular & Developmental Biology and Biomolecular Sciences & Engr.

Nature Structural & Molecular Biology: doi: /nsmb.2419

CONTRACTING ORGANIZATION: Baylor College of Medicine Houston, TX 77030

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Analysis of Massively Parallel Sequencing Data Application of Illumina Sequencing to the Genetics of Human Cancers

RNA- seq Introduc1on. Promises and pi7alls

Functions of microrna in response to cocaine stimulation

STAT1 regulates microrna transcription in interferon γ stimulated HeLa cells

RNA-seq Introduction

Lentiviral Delivery of Combinatorial mirna Expression Constructs Provides Efficient Target Gene Repression.

Title: Human breast cancer associated fibroblasts exhibit subtype specific gene expression profiles

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

microrna PCR System (Exiqon), following the manufacturer s instructions. In brief, 10ng of

Nature Genetics: doi: /ng.3731

NIH Public Access Author Manuscript Nat Struct Mol Biol. Author manuscript; available in PMC 2012 April 01.

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.

Genomic analysis of essentiality within protein networks

Deciphering the Role of micrornas in BRD4-NUT Fusion Gene Induced NUT Midline Carcinoma

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

Molecular BioSystems PAPER. Gene module based regulator inference identifying mir-139 as a tumor suppressor in colorectal cancer.

Gene-microRNA network module analysis for ovarian cancer

Expanded View Figures

V16: involvement of micrornas in GRNs

MicroRNA-mediated incoherent feedforward motifs are robust

mirna-guided regulation at the molecular level

Cancer and Gene Alterations - 1

Figure S1. Analysis of endo-sirna targets in different microarray datasets. The

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

Downregulation of serum mir-17 and mir-106b levels in gastric cancer and benign gastric diseases

Mature microrna identification via the use of a Naive Bayes classifier

Nature Immunology: doi: /ni Supplementary Figure 1. Transcriptional program of the TE and MP CD8 + T cell subsets.

A novel and universal method for microrna RT-qPCR data normalization

Cellecta Overview. Started Operations in 2007 Headquarters: Mountain View, CA

SSM signature genes are highly expressed in residual scar tissues after preoperative radiotherapy of rectal cancer.

PREPARED FOR: U.S. Army Medical Research and Materiel Command Fort Detrick, MD

Post-transcriptional regulation of an intronic microrna

of TERT, MLL4, CCNE1, SENP5, and ROCK1 on tumor development were discussed.

microrna-200b and microrna-200c promote colorectal cancer cell proliferation via

SC-L-H shared(37) Specific (1)

Marta Puerto Plasencia. microrna sponges

RNA interference induced hepatotoxicity results from loss of the first synthesized isoform of microrna-122 in mice

NOVEL FUNCTION OF mirnas IN REGULATING GENE EXPRESSION. Ana M. Martinez

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION

Simple, rapid, and reliable RNA sequencing

Annotation of Chimp Chunk 2-10 Jerome M Molleston 5/4/2009

SUPPLEMENTARY INFORMATION

MicroRNAs: novel regulators in skin research

omiras: MicroRNA regulation of gene expression

Abstract. Patricia G. Melloy*

AbSeq on the BD Rhapsody system: Exploration of single-cell gene regulation by simultaneous digital mrna and protein quantification

Transcription:

Using expression profiling data to identify human microrna targets Jim C Huang 1,7, Tomas Babak 2,7, Timothy W Corson 2,3,6, Gordon Chua 4, Sofia Khan 3, Brenda L Gallie 2,3, Timothy R Hughes 2,4, Benjamin J Blencowe 2,4, Brendan J Frey 1,4,5 & Quaid D Morris 2,4,5 We demonstrate that paired expression profiles of micrornas (mirnas) and mrnas can be used to identify functional mirnatarget relationships with high precision. We used a Bayesian data analysis algorithm, GenMiR++, to identify a network of 1,597 high-confidence target predictions for 14 human mirnas, which was supported by RNA expression data across 88 tissues and cell types, sequence complementarity and comparative genomics data. We experimentally verified our predictions by investigating the result of downregulation in retinoblastoma using quantitative reverse transcriptase (RT)-PCR and microarray profiling: some of our verified targets include and. Compared to sequence-based predictions, our highscoring GenMiR++ predictions had much more consistent Gene Ontology annotations and were more accurate predictors of which mrna levels respond to changes in levels. mirnas are short endogenous noncoding transcripts with a widespread role in post-transcriptional regulation in higher eukaryotes. mirnas downregulate the expression of their target mrnas by complementary sequence binding that either causes the inhibition of translational initiation or leads to mrna degradation, presumably after translocation to P-bodies 1.ItisthoughtthatmiRNAs make an important contribution to the regulation of gene expression and that their misregulation is implicated in cancer onset and progression. Of the hundreds of known human mirnas, however, only a handful have been experimentally linked to specific functions. Accurately determining the full repertoire of mirna targets is now a major bottleneck in mirna functional characterization. Complete specification of the distinguishing targeting features remains elusive, although measures of hybridization affinity, phylogenetic conservation of target sites and target-site accessibility have proved to be informative 2 6. One of the most accurate predictors of targeting is the presence of a conserved 6 8-bp seed region of exact Watson-Crick complementary between the 3 untranslated region (UTR) of the mrna and the 5 end of the mirna, typically between nucleotides 2 7 (ref. 4). However, as many as 4% of seed region matches that are conserved between human and chicken are false positives 4. Target predictions based on less conservation have higher sensitivity but even higher false positive rates, necessitating laboratory verification to warrant functional claims about individual mirna-mrna interactions. Precise identification of mirna targets is critical to advancing our understanding of human diseases arising from mirna misregulation, and accurate identification of physiologically active mirna targets is now a considerable impediment to the functional characterization of individual mirnas. The regulatory role of mirnas makes them strong candidate oncogenes and tumor suppressors: many examples have been found of mirna misregulation in tumors 7, and mirna expression levels are often more predictive of cancer subtypes than corresponding mrna levels 8. Example of both oncogenic mirnas 9 and those that suppress tumors 1,11 have been reported. Simultaneous profiling of mirna and mrna expression may be a timely strategy to achieve the required precision in the identification of functional mirna targets. Many mirnas cause degradation of their targets 12 and a large number of mrnas are regulated in this way 13,14. Expression profiling of mirnas and the mrnas that they target for degradation should reveal an inverse relationship between the expression profile of the mirna and that of its targets. This inverse expression pattern has been observed in distributions of expression profiles of predicted targets of tissuespecific mirnas and more widely expressed mirnas 13 16. To our knowledge, there has been no practical demonstration that paired mirna and mrna expression profiles can be used to support mirna target prediction. This demonstration is needed because inverse expression signals previously detected when examining the average behavior of all of the predicted targets of a mirna may be masked by individual differences in transcriptional and post-transcriptional regulation. We applied a highly sensitive data analysis procedure that accounts for individual differences in regulation by scoring candidate mirna targets using paired mirna-mrna expression profiling 1 Department of Electrical and Computer Engineering, University of Toronto, 1 King s College Road, Toronto, Ontario M5S 3G4, Canada. 2 Department of Molecular and Medical Genetics, University of Toronto, 1 King s College Rd., Toronto, Ontario M5S 1A8, Canada. 3 Division of Applied Molecular Oncology, Ontario Cancer Institute/ Princess Margaret Hospital, University Health Network, Toronto, Ontario M5G 2M9, Canada. 4 Banting and Best Department of Medical Research, University of Toronto, 16 College Street, Toronto, Ontario M5G 1L6, Canada. 5 Department of Computer Science, University of Toronto, 1 King s College Road, Toronto, Ontario M5S 3G4, Canada. 6 Present address: Department of Molecular, Cellular and Developmental Biology, Yale University, P.O. Box 2813, New Haven, Connecticut 652, USA. 7 These authors contributed equally to this work. Correspondence should be addressed to Q.D.M. (quaid.morris@utoronto.ca) or B.J.F. (frey@psi.toronto.eu). RECEIVED 12 OCTOBER; ACCEPTED 23 OCTOBER; PUBLISHED ONLINE 18 NOVEMBER 27; DOI:138/NMETH113 NATURE METHODS VOL.4 NO.12 DECEMBER 27 145

data. Our procedure, called GenMiR++ (Generative model for mirna regulation), achieves its high sensitivity by considering all other predicted mirna regulators of an mrna and balancing different sources of uncertainty when calculating its score 15 (Supplementary Figs. 1 2 online). Through the use of our procedure, we demonstrate that the mirna degradation signal can be used to vet individual mirna sequence based target predictions; by doing so, we established the GenMiR++ network of 1,597 high-confidence targets for 14 human mirnas using a Bayesian inference algorithm. We demonstrate the biological validity of this network by focusing on, which we identified as a potential tumorsuppressor in retinoblastoma. RESULTS Scoring degradation signatures using GenMiR++ We compiled mirna and mrna expression data (Supplementary Methods online) for 151 human mirnas and 16,63 mrnas across a mixture of 88 normal and cancerous tissue samples common to both datasets 8,17. We obtained target predictions for 114 of the profiled mirnas, covering a total of 89 unique profiled mrnas, from the TargetScanS track of the University of California Santa Cruz (UCSC) genome browser. In total, we associated both mirna and mrna expression profiles with 6,387 of the predicted target pairs. We scored each mirna-mrna pair for the presence of an inverse expression pattern using the GenMiR++ procedure, which evaluates the degree to which the expression of the mirna explains the pattern of downregulation of the mrna expression levels under the approximation that the mrna transcription rate in a given sample is constant across all mrnas. The scores that GenMiR++ assigns to each pair depend not only on the expression profiles of the predicted mirna regulator and the mrna target but also on the profiles of all other predicted mirnas for the same target (Supplementary Methods). GenMiR++ also balances different sources of uncertainty before scoring potential relationships. In general, mirna-mrna target pairs are penalized if both the mirna and mrna were highly expressed in the same tissue and are rewarded if the mirna was highly expressed in tissues that the mrna had low expression in, especially if few of its other predicted mirna regulators were highly expressed therein. Gene Ontology enrichment of GenMiR++-predicted target sets We reasoned that if GenMiR++ had successfully identified the functional mirna targets among the sequence-based predictions, then the set of high-scoring targeted mrnas for each mirna should have more consistent Gene Ontology annotations than random subsets of the sequence-based predictions. To test this, we downloaded Gene Ontology Biological Process (GO-BP) annotations from the Gene Ontology Annotation Database 18. For each mirna, we compare the Gene Ontology categories enrichments of two subsets of its TargetScanS-predicted targets: those with Gen- MiR++ scores in the top 5 th percentile and a random subset of equal size. We scored the Gene Ontology enrichment within the target sets using Fisher s exact test (Supplementary Methods) for all GO-BP categories associated with at least five Ensembl proteins. As a control to ensure that the observed enrichment was not due to coexpression of the GenMiR++ targets or to indirect regulation of the candidate targets by the mirna, we also computed enrichment scores from sets of mrnas which were coexpressed with the GenMiR++ targets but were not targeted by any mirnas. We found significantly greater consistency in Gene Ontology annotation among the set of GenMiR++ targets than in the random subsets (Supplementary Fig. 3a online; P o 1 1,Wilcoxon- Mann-Whitney test) as well as significantly less consistency in the Gene Ontology annotation among the coexpressed sets (Supplementary Fig. 3b; P o 1 1, Wilcoxon-Mann-Whitney test). A list of significant mirna Gene Ontology associations is available in Supplementary Fig. 3c and Supplementary Table 1 online. The high-confidence GenMiR++ mirna target network Having established that GenMiR++ scoring enriches for more consistent GO-BP annotation, we threshholded the GenMiR++ score to select a high-confidence set of predicted functional targets from the sequence-based predictions. We set our threshold so that our high-confidence set covered 25% (1,597 of 6,387) of the sequence-based predictions, and we estimate based on random relabelings of the mrnas that our set has a false positive rate of 3.5% (Supplementary Methods and Supplementary Table 2 online). The mirna-mrna network whose edges are predicted interactions from our high-confidence set covers 14 mirnas and 316 unique mrnas (Supplementary Fig. 4 online). Validating GenMiR++-predicted targets To experimentally validate the predictive accuracy of our method, we used our high-confidence GenMiR++-predicted targets for let- 7b (Fig. 1) to predict the outcome of misregulation in retinoblastoma. Notably, no neural tissue was represented in the expression data used to build our network, so this test evaluates how well our predictions generalize to other unrepresented tissues and thus reveals the value of GenMiR++ and our resource for broader scientific exploration. We used microarrays to profile three retinoblastoma samples and one retina sample from a healthy individual (healthy retina). Of 245 interrogated mirnas, 25 were AKAP6 INPP5A DPF2 SYT11 MLLT1 RPS6KA3 TAF5 CBL DUSP9 OSMR NDST2 RGS16 IL13 ERCC6 HDHD1A RORC ZCCHC11 OLR1 PRDM2 HAS2 DOCK3 WNT1 RNF5 SMARCC1 SEMA3F FASLG NOVA1 Figure 1 The GenMiR++ network of targets. Nodes representing mrnas are green circles. Messenger RNAs connected to by blue edges are predicted by GenMiR++ to be regulated solely by mirnas from the let-7 mirna family, of which PRDM2 is predicted to be regulated solely by. Yellow lines connect to all other predicted mrna targets, thus indicating that the mrna target is predicted to also be regulated by other mirnas from one or more additional families. 146 VOL.4 NO.12 DECEMBER 27 NATURE METHODS

a mir-328 mir-299-5p mir-324-3p mir-197 mir-188 mir-494 mir-222 mir-342 mir-32 mir-17 mir-124a mir-13 mir-21 let-7c mir-182 mir-99b mir-125b mir-191 mir-181a mir-423 mir-17-5p mir-16a mir-373* Healthy RB1451 RB2386 RB43 1 >2 Scanner counts b c Relative quantity (versus HR12) Relative quantity of 16 12 8 4 1, 1 1 1.1.1 RB369 RB381 RB43 RB1451 Figure 2 Reduced levels of parallel increased levels of GenMiR++ targets in retinoblastoma. (a) Expression profiles of 25 (of 245) mirnas in healthy retina and retinoblastoma samples. Let-7b is downregulated B5-fold. (b) Quantitative real-time reverse transcriptase PCR assays reveal general downregulation of across retinoblastoma (RB) samples compared to healthy retina (HR). Relative quantities are calibrated to HR12. Error bars represent s.d. of the relative fold expression level from triplicate samples. (c) Quantitative reverse transcriptase PCR validation with three GenMiR++-predicted targets (,, FASLG) and three TargetScanS-predicted targets not confirmed by GenMiR++ (ACTR2, COL1A2, KPNA4). The negative control (ZYX) is not predicted to be a target by either method. Shown are the fold-enrichment of expression of the seven transcripts in 15 RNA samples, 1 from retinoblastoma (left; labeled 369 2386) and five retina samples from healthy individuals (right; labeled HR12 HR99). For 6 of the transcripts, expression in HR12 was used as a baseline to calculate fold enrichment. For FALSG, expression in HR125 was used as its baseline. Error bars represent the s.d. of the expression level over 4 technical replicates. Absence of error bars is an indication that the transcript did not amplify in the sample. RB1466 RB2238 RB224 RB2244 Sample RB2252 RB2386 HR12 HR13 HR125 FASLG ACTR2 COL1A2 KPNA4 ZYX HR137 HR99 detected above background (that is, above 99% of negative control probes consisting of 1, random probe sequences per array). Of these, the most apparent differential expression was evident for, which was on average B5-fold lower in abundance in retinoblastoma versus healthy retina (Fig. 2a). This downregulation is widespread across retinoblastoma; 9/1 of these samples from different individuals had lower levels than the average expression in five retina samples from healthy individuals (Fig. 2b). We then used quantitative real-time PCR to measure the abundance of five predicted targets (Fig. 2c) in our highconfidence GenMiR++ network. Two of these predicted targets, RORC and IL13, did not amplify in more than 2 out of the 15 samples, and we excluded them from subsequent analysis. As negative controls, we also assayed three TargetScanS-predicted targets with GenMiR++ scores in the bottom 5 th percentile. We found that two of the GenMiR++-predicted targets ( and ) are significantly upregulated (P o.1) in at least eight of the ten retinoblastoma samples compared to the healthy retina (sample 12; HR12) baseline. Another GenMiR++-predicted target, FASLG, was significantly upregulated (P o.1) in four of the five retinoblastoma samples in which it amplified. Thus, when a positive target amplified in retinoblastoma, its expression was significantly higher than the retinal baseline at least 8% of the time, whereas only once were any of the three negative controls upregulated in any of the retinoblastoma samples. downregulates GenMiR++-predicted targets To ensure that the observed upregulation of the GenMiR++predicted targets of was not caused by other changes in retinoblastoma, we used microarrays to profile the transcriptomic response to exogenous. We transfected a synthetic RNA duplex of the mature sequence into WERI-Rb1 cells and compared their expression profile to cells transfected in parallel with a scrambled duplex. In total, 9,659 genes were expressed above the median expression intensity in the control retinoblastoma cells, and of these, we focused on the most downregulated 5% (483 targets). To ensure that this screen was enriched for targets of, we confirmed that the reverse complement of the seed region was overrepresented in the 3 UTRs of the downregulated genes. We found sequence-based predictions supported by a high GenMiR++ score had much higher precision than those that were not, with only a moderate loss of sensitivity. Twelve of the 34 WERI-Rb1 cell expressed TargetScanS-predicted targets are included in the high-confidence GenMiR++ network; of these, 5 (42%) were scored as downregulated in the assay (, HDHD1A, RNF5, SEMA3F, SMARCC1) whereas only 2 of the 22 NATURE METHODS VOL.4 NO.12 DECEMBER 27 147

ARTICLES a b Relative quantity of mrna c Relative quantity of mirna WERI-Rb1 expressed 9,659 Genes downregulated 483 1 1. 1 1 5 1.67.51.93.92 Negative control 9,149 2 12 2 7 476 5 15 1. 1. Mock Candidate targets 61 GenMiR++-predicted targets 27 1. 1. (9%) other TargetScanS predictions (CDC34, EIF4G2) scored as such. This represents a significant enrichment for confirmed targets among the GenMiR++-predicted targets (Fig. 3a; P ¼.38, Fisher s exact test) with only a 28.5% loss in sensitivity (2/7) compared to all of the TargetScanS-predicted targets. The precision of TargetScanS is lower in our assay, 21% (7/34), than in previous reports, suggesting that we may have missed some targets. Indeed, using quantitative real-time PCR, we identified downregulation and reconfirmed downregulation in the transfected WERI-Rb1 cells (Fig. 3b,c). However, based on our results, we would expect 71% (5/7) of the false negatives in our microarray assay to also be among the GenMiR++-predicted set. So, for example, if there were seven false negatives in our assay, bringing the actual precision of TargetScanS closer to previous reports at 41% (14/34), then the expected precision of the GenMiR++-predicted set would double to 83% (1/12). DISCUSSION We demonstrated that mirna-mrnas paired expression profiles can be used on a large-scale by GenMiR++ to improve the accuracy of sequence-based mirna-target predictions. Our target predictions have more consistent Gene Ontology annotations than random subsets of target predictions based on sequence alone and contain targeting relationships that expand our understanding of the roles of mirnas in disease. Furthermore, by providing a surrogate for direct experimental validation, GenMiR++ can be used to help identify 3 UTR sequence features that are responsible for target degradation because cis elements predictive of a high GenMiR++ score should also be predictive of target degradation. Our demonstration that targets, and that is overexpressed in depleted retinoblastoma, suggests another mechanism for mediated tumor suppression in humans. In breast cancer, overexpression 19,2 is sufficient to overcome the G1-S checkpoint resulting from DNA damage 21, possibly through triggering the activation of the cyclin-dependent kinase Cdk2 required for progression from G1 to S phase. Here we report that is overexpressed in retinoblastoma, suggesting 2.5 1.5 16 12 8 4 Negative control Mock Figure 3 Increasing levels of reveals enrichment of GenMiR++ targets among downregulated genes. (a) Venn diagram showing overlap of WERI-Rb1 expressed genes, downregulated genes, GenMiR++ target predictions and sequence-based predictions. (b) TaqMan real-time RT-PCR results of and in WERI-Rb1 (left) and HeLa cells (right), normalized against 5S expression and calibrated to endogenous levels. Both genes were significantly downregulated in both WERI-Rb1 and HeLa cells compared to transfection of the Ambion negative control (P o.1). (c) Relative levels (Taqman real-time RT-PCR) after transfection in WERI-Rb1 cells (left) and HeLa cells (right), normalized against 5S expression and calibrated to endogenous (mock) levels. levels were similarly analyzed, as a negative control. WERI-Rb1 and levels are B9- and 16-fold lower, respectively, compared to healthy fetal retina. Error bars, s.d. of the expression level over four technical replicates. a similar mechanism may be at work therein. Notably, is sufficient for oncogenic transformation of mouse fibroblasts lacking a functional Rb1 gene or in cooperation with active Ras. Although it is impossible for us to rule out, it is extremely unlikely that the inverse expression patterns we observed between mirnas and their GenMiR++-predicted mrna targets are due to indirect regulation. The Gene Ontology enrichments for Gen- MiR++-predicted target sets are considerably larger than those of transcripts with similar expression patterns; if the mirna regulation were indirect, these should be approximately equal. We do not, however, rule out the possibility that there are additional redundant indirect pathways also contributing to their degradation. At present our GenMiR++ scores only predict which mirna targets are regulated by transcript degradation; we have not attempted to predict targets regulated by translational repression. In an effort to avoid ruling out the latter type of interaction, we only considered candidate targets with exact seed region matches that generally lead to transcriptional degradation 13,14 (Supplementary Fig. 5 online). However, extending our approach to address translation repression would be straightforward if protein expression data were available. As with transcriptional degradation, if mirna regulates target expression by translational repression, then the mirna expression profiles should explain any observed downregulation of protein abundances compared to what would be expected based on the mrna expression profiles. The current version of GenMiR++ assumes a constant transcription input for each mrna; although this assumption is an oversimplification, it is surprisingly effective, suggesting that modeling transcriptional rate may not be necessary for mirna target prediction. Nonetheless, we could extend the basic GenMiR++ model to incorporate available data or computational predictions on the transcriptional rate for specific mrnas. Extensions to the GenMiR++ model that incorporate protein abundance and transcription rate data are described in Supplementary Methods. Methods such as GenMiR++ that account for additive regulation by mirnas should consider all other predicted mirnas when scoring mirna-mrna pairs. However, all such methods are sensitive to the number of false positives in the initial set of sequencebased predictions; higher numbers of false positives provide more opportunities for one of the false positives to show a spurious inverse expression relationship. Indeed, GenMiR++ was not as successful when applied to the union of five datasets of sequence-based targets that contained a larger proportion of false positives per mrna (data not shown). It may be possible to make GenMiR++ more robust to very noisy sets of candidate targets by using sequence features like 148 VOL.4 NO.12 DECEMBER 27 NATURE METHODS

accessibility 5,6 to assign a prior probability to each of the candidate targets and incorporating that prior into the GenMiR++ model, so that, for example, a less accessible target site would require more support in the expression data to achieve a high score. Based on our functional analyses and biological testing, we believe that the GenMiR++ network is the most accurate mirna target interaction network currently available, and the methods underlying its production will become widely implicated and further developed. Our results highlight the potential for Gen- MiR++ as a principled algorithmic framework in which functional regulatory networks of mirnas and functional target mrnas can be discovered by combining mirna and mrna sequence and expression data; and through these discoveries the pivotal role of mirnas in cancer onset and progression can be better understood. METHODS Collection and mapping of the sequence-based target predictions. We downloaded TargetScanS 4 predictions from build 17 of the human genome on UCSC genome browser 22 for the 114 mirnas represented in our expression compendium. We then mapped the European Molecular Biology labels of the profiled mrna transcripts in our compendium to the Human Gene Nomenclature Committee labels of the predicted targets. Through this mapping, we unambiguously associated 89 mrnas with an expression profile and a set of predicted mirna regulators. Though GenMiR++ is designed to work with any sequence-based predictions, we chose the TargetScanS over a number of similar methods for two reasons: a comparison of protein, mirna and mrna expression data in mouse showed that the interactions predicted by TargetScanS are more likely to lead to mrna degradation rather than translational repression (Supplementary Fig. 5). Also, the TargetScanS target sets showed the greatest Gene Ontology enrichment compared with four other sequence-based methods (Supplementary Fig. 6 online). The GenMiR++ model. GenMiR++ 15 uses mirna and mrna expression profiles from the same sets of tissues and cell types to identify candidate mirna-mrna target pairs that are best supported by the expression data. For each mrna, its candidate mirna regulators were scored according to how much the mirna expression profile contributed to explaining downregulation of the mrna expression, given all other mirna candidate regulators. GenMiR++ calculates the scores by attempting to reproduce an mrna s profile by a weighted combination of the genome-wide average normalized expression profile and the negatively weighted profiles of a subset of the mirna regulators. mirnas that often appear in subsets whose reproductions are good fits to the mrna profile are assigned the highest scores. Additional methods. Descriptions of the collection and renormalization of the preexisting microarray data and new microarray data from WERI and HeLa cells; the tests for Gene Ontology enrichment; the design of the mirna expression profiling array; the extraction and preparation of RNA for quantification; the transfection; the quantitative real-time RT-PCR; the calculation of the false detection rate of GenMiR++ scores; the source and handling of the human tissues; the GenMiR++ model and its learning algorithm; and extensions to the GenMiR++ model to incorporate protein abundance and transcriptional rate data are available in Supplementary Methods. Studies with human tissues were approved by the Research Ethics Boards of the University Health Network (Toronto) and the Hospital for Sick Children (Toronto). The complete GenMiR++ mirna-mrna interaction network is available at http://www.psi.toronto.edu/genmir/. Accession codes. Gene Expression Omnibus (GEO): GSE772 (mirna data) and GSE7185 (mrna data). Note: Supplementary information is available on the Nature Methods website. ACKNOWLEDGMENTS J.C.H. and T.B. were supported by Natural Science and Engineering Research Council postgraduate scholarships. T.W.C. was supported by a Canada Graduate Scholarship from the Canadian Institutes for Health Research (CIHR). This study was supported by a Natural Sciences and Engineering Research Council operating grant and Canadian Foundation for Innovation and Ontario Research Fund infrastructure grants to Q.D.M.; a CIHR grant to B.J.F. and T.R.H; an Ontario Genomics Institute and Genome Canada grant to B.J.F. and B.J.B.; a CIHR and National Cancer Institute of Canada grant to B.J.B.; and a US National Institutes of Health grant to B.L.G. B.J.F. is a Fellow of the Canadian Institute for Advanced Research. Published online at http://www.nature.com/naturemethods/ Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions 1. Engels, B.M. & Hutvagner, G. Principles and effects of microrna-mediated post-transcriptional gene regulation. Oncogene 25, 6163 6169 (26). 2. Krek, A. et al. Combinatorial microrna target predictions. Nat. Genet. 37, 495 5 (25). 3. Enright, A.J. et al. MicroRNA targets in Drosophila. Genome Biol. 5, R1 (23). 4. Lewis, B.P., Burge, C.B. & Bartel, D.P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microrna targets. Cell 12, 15 2 (25). 5. Long, D. et al. Potent effect of target structure on microrna function. Nat. Struct. Mol. Biol. 14, 287 294 (27). 6. Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microrna target recognition. Nat. Genet. 39, 1278 1284 (27). 7. Wu, W., Sun, M., Zou, G.M. & Chen, J. MicroRNA and cancer: Current status and prospective. Int. J. Cancer 12, 953 96 (26). 8. Lu, J. et al. MicroRNA expression profiles classify human cancers. Nature 435, 834 838 (25). 9. He, L. et al. A microrna polycistron as a potential human oncogene. Nature 435, 828 833 (25). 1. Johnson, S.M. et al. RAS is regulated by the let-7 microrna family. Cell 12, 635 647 (25). 11. Hammond, S.M. MicroRNAs as oncogenes. Curr. Opin. Genet. Dev. 16, 4 9 (26). 12. Bagga, S. et al. Regulation by let-7 and lin-4 mirnas results in target mrna degradation. Cell 122, 553 563 (25). 13. Lim, L.P. et al. Microarray analysis shows that some micrornas downregulate large numbers of target mrnas. Nature 433, 769 773 (25). 14. Farh, K.K. et al. The widespread impact of mammalian MicroRNAs on mrna repression and evolution. Science 31, 1817 1821 (25). 15. Huang, J.C., Morris, Q.D. & Frey, B.J. Bayesian inference of MicroRNA targets from sequence and expression data. J. Comput. Biol. 14, 55 563 (27). 16. Stark, A., Brennecke, J., Bushati, N., Russell, R.B. & Cohen, S.M. Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3 UTR evolution. Cell 123, 1133 1146 (25). 17. Ramaswamy, S. et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl. Acad. Sci. USA 98, 15149 15154 (21). 18. Camon, E. et al. The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res. 32, D262 D266 (24). 19. Cangi, M.G. et al. Role of the Cdc25A phosphatase in human breast cancer. J. Clin. Invest. 16, 753 761 (2). 2. Wu, W., Fan, Y.H., Kemp, B.L., Walsh, G. & Mao, L. Overexpression of cdc25a and cdc25b is frequent in primary non-small cell lung cancer but is not associated with overexpression of c-myc. Cancer Res. 58, 482 485 (1998). 21. Mailand, N. et al. Rapid destruction of human Cdc25A in response to DNA damage. Science 288, 1425 1429 (2). 22. Kent, W.J. et al. The human genome browser at UCSC. Genome Res. 12, 996 16 (22). NATURE METHODS VOL.4 NO.12 DECEMBER 27 149