Transcriptome sequencing reveals altered long intergenic non-coding RNAs in lung cancer

Supplementary Figure S1. Characterization of unannotated transcripts. (A) CPAT coding probability scores and (B) the cumulative distribution of ORF length are plotted for each transcript category: protein-coding genes, known RNA genes, known lncRNAs, and novel lncRNAs. (C) Expression level distribution of novel lncRNA genes, known RNA genes, recently discovered lncRNAs, and protein-coding genes. (D) Repetitive content analysis of known and novel lncRNAs. Stacked bars represent the percentage of nucleotides covered by various transposable element families.
[Figure panels: (A) coding probability; (B) ratio of sequences with ORF <= size, plotted against size; (C) expression levels by class (Known_RNA, LncRNA, Novel, Protein), in FPKM; (D) quantity of transcripts versus transcript coverage, by transposable element family (LINE, SINE/Alu, SINE/MIR, LTR/ERV, LTR/ERVL, DNA/hAT and others).]
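
Panel B plots an empirical cumulative distribution. For each size, it shows the fraction of sequences whose ORF is no longer than that size. Below is a minimal Python sketch. It assumes the plotted quantity is the longest ATG-initiated, stop-terminated ORF on the forward strand. CPAT itself feeds ORF size, ORF coverage, Fickett score and hexamer usage into a logistic regression, and that model is not reproduced here. The function names are hypothetical.

```python
from bisect import bisect_right

STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_length(seq):
    """Length in nt (stop codon included) of the longest ATG...stop ORF on the forward strand.

    ORFs that run off the end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens the longest ORF in this stretch
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best

def orf_length_cdf(seqs, sizes):
    """Fraction of sequences whose longest ORF is <= each size (the y-axis of panel B)."""
    lengths = sorted(longest_orf_length(s) for s in seqs)
    n = len(lengths)
    return [bisect_right(lengths, size) / n for size in sizes]
```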

Supplementary Figure S2. Differentially expressed lncRNAs in lung cancer. (A) Schematic of the filtering steps in our lncRNA differential expression pipeline. Heatmaps show the differentially expressed lncRNAs in the (B) Seo cohort and (C, D) the two TCGA cohorts. Although only paired tumor and normal tissues were used in the differential expression analysis, expression in the unpaired tumors is also shown for the TCGA cohorts.
[Panel A steps: merge all RefSeq noncoding RNAs with ncRNAs from GENCODE, Ensembl, UCSC, the Human Body Map, and novel transcripts; remove single-exon transcripts; merge and remove transcripts below a minimum length; remove transcripts overlapping RefSeq protein-coding genes or Ensembl pseudogenes; build an FPKM matrix with Cufflinks and a read-count matrix with BEDTools; filter out lowly expressed transcripts using FPKM and read-count cutoffs; merge overlapping transcripts, keeping the transcript with the largest average FPKM; test for differential expression with edgeR. Panels B-D: heatmap columns are samples, grouped as normal, tumor and, for the TCGA cohorts, unpaired tumor.]
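
The transcript-level filters in panel A can be sketched in Python as below. This is an illustration and not the authors' pipeline. The record layout, the function names and every cutoff are hypothetical parameters, because the exact values are not given here. The cutoffs are the minimum length, the FPKM and count thresholds, and the fraction of low-expression samples. The sketch assumes that Cufflinks and BEDTools have already produced per-sample FPKM and count values. It does not reproduce the final edgeR test, which runs in R/Bioconductor.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Transcript:
    tid: str
    chrom: str
    start: int      # genomic start (0-based, half-open)
    end: int        # genomic end
    n_exons: int
    length: int     # spliced (exonic) length in nt
    fpkm: list      # one FPKM value per sample
    counts: list    # one read count per sample

def overlaps(a_chrom, a_start, a_end, b_chrom, b_start, b_end):
    return a_chrom == b_chrom and a_start < b_end and b_start < a_end

def structural_filter(transcripts, coding_intervals, min_length):
    """Drop single-exon, short, and coding-gene/pseudogene-overlapping transcripts.

    coding_intervals: (chrom, start, end) tuples for protein-coding genes and pseudogenes.
    """
    kept = []
    for t in transcripts:
        if t.n_exons < 2 or t.length < min_length:
            continue
        if any(overlaps(t.chrom, t.start, t.end, *iv) for iv in coding_intervals):
            continue
        kept.append(t)
    return kept

def expression_filter(transcripts, min_fpkm, min_count, low_sample_fraction):
    """Drop a transcript if at least `low_sample_fraction` of samples are low.

    A sample counts as low when FPKM < min_fpkm or count < min_count.
    """
    kept = []
    for t in transcripts:
        low = sum(f < min_fpkm or c < min_count for f, c in zip(t.fpkm, t.counts))
        if low / len(t.fpkm) < low_sample_fraction:
            kept.append(t)
    return kept

def collapse_overlapping(transcripts):
    """Cluster overlapping transcripts; keep the one with the largest mean FPKM per cluster."""
    ordered = sorted(transcripts, key=lambda t: (t.chrom, t.start))
    kept, cluster, cluster_end = [], [], None
    for t in ordered:
        if cluster and t.chrom == cluster[0].chrom and t.start < cluster_end:
            cluster.append(t)
            cluster_end = max(cluster_end, t.end)
        else:
            if cluster:
                kept.append(max(cluster, key=lambda x: mean(x.fpkm)))
            cluster, cluster_end = [t], t.end
    if cluster:
        kept.append(max(cluster, key=lambda x: mean(x.fpkm)))
    return kept
```

The steps are kept as separate functions so that each one maps onto a single box of the schematic.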

Supplementary Figure S3. Subtype-specific lncRNAs. Heatmap of lncRNAs differentially expressed between TCGA lung adenocarcinoma and lung squamous cell carcinoma tumors; columns are samples.

Supplementary Figure S4. Lung cancer cell line validation. qPCR validation of six LCALs across a panel of lung cancer cell lines (n = 8), shown relative to a control cell line, BEAS-2B, and normalized to the housekeeping gene RPL3. Error bars show mean ± standard error across n = 3 biological replicates.
[One panel per LCAL (including LCAL7 and LCAL8); y-axis: log fold change (LCAL/RPL3); x-axis: cell lines, including BEAS-2B, A549, HOP62, HOP92 and H460.]
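
A common way to normalize to a housekeeping gene and then express the result relative to a control cell line is the comparative Ct (2^-ΔΔCt) method. The sketch below assumes that method, roughly equal amplification efficiencies, and a log2 scale for the plotted fold change. The legend states none of these. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def log2_fold_change(target_ct, ref_ct, control_target_ct, control_ref_ct):
    """log2 fold change by the comparative Ct (2^-ΔΔCt) method.

    ΔCt normalizes the target (e.g. an LCAL) to the reference gene (RPL3 in the legend);
    ΔΔCt compares a cell line with the control line (BEAS-2B). Assumes ~100% efficiency.
    """
    delta_ct_sample = target_ct - ref_ct
    delta_ct_control = control_target_ct - control_ref_ct
    ddct = delta_ct_sample - delta_ct_control
    return -ddct  # log2(2 ** -ddct)

def mean_and_sem(values):
    """Mean and standard error of the mean across biological replicates."""
    return mean(values), stdev(values) / math.sqrt(len(values))
```

Each biological replicate would give one `log2_fold_change` value. `mean_and_sem` then produces the bar height and error bar.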

Supplementary Figure S5. LCAL expression in lung cancer. Coverage maps show the average expression levels of tumor and normal samples across all three lung cancer cohorts for (A) LCAL8 (FENDRR), (B) LCAL8 (ESCCAL-1; known as CASC9 in RefSeq), and (C) LCAL8 (CCAT). Annotated RefSeq (dark blue) and UCSC (light blue) transcripts, and full-length transcripts determined by 5' and 3' RACE in a cell line (black), are shown below each plot. qPCR validation in an independent cohort of human adenocarcinomas and squamous cell carcinomas, each with matched normal controls, is shown for (D) LCAL8, (E) LCAL8, and (F) LCAL8. Inset tables classify samples as having high or low LCAL expression, using the cutoff denoted by the dotted line. Because LCAL8 and LCAL8 are differentially expressed only in the squamous cell carcinoma cohort, their inset tables were calculated separately for the two subtypes. These results further demonstrate that LCAL8 and LCAL8 are broadly overexpressed in lung squamous cell carcinomas but not in adenocarcinomas.
[Panels A-C: average read depth of normal (N) and tumor (T) samples per cohort, including Seo N and Seo T, across each locus (kb scale). Panels D-F: relative expression (LCAL/RPL3) in AD normal, AD tumor, SQ normal and SQ tumor samples, with inset high (+) / low (-) tables by normal and tumor status.]
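
Each inset table amounts to splitting samples at the cutoff and counting high (+) and low (-) samples in each group. Below is a minimal sketch with hypothetical names and made-up example values. It assumes that a value exactly at the cutoff counts as low. Calling it once per subtype gives the separate tables described above.

```python
from collections import Counter

def high_low_table(samples, cutoff):
    """Count high/low expressors per group.

    samples: iterable of (group, relative_expression) pairs, e.g. ("Tumor", 3.2).
    Values above the cutoff are "high" (+); values at or below it are "low" (-).
    """
    table = Counter()
    for group, value in samples:
        table[(group, "high" if value > cutoff else "low")] += 1
    return table

# Hypothetical usage with illustrative values: one table per subtype.
squamous = [("Normal", 1.0), ("Normal", 3.0), ("Tumor", 5.0), ("Tumor", 4.0), ("Tumor", 0.5)]
squamous_table = high_low_table(squamous, cutoff=2.0)
```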

Supplementary Figure S6. Association between LCAL expression and mutation status. Expression levels, measured as log FPKM, of (A) LCAL, (B) LCAL8, (C) LCAL4, (D) LCAL38, (E) LCAL74, and (F) LCAL84 in wild-type (black) and mutant (colored) samples. Data points are ordered by expression level, and symbols (squares or circles) designate the cohort. Thick colored lines mark the median expression level of each group. P-values for each mutational association are also reported; single and double asterisks mark associations passing two successively stricter FDR thresholds.
[Genes tested: (A) KEAP1, NFE2L2, and combined NFE2L2 & KEAP1 mutations; (B-D) TP53; (E) HGF; (F) CDKN2A.]
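
The legend reports FDR thresholds but does not say how FDR was estimated. If the common Benjamini-Hochberg procedure was used, per-test P-values could be converted to adjusted values as follows. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An association would then be marked significant when its adjusted value falls below the chosen FDR threshold.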

Supplementary Figure S7. Conservation of LCAL. The UCSC Genome Browser view shows no Pfam domains and no conserved RNA secondary structures predicted by EvoFold. ENCODE data show DNase I hypersensitivity and transcription factor binding in the LCAL promoter. Base-wise conservation scores from PhyloP show no strong conservation within LCAL. Multiz alignments across vertebrates reveal that LCAL sequence similarity is restricted to the majority of primates.
[UCSC Genome Browser tracks (hg19, chr6): RefSeq Genes; Pfam Domains in UCSC Genes; EvoFold Predictions of RNA Secondary Structure; Digital DNaseI Hypersensitivity Clusters from ENCODE; Transcription Factor ChIP-seq from ENCODE with Factorbook Motifs (factors include MYC, FOS, STAT3, RELA, CTCF, CEBPB, JUND and MAFF); Basewise Conservation by PhyloP (primates and vertebrates); Multiz Alignments of Vertebrates (species from chimp to lamprey).]

Supplementary Figure S8. Nuclear localization of LCAL. Nuclear and cytosolic fractionation of cell lysates indicates that LCAL is highly expressed in the nucleus. GAPDH and MT-RNR were used as positive controls for cytoplasmic expression, and U6 as a positive control for nuclear expression. qPCR results are relative to total RNA and normalized to the housekeeping gene RPL3. Error bars show mean ± standard error across three biological replicates in two independent experiments.
[Stacked bars: relative expression (gene/RPL3), as the percentage in the cytoplasmic and nuclear fractions, for GAPDH, U6, MT-RNR and LCAL.]
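
One way to turn fraction-specific qPCR measurements into stacked percentages is to compute a 2^-ΔCt relative quantity for each fraction and express each as a share of their sum. This calculation is assumed for illustration only, because the legend does not give the formula. The function names are hypothetical.

```python
def relative_quantity(gene_ct, ref_ct):
    """2^-ΔCt abundance of a gene relative to a reference measurement in the same fraction."""
    return 2.0 ** -(gene_ct - ref_ct)

def percent_by_fraction(nuclear_rq, cytoplasmic_rq):
    """Split a gene's signal into (percent nuclear, percent cytoplasmic)."""
    total = nuclear_rq + cytoplasmic_rq
    return 100.0 * nuclear_rq / total, 100.0 * cytoplasmic_rq / total
```

Under this scheme, a nuclear-retained transcript such as U6 should give a large nuclear share, and GAPDH mRNA should give a large cytoplasmic share.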

Supplementary Figure S9. Lung cancer cell line validation. qPCR validation of LCAL across a panel of five squamous carcinoma cell lines, shown relative to a control cell line, BEAS-2B, and normalized to the housekeeping gene RPL3.
[y-axis: log fold change (LCAL/RPL3); x-axis: BEAS-2B and the five squamous carcinoma cell lines, including SK-MES-1.]

Supplementary Figure S10. LCAL expression affects cellular proliferation. After 7h transfection, cells were seeded in a 96-well plate at 3,… cells/well. On the indicated days, Alamar Blue reduction was measured by fluorescence and normalized to the mean of the scrambled control. Error bars show mean ± standard error across n = 4 biological replicates in two independent experiments. Asterisks denote significance by a two-tailed Student's t-test.
[For each of two cell lines: percent normalized Alamar Blue fluorescence on successive days (including days 4 and 6) for the control and two LCAL siRNAs, with a bar chart of relative expression (LCAL/RPL3) for the same conditions.]
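
The normalization and test described above can be sketched with the Python standard library. The function names are hypothetical. Normalizing each day separately against that day's scrambled-control mean is an assumption. For the two-tailed P-value itself, `scipy.stats.ttest_ind(a, b)` returns the same statistic together with its P-value, because it assumes equal variances by default.

```python
import math
from statistics import mean, variance

def percent_of_control(values, control_values):
    """Express fluorescence readings as a percentage of the mean scrambled-control reading."""
    ref = mean(control_values)
    return [100.0 * v / ref for v in values]

def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```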