Supplemental Data. Genome-wide Association of Copy-Number Variation. Reveals an Association between Short Stature

Similar documents
The American Journal of Human Genetics 89, , December 9,

ARIC Manuscript Proposal #2426. PC Reviewed: 9/9/14 Status: A Priority: 2 SC Reviewed: Status: Priority:

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Nature Genetics: doi: /ng Supplementary Figure 1

Supplementary Appendix

Figure S2. Distribution of acgh probes on all ten chromosomes of the RIL M0022

Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer

Numerous hypothesis tests were performed in this study. To reduce the false positive due to

the standard deviation (SD) is a measure of how much dispersion exists from the mean SD = square root (variance)

Understandable Statistics

November 9, Johns Hopkins School of Medicine, Baltimore, MD,

ORIGINAL INVESTIGATION. C-Reactive Protein Concentration and Incident Hypertension in Young Adults

Global variation in copy number in the human genome

bivariate analysis: The statistical analysis of the relationship between two variables.

Choosing a Significance Test. Student Resource Sheet

APPENDIX D REFERENCE AND PREDICTIVE VALUES FOR PEAK EXPIRATORY FLOW RATE (PEFR)

Supplementary Information. Supplementary Figures

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Complex Trait Genetics in Animal Models. Will Valdar Oxford University

10/4/2007 MATH 171 Name: Dr. Lunsford Test Points Possible

Supplementary Figures

From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Chapter 1: Introduction... 1

Genetics of coronary artery calcification among African Americans, a meta-analysis

Method Comparison Report Semi-Annual 1/5/2018

Meaning-based guidance of attention in scenes as revealed by meaning maps

Relationship between genomic features and distributions of RS1 and RS3 rearrangements in breast cancer genomes.

Quick start guide for using subscale reports produced by Integrity

Supplementary Materials for

Numerous hypothesis tests were performed in this study. To reduce the false positive due to

Introduction to linkage and family based designs to study the genetic epidemiology of complex traits. Harold Snieder

Children, Toronto, Ontario, Canada. Department of Laboratory Medicine and Pathobiology Hospital for Sick Children, Toronto, Ontario, Canada, M5G 1X8

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Transcript of Dr. Mark Perlin's talk on "Preserving DNA Information" delivered on

Simple Linear Regression

Atherosclerotic Disease Risk Score

Nature Genetics: doi: /ng Supplementary Figure 1

Unit 1 Exploring and Understanding Data

LTA Analysis of HapMap Genotype Data

Supplementary Online Content

indicated in shaded lowercase letters (hg19, Chr2: 217,955, ,957,266).

9/18/2017 DISCLOSURES. Consultant: RubiconMD. Research: Amgen, NHLBI OUTLINE OBJECTIVES. Review current CV risk assessment tools.

MSI positive MSI negative

Nature Medicine: doi: /nm.3967

doi: /

CHAPTER 3 Describing Relationships

Nature Biotechnology: doi: /nbt.1904

How to interpret scientific & statistical graphs

THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF BIOCHEMISTRY AND MOLECULAR BIOLOGY

BST227: Introduction to Statistical Genetics

APPENDIX AVAILABLE ON THE HEI WEB SITE

Heritability and genetic correlations explained by common SNPs for MetS traits. Shashaank Vattikuti, Juen Guo and Carson Chow LBM/NIDDK

Supplementary materials for: Executive control processes underlying multi- item working memory

Supporting Information

SALSA MLPA probemix P372-B1 Microdeletion Syndromes 6 Lot B1-1016, B

Supplemental Information For: The genetics of splicing in neuroblastoma

Introduction to the Genetics of Complex Disease

PRINTABLE VERSION. Quiz 1. True or False: The amount of rainfall in your state last month is an example of continuous data.

MRC-Holland MLPA. Description version 29; 31 July 2015

Chapter 3 CORRELATION AND REGRESSION

Here are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics :

Supplementary note: Comparison of deletion variants identified in this study and four earlier studies

Business Statistics Probability

Genomics 101 (2013) Contents lists available at SciVerse ScienceDirect. Genomics. journal homepage:

CHILD HEALTH AND DEVELOPMENT STUDY

STATISTICS & PROBABILITY

Supplementary Materials

Most severely affected will be the probe for exon 15. Please keep an eye on the D-fragments (especially the 96 nt fragment).

Types of Statistics. Censored data. Files for today (June 27) Lecture and Homework INTRODUCTION TO BIOSTATISTICS. Today s Outline

Meta-Analysis of Genome-Wide Association Studies in African Americans Provides Insights into the Genetic Architecture of Type 2 Diabetes

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

Supplementary Figure 1

MRC-Holland MLPA. Description version 14; 28 September 2016

AP Statistics. Semester One Review Part 1 Chapters 1-5

STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego

Supplementary Figures

Jackson Heart Study Manuscript Proposal Form

Dementia. Inhibition of vascular calcification

Willcocks et al.,

MRC-Holland MLPA. Description version 30; 06 June 2017

Application Note 201

Mendel: Understanding Inheritance. 7 th Grade Science Unit 4 NCFE Review

Index. Springer International Publishing Switzerland 2017 T.J. Cleophas, A.H. Zwinderman, Modern Meta-Analysis, DOI /

Chromatin marks identify critical cell-types for fine-mapping complex trait variants

MRC-Holland MLPA. Description version 08; 30 March 2015

Supplementary Figure 1. Principal components analysis of European ancestry in the African American, Native Hawaiian and Latino populations.

ANOVA in SPSS (Practical)

Early Learning vs Early Variability 1.5 r = p = Early Learning r = p = e 005. Early Learning 0.

Figure S1. Alignment of predicted amino acid sequences of KIR3DH alleles identified in 8

Structural Equation Modeling of Health Literacy and Medication Adherence by Older Asthmatics

Summarizing Data. (Ch 1.1, 1.3, , 2.4.3, 2.5)

Society for Behavioral Medicine 33 rd Annual Meeting New Orleans, LA

The normal curve and standardisation. Percentiles, z-scores

p.r623c p.p976l p.d2847fs p.t2671 p.d2847fs p.r2922w p.r2370h p.c1201y p.a868v p.s952* RING_C BP PHD Cbp HAT_KAT11

HYPERTENSION: UPDATE 2018

Table of Contents. Plots. Essential Statistics for Nursing Research 1/12/2017

Lecture 20. Disease Genetics

Discontinuous Traits. Chapter 22. Quantitative Traits. Types of Quantitative Traits. Few, distinct phenotypes. Also called discrete characters

Unit 7 Comparisons and Relationships

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Transcription:

The American Journal of Human Genetics, Volume 89 Supplemental Data Genome-wide Association of Copy-Number Variation Reveals an Association between Short Stature and the Presence of Low-Frequency Genomic Deletions Andrew Dauber, Yongguo Yu, Michael C. Turchin, Charleston W. Chiang, Yan A. Meng, Ellen W. Demerath, Sanjay R. Patel, Stephen S. Rich, Jerome I. Rotter, Pamela J. Schreiner, James G. Wilson, Yiping Shen, Bai-Lin Wu, Joel N. Hirschhorn

Figure S1. Histogram of length of deletion CNVs in the clinical cohort Figure S2. Histogram of length of duplication CNVs in the clinical cohort Figure S3. Scatter Plot of Total CNV Burden Versus Height in the Clinical Cohort Figure S4. Scatter Plot of Total Lower Frequency (<5%) Deletion Burden Versus Height in the Clinical Cohort Figure S5. 11q11 candidate region CNV association data Figure S6. 14q11.2 candidate region CNV association data Figure S7. 17q21.31 candidate region CNV association data Figure S8. MLPA validation of candidate regions Table S1. Custom designed MLPA probe sets for candidate regions Table S2. Subjects with deletion or duplication syndromes associated with short stature Table S3. Genic CNV burden association analysis in short cases versus controls Table S4. Quantitative Trait CNV Association Analysis in Clinical Cohort Table S5. Quantitative Trait CNV Association Analysis in Clinical Cohort Divided By Frequency and Type of CNV Table S6. Height distribution in 3 candidate regions divided by copy number status Supplemental Acknowledgments

Deletion Length Number of Deletions 9 8 7 6 5 4 3 2 1 1 2 3 4 5 6 Size (kb) Figure S1. Histogram of length of deletion CNVs in the clinical cohort 7 8 9 1

Duplication Length 6 Number of Duplications 5 4 3 2 1 1 2 3 4 5 Size (kb) 6 7 8 9 1 Figure S2. Histogram of length of duplication CNVs in the clinical cohort

35 3 Total CNV Burden (kb) 25 2 15 1 5-1 -8-6 -4-2 2 4 6 Height Z-score Figure S3. Scatter Plot of Total CNV Burden Versus Height in the Clinical Cohort The top 12 outliers for CNV burden were removed to allow for adequate graphical representation of the majority of the cohort. These 12 outliers all had height Z-scores < and 3 had Z-scores <-2. Their total CNV burden ranged from 31-151 Mb.

25 Low Frequency Deletion Burden (Kb) 2 15 1 5-1 -8-6 -4-2 2 4 6 Height Z-score Figure S4. Scatter Plot of Total Lower Frequency (<5%) Deletion Burden Versus Height in the Clinical Cohort The top 9 outliers for low frequency deletion burden were removed to allow for adequate graphical representation of the majority of the cohort. These 9 outliers all had height Z-scores < and 4 had Z-scores <-2. Their total low frequency deletion burden ranged from 2-149 Mb.

Chromosome 11q11 Candidate Region 14 OR4C15 OR4C16 OR4C11 OR4P4 OR4S2 OR4C6 6 12 4 1 2 #Individuals 8 6 4-2 Negative Log P-value 2-4 55 557 558 551 5511 5512 5513 5515 5516 5517 5518 5519 552 Duplication_#CNV Deletion_#CNV Duplication Deletion -6 Figure S5. 11q11 candidate region CNV association data Bars represent number of individuals with the CNV in our cohort (out of 4411 individuals). Triangles and squares represent the negative log p-value for the regional height association for duplications and deletions respectively. Positive values indicate that the CNV is height increasing and negative values indicated that the CNV is height decreasing. Gene names for genes overlapping region are placed above their approximate positions.

Chromosome 14q11.2 Candidate Region 16 OR11H2 OR4Q3 OR4M1 OR4N2 OR4K2 R4K5 OR4K1 6 14 5 12 4 #Individuals 1 8 6 4 2 1921 1924 1925 1927 1928 1929 193 1931 Figure S6. 14q11.2 candidate region CNV association data See Figure S5 for legend. Duplication_#CNV Deletion_#CNV Duplication Deletion 1932 1933 1934 1935 1936 1937 1938 1939 194 1941 1942 1943 1944 1946 1947 1948 1949 3 2 1-1 -2-3 Negative Log P-value

7 KIAA1267 Chromosome 17q21.31 Candidate Region LOC644246 ARL17B LRRC37A NSFP1 6 6 5 4 5 3 #Individuals 4 3 2 1 Negative Log P-value 2-1 1-2 4156 4157 4158 4159 4161 4162 4164 4165 4167 4168 4171 4178 Duplication_#CNV Deletion_#CNV Duplication Deletion -3 Figure S7. 17q21.31 candidate region CNV association data See Figure S5 for legend.

Figure S8. MLPA validation of candidate regions Representative MLPA results were shown for three loci (from top to bottom: 11q11 duplication, 14q11.2 deletion and 17q21.31 deletion). Images on the left side are peak profiles (red traces were peak profiles from normal control samples and blue traces were peak profiles of test samples) and images on the right are ratio profiles (red dots above the upper bar indicate duplication: case/control ratio>1.25, red dots below the lower bar indicate deletion: case/control ratio<.75)

Table S1. Custom designed MLPA probe sets Loci Probe coordinates (hg18) Probe sequences GC/Tm amplicom size 14q11 chr14:19463852-19463925 Chr14Ex16L,GGGTTCCCTAAGGGTTGGATGCAAGTTAGTCACAGGACAAGGAG 5/76.78 118 GAAAGTATCAGAGG Chr14Ex16R,TACACTTTCCAGGAATGCAGAGTAACAGAGTCCTCCATCTAGATT 46.67/75.7 GGATCTTGCTGGCAC chr14:1946544-1946513 Chr14Ex17L,GGGTTCCCTAAGGGTTGGAtaaaaaactaccgtCAGCAGCAAGCCATC 47.62/76.49 13 ACACATTGGAAATGG Chr14Ex17R,CATGTGTAGGACCACGCTGAAGCAGGTATAgaaaagtcggtggaTCTA 49.25/77.63 GATTGGATCTTGCTGGCAC chr14:1946893-19468989 Chr14Ex19L,GGGTTCCCTAAGGGTTGGAtctggacccgtgatggCTCACTACATCAGG 58.46/78.18 134 ACAGGACCTGAGGACC Chr14Ex19R,TGAGAGGCTTCCTTCCATACGGCATAGTGTcattctctggttttcgTCTA 47.83/77.26 GATTGGATCTTGCTGGCAC 11q11 chr11:5578474-5578558 Chr11Ex4L,GGGTTCCCTAAGGGTTGGACTATACATGATCCCTGTTGGAGCTTTC 47.46/75.88 124 ATCTTTTCCTTGG Chr11Ex4R,GAAACATGCAAAACCAAAGCTTTGTAACTGAGTTTGTCCTCCTCTAG 43.8/74.87 ATTGGATCTTGCTGGCAC chr11:5519115-5519184 Chr11Ex9L,GGGTTCCCTAAGGGTTGGACTGGATGAAATGGGAGGCTTTGGCTG 51.92/76.57 112 GGAAATA Chr11Ex9R,ACTGCAATGCTGAGAACATCATGTATTTCCCAAAAGGTCTAGATTG 43.33/74.33 GATCTTGCTGGCAC chr11:55344586-55344651 Chr11Ex13L,GGGTTCCCTAAGGGTTGGATCACCGAGATACTGGACACCAAAGTC 5/75.78 18 TTCTCTT Chr11Ex13R,ACTGAGCCTGTTACTTTCATGGAGTTTGTCACATCTAGATTGGATC 44.64/74.28 TTGCTGGCAC 17q21.31 chr17:41629594-41629659 Chr17Ex4L,GGGTTCCCTAAGGGTTGGAtcatccggtgaagagattGATGGAGGAAAA 5/78.26 144 AACAGTCCCTCTTCCTCAGAC Chr17Ex4R,AGTGACACCTCAAACTGCTACAGTCACCTTTGCgagccacctgacagtgtg 51.35/79.2 TCTAGATTGGATCTTGCTGGCAC chr17:41839514-41839593 Chr17Ex12L,GGGTTCCCTAAGGGTTGGAtcggcgttGGATTTCCAGTCTGGCCAGT GAGTATCTGACTTTGTTTTC 5.75/78.24 138 Chr17Ex12R,TTTTAACCTGCTAAGTGGCATTCGGGAAACTTCCAGAGAGttccgga atctagattggatcttgctggcac 46.48/76.91 (Bold capital letters are universal primer sequences. Small letters are stuffer sequences, capital letters are target specific sequences)

Table S2. Subjects with deletion or duplication syndromes associated with short stature Name of Syndrome # of Subjects 22q11 deletion 11 1q44 deletion 1 1p36 deletion 4 16p13 deletion 9 Smith-Magenis 4 Potocki-Lupski 1 Mowat-Wilson 3 Wolf-Hirschhorn 2 Williams 3 SHOX/Xq deletion 6

Table S3. Genic CNV Burden Association Analysis in 415 Short Cases versus 38 Controls All Frequencies Common (>5%) Lower Frequency (<5%) Rare (<1%) P- value Case Control P- value Case Control P- value Case Control Case Control Deletions and Duplications Total Number of CNV 3178 28785 875 7959 581 5352 364 3357 Number of CNV per individual 7.7 7.6.31 2.1 2.1.43 1.4 1.4.53.88.88.49 Total CNV burden per individual (kb) 39 2259.1 137 137.51 221 22.1 195 175.12 Average CNV size per individual (kb) 39 298.2 6 59.12 114 15.5 124 18.2 Deletions Only Total Number of CNV 1674 15237 47 391 329 2867 199 1712 Number of CNV per individual 4. 4..43 1. 1..83.79.75.22.48.45.23 Total CNV burden per individual (kb) 1781 1121.2 92 95.77 142 137.32 136 125.19 Average CNV size per individual (kb) 521 264.2 6 6.52 97 93.27 14 98.23 Duplications Only Total Number of CNV 154 13548 413 3636 38 2911 231 2145 Number of CNV per individual 3.6 3.6.3 1..96.2.74.77.61.56.56.51 Total CNV burden per individual (kb) 1398 1226.8 85 83.27 181 166.16 183 162.13 Average CNV size per individual (kb) 354 324.15 58 56.13 123 111.7 133 115.4 P- value P-values showing significant associations are in bold.

Table S4. Quantitative Trait CNV Association Analysis in Clinical Cohort All Subjects N=4411 STD ERR P BETA Subjects with height Z< N=2334 STD ERR P BETA Subjects with height Z> N=267 STD ERR P BETA Global CNV Burden Number of CNV per individual.22.11.45 -.3.21.88.37.3.22 Total CNV burden per individual -.27.11.13 -.4.22.7.17.29.54 Average CNV size per individual -.45.11 3.1E-5 -.42.22.55 -.5.28.87 Genic CNV Burden Number of CNV per individual.6.11.55 -.16.21.45.41.3.17 Total CNV burden per individual -.45.11 3.4E-5 -.53.22.16.8.28.77 Average CNV size per individual -.54.11 5.E-7 -.52.21.17 -.1.28.73 All CNV measures were inverse normalized due to the non-normal distribution in the clinical cohort. Linear regression was then performed looking at each measure of inverse normalized CNV burden as a function of height Z-score. The beta value indicates the magnitude of increase in Z-score of CNV measure for every 1 standard deviation increase in stature. The analyses of subjects with height Z< or Z> indicate that the linear regression was repeated just including subjects with height Z-scores less than or greater than. These analyses demonstrate that the associations between increased burden of CNV and larger average size of CNV with decreased height are driven by subjects with height Z-scores less than. These associations are not present in subjects with height Z-scores greater than. This is consistent with the findings in the case control association analyses in which these associations were only present for short cases compared to controls but not tall cases compared to controls.

Table S5. Quantitative Trait CNV Association Analysis in Clinical Cohort Divided By Frequency and Type of CNV CNV CATEGORY Global Deletions Global Duplications Genic Deletions Genic Duplications ALL FREQUENCIES STD ERR P BETA COMMON (>5%) STD ERR P BETA LOWER FREQUENCY (< 5%) STD ERR P BETA RARE (< 1%) STD ERR ANALYSIS BETA P Number of CNV per individual.26.11.14.41.1.1 -.9.1.39 -.11.9.25 Total CNV burden per individual -.11.11.32.4.11.1 -.38.1.2 -.34.1.3 Average CNV size per individual -.22.11.38.27.11.13 -.37.1.3 -.34.1.3 Number of CNV per individual.1.11.92 -.3.1.75.5.1.61.2.9.82 Total CNV burden per individual -.17.11.11 -.6.11.58 -.14.1.19 -.13.1.18 Average CNV size per individual -.21.11.47 -.6.11.57 -.17.1.1 -.14.1.14 Number of CNV per individual.2.11.58.34.1.8.1.1.94 -.4.9.68 Total CNV burden per individual -.19.11.83.4.1.1 -.1.1.31 -.1.9.29 Average CNV size per individual -.29.11.7.28.1.7 -.9.9.34 -.9.9.32 Number of CNV per individual -.12.11.25.4.1.7 -.4.1.97 -.2.9.84 Total CNV burden per individual -.3.11.5.1.1.31 -.14.1.15 -.15.9.12 Average CNV size per individual -.28.11.9.9.1.38 -.14.1.18 -.14.9.13

P-values less than.1 are in bold (Bonferroni corrected threshold for this table). All CNV measures were inverse normalized due to the nonnormal distribution in the clinical cohort. The beta value indicates the magnitude of increase in Z-score of CNV measure for every 1 standard deviation increase in stature based on the linear regression.

Table S6. Height distribution in 3 candidate regions divided by copy number status Deletions Duplications Copy Number 2 11q11 #CNV 1125 999 2279 Mean Height Z-score -.15 -.388 -.15 Median Height Z-score.27 -.353 -.83 Variance 1.974 2.19 1.861 Interquartile Range -.97 to.84-1.3 to.54 -.95 to.75 14q11.2 #CNV 152 392 259 Mean Height Z-score -.58 -.334 -.251 Median Height Z-score.37 -.335 -.18 Variance 1.878 1.969 1.95 Interquartile Range -.83 to.86-1.29 to.71-1.1 to.67 17q21.31 #CNV 69 288 356 Mean Height Z-score.31 -.314 -.221 Median Height Z-score.55 -.323 -.124 Variance 1.824 1.654 1.969 Interquartile Range -.77 to.93-1.17 to.55-1.8 to.72

Funding Acknowledgments for the CARe Cohorts Atherosclerotic Risk in Communities (ARIC): The Atherosclerosis Risk in Communities Study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute contracts N1-HC-5515, N1- HC-5516, N1-HC-5518, N1-HC-5519, N1-HC-552, N1-HC-5521, N1-HC-5522, R1HL87641, R1HL59367, R1HL86694 and RC2 HL12419; National Human Genome Research Institute contract U1HG442; and National Institutes of Health contract HHSN2682625226C. The authors thank the staff and participants of the ARIC study for their important contributions. Infrastructure was partly supported by Grant Number UL1RR255, a component of the National Institutes of Health and NIH Roadmap for Medical Research. Coronary Artery Risk in Young Adults (CARDIA): University of Alabama at Birmingham, Coordinating Center, N1-HC-9595; University of Alabama at Birmingham, Field Center, N1-HC-4847; University of Minnesota, Field Center and Diet Reading Center (Year 2 Exam), N1-HC-4848; Northwestern University, Field Center, N1-HC-4849; Kaiser Foundation Research Institute, N1-HC-485; University of California, Irvine, Echocardiography Reading Center (Year 5 & 1), N1-HC-45134; Harbor-UCLA Research Education Institute, Computed Tomography Reading Center (Year 15 Exam), N1-HC-5187; Wake Forest University (Year 2 Exam), N1-HC-4525; New England Medical Center (Year 2 Exam), N1-HC-4524 from the National Heart, Lung and Blood Institute. Cleveland Family Study (CFS): Case Western Reserve University (RO1 HL4638-1-16). Jackson Heart Study (JHS): Jackson State University (N1-HC-9517), University of Mississippi (N1-HC- 95171), Tougaloo College (N1-HC-95172).

Multi-Ethnic Study of Atherosclerosis (MESA): MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support is provided by grants and contracts N1 HC-95159, N1-HC-9516, N1-HC-95161, N1-HC-95162, N1-HC-95163, N1-HC-95164, N1-HC-95165, N1-HC-95166, N1-HC-95167, N1-HC-95168, N1-HC- 95169 and RR-24156.