DISSERTATION. Adam Michael Suhy. Graduate Program in Integrated Biomedical Science Program. The Ohio State University. Dissertation Committee:


Regulation of Cholesteryl Ester Transfer Protein and Expression of Transporters in the Blood Brain Barrier

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Adam Michael Suhy

Graduate Program in Integrated Biomedical Science Program

The Ohio State University 2015

Dissertation Committee: Wolfgang Sadee (Advisor), Amanda Toland, Joseph Kitzmiller, Kalpana Ghoshal

Copyright by Adam Michael Suhy 2015

Abstract

Coronary artery disease (CAD) accounts for more deaths in America than any other disease, and places a considerable economic burden on the healthcare system. Statins have successfully reduced the risk of cardiac mortality; however, a residual risk of approximately 20-26% has been observed. Reduced activity of cholesteryl ester transfer protein (CETP) has been shown to increase the risk of atherosclerotic death in males on statins, supporting a possible genetic basis for a portion of the observed residual risk. SNPs in CETP have been associated with a non-active splice form, increased HDL cholesterol, and allelic expression imbalance in CETP. The goal of the first portion of this study is to identify the functional SNPs responsible for the differential regulation of CETP splicing and expression through the use of molecular genetics. I use mini-genes and a qRT-PCR-based assay to examine the effect of 2 SNPs on alternative splicing. I also investigate the interactions between SNPs residing in an upstream haplotype block and transcription factor binding sites, and I demonstrate the effect of 3 candidate SNPs on expression of a luciferase reporter. Additionally, I interrogate RNA-sequencing data to uncover expression and alternative splicing of ABC and SLC transporter proteins in the blood brain barrier using computational tools.

The effect of each candidate SNP (rs5883 and rs9930761) on splicing was assessed in an in vitro system. The minor allele of rs5883 significantly increased the amount of alternatively spliced mRNA in HepG2 cells by 1.25-fold (p-value = 0.0001), while rs9930761 had no effect. This demonstrates a measurable effect of only rs5883 in regulating the amount of alternative splicing in liver. In the upstream haplotype block, I found that 3 SNPs interact with putative transcription factor binding sites for factors that are highly expressed in liver. rs17231506 had no effect in HepG2 cells; however, rs247616 caused a significant decrease in luciferase activity, and rs173539 resulted in a significant increase in luciferase activity. Due to the high linkage disequilibrium between these two SNPs, and the association of the minor allele of the SNPs in this haplotype block with decreased allelic expression imbalance, it is apparent that the effect of rs247616 predominates over that of rs173539.

I conclude that the increase in alternative splicing, and thus the decrease in CETP activity, is accounted for by the activity of rs5883. rs5883 should be included in future clinical association studies to verify its effect on clinical outcomes. Additionally, rs247616 is sufficient as a marker to assess the effect of the upstream haplotype block on CETP expression. Knowledge of a patient's rs5883 and rs247616 genotype, in combination with previously established CETP SNPs, will provide an improved prediction of their CETP activity and statin response.

In addition to my work on CETP regulation, I also studied the expression of ATP-binding cassette (ABC) and solute carrier (SLC) transporters in the blood brain barrier (BBB). The BBB is the name given to the virtually impenetrable nature of the blood vessels in the brain and central nervous system (CNS), which prevents the passive diffusion of most solutes. This barrier, in combination with specific transporters expressed in the endothelial walls of the vessel, allows for precise control of nutrient and waste influx and efflux. Several diseases, such as Parkinson's and Alzheimer's diseases, are associated with genetic variants in certain transporters, indicating a need to understand what transporters are present in the BBB, and thus what solutes or drugs are transported in or out of the BBB.

I used RNA-sequencing data to screen the expression and splicing of all transporters in the ABC and SLC families. Using computational tools, such as LifeScope alignment software and Cufflinks to align and assemble transcripts, I compared the expression of transporters in whole cerebral cortex tissue samples and cerebral cortex samples enriched for brain microvessel endothelial cells (BMEC) from the same individual. I identified nearly 160 transporter genes and pseudogenes that are at least 1.25-fold enriched in the BMEC-enriched samples. Sixty-three of these were more than 2-fold enriched, indicating a likely role in the BBB. Many were previously implicated in BBB function, others have known functions in the brain, and some have no previous evidence of brain or CNS function. Additionally, I analyzed splice junctions to determine what splice forms are enriched in BMECs. I have shown that RNA-seq can be a powerful tool for screening tissues such as these for transporters that have not previously been shown to be expressed in the BBB. Knowledge of BBB transporters can lead to a better understanding of some neurological disorders and to improved drug therapies.

Dedication

This work is dedicated to my family. To my mother and father for always asking when I'm going to graduate so that I push myself to do it, and their support all along the way. I am especially grateful for the support of Angela, who has always been there for me over the course of my time at OSU. Without you, I would not be typing this today. Thank you.

Acknowledgments

I would like to acknowledge the support of my entire lab, and particularly the camaraderie of Beth and Jon. Thank you also to Wolfgang for steering me back on track numerous times, and to the rest of my committee for their support along my way.

Vita

2003: Plymouth-Whitemarsh High School
2007: B.S. Chemistry, Carnegie Mellon University
2007: B.S. Biological Sciences, Carnegie Mellon University
2010-present: Graduate Research Assistant, Department of Pharmacology, The Ohio State University

Publications

Suhy, A., Hartmann, K., Newman, L., Papp, A., Toneff, T., Hook, V., and Sadee, W. (2014). Genetic variants affecting alternative splicing of human cholesteryl ester transfer protein. Biochem. Biophys. Res. Commun. 443, 1270-1274. doi:10.1016/j.bbrc.2013.12.127 PMID: 24393849

Kitzmiller, J.P., Binkley, P.F., Pandey, S.R., Suhy, A.M., Baldassarre, D., and Hartmann, K. (2013). Statin pharmacogenomics: pursuing biomarkers for predicting clinical outcomes. Discov. Med. 16, 45-51. PMID: 23911231

Smith, R.M., Webb, A., Papp, A.C., Newman, L.C., Handelman, S.K., Suhy, A., Mascarenhas, R., Oberdick, J., and Sadee, W. (2013). Whole transcriptome RNA-Seq allelic expression in human brain. BMC Genomics 14, 571. PMID: 23968248

In review:

Suhy, A., Hartmann, K., Papp, A.C., Wang, D., Sadee, W. Regulation of CETP expression by upstream polymorphisms: Reduced expression associated with rs247616.

Gerber, M., Hampel, H., Zhou, X.P., Deveci, M., Catalyurek, U., Schulz, N.P., Suhy, A., Balmain, A., de la Chapelle, A., Toland, A.E. Allele-specific imbalance mapping at human orthologs of mouse susceptibility to colon cancer (Scc) loci.

Fields of Study

Major Field: Integrated Biomedical Science Program

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
    Genetics and Heritability
    Genetic Regulation and detection
    Pharmacogenomics
    Cholesteryl Ester Transfer Protein
    Bioinformatics and the Blood Brain Barrier
Chapter 2: Alternative Splicing of CETP
    2.1 Introduction
    2.2 Material and Methods
    2.4 Discussion
Chapter 3: Regulation of CETP Expression by Upstream Polymorphisms
    3.1 Introduction
    3.2 Material and Methods
    3.3 Results
    3.4 Discussion
Chapter 4: Transporter Proteins in the Blood Brain Barrier
    4.1 Introduction
    4.2 Material and Methods
    4.3 Results
    4.4 Discussion
Chapter 5: Conclusion
References
Appendix A: Abbreviations

List of Tables

Table 1. Primers used in reactions
Table 2. Liver and plasma sample genotypes and mRNA splicing percentages
Table 3. Primers used in PCR and infusion cloning reactions
Table 4. Linkage structure of rs247616
Table 5. Linkage structure of rs708272
Table 6. Sequenced SNP genotype counts in AEI+ vs AEI- samples
Table 7. SNPs in high Linkage Disequilibrium with rs247616
Table 8. Transporters enriched in BMEC vs whole tissue
Table 9. Genes with exon junctions counted more frequently in BMEC
Table 10. Genes with exon junctions counted less frequently in BMEC

List of Figures

Figure 1. An illustration of chromatin structure
Figure 2. Illustration of Reverse Cholesterol Transport
Figure 3. Illustration of the qRT-PCR splicing assay
Figure 4. Standard curve for splicing assay
Figure 5. Linear effect of lower transfection concentrations
Figure 6. Time course of mini-gene expression in HEK293 cells
Figure 7. CETP mRNA splicing in cell lines
Figure 8. Western Blots with a CETP antibody in liver protein extract and plasma
Figure 9. rs72786786 association with AEI
Figure 10. rs72786786 association with AEI with homozygous genotypes separated
Figure 11. rs1800775 association with AEI
Figure 12. CETP expression in GTEx samples
Figure 13. CETP expression reported from BioGPS
Figure 14. mRNA expression of CETP in tissues
Figure 15. Expression of top 5 transcription factors
Figure 16. The effect of expression of the luciferase reporter gene
Figure 17. Plot of RNA-seq vs qRT-PCR

Chapter 1: Introduction

Genetics and Heritability

Humans have been domesticating plants for use in agriculture since as early as 11,500 BCE (Hillman et al., 2001). Animals, particularly goats, have been domesticated since 8000 BCE (Zeder and Hesse, 2000). To selectively breed plants or animals, one must understand that traits are passed from one generation to the next, which requires at least a rudimentary understanding of heredity. The knowledge of selecting desirable traits in plants and animals spread throughout the world, leading to more productive food sources that allowed human populations to grow. However, it was not until the work of Gregor Mendel, an Augustinian friar, that the field of genetics really began. Mendel's work with pea plants provided an understanding of how the traits that farmers had selected for centuries were passed from generation to generation.

Mendelian inheritance is the phrase now used to describe inheritance of traits that follow the laws described by Mendel (Mendel, 1866); specifically, these are the laws of segregation, independent assortment, and dominance. The law of segregation states that the two alleles of a gene are segregated from one another in the gametes of each parent and then passed to offspring, where one allele from the mother and one from the father are reunited when the zygote is formed. Mendel also observed that each pair of alleles segregated independently from other pairs of alleles during gametogenesis, which he defined as the law of independent assortment. Organisms with two identical alleles are said to be homozygous for that allele; if they have 2 different alleles, they are called heterozygous. The law of dominance states that some alleles are dominant over others, which are known as recessive: a dominant trait requires only one copy of the dominant allele to be expressed. Thus an organism that is heterozygous for a gene will show the dominant phenotype or trait. If the organism is homozygous for the recessive allele, it will express the recessive phenotype.

In the time of Mendel, it was unclear what material was transferred from parent to offspring to confer heredity. DNA, while known to exist since the late 19th century (Dahm, 2008; Miescher, 1871), was not thought to be the carrier of genetic information until 1943 (Avery et al., 1944), and this was confirmed in 1952 (Hershey and Chase, 1952). Astbury provided evidence that DNA had a regular structure in 1937 (Astbury and Bell, 1938), and Watson and Crick then published the double-helix structure of DNA in 1953 (Watson and Crick, 1953).

Soon after the structure of DNA was confirmed, an understanding of the molecule's role began to crystallize. The central dogma of molecular biology was proposed by Francis Crick in 1956 and later published in 1970 (Crick, 1970). Crick proposed that DNA can be used as a template to create RNA (now called transcription), and RNA can be used as a template to make protein (now called translation). He also postulated that in some situations, RNA can be used to make more RNA or DNA, as well.
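The two main steps of the central dogma described above, transcription and translation, can be illustrated with a minimal Python sketch. This is a deliberate simplification and not part of the original work: it assumes the input is the coding (sense) strand of DNA, ignores splicing and untranslated regions, and uses the standard genetic code. The function names `transcribe` and `translate` and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand (simplification: T -> U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA from its first base until a stop codon is reached."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")  # hypothetical toy sequence
    print(mrna, translate(mrna))
```

In this toy example the codons ATG, GCC, and TAA encode methionine, alanine, and a stop signal, respectively.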

At this point, it was clear that DNA was the heritable genetic material passed from parent to offspring. By fully understanding the structure of the molecule, and how the base pairs (bp) interacted, more advanced topics in genetics could be explored.

Today, we know that only some human traits follow Mendel's laws. A trait or phenotype that is under the control of a single gene locus is known as a Mendelian trait; an example is cystic fibrosis, which results from mutations in the CFTR gene. Non-Mendelian traits are those that do not necessarily follow all 3 of Mendel's laws. For example, many phenotypes are considered to be polygenic, resulting from differences in multiple genes. Many complex diseases that are studied today, such as coronary artery disease (CAD) and many neurological diseases, are the result of the effects of many genes. Further, some alleles follow other patterns of inheritance such as incomplete dominance or codominance, where a combination of the two different alleles is detectable. Simple examples include red and white flowers producing pink flowers, or black and white chickens producing speckled chickens.

The variation in genes between individuals is due to many possible factors. Single nucleotide polymorphisms (SNPs), and insertions and deletions (indels), can change the function and regulation of genes in a relatively direct fashion by changing the amino acids that are incorporated into proteins, or by preventing the binding of transcription factors, enhancers, or repressors. Changes to other proteins, such as histone modifications, can alter the way certain genes are expressed, as well.
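The laws of segregation and dominance, and the incomplete-dominance exception mentioned above, can be made concrete with a small monohybrid cross. The following Python sketch is illustrative only: the allele symbols `A` and `a`, the function names, and the phenotype labels are assumptions chosen for the example, not notation from the original work.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count offspring genotypes for a monohybrid cross.

    Each parent is a two-letter genotype such as 'Aa'. By the law of
    segregation, each parent contributes one of its two alleles with
    equal probability, so every allele pairing is counted once.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that 'aA' and 'Aa' are counted as the same genotype.
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring


def phenotype(genotype, incomplete_dominance=False):
    """Map a genotype to a phenotype label (labels are illustrative)."""
    heterozygous = genotype[0] != genotype[1]
    if heterozygous and incomplete_dominance:
        return "intermediate"  # e.g. pink flowers from red x white
    if any(allele.isupper() for allele in genotype):
        return "dominant"
    return "recessive"


def phenotype_counts(genotypes, incomplete_dominance=False):
    """Aggregate genotype counts into phenotype counts."""
    counts = Counter()
    for genotype, n in genotypes.items():
        counts[phenotype(genotype, incomplete_dominance)] += n
    return counts


if __name__ == "__main__":
    f2 = cross("Aa", "Aa")
    print(dict(f2))
    print(dict(phenotype_counts(f2)))
    print(dict(phenotype_counts(f2, incomplete_dominance=True)))
```

Crossing two heterozygotes gives genotypes in a 1:2:1 ratio. Under complete dominance this appears as a 3:1 phenotype ratio, whereas under incomplete dominance the heterozygotes form a distinct intermediate class.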

Histones are protein complexes whose function is to organize DNA. In our cells' nuclei, DNA is structured as chromatin. Chromatin is an organized structure of DNA, RNA, and histone proteins, which is further condensed into chromosomes during cell replication. An illustration of the organization of DNA is shown in Figure 1, below.

Figure 1. An illustration of chromatin structure showing the levels of organization (A) and a crystal structure of DNA wrapped around a histone protein complex (B). (Felsenfeld and Groudine, 2003)

Histones are subject to modification, primarily by methylation and acetylation. These modifications cause the histone proteins to tighten or loosen the DNA wrapped around them. The role of histones in gene regulation can be gleaned by inspection of the second row of Figure 1, labeled "Beads on a string". It can be seen that some DNA is in contact with the histone proteins, while other DNA is loose between the beads. While loosened, DNA is more available to the transcription machinery than while tightly wrapped around histones. This regulation does not depend on the sequence of the DNA wrapped around the histone, and is thus called epigenetic, or "above genetics". A project called the Encyclopedia of DNA Elements (ENCODE), by the National Human Genome Research Institute (NHGRI), has defined regions of the genome that interact with modified histone proteins using ChIP-seq analysis (Kellis et al., 2014). These regions are potentially affected by histone modifications, and are likely to be areas containing regulatory elements.

In addition to non-Mendelian traits that do not abide by the law of dominance, the law of independent assortment has some exceptions. Rather than assorting completely independently during gametogenesis, alleles at multiple loci may be passed on together more often than expected by pure chance. When this happens, the alleles are said to be in linkage disequilibrium (LD) with one another. The more frequently alleles are observed together in a population, the stronger the LD. Alleles that tend to be passed on together and have high LD with one another are members of a haplotype. Haplotypes are regions of DNA that contain alleles that tend to be passed to offspring together, and that in some cases act as a unit to effect phenotypic change.

Beyond heredity, it is clear that a person's environment plays a large role in the traits that they display. For example, lung cancer has a strong environmental component (smoking) associated with the disease. However, there are people who smoke but do not get lung cancer. A portion of this effect can be explained through genetics. By studying identical twins, siblings, and immediate family members who share all or much of their DNA, it can be determined what portion of a phenotype is dictated by genetics (its heritability) and how much is due to environmental influences.

Physicians have known for many years that many diseases are heritable, and thus take a family history to estimate the risk of disease for patients and their relatives. More recently, studies have been conducted to estimate heritability. For example, in 2005, a study of nearly 900 siblings calculated that the heritability of CAD is approximately 50% (Fischer et al., 2005), meaning that about half of the variation in CAD phenotypes across the population is due to genetics. However, this does not necessarily mean that the genes and genetic variations that are responsible are understood. When considering all of the genetic factors currently known to impact CAD risk, only about 10-12% of the heritability is explained (Deloukas et al., 2013; Tada et al., 2014). The remaining 40% is what is known as missing heritability. Based on the sibling studies, it is apparent that there are additional genetic components that contribute to CAD risk that remain undiscovered. By uncovering more of these genetic risk factors, we can better detect, diagnose, and treat disease.
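The strength of the linkage disequilibrium discussed earlier in this section is commonly summarized with the statistics D, D', and r². The Python sketch below computes them for two biallelic loci. It assumes phased haplotype counts are available (for example, from sequencing or statistical phasing), which the text above does not specify; the function name and the example counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Compute (D, D', r^2) from counts of the four possible haplotypes.

    A/a are the alleles at the first locus and B/b those at the second.
    Counts are assumed to come from phased haplotypes.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A
    p_B = (n_AB + n_aB) / total    # frequency of allele B

    # D measures the excess of the A-B haplotype over independence.
    d = p_AB - p_A * p_B

    # D' rescales D by its maximum possible magnitude given the
    # allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between alleles at the two loci.
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Alleles always inherited together: complete LD.
    print(linkage_disequilibrium(50, 0, 0, 50))
    # All four haplotypes equally common: no LD.
    print(linkage_disequilibrium(25, 25, 25, 25))
```

When the two alleles always travel together, both D' and r² equal 1; when the four haplotypes are equally frequent, all three statistics are 0, matching the expectation under independent assortment.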

Genetic Regulation and detection

There are many mechanisms of gene regulation that can occur at any stage between DNA and protein, and even after proteins are made. For many years, the most studied drivers of genetic variation were non-synonymous or nonsense mutations in DNA that caused a change in the structure of messenger RNA (mRNA) and protein. The resultant proteins would incorporate inappropriate amino acids, or be terminated prematurely, creating deleterious effects.

In addition to changes in the coding sequence, the expression of a gene can be affected by other heritable variations. SNPs that reside in enhancer or repressor regions of a gene are capable of altering the binding of transcription factors, resulting in differential gene expression. After mRNA is transcribed from DNA, it undergoes a series of modifications, such as 5' capping, addition of the 3' poly-A tail, and splicing to join exons and remove introns. Alternative splicing, such as exclusion of exons or retention of introns in the final mRNA, is a source of genetic diversity in humans, allowing the production of additional proteins from a single gene. However, it is also a source of potential dysfunction. SNPs in regions required for controlled splicing (such as splicing branch points, exonic splicing enhancer (ESE) sites, or exonic splicing silencer (ESS) sites) may result in more or fewer copies of a specific splice form, resulting in more or less functional protein.

Together, all of these factors can have severe clinical impacts, and most are implicated in complex diseases. In order to detect some of the effects of gene regulation, our lab employs a technique to measure allelic expression imbalance (AEI). AEI is the differential expression of mRNA of one allele versus the other, measured at an exonic marker SNP (Johnson et al., 2008). The benefit of measuring AEI, compared to measuring expression by quantitative real-time polymerase chain reaction (qRT-PCR) or a similar technique, is that any variation between the alleles is due only to cis-acting variants in or around the gene. Any variation in trans-acting factors, such as transcription factor expression, acts upon both alleles, and only the relative difference between the alleles is considered. We then genotype or sequence the gene region to find SNPs that are associated with the imbalance. With a SNP associated with AEI, we can explore the molecular genetics with mini-genes or reporter assays to determine functionality. A SNP known to be functional can be tested clinically for association with a disease phenotype or drug interaction. In this way, we can identify SNPs that are potentially clinically relevant with relatively few samples. With access to clinical datasets, we can test functional SNPs for associations with disease and/or drug status to confirm the SNPs' clinical effect.
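The idea behind an AEI measurement can be sketched numerically: in a sample heterozygous for the marker SNP, the mRNA signal from one allele is compared to that from the other. The Python sketch below is a hedged illustration rather than the lab's actual protocol. The normalization to the genomic DNA allelic ratio, the log2 scale, the cutoff of 0.5, and all names and numbers are assumptions made for the example.

```python
import math


def aei_log2_ratio(mrna_major, mrna_minor, gdna_major=1.0, gdna_minor=1.0):
    """Return the log2 allelic mRNA ratio, normalized to genomic DNA.

    In a heterozygous sample the two alleles are present 1:1 in genomic
    DNA, so the gDNA ratio serves as an (assumed) control for unequal
    detection of the two alleles by the assay.
    """
    mrna_ratio = mrna_major / mrna_minor
    gdna_ratio = gdna_major / gdna_minor
    return math.log2(mrna_ratio / gdna_ratio)


def has_aei(log2_ratio, threshold=0.5):
    """Flag imbalance when the ratio deviates beyond a chosen cutoff.

    The default threshold is a hypothetical value for illustration.
    """
    return abs(log2_ratio) >= threshold


if __name__ == "__main__":
    # Hypothetical signals: the major allele yields twice the mRNA.
    ratio = aei_log2_ratio(mrna_major=200.0, mrna_minor=100.0)
    print(ratio, has_aei(ratio))
```

Because both alleles are exposed to the same trans-acting environment within one sample, a ratio that departs from the genomic DNA baseline points to a cis-acting difference between the two alleles.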

Pharmacogenomics

Pharmacogenomics is the study of the interplay between pharmacology and genomics. More specifically, it is the study of how genetic variation can cause differences in drug response, with the goal of identifying genetic biomarkers that are predictive of drug response. Drug metabolizing enzymes are commonly implicated in varied drug responses; reduced or increased activity of a drug metabolizing enzyme, such as the Cytochrome P450 (CYP) family of enzymes, may cause slower or faster removal of a drug, resulting in a potential over- or under-dosing of the drug.

There exist other examples where the mechanism for the differential drug response is less clear. For example, patients who have had a myocardial infarction (MI) or have an unfavorable lipid profile are typically prescribed statins. Statins are effective in reducing low-density lipoprotein (LDL) cholesterol levels. However, there is a residual risk of 20-26% for patients with CAD taking statins (Campbell et al., 2007; Mora et al., 2012). These patients have a reduction in LDL cholesterol, but still progress to have cardiac events in the future. Some of the observed residual risk is likely to be explained by genetics.

Cholesteryl Ester Transfer Protein

Cholesteryl ester transfer protein (CETP) is involved in reverse cholesterol transport (RCT). RCT is the process by which cholesterol in the periphery is collected and returned to the liver for metabolism and excretion in the bile (Figure 2). Apolipoprotein A-I and A-II are produced in the liver and intestines, and secreted into circulation as a complex called high-density lipoprotein (HDL). HDL particles collect cholesterol from foam cells (macrophages adhered to arterial walls) via ABCA1 and ABCG1 transporters. The cholesterol is esterified and made more hydrophobic by LCAT, another component of the HDL particle complex, to allow for denser packing of the particle. Mature HDL particles interact with a hepatocyte receptor, SR-B1, initiating uptake into the liver.

Figure 2. Illustration of Reverse Cholesterol Transport (Heinecke, 2012)

Before HDL particles are taken up by the liver via SR-B1, CETP exchanges some of the cholesteryl esters (CE) for triglycerides (TG) from low-density lipoprotein (LDL) particles. This process acts to lower HDL cholesterol and increase LDL cholesterol, increasing the risk of CAD based on the lipid profile. Due to this effect, CETP has been a popular target for pharmacological inhibition, but no inhibitors have successfully improved patient outcomes (Barter and Caulfield, 2007; Schwartz et al., 2012).

Variants in CETP that reduce the function of the protein have been shown in some cases to increase the risk of CAD (Zhong et al., 1996), while other variants that do the same are linked to lower risk and longevity (Barzilai et al., 2003; Freeman et al., 1994; Kuivenhoven, 1998; Willer et al., 2008). Our lab has previously identified variants that associate with an alternative splice form, as well as variants upstream of CETP that are members of a large haplotype block and are associated with allelic expression imbalance (AEI), indicating a reduction in CETP expression and increased HDL cholesterol levels (Papp et al., 2012).

Genetic variation in CETP may also explain some of the residual risk observed in patients treated with statins (Kitzmiller et al., 2013). There is evidence that reduced activity of CETP in males taking statins is associated with higher risk of atherosclerotic death (Regieli et al., 2008), indicative of a gene-drug interaction, and suggesting that statin treatment of carriers of CETP-reducing variants may be less beneficial or possibly detrimental (Kitzmiller et al., 2013). A specific understanding of the functional genetics responsible for the regulation of CETP activity is essential for selection of clinical biomarkers that will allow for risk prediction and appropriate treatment. The regulation of CETP splicing and transcriptional expression will be the focus of Chapters 2 and 3.

Bioinformatics and the Blood Brain Barrier

Bioinformatics is a field that applies the techniques of computer science, mathematics, and statistics to the study of biological data. Through the use of algorithms, scripts and pipelines, large amounts of data can be processed and interpreted by a few people with access to a computer server. Bioinformatics has been the bottleneck in the recent explosion of data produced by GWAS and sequencing studies: data are being produced faster than they can be analyzed and interpreted by the lab. Tools such as Bowtie, TopHat and Cufflinks can effectively detect differential gene and transcript expression from RNA-sequencing (RNA-seq) output files (Trapnell et al., 2012). The ability to navigate these systems and to process and interpret data is an increasingly sought-after skill in research. My interest in computers and data has drawn me toward learning additional techniques beyond the basics used to date by our lab. To learn these skills, I applied them to RNA-seq data to study the expression and splicing of transporters in the blood brain barrier (BBB).

The BBB is the name given to the relatively impermeable vasculature of the brain. Compared to blood vessels in other tissues, blood vessels in the brain are virtually impenetrable to substances other than water, gases and some hydrophobic molecules (Obermeier et al., 2013). All other nutrients and wastes that must pass through the blood vessel wall must undergo primary or secondary active transport. The transporter proteins, primarily members of the ATP-binding cassette (ABC) and solute carrier (SLC) families, are required to transport nearly all necessary nutrients from the blood to the brain tissue, and waste products from the brain tissue into the blood. By improving our understanding of which transporters are involved in this process, and whether they are modified for their role in the BBB, we can better understand how to potentially treat disorders due to dysfunction of the BBB. The fourth chapter of this dissertation will detail the work that I did to analyze the transporters of the BBB using novel bioinformatics methods.

Chapter 2: Alternative Splicing of CETP

2.1 Introduction

Cholesteryl Ester Transfer Protein (CETP) is a key protein in reverse cholesterol transport, a process that moves cholesterol from the periphery to the liver for excretion. Expressed most highly in the liver and released into the circulation (Su et al., 2004), CETP mediates the exchange of cholesteryl esters from high-density lipoproteins (HDL) for triglycerides from low-density lipoproteins (LDL). Increased CETP activity reduces the HDL/total cholesterol ratio, which is associated with increased risk for coronary artery disease (CAD) (Gordon et al., 1989; Murray et al., 2006; Roger et al., 2012). Accordingly, CETP inhibitors are currently in clinical trials to determine their ability to increase HDL levels and reduce the risk of CAD. However, initial results have been disappointing, in some cases even demonstrating enhanced CAD risk (Nagano et al., 2004; Regieli et al., 2008). While off-target effects of CETP inhibitors are suspected to play a role, it is also possible that genetic CETP variants affect disease risk and treatment outcomes. Subjects lacking functional CETP expression display multiple cardiovascular abnormalities (Nagano et al., 2004), showing that the reverse cholesterol pathway serves important physiological functions. Therefore, optimal CETP activity should balance negative and positive downstream events. A detailed understanding of the genetic architecture of the CETP locus is critical in guiding therapeutic intervention.

Numerous genetic studies of CETP have focused on frequent non-synonymous SNPs including I405V (rs5882) and promoter region SNPs within ~1 kb of the transcription start site (Corbex et al., 2000; Frisdal et al., 2005); however, the mechanism underlying any effect on CETP activity remained uncertain. GWAS have also implicated SNPs in the promoter/enhancer region as being associated with circulating CETP and HDL levels, existing in high LD with each other on a long haplotype. The most significant SNPs were found to reside in regions 5-10 kb upstream of CETP (Marmot et al., 1991; Papp et al., 2012), with Taq1B in intron 1 serving as a marker SNP. Despite highly significant association with HDL, any effect of CETP variants on CAD risk remained weak at best (Boekholdt et al., 2005). Carriers of the minor Taq1B allele, associated with reduced CETP activity, may benefit less from statin therapy, suggesting a possible gene-drug interaction (Kuivenhoven, 1998; Regieli et al., 2008). We have studied the molecular genetics of CETP, showing that a frequent SNP 6.2 kb upstream (rs247616) is the most likely variant responsible for reduced CETP mRNA expression associated with the long upstream haplotype (Papp et al., 2012). This same SNP has also shown a strong association with HDL levels (Barber et al., 2010; Papp et al., 2012), but additional regulatory mechanisms are likely to be operative. Another enhancer SNP in high LD with rs247616 has also shown an association of CAD outcome with statin therapy (Leusink et al., 2013), supporting the notion that CETP activity has clinical relevance.

Alternative splicing of CETP mRNA has been shown to result in a protein isoform lacking exon 9 (Δ9-CETP), which appears to be sequestered in the ER and may act in a dominant-negative fashion by binding to full-length CETP, preventing its secretion (Inazu et al., 1992). We have identified two SNPs of intermediate minor allele frequency (MAF 4-8%) to be associated with increased formation of the Δ9-CETP mRNA isoform in livers, one in intron 8 (rs9930761) interrupting a putative splicing branch point, and the other in exon 9 (rs5883) creating a putative exonic splicing enhancer (ESE) sequence (Papp et al., 2012). In high LD with each other, these two SNPs reside on opposite alleles to the upstream promoter/enhancer alleles and were found to be associated with increased HDL levels, with an effect size similar to that of the upstream enhancer SNPs (Papp et al., 2012). This strong effect had previously remained hidden because the splicing SNPs reside on opposite haplotypes from the enhancer SNPs, resulting in underestimation of the splicing effect on expressed CETP activity unless the enhancer SNP effect is accounted for (Papp et al., 2012). Moreover, rs5883 has been associated with increased risk of CAD in hypertensive patients, a finding that still requires replication (Papp et al., 2012).

The goal of the present study was to test further the influence of rs9930761 and rs5883 on CETP exon 9 splicing. The former has a slightly higher allele frequency (MAF ~6%) than the latter (~5%), but associations with mRNA expression favor rs5883 (Papp et al., 2012). Therefore, rs9930761 could have a relatively small effect of its own, it could contribute to or be necessary for the rs5883 effects, or rs5883 alone could be the main variant affecting splicing. Our experiments with mini-gene constructs favor this third hypothesis.

2.2 Material and Methods

Mini-genes: A genomic DNA region was amplified by PCR using Advantage HD (Clontech, USA) according to the manufacturer's protocol, using primers Exon8-F infusion and Exon10-R infusion (Table 1). This region, extending from exon 8 to just downstream of exon 10, was inserted in frame into a pCMV-Tag2B expression vector with the In-Fusion Dry-Down PCR cloning kit (Clontech, USA). The procedure was completed by transforming into Stellar competent E. coli (Clontech, USA).

Table 1 Primers used in reactions: PCR, site directed mutation procedures, and splicing assay

Primer               Sequence (5'-3')
Exon8-F infusion     CTGCAGGAATTCGATATCGCCAGCATCCTTTCAGATGG
Exon10-R infusion    ATCGATAAGCTTGATATCAGGGGGCAGTTACCTCTTGGAA
rs9930761-sdm C-T    CTGAAGCTGGACCTGAGCCCAGTAGGG
rs5883-sdm C-T       TGGTTCTCTGAGCGAGTCTTTCACTCGCTGGC
β-actin-R            GCCGATCCACACGGAGTACT
Exon10-R             AAGATTTCCTGGTTGGTGTTGA
Exon8_10-F           GGAGTCCCATCACATGGCAG
Exon9_10-F           GGGAGACGAGTTCATGGCAG

Site-directed mutagenesis was carried out with the QuikChange Lightning II (Agilent, USA) system. Each SNP (rs5883 and rs9930761) was mutated using primers rs9930761-sdm C-T and rs5883-sdm C-T (sequences shown in Table 1), transformed, and isolated sequentially to create all four haplotype combinations. All constructs were sequenced to confirm proper insertion, showing that the insert sequence was otherwise that of wild-type CETP. Multiple plasmid constructs were generated, yielding similar results in test transfection experiments. The haplotype plasmids were transformed into XL-10 Gold competent cells (Agilent, USA), and three clones of each haplotype were collected and sequenced again. The three identical clones of each haplotype were combined for the transfection assays.

Cell Lines: Human Embryonic Kidney (HEK293) and HepG2 cells were grown to 70-80% confluence in low glucose DMEM + 10% FBS and 1% Penicillin/Streptomycin. For passaging, HEK293 and HepG2 cells were treated with 0.05% and 0.25% Trypsin in EDTA, respectively.

Transfection: Cells were grown to 70-80% confluence in T75 flasks, trypsinized and plated at ~2.5×10⁵ cells on 6-well plates, and allowed to grow overnight in 2 mL of growth medium. Cells were transfected using Lipofectamine 2000 (Invitrogen, USA). Transfection was optimized, and 7.2 µL of Lipofectamine 2000 reagent was brought to 150 µL/well total volume in Opti-MEM (Gibco, USA). 16 ng of haplotype construct DNA, 1.6 µg of empty vector, and 200 ng of pcDNA vector expressing EmGFP (Life Technologies, USA) were mixed in a final volume of 150 µL/well with Opti-MEM. This was added to the Lipofectamine dilution and incubated for 5 minutes at room temperature. 275 µL was added to each well, followed by incubation of the cells at 37 °C for 7-24 hours.

RNA isolation and cDNA synthesis: Media were aspirated and the cells were lysed and stored in 500 µL Trizol (Invitrogen, USA). RNA was isolated and washed using chloroform and isopropyl alcohol. 50 µL of nuclease-free water was added to dissolve the pellet, RNA integrity was assayed with a Bioanalyzer 2100 (Agilent, USA), and concentration was measured with Qubit (Invitrogen, USA) spectroscopy. 1 µg of RNA was treated with amplification grade DNase I (Invitrogen, USA). cDNA was generated with poly-dT and gene-specific primers (β-actin-R and Exon10-R, Table 1) using SuperScript Reverse Transcriptase III (Invitrogen, USA).

CETP and Δ9-CETP mRNA assay: The splicing assay relies on measuring the relative expression of the Δ9-CETP and full-length mRNA from the same cDNA sample. The measurements were made using qRT-PCR (Life Technologies 7500) with SYBR Green. Primer Exon8_10-F, which is specific for the exon 8 to exon 10 junction of the short form, and primer Exon9_10-F, which is specific for the exon 9 to exon 10 junction of the long form, were each used with Exon10-R (Table 1). PCR cycles included an initial incubation at 95 °C for 20 seconds, followed by a maximum of 40 cycles of 95 °C for 3 seconds and 60 °C for 30 seconds. Cycle thresholds (Ct) were compared between the two reactions and used to determine the relative quantities of the short and long form of the mini-gene. An illustration of the qRT-PCR splicing assay is shown in Figure 3.

Figure 3 Illustration of the qRT-PCR splicing assay. Splice form specific primers are used to isolate the amplification signals. [Diagram panels: genomic/plasmid DNA (exons 8, 9, 10); normally spliced cDNA (exons 8, 9, 10; primers Exon9_10-F and Exon10-R); alternatively spliced cDNA (exons 8, 10; primers Exon8_10-F and Exon10-R).]

The assay was validated using known ratios of full-length and Δ9-CETP cDNA to form a standard curve (Figure 4).
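A standard curve of this kind is typically checked by regressing the measured ratio on the expected ratio and inspecting the slope and r². The sketch below illustrates that check only; the function name and the input ratios are hypothetical and are not data from this study, and comparing the ratios on a log10 scale is an assumption.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical illustration values (NOT measurements from this study):
# known short:long template ratios and the ratios recovered by the assay.
expected_ratio = [0.01, 0.1, 1.0, 10.0, 100.0]
measured_ratio = [0.011, 0.095, 1.05, 9.6, 104.0]

log_expected = [math.log10(r) for r in expected_ratio]
log_measured = [math.log10(r) for r in measured_ratio]

slope, intercept, r_squared = fit_line(log_expected, log_measured)
print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r_squared:.4f}")
```

A slope close to 1 with a high r² would indicate that the two primer pairs report the template ratio proportionally across the tested range.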

Figure 4 Standard curve for splicing assay. A known ratio of Δ9-CETP to full-length cDNA was measured using the splicing assay.

CETP Western Blots: Liver samples were prepared as homogenates in buffer with a cocktail of protease inhibitors to prevent protein degradation (Mende-Mueller et al., 2001). Liver and plasma samples were subjected to denaturing gel electrophoresis (4-12% NuPage Bis-Tris gels, Novex) and electrophoretically transferred to nitrocellulose membranes for Western blots. Western blots of human liver samples and plasma were conducted with rabbit anti-CETP (Sigma-Aldrich, USA) detected with goat anti-rabbit gamma-globulin-horseradish peroxidase (ARGG-HRP) (Sigma-Aldrich, USA) using the electrochemiluminescence (ECL) detection reagent (Amersham, USA) (Toneff et al., 2013).

2.3 Results

qRT-PCR analysis of full-length CETP mRNA and Δ9-CETP mRNA: CETP mRNA levels were quantitated by qRT-PCR. Accuracy of the assay was tested using a standard curve with known ratios of cDNA fragments spanning exons 8-10 and a mini-gene fragment with exons 8 and 10 (Δ9-CETP). The standard curve showed a slope equal to 1 with r² = 0.99.

Native CETP mRNA Expression and Splicing in HEK293 and HepG2 cells: Expression of CETP mRNA in human embryonic kidney (HEK293) and hepatocellular carcinoma (HepG2) cells was measured using qRT-PCR. The long and short forms of CETP were amplified to determine the extent of splicing in nontransfected cells. CETP mRNA was present at very low levels in HepG2 and HEK293, 3.6×10⁻⁴ and 3.8×10⁻⁴ times that of the β-actin mRNA used for normalization, respectively. Alternative splicing in HEK293 and HepG2 occurred at similar levels, with the full-length mRNA accounting for 53% ± 1% and 54% ± 4% of the total, respectively, indicating that these cells are competent in alternative splicing of CETP.

Mini-gene constructs expressing the genomic region of CETP surrounding exons 8-10: To study the effect of each SNP on CETP mRNA splicing, mini-gene constructs were developed with all four combinations of the 2 SNPs (rs9930761 T/C alleles, and rs5883 C/T alleles). Mini-gene constructs spanning exon 8 through intron 10 of CETP were created using the pCMV-Tag2B expression vector. Initial results revealed a high level of the Δ9-CETP splice isoform of the mini-gene in HEK293 cells (83% after 24-hour transfection for the wild-type construct) when cells were transfected with 1.6 µg of cDNA vector. To determine whether the splicing machinery was affected by the high expression level, 100-fold less (0.016 µg) plasmid DNA was transfected, resulting in a proportional decrease of expressed mRNA and indicating that mRNA expression was not saturated (Figure 5). In addition, the level of splicing remained similar after the 100-fold dilution of plasmid DNA used for transfection (82% Δ9-CETP after a 24-hour transfection for the wild-type construct). Expression of the CETP mini-gene constructs (1.6 and 0.016 µg) yielded mRNA levels 0.1 to 50 times that of β-actin in HepG2 and HEK293, respectively. This represents a 300- to 1×10⁵-fold higher expression compared to native CETP mRNA levels in HepG2 and HEK293. Therefore, native expression of CETP mRNA was minimal compared to the mini-gene mRNA levels and was ignored in the analysis of mini-gene mRNA splicing. An amount of 0.016 µg of plasmid DNA was used in each subsequent transfection.
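The fold difference over native expression follows directly from the β-actin-normalized values. The short calculation below reproduces it as a sketch, assuming that the lower and upper mini-gene values (0.1 and 50) pair with HepG2 and HEK293 in the order given in the text; the variable and function names are illustrative.

```python
# Expression relative to beta-actin, taken from the values reported in the text.
native = {"HepG2": 3.6e-4, "HEK293": 3.8e-4}
# Assumption: 0.1 belongs to HepG2 and 50 to HEK293 ("respectively").
minigene = {"HepG2": 0.1, "HEK293": 50.0}

def fold_over_native(minigene_level, native_level):
    """Ratio of mini-gene expression to native CETP expression."""
    return minigene_level / native_level

folds = {cell: fold_over_native(minigene[cell], native[cell]) for cell in native}

for cell, fold in folds.items():
    print(f"{cell}: about {fold:.0f}-fold over native CETP mRNA")
```

Both ratios are at least two orders of magnitude, which is the basis for disregarding native CETP mRNA in the mini-gene analysis.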

Figure 5 Linear effect of lower transfection concentrations. Reduction of expression is roughly linear with the reduction of DNA used during transfection.

HEK293 cells transfected with 0.016 µg of plasmid DNA were harvested at different time points (2, 4, 7, and 24 hours). Vector-expressed mRNA increased between 4 and 7 hours and peaked at 24 hours (wild-type construct expression normalized to β-actin: 2 hr = 0.2%, 4 hr = 0.6%, 7 hr = 2%, 24 hr = 23%) (Figure 6). The level of splicing was also relatively high in HepG2 cells, remaining similar between 7 and 24 hours (7 hr = 79% Δ9-CETP, 24 hr = 84% Δ9-CETP). Therefore, a 24-hour incubation period was selected for subsequent experiments.

Figure 6 Time course of mini-gene expression in HEK293 cells. 2, 4, 7, and 24 hour time points are used. Expression increases slightly after 7 hours, and no apparent expression is lost at 24 hours. 24 hour time points were used to generate data with the assay.

Exon 9 splicing of CETP mini-gene constructs in HEK293 and HepG2 cells: All four possible combinations generated by the rs5883 and rs9930761 alleles were transfected into HepG2 and HEK293 cells, and relative expression of the mRNAs containing exons 8, 9, and 10 (representing wild-type CETP) and exons 8 and 10 (representing Δ9-CETP) was measured. As shown in Figure 7, constructs that contain the minor allele of rs5883 (T) express significantly more Δ9-CETP mRNA than those with the major C allele in both HEK293 (1.08 fold, p-value=0.0001) and HepG2 cells (1.25 fold, p-value=0.0001). In contrast, splicing efficiency was indistinguishable between the minor C allele and the major T allele of rs9930761. Moreover, the combination of the minor alleles of rs5883 and rs9930761 did not differ from the splicing levels observed with the minor T allele of rs5883 alone in HepG2 cells, indicating that rs9930761 was not required. In HEK293 cells, the combined effect of the two minor alleles was even reduced compared to splicing levels observed with the minor T allele of rs5883 alone. Experiments were repeated with different plasmid preparations, yielding the same results. These results indicate that rs5883 alone results in enhanced splicing to yield the Δ9-CETP isoform of the mini-gene in vitro.

Figure 7 CETP mRNA splicing in cell lines. Mini-gene constructs were transfected into (A) HEK293 and (B) HepG2 cells. qRT-PCR was used to determine amounts of full-length and Δ9-CETP mRNA present in each sample. The percentage of Δ9-CETP was calculated by dividing the expression of Δ9-CETP by the sum of the expression of Δ9-CETP and full-length CETP.
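The percentage described in the Figure 7 legend can be written out as a small calculation from the two cycle thresholds. This is a minimal sketch, not the analysis code used in the study: it assumes that both primer pairs amplify with the same, perfect efficiency (a doubling per cycle), and the function names and Ct values are hypothetical.

```python
def relative_quantity(ct, efficiency=2.0):
    """Relative template amount implied by a cycle threshold (arbitrary units).

    Assumes the amplicon multiplies by `efficiency` each cycle, so a lower Ct
    means more starting template.
    """
    return efficiency ** (-ct)

def percent_delta9(ct_short, ct_long, efficiency=2.0):
    """Percentage of the short (exon 8-10) form out of short + long forms."""
    short = relative_quantity(ct_short, efficiency)
    long_form = relative_quantity(ct_long, efficiency)
    return 100.0 * short / (short + long_form)

# Hypothetical Ct values, for illustration only.
equal_forms = percent_delta9(ct_short=22.0, ct_long=22.0)
short_dominant = percent_delta9(ct_short=21.0, ct_long=23.0)

print(round(equal_forms, 1))     # equal Ct values give 50.0
print(round(short_dominant, 1))  # a 2-cycle advantage (4-fold) gives 80.0
```

If the two primer pairs differed in efficiency, the ratio would need to be corrected against the standard curve rather than computed from Ct values alone.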

CETP protein in liver and plasma: CETP protein from human liver tissue homogenates was assessed by our collaborators by Western blot using anti-CETP antibody. Bands corresponding to the full-length CETP protein of 55 kDa were observed in liver and plasma. Liver samples known to contain the Δ9-CETP mRNA splice isoform of CETP contained a 47-kDa band (Figure 8A, lanes 1 and 2) corresponding in molecular weight (MW) to the Δ9-CETP splice variant. The minor 100-kDa band may represent a dimer of CETP protein. Lower MW bands may represent truncated or degraded forms of CETP. Plasma contained only the 55-kDa CETP protein and not the 47-kDa Δ9-CETP splice variant (Figure 8A, lanes 3 and 4), indicating that the Δ9-CETP protein is not secreted efficiently. Control experiments without the CETP antibody were devoid of these bands (Figure 8B).

We also attempted to quantitate the relative amounts of CETP and the Δ9-CETP isoform in livers with different CETP genotypes. However, the intensity of the staining of full-length CETP and its Δ9-CETP isoform was highly sensitive to experimental conditions, impeding attempts to attain reproducible quantitation. In multiple experiments, the Δ9-CETP isoform tended to be robustly expressed in liver tissues, often yielding more intense bands on the gel than the full-length CETP protein. This result is consistent with the notion that CETP is secreted from the liver while the Δ9-CETP isoform is not. An example of Western blots of two liver tissues heterozygous for rs5883 and rs9930761 is provided in Figure 8A. In this case, the intensities of the 55-kDa and 47-kDa bands vary in correspondence with the measured splicing levels of CETP mRNA in the same livers. Yet, multiple experiments showed that Western blots could not provide reliable quantitation of CETP protein isoforms. Genotypes of liver and plasma samples, as well as the Δ9-CETP mRNA percentage for livers used in Western blots, are provided in Table 2.

Figure 8 Western Blots with a CETP antibody in liver protein extract and plasma. Anti-CETP antibody and secondary antibody (ARGG-HRP) were used in A, and secondary antibody only in B. Liver tissues (heterozygous for rs5883) are denoted as Li051 and Li049. Plasma samples are denoted as C-2117 and C-2124. Lanes in panels A and B representing liver samples were loaded with 5 µg protein, and plasma lanes with 2.5 µg protein, based on protein measurements of each sample.

Table 2 Liver and plasma sample genotypes and mRNA splicing percentages in the liver

Sample    rs9930761    rs5883    Percent Δ9-CETP mRNA
Li049     CT           CT        32
Li051     TT           CC        17
C-2117    TT           CC        ND
C-377     CC           TT        ND

ND: not determined

2.4 Discussion

We report the results of mini-gene experiments testing the ability of rs5883 and rs9930761 to affect splicing of CETP to its Δ9-CETP isoform. The results indicate that the effect on splicing is largely attributable to rs5883 rather than rs9930761. The latter also does not appear to be necessary for the effect of rs5883, and in HEK293 cells even diminishes the effect of rs5883. These results also demonstrate that the effect of rs5883 on alternative splicing is not unique to liver cells, where CETP is primarily expressed, but also occurs in kidney cell lines, suggesting that the regulation of splicing is not dependent on liver-specific factors.

As determined with Western blots, the presence of both full-length CETP and Δ9-CETP protein in the liver supports earlier findings that the Δ9-CETP protein is sufficiently stable but is not secreted from the liver, as it is absent from the circulation (Inazu et al., 1992). The relative amounts of CETP and Δ9-CETP protein in the liver are subject to multiple factors, including the rate of transfer of full-length CETP into the circulation, which is expected to be slowed by higher levels of Δ9-CETP via formation of a heterodimer (Inazu et al., 1992). In addition, CETP and Δ9-CETP could be subject to differential degradation rates, confounding the interpretation of relative amounts of CETP versus Δ9-CETP. Whether Δ9-CETP has independent intracellular physiological functions remains to be determined. Further evaluation of the mechanism by which rs5883 influences alternative splicing may help to inform our understanding of CETP regulation and its potential role in CAD risk and treatment response.

CETP variants rs5883 and rs9930761 are in high linkage disequilibrium (LD) with each other (D′ = 1.0), while rs9930761 is slightly more abundant (MAF 6% versus 5%) (Papp et al., 2012). Therefore, the haplotype carrying only the minor C allele of rs9930761 is present at a frequency of ~1%, whereas the haplotype carrying only the minor T allele of rs5883 was not detected in vivo (but was tested here in vitro), and the haplotype with both minor alleles occurs at ~5%. Consequently, in vivo associations observed with rs5883 are not independent of any effects of rs9930761. We cannot rule out such a combined effect, but the results presented here indicate that rs5883 has the main effect and support the use of only rs5883 in clinical association studies. These in vitro results are consistent with previous association studies, showing that rs5883 is more strongly associated with HDL levels and risk of CAD in hypertensive patients than rs9930761. Despite a slightly lower MAF, rs5883 had shown greater significance in association and effect size with HDL-C levels and adverse CAD outcomes than rs9930761, supporting the conclusion that rs5883 is the main active variant (Marmot et al., 1991; Papp et al., 2012; Pepine et al., 2003). As CETP levels may be optimized for maintaining sufficient HDL levels while also enabling reverse cholesterol transport, robust genetic effects on CETP activity are critical for assessing its clinical relevance, in view of ongoing developments of CETP inhibitors. Also, an interaction of CETP splicing with statin therapy requires further study.

The clinical relevance of alternative splicing to Δ9-CETP mRNA for CETP activity remains to be studied further. If the alternatively spliced isoform truly acts in a dominant-negative manner, as suggested previously (Inazu et al., 1992), CETP activity could decrease considerably with increased rates of alternative splicing in the presence of rs5883, in particular in homozygous carriers of the T allele. As clinical trials of CETP inhibitors progress, it may be important to analyze results with this genetic variant in mind, as the more than 10% of the population heterozygous for rs5883, and the ~0.25% homozygous, already show reduced CETP function. In some ethnic groups, including those of African descent, rs5883 MAF is higher (7-11%), enhancing its potential clinical significance. In addition, the interaction between rs5883 and CETP enhancer variants needs to be considered.
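The haplotype frequencies discussed earlier in this section follow from the two minor allele frequencies once complete LD is assumed. The sketch below works through that arithmetic using the standard definition of D′ (D divided by its maximum possible value); it assumes that D is positive, meaning the two minor alleles tend to occur on the same haplotype, and the function and key names are illustrative.

```python
def haplotype_frequencies(maf_a, maf_b, d_prime):
    """Two-SNP haplotype frequencies from minor allele frequencies and D'.

    Assumes D >= 0, i.e. the two minor alleles tend to occur together.
    """
    # Largest D compatible with the allele frequencies when D is positive.
    d_max = min(maf_a * (1 - maf_b), (1 - maf_a) * maf_b)
    d = d_prime * d_max
    both = maf_a * maf_b + d
    return {
        "both_minor": both,
        # max() guards against tiny negative values from rounding error.
        "only_a_minor": max(0.0, maf_a - both),
        "only_b_minor": max(0.0, maf_b - both),
        "neither_minor": 1 - maf_a - maf_b + both,
    }

# SNP a = rs9930761 (MAF ~6%), SNP b = rs5883 (MAF ~5%), D' = 1.0 as in the text.
freqs = haplotype_frequencies(maf_a=0.06, maf_b=0.05, d_prime=1.0)

for name, value in freqs.items():
    print(f"{name}: {value:.1%}")
```

With these inputs the haplotype carrying only the rs5883 minor allele has an expected frequency of zero, which is why that combination could be examined only in vitro.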

Chapter 3: Regulation of CETP Expression by Upstream Polymorphisms

3.1 Introduction

Cholesteryl Ester Transfer Protein (CETP) is involved in the reverse transport of cholesterol from the periphery to the liver, where the cholesterol is converted to bile acids and excreted. Expressed highly in spleen and liver (Drayna et al., 1987; Lonsdale et al., 2013; Su et al., 2004), CETP is secreted into the circulation (Hesler et al., 1987). There, CETP facilitates the transfer of cholesteryl esters from high-density lipoprotein (HDL) particles to low-density lipoprotein (LDL) particles, in exchange for triglycerides (Hesler et al., 1987). Increased CETP activity reduces the HDL/total cholesterol ratio, which is associated with an increased risk for coronary artery disease (CAD) (Gordon et al., 1989; Murray et al., 2006; Roger et al., 2012). Genetic CETP deficiency, which results in increased HDL cholesterol and decreased LDL cholesterol in Japanese populations, causes various cardiovascular abnormalities and has been reported to be either anti-atherogenic (Inazu et al., 1990) or atherogenic (Nagano et al., 2004). Targeting CETP for inhibition has been a goal of pharmaceutical development; however, initial clinical results have shown no benefit or enhanced risk (Barter and Caulfield, 2007; Schwartz et al., 2012). Off-target effects of CETP inhibitors are suspected to play a role, but genetic CETP variants may also affect disease risk and treatment outcomes.

CETP variants have been associated with HDL levels; however, observed effects on CAD risk are weak (Boekholdt et al., 2005). The minor allele of the marker Taq1B (Taq1BB) has been associated with reduced CETP activity and reduced benefit from statin therapy, suggesting a possible gene-drug interaction (Kuivenhoven, 1998; Regieli et al., 2008). Regulation of CETP expression has since been shown, through GWAS analysis and molecular genetics studies, to be influenced significantly by SNPs 5-10 kb upstream of the gene (Marmot et al., 1991; Papp et al., 2012). Multiple SNPs in this region are in high LD with each other on a long haplotype block stretching across the upstream region into the 5′ portion of CETP and including Taq1B (Papp et al., 2012). The upstream enhancer SNPs (rs247616, rs173539, and rs3764261), in partial LD with Taq1B, were shown to be highly associated with CETP mRNA expression in the liver, and with reduced CETP activity and increased HDL cholesterol levels (Barber et al., 2010; Papp et al., 2012; Suhy et al., 2014). Subsequently, Leusink et al. found that CETP enhancer variant rs3764261, in partial LD with Taq1B (R² = 0.442, D′ = 0.886) and in high LD with rs247616 (R² = 1, D′ = 1), also associates with lower efficacy of statins in preventing CAD (Leusink et al., 2013).

Previous studies had focused on non-synonymous single nucleotide polymorphisms (SNPs) in the coding region (such as I405V [rs5882]) and on variants within the proximal promoter region up to 1 kb from the transcription start site (Corbex et al., 2000; Frisdal et al., 2005), which also reside on the large 5′ haplotype block with upstream enhancer variants. In addition, a splicing SNP (rs5883), generating an mRNA lacking exon 9 and a protein devoid of cholesteryl ester transfer activity, had been associated with reduced CETP activity and increased HDL cholesterol levels (Barber et al., 2010; Papp et al., 2012; Suhy et al., 2014). Genetic studies have also been carried out in transgenic mice using human or simian CETP (Escolà-Gil et al., 2001; Foger et al., 1999; Gautier et al., 2013; Harder et al., 2007; Honzumi et al., 2010; MacLean and Vadlamudi, 2000; Marotti and Castle, 1992). However, the inserted CETP regions did not include the enhancer regions located upstream of CETP, and mice do not express CETP. Therefore, further study is required to understand the complex regulation of CETP in humans.

By measuring allelic expression imbalance (AEI) in human livers and associating it with genotype, we have previously identified rs247616 as the most likely variant responsible for reduced CETP mRNA expression (Johnson et al., 2008; Papp et al., 2012). However, other promoter/enhancer variants could also be causing or contributing to altered CETP transcription, and CETP is likely to harbor multiple regulatory variants. The current study aimed to determine the role of rs247616 in regulation of CETP mRNA expression, and to distinguish whether it is solely responsible for the observed AEI, is in strong linkage disequilibrium (LD) with the functional SNP, or acts in concert with additional functional variants. We use transcription factor binding site prediction tools to inform our search, and a reporter gene assay to assess the potential function of the SNP. Given the pervasive effect of CETP on HDL and LDL levels, a better understanding of the molecular mechanisms underlying genetic effects is critical to resolving the relationship between CETP variants, lipid levels, and cardiovascular disease.

3.2 Material and Methods

Identifying rs247616 linkage structure: Genotypes from the 1000 Genomes Project (1000genomes.org) were analyzed using PLINK (Harvard). Variants up to 30 megabases upstream and downstream of rs247616 were assessed for linkage disequilibrium with rs247616 in Caucasians.

Sequencing: Eight liver DNA samples were selected for sequencing to identify additional candidate SNPs that may associate with AEI in CETP. All selected samples were heterozygous for rs247616; 5 samples showed AEI and 3 samples showed no AEI, based on a cutoff of a 1.3:1 ratio. The promoter/enhancer region was cloned by PCR and prepared for sequencing using the Ion Torrent Personal Genome Machine (Life Technologies). DNA from each of the 8 samples was tagged with a unique barcode for differentiation during analysis. I used CLC bio software (CLC bio, USA) to call SNPs in the output sequence.

Association with expression in tissues: We measured the association between each SNP and CETP expression in liver, in spleen, and across 53 tissue types using genotypes and RNA-seq expression data from the GTEx database. Genotype data were accessed through dbGaP project 5358 under IRB protocol 2013H0096. Significance of association was determined using Student's t-test.

Genotyping: SNPs in partial LD with rs247616 that had not been tested previously were assessed for association with AEI. rs72786786 and rs1800775 were genotyped by GC-clamp (Sheffield et al., 1989) in liver tissues previously assayed for AEI. Association with AEI was then determined using a t-test to compare the means of heterozygous and homozygous samples.

Transcription factor binding site prediction: MatInspector (Genomatix, Germany) was used to predict transcription factor binding sites lost or gained by the minor allele of SNPs found by sequencing. Fifty-basepair regions centered on each SNP were submitted with the minor and major alleles, and differences were recorded. The difference between the matrix similarity (a weighted score of the match of the input sequence to the defined matrix) and the optimized matrix similarity (a threshold defined for each matrix) was used to determine the quality of the match. A positive score indicates that the allele has a matrix similarity greater than the optimized matrix similarity, and a negative score indicates that it is lower, suggesting gain or loss, respectively, of a potential binding site for the transcription factor. Sites without differing scores between the major and minor allele, or sites with scores below the threshold for both alleles (suggesting poor binding for both), were discarded, leaving only sites with allelic differences in binding (Cartharius et al., 2005; Quandt et al., 1995).

Coexpression analysis: Using data from the Genotype-Tissue Expression (GTEx) project (www.gtexportal.org, Broad Institute) (Lonsdale et al., 2013), we correlated the expression of CETP in liver and spleen samples with the expression of transcription factors found to be influenced by SNPs in LD with our lead candidate SNP, rs247616 (n=19). The R statistical package (r-project.org) was used to correlate the expression data using the Kendall method (Kumari et al., 2012), and the Benjamini-Hochberg correction (Benjamini and Hochberg, 1995) was used to adjust the p-values for multiple comparisons.

Luciferase reporter assay: To assess the effect of the minor allele of a SNP on expression, we inserted the candidate enhancer region, containing either the major or minor allele, into the multiple cloning site of a pGL4.23 minimal promoter luciferase expression vector (Promega, USA) using the In-Fusion system (Clontech, USA) and the primers shown in Table 3. A region of approximately 400 bp around the candidate SNP was amplified by PCR. The vector was transformed into XL10-Gold E. coli. Colonies were isolated and validated by genotyping for the SNP. The sequence of the remaining insert was confirmed by Sanger sequencing. Three validated colonies for each insert were selected and combined for the luciferase expression assay.

Table 3. Primers used in PCR and infusion cloning reactions

Primer                   Sequence (5'-3')
rs247616-f infusion      GCTAGCCTCGAGGATATCGACTCAACAACAGGGCCACA
rs247616-r infusion      AGGCCAGATCTTGATATCGACAACAGAGGGACACTCTCTCTAATAAT
rs173539-f infusion      GCTAGCCTCGAGGATATCCCTGTGGTCCCAGTTACTTAGGA
rs173539-r infusion      AGGCCAGATCTTGATATCGCCATTTCCACTATACGGATCC
rs17231506-f infusion    GCTAGCCTCGAGGATATCCCATTATCCCCACCCTTGG
rs17231506-r infusion    AGGCCAGATCTTGATATCGGCTGGAGGAACTTCATTCATTA

Each expression vector was co-transfected into HEK-293 and HepG2 cells with a Renilla luciferase control vector using Lipofectamine 2000 (Life Technologies, USA) according to the manufacturer's procedure. Cells were plated into 12-well plates and incubated for 24 hours. The cells were then gently removed with trypsin, counted, and diluted approximately 1 to 10. Cells were placed in wells of a white, opaque 96-well plate. The Dual-Glo Luciferase Assay System (Promega, USA) was used to measure the firefly and Renilla luciferase signals. Chemiluminescence measurements were taken with a Fusion plate reader (Perkin Elmer, USA). Transfections were carried out in triplicate or greater. Each graph shows the mean of 3 transfections (n=6 for rs247616), and error bars represent the standard error of the mean.
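The two statistical steps named in the coexpression analysis above (Kendall rank correlation followed by Benjamini-Hochberg adjustment) can be sketched as follows. This is a minimal illustration in plain Python, not the R code used for the analysis; the function names `kendall_tau_b` and `benjamini_hochberg` are hypothetical, and the tau-b variant (which corrects for ties) is an assumption about which form of the Kendall coefficient was applied.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b from counts of concordant and discordant pairs.

    Assumes x and y have equal length and that neither is constant
    (a constant vector would make the denominator zero).
    """
    concordant = discordant = 0
    untied_x = untied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx != 0:
                untied_x += 1
            if dy != 0:
                untied_y += 1
            product = dx * dy
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / sqrt(untied_x * untied_y)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so the adjusted values stay monotone.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, one p-value per transcription factor (19 in this analysis) would be passed to `benjamini_hochberg`, and a factor would be called significant only if its adjusted value falls below the chosen threshold.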

3.3 Results

Identifying rs247616 linkage structure: Previous work had identified rs247616, a SNP 6.2 kb upstream of CETP, as the SNP most highly associated with Allelic Expression Imbalance (AEI); the minor allele confers a reduction in expression (Papp et al., 2012). To search for additional candidate regulatory variants in the rs247616 haplotype block, we first identified additional SNPs in high LD with rs247616. Using CEU (Caucasian) population data from the 1000 Genomes Project (1000genomes.org) and PLINK (Harvard), we identified all SNPs within approximately 30 Mb upstream and downstream of rs247616, and then calculated their LD with rs247616. Thirteen SNPs have an R² greater than 0.77 (Table 4). An additional 3 SNPs have an R² between 0.50 and 0.77, all of which are more highly linked with rs708272 (Table 5), which had been shown not to be significantly associated with AEI in CETP (Papp et al., 2012). This indicates that these three SNPs are unlikely to be regulatory variants affecting CETP mRNA expression. To assess their biological functions, we compared previously published associations for each of the 13 SNPs with HDL levels (Table 4) (Papp et al., 2012). All SNPs tested were significantly associated with HDL and, owing to their high LD, could not be differentiated.

Table 4. Linkage structure of rs247616. Shaded rows (marked with *) indicate SNPs more strongly linked to rs708272.

SNP            Position    MAF    R²     D'     Association to HDLc (1)
rs12446515     56987015    0.33   1.00   1.00   NT
rs173539       56988044    0.33   1.00   1.00   4.65E-29
rs247616       56989590    0.33   1.00   1.00   7.18E-29
rs247617       56990716    0.33   1.00   1.00   1.52E-27
rs183130       56991363    0.33   1.00   1.00   4.80E-27
rs3764261      56993324    0.33   1.00   1.00   7.21E-29
rs821840       56993886    0.33   1.00   1.00   NT
rs36229491     56994244    0.33   1.00   1.00   NT
rs17231506     56994528    0.33   1.00   1.00   4.11E-29
rs56156922     56987369    0.33   0.98   1.00   NT
rs56228609     56987765    0.31   0.93   1.00   4.23E-24
rs12149545     56993161    0.31   0.93   1.00   3.19E-28
rs200751500    57001274    0.31   0.87   0.97   NT
rs72786786     56985514    0.33   0.78   0.88   NT
rs7205804*     57004889    0.45   0.51   0.93   5.16E-25
rs1532625*     57005301    0.45   0.51   0.93   4.83E-25
rs1532624*     57005479    0.45   0.51   0.93   6.35E-25

(1) Papp et al., 2012. NT: not tested.
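The R² and D' columns in Table 4 are the standard pairwise LD measures. As a rough illustration of how they relate, the sketch below computes both from allele and haplotype frequencies. It assumes phased haplotype frequencies are already known, which is a simplification: PLINK estimates them from genotype data. The function name `ld_measures` and its parameters are hypothetical, and the example frequencies are chosen for illustration rather than taken from the 1000 Genomes data.

```python
def ld_measures(p_a, p_b, p_ab):
    """Return (r_squared, d_prime) for two biallelic sites.

    p_a  : frequency of allele A at the first site
    p_b  : frequency of allele B at the second site
    p_ab : frequency of the haplotype carrying both A and B
    Assumes 0 < p_a < 1 and 0 < p_b < 1.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r_squared, d_prime


# Illustrative case: allele A (frequency 0.33) always occurs with allele B (frequency 0.45).
r2, dp = ld_measures(0.33, 0.45, 0.33)
print(round(r2, 2), round(dp, 2))
```

The example shows why a pair of SNPs with different minor allele frequencies can have a D' of 1 while R² stays well below 1, the pattern seen for the rows with MAF 0.45 in Table 4.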

Table 5. Linkage structure of rs708272. Shaded rows (marked with *) indicate SNPs more strongly linked to rs247616.

SNP            Position    MAF    R²     D'     Association to HDLc (1)
rs711752       56996211    0.45   1.00   1.00   2.31E-26
rs708272       56996288    0.45   1.00   1.00   8.77E-27
rs34620476     56996649    0.45   1.00   1.00   9.89E-24
rs34145065     56996645    0.44   0.98   1.00   NT
rs12720926     56998918    0.44   0.96   1.00   5.57E-19
rs11508026     56999328    0.44   0.96   1.00   3.99E-24
rs4784741      57001216    0.44   0.96   1.00   2.39E-23
rs12444012     57001438    0.44   0.96   1.00   1.76E-23
rs200373219    57001581    0.44   0.96   1.00   NT
rs7205804      57004889    0.45   0.90   0.96   5.16E-25
rs1532625      57005301    0.45   0.90   0.96   4.83E-25
rs1532624      57005479    0.45   0.90   0.96   6.35E-25
rs1800775      56995236    0.49   0.78   1.00   9.83E-25
rs3816117      56996158    0.49   0.78   1.00   2.53E-24
rs56228609*    56987765    0.31   0.56   1.00   4.23E-24
rs12149545*    56993161    0.31   0.56   1.00   3.19E-28
rs200751500*   57001274    0.31   0.55   1.00   NT

(1) Papp et al., 2012. NT: not tested.

Sequencing: Ion Torrent sequencing data were analyzed using CLC bio analysis software. The software was used to identify SNPs and call their genotypes for each sample. Table 6 shows each SNP that was observed in the region in the samples. The first number of the score is the number of samples that are either heterozygous and AEI positive or homozygous and AEI negative. The second number is the inverse. Scores that do not add up to 8 reflect samples with insufficient coverage to determine the genotype of the SNP. As shown, a number of SNPs align with AEI in these samples as well as, or better than, rs247616, providing evidence of additional candidate SNPs.

Table 6. Sequenced SNP genotype counts in AEI+ vs AEI- samples

SNP           Genotype count
rs17231506    6-1
rs247616      6-2
rs173539      6-2
rs72786786    5-2
rs12720918    3-0
rs12920974    3-0
rs34760410    3-0
rs36229787    3-0
rs3764261     3-0

Association of SNPs in partial linkage with rs247616: The SNPs rs1800775 and rs72786786 were genotyped by GC-clamp in DNA isolated from liver tissue. rs72786786 lies upstream of rs247616, in a DNase hypersensitivity cluster identified by ENCODE. This, and its LD with rs247616 (R²=0.78, D'=0.88), made it an intriguing candidate to explain the variability in the AEI association with rs247616. However, genotyping by GC-clamp and a t-test revealed no significant association with AEI (p-value = 0.24) (Figure 9). Further statistical testing revealed that, under a recessive model (AA vs AG + GG), the association with AEI was significant (p-value = 3.2 × 10⁻⁵) (Figure 10). Samples with the genotype AA have the highest levels of AEI (1.8, n=3) compared to AG (1.3, n=15) and GG (1.2, n=7), indicating that the functional variant is typically homozygous in samples with the rs72786786 genotypes AG or GG.

Figure 9. rs72786786 association with AEI. No association with AEI in CETP is observed for rs72786786 in liver tissue. Error bars indicate standard deviation. (Plot title: "rs72786786 association with AEI in CETP"; genotype groups: GG/AA, GA.)

Figure 10. rs72786786 association with AEI with homozygous genotypes separated. Significant association with AEI in CETP is observed for the rs72786786 homozygous minor genotype in liver tissue. Error bars indicate standard deviation. (Plot title: "rs72786786 association with AEI in CETP"; genotype groups: AA, GG, GA.)

The SNP rs1800775 has appeared as a GWAS hit highly associated with CAD and HDL cholesterol (Lu et al., 2013). Similarly, it showed no significant association with AEI (p-value=0.65) in the liver samples we tested (Figure 11). With no significant association for either of these SNPs, our focus remains on rs247616 and SNPs in high LD with it.

Figure 11. rs1800775 association with AEI. No association with AEI in CETP is observed for rs1800775 in liver tissue. Error bars indicate standard deviation. (Plot title: "rs1800775 association with AEI in CETP"; genotype groups: AA/CC, AC.)

Association of CETP variants with CETP mRNA expression in human tissues: CETP is broadly expressed in tissues such as adipose, liver, breast, and thyroid, with the highest expression in spleen, suggesting that spleen may be a significant source of CETP, contradicting previous findings (Van Eck et al., 2007) and BioGPS (Figure 12, Figure 13) (Su et al., 2004; Wu et al., 2009). We used CETP expression in all sequenced liver and spleen tissues and genotyping data for rs247616, rs173539, and rs17231506 from the GTEx database. We compared the number of minor alleles of each SNP to the relative expression of CETP in liver and spleen (Figure 14). Using Student's t-test, we found significantly lower expression of CETP associated with the minor allele of each SNP in spleen (p=0.008), the tissue with the highest CETP mRNA expression, indicating that this haplotype block is important for regulation of CETP expression. We observed slightly lower average expression in liver (520 counts versus 396 counts for 0 versus 2 alleles), but this did not reach significance (p=0.63). When analyzed together, all tissues sequenced by GTEx showed a significant association (p=0.024). rs247616 was not associated with CETP expression in any other single tissue; thus the effect observed in all tissues is attributable to the significant association in spleen. In view of our targeted analyses of allele-selective CETP mRNA expression (Papp et al., 2012), it is apparent that rs247616 is associated with hepatic expression, but at a level undetectable in GTEx expression data as an expression quantitative trait locus (eQTL). These results suggest that, overall, the minor allele reduces expression in most or all tissues; however, the magnitude of the effect may differ between tissues. Because of the high expression and significant eQTL values in the spleen, we subsequently focused on differences in transcription factor expression in both liver and spleen.
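The genotype-versus-expression comparison described above can be illustrated with a small sketch of the two-sample Student's t statistic with pooled variance. The function name `student_t` is hypothetical, and which genotype groups were compared (for example, 0 versus 2 minor alleles) is an assumption here. The sketch stops at the t statistic and degrees of freedom, since converting them to a p-value requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic assuming equal variances.

    Each group is a list of expression values (e.g. RNA-seq counts) for one
    genotype class and must contain at least two samples.
    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df
```

A positive t here would mean higher mean expression in the first group; the sign convention depends only on the order in which the groups are passed.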

Figure 12. CETP expression in GTEx samples. Count of RNA-seq reads for CETP in tissues (www.gtexportal.org/home/gene/cetp).

Figure 13. CETP expression reported from BioGPS (http://biogps.org/#goto=genereport&id=1071).

Figure 14. mRNA expression of CETP in tissues. CETP expression in liver (left) and spleen (right) as a function of rs247616 genotype, obtained from GTEx (www.gtexportal.org). Each bar represents the expression of CETP for 0, 1, or 2 copies of the minor allele of rs247616. CETP levels in spleen are significantly lower in carriers of the minor allele of rs247616; similar data were obtained with rs173539 and rs17231506 because of the high LD between them. No significant effect is observed in other tissues analyzed separately, such as liver, which shows lower CETP expression and less genotype influence. Nevertheless, rs247616 is significant across all tissues in GTEx combined (p-value=0.024; data not shown).

Transcription factor binding site prediction: To determine whether these SNPs have a functional role, we assessed potential interactions with transcription factors. The sequence surrounding each SNP in LD with rs247616 with an R² > 0.77 was submitted in pairs, with the major and minor allele, to MatInspector (Genomatix, Germany) to analyze lost or gained transcription factor binding sites. Using GTEx expression data, we identified transcription factors that are expressed in liver. Our findings reveal that many of the SNPs analyzed lie within a putative binding site for a transcription factor that is expressed in the liver, and modify the predicted ability of the transcription factor to bind (Table 7). Ten SNPs in high LD with rs247616 produce changes in the putative transcription factor binding sites in which they reside. These SNPs are found between 10.3 kb upstream and 5.4 kb downstream of the CETP transcriptional start site. Three SNPs cause a large increase (rs17231506, rs173539) or decrease (rs247616) in the matrix similarity score of transcription factors that are highly expressed in liver, with reads per kilobase per million reads sequenced (RPKM) values greater than 25.

Table 7. SNPs in high linkage disequilibrium with rs247616 that alter a putative transcription factor binding site for a factor that has high RPKM values in liver. Columns: SNP; LD with rs247616 (R²); distance from CETP start; matrix similarity score (major allele); matrix similarity score (minor allele); gain or loss; transcription factor; median liver RPKM; median spleen RPKM.

SNP           R²     Distance   Major    Minor    Change   Factor        Liver       Spleen
rs72786786    0.78   -10321
rs12446515    1.00   -8820
                                0.055    0.099    gain     SRF           7.6         20.4
                                ND       0.029    gain     LMO2          1.4         18.6
                                -0.104   0.017    gain     RREB1         6.6         9.1
                                -0.134   0.027    gain     ZFX           2.4         4.3
rs56228609    0.93   -8070      -0.018   0.053    gain     NR1D1         9.5         8.5
rs173539      1.00   -7791      ND       0.06     gain     RARA/RXRA     10.2/25.7   22.7/16.4
rs247616      1.00   -6245
rs247617      1.00   -5119
rs183130      1.00   -4472
rs36229491    1.00   -1591
                                0.014    -0.075   loss     YBX1          75.3        168
                                0.016    -0.019   loss     CEBPA         54.4        9.9
                                -0.128   0.044    gain     LIN54         1.2         1.6
                                0.04     0.076    gain     NR1I2/RXRA    21.4/25.7   0.18/16.4
                                -0.027   0.028    gain     TP53          5           17
                                -0.021   0.037    gain     HMGA1         5.5         46.2
                                0.067    -0.097   loss     NFKB1         4.9         13.5
                                0.026    0.001    loss     DEAF1         2.9         7.6
                                0.012    0.006    loss     HNF1A         4.5         0
                                ND       0.015    gain     ZFHX3         1           1.2
                                -0.036   0.059    gain     HNF4A         41.6        0
rs17231506    1.00   -1307      0.047    -0.067   loss     DEAF1         2.9         7.6
rs200751500   0.87   5439       0.013    0.03     gain     KLF4          2.1         26.4

ND: not detected.
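The gain/loss calls in Table 7 follow from comparing the major-allele and minor-allele scores, where each score is the matrix similarity minus the optimized matrix similarity. The sketch below is one way to express that rule; it is an illustration consistent with the rows of Table 7 and the filtering described in the methods, not MatInspector's own logic. The function name `classify_site` is hypothetical, and treating "ND" as a score below threshold is an assumption.

```python
def classify_site(major_score, minor_score):
    """Classify a predicted binding site as 'gain', 'loss', or None (discarded).

    Scores are matrix similarity minus the optimized matrix similarity, so a
    negative score is below the matrix threshold. Pass None for "ND".
    """
    below = float("-inf")
    major = below if major_score is None else major_score
    minor = below if minor_score is None else minor_score
    if major < 0 and minor < 0:
        return None  # below threshold for both alleles: poor binding for both
    if major == minor:
        return None  # no allelic difference
    return "gain" if minor > major else "loss"


# Rows taken from Table 7.
print(classify_site(0.014, -0.075))  # YBX1 at rs247616
print(classify_site(-0.036, 0.059))  # HNF4A
print(classify_site(None, 0.06))     # RARA/RXRA at rs173539
```

Note that under this rule a site can be called a gain or loss even when both alleles score above threshold (for example DEAF1, 0.026 to 0.001), which matches how such rows are labeled in the table.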

rs247616 strongly alters the putative binding sites of both Y-box binding protein 1 (YBX1) and CCAAT/enhancer binding protein alpha (CEBPA). Both transcription factors are highly expressed in the GTEx sequenced livers, with median RPKM values of 75 and 54, respectively. The matrix similarity scores decrease from favorable binding predictions of 0.014 and 0.016 with the major allele to unfavorable scores of -0.075 and -0.019, respectively, with the minor allele, indicating a change from potentially favorable binding to unfavorable binding. YBX1 is highly expressed in many tissues, including liver and spleen, as shown in Figure 15. CEBPA expression in liver is highly variable (SD=56 RPKM), but expression is generally higher than in spleen.

The minor allele of rs17231506 increases the matrix similarity score for hepatocyte nuclear factor 4-alpha (HNF4A) from -0.036 to 0.059, indicating a change from an unfavorable binding site to a favorable one. HNF4A is also highly expressed in livers sequenced by the GTEx consortium, with a median RPKM value of 42. HNF4A expression was undetectable in spleen (Figure 15). Although HepG2 is a hepatic cell line and does express HNF4A, it is possible that HNF4A is not as well expressed in HepG2 as in hepatic tissue, which could explain a lack of effect from rs17231506.

The minor allele of rs173539 further causes an increase in the predicted matrix similarity score for the retinoid X receptor-alpha/retinoic acid receptor-alpha (RXRA/RARA) heterodimer. RARA and RXRA are well expressed in the liver and spleen, with median RPKM values of 10 and 26, respectively, in liver, and 23 and 16, respectively, in spleen (Figure 15). The matrix similarity is below the reported threshold when tested with the major allele, but provides a positive, favorable score of 0.06 for the minor allele. GTEx RNA-seq shows that multiple isoforms exist for RXRA; the first exon of the only protein-coding isoform, which is also the only exon exclusive to that transcript (ensembl.org; ENST00000481739), is expressed 2.4 times more highly in liver than in spleen, indicating that the expression difference of RXRA between the tissues is greater than the difference in total expression would imply.

Figure 15. Expression of top 5 transcription factors. Expression is plotted on a log10 axis; expression in liver is shaded in gray. HNF4A shows no expression in spleen.

Coexpression analysis between CETP and transcription factors: We compared the expression levels of the 19 transcription factors found to potentially interact with our candidate SNPs (Table 7) with CETP expression in livers (n=35) and spleens (n=35) sequenced by the GTEx Consortium. Using the R statistical package, we correlated the expression level of each transcription factor with that of CETP using the Kendall tau rank correlation coefficient, which is based on concordant and discordant pairs. After adjusting the p-values with the Benjamini-Hochberg correction, no transcription factor showed a statistically significant correlation with CETP expression. This result indicates that no single transcription factor alone had sufficient influence to alter hepatic or splenic CETP expression in this group with high significance. A larger set of samples would be required to explore transcription factor interactions.

Luciferase reporter assay: The three SNPs listed above show the strongest indication of playing a functional role in CETP regulation in the liver by interacting with well-expressed transcription factors. We used a reporter assay to examine the effects of rs247616, rs173539, and rs17231506 on expression of a luciferase reporter gene. The results show that rs247616 and rs173539 significantly altered luciferase expression in both HEK-293 and HepG2 cells. The minor allele of rs247616 produced a 1.7-fold decrease in expression of the reporter compared to the major allele in both cell lines (p<0.001) (Figure 16A). The minor allele of rs173539 induced a more than twofold increase in expression over the major allele, again in both cell lines (p=0.003-0.005) (Figure 16B). No significant difference in expression was observed between the major and minor alleles of rs17231506 in HepG2 cells (Figure 16C), with a small reduction observed in HEK-293 cells (p=0.05).

Figure 16. Luciferase reporter gene expression in HEK-293 and HepG2 cells. Signals are normalized to the wildtype allele and represent the ratio of the experimental firefly luciferase signal to the Renilla luciferase control. A) Significant decrease in expression due to the minor allele of rs247616 in HEK-293 and HepG2 cells; B) significant increase in expression due to the minor allele of rs173539 in HEK-293 and HepG2 cells; C) small but significant decrease in expression due to the minor allele of rs17231506 in HEK-293 cells but no significant effect in HepG2 cells.
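The normalization described in the Figure 16 caption (firefly signal divided by the Renilla control, then expressed relative to the wildtype allele) can be sketched as below, together with the standard error of the mean used for the error bars. The function names and the well readings are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from this study.

```python
from math import sqrt
from statistics import mean, stdev


def firefly_to_renilla(firefly, renilla):
    """Per-well ratio of firefly signal to the Renilla control signal."""
    return [f / r for f, r in zip(firefly, renilla)]


def relative_expression(major_ratios, minor_ratios):
    """Mean and SEM of minor-allele ratios relative to the major-allele mean.

    Requires at least two minor-allele wells so a standard deviation exists.
    """
    baseline = mean(major_ratios)
    relative = [r / baseline for r in minor_ratios]
    sem = stdev(relative) / sqrt(len(relative))
    return mean(relative), sem


# Hypothetical readings from three transfections per allele.
major = firefly_to_renilla([100, 110, 90], [10, 10, 10])
minor = firefly_to_renilla([60, 58, 62], [10, 10, 10])
rel_mean, rel_sem = relative_expression(major, minor)
print(round(rel_mean, 2), round(rel_sem, 3))
```

With this convention a relative value below 1 corresponds to reduced reporter expression from the minor allele, and a 1.7-fold decrease corresponds to a relative value of about 0.59.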