Statistical Genetics


Institute of Mathematics, Ecole polytechnique fédérale de Lausanne, Switzerland
Spring Seminar of the 3e cycle romand, Diablerets, March 2007

Mendel's Experiments

G. Mendel is famous for being the first to study the transmission of characteristics in a systematic manner.

- He crossed pea plants that differed in seed shape (smooth vs. wrinkled) and in seed color (yellow vs. green).
- Crosses of pure smooth with pure wrinkled plants produced F1 hybrids that were all smooth.
- Crossing F1 plants with each other produced both smooth and wrinkled offspring, in proportions 75% to 25%.
- Explanation: postulate the existence of factors (A for smooth and a for wrinkled). Self-crosses reveal in what proportion A and a are present in any given plant.
- Result: three different types of plants in F2 (AA, Aa, and aa).
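The 75% to 25% split can be reproduced by enumerating the allele combinations of a cross. The following is a minimal sketch, assuming each parent passes on one of its two factors with equal probability and that A is dominant over a; the function names `cross` and `phenotype` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Return offspring genotype frequencies, one factor drawn from each parent."""
    counts = Counter()
    for a1, a2 in product(parent1, parent2):
        # Sort the pair so that 'aA' and 'Aa' are counted as the same genotype.
        counts["".join(sorted((a1, a2)))] += 1
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def phenotype(genotype, dominant="A"):
    """Assumed dominance rule: one copy of A is enough for a smooth seed."""
    return "smooth" if dominant in genotype else "wrinkled"


f1 = cross("AA", "aa")  # pure smooth x pure wrinkled
f2 = cross("Aa", "Aa")  # F1 x F1
smooth_fraction = sum(p for g, p in f2.items() if phenotype(g) == "smooth")
```

In this sketch `f2` contains the three genotypes AA, Aa, and aa, and `smooth_fraction` collects the share of F2 plants that show the dominant phenotype.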

Heredity

1. Genetics is the study of heredity, the transmission of characteristics from parents to offspring. It is concerned with the biological materials transmitted and the modes of inheritance.
2. Some of the areas of study are:
   - What is genetic material?
   - How is it formed, transmitted, and changed?
   - How is it organized and how does it function?
   - What happens to it among groups of organisms as time passes?

DNA: Chromosomes, genes, double helix

1. DNA is the genetic material. It is stored in units called chromosomes, which in turn are subdivided into genes. In its natural form, DNA occurs as a double-stranded helical polymer of bases.
2. There are four bases (A, T, G, and C). The double helix is held together by chemical (hydrogen) bonds between complementary bases. For this reason DNA is said to consist of base pairs. The complementary pairings are A-T and G-C.
3. In higher organisms, each descendant inherits two copies of each gene, one from each parent. Humans receive 23 chromosomes from each parent.

Research Milestones

1. Ch. Darwin (1859): On the Origin of Species by Means of Natural Selection
2. G. Mendel (1865): Versuche über Pflanzen-Hybriden
3. J. F. Miescher (1869): extraction of DNA from white blood cells
4. Th. Boveri (1888): observation of the behavior of chromosomes during cell division
5. Th. Boveri & W. Sutton (1902): chromosome theory of inheritance (genes reside on chromosomes)
6. F.H.C. Crick & J.D. Watson (1953): discovery of the double-helical structure of DNA

DNA: Alleles, genotype, phenotype

1. A single copy of a gene is called an allele. The genotype of an individual is formed by his or her two alleles.
2. The common blood groups, for example, are based on the genotypes of a single gene with three possible alleles: A, B, and O.
3. The genotype determines a range of characteristics, such as eye and hair color, but also vulnerability to disease. Such observable features are called phenotypes. The blood groups A, B, AB, and O are an example of a phenotype. The underlying genotypes are AA and AO for A, BB and BO for B, AB for AB, and OO for O.
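The genotype-to-phenotype table for the blood groups can be written as a small lookup rule. This is an illustrative sketch only: the function name `abo_phenotype` and the two-letter string encoding of a genotype are assumptions made for the example.

```python
def abo_phenotype(genotype):
    """Map a two-allele ABO genotype such as 'AO' to its blood group."""
    alleles = set(genotype)
    if alleles == {"O"}:
        return "O"  # only the OO genotype gives blood group O
    # O is masked by A and B; A and B are both expressed when present together.
    alleles.discard("O")
    return "".join(sorted(alleles))


genotypes = ["AA", "AO", "BB", "BO", "AB", "OO"]
groups = {g: abo_phenotype(g) for g in genotypes}
```

Six genotypes collapse to four phenotypes here, which is why the phenotype alone does not identify the genotype for groups A and B.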

Genome: Genetic variation, mutations

1. The human genome consists of about $3 \times 10^9$ base pairs, of which 99.9% are exactly the same for all of us. About $1.5 \times 10^6$ of the base pairs show variation between individuals (they are polymorphic; such sites are called polymorphisms). This is enough to cause sizable genetic variation in the human population.
2. Stochastic processes are an integral part of genetics. In any biological population, the genetic composition of the descendants is the result of random events such as mating, recombination, mutation, selection, and so on.

Population Genetics, Genetic Studies

1. Population genetics is the study of the preservation and the generation of genetic variability in populations. Its main tools are clever experimentation and observation, stochastic models, and statistics.
2. It is widely believed that much of the phenotypic variation observed in any population has genetic causes. Genetic studies are designed to identify the gene or the group of genes causing some phenotype.

A Cellular Disease: Multiple hits model

1. Cancer is a disease of cells, in which a normal stem cell is transformed into a cancerous cell that exhibits growth similar to that of an embryonic cell.
2. Such a transformation is only possible if the genetic material is altered. This has been demonstrated in the sense that gamma radiation or contact with certain chemical substances can cause cancer.
3. A simple model can be based on the idea of a continuous stream of injuries to a cell, some of which may harm the genetic material (a hit).

The One-Hit Model

If $S(t) = P(\text{no hit up to age } t)$ denotes the survival function, the number of cells under attack is $N$, and the intensity of hits per unit time is $\lambda$, we have

$$S(t + dt) = S(t)\,\bigl(1 - N\lambda\,dt + o(dt)\bigr).$$

The solution is

$$S(t) = \exp(-N\lambda t).$$

Biologically reasonable values for the parameters are $N = 10^{9}/256$ and $\lambda = 10^{-6}/2$ per gene and per year.
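As a rough numerical illustration, the one-hit survival function can be evaluated directly and compared with the small-step recursion it solves. The sketch below plugs in the parameter values quoted above, read as $N = 10^9/256$ and $\lambda = 10^{-6}/2$; the function names and the step size are hypothetical choices.

```python
import math

N_CELLS = 1e9 / 256   # number of cells under attack, as quoted above
HIT_RATE = 1e-6 / 2   # hits per gene and per year, as quoted above


def survival_one_hit(t, n_cells=N_CELLS, rate=HIT_RATE):
    """S(t) = exp(-N * lambda * t): probability of no hit up to age t (years)."""
    return math.exp(-n_cells * rate * t)


def survival_by_recursion(t, n_cells=N_CELLS, rate=HIT_RATE, steps=100_000):
    """Iterate S(t + dt) = S(t) * (1 - N * lambda * dt), dropping the o(dt) term."""
    dt = t / steps
    s = 1.0
    for _ in range(steps):
        s *= 1.0 - n_cells * rate * dt
    return s


total_rate = N_CELLS * HIT_RATE  # combined hit intensity over all cells, per year
```

As the number of steps grows, `survival_by_recursion` approaches `survival_one_hit`, which is the sense in which the exponential is the solution of the recursion.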

Multiple (m) Hits

1. If $m$ hits are necessary to transform a cell into the cancerous state, the following expression is obtained:

$$S(t) = \exp\bigl(-\lambda^{m} N t^{m} / m!\bigr).$$

2. The age-dependent incidence curve for this model is

$$\lambda(t) = \lambda^{m} N t^{m-1} / (m-1)!,$$

which on a log-log scale is linear with slope $m - 1$.
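The relation between the survival function, the incidence, and the log-log slope can be checked numerically. The following is a small sketch; the parameter values in the usage lines are arbitrary illustrations, not estimates, and the function names are hypothetical.

```python
import math


def survival_multi_hit(t, m, rate, n_cells):
    """S(t) = exp(-lambda^m * N * t^m / m!)."""
    return math.exp(-(rate ** m) * n_cells * t ** m / math.factorial(m))


def incidence_multi_hit(t, m, rate, n_cells):
    """lambda(t) = lambda^m * N * t^(m-1) / (m-1)!, i.e. -d/dt log S(t)."""
    return (rate ** m) * n_cells * t ** (m - 1) / math.factorial(m - 1)


def loglog_slope(m, rate, n_cells, t1=20.0, t2=60.0):
    """Slope of log incidence against log age between two ages."""
    h1 = incidence_multi_hit(t1, m, rate, n_cells)
    h2 = incidence_multi_hit(t2, m, rate, n_cells)
    return (math.log(h2) - math.log(h1)) / (math.log(t2) - math.log(t1))


# Arbitrary illustrative values: m = 3 hits, rate 0.01 per year, 1000 cells.
slope = loglog_slope(m=3, rate=0.01, n_cells=1000)
```

Reading the slope off an observed log-log incidence plot and adding one is how a value of $m$ is suggested by data under this model.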

Knock-out of a Gatekeeper Gene: Two hits model

1. Suppose there is a gatekeeper gene with alleles $+$ (signifying a functioning gene) and $-$ (signifying a knocked-out version of the gene, multi-allelic).
2. If you are born $+/+$ and subject to a stream of knock-out hits at rate $\lambda$, one can show that the number of intermediate $+/-$ stem cells at age $t$ is equal to $I(t) = \lambda N t$.
3. The survival function is of the Weibull form

$$S(t) = \exp\bigl(-\lambda^{2} N t^{2} / 2!\bigr).$$

4. The age-dependent incidence curve for this model is $\lambda(t) = \lambda^{2} N t$.
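The step from the number of intermediate cells to the incidence and the survival function can be written out as follows. This is a sketch that assumes each intermediate $+/-$ cell receives its second hit independently at the same rate $\lambda$, an assumption that is consistent with the formulas above but is not spelled out in the slides.

```latex
\lambda(t) \;=\; \lambda \, I(t) \;=\; \lambda^{2} N t,
\qquad
S(t) \;=\; \exp\!\Bigl(-\int_{0}^{t} \lambda(u)\,du\Bigr)
      \;=\; \exp\!\bigl(-\lambda^{2} N t^{2} / 2!\bigr).
```

This is the case $m = 2$ of the multiple-hits expressions.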

[Figure: Mortality in the U.S.]

Multiple-Hits Models: Why these models are too simplistic

1. For many cancers in human populations, the log-log plot suggests values around $m = 4$. However, for realistic mutation rates the term $\lambda^{m}$ would lead to very small incidence rates.
2. Multi-hit models fit the incidence up to about age 50, but they fail at higher ages.
3. Proto-oncogenes (which lead to cancer when mutated) and tumor-suppressor genes (which control cell growth) seem to confirm the multi-hit model. But cancers in humans do not appear to follow the suggested pathway.

Multiple-Hits Models: Why these models are too simplistic (continued)

1. Pre-cancerous cells, distinguishable from normal cells, exist in many cancers (polyps in colon cancer are an example). These suggest that partially hit cells may already exhibit abnormal characteristics.

Carcinogenesis: Disease development in stages

1. The simplest possible extension of multiple-hits models assumes the existence of two stages. Once a cell reaches the first stage (initiation), it has a phenotype of abnormal growth but is not yet cancerous.
2. Initiated cells form small colonies in which a cell can reach the second stage (promotion). Such a promoted cell can then start the tumor.
3. Initiation is usually modeled by multiple hits, for example the knock-out of a gatekeeper gene. Promotion happens within a clonal expansion started by an initiated cell. Clonal expansions can be modeled by a birth-and-death process.

[Figure: Cancer Is a Disease of Stem Cells?]

Two-Stage Carcinogenesis

The cells undergo a Poisson initiation with rate

$$\lambda_{\text{init}}(t) = c_{\text{init}}\, t^{m-1},$$

where $c_{\text{init}}$ depends on the number of cells $N$ and on the mutation rates.

Once an initiated cell is created, it starts a clonal expansion with birth rate $\beta$ and death rate $\delta < \beta$. The size of a clone at age $x + dt$, conditional on $C(x)$, satisfies

$$C(x + dt) = \begin{cases}
C(x) + 1 & \text{with probability } C(x)\,\beta\,dt + o(dt),\\
C(x) - 1 & \text{with probability } C(x)\,\delta\,dt + o(dt),\\
C(x) & \text{otherwise.}
\end{cases}$$

Two-Stage Carcinogenesis (continued)

From the above it follows that a clone grows exponentially in expectation,

$$E(C(x)) = \exp\bigl((\beta - \delta)x\bigr),$$

but because it grows from a single cell it may die out in the early stages. The chance of survival is equal to $1 - \delta/\beta$.

If we assume that at the end of every division of an initiated cell one tumor cell is created with probability $r$, one can show (see the following slides) that the survival function within a clone is

$$S_{\text{clone}}(x) = \frac{\dfrac{C\rho_1}{\rho_2 + C\rho_1}\,\bigl(1 - e^{-\Delta x}\bigr) + e^{-\Delta x}}{\dfrac{C}{C + 1}\,\bigl(1 - e^{-\Delta x}\bigr) + e^{-\Delta x}},$$

where $\rho_1 = (-\beta - \delta + \Delta)/2$, $\rho_2 = (-\beta - \delta - \Delta)/2$, and $0 < \Delta^2 = (\beta - \delta)^2 + 4r\beta\delta$. Here $C$ is a constant depending on $\beta$, $\delta$, and $r$ (not the clone size $C(x)$).
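The survival chance $1 - \delta/\beta$ of a clone started by a single cell can be checked by simulation. The sketch below follows only the sequence of births and deaths: since a clone of size $C$ has birth rate $C\beta$ and death rate $C\delta$, the next event is a birth with probability $\beta/(\beta + \delta)$ whatever the size. Treating a clone that reaches `cap` cells as having survived is a simulation shortcut assumed here, and all names and parameter values are hypothetical.

```python
import random


def clone_survives(beta, delta, rng, cap=60):
    """Follow births and deaths from one cell until extinction or `cap` cells."""
    size = 1
    p_birth = beta / (beta + delta)  # probability that the next event is a birth
    while 0 < size < cap:
        size += 1 if rng.random() < p_birth else -1
    return size >= cap


def estimate_survival(beta, delta, runs=4000, seed=0):
    """Monte Carlo estimate of the probability that a clone never dies out."""
    rng = random.Random(seed)
    survived = sum(clone_survives(beta, delta, rng) for _ in range(runs))
    return survived / runs


estimate = estimate_survival(beta=1.0, delta=0.5)
theory = 1 - 0.5 / 1.0  # 1 - delta / beta
```

The cap is harmless for these values because a clone of 60 cells dies out with probability $(\delta/\beta)^{60}$, which is negligible.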

Occurrence of a Neoplastic Cell in a Clonal Expansion: List of possible events

During a time interval of length $dx$, the following can happen to an initiated cell:

- one neoplastic cell and one initiated cell are created, with probability $r\beta\,dx + o(dx)$;
- two initiated cells are created, with probability $(1 - r)\beta\,dx + o(dx)$;
- the initiated cell dies, with probability $\delta\,dx + o(dx)$;
- the initiated cell continues to live and nothing happens, with probability $1 - (\beta + \delta)\,dx + o(dx)$.

Clonal Survival Function

Conditioning on what happens to the initial cell during the first interval of length $dx$ gives

$$\begin{aligned}
S_{\text{clone}}(x) ={}& \bigl(r\beta\,dx + o(dx)\bigr)\cdot 0 \\
&+ \bigl((1 - r)\beta\,dx + o(dx)\bigr)\cdot \bigl(S_{\text{clone}}(x - dx)\bigr)^{2} \\
&+ \bigl(\delta\,dx + o(dx)\bigr)\cdot 1 \\
&+ \bigl(1 - (\beta + \delta)\,dx + o(dx)\bigr)\cdot S_{\text{clone}}(x - dx).
\end{aligned}$$

The four terms have the following meaning:

- First term: a neoplastic cell appears; the factor 0 is the probability that no tumor cell exists at clonal age $x$.
- Second term: the initiated cell divides; the squared factor is the probability that, if two initiated cells exist at age $dx$, neither produces a tumor cell between $dx$ and $x$.
- Third term: the initiated cell dies; the factor 1 reflects that the clonal expansion stops and will never lead to a tumor.
- Fourth term: the initiated cell survives; the factor is the probability that an initiated cell does not create a tumor between $dx$ and $x$.

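Clonal Survival Function (continued)

Rearranging and dividing by $dx$ gives

$$\frac{S_{\text{clone}}(x) - S_{\text{clone}}(x - dx)}{dx} = \Bigl((1 - r)\beta + \frac{o(dx)}{dx}\Bigr) S_{\text{clone}}^{2}(x - dx) + \Bigl(\delta + \frac{o(dx)}{dx}\Bigr) - \Bigl((\beta + \delta) + \frac{o(dx)}{dx}\Bigr) S_{\text{clone}}(x - dx).$$

Letting $dx \to 0$ yields the differential equation $S_{\text{clone}}'(x) = (1 - r)\beta\, S_{\text{clone}}^{2}(x) + \delta - (\beta + \delta)\, S_{\text{clone}}(x)$ with $S_{\text{clone}}(0) = 1$.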
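One way to sanity-check the closed form for $S_{\text{clone}}$ against the differential equation $S' = (1 - r)\beta S^2 + \delta - (\beta + \delta)S$, $S(0) = 1$, is to integrate the equation numerically and compare. The sketch below rests on assumptions: it reads $\rho_{1,2} = (-\beta - \delta \pm \Delta)/2$ with $\Delta^2 = (\beta - \delta)^2 + 4r\beta\delta$, and it takes the constant $C$ to be $(s_2 - 1)/(1 - s_1)$, where $s_1 < s_2$ are the stationary points of the equation. That choice of $C$ is derived here for the example and is not given in the slides. The parameter values are arbitrary.

```python
import math


def clone_survival_closed_form(x, beta, delta, r):
    """Closed-form S_clone(x); the constant c below is an assumed reconstruction."""
    disc = math.sqrt((beta - delta) ** 2 + 4 * r * beta * delta)  # Delta
    rho1 = (-beta - delta + disc) / 2
    rho2 = (-beta - delta - disc) / 2
    a = (1 - r) * beta
    s1, s2 = -rho1 / a, -rho2 / a       # stationary points of the ODE
    c = (s2 - 1) / (1 - s1)             # assumed definition of the constant C
    e = math.exp(-disc * x)
    numerator = c * rho1 / (rho2 + c * rho1) * (1 - e) + e
    denominator = c / (c + 1) * (1 - e) + e
    return numerator / denominator


def clone_survival_ode(x, beta, delta, r, steps=2000):
    """Integrate S' = (1-r)*beta*S^2 + delta - (beta+delta)*S with classical RK4."""
    def rhs(s):
        return (1 - r) * beta * s * s + delta - (beta + delta) * s

    h = x / steps
    s = 1.0  # initial condition S(0) = 1
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * h * k1)
        k3 = rhs(s + 0.5 * h * k2)
        k4 = rhs(s + h * k3)
        s += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return s


gap = abs(clone_survival_closed_form(10.0, 1.0, 0.5, 0.01)
          - clone_survival_ode(10.0, 1.0, 0.5, 0.01))
```

A small `gap` supports the reading of the formula used in this sketch; it does not by itself confirm that the constant matches the one intended in the slides.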

[Figure: Survival Within a Clone]

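Two-Stage Carcinogenesis (continued)

For small values of $r$, one has approximately

$$S_{\text{clone}}(x) \approx \frac{\dfrac{r\beta\delta}{(\beta - \delta)^{2}}\bigl(1 - e^{-(\beta - \delta)x}\bigr) + e^{-(\beta - \delta)x}}{\dfrac{r\beta^{2}}{(\beta - \delta)^{2}}\bigl(1 - e^{-(\beta - \delta)x}\bigr) + e^{-(\beta - \delta)x}}.$$

Initiation and survival in clones are combined to determine the cancer survival function

$$S_{\text{cancer}}(t) = \exp\Bigl(-\int_{0}^{t} \lambda_{\text{init}}(u)\,\bigl(1 - S_{\text{clone}}(t - u)\bigr)\,du\Bigr),$$

with corresponding risk equal to

$$\lambda_{\text{cancer}}(t) = \int_{0}^{t} \lambda_{\text{init}}(u)\, f_{\text{clone}}(t - u)\,du,$$

where $f_{\text{clone}}(x) = -S_{\text{clone}}'(x)$.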
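The cancer survival function can be evaluated by numerical quadrature once the initiation rate and the clonal survival function are specified. The following is a minimal sketch that uses the small-$r$ approximation for $S_{\text{clone}}$, the initiation rate $c_{\text{init}} t^{m-1}$, and a plain trapezoidal rule; the function names and all parameter values are hypothetical illustrations, not fitted quantities.

```python
import math


def clone_survival_small_r(x, beta, delta, r):
    """Small-r approximation of S_clone(x)."""
    growth = beta - delta
    e = math.exp(-growth * x)
    k = r * beta / growth ** 2
    return (k * delta * (1 - e) + e) / (k * beta * (1 - e) + e)


def cancer_survival(t, c_init, m, beta, delta, r, steps=2000):
    """S_cancer(t) = exp(-int_0^t lambda_init(u) * (1 - S_clone(t - u)) du)."""
    if t == 0:
        return 1.0
    h = t / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        initiation = c_init * u ** (m - 1)
        total += weight * initiation * (1 - clone_survival_small_r(t - u, beta, delta, r))
    return math.exp(-h * total)


# Hypothetical values: two-hit initiation, mildly supercritical clones.
params = dict(c_init=1e-4, m=2, beta=1.0, delta=0.5, r=1e-3)
curve = [cancer_survival(age, **params) for age in (0, 20, 40, 60, 80)]
```

The list `curve` then holds the probability of remaining cancer-free at the listed ages under these illustrative values.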

[Figure: Incidence]

Biomarkers

1. A biomarker is a variable useful to detect
   - Exposure: DNA adduct, protein adduct;
   - Effect: level of protein, RNA, or metabolite;
   - Susceptibility: genotype, mutation.
2. Biomarkers are used, for example, as indicators for
   - disease;
   - drug effect;
   - environmental damage.

Genetic Studies: Genetic markers, clues to genetic susceptibility

1. Many common diseases have a familial component. This is established through epidemiological studies showing that if someone has a first-degree relative (parent or descendant) with the disease, the chance that he or she develops the disease as well is increased.
2. These diseases include:
   - cardiovascular diseases;
   - many forms of cancer (colon, testicular, melanoma, etc.).

[Figure: Genetic Studies. Familial risk in Sweden (colon cancer cases per $10^5$ persons at risk)]

Genetic Risk

The previous slide contains evidence for genetic influence in at least two different ways:

1. The curves for individuals with a sick first-degree relative lie above the curve for the general population.
2. In all instances, the curve turns downwards at high ages, which seems to indicate that the elderly have a lower risk due to selection.
3. Genetic effects are most naturally incorporated into models by stratifying the population according to genotypes. (Frailty effects!)

Carcinogenesis: A simple frailty model
To incorporate a frailty effect we could postulate a risk factor which, if present, makes an individual susceptible to cancer, whereas its absence protects from the disease.
Let $F > 0$ be the fraction of newborns at risk and let $\lambda_{\text{all}}(t)$ be the mortality due to all other causes combined. Among those at risk, the mortality is equal to
$$\lambda_{\text{at risk}}(t) = \lambda_{\text{cancer}}(t) + \lambda_{\text{all}}(t).$$
For those not at risk, we have
$$\lambda_{\text{protected}}(t) = \lambda_{\text{all}}(t).$$

Carcinogenesis: A simple frailty model
The cancer mortality observed in the general population is equal to the cancer mortality among those at risk multiplied by the (age-dependent) fraction at risk. At birth this fraction is equal to $F$, but it then changes:
$$\lambda_{\text{obs}}(t) = \frac{\text{survivors among at risk}}{\text{survivors in the population}}\,\lambda_{\text{cancer}}(t).$$

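Carcinogenesis: A simple frailty model
From the previous slide we find
$$\lambda_{\text{obs}}(t) = \lambda_{\text{cancer}}(t)\,\frac{F \exp\left(-\int_0^t \lambda_{\text{at risk}}(u)\,du\right)}{F \exp\left(-\int_0^t \lambda_{\text{at risk}}(u)\,du\right) + (1 - F) \exp\left(-\int_0^t \lambda_{\text{protected}}(u)\,du\right)}.$$
Since the factor $\exp\left(-\int_0^t \lambda_{\text{all}}(u)\,du\right)$ is common to numerator and denominator, it cancels, and finally one has
$$\lambda_{\text{obs}}(t) = \lambda_{\text{cancer}}(t)\,\frac{F \exp\left(-\int_0^t \lambda_{\text{cancer}}(u)\,du\right)}{F \exp\left(-\int_0^t \lambda_{\text{cancer}}(u)\,du\right) + (1 - F)}.$$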
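The final formula can be evaluated numerically to see the downturn at high ages. The sketch below is only an illustration: the power-law form of the cancer hazard and the values of `F` and `A` are assumptions made for this example, not quantities taken from the slides.

```python
import math

def observed_cancer_hazard(t, frac_at_risk, hazard, cumulative_hazard):
    """Observed cancer mortality lambda_obs(t) in the simple frailty model.

    hazard(t) is lambda_cancer(t) among those at risk, and
    cumulative_hazard(t) is its integral from 0 to t.
    """
    at_risk_survivors = frac_at_risk * math.exp(-cumulative_hazard(t))
    protected_survivors = 1.0 - frac_at_risk
    return hazard(t) * at_risk_survivors / (at_risk_survivors + protected_survivors)

# Hypothetical inputs, chosen only for illustration.
F = 0.05   # assumed fraction of newborns at risk
A = 3e-9   # assumed scale of a power-law cancer hazard

def cancer_hazard(t):
    """Assumed lambda_cancer(t) = A * t^4."""
    return A * t ** 4

def cancer_cumulative(t):
    """Integral of the assumed hazard from 0 to t."""
    return A * t ** 5 / 5.0

for age in (40, 60, 80, 100):
    print(age, observed_cancer_hazard(age, F, cancer_hazard, cancer_cumulative))
```

Under these assumed inputs the observed hazard first rises with age and then falls, because the at-risk group is progressively depleted among the survivors.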

[Figure: Incidence]

Genetic Studies: Genetic Causes
1 Familiality is an indicator of a genetic cause, that is, of the existence of risk-carrying alleles.
2 An individual who is not a carrier may be protected from the disease, whereas being a carrier may lead to the disease (penetrance = chance of developing it).
3 Genetic studies are designed to uncover genetic causality and to identify the genes involved.

Genetic Studies: Summary of different aspects to look for
1 Focus of genetic probing:
   - on candidate genes
   - on large portions of the whole genome
2 Choice of genetic markers:
   - selected mutations (SNPs)
   - most mutations
   - sequence
3 Data aggregation:
   - typing individuals
   - typing groups or pools

Genetic Studies: Study Designs
There are various designs for genetic studies. The choice of design depends on the prior knowledge one has about the disease. The major classes are:
   - linkage studies (using extended families);
   - linkage disequilibrium studies, or gene/disease association studies (using case/control designs).

[Figure: Genetic Studies, Linkage]

Genetic Studies: Successful Studies
1 The chance of success of a genetic study depends strongly on the nature and strength of the genetic involvement.
   - Easy: recessive diseases caused by a single gene with high penetrance.
   - Difficult: polygenic diseases with a strong environmental influence.
2 Success is difficult to come by: reproducible gene-disease associations are few and far between! (http://www.the-scientist.com/2004/12/20/20/1/)

Genetic Studies: Which Biomarkers?
1 In any genetic study one must identify variables related to the genotype of the individuals. These can be based on
   - sequencing of candidate genes;
   - determining the presence of genetic markers (microsatellites, SNPs, etc.) throughout the whole genome;
   - determining the presence of mutations, either in candidate genes or throughout the genome.
2 The principles that can be used to achieve this are based on physical properties of biomolecules.

Genetic Studies: Statistical Analysis
Based on such data, one tries to correlate the presence of the disease with the genetic variables.
   - Linkage: by studying families in which the disease segregates, one can find markers that are inherited together with the disease rather than segregating randomly (Mendel).
   - Association: by comparing individuals having the disease (the cases) with individuals free of the disease (the controls), one may find significant differences in the genetic variables, because the alleles involved are either enriched (risk-conferring) or depleted (protective) among the cases.

Experimental Strategy
Summed Mutant Alleles for Cases vs Controls | What Are The Major Difficulties?

1 Direct scanning of individual genomes at today's costs limits the size of the study too severely.
2 Pooling P ≈ 100 individuals and analysing the resulting genetic material using DCE (denaturing capillary electrophoresis) is one of several new methods that are on track to become feasible and sufficiently accurate in the near future. Potential alternatives include sequencing by synthesis, mismatch repair detection, and others.
3 DCE can be used to determine the number of mutant alleles in the exons and splice sites of all protein-coding genes.

Experimental Strategy: One hundred common diseases, 10,000 patients per disease
1 Suppose that, for each of 100 common diseases, we could analyse blood samples from 10,000 individuals suffering from that disease.
2 We can then summarize the DCE measurements in the form of a table $M(g, d)$ counting the number of mutant alleles for any gene $g$ and disease $d$.

Experimental Strategy: The Statistical Problem
1 Let $S_{\text{cases}}(\text{gene})$ and $S_{\text{controls}}(\text{gene})$ be the total sums over all mutant alleles in the case and the control cohorts.
2 Under any scenario of genetic risk, the case group would show either a slight enrichment (genetic condition conferring risk) or a slight rarefaction (protective genetic condition) of the mutant alleles.
3 Identifying the genes that show a significant difference $S_{\text{cases}}(\text{gene}) - S_{\text{controls}}(\text{gene})$ is the aim of the study.

Confounding Variables
We are looking for unusual differences $S_{\text{cases}}(\text{gene}) - S_{\text{controls}}(\text{gene})$. The factors that make this difficult include:
   - The large costs!
   - The large number of genes involved (around 25,000) and the need to account for random noise.
   - Population heterogeneity (ethnic differences), which could dilute an otherwise sizable effect.
   - Many neutral mutant alleles may be present in equal proportions in both cohorts.
   - Misdiagnosis, which will in general dilute the effect.
   - Measurement errors in determining $S$, which make detection more difficult.

Confounding Variables (continued)
   - Genicity is important in determining the size of the effect (does one have to be homozygous for the mutant allele, or is being heterozygous enough?).
   - Interactions between genes (epistasis) or between genes and environment, if present, could greatly dilute the effect.
   - If the risk is distributed over many genes, each single gene has a diminished effect.

Statistical Question: One hundred common diseases, 10,000 patients per disease
Null Hypothesis and Test Statistic | Corrections for Multiplicity | Necessary Study Size | Finding the Rare Gene!

1 Assuming that the disease $d$ enriches or rarefies specific mutant alleles, we are led to test the null hypothesis $H_0: \pi(g, d) = \pi(g, c)$, where $d$ refers to the disease group (cases) and $c$ to the control group, for which we may take all the other diseases.
2 A possible test statistic is
$$\frac{M(g, d)}{20{,}000} - \frac{\sum_{s \neq d} M(g, s)}{1{,}980{,}000}.$$
3 Or simply the difference in the summed mutant alleles:
$$T = M(g, d) - \frac{1}{99} \sum_{s \neq d} M(g, s).$$
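Both statistics can be computed directly from the table $M(g, d)$. The following is a minimal sketch, assuming the table is stored as a list of rows (one row per gene, one column per disease); the function names and the toy table are hypothetical.

```python
def difference_statistic(M, g, d):
    """T = M[g][d] minus the average of M[g][s] over the other diseases s != d.

    With 100 diseases the divisor is 99, as on the slide.
    """
    row = M[g]
    others = [row[s] for s in range(len(row)) if s != d]
    return row[d] - sum(others) / len(others)

def frequency_difference(M, g, d, n):
    """Difference in mutant allele frequencies, with 2n alleles per disease cohort.

    For n = 10,000 and 100 diseases the divisors are 20,000 and 1,980,000.
    """
    row = M[g]
    others = [row[s] for s in range(len(row)) if s != d]
    return row[d] / (2 * n) - sum(others) / (2 * n * len(others))

# Toy table with one gene and four diseases (made-up counts).
M = [[13, 10, 10, 10]]
print(difference_statistic(M, 0, 0))
print(frequency_difference(M, 0, 0, 10_000))
```

The two statistics differ only by the factor $2n$, so they lead to the same standardized value $T/\mathrm{SD}(T)$.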

Statistical Properties
1 The test, which rejects the null hypothesis when either $T/\mathrm{SD}(T) > \text{quantile}$ or $T/\mathrm{SD}(T) < -\text{quantile}$, has a chance of making a true discovery equal to
$$\text{power}(\delta) = 1 - \Phi(\text{quantile} - \delta) + \Phi(-\text{quantile} - \delta).$$
2 The standardized effect is equal to δ = (2n) π(g, d), where $n$ is the size of the cases cohort, which was $n = 10{,}000$ in the concrete proposal. (This assumes that all sources of variation other than sampling the study participants can be neglected.)

Stephan Morgenthaler
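The power of the two-sided test can be evaluated with the standard normal distribution function. This is a small sketch using only the Python standard library; the function name `power` and the example values are chosen for illustration.

```python
from statistics import NormalDist

def power(delta, quantile):
    """Chance that a standard normal deviate shifted by delta falls outside
    the interval [-quantile, quantile]."""
    phi = NormalDist().cdf
    return 1.0 - phi(quantile - delta) + phi(-quantile - delta)

# With no effect the power equals the two-sided level of the test.
print(power(0.0, 1.96))
# When the standardized effect equals the quantile, the power is about one half.
print(power(4.26, 4.26))
```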

Multiplicity: One hundred common diseases and 10,000 patients per disease
1 The number of null hypotheses to be tested is $N$ = 25,000 genes × 100 diseases = 2,500,000.
2 Testing with the usual quantile of 1.96 would identify an absurd number of about 125,000 false positive (gene, disease) associations. [In general: $N \alpha$.]
3 If we increase the quantile to 4.26, the expected number of false positive pairs decreases to a more reasonable 50 (half a false discovery for each of the 100 diseases) [p-value of $2 \times 10^{-5}$]. The Bonferroni method requires the quantile 5.61 [p-value of $2 \times 10^{-8}$].
4 A correction of this type is essential!
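The quantiles quoted above follow from the inverse of the standard normal distribution function. The sketch below reproduces the arithmetic with the standard library; the helper name `two_sided_quantile` is an assumption of this example.

```python
from statistics import NormalDist

N = 25_000 * 100   # number of (gene, disease) null hypotheses

def two_sided_quantile(p_value):
    """Quantile q such that a standard normal deviate exceeds q in absolute
    value with probability p_value."""
    return -NormalDist().inv_cdf(p_value / 2.0)

expected_false_positives = N * 0.05          # testing at the usual 5% level
q_fifty = two_sided_quantile(50 / N)         # 50 expected false positives
q_bonferroni = two_sided_quantile(0.05 / N)  # Bonferroni at overall level 5%

print(expected_false_positives)
print(round(q_fifty, 2), round(q_bonferroni, 2))
```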

Bonferroni and False Discovery Rate
In the FDR procedure, the p-values are ordered from smallest to largest, $p_{(1)} \le p_{(2)} \le \dots \le p_{(N)}$. The null hypotheses corresponding to the $k$ smallest p-values are rejected, where $k$ is the largest integer such that
$$p_{(k)} \le k \cdot \mathrm{FDR}/N.$$
[The last time that one beats the FDR bound!]
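The step-up rule can be sketched in a few lines. The function name and the example p-values below are made up for illustration; the third p-value is chosen so that it misses its own bound but is still rejected, because a larger p-value beats its bound later in the ordering.

```python
def fdr_rejections(p_values, fdr):
    """Return the indices of the hypotheses rejected by the step-up FDR rule.

    k is the largest rank with p_(k) <= k * fdr / N; the hypotheses with the
    k smallest p-values are rejected.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / n:
            k = rank   # remember the last rank that beats the bound
    return sorted(order[:k])

p = [0.001, 0.039, 0.035, 0.5, 0.012]
print(fdr_rejections(p, 0.05))
```

In this example the bounds are 0.01, 0.02, 0.03, 0.04, 0.05. The ordered value 0.035 exceeds 0.03, yet 0.039 is below 0.04, so $k = 4$.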

p-values vs Normal Deviates: Using p-values is not optimal
1 p-values are unusual objects. The comparison distribution is the uniform, which holds under $H_0$. When the alternative is true, they follow a complicated law.
2 One can sometimes convert p-values to normal deviates (5% → 1.64, 2.5% → 1.96) such that only a shift in the mean takes place when the alternative holds. This is a much better scale to judge things on. Why?
3 Because, if a study using $n = 200$ has a p-value of 5.6%, then doubling the sample size will on average change it to
$$\Phi^{-1}(94.4\%) = 1.59 \;\rightarrow\; \sqrt{2} \times 1.59 = 2.25 \;\rightarrow\; 1 - \Phi(2.25) = 1.2\%.$$
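The three-step calculation can be checked numerically. This sketch assumes, as the slide does, a one-sided p-value and a normal deviate that scales with the square root of the sample size; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def p_value_after_scaling(p_value, sample_size_factor):
    """Average one-sided p-value after multiplying the sample size by a factor.

    The p-value is converted to a normal deviate, the deviate is scaled by
    the square root of the factor, and the result is converted back.
    """
    normal = NormalDist()
    z = normal.inv_cdf(1.0 - p_value)
    return 1.0 - normal.cdf(math.sqrt(sample_size_factor) * z)

print(p_value_after_scaling(0.056, 2))
```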

50% Chance of Making a True Discovery (δ = quantile)
[Figure] The plot shows how the standardized effect $\delta$ behaves as a function of the enrichment for various $\pi_{\text{controls}}(g)$ and study sizes ($n$ cases and $m$ controls).

Sparsity: Testing N Hypotheses
1 Only a few genes are expected to be implicated in any one disease. The signal hidden inside the genetic noise will thus be very sparse, maybe as few as 20 in $N$ = 2,500,000.
2 In such a case, the effect size required by the Bonferroni bound is too large and one can lower it somewhat. Procedures for detecting sparse signals look at the whole collection of estimated effects (or p-values) and test whether their distribution deviates significantly from a null sample (which is uniform in the p-value case).
3 In a normal model in which one mixes a fraction $1 - N^{-\beta}$ of null effects with a fraction $N^{-\beta}$ of effects of size $\mu > 0$, the following happens.

Sparsity: N = 2,500,000 independent tests
For values of $\beta$ close to one (20 out of $N$ corresponds to $\beta \approx 0.8$), the effect size has to be larger than $\mu > \sqrt{2 \times 0.30 \log(N)} = 2.97$.
[Figure: four plots, corresponding to $\mu = 0$, $\sqrt{4 \times 0.30 \log(N)}$, $\sqrt{2 \times 0.30 \log(N)}$, $\sqrt{1 \times 0.30 \log(N)}$.]
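The numbers on this slide can be reproduced as follows. The sparsity exponent comes from solving $N^{-\beta} = 20/N$. The constant 0.30 is taken from the slide; the function `rho` below is an assumption of this sketch, namely that the constant comes from the detection boundary $(1 - \sqrt{1 - \beta})^2$ known from the literature on sparse normal mixtures for $\beta$ between 3/4 and 1.

```python
import math

N = 2_500_000      # number of independent tests
n_signals = 20     # assumed number of true effects, as on the slide

# Sparsity exponent: a fraction N**(-beta) of the effects is non-null.
beta = math.log(N / n_signals) / math.log(N)

def rho(b):
    """Assumed detection-boundary constant for 3/4 < b < 1."""
    return (1.0 - math.sqrt(1.0 - b)) ** 2

# Effect size bound using the constant 0.30 quoted on the slide.
mu_bound = math.sqrt(2 * 0.30 * math.log(N))

print(round(beta, 2), round(rho(beta), 2), round(mu_bound, 2))
```

For comparison, the Bonferroni quantile for the same $N$ is 5.61, so the bound for detecting sparse effects is considerably lower.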

If a Binomial Model Were to Hold
[Figure] This is the same plot as before, but instead of the Bonferroni bound, the detection bound for sparse effects under normal mixtures is shown.

Hardy-Weinberg equilibrium
Maintaining Genetic Diversity | Linkage | Types of Finite Populations

1 For any gene with two alleles, the frequencies of the genotypes determine the frequencies of the alleles: $p_A = P_{AA} + 0.5\,P_{Aa}$.
2 If all is well, then one has conversely the Hardy-Weinberg law: $P_{AA} = p_A^2$, $P_{aa} = p_a^2$, and $P_{Aa} = 2 p_A p_a$. That is, the alleles a descendant receives can be thought of as being drawn randomly and independently from the alleles available in the parents' generation.
3 This has given rise to the Wright-Fisher model of heredity and shows that on average the genetic variation is exactly preserved.
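The two directions (genotype frequencies to allele frequencies, and back under the Hardy-Weinberg law) can be written out as follows. The function names are hypothetical and the input frequencies are made up.

```python
def allele_frequency(P_AA, P_Aa):
    """Frequency of allele A from the genotype frequencies: p_A = P_AA + 0.5 * P_Aa."""
    return P_AA + 0.5 * P_Aa

def hardy_weinberg(p_A):
    """Genotype frequencies expected when alleles are drawn independently."""
    p_a = 1.0 - p_A
    return {"AA": p_A ** 2, "Aa": 2 * p_A * p_a, "aa": p_a ** 2}

p_A = allele_frequency(0.25, 0.5)
print(p_A, hardy_weinberg(p_A))
```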

[Figure: The Wright-Fisher Model]

Hardy-Weinberg equilibrium: Violations
1 The equilibrium will be violated when mating in the population is not free, but restricted by geography, social conventions, and so on [inbreeding].
2 Inbreeding causes an excess of homozygous individuals and thus reduces the genetic diversity. This can render recessive genetic diseases much more common than they would be in a less inbred environment.
3 A kind of statistical inbreeding coefficient can be defined as
$$F = \frac{2 p_A p_a - P_{Aa}}{2 p_A p_a}.$$
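A short sketch of the coefficient, computed from observed genotype frequencies. The function name and the example frequencies are made up; the coefficient is zero under Hardy-Weinberg proportions and one when no heterozygotes are observed.

```python
def inbreeding_coefficient(P_AA, P_Aa, P_aa):
    """F = (2 p_A p_a - P_Aa) / (2 p_A p_a): relative deficit of heterozygotes."""
    p_A = P_AA + 0.5 * P_Aa
    p_a = P_aa + 0.5 * P_Aa
    expected_heterozygotes = 2 * p_A * p_a
    return (expected_heterozygotes - P_Aa) / expected_heterozygotes

print(inbreeding_coefficient(0.25, 0.5, 0.25))   # Hardy-Weinberg proportions
print(inbreeding_coefficient(0.30, 0.4, 0.30))   # fewer heterozygotes than expected
```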

Equilibrium: Linkage
1 If one considers two genes, both with two alleles, the possible genotypes are AABB, AABb, AAbb, AaBB, etc. One would hope that $P_{AABB} = p_A^2\,p_B^2$, that is, that the alleles are drawn randomly.
2 This is true for genes that are located on different chromosomes, but wrong when they are on the same chromosome.
3 Indeed, it all depends on the association of the alleles on the chromosomes. An individual with genotype AaBb can have the arrangement AB on one chromosome and ab on the other, or the arrangement Ab and aB. These arrangements are called haplotypes.

Linkage: Recombination fraction
1 Looking at it this way, one would now think that a descendant inherits one of the haplotypes from the parent.
2 This is again wrong. In fact, an individual with haplotypes AB and ab will produce the gametes AB or ab with probability $(1 - r)/2$ each, and the gametes Ab or aB with probability $r/2$ each. The parameter $r$ lies between 0 and 1/2 and is called the recombination fraction.
3 The smaller $r$, the smaller the physical distance between the genes on the chromosome. So $r$ offers a genetic opening for creating a physical map of the genes on a chromosome ($r$ = 1% corresponds to one centimorgan).
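The gamete probabilities for a parent with haplotypes AB and ab can be tabulated as follows. The function name is hypothetical and the value of `r` in the example is arbitrary.

```python
def gamete_probabilities(r):
    """Gamete distribution for a parent carrying haplotypes AB and ab.

    Non-recombinant gametes (AB, ab) have probability (1 - r) / 2 each,
    recombinant gametes (Ab, aB) have probability r / 2 each.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("the recombination fraction must lie between 0 and 1/2")
    return {"AB": (1 - r) / 2, "ab": (1 - r) / 2, "Ab": r / 2, "aB": r / 2}

print(gamete_probabilities(0.01))   # closely linked genes
print(gamete_probabilities(0.5))    # unlinked genes: all four gametes equally likely
```

At $r = 1/2$ the four gametes are equally likely, which is the situation for genes on different chromosomes; at $r = 0$ the parental haplotypes are always transmitted intact.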

Recombination: Cross-overs between homologous chromosomes
During meiosis (the production of germ cells), two cell divisions occur. As part of this process, inter-chromosomal cross-overs take place.