Case Study. Malaria and the human genome. STUDENT'S GUIDE. Steve Cross, Bronwyn Terrill and colleagues. Version 1.1


STUDENT'S GUIDE
Case Study: Malaria and the human genome
Version 1.1
Steve Cross, Bronwyn Terrill and colleagues
Wellcome Trust Sanger Institute, Hinxton

Malaria and the human genome

Each year, the malaria parasite Plasmodium falciparum kills over a million African children and causes debilitating illness in over half a billion people worldwide. Malaria is the strongest known selective force in the recent history of the human genome. Many types of genetic variation have evolved in humans due to selection by the malarial parasite, causing variation in red blood cell regulation, structure and antigen expression. In this activity, you will investigate the origin and action of mutations that are thought to have arisen in human populations in response to selection pressure from malaria.

IMAGE FROM: Wellcome Images.

Activity overview

Malaria is a debilitating illness, caused by parasites of the genus Plasmodium, that affects more than 40% of the world's population. This disease is thought to be the strongest selective force on our species in recent history. Researchers believe that this selection is responsible for the diverse range of genetic adaptations, found in the genomes of different populations, that protect against malaria. In this activity, you will use a common statistical test (chi-squared) to work out whether a genetic mutation is associated with incidence of the disease, or whether the two events are independent.

What is malaria?

Every year, malaria causes hundreds of millions of people to be ill and kills between one and three million, most of them children in sub-Saharan Africa. It is a disease caused by a protozoan parasite that is spread by mosquitoes and that multiplies inside human blood cells. It causes fevers, chills and shortness of breath (the symptoms are like severe flu) and, in extreme cases, coma and death.

Malaria is a disease that is older than humans, and there are malarial parasites that infect birds, lizards and primates other than humans. The human form of malaria seems to have existed for at least 100,000 years, although it probably only became such a vicious killer about 10,000 years ago.

Malaria is currently found in a band that extends across the tropics, throughout Africa, southern Asia, Central America and the north of South America. At one time, it also extended north into Europe and North America.

Malaria can be treated with a wide variety of drugs, from the three-hundred-year-old quinine to recently developed treatments. The choice of drug depends on the type of malaria infection and on whether strains in the area have developed resistance to particular drugs. There is currently no vaccine for malaria, but there is a huge international effort to develop one. Malaria is the focus of a large amount of medical research, and the genome of the parasite that causes the most common forms of the disease was sequenced in 2002 by staff at the Wellcome Trust Sanger Institute.

How malaria is caused

Malaria is caused by a group of protozoan parasites of the genus Plasmodium. Four species of Plasmodium commonly infect humans. P. falciparum is the most common form on the African continent and causes the most severe malaria of any type. P. vivax is the form of the infection most often seen in Asia. The other two species, P. ovale and P. malariae, are less common.

Malaria life cycle

The malaria parasites are spread by female Anopheles mosquitoes, which inject them into the human bloodstream when they suck blood (1). Once in the blood vessels (2), the parasites (known at this stage as sporozoites) migrate to the liver (3), where they can multiply, safe from attack by the immune system. The next stage in the parasite's life cycle is to invade red blood cells (4). In the red blood cells, the malarial parasites of this stage (merozoites) can multiply to huge numbers before bursting the cells and re-entering the bloodstream (5a). Alternatively, they can invade red blood cells and turn into sex cells known as gametocytes (5b). Gametocytes can be taken up by mosquitoes during a blood meal, through the same biting process as before. In the mosquito's intestine, the sex cells mate; their offspring can then migrate to the mosquito's salivary glands, ready to be passed into a new human host.

Malaria as a selective pressure on humans

Researchers believe that there is evidence in the human genome that malaria has been the single greatest selective pressure on human beings in recent genetic history. Because it is so widespread, and so deadly, malaria has killed huge numbers of humans, and our genomes bear the scars of a long arms race with this disease. There are numerous specific genetic variations that are found most often in areas where malaria is common, some of which have been shown to offer protection against infection.

One of these genetic variants might explain why P. vivax is so rare in Africa compared to P. falciparum. In western and central Africa, most people have genetic variants that mean that they do not produce a specific protein, called the Duffy protein, on the surface of their red blood cells. This protein is used by P. vivax as a way to enter red blood cells, and thus these people are immune to P. vivax. This suggests that there has been very high exposure to P. vivax in these regions in the past few thousand years, but that changes in human DNA have made this particular parasite much rarer now.

Sickle cell trait and sickle cell anaemia

Sickle cell anaemia is a genetic condition that is found mainly in areas where malaria is endemic. The mutation that causes sickle cell anaemia is commonly said to be recessive, in that only people with two copies of the disease-causing form of the gene are affected by the full disease. In fact, even carriers with one copy of the gene, who are said to have sickle cell trait, will have a small percentage of sickled cells in their blood. At a molecular level, they also have a significant change in half of their haemoglobin molecules.

[Figure: The molecular structure of haemoglobin. The two beta subunits are in blue.]

A change of a single nucleotide in the HBB gene, which codes for β-globin (on chromosome 11), causes sickle cell trait, and having this change on both copies of chromosome 11 causes sickle cell anaemia. β-globin is one of the proteins that make up haemoglobin, a complex found in red blood cells that binds oxygen for transport around the body. The allele with this change is often called HbS for short, with the non-sickle cell version being known as HbA. The unaffected genotype is therefore written HbA HbA, the sickle cell trait genotype is HbA HbS and the sickle cell anaemia genotype is HbS HbS.

The symptoms of the disorder are visible at a microscopic level. The red blood cells of a homozygote for the sickle cell allele tend to adopt a rigid sickle shape. This means that they cannot move as freely through small blood vessels, and therefore cannot transfer oxygen as effectively to some organs. Symptoms include organ pain and fever, and often occur in bouts rather than being continuous. People with sickle cell anaemia usually have a shortened life span.

[Figure: Sickle and normal red blood cells. PHOTO BY: E.M. Unit, Royal Free Hospital School of Medicine/Wellcome Images.]

People with one copy of the sickle cell allele and one standard allele tend to experience sickle cell symptoms only under extreme conditions of oxygen deprivation, such as when climbing a mountain, or when seriously dehydrated.

Because the sickle cell mutation alters only a single base in the DNA sequence, it is known as a single-nucleotide polymorphism, or SNP (pronounced 'snip'). Internationally, researchers focus on the role that SNPs play in human disease. SNPs have been found that offer some protection against obesity, heart disease and diabetes. In malarial regions of Africa, about 1 in 10 people carries at least one sickle cell allele.

Is sickle cell trait or anaemia associated with malaria?

Sickle cell anaemia is, at first glance, a paradox. If possessing two copies of an allele causes such a serious disorder, and historically would have caused death before reproductive age, how can that allele be so common? A clue can be found by looking at the distribution of the sickle cell allele on a world map and comparing it to the distribution of P. falciparum. Both are found in the same regions. This suggests that the sickle cell allele may be protecting some people against malaria, even as it has adverse effects on others.

[Map: The distribution of malaria in populations of Africa and South Asia, shaded by number of cases of malaria (bands: fewer than 1,000; 1,000 to 10,000; 10,000 to 100,000; 100,000 to 500,000; 500,000 to 1 million; 1 million or more). The darker the green, the greater the incidence of the disease. Data source: World Health Organisation.]

[Map: The distribution of the sickle cell allele in populations of Africa and South Asia, shaded by % of population with HbS (bands: 4 to 6; 6 to 8; 8 to 10; 10 to 12; 12 to 14; 14+).]

The key to this protection is that carriers of only one copy of the sickle cell allele, who also have one healthy allele, do not suffer from the anaemia, but are protected against malaria by their sickle cell trait. This is often quoted as an example of a phenomenon known as heterozygote advantage. In this instance, because the heterozygote is fitter (in an evolutionary sense) than either of the homozygotes in a specific set of circumstances, both alleles are maintained in the population, rather than one form being fixed by selective pressure. Where there is little malaria, the superior fitness of the non-sickle form drives this variant to fixation.

Exercise 1: Exploring the effects of sickled cells

To understand the effect of the SNP that causes sickle cell trait and sickle cell anaemia, you will first need to characterise it. You will be using JavaScript DNA Translator 1.1, a simple tool for analysing DNA sequences.

The first 60 nucleotides of the normal form of the gene look like this:

ATG GTG CAT CTG ACT CCT GAG GAG AAG TCT GCC GTT ACT GCC CTG TGG GGC AAG GTG AAC

The mutated (sickle cell) form looks like this:

ATG GTG CAT CTG ACT CCT GTG GAG AAG TCT GCC GTT ACT GCC CTG TGG GGC AAG GTG AAC

Open the file JVTtranslator.shtml with a web browser (it looks best in Mozilla Firefox). You will be presented with a screen on which the following areas are labelled: Give your sequence a name here. Paste the DNA sequence here. Ensure that the 'Reading frame' is set to '1' so that the software starts at the beginning of the sequence. Click 'Translate'.

Type a name for your sequence into the top box ('standard' or 'sickle', for example). Cut and paste one of the DNA sequences from the text file called DNA_sequences.txt into the second box. You'll need to tell the software that you're only interested in one protein sequence from this DNA, by changing 'Reading Frame' to '1'. Now click the 'Translate' button.

The screen that is generated will give you your results. The easiest version of the protein sequence to read is in the yellow box. Here is a list of what the letter codes in the yellow box represent in terms of the amino acids in the predicted protein.

Three-letter  Single-letter  Amino acid
Asp           D              Aspartic acid
Glu           E              Glutamic acid
Arg           R              Arginine
Lys           K              Lysine
His           H              Histidine
Asn           N              Asparagine
Gln           Q              Glutamine
Ser           S              Serine
Thr           T              Threonine
Tyr           Y              Tyrosine
Ala           A              Alanine
Gly           G              Glycine
Val           V              Valine
Leu           L              Leucine
Ile           I              Isoleucine
Pro           P              Proline
Phe           F              Phenylalanine
Met           M              Methionine
Trp           W              Tryptophan
Cys           C              Cysteine

Amino acid codes: the three-letter and single-letter codes for the 20 amino acids that are found in proteins. Most computer software uses the single-letter codes to show the different amino acids.

[Figure: An example of output from the JavaScript Translator. The amino acid sequence is in the yellow box.]

Questions
a. What has changed in the sickle cell gene?
b. What has changed in the sickle cell protein?
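The translation step in Exercise 1 can also be sketched in a few lines of Python, as a way of checking what the browser tool reports. This is an illustrative sketch only, not the code behind JavaScript DNA Translator 1.1: the function names (codons, translate, differences) are made up for this example, the codon table is the standard genetic code, and reading frame 1 is assumed.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The two 60-nucleotide sequences given in Exercise 1.
NORMAL = ("ATG GTG CAT CTG ACT CCT GAG GAG AAG TCT "
          "GCC GTT ACT GCC CTG TGG GGC AAG GTG AAC")
SICKLE = ("ATG GTG CAT CTG ACT CCT GTG GAG AAG TCT "
          "GCC GTT ACT GCC CTG TGG GGC AAG GTG AAC")


def codons(dna):
    """Split a DNA string into codons, reading from the first base (frame 1)."""
    seq = dna.replace(" ", "").upper()
    usable = len(seq) - len(seq) % 3  # ignore any incomplete trailing codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Translate a DNA string into single-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in codons(dna))


def differences(first, second):
    """List (position, first, second) wherever two sequences differ (1-based)."""
    return [(i, x, y)
            for i, (x, y) in enumerate(zip(first, second), start=1)
            if x != y]


if __name__ == "__main__":
    print("Normal protein:", translate(NORMAL))
    print("Sickle protein:", translate(SICKLE))
    print("Codon changes: ", differences(codons(NORMAL), codons(SICKLE)))
    print("Protein changes:", differences(translate(NORMAL), translate(SICKLE)))
```

Comparing codons rather than single bases keeps the DNA change and the protein change on the same numbering, which makes questions a and b easier to relate to each other.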

Exercise 2: Does sickle cell protect against malaria?

In this exercise you will test the hypothesis that the sickle cell allele is associated with protection against malaria. Rather than look at the geographical distribution of the parasite and the gene, you will be looking directly at the life histories of people with and without the sickle cell SNP, and comparing their chances of infection.

When trying to demonstrate a relationship or association between a genetic change and a disease, it is important to gather data in different ways and from different sources, to avoid coming to inaccurate conclusions about a gene's importance. There are many studies that claim to have found a correlation between a specific genetic change and a disease or susceptibility, without any idea of how this might happen. Only by repeating studies and looking for mechanisms by which the disease is caused can scientists be sure that associations are real rather than just statistical anomalies.

The data

You are going to analyse real data from a 2008 study to evaluate the evidence that the sickle cell allele is associated with the incidence of malaria. The hypothesis is that people with sickle cell trait will be less likely to be hospitalised with malaria.

Every year millions of children worldwide are hospitalised with severe malaria. For this study, DNA was taken from 500 children attending a single hospital. The DNA was examined, and the sickle cell SNP status was determined for each child.

As the control sample, DNA was also taken from 500 people in the population near the hospital. These were people who had not been admitted to hospital with severe malaria as children. The sickle cell SNP status was also determined for each person in this population.

You will need to perform a statistical test to find out whether the prevalence of the sickle cell allele is significantly greater in the general population than in children being admitted to hospital with severe malaria. This test evaluates whether the variation in numbers is a result of chance or a real effect. You will be looking at individual chromosomes rather than people. This is because each person carries a pair of chromosomes, and testing paired genotypes would be much more complicated than testing individual chromosomes.
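Counting chromosomes rather than people can be sketched as follows. The example assumes, purely for illustration, that each person's genotype at the sickle cell SNP is written as a two-letter string such as "AA", "AT" or "TT"; the spreadsheet used in this activity may be laid out differently, and the five genotypes below are invented, not taken from the study.

```python
from collections import Counter


def count_alleles(genotypes):
    """Count individual alleles; each person contributes two chromosomes."""
    counts = Counter()
    for genotype in genotypes:
        # Every letter in the genotype string is one chromosome's allele.
        counts.update(genotype.strip().upper())
    return counts


# Invented genotypes for five people (ten chromosomes), for illustration only.
example_genotypes = ["AA", "AT", "AA", "TT", "AT"]
allele_counts = count_alleles(example_genotypes)

print("A alleles:", allele_counts["A"])
print("T alleles:", allele_counts["T"])
print("Chromosomes counted:", sum(allele_counts.values()))
```

The point of the sketch is that a heterozygous person adds one to each allele count, so the totals always sum to twice the number of people.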

The Chi-Squared test

The Chi-Squared test can be used here to test the following null hypothesis.

Null hypothesis: There is no difference in the prevalence of the sickle cell allele in the chromosomes of people with and without malaria.

Alternative hypothesis: There is a significant difference in the prevalence of the sickle cell allele in these populations.

Open the Excel spreadsheet called malaria_data_student. There are two worksheets: Cases and Controls. (Your teacher may also give you another worksheet called Student Table to use.) Each of the Cases and Controls worksheets lists the genotypes of 500 people at specific locations in the genome. Remember that the Cases were from people in hospital with malaria; the Controls were from people in the community.

The column you're interested in is Column E: HbS. Sort the data and count the rows to work out how many As and Ts have been observed in the different populations.

Fill in the table below using the chromosome data you have been given:

Category                   Observed (O)   Expected* (E)   O − E   (O − E)²   (O − E)²/E
Has malaria, has HbS (T)
Has malaria, no HbS (A)
No malaria, has HbS (T)
No malaria, no HbS (A)
TOTALS

* See below. To calculate the Expected values, you will need to count the number of HbS alleles found across the entire population under test (1,000 people and therefore 2,000 chromosomes). This figure, divided by the total chromosome population size (2,000), will give you the prevalence of the HbS allele. Multiplying the prevalence of each form of the allele by the number of chromosomes in each test group will give you the Expected values.

Category   Size of population (ignoring alleles)   Prevalence of allele   Expected
Has HbS
No HbS

The total of (O − E)²/E across all four groups is the chi-squared value (χ²). There is only one degree of freedom in this experiment. Using this information, find p from the lookup table below, where p is the probability of obtaining a χ² value at least this large by chance if the null hypothesis is true.

p     0.25   0.2    0.15   0.1    0.05   0.025   0.01   0.005   0.001   0.0005
χ²    1.32   1.64   2.07   2.71   3.84   5.02    6.63   7.88    10.83   12.12

Questions
c. What is the value of p for your result?
d. Does this mean that the result is statistically significant?
e. Does this information reinforce or contradict the data based on mapping the prevalence of malaria and the sickle cell allele shown in the maps above?
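The arithmetic of Exercise 2 can be sketched in Python as a way of checking a hand calculation. This is a minimal sketch under stated assumptions: the function names are invented for this example, the critical values are copied from the lookup table above (one degree of freedom), and the observed counts are made-up numbers chosen only to show the steps. They are not the data from the study, so the result says nothing about the answers to questions c to e.

```python
# (p, critical chi-squared value) pairs for one degree of freedom,
# copied from the lookup table in this guide.
LOOKUP = [(0.25, 1.32), (0.2, 1.64), (0.15, 2.07), (0.1, 2.71), (0.05, 3.84),
          (0.025, 5.02), (0.01, 6.63), (0.005, 7.88), (0.001, 10.83),
          (0.0005, 12.12)]


def expected_counts(observed):
    """Expected chromosome counts if allele prevalence is equal in every group.

    observed maps group name -> {allele: count of chromosomes}.
    """
    total = sum(sum(alleles.values()) for alleles in observed.values())
    expected = {}
    for group, alleles in observed.items():
        group_size = sum(alleles.values())
        expected[group] = {}
        for allele in alleles:
            # Prevalence of this allele across the whole population under test.
            prevalence = sum(g[allele] for g in observed.values()) / total
            expected[group][allele] = prevalence * group_size
    return expected


def chi_squared(observed):
    """Sum of (O - E)^2 / E over every group and allele."""
    expected = expected_counts(observed)
    return sum((observed[g][a] - expected[g][a]) ** 2 / expected[g][a]
               for g in observed for a in observed[g])


def p_upper_bound(chi2):
    """Smallest tabulated p whose critical value chi2 reaches.

    Returns None if chi2 is below the first critical value (p > 0.25).
    """
    bound = None
    for p, critical in LOOKUP:
        if chi2 >= critical:
            bound = p
    return bound


# Made-up counts for illustration only: 1,000 chromosomes per group.
hypothetical = {
    "has malaria": {"T": 10, "A": 990},
    "no malaria": {"T": 50, "A": 950},
}

chi2 = chi_squared(hypothetical)
print("Expected:", expected_counts(hypothetical))
print("Chi-squared:", round(chi2, 2))
print("p is at most:", p_upper_bound(chi2))
```

Because the table lists only selected critical values, the lookup gives an upper bound on p rather than an exact value, which is the same limitation as reading the table by hand.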