For all of the following, you will have to use this website to determine the answers: http://blast.ncbi.nlm.nih.gov/blast.cgi We are going to be using the programs under this heading: Answer the following questions. You might want to use the Help function the Blast page. 1. You have protein sequence and you wish to know what other proteins look like it. Which of the five Basic Blast programs should you use? 2. You a DNA sequence and you wish to search for other DNA sequences to find one that encodes the same or similar protein. Which of the five Basic Blast programs should you use? 3. You have DNA and you wish to find other DNA sequences that look like it. Which of the five Basic Blast programs should you use? 4. You have protein sequence and you wish to search DNA databases to find genes that encode a similar protein. Which of the five Basic Blast programs should you use? ------------------------------------------------------------------------------------------------------------------------ Now we begin an example in which you are going to use blastp. ------------------------------------------------------------------------------------------------------------------------ You have identified a gene that is important for transcription regulation of a collection of genes. You have just obtained some amino acid sequence. You are going to use BLAST to find out the likely identity of this protein. The protein sequence that you have is called the query. The query is: HGTSSGPTVTIVQIPNGNTVQVHGVLQGGQPSVLQSPQVQTVQLSVLGESEDSQESVD Click the link that says "protein blast". You will see a box like the one below. Bring this page with you to the exam. You will turn it in with the answers filled out. 1
The accession number and gi numbers(see above) mean that if the sequence is contained in the Blast database then one could type in the ID number of the sequence. Your sequence is not this database so we can't use these ID Your query (above)is not exactly in fasta format but it is close enough. The program will accept it. Copy and past the query into the the box labelled "Enter Query Sequence". Next you click the Blast button and wait a bit. After a while you get a graphic summary. The graphic summary is convenient and like most conveniences somewhat useless. Use it only if it shows conserved domains. Take a look at them and vanish the Graphic Summary. Click the triangle and it will vanish. The descriptions are more useful. Look at them and then click the triangle to vanish them Bring this page with you to the exam. You will turn it in with the answers filled out. 2
Ah, the alignments. This is the part that I like. What is the top hit? The first one is the most statistically improbable to be a random hit. This means that it is likely to be one that we want to look at. You should a line that says something like this: Identities = 45/58 (77%), Positives = 51/58 (87%), Gaps = 0/58 (0%) 5. What do you think the difference between Identities and Positives is? Do you understand how to interpret the alignment? The top sequence is your query. The bottom sequence is the hit that was found in the database. The middle line shows which sequence are the same. 6. What do the pluses mean? Now give your protein a name. Don't make one up. 7. What does Blast tell you it's name should be? Bring this page with you to the exam. You will turn it in with the answers filled out. 3
8. In blastp page what is the default scoring matrix? Look under Algorithm parameters. Tell me something about this matrix (look in lecture notes.) ************************************************************ 9. In blastp page what is the default word size? Look under Algorithm parameters. What does word size mean? ************************************************************ 10. In the blastp page what are the default Gap insertion and extension penalties? They are different than the ones used in class. Look under Algorithm parameters. ----------------------------------------------------------------- Bring this page with you to the exam. You will turn it in with the answers filled out. 4
--------------------------------------------------------------- NEW EXAMPLE. --------------------------------------------------------------- This DNA is from a real cdna. There are no sequencing errors in it. gctggtccagaaggctaaactggccgagcagtcagaacgttacgatgatatggcccaggccatgaagtc cgtcacagagactggcgttgagctctcaaatgaggaaagaaatctgctctccgttgcctacaaaaatgt ggtcggtgcccgcaggtcatcgtggcgtgtcatctcctccattgagcagaaaaccgaagcatccgctag aaaacagcagctcgcccgtgagtacagagagc You are to search the DNA databases of blast looking for other DNA molecules that encode similar or identical proteins. 11. Which basic blast program should you use? It is a different blast than in the last question. Paste the sequence into the query window. Double check that the correct database is selected. Yu should be using this one: Now do the alignment click the BLAST button. In your alignment window you will see multiple possible alignments to the same chunk of DNA. Actually, if you are on the right track, all of these should be aligned amino acid sequeences. All of the most favorable alignments are listed under a link that has a format like this: >gb BBXXXXX.X This link identifies a file that contains the sequence from the gene that matched your query. Lower down the list you will find other genes that match too. Click the link. 12. What is the name of this gene? (hint scroll down and look under FEATURES for the term called "gene". Next to it will an entry that says /gene = "Name of this darned gene". Name of gene: Bring this page with you to the exam. You will turn it in with the answers filled out. 5
13. Some of the alignments have asterisks in them. These are the product of STOP CODONS!!! Why are there stop codons? 14. Knowing this, find the first one that does not have a stop codon,what is the expect value for this one? The expect value is Bring this page with you to the exam. You will turn it in with the answers filled out. 6
Essay question #1 setup The mutant form of the gene has single nucleotide change right here. The clos gene 5' UTR 3' UTR transcription mrna from the clos gene translation Wild type makes a transcription factor called CLOS. The mutant makes a CLOS protein that has its is missing its activation domain but htat still has its DNA binding domain. Essay question #2 setup I want you to describe 6 possible new HIV drugs (make them up). These drugs should NOT belong to the four categories of drugs that are currently in clinical use. Tell me how each drug works. Tell me how and why they interfere with the viral "life cycle". Don't skip anything. Use as much space as needed. You are going to need to use information from all of HIV lectures to do a good job on this one. Essay question #3 setup Could you make a PAM250 matirx? Don't bring this page with you to the exam <---- YOU WILL NOT TURN THIS ONE IN. 7
Other RNA Processing Events 1. How does RNA editing work? What changes does it cause? Is it random or directed? 2. What the heck does ADAR stand for? What does it do? 3. With respect to RNA editing what are guide RNAs? 4. Could you give me a list of all possible ways to generate an RNAi response? 5. Could you describe all of the enzymes involved and draw the entire RNAi pathway? 6. Could you describe how replication of RNAi triggers can occur? 7. What did Dr. Atkinson when he spoke of exon spreading in the context of RNAi? Translation 1. What is a Shine-Dalgarno sequence? 2. What are Kozak's Rules? You might refer to this as Kozac's sequence. 3. How do eukaryotic ribosomes recognize the translation start site? 4. eif2, eif2b, eif3, eif4, eif5, eif6 What do they do? 5. What other proteins does the eif4 complex touch? 6. How do some viruses interact with eif4 in order to take over cellular translation? 7. Why is the regulation of translation initiation more important in a eukaryotic cell than in a prokaryotic cell? 8. Riboswitch 9. Heme regulation of globin synthesis 10. mtor phosphorylates EiF4E and does what to translation? Bioinformatics 1. What is the conceptual basis of the PAM250 matrix? 2. You should expect to have to perform dynamic programming. 3. I can ask you anything about how the Ficketts program works. 4. I might ask you anything about Blast. 5. You wish to say that two aligned proteins have 50% of the same amino acids at the same position. Amongst the remaining amino acids, 25% of these are conservative differences. We should say that the proteins are 75%. A) homologous B) identical C) similar D) homolical 6. The Fickett s testcode algorithm Which is the BEST answer? a. can identify exons in genomic DNA b. can identify exons transcribed by RNA polymerase II in forward reading frames but not if the sequence is presented in the backward reading frame c. can identify exons transcribed by RNA polymerase II regardless of their reading frame d. can identify transcribed regions of genomic DNA 7. The Fickett s testcode algorthim works because (choose the best answer) a. coding regions do not contain stop codons. Don't bring this page with you to the exam <---- YOU WILL NOT TURN THIS ONE IN. 8
b. each species has a specific codon bias (dialect of codons preferentially used to encode proteins). c. protein encoding exons have a different bias of nucleotides at position 1, 2, & 3 than does DNA that does not encode a protein. d. protein encoding exons have a bias of nucleotides at position 1, 2, & 3 while DNA that does not encode a protein has a random distribution of nucleotides at positions 1, 2 and 3. e. A and D 8. Blast (choose the best answer) a. at its heart is a global alignment program b. at its heart a local alignment program. c. part of the reason that it is so fast is because it does not use dynamic programming. d. part of the reason that it is so fast is because it first compares words instead of amino acid residues. e. B and D 9. In Fickett's Testcode the equation for calculating the A position parameter is A) max (#Apos1, #Apos2, #Apos3) B) min (#Apos1, #Apos2, #Apos3) min (#Apos1, #Apos2, #Apos3)+1 max (#Apos1, #Apos2, #Apos3)+1 C) % of A's in the sequence D) % of A's in position 3 E) Log10(.incidence/.freq) * 10 The plots below represent 5 different dot plots. X axis X axis A B C X axis D X axis E X axis Y axis Y axis Y axis Y axis Y axis The outcome of four different alignments are shown below. Please match up alignment with the correct dot plot. 10. This cartoon represents the alignment of two proteins:. Which dot plot would it it generate? A, B, C, D or E. 11. This cartoon represents the alignment of two proteins:. Which dot plot would it it generate? A, B, C, D or E. 12. DNA fragment with an inversion Which dot plot would it it generate? A, B, C, D or E. 13. One protein has 3 copies of a domain, the other copy has only 1 copy of this domain. Which dot plot would it it generate? A, B, C, D or E. Don't bring this page with you to the exam <---- YOU WILL NOT TURN THIS ONE IN. 9
--------------------------------------------------------------- BLASTP COMPARISON This is the result of a blastp search of the genome. GENE ID: 80333 KCNIP4 Kv channel interacting protein 4 [Homo sapiens] (Over 10 PubMed links) Score = 83.2 bits (204), Expect = 6e-15, Method: Composition-based stats. Identities = 41/50 (82%), Positives = 42/50 (84%), Gaps = 7/50 (14%) Query 1 ATVRHRPEALSLLEAQSKRATFCGTFTKKELQILYRGFRNECPSGVVNEE 50 ATVRHRPEAL LLEAQSK FTKKELQILYRGF+NECPSGVVNEE Sbjct 43 ATVRHRPEALELLEAQSK-------FTKKELQILYRGFKNECPSGVVNEE 85 15. How many gap insertion penalties occurred? How many gap extension penalties occurred? How many substitutions are there? What are the PAM250 values for these substitutions (use the PAM table from the lecture). Don't bring this page with you to the exam <---- YOU WILL NOT TURN THIS ONE IN. 10
HIV 1. 1981, 1983 with regard to HIV why are these years relevant? 2. In 2002 how many people are estimated to have HIV in the USA (round to nearest 100,000). 3. In 2003 how many people are estimated to have HIV in the world (round to the nearest million). 4. In Dr. Atkinson's opinion, what is the most convincing evidence that HIV causes AIDS? a. Drugs designed to specifically interfere with the viral life cycle suppress AIDS symptoms and extend human life. b. Recipients of contaminated blood subsequently develop AIDS. c. The fact that Kary Mullis says so ;) 5. Ploidy of HIV 6. Retrovirus means what? 7. Reverse transcription - how does it occur? 8. Four purposes of understanding the viral load (also useful for stimulating evening converstaion), can you tell me how many virions are produced per day during the so-called latency period? 9. How many virions produced over the 10 year (approximate) latency period? 10. Can you label everything here. Even things that I talked about but that did not appear in the original figure? 11. What does GP120 do? What does GP41 do? 12. How does HIV enter the cell? 13. How type of cell does HIV use to enter the body after the transfer of bodily fluids? 14. CCR5, CXCR4, RANTES, why are these important? What do they do? 15. Aids resistance genes. What are they? Why are the understanding of these genes so very important? How could they contribute to the the production of a HIV therapy? 16. How does HIV kill cells? 17. What is the relationship between reverse transcription and HIV recombination? 18. HIV protease inhibitors block the production of capsid proteins, gp120 and gp41, HIV reverse transcriptase, RNase, protease and integrase. How can this be? Why does this happen? 19. TAR is an activator. What does it bind? How and when does it activate HIV transcription? 20. What does TAT do? Is it protein, RNA, DNA, animal, plant or mineral. 21. REV regulates splicing of HIV transcripts. You should be able to describe what would happen its activity were completely blocked. 22. *****Tough one****** You should think about it as soon as you understand Rev and Nef.****** The last one was easy. But what are the consequences of shortening the Rev minus period (making Rev work much, much faster than normal could do this). Hint - the body would probably clear the virus but why? 23. Know everything about Nef. Don't bring this page with you to the exam <---- YOU WILL NOT TURN THIS ONE IN. 11