Bioinformatics Laboratory Exercise


Biology is in the midst of the genomics revolution: the application of robotic technology to generate huge amounts of molecular biology data. Genomics has led to an explosion in biological data. For example, the amount of DNA sequence data has grown exponentially over the last several decades. At the current rate, the amount of known DNA sequence doubles every 9 to 12 months. An earlier count put the known sequence at 39 billion base pairs. This semester, Spring 2005, there are 100 billion base pairs of sequence data known. Similar levels of data growth have occurred for protein sequences, gene expression data and protein interactions.

A new subdiscipline in biology, bioinformatics, has developed to deal with this vast amount of data. Bioinformatics involves the careful storage, organization and indexing of sequence information, and the development of computer software to analyze the data. The major depository of sequence data is maintained by the National Center for Biotechnology Information (NCBI), part of the National Institutes of Health. NCBI maintains a web site, accessed through its URL, that includes a number of different interfaces for searching and analyzing the databases.

We will use this database to explore the types of mutations that accumulate in genes as species diverge. As we discussed in class, base substitutions within an open reading frame can be categorized by how they affect the sequence of the encoded protein. Mis-sense mutations are base substitutions that change one amino acid in the polypeptide chain; silent mutations are base substitutions that do not affect the polypeptide chain; and non-sense mutations are base substitutions that generate a premature stop codon, shortening the polypeptide chain. In this exercise, you will identify genes that code for ribosomal proteins in humans.
You will use a program at NCBI to identify the codons within the gene and the amino acids that they encode. You will then identify the equivalent gene (an orthologue) in a closely related vertebrate species. From an alignment of these two genes you will identify base substitution mutations that have accumulated since the divergence of these species. You will then analyze whether these mutations are mis-sense, silent or non-sense mutations.

During the course of this exercise you will use the two major tools for identifying genes in the NCBI databases. The first method involves searching for key words in the database entries. In addition to the DNA sequence, every entry in the DNA databases includes other information about the gene. This additional information may include the name of the species, information about the protein encoded by the gene, the names of the investigators or literature citations for the data. This information can be used to identify genes using an interface called Entrez. Entrez is a search engine similar to those that are used for literature searches (e.g. PubMed). It searches the database entries for words in the entry. For example, we will search the database using Homo sapiens ribosomal protein to identify genes encoding the human ribosomal proteins. This search term will identify these human genes. However, it will also identify numerous other entries that are not ribosomal proteins but have these four words somewhere in their database entry. Therefore, once the Entrez search has identified possible entries, they will have to be carefully examined to determine which ones correspond to human ribosomal proteins.

A second way to search the DNA databases is by sequence comparison. NCBI includes a search program called BLAST that will compare any DNA sequence to all 39 billion bp of sequence in the database, identify similar sequences, rank the sequences in order of similarity and provide sequence alignments for similar regions. We will use BLAST to identify vertebrate genes similar to the human ribosomal proteins.

The third piece of bioinformatics software that we will use is a program called ORF Finder. This program will identify the open reading frame in a fragment of DNA and identify which amino acid each codon encodes.

Procedure

I. Identifying genes for human ribosomal proteins

1. Using Internet Explorer, go to the NCBI website.
2. On the NCBI home page, click on the All Databases link on the blue bar.
3. Select the Nucleotide database.
4. In the search window type homo sapiens ribosomal protein 60S and click GO.

5. This will retrieve all the nucleotide database entries that include these words. Note that almost forty thousand database entries are retrieved. Most of these entries are not human genes and many may be partial sequences. To limit the analysis to genes that are better understood, click the RefSeq tab.
6. RefSeq entries are the database entries that have been most carefully reviewed by NCBI. There are still more than 100 entries. Some of these will be genes for ribosomal proteins. Others may be human pseudogenes or genes encoding factors that interact with the ribosome.
7. To identify the genes for human ribosomal proteins, choose only entries that begin with Homo sapiens. Avoid mitochondrial proteins, pseudogenes and whole chromosomes. In this example the third entry is a ribosomal protein. As an example, click on the blue accession number to obtain the database entry. (For the actual exercise, everyone in the class will be assigned a separate ribosomal protein to analyze.)

8. Scroll down the entry to observe the type of information it contains. Notice that the DNA sequence is at the bottom of the entry. This sequence may include promoter elements, exons, introns, etc. To obtain just the open reading frame (start codon to stop codon, without any introns), click on the CDS link about halfway down the entry.
9. If you scroll down to the bottom of this screen you will see a sequence of DNA that corresponds to the open reading frame. It starts with the DNA version of a start codon and ends with one of the three stop codons.

10. Unfortunately, the computer programs cannot read this sequence because of the position numbers. Therefore, before you run the other programs you need to convert it to another format called FASTA. To convert it, click the drop-down button next to GenBank Full and select FASTA. Then click the Display button and a new screen will come up.
11. Below you will see an example of the FASTA format. Copy the FASTA entry and paste it into a Word document. (Be sure to select the pasted text and convert it to Courier font, 8 pt.) See the example at the end of this lab handout.
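Step 10 can also be pictured in code. The following is a minimal Python sketch, not the NCBI converter itself: it assumes the numbered sequence block has been copied as plain text, and the function name `genbank_to_fasta` and the header text are hypothetical choices made for illustration. It strips the position numbers and spaces and writes the bases under a `>` header line, 70 per line as in the example at the end of this handout.

```python
import re
import textwrap


def genbank_to_fasta(header: str, numbered_sequence: str, width: int = 70) -> str:
    """Turn a numbered, space-separated sequence block into a FASTA record."""
    # Keep letters only: this removes position numbers, spaces and line breaks.
    letters = re.sub(r"[^A-Za-z]", "", numbered_sequence).upper()
    # Break the continuous string of bases into fixed-width lines.
    lines = textwrap.wrap(letters, width)
    return ">" + header + "\n" + "\n".join(lines)


# First 90 bases of the example human RPL34 reading frame, laid out the way
# a numbered database entry shows them (layout assumed for illustration).
example = """
        1 atggtccagc gtttgacata ccgacgtagg ctttcctaca atacagcctc taacaaaact
       61 aggctgtccc gaacccctgg taatagaatt
"""

record = genbank_to_fasta("example RPL34 fragment (hypothetical header)", example)
print(record)
```

The only rule the sketch relies on is that a FASTA record is one header line beginning with `>` followed by the bare sequence, which is why removing every non-letter character is enough here.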

II. ORF Finder

1. Return to the home page by clicking the NCBI symbol in the upper left-hand side of the window. Scroll down under HotSpots to find the ORF Finder link. Click this link.
2. Scroll down the ORF Finder page and you will find a dialogue box. Paste the FASTA format of your gene in this box and click the OrfFind button.
3. The ORF Finder program will generate a series of green bars. Click on the top (and longest) bar to obtain the annotated ORF.

4. Scroll to the bottom of this page to find an annotated open reading frame. Copy the open reading frame and paste it into the Word document. (Be sure to select the pasted text and convert it to Courier font, 8 pt. Also change the font color of the entire text to black.) See the end of the lab handout for examples.

III. BLAST Search

1. Return to the home page by clicking the NCBI symbol in the upper left-hand side of the window. Click on the BLAST link on the blue bar.
2. Under the nucleotide column, find Nucleotide-nucleotide BLAST and click it.

3. In the new screen, paste the FASTA format for the human gene in the search dialogue box. Next to Choose a database, click the drop-down box and select refseq_rna. Click BLAST! to launch the search. A BLAST search response comes up. Click on Format! to see the results of the BLAST search. (Note that it may take a few minutes to return the results.)
4. Scroll down the results until you see a list of similar genes. The best matches are at the top of this list. You will use the first non-human gene on this list. In this example it is a dog gene. To see an alignment of the two sequences, click on the Score corresponding to the best match.

5. This will bring you to a BLAST alignment of the two sequences. Copy and paste this alignment into your Word document. (Be sure to select the pasted text and convert it to Courier font, 8 pt.)
6. The top sequence is the human gene; the second sequence is the dog gene. Matches between the two sequences are indicated by a vertical bar (|) between the two sequences. A mismatch between the two sequences suggests that a mutation has accumulated in one of these genes since the divergence of humans and dogs.

IV. Mutation Analysis

Starting at the 5′ end of the gene, identify 10 single base substitutions. Ignore any double base substitutions. Note the location and substitution for these mutations on the annotated open reading frame. Using the genetic code, determine whether these mutations are silent, mis-sense or non-sense mutations. Report your analysis in a data collection sheet. See the example on the last page.

V. Short Lab Report

1. Submit the FASTA format for your human ribosomal protein open reading frame.
2. Submit the annotated open reading frame generated by ORF Finder.
3. Submit the BLAST alignment of the human gene to the most similar non-human gene.
4. Submit a data collection sheet formatted as in the example.
5. What percentage of the mutations were silent, mis-sense or non-sense mutations?
6. It is a basic tenet of evolutionary biology that mutations are random. If this is true, we would predict that mis-sense mutations would be more common than silent mutations. (Changes in either of the first two nucleotides of a codon generally result in a mis-sense mutation. Only mutations in the third position result in a silent mutation.) Explain why, in this analysis, silent mutations are more common than mis-sense mutations.
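The prediction in question 6 can be checked by counting. The Python sketch below enumerates every single base substitution starting from each of the 61 amino-acid-encoding codons of the standard genetic code and tallies how many are silent, mis-sense or non-sense. It assumes that every substitution is equally likely and that every codon is equally common, which real genes do not satisfy. The names `CODON_TABLE`, `classify` and `count_random_substitutions` are illustrative and are not part of any NCBI tool.

```python
from itertools import product

# Standard genetic code, with codons generated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify(before: str, after: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    old, new = CODON_TABLE[before], CODON_TABLE[after]
    if new == old:
        return "silent"
    if new == "*":
        return "nonsense"
    return "missense"


def count_random_substitutions() -> dict:
    """Tally all single base substitutions out of amino-acid codons."""
    counts = {"silent": 0, "missense": 0, "nonsense": 0}
    for codon, amino in CODON_TABLE.items():
        if amino == "*":
            continue  # start only from codons that encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    mutant = codon[:pos] + base + codon[pos + 1:]
                    counts[classify(codon, mutant)] += 1
    return counts


counts = count_random_substitutions()
total = sum(counts.values())
for kind, n in counts.items():
    print(f"{kind:9s}{n:4d}  {100 * n / total:5.1f}%")
```

Under these assumptions the tally is 134 silent, 392 mis-sense and 23 non-sense substitutions out of 549, so roughly a quarter of random base substitutions would be expected to be silent. That is the baseline against which the percentages observed in your alignment can be compared.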

Example Analysis

NM_ Human ribosomal protein RPL34

>gi : Homo sapiens ribosomal protein L34 (RPL34), transcript variant 2, mRNA
ATGGTCCAGCGTTTGACATACCGACGTAGGCTTTCCTACAATACAGCCTCTAACAAAACTAGGCTGTCCC
GAACCCCTGGTAATAGAATTGTTTACCTTTATACCAAGAAGGTTGGGAAAGCACCAAAATCTGCATGTGG
TGTGTGCCCAGGCAGACTTCGAGGGGTTCGTGCTGTAAGACCTAAAGTTCTTATGAGATTGTCCAAAACA
AAGAAACATGTCAGCAGGGCCTATGGTGGTTCCATGTGTGCTAAATGTGTTCGTGACAGGATCAAGCGTG
CTTTCCTTATCGAGGAGCAGAAAATCGTTGTGAAAGTGTTGAAGGCACAAGCACAGAGTCAGAAAGCTAA
ATAA

Annotated Open Reading Frame

  1 atggtccagcgtttgacataccgacgtaggctttcctacaataca
    M  V  Q  R  L  T  Y  R  R  R  L  S  Y  N  T
 46 gcctctaacaaaactaggctgtcccgaacccctggtaatagaatt
    A  S  N  K  T  R  L  S  R  T  P  G  N  R  I
 91 gtttacctttataccaagaaggttgggaaagcaccaaaatctgca
    V  Y  L  Y  T  K  K  V  G  K  A  P  K  S  A
136 tgtggtgtgtgcccaggcagacttcgaggggttcgtgctgtaaga
    C  G  V  C  P  G  R  L  R  G  V  R  A  V  R
181 cctaaagttcttatgagattgtccaaaacaaagaaacatgtcagc
    P  K  V  L  M  R  L  S  K  T  K  K  H  V  S
226 agggcctatggtggttccatgtgtgctaaatgtgttcgtgacagg
    R  A  Y  G  G  S  M  C  A  K  C  V  R  D  R
271 atcaagcgtgctttccttatcgaggagcagaaaatcgttgtgaaa
    I  K  R  A  F  L  I  E  E  Q  K  I  V  V  K
316 gtgttgaaggcacaagcacagagtcagaaagctaaataa 354
    V  L  K  A  Q  A  Q  S  Q  K  A  K  *

BLAST Alignment

gi ref XM_ PREDICTED: Canis familiaris similar to ribosomal protein L34 (LOC478509), mRNA
Length = 577

Score = 543 bits (274), Expect = e-153
Identities = 334/354 (94%)
Strand = Plus / Plus

Query: 1    atggtccagcgtttgacataccgacgtaggctttcctacaatacagcctctaacaaaact 60
Sbjct: 172  atggttcagcgtttgacataccgtcgtaggctgtcctacaatacagcctctaacaaaact 231

Query: 61   aggctgtcccgaacccctggtaatagaattgtttacctttataccaagaaggttgggaaa 120
Sbjct: 232  aggctgtcccgaactcctggcaatagaatcgtttacctttataccaagaaggttgggaaa 291

Query: 121  gcaccaaaatctgcatgtggtgtgtgcccaggcagacttcgaggggttcgtgctgtaaga 180
Sbjct: 292  gcgccaaagtctgcatgtggcgtgtgtcctggccgacttcgaggtgttcgtgcggtgaga 351

Query: 181  cctaaagttcttatgagattgtccaaaacaaagaaacatgtcagcagggcctatggtggt 240
Sbjct: 352  cctaaagtccttatgagattgtctaaaacgaaaaaacatgtcagcagggcctatggtggt 411

Query: 241  tccatgtgtgctaaatgtgttcgtgacaggatcaagcgtgctttccttatcgaggagcag 300
Sbjct: 412  tccatgtgtgctaaatgtgttcgtgacaggatcaagcgtgctttccttattgaggagcag 471

Query: 301  aaaatcgttgtgaaagtgttgaaggcacaagcacagagtcagaaagctaaataa 354
Sbjct: 472  aaaatcgttgtgaaagtgttgaaggcacaagcacagagtcagaaagctaaataa 525

Data Collection Sheet

Mutation  Human      Dog        Type of mutation
1         GTC (Val)  GTT (Val)  Silent
2         CGA (Arg)  CGT (Arg)  Silent
3         CTT (Leu)  CTG (Leu)  Silent
4         ACC (Thr)  ACT (Thr)  Silent
Etc.
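The bookkeeping behind the data collection sheet can be sketched in Python. This is only an illustration of the procedure in section IV, under stated assumptions: the two strings are the first 120 aligned bases copied from the example alignment, the alignment has no gaps, and the reading frame starts at the first base of the human sequence. The names `codon_differences`, `HUMAN` and `DOG` are hypothetical. The sketch walks the alignment codon by codon, keeps codons that differ at exactly one base, and classifies each change with the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons generated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# First two rows of the example alignment (human query, dog subject).
HUMAN = ("atggtccagcgtttgacataccgacgtaggctttcctacaatacagcctctaacaaaact"
         "aggctgtcccgaacccctggtaatagaattgtttacctttataccaagaaggttgggaaa")
DOG = ("atggttcagcgtttgacataccgtcgtaggctgtcctacaatacagcctctaacaaaact"
       "aggctgtcccgaactcctggcaatagaatcgtttacctttataccaagaaggttgggaaa")


def codon_differences(human: str, dog: str) -> list:
    """List codons that differ at exactly one base, with their classification."""
    results = []
    for start in range(0, len(human) - 2, 3):
        h = human[start:start + 3].upper()
        d = dog[start:start + 3].upper()
        changed = sum(a != b for a, b in zip(h, d))
        if changed != 1:
            continue  # skip identical codons and double or triple substitutions
        aa_h, aa_d = CODON_TABLE[h], CODON_TABLE[d]
        if aa_h == aa_d:
            kind = "silent"
        elif "*" in (aa_h, aa_d):
            kind = "nonsense"
        else:
            kind = "missense"
        # start + 1 is the 1-based position of the codon's first base.
        results.append((start + 1, h, d, aa_h, aa_d, kind))
    return results


rows = codon_differences(HUMAN, DOG)
for position, h, d, aa_h, aa_d, kind in rows:
    print(f"{position:4d}  {h} ({aa_h})  {d} ({aa_d})  {kind}")
```

Each printed row corresponds to one line of a data collection sheet: the position of the codon in the human reading frame, the human and dog codons with their one-letter amino acids, and the type of mutation. Checking a hand analysis against such a listing is a quick way to catch a misread base.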


More information

MUTATIONS, MUTAGENESIS, AND CARCINOGENESIS. (Start your clickers)

MUTATIONS, MUTAGENESIS, AND CARCINOGENESIS. (Start your clickers) MUTATIONS, MUTAGENESIS, AND CARCINOGENESIS (Start your clickers) How do mutations arise? And how do they affect a cell and its organism? Mutations: heritable changes in genes Mutations occur in DNA But

More information

Biochemistry 2000 Sample Question Transcription, Translation and Lipids. (1) Give brief definitions or unique descriptions of the following terms:

Biochemistry 2000 Sample Question Transcription, Translation and Lipids. (1) Give brief definitions or unique descriptions of the following terms: (1) Give brief definitions or unique descriptions of the following terms: (a) exon (b) holoenzyme (c) anticodon (d) trans fatty acid (e) poly A tail (f) open complex (g) Fluid Mosaic Model (h) embedded

More information

Proteins. Length of protein varies from thousands of amino acids to only a few insulin only 51 amino acids

Proteins. Length of protein varies from thousands of amino acids to only a few insulin only 51 amino acids Proteins Protein carbon, hydrogen, oxygen, nitrogen and often sulphur Length of protein varies from thousands of amino acids to only a few insulin only 51 amino acids During protein synthesis, amino acids

More information

Chapter 4: Information and Knowledge in the Protein Insulin

Chapter 4: Information and Knowledge in the Protein Insulin Chapter 4: Information and Knowledge in the Protein Insulin This chapter will calculate the information and molecular knowledge in a real protein. The techniques discussed in this chapter to calculate

More information

Integrated Analysis of Copy Number and Gene Expression

Integrated Analysis of Copy Number and Gene Expression Integrated Analysis of Copy Number and Gene Expression Nexus Copy Number provides user-friendly interface and functionalities to integrate copy number analysis with gene expression results for the purpose

More information

OMIM The Online Mendelian Inheritance in Man Knowledgebase: A Wardrobe Full of Genes. Ada Hamosh, MD, MPH

OMIM The Online Mendelian Inheritance in Man Knowledgebase: A Wardrobe Full of Genes. Ada Hamosh, MD, MPH OMIM The Online Mendelian Inheritance in Man Knowledgebase: A Wardrobe Full of Genes Ada Hamosh, MD, MPH OMIM THE ONLINE MENDELIAN INHERITANCE IN MAN KNOWLEDGEBASE: A WARDROBE FULL OF GENES The OMIM knowledgebase

More information

Add_A_Class_with_Class_Number_Revised Thursday, March 18, 2010

Add_A_Class_with_Class_Number_Revised Thursday, March 18, 2010 Slide 1 Text Captions: PAWS Tutorial "Add a Class using Class Number" Created for: Version 9.0 Date: March, 2010 Slide 2 Text Captions: Objective In this tutorial you will learn how to add a class to your

More information

You may use your notes to answer the following questions:

You may use your notes to answer the following questions: Build-A-Cell Name: Group members: Date: Instructions: Please use the the Lego blocks responsibly and not a device to pinch other students. Answer the pre-lab questions before you start, follow all directions,

More information

Supplementary Document

Supplementary Document Supplementary Document 1. Supplementary Table legends 2. Supplementary Figure legends 3. Supplementary Tables 4. Supplementary Figures 5. Supplementary References 1. Supplementary Table legends Suppl.

More information

Genetic information flows from mrna to protein through the process of translation

Genetic information flows from mrna to protein through the process of translation Genetic information flows from mrn to protein through the process of translation TYPES OF RN (RIBONUCLEIC CID) RN s job - protein synthesis (assembly of amino acids into proteins) Three main types: 1.

More information

Term Definition Example Amino Acids

Term Definition Example Amino Acids Name 1. What are some of the functions that proteins have in a living organism. 2. Define the following and list two amino acids that fit each description. Term Definition Example Amino Acids Hydrophobic

More information

Protein Synthesis

Protein Synthesis Protein Synthesis 10.6-10.16 Objectives - To explain the central dogma - To understand the steps of transcription and translation in order to explain how our genes create proteins necessary for survival.

More information

Mouse Clec9a ORF sequence

Mouse Clec9a ORF sequence Mouse Clec9a gene LOCUS NC_72 13843 bp DNA linear CON 1-JUL-27 DEFINITION Mus musculus chromosome 6, reference assembly (C57BL/6J). ACCESSION NC_72 REGION: 129358881-129372723 Mouse Clec9a ORF sequence

More information

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies

OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies OncoPPi Portal A Cancer Protein Interaction Network to Inform Therapeutic Strategies 2017 Contents Datasets... 2 Protein-protein interaction dataset... 2 Set of known PPIs... 3 Domain-domain interactions...

More information

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research

Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Mutation Detection and CNV Analysis for Illumina Sequencing data from HaloPlex Target Enrichment Panels using NextGENe Software for Clinical Research Application Note Authors John McGuigan, Megan Manion,

More information

COSMIC - Catalogue of Somatic Mutations in Cancer

COSMIC - Catalogue of Somatic Mutations in Cancer COSMIC - Catalogue of Somatic Mutations in Cancer http://cancer.sanger.ac.uk/cosmic https://academic.oup.com/nar/articl e-lookup/doi/10.1093/nar/gkw1121 Data In Large-scale systematic screens Detailed

More information

Introduction to genetic variation. He Zhang Bioinformatics Core Facility 6/22/2016

Introduction to genetic variation. He Zhang Bioinformatics Core Facility 6/22/2016 Introduction to genetic variation He Zhang Bioinformatics Core Facility 6/22/2016 Outline Basic concepts of genetic variation Genetic variation in human populations Variation and genetic disorders Databases

More information

Care Pathways User Guide

Care Pathways User Guide Care Pathways User Guide For questions about McKesson Clinical Tools, email us at msh.providers@mckesson.com. Care Pathways User Guide Table of Contents Introduction to Care Pathways... 3 Launching Care

More information

Structural Variation and Medical Genomics

Structural Variation and Medical Genomics Structural Variation and Medical Genomics Andrew King Department of Biomedical Informatics July 8, 2014 You already know about small scale genetic mutations Single nucleotide polymorphism (SNPs) Deletions,

More information

Sequence Analysis of Human Immunodeficiency Virus Type 1

Sequence Analysis of Human Immunodeficiency Virus Type 1 Sequence Analysis of Human Immunodeficiency Virus Type 1 Stephanie Lucas 1,2 Mentor: Panayiotis V. Benos 1,3 With help from: David L. Corcoran 4 1 Bioengineering and Bioinformatics Summer Institute, Department

More information

Section B. Comparative Genomics Analysis of Influenza H5N2 Viruses. Objective

Section B. Comparative Genomics Analysis of Influenza H5N2 Viruses. Objective Section B. Comparative Genomics Analysis of Influenza H5N2 Viruses Objective Upon completion of this exercise, you will be able to use the Influenza Research Database (IRD; http://www.fludb.org/) to: Search

More information

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer (2017). Mitelman F, Johansson B and Mertens F (Eds.), http://cgap.nci.nih.gov/chromosomes/mitel

More information

Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and Worldwide.

Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and Worldwide. Page 1 of 32 Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and Worldwide. When and Where---Wednesdays 1-2pm Room 438 Library Admin Building Beginning September

More information

MicroRNA in Cancer Karen Dybkær 2013

MicroRNA in Cancer Karen Dybkær 2013 MicroRNA in Cancer Karen Dybkær RNA Ribonucleic acid Types -Coding: messenger RNA (mrna) coding for proteins -Non-coding regulating protein formation Ribosomal RNA (rrna) Transfer RNA (trna) Small nuclear

More information

SEQUENCE FEATURE VARIANT TYPES

SEQUENCE FEATURE VARIANT TYPES SEQUENCE FEATURE VARIANT TYPES DEFINITION OF SFVT: The Sequence Feature Variant Type (SFVT) component in IRD (http://www.fludb.org) is a relatively novel approach that delineates specific regions, called

More information

Pre-mRNA Secondary Structure Prediction Aids Splice Site Recognition

Pre-mRNA Secondary Structure Prediction Aids Splice Site Recognition Pre-mRNA Secondary Structure Prediction Aids Splice Site Recognition Donald J. Patterson, Ken Yasuhara, Walter L. Ruzzo January 3-7, 2002 Pacific Symposium on Biocomputing University of Washington Computational

More information

Bioinformatics. Sequence Analysis: Part III. Pattern Searching and Gene Finding. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute

Bioinformatics. Sequence Analysis: Part III. Pattern Searching and Gene Finding. Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Sequence Analysis: Part III. Pattern Searching and Gene Finding Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Course Syllabus Jan 7 Jan 14 Jan 21 Jan 28 Feb 4 Feb 11 Feb 18

More information

Guide to Use of SimulConsult s Phenome Software

Guide to Use of SimulConsult s Phenome Software Guide to Use of SimulConsult s Phenome Software Page 1 of 52 Table of contents Welcome!... 4 Introduction to a few SimulConsult conventions... 5 Colors and their meaning... 5 Contextual links... 5 Contextual

More information

Completing the CIBMTR Confirmation of HLA Typing Form (Form 2005)

Completing the CIBMTR Confirmation of HLA Typing Form (Form 2005) Completing the CIBMTR Confirmation of HLA Typing Form (Form 2005) Stephen Spellman Research Manager NMDP Scientific Services Maria Brown Scientific Services Specialist Data Management Conference 2007 1

More information

A Brief Summary of Important Online Bioinformatics Databases and Genomic Application Algorithms

A Brief Summary of Important Online Bioinformatics Databases and Genomic Application Algorithms A Brief Summary of Important Online Bioinformatics Databases and Genomic Application Algorithms Compiled 2005, James F. Lynn Abstract: This is, by no means, a complete compendium of all the databases and

More information

Agile Product Lifecycle Management for Process

Agile Product Lifecycle Management for Process Nutrition Surveillance Management User Guide Release 5.2.1 Part No. E13901-01 September 2008 Copyrights and Trademarks Copyright 1995, 2008, Oracle Corporation and/or its affiliates. All rights reserved.

More information

Rotavirus Genotyping and Enhanced Annotation in the Virus Pathogen Resource (ViPR) Yun Zhang J. Craig Venter Institute ASV 2016 June 19, 2016

Rotavirus Genotyping and Enhanced Annotation in the Virus Pathogen Resource (ViPR) Yun Zhang J. Craig Venter Institute ASV 2016 June 19, 2016 Rotavirus Genotyping and Enhanced Annotation in the Virus Pathogen Resource (ViPR) Yun Zhang J. Craig Venter Institute ASV 2016 June 19, 2016 Loading Virus Pathogen Database and Analysis About Resource

More information

Fully Automated IFA Processor LIS User Manual

Fully Automated IFA Processor LIS User Manual Fully Automated IFA Processor LIS User Manual Unless expressly authorized, forwarding and duplication of this document is not permitted. All rights reserved. TABLE OF CONTENTS 1 OVERVIEW... 4 2 LIS SCREEN...

More information

Molecular Evolution and the Neutral Theory

Molecular Evolution and the Neutral Theory Molecular Evolution and the Neutral Theory 1. Observation: DNA and amino-acid sequences evolve at roughly constant rates. 2. Model: The neutral theory explains why this might be expected. 3. Application:

More information

Circuit Pilates Classes Pilates Online - Login

Circuit Pilates Classes Pilates Online - Login Circuit Pilates Classes Pilates Online - Login Mindbody Online is a program that Bellbird Sports and Spinal uses to co-ordinate all group Pilates classes In addition to these handouts you can find videos

More information

Abstract. Patricia G. Melloy*

Abstract. Patricia G. Melloy* Laboratory Exercise Using an International p53 Mutation Database as a Foundation for an Online Laboratory in an Upper Level Undergraduate Biology Class ws Patricia G. Melloy* From the Department of Biological

More information

The Molecular Evolution of Gene Birth and Death. Author: Ann Brokaw AP Biology Teacher Rocky River High School Rocky River, Ohio

The Molecular Evolution of Gene Birth and Death. Author: Ann Brokaw AP Biology Teacher Rocky River High School Rocky River, Ohio The Molecular Evolution of Gene Birth and Death Author: Ann Brokaw AP Biology Teacher Rocky River High School Rocky River, Ohio The Birth and Death of Genes To the student: The following slides provide

More information

MRC-Holland MLPA. Description version 12; 13 January 2017

MRC-Holland MLPA. Description version 12; 13 January 2017 SALSA MLPA probemix P219-B3 PAX6 Lot B3-0915: Compared to version B2 (lot B2-1111) two reference probes have been replaced and one additional reference probe has been added. In addition, one flanking probe

More information

Chemistry 107 Exam 4 Study Guide

Chemistry 107 Exam 4 Study Guide Chemistry 107 Exam 4 Study Guide Chapter 10 10.1 Recognize that enzyme catalyze reactions by lowering activation energies. Know the definition of a catalyst. Differentiate between absolute, relative and

More information