CS 6824: Tissue-Based Map of the Human Proteome
T. M. Murali
November 17, 2016
Human Protein Atlas
Measure protein and gene expression using tissue microarrays and deep sequencing, respectively. An alternative to mass spectrometry.
Two parts:
Quantitatively measure mRNA transcripts in tissue homogenates, which contain a mixture of cell types.
Precisely localise the corresponding proteins in single cells, using immunohistochemistry.
How many tissues did they analyze? 32 or 44?
44 tissues analyzed using tissue microarrays (immunohistochemistry).
32 of these tissues also analyzed using RNA-seq.
The rest of the paper is slicing and dicing the data in different ways.
Types of Protein-Coding Genes
Table 1. Classification of all human protein-coding genes based on transcript expression levels in 32 tissues. Each row gives category: description (number of genes; fraction of genes).
Tissue enriched: mRNA levels in a particular tissue at least five times those in all other tissues (2,355; 12%)
Group enriched: mRNA levels at least five times those in a group of 2-7 tissues (1,109; 5%)
Tissue enhanced: mRNA levels in a particular tissue at least five times average levels in all tissues (3,478; 17%)
Expressed in all: detected in all tissues (FPKM > 1) (8,874; 44%)
Mixed: detected in fewer than 32 tissues but not elevated in any tissue (2,696; 13%)
Not detected: FPKM < 1 in all tissues (1,832; 9%)
Total: total number of genes analyzed with RNA-seq (20,344; 100%)
Total elevated: total number of tissue-enriched, group-enriched, and tissue-enhanced genes (6,942; 34%)
What is the definition of Group enriched? mRNA levels in a group of 2-7 tissues at least five times those in all other tissues.
Is the term tissue-specific appropriate? No, because it depends on arbitrary cut-offs. For example, albumin is not liver-specific. It is highly expressed in the liver and is also expressed in other tissues (kidney and pancreas), although at much lower levels than in the liver.
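The Table 1 categories are threshold rules, so they can be written down directly. Below is a minimal sketch, assuming a gene's expression is given as a dict mapping tissue name to FPKM. The function name classify_gene, the order in which the rules are checked, the "FPKM >= 1 counts as detected" cut-off, and the reading of "group enriched" as "the mean of the top k tissues (2 <= k <= 7) is at least five times the highest remaining tissue" are all assumptions for illustration; the Human Protein Atlas pipeline may differ in its details.

```python
from statistics import mean

FOLD = 5.0       # fold-change threshold used throughout Table 1
DETECTED = 1.0   # assumed FPKM cut-off: a value >= 1 counts as "detected"
MAX_GROUP = 7    # largest group size allowed for "group enriched"


def classify_gene(fpkm):
    """Assign one Table 1 category to a gene.

    fpkm: dict mapping tissue name -> FPKM (needs at least two tissues).
    Rules are checked from most to least specific (an assumed ordering).
    """
    values = sorted(fpkm.values(), reverse=True)
    n = len(values)

    if values[0] < DETECTED:
        return "not detected"          # below the cut-off in every tissue

    # Tissue enriched: the top tissue is at least FOLD times every other
    # tissue, i.e. at least FOLD times the second-highest tissue.
    if values[0] >= FOLD * values[1]:
        return "tissue enriched"

    # Group enriched (assumed reading): the mean of the top k tissues,
    # 2 <= k <= 7, is at least FOLD times the highest remaining tissue.
    for k in range(2, min(MAX_GROUP, n - 1) + 1):
        if mean(values[:k]) >= FOLD * values[k]:
            return "group enriched"

    # Tissue enhanced: the top tissue is at least FOLD times the average
    # over all tissues.
    if values[0] >= FOLD * mean(values):
        return "tissue enhanced"

    if all(v >= DETECTED for v in values):
        return "expressed in all"
    return "mixed"                     # detected in some tissues, not elevated
```

For example, classify_gene({"liver": 100, "kidney": 5, "pancreas": 2}) returns "tissue enriched". Changing FOLD or DETECTED moves genes between categories, which is the point of the "arbitrary cut-offs" remark above.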
What is Gene Function?
Not an easy question to answer! A gene's function has many aspects. Different aspects are interesting to different biologists. There are many ways to describe a gene's function, and different groups of biologists have derived different vocabularies. A number of different functional catalogues exist: MultiFun, MIPS FunCat, structure-based (e.g., Pfam/PROSITE domains, SCOP), COG, EC, UniProt...
The Gene Ontology
Collaborative effort to define a controlled vocabulary to describe gene and gene product attributes in any organism. Visit http://www.geneontology.org
Three Gene Ontology (GO) categories. A gene product
has a molecular function: an activity, such as catalytic or binding activity, carried out by the gene product at the molecular level;
is used in a biological process: a series of events accomplished by one or more ordered assemblies of molecular functions; and
might be associated with a cellular component: a component of a cell that is part of some larger object, which may be an anatomical structure or a gene product group.
For example, the gene product Frizzled 1 can be described by the molecular function term "protein binding", the biological process term "positive regulation of protein phosphorylation", and the cellular component term "plasma membrane".
Features of GO: Hierarchy
A team of experts defines GO terms. GO terms are described at multiple levels of detail. Parent-child relationships between terms form a directed acyclic graph (DAG).
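Because the hierarchy is a DAG rather than a tree, a term can have several parents, so walking up from a term can reach the same ancestor along different paths. A minimal sketch of collecting all ancestors of a term, assuming the ontology is stored as a hypothetical dict from each term to its list of parent terms (the term names below are placeholders, and real GO files also distinguish relationship types such as is_a and part_of, which this sketch ignores):

```python
def ancestors(term, parents):
    """Return every ancestor of `term` in a DAG.

    parents: dict mapping a term to the list of its parent terms
    (hypothetical representation; root terms have no entry).
    """
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:              # an ancestor reachable by two paths
            seen.add(t)                # is recorded only once
            stack.extend(parents.get(t, []))
    return seen
```

With parents = {"C": ["A", "B"], "A": ["root"], "B": ["root"]}, term "C" has two parents that share the ancestor "root", and ancestors("C", parents) returns {"A", "B", "root"}.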
Advantages of GO
The vocabulary is controlled: a common vocabulary for all biologists.
Designed to apply across species.
The GO terms and relationships are constantly updated.
Is it complete? Automated ontology engineering (Alterovitz et al., Nat. Biotech., 2010).
Freely available to the community.
Evaluating a Set of Proteins Unveiled by an Experiment
How do we evaluate the set of proteins expressed only in the brain? Compute functional enrichment. If there are k genes annotated to a particular GO biological process in this set of proteins, is this fact interesting? We must look at how many genes overall are annotated with that function. Use the χ² test or Fisher's exact test.
[Venn diagram: U, the set of all genes; C, the genes expressed in the brain; the genes annotated to function f; and their overlap, the genes in C annotated to f.]
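To judge whether k annotated genes is interesting, compare it with the number expected if C were a random sample of |C| genes from U; that expectation is |C| times the fraction of all genes annotated to f. A minimal sketch, assuming annotations are given as a hypothetical dict from a function name to the set of genes annotated with it (the names enrichment_counts, term_A, term_B and g1...g10 are placeholders, not real GO data):

```python
def enrichment_counts(universe, selected, annotations):
    """For each function f, return (observed, total annotated, expected).

    universe:    all genes studied (U)
    selected:    genes in the experimental set (C)
    annotations: dict mapping function f -> set of genes annotated with f
    """
    universe = set(universe)
    selected = set(selected) & universe
    u, c = len(universe), len(selected)
    results = {}
    for f, genes in annotations.items():
        annotated = set(genes) & universe   # genes in U annotated with f
        u_f = len(annotated)
        c_f = len(annotated & selected)     # genes in C annotated with f
        expected = c * u_f / u              # mean count under random sampling
        results[f] = (c_f, u_f, expected)
    return results
```

With ten genes g1...g10, C = {g1, g2, g3}, and term_A annotated to {g1, g2, g4}, the observed count for term_A is 2 against an expected 0.9. Whether that excess is significant is what the χ² test or Fisher's exact test decides.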
Fisher's Exact Test
C = set of genes expressed in the brain; c = number of genes in C.
u = total number of genes studied.
f = the function or biological process of interest.
u_f = number of genes in the data set annotated with f.
c_f = number of genes in the set C annotated with f.
Fisher's exact test answers the following question: if we selected c genes at random from the set of all u genes, what is the probability that we would select c_f or more genes from the set of u_f genes annotated with f?
$$ p = \sum_{i=c_f}^{\min(c,\,u_f)} \frac{\binom{u_f}{i}\binom{u-u_f}{c-i}}{\binom{u}{c}} $$
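The sum above is the upper tail of the hypergeometric distribution, and it can be evaluated exactly with integer binomial coefficients. A minimal sketch using Python's math.comb; the function name fisher_upper_tail is a placeholder chosen for this example:

```python
from math import comb


def fisher_upper_tail(u, u_f, c, c_f):
    """One-sided Fisher's exact test p-value.

    Probability of drawing c_f or more genes annotated with f when c genes
    are chosen at random, without replacement, from u genes of which u_f
    are annotated with f.
    """
    total = comb(u, c)
    # Sum i = c_f .. min(c, u_f) of C(u_f, i) * C(u - u_f, c - i).
    # math.comb returns 0 when c - i > u - u_f, so impossible terms vanish.
    tail = sum(comb(u_f, i) * comb(u - u_f, c - i)
               for i in range(c_f, min(c, u_f) + 1))
    return tail / total
```

For example, with u = 10, u_f = 3, c = 3 and c_f = 3, only one of the C(10, 3) = 120 possible selections contains all three annotated genes, so the p-value is 1/120. The same quantity should be obtainable from scipy.stats.fisher_exact applied to the 2x2 table [[c_f, c - c_f], [u_f - c_f, u - u_f - c + c_f]] with alternative="greater".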
[Figure: fraction of genes (%) in each RNA-based class (elevated, mixed, expressed in all tissues, not detected) and fraction of transcripts (%) in each of the 32 tissues (adipose tissue, adrenal gland, appendix, bone marrow, brain, colon, duodenum, endometrium, esophagus, fallopian tube, gallbladder, heart muscle, kidney, liver, lung, lymph node, ovary, pancreas, placenta, prostate, rectum, salivary gland, skeletal muscle, skin, small intestine, smooth muscle, spleen, stomach, testis, thyroid gland, tonsil, urinary bladder), broken down into membrane, secreted, membrane and secreted isoforms, and soluble proteins.]
[Figure: fraction of genes (%) in each RNA-based category (not detected, tissue enriched, group enriched, tissue enhanced, mixed, expressed in all tissues) for targets of approved drugs, transcription factors, and cancer mutation genes.]
[Figure: targets for approved drugs (n = 618), split between small-molecule drugs and biotech drugs (counts shown: 510, 83, 25), and by protein class: multi-pass membrane protein, single-pass membrane protein, multi-pass and single-pass membrane protein, secreted, membrane and secreted isoforms, soluble intracellular protein.]