Wright State University CORE Scholar Physics Seminars Physics 11-1-2013 A Bioinformatics Method for Identifying RNA Structures within Human Cells Stephen Donald Huff Follow this and additional works at: http://corescholar.libraries.wright.edu/physics_seminars Part of the Physics Commons Repository Citation Huff, S. D. (2013). A Bioinformatics Method for Identifying RNA Structures within Human Cells. This Presentation is brought to you for free and open access by the Physics at CORE Scholar. It has been accepted for inclusion in Physics Seminars by an authorized administrator of CORE Scholar. For more information, please contact corescholar@www.libraries.wright.edu.
A Bioinformatics Method for Identifying RNA Structures within Human Cells Stephen Huff Biological Informatics Group USAFRL - RHDJ WPAFB
Agenda Background Overview The Problem The Solution (?) Ideally, Resultant Knowledge (Bioinformatics = biology + computers)
Background - The BIG Picture The "Laws" of The "Laws" of Chemistry
Background - The Human Picture Images courtesy of MS Office ClipArt, per fair use.
Background - The Dogma Images courtesy of MS Office ClipArt, per fair use.
Background - Viral Infection Viral infection can be defined as the process of "hijacking" a host cell to change it into a VIRAL FACTORY (producing mainly viral genomes and proteins NOT host genomes and proteins).
Background - Viral Infection Remember the Dogma... DNA -> RNA -> Protein (The "Stuff" of Life) Diagram labels: A Cell, Nucleus, Genome (dsDNA), Viral Attachment, Viral Payload Delivery, Viral Transcription & Translation, Viral Assembly, Viral Budding (Release)
Background - Viruses Diagram: seven groups of viruses (I-VII) classified by genome type (dsDNA, ssDNA, dsRNA, ssRNA (+), ssRNA (-), and types that pass through DNA/RNA intermediates), all converging on Messenger RNA (Translates to Proteins)
Overview - Virus vs. YOU Influenza virus as example vector - ssRNA (-), replicates in nucleus - Genome ~ 14 Kb (kilo-bases), segmented (8) - Proteome ~ 12 Human as example host - dsDNA, multiple organs, tissues and cell types - Genome ~ 3 Gb (giga-bases), segmented (22+1) - Proteome ~ 21,000+
Overview - Curious Question SO... how can something so SMALL (virus) so completely overwhelm something so LARGE (host)??? Traditional view focuses on proteins. - 12 proteins overwhelm and undermine 21K+??? Could other phenomena affect the system? - Nucleotide phenomena: specifically, formation of secondary, tertiary (and quaternary?) structures
Overview - Nucleotide Pair Bonding Single-Stranded DNA/RNA: TACGACGTCCGGTAATAT Single-Stranded DNA/RNA (-): ATGCTGCAGGCCATTATA Double-Stranded DNA/RNA (+/-): the two strands bonded base by base (T-A, A-T, C-G, G-C), with TCGACT paired to AGCTGA as a second example
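The pairing on this slide can be checked mechanically. The following is a minimal sketch, assuming DNA letters and the standard Watson-Crick partners (A-T, C-G); the function names are illustrative and do not come from the presentation.

```python
# Watson-Crick partners for the DNA letters used on the slide.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner strand, position by position (not reversed)."""
    return "".join(DNA_COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in the opposite direction."""
    return complement(strand)[::-1]


plus_strand = "TACGACGTCCGGTAATAT"
print(complement(plus_strand))  # ATGCTGCAGGCCATTATA
print(complement("TCGACT"))     # AGCTGA
```

For RNA the same idea applies with U in place of T.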
Overview - Nucleotide Self-Bonding T. thermophila telomerase RNA The "Pseudoknot" "Multiple Hairpins" Single-stranded nucleotides free to bond... with just about ANYthing biological Images courtesy of Wikipedia, per fair use.
Overview - Interesting Possibilities Many examples of nucleotide-structure-based control have recently been discovered - Silencing RNA, micro RNA, nucleolar RNA, etc... - Exert surprising biological control (start/stop/alter protein synthesis and other functions) - Others probably exist (so-called "junk" DNA) Potential for biological exploitation (therapies) is VAST but unknown
The Problem Given the human transcriptome/proteome (RNA -> protein), identify key control structures - Compared to what? - 21K+ candidates, each with 10K-10M + nucleotides - Permutations may be more numerous than stars in the known COSMOS (given mutational variants) Currently, this problem is intractable
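A rough sense of the scale behind this claim can be worked out directly. The sketch below assumes that "permutations" means every possible sequence of a given length over the four-letter nucleotide alphabet, which is an interpretation and not something the slide states. It works in logarithms so the enormous integers are never built. For comparison, commonly quoted estimates put the number of stars in the observable universe at very roughly 10^22 to 10^24.

```python
import math


def log10_sequence_count(length: int, alphabet_size: int = 4) -> float:
    """log10 of alphabet_size ** length, without constructing the huge integer."""
    return length * math.log10(alphabet_size)


# Lengths: one 16-base window, then the 10K and 10M transcript sizes from the slide.
for length in (16, 10_000, 10_000_000):
    exponent = log10_sequence_count(length)
    print(f"length {length:>10,}: about 10^{exponent:,.1f} possible sequences")
```

Even the shortest transcript length on the slide gives an exponent in the thousands, which is why exhaustive enumeration is not an option.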
The Solution (?) Viruses have been "picking biological locks" for eons - More than just proteins - nucleotide structures are probably involved, too - They have evolved to become VERY GOOD at this Use smaller viral genomes to identify keys - Presumably they are RICH in such structures - Fold viral & host transcripts, then ...
The Solution - Method 1. Use evolutionary distance to choose viral candidates (proprietary software) 2. Fold viral candidates (Rosetta/ViennaRNA) 3. Fold host transcriptome (same as 2) 4. Use GORS to identify conserved structures within virus (proprietary software) - These are likely to be vital to virus-host relation 5. Compare to human 6. Validate in wet-lab via high-throughput screens
The Solution Graphically Images courtesy of MS Office ClipArt, per fair use.
Solution - Fofanov-Distance Influenza Type-A as example parasite - Segmented genome, currently attenuating to human host (SOMEtimes) Some segments attenuated to human, others to avian/swine or other host(s) Temporal changes - Use of wrong segments at wrong times = noise F-Distance is a novel tool for non-heuristic analysis of genomic distance, computationally intense due to exhaustive mutational analysis
F-Distances for Segment 5 (NP) Human Sero-Types Isolated from Human or Avian Hosts for the Type-A Influenza Virus (Human Background) Normalized Mean Genomic Distance Note: Error bars represent one standard error, +/-
Mean Normalized F-Distances for Segment 4 (HA) H1N1 for Human Hosts by Annual Cohorts (a) Plot legend: H1N1 1909-2008; p-value: 2.6e-05; Adj. R-Sq.: 3.3e-01; Slope: -8.3e-04
Segment 5 (NP) Fifty Year Timelines with Analytically Bifurcated Regressions for F-Distances of H3N2 Type A Influenza (Human Background) Y-axis: Mean Normalized Distance
Solution - RNA Folding ViennaRNA and Rosetta are best candidates - Open source software for Windows and Linux - Accept RNA sequences as input, use lab-validated thermodynamics and chemistry - Fold primary structure (sequence) into secondary and tertiary structures
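To make "folding a sequence into a secondary structure" concrete, here is a deliberately simplified sketch. It is not how ViennaRNA or Rosetta work: those tools minimize a thermodynamic energy model, whereas this toy (a Nussinov-style dynamic program) only maximizes the number of nested base pairs. The pairing rules, the minimum loop size of 3 and the function name `fold` are assumptions made for illustration.

```python
# Allowed RNA pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def fold(seq: str) -> str:
    """Return a dot-bracket string with the maximum number of nested pairs."""
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case: base i stays unpaired
            for k in range(i + MIN_LOOP + 1, j + 1):  # case: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + best[i + 1][k - 1] + right)
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = best[k + 1][j] if k < j else 0
                if best[i][j] == 1 + best[i + 1][k - 1] + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return "".join(structure)


# A short hairpin-forming sequence: three nested pairs around a loop.
print(fold("GGGAAAUCCC"))
```

Matching parentheses mark paired bases and dots mark unpaired ones, the same dot-bracket notation that ViennaRNA reports for its predicted structures.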
Solution - GORS Primarily, need to identify changing patterns of structures over time (within parasite) - Conservation indicates need (survival pressure) - This indicates something KEY to structures in host GORS is statistical method for such analysis May have to invent new statistical
Solution - Products F-Distance - R-extension (dll), overlaid by custom GUI (Windows, Mac, smart-devices) - Publication ViennaRNA/Rosetta - Custom computational pipeline GORS - R-extension (dll), GUI, pipeline, publication Final results - publication and software suite
Ideally - Resultant Knowledge Improved grasp of virus-host relationships - Influenza vaccine, new generation? Toxin mitigation (RHDJ focus) - Structures bond to metallic ions, small molecules, proteins, other nucleotides, etc... Exploit discovered structures for therapies Investigate biology from a new
Questions
Background Array IO
Sequence          Binary Version                    Index    Present
AAAAAAAAAAAAAAAA  00000000000000000000000000000000  0        0
AAAAAAAAAAAAAAAT  00000000000000000000000000000001  1        0
AAAAAAAAAAAAAAAG  00000000000000000000000000000010  2        0
AAAAAAAAAAAAAAAC  00000000000000000000000000000011  3        0
AAAAAAAAAAAAAATA  00000000000000000000000000000100  4        1
AAAAAAAAAAAAAATT  00000000000000000000000000000101  5        0
AAAAAAAAAAAAAATG  00000000000000000000000000000110  6        0
AAAAAAAAAAAAAATC  00000000000000000000000000000111  7        0
AAAAAAAAAAAAAAGA  00000000000000000000000000001000  8        1
AAAAAAAAAAAAAAGT  00000000000000000000000000001001  9        0
AAAAAAAAAAAAAAGG  00000000000000000000000000001010  10       0
AAAAAAAAAAAAAAGC  00000000000000000000000000001011  11       0
...               ...                               ...      ...
TCCCCCCCCCCCCCCC  01111111111111111111111111111111  ...      0
GCCCCCCCCCCCCCCC  10111111111111111111111111111111  4^n - 2  0
CCCCCCCCCCCCCCCC  11111111111111111111111111111111  4^n - 1  0
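The rows above are consistent with a two-bit code per base (A=00, T=01, G=10, C=11), with the 16 codes concatenated and read as one 32-bit integer that serves as the array index. A minimal sketch of that mapping follows; the function names are illustrative and are not taken from the presentation's software.

```python
# Two-bit codes as they appear in the table: A=00, T=01, G=10, C=11.
CODE = {"A": 0, "T": 1, "G": 2, "C": 3}
BASES = "ATGC"  # BASES[code] recovers the letter for a two-bit code


def kmer_to_index(kmer: str) -> int:
    """Pack a k-mer into an integer, two bits per base, first base most significant."""
    index = 0
    for base in kmer:
        index = (index << 2) | CODE[base]
    return index


def index_to_kmer(index: int, k: int) -> str:
    """Invert kmer_to_index for a k-mer of known length k."""
    bases = []
    for _ in range(k):
        bases.append(BASES[index & 3])
        index >>= 2
    return "".join(reversed(bases))


kmer = "A" * 14 + "GC"
print(kmer_to_index(kmer))                    # 11
print(format(kmer_to_index(kmer), "032b"))    # 00000000000000000000000000001011
print(kmer_to_index("C" * 16) == 4**16 - 1)   # True
```

With n = 16 the array needs 4^16 (about 4.3 billion) presence flags, which is why a compact integer index matters here.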
Foreground Array IO
Foreground Sequence: T G A T C G C C A C G T A G C T G A A T G A T C G C C A C G T A T A C G T ...
Foreground Array:    1 1 1 2 3 2 1 1 1 1 1 1 2 1 1 1 0 ...
Background Array
Index            Present
0                0
...              ...
410,766,922      0
410,766,923      1
...              ...
1,643,067,690    1
1,643,067,691    0
1,643,067,692
1,643,067,693    0
...              ...
2,173,156,579    0
2,173,156,580
2,173,156,581    0
...              ...
4^n - 1          0
Example lookup: G A A T G A T C G C C A C G T A = 10000001100001111011110011100100 = 2,173,156,580
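The lookup on this slide can be sketched as a sliding 16-base window over the foreground sequence, with each window converted to its background-array index. Under the two-bit code A=00, T=01, G=10, C=11, the window at offset 16, GAATGATCGCCACGTA, evaluates to 2,173,156,580, one of the indices listed in the Background Array. Everything else here is an assumption for illustration: the background sequence is made up, a Python set stands in for the full 4^16-entry presence array, and the meaning of the Foreground Array counts is not reproduced.

```python
CODE = {"A": 0, "T": 1, "G": 2, "C": 3}


def rolling_indices(sequence: str, k: int):
    """Yield (offset, index) for every k-mer, updating the index in O(1) per base."""
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits
    index = 0
    for pos, base in enumerate(sequence):
        index = ((index << 2) | CODE[base]) & mask
        if pos >= k - 1:
            yield pos - k + 1, index


K = 16
foreground = "TGATCGCCACGTAGCTGAATGATCGCCACGTATACGT"

# Hypothetical background sequence; its k-mers play the role of Present = 1.
background = "CCGAATGATCGCCACGTAGG"
present = {index for _, index in rolling_indices(background, K)}

hits = [offset for offset, index in rolling_indices(foreground, K) if index in present]
print(hits)                                     # [16]
print(dict(rolling_indices(foreground, K))[16])  # 2173156580
```

A dense bit array indexed the same way would replace the set when the whole background genome is loaded; the rolling update keeps the per-window cost constant either way.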