Methods: Biological Data


Transcriptome analysis of short read Illumina RNA sequencing: investigating baseline variability in gene expression levels and splice variants among human brain and Lymphoblastoid samples

Abstract

Understanding baseline variability in gene expression levels and splice variants is essential for interpreting many studies. Although recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution, little effort has been made to assess baseline variation between individuals. Biological variability in gene expression has been shown not to be eliminated by sequencing technology 13, and we show that the same holds for splice variation. We analyzed RNA-seq data from two studies that provide a sufficient number of biological replicates: Cerebellum tissue data and Lymphoblastoid cell-line data from unrelated human individuals. We give insight into the variation between individuals of Cerebellum tissue samples and of Lymphoblastoid cell-line samples for differential splice variant expression and differential gene expression. Compared with the Lymphoblastoid cell-line samples, the Cerebellum tissue samples show little differential expression of genes and splice variants. We also find that ~75% of these differentially expressed genes can be attributed to a single sample. Although these can be biologically functional, they may also represent baseline variance of the Cerebellum cells. In contrast to Cerebellum cells, we find many differentially expressed genes and a large variation in differentially expressed splice variants for Lymphoblastoid cell-line samples. Three samples of the pool are very much alike and could be annotated as a subgroup. As Lymphoblastoid cells are stem-cell like and differentiate into three subgroups (B lymphocytes, T lymphocytes and Natural Killer cells), we suggest that these three samples could, in terms of genotype, already belong to one of them.
Our results reinforce the importance of using biological replicates. The desired saturation level, which depends on sample type, should be taken into account before deciding on the sequencing depth.

Introduction

As a deep-sequencing tool, RNA-seq is able to accurately detect many types of RNA: mRNAs, small non-coding RNAs including microRNAs (miRNAs) and Piwi-interacting RNAs (piRNAs)/21U-RNA sequences, ribosomal RNAs (rRNAs), transfer RNAs (tRNAs) and small nucleolar RNAs (snoRNAs). Analysis of gene expression by sequencing is highly reproducible and more sensitive than micro-arrays 2, although current RNA-seq-based approaches for studying sRNAs are limited in their ability to provide an absolutely quantitative view of the transcripts 1,2. RNA-seq has been used to interrogate transcriptomes of yeast, Caenorhabditis elegans, Drosophila melanogaster, mouse and human tissues and stem cells 3,4,5,6,7,8,9,10,11,12. The

studies show that the technology makes it possible to find most annotated genes, splicing isoforms and alternative transcript start sites (in human brain 11); find many previously unknown splice variants (in embryonic and neonatal mouse cortex 12); quantify gene expression levels, detect major changes in expression during development and between males and hermaphrodites of Caenorhabditis elegans, and find novel miRNA candidates and Piwi-interacting RNAs (piRNAs)/21U-RNAs 7; detect alternative splicing and novel transcripts in Drosophila melanogaster 8; identify potential genes involved in stress resistance (in C. elegans 10); and reveal sequences and expression levels of known and novel miRNA genes in human embryonic stem cells 9.

Since its introduction in the last decade, RNA sequencing (RNA-Seq), with its massively parallel cDNA sequencing, has brought firm advances in genomics and especially transcriptomics. Many previously unknown coding and non-coding RNA species have been found 7,8,9,12 and we have come to appreciate the complexity of the transcriptome more. Different methodologies and new analysis tools have been developed. Unfortunately, many studies using RNA-seq have used few if any biological replicates, perhaps because of the costs. A recent study shows that RNA-seq does not eliminate biological variability 13, and we want to show that this holds not only for gene expression but also for splice variants. In this research we investigate technical and biological variance, including the influence of sequencing depth on technical variability. We used data from the studies of Wang et al. 14 (2008) and Pickrell et al. 15 (2010). Both studies used high-throughput RNA-seq with Illumina technology and provided us with data from a fair number of biological replicates. Wang et al. focus on finding genes involved in neurogenesis and Pickrell et al. on finding mechanisms underlying gene expression.
With their data we investigate differential splice variant expression and differential gene expression:
1) How do they differ from each other?
2) How do they differ among biological replicates of the same cell type or tissue?
3) Is one biological replicate, as used in many studies using Next Generation RNA Sequencing, enough?
This study provides a critical look at RNA-seq analysis and methodology. The results will be discussed in the context of technical and biological variability.

Methods: Biological Data

As RNA-seq is still a new tool, few datasets are available to choose from and information about the data is not always complete. Important for this study is that the samples are uniquely assigned to individuals and are from Homo sapiens. Sample IDs are provided on the SRA website. Age and gender of the individual behind each sample would be useful for further analyses, but not all of this information is available for the data sets used in this study. We obtained data of seven Cerebellum samples, which came from human males 14. The Lymphoblastoid samples came from a very large pool of sixty-nine Nigerian individuals 15. Sample information is shown in the next two tables.

Table 1 RNA-seq data. (a,b) Basic sample information of the Cerebellum tissue samples and Lymphoblastoid cell-line samples. SRA is the accession assigned by the online Sequence Read Archive. The number of reads indicates how 'deep' the sample is sequenced. The sample data were acquired by sequencing cDNA as single-end reads. The number of bases per read shows the length of the reads. All samples come from unrelated humans; the Cerebellum samples are all from males and the Lymphoblastoid cell lines are from Nigerians.

Data manipulation and non-junction mapping

The uploaded FastQ files, initially in Illumina format, need to be 'groomed' with FastQ Groomer so that all data are in Sanger format before further manipulation. Essentially the goal is to find differentially expressed splice variants and genes. The tool TopHat 16 first maps non-junction reads using Bowtie, an ultra-fast short-read mapping program 17. Bowtie indexes the reference genome, human (Homo sapiens): hg19 full, using a technique borrowed from data compression: the Burrows-Wheeler transform 18,19. TopHat finds junctions by mapping reads to the reference in two phases. First, all reads are mapped to the reference genome using Bowtie (see Figure 1). Bowtie takes into account that the 5' end of a read contains fewer sequencing errors than the 3' end 20 and allows so-called 'multi-reads' from genes with multiple copies to be reported, but discards low-complexity reads. All reads that do not map to the reference genome are set aside as Initially UnMapped reads (IUM reads). TopHat assembles the mapped reads using the assembly module in Maq 21.
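The grooming step described above amounts to re-encoding the quality line of every FastQ record. The sketch below is a minimal illustration, not the FastQ Groomer implementation. It assumes the input uses the Illumina 1.3+ encoding (Phred scores stored with an ASCII offset of 64), which the text does not state explicitly; older Solexa-scaled files would need a different conversion. The function names are hypothetical.

```python
ILLUMINA_OFFSET = 64  # assumption: Illumina 1.3+ (Phred+64) input
SANGER_OFFSET = 33    # Sanger / standard FastQ (Phred+33)


def illumina_to_sanger_quality(quality):
    """Re-encode a Phred+64 quality string as Phred+33."""
    converted = []
    for char in quality:
        phred = ord(char) - ILLUMINA_OFFSET
        if phred < 0:
            # A character below '@' cannot be a Phred+64 score.
            raise ValueError("not a Phred+64 quality character: %r" % char)
        converted.append(chr(phred + SANGER_OFFSET))
    return "".join(converted)


def groom_record(record):
    """Convert one FastQ record given as (header, sequence, plus, quality)."""
    header, sequence, _plus, quality = record
    # The '+' line is reduced to a bare '+'; the bases are left untouched.
    return (header, sequence, "+", illumina_to_sanger_quality(quality))


if __name__ == "__main__":
    print(groom_record(("@read1", "ACGT", "+read1", "hhhB")))
```

Only the quality characters change; the Phred scores themselves are preserved, so downstream tools see the same base-call confidence.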

Figure 1 TopHat and Bowtie (non-)junction mapping. Reads are mapped using Bowtie. Initially UnMapped (IUM) reads are first set aside. The mapped read sequences are searched with Maq for flanking potential donor/acceptor splice sites. These are joined to form potential splice junctions, against which the IUM reads are indexed and aligned.

Junction mapping

To map the IUM reads to splice junctions, TopHat first enumerates all canonical donor and acceptor sites within island sequences (as well as their reverse complements), defined by a default conservative parameter describing the allowed coverage gap between exons. Next, it considers all pairings of these sites that could form canonical (GT-AG) introns between neighboring (but not necessarily adjacent) islands. Each possible intron is checked against the IUM reads for reads that span the splice junction (see Figure 2). In order to detect junctions without sacrificing performance and specificity, the TopHat algorithm looks for introns within islands that are deeply sequenced. It can be made more sensitive in order to find more splice junctions, at the expense of running time. For each junction, the average depth of read coverage is computed separately for the left and right flanking regions of the junction. The number of alignments crossing the junction is divided by the coverage of the more deeply covered side to obtain an estimate of the minor isoform frequency.
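The minor isoform frequency estimate described above can be written down directly. The following is a small sketch of that ratio only, not of TopHat's internals; the function name and the example numbers are hypothetical.

```python
def minor_isoform_frequency(junction_alignments, left_coverage, right_coverage):
    """Estimate the minor isoform frequency for one junction.

    junction_alignments: number of alignments crossing the junction
    left_coverage, right_coverage: average read depth of the two flanking regions
    """
    deeper_side = max(left_coverage, right_coverage)
    if deeper_side <= 0:
        # No coverage on either side: the ratio is undefined, report zero.
        return 0.0
    return junction_alignments / deeper_side


if __name__ == "__main__":
    # Hypothetical junction: 5 spanning alignments, flanks covered 20x and 50x.
    print(minor_isoform_frequency(5, 20.0, 50.0))
```

Dividing by the more deeply covered flank keeps the estimate conservative: a junction supported by few reads next to a highly expressed exon receives a low frequency.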

Figure 2 Mapping IUM reads. Detecting introns within islands. In this island the intron of one splice variant is overlapped by the 5'-UTR of another transcript. When there is no large coverage gap between two exons, as in this picture, TopHat will look for introns within single islands to detect junctions. Both isoforms are found in mouse brain.

Creating a pool

To investigate variation of splicing and gene expression we need a pool that serves as a normalized background. For each tissue, Cerebellum and Lymphoblastoid, we created a pool of all samples in which each splice junction and each gene is represented, normalized as a group average.

Transcript abundance

The next step is estimating transcript abundance. We use algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing, allowing simultaneous transcript discovery and abundance estimation for RNA-seq data. To find the minimal set of transcripts that is supported by the fragment read alignments, we use a comparative transcriptome assembly algorithm. To find a maximum matching, compatibilities among fragments are represented in a weighted bipartite graph 14,22 (see Figure 3). Abundances are reported in FPKM (isoform-level relative abundance in Fragments Per Kilobase of exon model per Million mapped fragments). In these units, the relative abundances of transcripts are described in terms of the expected biological objects (fragments) observed in an RNA-seq experiment. Confidence intervals for the estimates are obtained using a Bayesian inference method based on importance sampling from the posterior distribution.
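To make the FPKM unit concrete, the sketch below computes it from raw counts. This is only the textbook count-based definition, assuming every fragment is assigned unambiguously to one transcript; it is not the statistical estimator used by CuffLinks, which distributes ambiguous fragments over isoforms. The function name and the example values are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = transcript_length_bp / 1000.0
    millions = total_mapped_fragments / 1e6
    return fragments / (kilobases * millions)


if __name__ == "__main__":
    # Hypothetical transcript: 500 fragments on a 2 kb transcript
    # in a library of 2.5 million mapped fragments.
    print(fpkm(500, 2000, 2500000))
```

Because the value is scaled by both transcript length and library size, FPKM values of a pool and of an individual sample can be compared even though the pool contains many more reads.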

Figure 3 Overview of CuffLinks. (a) The algorithm takes mapped reads as input, for instance from TopHat. (b) Incompatible fragments are identified and assembled with the other fragments in an overlap graph to find possible isoforms. (c) The minimal set of isoforms that covers all fragments is found. (d) A statistical model estimates transcript abundances. The probability of each splice variant is estimated by assigning fragments to the different isoforms and incorporating the probability of the resulting transcript lengths. (e) The abundances that best explain the observed fragments are computed numerically and shown as a pie chart.

Differential expression between individual samples and their respective pool

Differential expression between samples is found by comparing the FPKM values of expressed genes or isoforms. When, for example, the expression of a particular gene is significantly different (significantly higher or lower FPKM values), the gene is reported as differentially expressed, with a False Discovery Rate (FDR) of 5%. In this study we are interested in finding differential expression of genes and splice variants of individual samples compared to a group of samples, which we call the pool of samples. For the Cerebellum tissue samples we created one pool including all the Cerebellum samples listed in Table 1. For the Lymphoblastoid cell-line samples we made two pools, one including the Lymphoblastoid samples 1-7 and a larger one of ten Lymphoblastoid samples. Also for these pools abundance estimates are produced

as explained in the previous paragraph. This means the FPKM values of the pools are comparable to those of individual samples. In this study we compare individual samples against the pool they are part of. For example, sample 1 of the Cerebellum tissue samples is compared to the pool of Cerebellum tissue samples 1-7 to find differential expression of genes or splice variants (the sample investigated is also included in the pool it is compared to). Differentially expressed splice variants and genes are annotated with loci and gene ids, and the estimates are given numerically. In this study we only used significantly (FDR = 0.05) differentially expressed genes and splice variants for further study.

Characterization of Differential Expression

For further analysis of differential expression we counted, for each significantly differentially expressed splice variant or gene, the number of samples of the same cell type in which it is found. Having seven samples in a pool means a splice variant or gene can be found differentially expressed in one to seven samples. We wrote a Perl script to run this calculation (in supplementary information). With this information it is possible to determine how unique a differentially expressed gene or splice variant is and to get an idea of how these are regulated. When differential expression is found in all seven out of seven samples, we often find (over)expression of a splice variant or gene in one sample and no expression in the other samples. Finding differential expression in six out of seven samples means one of the samples has an expression level close to the estimated group average. Finding differential expression in five out of seven means two of the samples have expression close to the estimated group average, and so on.
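The counting step described above can be sketched as follows. This is an illustrative Python version written under assumptions; it is not the Perl script from the supplementary information. It assumes the significant features of each sample are already available as lists of identifiers; the sample names and gene ids are hypothetical.

```python
from collections import Counter


def count_occurrences(significant_by_sample):
    """Count, per feature id, in how many samples it is significant."""
    counts = Counter()
    for features in significant_by_sample.values():
        # set() guards against a feature listed twice within one sample.
        counts.update(set(features))
    return counts


def occurrence_histogram(counts, n_samples):
    """Number of features found in exactly 1, 2, ..., n_samples samples."""
    histogram = {k: 0 for k in range(1, n_samples + 1)}
    for n_found in counts.values():
        histogram[n_found] += 1
    return histogram


if __name__ == "__main__":
    # Hypothetical input: three samples with their significant gene ids.
    significant = {
        "sample1": ["geneA", "geneB"],
        "sample2": ["geneA"],
        "sample3": ["geneA", "geneC"],
    }
    counts = count_occurrences(significant)
    print(occurrence_histogram(counts, n_samples=3))
```

Dividing each histogram entry by the total number of features gives the occurrence ratios that the Results section reports per cell type.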
Software

Most tools used in this study (Bowtie, TopHat, CuffLinks) are part of a popular pipeline for RNA-seq data, which is provided by the free web-based Galaxy platform (also available for local use). A protocol is provided with a robust default setup, but it is also freely configurable 16,23. To save time, datasets were acquired via the free online Sequence Read Archive (SRA). Here the raw data files are uploaded as archived SRA files for free and shared use in the community. Tools are available to convert the SRA files to many different file types. Galaxy imports FastQ (Sanger) files in different formats, depending on the instrument used to obtain the reads. The included quality data indicate how certain the base calls of the reads are, which is useful for further analysis of the RNA-seq data. The FastQ files obtained from SRA were intentionally restricted to data from Illumina instruments, to minimize technical variation. The RNA-seq data of the Cerebellum came from the study by Wang 14 and the Lymphoblastoid RNA-seq data from the study by Pickrell 15. Pools were made with SAM Tools, provided by the Galaxy site, by combining the BAM files, which are among the output files of TopHat. The BAM files together represent all the accepted hits of all the samples included.

Results

Output

For Lymphoblastoid cell-line samples we found loci where all samples differentially express splice variants compared to the samples taken together as a pool. Figure 4 shows a locus where splice variants are found differentially expressed for all seven samples. Finding all seven samples differentially expressed at a certain locus does not mean that all samples express splice variants (or genes, when looking for differential gene expression): some samples express the splice variants and others do not. The figure shows expression only qualitatively; quantitative values are shown later.

Figure 4 Example of a locus on chromosome 1 of Lymphoblastoid cells where differentially expressed splice variants are found for all samples. This figure shows qualitatively, not quantitatively, which transcripts (accepted hits) and splice junctions are found in which sample of the pool of seven Lymphoblastoid cell-line samples. Samples one to seven are shown here as data 11 to 17 respectively. The first row shows the accumulated accepted hits of the pooled Lymphoblastoid samples on the reference genome. For each sample the accepted hits (black boxes) are shown mapped on the genome. A horizontal gray line between the hits depicts a possible splice junction, where the adjacent accepted hit is possibly part of a splice variant. The splice junctions found for each sample at the specified locus are shown in the next row. Note that for some samples no splice junctions were found in this region of the genome (chromosome 1: 155,935, ,948,387 human genome19).

Table 2 shows that finding differential gene expression in seven samples at a particular locus often means that six samples have no gene expression at this locus and one sample has.

Table 2 Example of differential expression of splice variants in all seven Cerebellum samples for a particular locus. FPKM values (isoform-level relative abundance in Fragments Per Kilobase of exon model per Million mapped fragments) are given for the seven Cerebellum samples for the locus where differential gene expression is found for all seven samples of the pool. The gene id is also included, which can be used to find more information about this particular gene. Looking for gene on finds this is a heat shock factor as well as other information.

Sometimes finding differential gene expression in seven samples means that any number of the seven samples show gene expression, but all are differentially expressed with regard to the FPKM of the pool. Table 3 shows an example where all seven Lymphoblastoid cell-line samples are differentially expressed at a particular locus: four of them show expression and three do not.

Table 3 Example of differential expression of splice variants in all seven Lymphoblastoid samples for a particular locus. FPKM values (isoform-level relative abundance in Fragments Per Kilobase of exon model per Million mapped fragments) are given for the seven Lymphoblastoid samples for the locus where differential gene expression is found for all seven samples of the pool. The gene id is also included, which can be used to find more information about this particular gene.

Total Significant Differential Splice Variants Expression

There is a large difference between the total number of significantly differentially expressed splice variants found in Cerebellum tissue samples and in Lymphoblastoid cell-line samples, as depicted in Figure 5.

Figure 5 Total Significant Differential Splice Variant Expression. Total significant differential splice variant expression is shown for Cerebellum tissue samples and Lymphoblastoid cell-line samples. The pool of Cerebellum samples consists of seven individual samples, whereas for Lymphoblastoid samples we have a pool of seven and a pool of ten samples.

Among Cerebellum tissue samples we find 24 to 35 significantly differentially expressed splice variants per sample. For Lymphoblastoid cell-line samples we are also interested in how pool size affects the detection of differential expression, so the results for a pool of seven samples as well as a pool of ten samples are depicted in Figure 5. Significantly differentially expressed splice variants are more numerous in Lymphoblastoid cell-line samples, for both pools, than in Cerebellum samples. On average we found 29 in Cerebellum samples and 226 and 242 in Lymphoblastoid samples for the pool of seven and the pool of ten samples respectively, a factor ~8 difference. We wondered whether our pools of seven samples were sufficient to represent the population. To test this we created a pool of seven samples and a pool of ten samples for the Lymphoblastoid cell-line samples. Figure 5 shows that pool size makes a difference in the number of significantly differentially expressed splice variants, but the difference is small.

Figure 6 Total Significant Differential Gene Expression. Total significant differential gene expression is shown for Cerebellum tissue samples and Lymphoblastoid cell-line samples. The pool of Cerebellum samples consists of seven individual samples, whereas for Lymphoblastoid samples we have a pool of seven and a pool of ten samples.

Differentially expressed genes are more numerous in Lymphoblastoid samples than in Cerebellum samples. On average we found 820 in Cerebellum samples and 5022 and 7198 in Lymphoblastoid samples for the pool of seven and the pool of ten samples respectively, a factor of ~6 and ~9 difference. Increasing the pool from seven to ten samples does not make a large difference for the individual samples, with the exception of sample 4, SRR This sample finds many more differentially expressed genes with the bigger pool, in which three different Lymphoblastoid cell-line samples are added. This suggests sample 4 differs proportionally much more from the three newly introduced samples than the other samples in the original pool do. In this study we will not go into further detail on this particular sample. Overall, the conclusion is that the total number of differentially expressed genes or splice variants found in the individual samples does not change greatly when the pool is increased, and that a pool of seven samples for both cell types suffices for this study.

Discovery Rate Bias: Number of Differentially Expressed Splice Variants or Genes Found Against Total Number of Reads of the Sample

The study by Wang 14 describes the importance of sequencing enough reads to detect a large fraction of the transcribed genes. The Cerebellum samples all have close to 2.5 million reads, and the Lymphoblastoid samples have 4 million reads on average. The former has a standard deviation of 5% and the latter as much as 57%. This makes it reasonable to investigate whether the number of reads is sufficient and whether it influences the results. Wang suggests a breaking point in the number of reads needed to find the total fraction of transcribed genes and splice variants: around 1 million reads are needed to find ~100% of transcribed genes 14.
The smallest sample used in this study has 1.3 million reads (Lymphoblastoid), which should be enough, but we check to be sure.
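The percentages quoted above for the spread in read counts read as a standard deviation relative to the mean. A minimal sketch of that calculation follows. It assumes the sample standard deviation is meant, since the text does not say whether the sample or population form was used, and the read counts in the example are hypothetical, not the values from Table 1.

```python
import statistics


def relative_std_percent(read_counts):
    """Standard deviation of the read counts as a percentage of their mean."""
    mean = statistics.mean(read_counts)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    return 100.0 * statistics.stdev(read_counts) / mean


if __name__ == "__main__":
    # Hypothetical read counts (in millions) for three samples.
    print(relative_std_percent([2.0, 4.0, 6.0]))
```

A large relative spread means that samples of the same pool were sequenced to very different depths, which is why depth is examined as a possible source of bias.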

Figure 7 Discovery Rate Bias - Saturation. (a-f) Discovery rates vary among the samples (indicated as points in the graphs) and also differ when the pool size changes. Discovery rates of finding differential expression of genes or splice variants in Lymphoblastoid or Cerebellum samples are given as percentages: the total amount of differential expression per sample is divided by the total number of reads of the sample. Discovery rates of differential expression of splice variants and genes are shown for the pool of seven Cerebellum samples (a,b), the pool of seven Lymphoblastoid samples (c,d) and the pool of ten Lymphoblastoid samples (e,f).

In Figure 7 the discovery rate drops when the number of reads of the sample increases. This picture is fairly clear for the Cerebellum discovery rates and much clearer for the discovery rates found in the Lymphoblastoid pools. It is important to look at the scales of the figures. The discovery rates are much smaller for Cerebellum samples in finding differentially expressed splice variants than for finding differentially expressed splice variants or genes in Lymphoblastoid samples. This suggests that a given number of reads is closer to sufficient for Cerebellum samples than for Lymphoblastoid samples. Only the three Lymphoblastoid samples with the most reads have roughly the same discovery rates and are therefore equally saturated. This suggests it makes a large difference which type of cell or tissue is being sequenced to find equal

saturation levels. Saturation levels should be checked when acquiring data; researchers should decide which saturation level is reasonable in practice and account for that in their study. We find that saturation levels are better for the Cerebellum samples and we expect that saturation is not optimal for the Lymphoblastoid samples. Because we want to compare variability between individuals within different pools, we decided to go with two pools of seven samples, one of Cerebellum tissue samples and the other of Lymphoblastoid cell-line samples. We could not make the pool of Cerebellum samples bigger as no extra samples were available. We continued with the original pool of seven Lymphoblastoid samples to minimize confusion, although we would now have chosen the Lymphoblastoid samples with the most reads per sample, for the reasons explained above. Fortunately this would only result in swapping sample 5 with sample 8, as the three samples added to create the pool of ten are, together with sample 5, the four samples with the fewest reads.

Differential Splice Variant Variance

Further analysis of differential splicing between samples of Cerebellum tissue or Lymphoblastoid cell lines shows biological variance in more detail.

Figure 8 Occurrence of Differential Splice Variant Expression. The figure shows, per cell type, the ratio of differentially expressed splice variants that are found in a particular maximum number of samples.

Table 4 Occurrence of Differential Splice Variant Expression. The table shows, per cell type, the ratio of differentially expressed splice variants that are found in a particular maximum number of samples. The table also shows in numbers how many differentially expressed splice variants are found in how many samples at the most.
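The discovery rate plotted in Figure 7 and used for the saturation check described above is, per sample, the number of differentially expressed features divided by the total number of reads, expressed as a percentage. A minimal sketch follows; the function name is hypothetical, and the example combines the reported Cerebellum averages (29 splice variants, about 2.5 million reads) purely for illustration.

```python
def discovery_rate_percent(n_differential, total_reads):
    """Differentially expressed features per read, as a percentage."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * n_differential / total_reads


if __name__ == "__main__":
    # Illustrative only: average Cerebellum values quoted in the text.
    print(discovery_rate_percent(29, 2500000))
```

If sequencing were fully saturated, adding reads would not reveal new differential expression, so the rate would fall in proportion to the read count; comparing rates across samples of different depth is therefore a rough saturation check.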

As shown in Figure 8 and Table 4, most differentially expressed splice variants are differentially expressed in only one sample. Cerebellum splice variants are, on average, found differentially expressed in 2 out of 7 samples, with an average of 2.2. With an average of 3.4, splice variants are often significantly differentially expressed in 3 out of 7 Lymphoblastoid cell-line samples. In contrast to differentially expressed splice variants found in Cerebellum samples, those found in Lymphoblastoid samples are also found to be differentially expressed in 6 out of 7 or in all samples with regard to the group norm. A representation of the hits and junctions found in the Lymphoblastoid samples for a gene where all seven samples differentially express splice variants is given in Figure 4. These results suggest that there is more variation among Lymphoblastoid cells than among Cerebellum cells. Furthermore, we want to know how different the samples are from one another. The next figure and table shed more light on the relative differential splice variant expression.

Figure 9 Relative Differential Splice Variant Expression among Cerebellum tissue samples and Lymphoblastoid cell-line samples. For each of the samples one to seven, the ratio of samples in which an average differentially expressed splice variant from that sample is found is shown, with standard deviation error bars and the average ratio.

Table 5 Relative Differential Splice Variant Expression among Cerebellum tissue samples and Lymphoblastoid cell-line samples. The ratio of samples in which an average differentially expressed splice variant from the samples one to seven is found is shown in numbers.

Figure 9 shows, for samples one to seven, the expected ratio of samples in which an average splice variant from the sample is found. For example, an average differential splice variant found in Cerebellum sample 1 is found in 32% of Cerebellum samples (which includes

sample 1). This corresponds to 0.32 × 7 ≈ 2.2 samples, i.e. about two samples in total. The standard deviation (stdev) for Lymphoblastoid samples is more than six times bigger than the stdev of the Cerebellum samples.

Differential Gene Expression Variance

Many more differentially expressed genes are found in the Lymphoblastoid samples than in Cerebellum tissue: on average 5022 and 820 respectively. We are interested in how these are distributed among the individual samples. See Figure 10 and Table 6.

Figure 10 Occurrence of Differential Gene Expression. The figure shows, per cell type, the ratio of differentially expressed genes that are found in a particular maximum number of samples.

Table 6 Occurrence of Differential Gene Expression. The table shows numerically how many differentially expressed genes are found in how many samples at the most.

For both Cerebellum tissue and Lymphoblastoid cells, most differentially expressed genes are found in only one sample, which means these differentially expressed genes seem to be unique to these samples. As explained earlier (Table 2), finding differential expression in seven samples often means finding expression of the gene or splice variant in just one of the samples and none in the remaining six. For Cerebellum tissue this means that the differential expression found for all seven samples (18%) can often be attributed to only one of the seven samples, much like differential expression found in only one of the Cerebellum samples (58%). Together, ~75% of differentially expressed genes can thus be attributed to a single sample of the pool. Lymphoblastoid cells have a lower rate of differential expression attributable to just a single sample (57%). Compared to Cerebellum tissue samples, differential expression at a particular locus is more often attributable to two (28%) or three

(11%) samples. Note that, in a pool of seven, finding differential expression for two samples at a particular locus means that five of the seven samples have expression close to the FPKM of the pool. The two differentially expressed samples can both have an FPKM significantly lower than that of the pool, both higher, or one higher and one lower.

Figure 11 Relative Differential Gene Expression among Cerebellum tissue samples and Lymphoblastoid cell-line samples. For each of the samples one to seven, the ratio of samples in which an average differentially expressed gene from that sample is found is shown, with standard deviation error bars and the average ratio.

Table 7 Relative Differential Gene Expression among Cerebellum tissue samples and Lymphoblastoid cell-line samples. The ratio of samples in which an average differentially expressed gene from the samples one to seven is found is shown numerically.

The standard deviations for Cerebellum and Lymphoblastoid cells are small, 3.6% and 4.7% respectively (Figure 11 and Table 7). This suggests that all individual Cerebellum samples are about equally different from each other; the same goes for the Lymphoblastoid samples. This gives confidence that the results are reliable.

Table 8 Differential Gene Expression Grouping of Lymphoblastoid Samples. For the differentially expressed genes of the Lymphoblastoid samples, the table shows how many times each group of samples finds differential gene expression together. The first three columns show results for genes where groups of two samples find differential gene expression, the next three where groups of three samples find differential gene expression and the last three where groups of four samples find differential gene expression. Note that each gene shows up in only one of the columns, as genes are annotated by the maximum number of samples that find differential gene expression, as in Figures 10 and 11.

Table 8 shows how many times groups of samples find differential gene expression for the same genes. For example, samples 5 and 7 (5+7) find differential gene expression uniquely together for 1882 genes, for which the other samples have FPKM levels close to the pool FPKM. The most striking feature in the table is that ~50% of the genes that are found differentially expressed in three samples fall in one group of three samples. For virtually each of the 1882 genes the three samples show no expression and therefore show up as differentially expressed. Also when looking at genes that are differentially expressed in two or four samples, these three samples stand out the most and clearly distinguish themselves from the other samples. Samples 1 to 4 do not seem to group together: as a group they are uniquely assigned only 0.33% of the genes found differentially expressed in four out of seven samples.
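The grouping shown in Table 8 can be derived by recording, for every differentially expressed gene, the exact set of samples in which it is significant, and then counting how often each set occurs. The sketch below illustrates this under assumptions; it is not the authors' script, and the sample numbers and gene ids in the example are hypothetical.

```python
from collections import Counter


def group_counts(significant_by_sample):
    """Count how many features are significant in exactly each set of samples."""
    samples_per_feature = {}
    for sample, features in significant_by_sample.items():
        for feature in set(features):
            samples_per_feature.setdefault(feature, set()).add(sample)
    # frozenset makes each sample group hashable so it can be counted.
    return Counter(frozenset(group) for group in samples_per_feature.values())


if __name__ == "__main__":
    # Hypothetical significant gene ids for four samples.
    significant = {
        1: ["g4"],
        5: ["g1", "g2", "g3"],
        6: ["g2", "g3"],
        7: ["g1", "g2", "g3"],
    }
    for group, n_genes in group_counts(significant).most_common():
        label = "+".join(str(s) for s in sorted(group))
        print(label, n_genes)
```

Because each gene is assigned to exactly one sample set, a gene counted under a group of three does not also appear under any of its pairs, which matches the note that genes show up in only one column.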

Table 9 Differential Gene Expression Grouping of Cerebellum Samples. For the Cerebellum samples, the table shows for how many genes each group of samples finds differential gene expression together. For example, samples 1 and 7 are differentially expressed together for 28 genes, for which the other samples have FPKM levels close to the pool FPKM. The first three columns show results for genes that are differentially expressed in groups of two samples, the next three for groups of three samples and the last three for groups of four samples. Note that each gene appears in only one of the columns, because genes are annotated by the maximum number of samples that find them differentially expressed, as in Figures 10 and 11.

Table 9 shows the same kind of data as table 8, for the Cerebellum samples. As no figures stand out in this table, there seems to be no clear grouping of samples.

Table 10 Differential Splice Variant Expression Grouping of Lymphoblastoid cell-line samples. For the Lymphoblastoid samples, the table shows for how many splice variants each group of samples finds differential splice variant expression together. For example, samples 5 and 7 are differentially expressed together for 51 splice variants, for which the other samples have FPKM values close to the pool FPKM. The first three columns show results for splice variants that are differentially expressed in groups of two samples, the next three for groups of three samples and the last three for groups of four samples. Note that each splice variant appears in only one of the columns, because splice variants are annotated by the maximum number of samples that find them differentially expressed, as in Figures 8 and 9.

Table 10 shows, as in table 8, that samples 5+7 and 5+6 group together well with regard to the differential splice variant expression they share. The group of samples 5+6+7 also groups well when looking at differential splice variant expression for three samples. These three samples also group together well with sample 4.

Discussion

For every study it is important to know how to interpret the results. When a study uses few biological replicates, the results will probably say more about the individual samples than about the species in general. Because sequencing provides detailed information about samples, baseline variation will contribute to the variation in the results, but technical variation also needs to be investigated properly to provide context for the reliability of the results. A recent study has shown that biological variability in gene expression is not eliminated by sequencing technology 13, but little research has been done on baseline variability. With this research we want to contribute to understanding how baseline variability affects results in Next-Gen Sequencing.

In this study we want to understand how technical variation contributes to our results. We assess the methodology of our research and look for baseline variation in Cerebellum tissue samples and Lymphoblastoid cell-line samples. We do this by investigating in detail the differential expression of genes and splice variants between samples and pools of samples, and by comparing a pool of Cerebellum samples with a pool of Lymphoblastoid samples. We also examine how different pool sizes affect our results and how sequencing depth and saturation levels correlate.

Quantification and qualification

A large amount of data on differential gene expression and splice variants was obtained by the two studies through Next-Gen Sequencing on Illumina instruments. This study shows that an individual sample reflects in detail the state in which it was sampled, and that reads, differentially expressed splice variants and differentially expressed genes need to be put into biological context. A single sample provides enough data in quantity, but for both qualitative and quantitative analyses context is highly important, and biological replicates should be included in all studies.

Technical Variation

Transcriptomes of different tissues were already known to be highly variable within the same individual, but this study shows that individual samples also have many unique characteristics when the differentially expressed splice variants and genes are examined in detail. We tried to minimize technical variation as much as possible by choosing our data carefully: all samples we use are deeply sequenced (more than 1 million reads per sample 14) and were produced by Illumina instruments. Nevertheless, we find that the saturation levels of our samples vary, so technical variation could not be excluded as hoped. Read depth also varies from 1.2M to 7.8M reads per sample, which possibly introduces technical variation.

Saturation

Fig. 7 shows that the discovery rates drop as the number of reads per sample increases. The read count is not the only variable in saturation; the type of tissue or sample that is investigated also makes a difference. To reach equal levels of saturation one would need many more reads for Lymphoblastoid samples than for Cerebellum samples. We suggest this is due to the larger biological variability of the Lymphoblastoid samples. When comparing different tissues, researchers should investigate the saturation levels of their samples, decide which saturation levels they want, use that information to decide the sample sizes for each tissue individually, and keep read counts the same for individuals of the same tissue or cell type. Note that read counts should surpass the desired threshold for finding genes, as suggested by Wang et al. 14.

Total Significant Differential Expression

We find much more differential expression of splice variants and genes for Lymphoblastoid cell-line samples than for Cerebellum tissue samples, as shown in table 1. This also holds for samples with about the same number of reads. Even the Lymphoblastoid samples (5, 9, 10) that have fewer reads than the Cerebellum samples show many times more differential expression. The Lymphoblastoid samples with the most reads even show less differential expression than the other Lymphoblastoid samples, for example samples 1, 2 and 3 against samples 5, 6 and 7.

Pool Size

Pool size seems to matter little when comparing a pool of seven Lymphoblastoid samples to a pool of ten Lymphoblastoid samples. One sample, sample 4, does clearly show more differential gene expression with the larger pool. We decided to continue the study with the original group of seven Lymphoblastoid samples, as this is also the size of the Cerebellum pool, which we cannot increase due to lack of available data. We suggest that sample 4 is, relative to the other six samples in the pool of seven, more different from the newly introduced samples in the larger pool, and that we therefore find more differential gene expression for this sample.

Differential Splice Variant Expression

Figures 6, 8 and 9 show that Lymphoblastoid samples have more individual variability than Cerebellum samples, as total differential splice variant expression is substantially higher, although differentially expressed splice variants are less unique to individual samples.
The Lymphoblastoid samples also vary in the ratio of samples in which an average differentially expressed splice variant from a given sample is found. This suggests some samples are more like one another than others. For the Lymphoblastoid samples we investigate this further, but for the Cerebellum samples we do not, as too few differentially expressed splice variants were found in total. In table 10 we find grouping of Lymphoblastoid samples. Samples 5, 6 and 7 find, relative to the other groups, quite a few of the same splice variants differentially expressed. This becomes clearer when differential gene expression is investigated in detail.

Differential Gene Expression

The Cerebellum samples show fewer differentially expressed genes than the Lymphoblastoid samples, and ~75% of these genes can be uniquely attributed to individual samples. In total the Cerebellum samples show about one-tenth as many differentially expressed genes (2212) as the Lymphoblastoid samples (21695), with an FDR of 0.05%. This suggests the Cerebellum samples are less biologically variable than the Lymphoblastoid samples. A detailed look at table 9 also reveals no apparent subgroups. Things are different for the Lymphoblastoid samples: not only do they show many more differentially expressed genes than the Cerebellum tissue samples, but table 8 also shows three samples, 5, 6 and 7, to be much alike. Unfortunately we do not find another clear group among these samples. If samples 1 to 4 made up one group, we would expect combinations of these samples to share a large ratio of differentially expressed genes in subgroups, but we do not find this. Possibly these remaining four samples vary too much, genotypically speaking, to stand out in this table. Biological context might explain the biological variance we find in the Lymphoblastoid samples. They are known as cells with stem-cell-like behavior that differentiate into three different types of cells: B lymphocytes, T lymphocytes and Natural Killer cells (large granular lymphocytes) during lymphopoiesis 23. The cells of the samples could already be in the process of differentiating into one of these cell types, perhaps only genotypically.

Further studies

In this study we only investigated Lymphoblastoid cells and Cerebellum tissue, so our suggestions are, apart from known biological context, mainly based on the comparison between the two. Further study should include more cell types and at least as many, preferably more, biological replicates per type. Sequencing depth should be chosen with the desired saturation for finding differential expression in mind.

Conclusions

We have found a higher biological variance in the Lymphoblastoid cell-line samples than in the Cerebellum tissue samples. Arguably Cerebellum cells are subject to more regulation, as we find ~75% of differential gene expression to be attributable to just one of the samples, noting that total differential gene expression is about one-tenth of that found in the Lymphoblastoid samples.
For the Cerebellum tissue samples this gives an idea of baseline variability, although in this study we cannot investigate the functional variability of the differential expression found in the individual samples. The Lymphoblastoid cells in our sample set are less alike, although three of the seven samples, samples 5, 6 and 7, show a large similarity in differential expression of genes and splice variants. These three samples have the greatest sequencing depth of all samples, which is possibly why they share the most differential expression; this would confirm that technical variance plays a significant role in RNA sequencing. Another possibility is that this variance can be explained by the biological context that Lymphoblastoid cells are stem-cell-like. They normally differentiate into three subgroups: B lymphocytes, T lymphocytes and Natural Killer cells. Because this biological context possibly plays a role in the Lymphoblastoid samples, we cannot go into detail on the baseline variability of these cells. In conclusion, we find that technical variance and biological variance cannot be distinguished with the particular set-up of our study. We find that sequencing read depth should be adjusted specifically to the cell type to provide equal saturation levels for all samples. Biological replicates should also be numerous enough to support general conclusions, as subgroups may be present, as we found among the Lymphoblastoid cell-line samples.

Supplementary information

Supplementary 1 Perl script for characterization of differential expression

    use strict;
    use warnings;

    # Input: one gene_id per row; rows with the same gene_id must be adjacent.
    open(my $fh, '<', 'data.txt') or die "could not open file: $!\n";

    my @same;    # rows of the current run of identical gene_ids

    while (my $line = <$fh>) {
        chomp $line;
        if (@same && $line ne $same[-1]) {
            # gene_id changed: print every row of the finished run with its size
            print "$_\t" . scalar(@same) . "\n" for @same;
            @same = ();
        }
        push @same, $line;
    }

    # Print the final run, which has no following row to trigger the output.
    print "$_\t" . scalar(@same) . "\n" for @same;
    close($fh);

This is the Perl script used to calculate the characterization of differential expression. The script takes one value per row as input, in our case the gene_id number, and starts counting at 1. It then checks whether the next row has the same gene_id: if so, it adds 1 to the count; if not, it prints every row of the finished run together with the count. It then moves to the next row, resets the count to 1 and continues counting like this through the file. At the end of the file there is no next row to compare with, so the final run is printed once the loop has ended.
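The same run-length counting can be sketched compactly in Python. This is an illustrative alternative, not part of the original analysis: it assumes the gene_ids are already in a list with identical ids adjacent, and the function name `annotate_run_lengths` and the toy ids are hypothetical. `itertools.groupby` yields every run of equal adjacent values, including the last one, so no special handling of the end of the input is needed.

```python
from itertools import groupby


def annotate_run_lengths(gene_ids):
    """Pair each row with the size of the run of identical ids it belongs to."""
    annotated = []
    for _gene_id, run in groupby(gene_ids):
        rows = list(run)  # materialize the run to learn its length
        annotated.extend((row, len(rows)) for row in rows)
    return annotated


# Hypothetical toy input: gene_ids with identical values adjacent.
rows = ["101", "101", "102", "103", "103", "103"]
result = annotate_run_lengths(rows)
for gene_id, count in result:
    print(f"{gene_id}\t{count}")
```

Each output row carries the number of rows sharing its gene_id, which corresponds to the number of samples in which that gene was called differentially expressed when the input has one row per gene and sample.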

References

1 Ozsolak F., Milos P. M. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics 12, (2010)
2 Marioni J. C., et al. RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, (2008)
3 Mortazavi A., Williams B. A., McCue K., Schaeffer L., Wold B. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat. Methods 5, (2008)
4 Nagalakshmi U., et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, (2008)
5 Sultan M., et al. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321, (2008)
6 Wilhelm B. T., et al. Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 453, (2008)
7 Kato M., de Lencastre A., Pincus Z., Slack F. J. Dynamic expression of small non-coding RNAs, including novel microRNAs and piRNAs/21U-RNAs, during Caenorhabditis elegans development. Genome Biol. 10, R54 (2009)
8 Daines B., Wang H., Wang L., et al. The Drosophila melanogaster transcriptome by paired-end RNA sequencing. Genome Res. 21, (2011)
9 Morin R. D., O'Connor M. D., Griffith M., et al. Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res. 18, (2008)
10 Shin H., Lee H., Fejes A. P., Baillie D. L., Koo H., Jones S. J. M. Gene expression profiling of oxidative stress response of C. elegans aging defective AMPK mutants using massively parallel transcriptome sequencing. BMC Research Notes 4, 34 (2011)
11 Twine N. A., Janitz K., Wilkins M. R., Janitz M. Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease. PLoS ONE 6(1), e16266 (2011)
12 Han X., Wu X., Chung W., Li T., Nekrutenko A., Altman N. S., Chen G., Ma H. Transcriptome of embryonic and neonatal mouse cortex by high-throughput RNA sequencing. PNAS 106(31), (2009)
13 Hansen K. D., Wu Z., Irizarry R. A., Leek J. T. Sequencing technology does not eliminate biological variability. Nature Biotechnology 29(7), (2011)
14 Wang E. T., Sandberg R., Luo S., Khrebtukova I., Zhang L., Mayr C., Kingsmore S. F., Schroth G. P., Burge C. B. Alternative isoform regulation in human tissue transcriptomes. Nature 456, (2008)
15 Pickrell J. K., Marioni J. C., Pai A. A., Degner J. F., Engelhardt B. E., Nkadori E., Veyrieras J., Stephens M., Gilad Y., Pritchard J. K. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, (2010)
16 Trapnell C., Pachter L., Salzberg S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25(9), (2009)
17 Langmead B., et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009)
18 Burrows M., Wheeler D. A block sorting lossless data compression algorithm. Technical Report 124, DEC, Digital Systems Research Center, Palo Alto, California (1994)
19 Ferragina P., Manzini G. An experimental study of an opportunistic index. Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms. Washington, D.C., USA, (2001)
20 Hillier L. W., et al. Whole-genome sequencing and variant discovery in C. elegans. Nat. Methods 5, (2008)
21 Li H., et al. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, (2008)
22 Haas B. J., et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, (2003)
23 Wikipedia, Lymphopoiesis, (as of June 17, 2011, 16:04 GMT)


More information

GeneOverlap: An R package to test and visualize

GeneOverlap: An R package to test and visualize GeneOverlap: An R package to test and visualize gene overlaps Li Shen Contact: li.shen@mssm.edu or shenli.sam@gmail.com Icahn School of Medicine at Mount Sinai New York, New York http://shenlab-sinai.github.io/shenlab-sinai/

More information

Circular RNAs (circrnas) act a stable mirna sponges

Circular RNAs (circrnas) act a stable mirna sponges Circular RNAs (circrnas) act a stable mirna sponges cernas compete for mirnas Ancestal mrna (+3 UTR) Pseudogene RNA (+3 UTR homolgy region) The model holds true for all RNAs that share a mirna binding

More information

cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University Linz

cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University Linz Software Manual Institute of Bioinformatics, Johannes Kepler University Linz cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University

More information

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014

Not IN Our Genes - A Different Kind of Inheritance.! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014 Not IN Our Genes - A Different Kind of Inheritance! Christopher Phiel, Ph.D. University of Colorado Denver Mini-STEM School February 4, 2014 Epigenetics in Mainstream Media Epigenetics *Current definition:

More information

Lectures 13: High throughput sequencing: Beyond the genome. Spring 2017 March 28, 2017

Lectures 13: High throughput sequencing: Beyond the genome. Spring 2017 March 28, 2017 Lectures 13: High throughput sequencing: Beyond the genome Spring 2017 March 28, 2017 h@p://www.fejes.ca/2009/06/science- cartoons- 5- rna- seq.html Omics Transcriptome - the set of all mrnas present in

More information

fl/+ KRas;Atg5 fl/+ KRas;Atg5 fl/fl KRas;Atg5 fl/fl KRas;Atg5 Supplementary Figure 1. Gene set enrichment analyses. (a) (b)

fl/+ KRas;Atg5 fl/+ KRas;Atg5 fl/fl KRas;Atg5 fl/fl KRas;Atg5 Supplementary Figure 1. Gene set enrichment analyses. (a) (b) KRas;At KRas;At KRas;At KRas;At a b Supplementary Figure 1. Gene set enrichment analyses. (a) GO gene sets (MSigDB v3. c5) enriched in KRas;Atg5 fl/+ as compared to KRas;Atg5 fl/fl tumors using gene set

More information

Analyse de données de séquençage haut débit

Analyse de données de séquençage haut débit Analyse de données de séquençage haut débit Vincent Lacroix Laboratoire de Biométrie et Biologie Évolutive INRIA ERABLE 9ème journée ITS 21 & 22 novembre 2017 Lyon https://its.aviesan.fr Sequencing is

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

omiras: MicroRNA regulation of gene expression

omiras: MicroRNA regulation of gene expression omiras: MicroRNA regulation of gene expression Sören Müller, Goethe University of Frankfurt am Main Molecular Bioinformatics Group, Institute of Computer Science Plant Molecular Biology Group, Institute

More information

A Statistical Framework for Classification of Tumor Type from microrna Data

A Statistical Framework for Classification of Tumor Type from microrna Data DEGREE PROJECT IN MATHEMATICS, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2016 A Statistical Framework for Classification of Tumor Type from microrna Data JOSEFINE RÖHSS KTH ROYAL INSTITUTE OF TECHNOLOGY

More information

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits Next-generation performance in liquid biopsies 2 Accelerating clinical research From liquid biopsy to next-generation

More information

MicroRNA in Cancer Karen Dybkær 2013

MicroRNA in Cancer Karen Dybkær 2013 MicroRNA in Cancer Karen Dybkær RNA Ribonucleic acid Types -Coding: messenger RNA (mrna) coding for proteins -Non-coding regulating protein formation Ribosomal RNA (rrna) Transfer RNA (trna) Small nuclear

More information

Supplementary note: Comparison of deletion variants identified in this study and four earlier studies

Supplementary note: Comparison of deletion variants identified in this study and four earlier studies Supplementary note: Comparison of deletion variants identified in this study and four earlier studies Here we compare the results of this study to potentially overlapping results from four earlier studies

More information

MEDICAL GENOMICS LABORATORY. Next-Gen Sequencing and Deletion/Duplication Analysis of NF1 Only (NF1-NG)

MEDICAL GENOMICS LABORATORY. Next-Gen Sequencing and Deletion/Duplication Analysis of NF1 Only (NF1-NG) Next-Gen Sequencing and Deletion/Duplication Analysis of NF1 Only (NF1-NG) Ordering Information Acceptable specimen types: Fresh blood sample (3-6 ml EDTA; no time limitations associated with receipt)

More information

Sebastian Jaenicke. trnascan-se. Improved detection of trna genes in genomic sequences

Sebastian Jaenicke. trnascan-se. Improved detection of trna genes in genomic sequences Sebastian Jaenicke trnascan-se Improved detection of trna genes in genomic sequences trnascan-se Improved detection of trna genes in genomic sequences 1/15 Overview 1. trnas 2. Existing approaches 3. trnascan-se

More information

Genomic structural variation

Genomic structural variation Genomic structural variation Mario Cáceres The new genomic variation DNA sequence differs across individuals much more than researchers had suspected through structural changes A huge amount of structural

More information

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1 Lecture 27: Systems Biology and Bayesian Networks Systems Biology and Regulatory Networks o Definitions o Network motifs o Examples

More information

Nature Structural & Molecular Biology: doi: /nsmb.2419

Nature Structural & Molecular Biology: doi: /nsmb.2419 Supplementary Figure 1 Mapped sequence reads and nucleosome occupancies. (a) Distribution of sequencing reads on the mouse reference genome for chromosome 14 as an example. The number of reads in a 1 Mb

More information

Eukaryotic small RNA Small RNAseq data analysis for mirna identification

Eukaryotic small RNA Small RNAseq data analysis for mirna identification Eukaryotic small RNA Small RNAseq data analysis for mirna identification P. Bardou, C. Gaspin, S. Maman, J. Mariette, O. Rué, M. Zytnicki INRA Sigenae Toulouse INRA MIA Toulouse GenoToul Bioinfo INRA MaIAGE

More information

Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities

Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities Supplementary information for: Human micrornas co-silence in well-separated groups and have different essentialities Gábor Boross,2, Katalin Orosz,2 and Illés J. Farkas 2, Department of Biological Physics,

More information

Benjamin T. Langmead September, 2012

Benjamin T. Langmead September, 2012 Benjamin T. Langmead September, 2012 Assistant Professor Department of Computer Science Johns Hopkins University 3400 North Charles St Baltimore, MD 21218-2682 (410) 516-2033 (Office) (443) 928-8048 (Cell)

More information

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( 鄧致剛 ); g ( 黄栢榕 ) Bioinformatics Center, Chang Gung University.

P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( 鄧致剛 ); g ( 黄栢榕 ) Bioinformatics Center, Chang Gung University. Small RNA High Throughput Sequencing Analysis I P. Tang ( 鄧致剛 ); PJ Huang ( 黄栢榕 ) g( 鄧致剛 ); g ( 黄栢榕 ) Bioinformatics Center, Chang Gung University. Prominent members of the RNA family Classic RNAs mediating

More information

Alternative splicing. Biosciences 741: Genomics Fall, 2013 Week 6

Alternative splicing. Biosciences 741: Genomics Fall, 2013 Week 6 Alternative splicing Biosciences 741: Genomics Fall, 2013 Week 6 Function(s) of RNA splicing Splicing of introns must be completed before nuclear RNAs can be exported to the cytoplasm. This led to early

More information

Trinity: Transcriptome Assembly for Genetic and Functional Analysis of Cancer [U24]

Trinity: Transcriptome Assembly for Genetic and Functional Analysis of Cancer [U24] Trinity: Transcriptome Assembly for Genetic and Functional Analysis of Cancer [U24] ITCR meeting, June 2016 The Cancer Transcriptome A window into the (expressed) genetic and epigenetic state of a tumor

More information

SUPPLEMENTAL INFORMATION

SUPPLEMENTAL INFORMATION SUPPLEMENTAL INFORMATION GO term analysis of differentially methylated SUMIs. GO term analysis of the 458 SUMIs with the largest differential methylation between human and chimp shows that they are more

More information

Results. Abstract. Introduc4on. Conclusions. Methods. Funding

Results. Abstract. Introduc4on. Conclusions. Methods. Funding . expression that plays a role in many cellular processes affecting a variety of traits. In this study DNA methylation was assessed in neuronal tissue from three pigs (frontal lobe) and one great tit (whole

More information

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells

Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells Histone Modifications Are Associated with Transcript Isoform Diversity in Normal and Cancer Cells Ondrej Podlaha 1, Subhajyoti De 2,3,4, Mithat Gonen 5, Franziska Michor 1 * 1 Department of Biostatistics

More information

RNA- seq Introduc1on. Promises and pi7alls

RNA- seq Introduc1on. Promises and pi7alls RNA- seq Introduc1on Promises and pi7alls DNA is the same in all cells but which RNAs that is present is different in all cells There is a wide variety of different func1onal RNAs Which RNAs (and some1mes

More information

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017

Epigenetics. Jenny van Dongen Vrije Universiteit (VU) Amsterdam Boulder, Friday march 10, 2017 Epigenetics Jenny van Dongen Vrije Universiteit (VU) Amsterdam j.van.dongen@vu.nl Boulder, Friday march 10, 2017 Epigenetics Epigenetics= The study of molecular mechanisms that influence the activity of

More information

RNA SEQUENCING AND DATA ANALYSIS

RNA SEQUENCING AND DATA ANALYSIS RNA SEQUENCING AND DATA ANALYSIS Download slides and package http://odin.mdacc.tmc.edu/~rverhaak/package.zip http://odin.mdacc.tmc.edu/~rverhaak/rna-seqlecture.zip Overview Introduction into the topic

More information

Table S1. Relative abundance of AGO1/4 proteins in different organs. Table S2. Summary of smrna datasets from various samples.

Table S1. Relative abundance of AGO1/4 proteins in different organs. Table S2. Summary of smrna datasets from various samples. Supplementary files Table S1. Relative abundance of AGO1/4 proteins in different organs. Table S2. Summary of smrna datasets from various samples. Table S3. Specificity of AGO1- and AGO4-preferred 24-nt

More information

Chip Seq Peak Calling in Galaxy

Chip Seq Peak Calling in Galaxy Chip Seq Peak Calling in Galaxy Chris Seward PowerPoint by Pei-Chen Peng Chip-Seq Peak Calling in Galaxy Chris Seward 2018 1 Introduction This goals of the lab are as follows: 1. Gain experience using

More information

High AU content: a signature of upregulated mirna in cardiac diseases

High AU content: a signature of upregulated mirna in cardiac diseases https://helda.helsinki.fi High AU content: a signature of upregulated mirna in cardiac diseases Gupta, Richa 2010-09-20 Gupta, R, Soni, N, Patnaik, P, Sood, I, Singh, R, Rawal, K & Rani, V 2010, ' High

More information

VIP: an integrated pipeline for metagenomics of virus

VIP: an integrated pipeline for metagenomics of virus VIP: an integrated pipeline for metagenomics of virus identification and discovery Yang Li 1, Hao Wang 2, Kai Nie 1, Chen Zhang 1, Yi Zhang 1, Ji Wang 1, Peihua Niu 1 and Xuejun Ma 1 * 1. Key Laboratory

More information

Supplement to SCnorm: robust normalization of single-cell RNA-seq data

Supplement to SCnorm: robust normalization of single-cell RNA-seq data Supplement to SCnorm: robust normalization of single-cell RNA-seq data Supplementary Note 1: SCnorm does not require spike-ins, since we find that the performance of spike-ins in scrna-seq is often compromised,

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality. Supplementary Figure 1 Assessment of sample purity and quality. (a) Hematoxylin and eosin staining of formaldehyde-fixed, paraffin-embedded sections from a human testis biopsy collected concurrently with

More information

IMPaLA tutorial.

IMPaLA tutorial. IMPaLA tutorial http://impala.molgen.mpg.de/ 1. Introduction IMPaLA is a web tool, developed for integrated pathway analysis of metabolomics data alongside gene expression or protein abundance data. It

More information

SUPPLEMENTARY FIGURES: Supplementary Figure 1

SUPPLEMENTARY FIGURES: Supplementary Figure 1 SUPPLEMENTARY FIGURES: Supplementary Figure 1 Supplementary Figure 1. Glioblastoma 5hmC quantified by paired BS and oxbs treated DNA hybridized to Infinium DNA methylation arrays. Workflow depicts analytic

More information

Data mining with Ensembl Biomart. Stéphanie Le Gras

Data mining with Ensembl Biomart. Stéphanie Le Gras Data mining with Ensembl Biomart Stéphanie Le Gras (slegras@igbmc.fr) Guidelines Genome data Genome browsers Getting access to genomic data: Ensembl/BioMart 2 Genome Sequencing Example: Human genome 2000:

More information

Transcriptome and isoform reconstruc1on with short reads. Tangled up in reads

Transcriptome and isoform reconstruc1on with short reads. Tangled up in reads Transcriptome and isoform reconstruc1on with short reads Tangled up in reads Topics of this lecture Mapping- based reconstruc1on methods Case study: The domes1c dog De- novo reconstruc1on method Trinity

More information

The Epigenome Tools 2: ChIP-Seq and Data Analysis

The Epigenome Tools 2: ChIP-Seq and Data Analysis The Epigenome Tools 2: ChIP-Seq and Data Analysis Chongzhi Zang zang@virginia.edu http://zanglab.com PHS5705: Public Health Genomics March 20, 2017 1 Outline Epigenome: basics review ChIP-seq overview

More information

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles

ChromHMM Tutorial. Jason Ernst Assistant Professor University of California, Los Angeles ChromHMM Tutorial Jason Ernst Assistant Professor University of California, Los Angeles Talk Outline Chromatin states analysis and ChromHMM Accessing chromatin state annotations for ENCODE2 and Roadmap

More information

2) Cases and controls were genotyped on different platforms. The comparability of the platforms should be discussed.

2) Cases and controls were genotyped on different platforms. The comparability of the platforms should be discussed. Reviewers' Comments: Reviewer #1 (Remarks to the Author) The manuscript titled 'Association of variations in HLA-class II and other loci with susceptibility to lung adenocarcinoma with EGFR mutation' evaluated

More information

Module 3: Pathway and Drug Development

Module 3: Pathway and Drug Development Module 3: Pathway and Drug Development Table of Contents 1.1 Getting Started... 6 1.2 Identifying a Dasatinib sensitive cancer signature... 7 1.2.1 Identifying and validating a Dasatinib Signature... 7

More information

ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics

ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics ACE ImmunoID Biomarker Discovery Solutions ACE ImmunoID Platform for Tumor Immunogenomics Precision Genomics for Immuno-Oncology Personalis, Inc. ACE ImmunoID When one biomarker doesn t tell the whole

More information

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information

Small RNA-Seq and profiling

Small RNA-Seq and profiling Small RNA-Seq and profiling Y. Hoogstrate 1,2 1 Department of Bioinformatics & Department of Urology ErasmusMC, Rotterdam 2 CTMM Translational Research IT (TraIT) BioSB: 5th RNA-seq data analysis course,

More information

Introduction to Cancer Biology

Introduction to Cancer Biology Introduction to Cancer Biology Robin Hesketh Multiple choice questions (choose the one correct answer from the five choices) Which ONE of the following is a tumour suppressor? a. AKT b. APC c. BCL2 d.

More information

microrna analysis Merete Molton Worren Ståle Nygård

microrna analysis Merete Molton Worren Ståle Nygård microrna analysis Merete Molton Worren Ståle Nygård Help personnel: Daniel Vodak Background Dysregulation of mirna expression has been connected to progression and development of atherosclerosis The hypothesis:

More information

Single SNP/Gene Analysis. Typical Results of GWAS Analysis (Single SNP Approach) Typical Results of GWAS Analysis (Single SNP Approach)

Single SNP/Gene Analysis. Typical Results of GWAS Analysis (Single SNP Approach) Typical Results of GWAS Analysis (Single SNP Approach) High-Throughput Sequencing Course Gene-Set Analysis Biostatistics and Bioinformatics Summer 28 Section Introduction What is Gene Set Analysis? Many names for gene set analysis: Pathway analysis Gene set

More information

Section 6: Analysing Relationships Between Variables

Section 6: Analysing Relationships Between Variables 6. 1 Analysing Relationships Between Variables Section 6: Analysing Relationships Between Variables Choosing a Technique The Crosstabs Procedure The Chi Square Test The Means Procedure The Correlations

More information

Section 9. Junaid Malek, M.D.

Section 9. Junaid Malek, M.D. Section 9 Junaid Malek, M.D. Mutation Objective: Understand how mutations can arise, and how beneficial ones can alter populations Mutation= a randomly produced, heritable change in the nucleotide sequence

More information