Gene expression scaled by distance to the genome replication site. Ying et al. Supporting Information. Contents. Supplementary note I p.

Similar documents
Lentiviral Delivery of Combinatorial mirna Expression Constructs Provides Efficient Target Gene Repression.

Concentration Estimation from Flow Cytometry Exosome Data Protocol

CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys

Nature Protocols: doi: /nprot Supplementary Figure 1

1.4 - Linear Regression and MS Excel

Gladstone Institutes, University of California (UCSF), San Francisco, USA

STATISTICS AND RESEARCH DESIGN

A programmable NOR-based device for transcription profile analyses

Correlation and regression

Chapter 3 CORRELATION AND REGRESSION

ab Exosome Isolation and Analysis Kit - Flow Cytometry, Cell Culture (CD63 / CD81)

Understandable Statistics

Nature Structural & Molecular Biology: doi: /nsmb.2419

Simple Linear Regression the model, estimation and testing

bivariate analysis: The statistical analysis of the relationship between two variables.

Nature Medicine: doi: /nm.2109

Business Statistics Probability

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Supplementary Information Methods Subjects The study was comprised of 84 chronic pain patients with either chronic back pain (CBP) or osteoarthritis

Analysis of glipizide binding to normal and glycated human serum Albumin by high-performance affinity chromatography

Method Comparison Report Semi-Annual 1/5/2018

ab Exosome Isolation and Analysis Kit - Flow Cytometry, Plasma

Chapter 3: Describing Relationships

BD CBA on the BD Accuri C6: Bringing Multiplexed Cytokine Detection to the Benchtop

9 research designs likely for PSYC 2100

Supplementary Figure 1. SC35M polymerase activity in the presence of Bat or SC35M NP encoded from the phw2000 rescue plasmid.

ab Exosome Isolation and Analysis Kit - Flow Cytometry, Cell culture

Supplementary Information

Late assembly of the Vibrio cholerae cell division machinery postpones

Supplementary Figure 1. Nature Neuroscience: doi: /nn.4547

Unit 1 Exploring and Understanding Data

Development and Validation of a Polysorbate 20 Assay in a Therapeutic Antibody Formulation by RP-HPLC and Charged Aerosol Detector (CAD)

Still important ideas

Supplementary Table; Supplementary Figures and legends S1-S21; Supplementary Materials and Methods

Supplemental Figures Supplemental Figure 1:

Sum of Neurally Distinct Stimulus- and Task-Related Components.

Behavioral generalization

The Impact of Melamine Spiking on the Gel Strength and Viscosity of Gelatin

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

RP-HPLC Analysis of Temozolomide in Pharmaceutical Dosage Forms

The North Carolina Health Data Explorer

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

Online Appendix Material and Methods: Pancreatic RNA isolation and quantitative real-time (q)rt-pcr. Mice were fasted overnight and killed 1 hour (h)

Chapter 1: Exploring Data

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

SUMMER 2011 RE-EXAM PSYF11STAT - STATISTIK

The unconstrained evolution of fast and efficient antibiotic-resistant bacterial genomes

Suppl Video: Tumor cells (green) and monocytes (white) are seeded on a confluent endothelial

Rapid evolution of highly variable competitive abilities in a key phytoplankton species

IAPT: Regression. Regression analyses

NC bp. b 1481 bp

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

From Biostatistics Using JMP: A Practical Guide. Full book available for purchase here. Chapter 1: Introduction... 1

Assay Report. Histone Deacetylase (HDAC) Inhibitor Assays Enzymatic Study of Compounds from Client

Pelagia Research Library

Diurnal Pattern of Reaction Time: Statistical analysis

Chapter 3: Examining Relationships

A simple and rapid flow cytometry-based assay to identify a competent embryo prior to embryo transfer.

Week 17 and 21 Comparing two assays and Measurement of Uncertainty Explain tools used to compare the performance of two assays, including

Supplementary Material

Nature Immunology: doi: /ni Supplementary Figure 1. Huwe1 has high expression in HSCs and is necessary for quiescence.

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Analysis of hair bundle morphology in Ush1c c.216g>a mice at P18 by SEM.

Pre-made Lentiviral Particles for Fluorescent Proteins

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

Nature Neuroscience: doi: /nn Supplementary Figure 1

Supplementary Figure 1

International Journal of Pharma and Bio Sciences DEVELOPMENT AND VALIDATION OF RP-HPLC METHOD FOR THE ESTIMATION OF STRONTIUM RANELATE IN SACHET

Speed Accuracy Trade-Off

Problem #1 Neurological signs and symptoms of ciguatera poisoning as the start of treatment and 2.5 hours after treatment with mannitol.

Supplementary data for:

Bangor University Laboratory Exercise 1, June 2008

Nature Genetics: doi: /ng Supplementary Figure 1. HOX fusions enhance self-renewal capacity.

Win MN, Smolke CD Higher-order cellular information processing with synthetic RNA devices. Science. 322: DOI: /science.

Nature Immunology: doi: /ni Supplementary Figure 1. DNA-methylation machinery is essential for silencing of Cd4 in cytotoxic T cells.

Supplementary Figure 1. Supernatants electrophoresis from CD14+ and dendritic cells. Supernatants were resolved by SDS-PAGE and stained with

B and T-Cells. CD8+ Cytotoxic T Cells

Supplementary Figure 1 Information on transgenic mouse models and their recording and optogenetic equipment. (a) 108 (b-c) (d) (e) (f) (g)

Pitfalls in Linear Regression Analysis

General Single Ion Calibration. Pete 14-May-09

Wenqin Hu, Cuiping Tian, Tun Li, Mingpo Yang, Han Hou & Yousheng Shu

Clincial Biostatistics. Regression

SCATTER PLOTS AND TREND LINES

Supplementary Figure 1

To test the possible source of the HBV infection outside the study family, we searched the Genbank

Supplemental Material for. Figure S1. Identification of TetR responsive promoters in F. novicida and E. coli.

Pearson r = P (one-tailed) = n = 9

File name: Supplementary Information Description: Supplementary Figures, Supplementary Table and Supplementary References

Supplementary Figure 1. BMS enhances human T cell activation in vitro in a

Flow Cytometry Based Exosome Detection and Analysis Using the ZE5 Cell Analyzer

11/24/2017. Do not imply a cause-and-effect relationship

Supplementary Figures

Supplementary Materials and Methods

Analysis of single gene effects 1. Quantitative analysis of single gene effects. Gregory Carey, Barbara J. Bowers, Jeanne M.

ISSN (Print)

Certificate of Analysis

MODELING AN SMT LINE TO IMPROVE THROUGHPUT

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

PointCare NOW TM Technical Note

Reporting Checklist for Nature Neuroscience

Transcription:

Gene expression scaled by distance to the genome replication site Ying et al Supporting Information Contents Supplementary note I p. 2 Supplementary note II p. 3-5 Supplementary note III p. 5-7 Supplementary note IV p. 7-9 Supplementary references p. 9 Supplementary figures and figure legends p. 10-11 Supplementary table legends p. 12 1

Supplementary note I FCM data processing Population analysis was performed by flow cytometry. The flow data sets were analyzed using custom-designed scripts written in R as previously described 1, 2. In the analysis, the data sets were processed through several filtering steps, a total of four steps from the raw data (Fig. S1). To show the effects of each processing step, the distributions or density plots of three examples are presented, as follows: MDS42 (left panels), MDS42leuC::Ptet_gfp (type II, middle panels), and MDS42leuC::gfp (type I, right panels). These three constructs enabled autofluorescence of E. coli cells and relatively high and low levels of gfp expression in cells, respectively. First, the raw data sets for the fluorescence beads, which were loaded to calculate the cell concentration, were gated appropriately. In addition, the events that occurred at the bottom or top of the detector s range, which could be due to systematic errors, were eliminated (A, arrows). The data were then gated with a narrow FSC window at the peak of the FSC histogram to reduce cell size variability (B, blue lines). The resultant distributions of cellular GFP abundance (i.e., GFP FI) still contained the high and low extreme outliers of green fluorescence intensity (C, arrows). To prevent these unreliable rare events, one percent of the total cells representing both the highest and the lowest values of fluorescence intensity were removed (D). Finally, the autofluorescence (D, distributions in gray) was subtracted, yielding the true distributions of fluorescence intensity (E). The mean values of GFP FI were calculated accordingly. A total of 6 to 8 measurements (GFP FI distributions) were used for each genetic construct to acquire the mean GFP FI values shown in Fig. 2B (Table S3). 2

Supplementary note II Genome replication models The linear regression for the expression level (f), as a function of the distance from oric (x), can be described by the following equation (Eq. 1.1). In the present study, the value of a/b was experimentally determined to be 3.3 to 2.6 10-7 bp -1. f ( x) ax b (Eq. 1.1) To verify whether this experimentally determined value was reasonable or reliable, we constructed a mathematical model, presented in Eq. 2.1, in which the expression level (f) is simply determined by the promoter activity (p) multiplied by the gene dosage (d). The promoter activity, which can be considered the expression efficiency, is proportional to the protein abundance per single gene per cell. f p d (Eq. 2.1) The gene dosage (d) scales by the distance from oric (x), which depends on the number of replication forks on chromosome. In particular, the gene dosage (d) scales down from the copy number (c) at the replication initiation site (x = 0) to 1 copy at the termination region (x = G/2). That is, the value of d varies from 1 to c. Here, c depends on the mode of replication, for instance, the dual-fork replication for rapid growth and the single-fork replication for slow growth. Considering a simple linear case, the gene dosage (d) is proportional to the distance from oric, as the following equation described. x c1 d c 1 1 1 c x 1 1 G G 2 2 (Eq. 2.2) 3

Here, G represents the genome size. The first term, c, represents the gene dosage at the replication c 1 initiation site (x = 0), and the second term, x 1, represents how much the gene dosage G 2 decreases with x. The maximal decrease of the gene dosage is (c 1), when x = G/2. Introducing Eq. 2.2 into Eq. 2.1 results in the following equation, which is identical to Eq. 1.2 in the main text. f c 1 pc x 1 G 2 (Eq. 2.3) Comparing Eq. 1.1 with Eq. 2.3, the following relations among the parameters used in the linear model were drawn. c 1 a p 1 G 2 b pc (Eq. 2.4) Dual-fork replication model According to the bidirectional dual-fork replication model 3, the gene dosage (d) scales down from 4 copies (c = 4) at the replication initiation site (x = 0) to 1 copy at the termination region (x = G/2), that is, the value of d varies from 1 to 4. G represents the genome size, which was 3.98 10 6 bps for strain MDS42 5. By introducing this relationship into the mathematical model (Eqs. 2.2 and 2.3), the expression level can be rewritten in Eq. 2.5. x d 31 1 (Eq. 2.5) 1 G 2 4

x 6 p f ( x) p31 1 x 4 p 1 (Eq. 2.6) G G 2 Consequently, the following equation (Eq. 2.7.1) is obtained by either comparing the linear relationship (Eq. 2.6) with the linear regression (Eq. 1.1) or directly from Eq. 2.4. The following theoretical value was calculated in accordance with that relationship (Fig. 2C, bottom gray line). a b 3 1 3.77 10 2G 7 bps (Eq. 2.7.1) Single-fork replication model According to the same analytical procedure (Eqs. 2.5 and 2.6), the scaling down of the gene dosage (d) from 2 (c = 2) to 1 copy in the single-fork replication model 4 yields the following theoretical value (Fig. 2C, upper gray line). a b 1 1 1.2610 2G 7 bps (Eq. 2.7.2) Supplementary note III Parallelism and equivalence in the linear regressions of scaled expression The linear regression of the expression level (f) as a function of the distance from oric (x) can be described by the following equation (Eq. 1.1), where x represents the distance from oric. f ( x) ax b (Eq. 1.1) According to this equation, the relationships between the raw expression level and the distance from oric were described by the following formulas (Fig. S2, upper): Type II Type III, no IPTG 6 f ( x) 4.03 10 x+12.9 (Eq. 1.1.1) 6 f ( x) 1.05 10 x+3.8 (Eq. 1.1.2) 5

Type III, 10 μm IPTG 7 f ( x) 5.53 10 x+2.2 (Eq. 1.1.3) First, the equality (equivalence) of these three regression coefficients 10 was tested. Similar to the analysis of covariance (ANCOVA), the null hypothesis was that the slopes (coefficients) of these regression lines were the same (H 0 ), while the alternative was they were not the same (H 1 ). The statistical analysis (3 groups, 37 gene expression levels in each group (Fig.1B)) yielded an F value of 68.44, given by the ratio of the regression mean square to the error mean square (MSR/MSE). According to the F distribution (degrees of freedom d1=3-1=2, d2=3 37-2 3-22=83, 22 is the number of missing values), the difference in coefficients was highly significant (p=2 10-16 ), as can be observed from the plots (Fig. S2, upper) without statistical tests. Thus, these regression lines are not parallel, and there were interactions among the groups (data sets). Second, when the plots were normalized by the intercept b, scaled expression levels with similar slopes were acquired (Fig.S2, bottom): Type II a/b = 3.28 10-7 = k 1 Type III, IPTG 0 μm a/b = 2.79 10-7 = k 2 Type III, IPTG 10 μm a/b = 2.55 10-7 = k 3 Consequently, the linear regression formulas (Eq. 1.1.1-1.1.3) could be rewritten as the following equations (Eq. 1.1.4-1.1.6). Note that the linearity of the chromosomal location-mediated gene dosage effect is maintained by scaling. Type II g( x) k1x+1 (Eq. 1.1.4) Type III, no IPTG g( x) k2x+1 (Eq. 1.1.5) Type III, 10 μm IPTG g( x) k3x+1 (Eq. 1.1.6) The equality test of the regression slopes was performed on the scaled slopes as described above (MSR/MSE = 1.14), and the statistical significance (p-value) was as low as 0.325. This means that 6

the null hypothesis, that is, that the slopes (k 1, k 2 and k 3 ) are the same, could not be rejected. In other words, the coefficients of regressions were equivalent. Supplementary note IV Experimental and analytical details Genetic constructs The E. coli strain MDS42 5 was purchased from Scarab Genomics (Madison, WI, USA) and used as the host strain. The Red system was used to replace the target gene with the reporter gene, gfp, either with or without the universal promoter P tet. The fast-maturing gfp used is identical to the reporter gene used in our previous studies. Both the P tet -containing and GFP-only sequences were PCR-amplified (Table S1) from the pbrgalkgr plasmid (pbr322 derivative), which contained the P tet -gfp-p kan -Km cassette downstream of a terminator sequence for genome replacement, as previously described 6. The manipulation of the homologous recombination and the phenotypic and genetic verification of the transformants (positive colonies) were performed as described previously 6. Subsequently, a plasmid carrying a repressor protein, TetR, for controlling the P tet -derived gene expression was introduced into each construct. The plasmid (pbrtetrcm) was constructed by simply removing the rfp sequence from a previously reported plasmid (pbrintc series) in which the rfp, tetr, and cat genes are under the control of the lac promoter (P lac ) and its operator 6. The final collection of constructs is summarized in Table S2. Cell culture Cell culture was performed using M63 minimal medium supplemented with 19 amino acids at a final concentration of 0.2 mm each and tyrosine at a concentration of 0.05 mm. The cells carrying the plasmid (TetR) were cultured in the presence of the necessary antibiotics, 50 μg/ml ampicillin, 7

and the inducer IPTG as indicated. The cell cultures were controlled during the early logarithmic growth phase, and the final cell concentrations were approximately 10 7 cells/ml. The cell concentration was measured by flow cytometry. All cell cultures were transferred for several passages (days) to acquire multiple measurements of the growth rate and GFP abundance. The growth rates were calculated based on the initial and final cell concentrations, as described previously 9. A total of six to eight replicate cultures were used for each genetic construct (Table S2). Flow cytometry The GFP expression (fluorescence intensity) and relative cell size were evaluated using a flow cytometer (FACSCanto II; Becton Dickinson) equipped with a 488-nm argon laser and a 515 545-nm emission filter (GFP). The following PMT voltage settings were applied: forward scatter (FSC), 280; side scatter (SSC), 400; GFP, 600. The flow rate for the sample measurements was set to low. The cell samples, which were mixed with known concentrations of fluorescent beads (3 μm Fluoresbrite YG Microspheres; Polysciences), were loaded to calculate the cell concentration. The flow data sets were analyzed using custom-designed scripts written in R as previously described 1, 2. Examples are shown in Fig. S1. The data processing procedure is described in detail in Supplementary note I. Omics data and statistical analyses The proteome data sets reporting the mean protein abundance were from Taniguchi et al. 7 and Ishihama et al. 8, respectively. The transcriptome data sets reporting the gene expression profiling of MDS42 and MG1655 were from Ying et al. 9 (GEO Series accession number GSE33212). The mean values of seven replicates of exponential growth at 37 C were applied. The gene names and the chromosomal positions (distances from oric) were based on the following publicly deposited 8

genome sequences: MDS42, DDBJ accession ID AP012306; MG1655, GenBank accession ID U00096; MC4100, GenBank accession ID NC012759. Statistical analyses were performed using the software package R. Pearson s correlation coefficient was used to assess the association between gene expression levels and the distance from oric. The p-value indicates the significance of the correlation by testing the null hypothesis of no linear relationship between expression level and chromosomal location. Supplementary references 1. Y. Shimizu, S. Tsuru, Y. Ito, B. W. Ying and T. Yomo, PLoS One, 2011, 6, e23953. 2. S. Tsuru, J. Ichinose, A. Kashiwagi, B. W. Ying, K. Kaneko and T. Yomo, Phys Biol, 2009, 6, 036015. 3. H. Yoshikawa, A. O'Sullivan and N. Sueoka, Proc Natl Acad Sci U S A, 1964, 52, 973-980. 4. S. Cooper and C. E. Helmstetter, J Mol Biol, 1968, 31, 519-540. 5. G. Posfai, G. Plunkett, 3rd, T. Feher, D. Frisch, G. M. Keil, K. Umenhoffer, V. Kolisnychenko, B. Stahl, S. S. Sharma, M. de Arruda, V. Burland, S. W. Harcum and F. R. Blattner, Science, 2006, 312, 1044-1046. 6. B. W. Ying, Y. Ito, Y. Shimizu and T. Yomo, J Biosci Bioeng, 2010, 110, 529-536. 7. Y. Taniguchi, P. J. Choi, G. W. Li, H. Chen, M. Babu, J. Hearn, A. Emili and X. S. Xie, Science, 2010, 329, 533-538. 8. Y. Ishihama, T. Schmidt, J. Rappsilber, M. Mann, F. U. Hartl, M. J. Kerner and D. Frishman, BMC Genomics, 2008, 9, 102. 9. B. W. Ying, S. Seno, F. Kaneko, H. Matsuda and T. Yomo, BMC Genomics, 2013, 14, 25. 10. J.H. McDonald. Handbook of Biological Statistics, 2nd ed. Sparky House Publishing, Baltimore, Maryland, 2009. 9

Supplementary figures and figure legends Figure S1. FCM data processing. Examples of the data processing, which comprised four steps, are illustrated. 10

Figure S2 Chromosomal location dependency scaled by genome position. The upper and lower panels represent the raw and scaled gene expression levels (mean GFP FI), respectively. The type II (pistachio) and type III constructs (lilac, colors in light and dark indicate the growth conditions in the presence and absence of 10 μm IPTG, respectively) are shown. The correlation coefficients (R 2 ) of the linear regressions (upper panel) were 0.85 (green), 0.75 (dark lilac) and 0.51 (light lilac), and those of the corresponding exponential regressions were 0.87, 0.76 and 0.56, respectively. 11

Supplementary tables Table S1. Primers. Primers used for genome replacement and genetic confirmation of the constructs. Table S2. Strains and clones. The genetically engineered E. coli cells comprising 3 types of genetic designs (types I, II and III) used in this study are summarized. The growth rates within the early exponential growth phase are indicated. μ1 4, μ_ave, and μ_sd indicate the growth rates of the repeated cell cultures, the average growth rate of the corresponding construct, and the standard deviation of the growth rate, respectively. Table S3. The mean values of GFP FI of the three types of genetic constructs. Data sets of the average GFP abundance (mean) and the standard deviation (sd) of repeated experiments are shown. The data sets were used to generate the plots in Fig. 2B. 12