Supplementary Information Computational Methods


Data preprocessing

In this section we describe the preprocessing steps taken to establish the data matrix of hepatocyte single cell gene expression data (Table S1). The main steps were the removal of non-parenchymal cells and the subtraction of background signal for each gene. Such background is predominantly caused by barcode switching and sequencing errors (ref. 1). The resulting data matrix consisted of G = 27297 genes and N = 1415 cells. Each entry in the matrix held the UMI count of gene g in cell c, denoted $\mathrm{UMI}_{g,c}$.

As a first preprocessing step, we removed non-parenchymal cells. To this end we considered cell-type specific sets of genes: the Kupffer cell genes Clec4f, Csf1r, C1qc, C1qa and C1qb; the endothelial cell genes Kdr, Egfl7, Igfbp7 and Aqp1; and the hepatocyte genes Apoa1, Apob, Pck1, G6pc and Ttr. Cells for which the aggregated transcript counts of either the Kupffer set or the endothelial set exceeded the aggregated counts of the hepatocyte set were removed from further analysis.

We next calculated the background expression level for each gene, based on wells in which RNA extraction or amplification failed. These were defined as wells for which the aggregated sum of all ERCC spike-in molecules was greater than 0.04 of the aggregated sum of the non-ERCC molecules. These wells invariably included the four empty wells in each plate. The background expression of each gene was then set to the mean number of molecules across all background cells, and was subtracted from the expression of that gene in the remaining cells. Negative counts were set to zero. Following subtraction of background levels, we filtered out cells with a total number of molecules smaller than 1000 UMI or larger than 30,000 UMI. Finally, we discarded cells in which the expression level of Albumin was lower than 1% of the total cellular UMI. This threshold was chosen based on previous bulk estimates that placed the average Albumin gene expression at ~10% of the total number of hepatocyte cellular mRNA (ref. 2), and on our findings using smFISH that Albumin levels rarely decreased below 10% of this average. Our preprocessing steps yielded 1415 high-confidence hepatocytes.

Algorithm for spatial reconstruction of liver zonation profiles

In this section we describe our algorithm for reconstructing zonation profiles by combining smFISH measurements of landmark genes and single cell RNAseq measurements. Inference of spatial coordinates of cells from single cell RNAseq and traditional binary in-situ hybridization has been recently described (refs. 3,4). Our method differs from these studies in two main aspects: 1) Here we used single molecule FISH rather than traditional FISH, yielding precise continuous cellular gene expression levels of individual cells at defined spatial coordinates, rather than binary expression. 2) While our inference provides the estimated lobule position of each cell, our main goal is reconstruction of zonation profiles. We thus utilize the complete posterior probability vectors of cells to belong to any coordinate, thus maximizing the information used to reconstruct these profiles.

1. Lobule geometry

For simplicity we considered one-dimensional lobule geometry, where lobules consist of hexagonal shaped columns with infinite height and radial symmetry. We further assumed that the relevant coordinate is the distance of each cell from the closest central vein, discretized into 9 lobule layers (layer z = 1 begins at the central vein whereas layer z = 9 ends at the portal node). More complicated topologies that include additional dimensions, such as the distance of each cell to the closest portal node / portal tract or the vertical distance along the lobule column, can be considered in future work. Our topology defines a prior probability of sampling a cell at lobule layer z, $P_{\mathrm{prior}}(z)$ (Extended Data Fig. 3c,d).

2. Probabilistic reconstruction of zonation profiles

Our goal was to infer the gene zonation matrix, which held the average expression level (in fraction of total cellular mRNA) attributed to every gene g at lobule layer z, i.e. $E_{g,z}$. Since the RNA yield in scRNAseq experiments is variable among cells, we normalized the background subtracted expression matrix $\mathrm{UMI}_{g,c}$ by dividing the number of UMI of each gene in each cell by the total number of UMI for that cell to obtain the data matrix:

$$D_{g,c} = \frac{\mathrm{UMI}_{g,c}}{\sum_{g=1}^{G} \mathrm{UMI}_{g,c}} \qquad [1]$$

with g = 1, …, G genes and c = 1, …, N cells (G = 27297 and N = 1415). This normalization facilitated pooling multiple cells to estimate the average expression in each lobule layer. To compute the gene zonation matrix, we multiplied the data matrix $D_{g,c}$ (the expression of each gene in each cell) by a weighted probability matrix, $W_{c,z}$ (the weighted probability for each cell to be at each zone),

$$E_{g,z} = \sum_{c=1}^{N} D_{g,c}\, W_{c,z} \qquad [2]$$

where we used bootstrapping to obtain standard errors for the mean zonation profiles.
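Equations [1] and [2] can be illustrated with a minimal NumPy sketch. The array layout (genes in rows, cells in columns), the function names and the toy values are assumptions made for illustration. The source states only that bootstrapping was used for the standard errors, so the scheme shown here (resampling cells with replacement and renormalizing the resampled weight columns) is one plausible reading, not necessarily the authors' procedure.

```python
import numpy as np

def normalize_umi(umi):
    """Equation [1]: scale each cell (column) so its gene fractions sum to 1."""
    return umi / umi.sum(axis=0, keepdims=True)

def zonation_profiles(D, W):
    """Equation [2]: E[g, z] = sum over cells c of D[g, c] * W[c, z]."""
    return D @ W

def bootstrap_se(D, W, n_boot=500, seed=0):
    """Standard error of E from resampling cells with replacement (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n_cells = D.shape[1]
    draws = np.empty((n_boot, D.shape[0], W.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n_cells, size=n_cells)
        # Renormalize so each layer's weights sum to 1 over the resampled cells.
        W_b = W[idx] / W[idx].sum(axis=0, keepdims=True)
        draws[b] = D[:, idx] @ W_b
    return draws.std(axis=0, ddof=1)

# Toy data: 3 genes x 4 cells of background-subtracted UMI counts (hypothetical values).
umi = np.array([[10., 0., 5., 2.],
                [ 5., 8., 5., 2.],
                [ 5., 2., 0., 6.]])
# Toy weights: 4 cells x 2 layers, strictly positive, each column summing to 1.
W = np.array([[0.4, 0.1],
              [0.3, 0.2],
              [0.2, 0.3],
              [0.1, 0.4]])

D = normalize_umi(umi)
E = zonation_profiles(D, W)
SE = bootstrap_se(D, W)
```

Because every column of D and every column of W sums to 1, each column of E also sums to 1, so the profiles stay in units of fraction of total cellular mRNA.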

The key step in our algorithm was to estimate the weighted probability matrix, $W_{c,z}$. To this end we estimated the posterior probability matrix:

$$P_{c,z} = P^{c}_{\mathrm{posterior}}(z) = \frac{P^{c}_{\mathrm{sampling}}(U \mid z)\, P_{\mathrm{prior}}(z)}{\sum_{z'=1}^{9} P^{c}_{\mathrm{sampling}}(U \mid z')\, P_{\mathrm{prior}}(z')} \qquad [3]$$

The matrix $P_{c,z}$ describes the probability of each cell c to belong to each lobule layer z, given the vector of expression of the 6 landmark genes $U = \{U_1, \ldots, U_6\}$. It consists of N rows representing cells and Z columns representing lobule layers (N = 1415 and Z = 9, Table S2). For each cell, the posterior probability is the product of the sampling distribution, namely the probability to have the given landmark gene expression vector at each layer, $P^{c}_{\mathrm{sampling}}(U \mid z)$, and the prior probability of the cell to belong to that layer, $P_{\mathrm{prior}}(z)$; see Extended Data Fig. 3. The sampling distribution was obtained by measuring the distributions of cellular expression of each of the landmark genes in each layer using smFISH,

$$P^{c}_{\mathrm{sampling}}(U \mid z) = \prod_{g=1}^{6} P^{c}(U_g \mid z),$$

assuming that the expression of the landmark genes is independent (for further details, see section 3). The sampling distribution encapsulates both the gene-specific uncertainty introduced by the spatial variability in gene expression, as well as the cell-specific sampling uncertainty introduced by the sparse sampling of the scRNAseq method. Lastly, we normalized the posterior matrix by the column sums to obtain the weight matrix:

$$W_{c,z} = \frac{P_{c,z}}{\sum_{c=1}^{N} P_{c,z}} \qquad [4]$$
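Equations [3] and [4] amount to a per-cell Bayes update followed by a per-layer normalization. The sketch below assumes the per-gene likelihoods have already been evaluated and stored in a hypothetical array `lik[c, g, z]`; the array layout, function names and toy numbers are illustrative assumptions, not taken from the source.

```python
import numpy as np

def posterior_matrix(lik, prior):
    """Equation [3]: posterior over layers for each cell.

    lik[c, g, z] is the probability of the observed count of landmark gene g
    in cell c, given layer z; prior[z] is the prior probability of layer z.
    """
    sampling = lik.prod(axis=1)      # independence assumption: product over genes
    joint = sampling * prior         # prior broadcasts over cells
    return joint / joint.sum(axis=1, keepdims=True)

def weight_matrix(P):
    """Equation [4]: normalize each layer (column) by its sum over cells."""
    return P / P.sum(axis=0, keepdims=True)

# Toy example: 3 cells, 2 landmark genes, 2 layers (hypothetical numbers).
lik = np.array([[[0.8, 0.2], [0.6, 0.4]],
                [[0.5, 0.5], [0.5, 0.5]],
                [[0.1, 0.9], [0.3, 0.7]]])
prior = np.array([0.25, 0.75])

P = posterior_matrix(lik, prior)
W = weight_matrix(P)
```

In this toy case the second cell has uninformative likelihoods, so its posterior equals the prior. Each row of `P` sums to 1 over layers, and after the column normalization of equation [4] each column of `W` sums to 1 over cells.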

The normalization in equation [4] ensured that the number of cells in each lobule layer did not affect the average layer expression. We next describe our method for obtaining the sampling distribution $P_{\mathrm{sampling}}(U \mid z)$ that was used to obtain the weight matrix $W_{c,z}$.

3. Computing the sampling distribution based on smFISH measurements

Since RNAseq yields a sparse sampling of the cellular mRNA, we chose genes with high levels of expression for our landmark gene panel, to ensure their representation in each of the sequenced cells. As a result, individual mRNA molecules could rarely be discerned in the smFISH images. Thus, we quantified expression as average cellular fluorescence intensity, and converted it to estimates of cellular mRNA counts, as explained below.

For each of our landmark genes, segmented cells from the smFISH images were pooled and divided into 9 equidistant layers according to their normalized distance from the central vein. At every layer z we computed the histogram of expression levels, yielding the sampling distribution in units of fluorescence intensity concentrations, $P_{\mathrm{sampling}}(I_g \mid z)$, where $I_g$ is the cellular fluorescence intensity of gene g. The normalized distance was defined as the ratio between the distance of each cell from the central vein and the distance between the portal node and central vein in each quantified lobule. This normalization corrected for lobules that were not sectioned perpendicularly to the lobule vertical axis.

We sought to convert the distributions of cellular expression levels from units of fluorescence intensity to absolute mRNA counts per cell. To this end, we first used our RNAseq data to compute the average fraction of total cellular mRNA attributed to each of our landmark genes, $F_g$. We multiplied this fraction by an estimate of the total number of mRNA molecules in a tetraploid hepatocyte (the most abundant hepatocyte ploidy class in the mouse ages studied, ref. 5), T, to obtain the average number of mRNA molecules of gene g in a tetraploid hepatocyte, $\langle M_g \rangle = F_g\, T$. The cellular fluorescence intensity of each cell was divided by the average fluorescence intensity over all cells and multiplied by $\langle M_g \rangle$:

$$M_g = \frac{I_g}{\langle I_g \rangle}\, \langle M_g \rangle \qquad [5]$$

Using equation [5] and the normalized distance of each cell we obtained the sampling distribution in units of absolute mRNA molecules, $P_{\mathrm{sampling}}(M_g \mid z)$. We fit these sampling distributions with gamma distributions (Table S7).

To estimate T, the average total number of mRNA molecules in a tetraploid hepatocyte, we used previous smFISH-based absolute measurements of the steady state cellular mRNA content for the genes Ass1, G6pc and Pck1 in liver of fasted mice (ref. 5). We divided the average mRNA content of these genes by their corresponding fraction of the total transcriptome, as obtained from bulk RNAseq measurements of liver tissue (ref. 2). This analysis yielded an estimate of 787,000 mRNA molecules per typical tetraploid hepatocyte.

In the RNAseq procedure, we do not detect all mRNA molecules but only a subsample of them. Subsampling the cellular mRNA broadens the distributions of expression levels. A key feature of our algorithm is that cells with lower levels of sampling have broader sampling distributions due to sampling noise, and therefore contribute less to the reconstructed zonation profiles (since their corresponding values in $W_{c,z}$ used in equation [4] will be smaller). To incorporate this feature we sought to estimate a cell-specific sampling distribution, $P^{c}(U_g \mid z)$, defined as the probability of observing $U_g$ molecules of gene g in cell c in each zone z, given that a sparse sampling of a fraction β of the cellular molecules has been applied. We first estimated β for each cell, as the ratio between the total UMI molecules for that cell and the average number of mRNA molecules per hepatocyte, T. The range of sampling levels was β = 1.49% ± 0.95%. For computational efficiency we discretized the sampling levels into 8 bins representing cells with similar sampling levels and computed a median sampling for each bin, defined as $\beta_d$, d = 1, …, 8.

For each sampling bin we built the sampling distributions as follows: for every landmark gene g in every lobule layer we drew 50,000 values, $M_g$, from the relevant gamma distribution of cellular mRNA expression in that layer, $P_{\mathrm{sampling}}(M_g \mid z)$ (Table S7), and performed a Poisson sampling of this value with parameter $\lambda = \beta_d\, \nu\, M_g$ to obtain the sampled value $m_g$. Here ν = 10 was a factor that corrected for the fact that the smFISH measurements do not capture the entire hepatocyte volume, but rather 0.1 ± 0.04 (median ± median absolute deviation) of the hepatocyte volume, and therefore represent by themselves a sampling which broadens the true distribution of cellular mRNA (ref. 5). We multiplied the obtained sampled values $m_g$ by $\frac{\beta}{\beta_d} \cdot \frac{1}{\nu}$, to ensure that they matched the cellular UMI modeled. This generated the sampling distribution $P^{c}(U_g \mid z)$, used in equation [3] to generate for each cell the posterior probability $P_{c,z}$ of originating from any of the z = 1, …, 9 layers.

Data visualization

For the tSNE data visualization (Fig. 3a,b, Extended Data Fig. 2) we used an adjusted version of the RaceID software (ref. 6), on all of the single liver cells acquired, including the non-parenchymal cells filtered out (see Data preprocessing section). We excluded hepatocytes with less than 1% Albumin out of the total transcript counts. We also excluded cells with less than 600 or more than 50,000 UMI per cell. For each cell the UMI counts of every gene were divided by the summed UMI count of all genes, and multiplied by the median across all cells. Following addition of a pseudocount of 0.1 to the expression data, genes that contained less than a single transcript or more than 1000 transcripts in at least 1100 cells were removed. Additionally, genes that had more than 1000 transcripts in any cell after normalization were removed as well, resulting in 1724 cells and 753 genes. Dimensionality reduction and visualization were performed with t-distributed stochastic neighbor embedding (t-SNE, ref. 7). Selected gene sets were colored according to the aggregated log-expression of the normalized data.

1. Jaitin, D. A. et al. Massively parallel single cell RNA-Seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).
2. Atger, F. et al. Circadian and feeding rhythms differentially affect rhythmic mRNA transcription and translation in mouse liver. Proc. Natl. Acad. Sci. U. S. A. 112, E6579–6588 (2015).
3. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
4. Achim, K. et al. High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat. Biotechnol. 33, 503–509 (2015).
5. Bahar Halpern, K. et al. Bursty Gene Expression in the Intact Mammalian Liver. Mol. Cell 58, 147–156 (2015).
6. Grün, D. et al. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525, 251–255 (2015).
7. van der Maaten, L. & Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).