Nonlinear, Nongaussian Ensemble Data Assimilation with Rank Regression and a Rank Histogram Filter

Jeff Anderson, NCAR Data Assimilation Research Section

Schematic of a Sequential Ensemble Filter 1. Use model to advance ensemble (3 members here) to time at which next observation becomes available. Ensemble state estimate after using previous observation (analysis) Ensemble state at time of next observation (prior)

Schematic of a Sequential Ensemble Filter 2. Get prior ensemble sample of observation, y = h(x), by applying forward operator h to each ensemble member. Theory: observations from instruments with uncorrelated errors can be done sequentially. Can think about single observation without (too much) loss of generality.

Schematic of a Sequential Ensemble Filter 3. Get observed value and observational error distribution from observing system.

Schematic of a Sequential Ensemble Filter 4. Find the increments for the prior observation ensemble (this is a scalar problem for uncorrelated observation errors). Ensemble Kalman filters assume Gaussianity for this problem. Can compute increments without Gaussian assumptions. (Old news).
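The Gaussian version of this scalar observation-space update can be sketched as a shift-and-scale of the prior ensemble. This is a minimal EAKF-style sketch, not DART's implementation; the function and variable names are illustrative:

```python
import numpy as np

def eakf_obs_increments(y_prior, y_obs, obs_err_var):
    """Deterministic (EAKF-style) increments for a scalar observed variable.

    y_prior: 1-D array of prior ensemble values of y = h(x).
    Returns one increment per ensemble member.
    """
    prior_mean = y_prior.mean()
    prior_var = y_prior.var(ddof=1)
    # Product of Gaussians: posterior variance and mean
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_err_var)
    post_mean = post_var * (prior_mean / prior_var + y_obs / obs_err_var)
    # Shift and contract the ensemble so its first two moments match the posterior
    y_post = post_mean + np.sqrt(post_var / prior_var) * (y_prior - prior_mean)
    return y_post - y_prior
```

Adding these increments to the prior observation ensemble gives an ensemble whose sample mean and variance match the Gaussian posterior exactly.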

Schematic of a Sequential Ensemble Filter 5. Use ensemble samples of y and each state variable to linearly regress observation increments onto state variable increments. Theory: impact of observation increments on each state variable can be handled independently!

Schematic of a Sequential Ensemble Filter 5. Use ensemble samples of y and each state variable to linearly regress observation increments onto state variable increments. Can solve this bivariate problem in other ways. Topic of this talk.

Schematic of a Sequential Ensemble Filter 6. When all ensemble members for each state variable are updated, there is a new analysis. Integrate to the time of the next observation.

Focus on the Regression Step
Standard ensemble filters just use bivariate sample linear regression to compute state increments: Δx_{i,n} = β Δy_n, n = 1, ..., N, where N is the ensemble size and β is the regression coefficient.
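A minimal sketch of this standard regression step, assuming the regression coefficient is computed directly from prior ensemble sample statistics (names are illustrative):

```python
import numpy as np

def regress_increments(x_prior, y_prior, y_increments):
    """Map observation-space increments onto one state variable by
    bivariate sample linear regression (the standard ensemble step)."""
    # beta = sample cov(x, y) / sample var(y)
    beta = np.cov(x_prior, y_prior, ddof=1)[0, 1] / np.var(y_prior, ddof=1)
    return beta * y_increments
```

When the prior relation between x and y is exactly linear, this reproduces the observation increments scaled by the slope; for nonlinear relations it is only a best linear fit, which motivates the alternatives below.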

Focus on the Regression Step
Examine another way to increment the state given observation increments; still bivariate.

Nonlinear Regression Example
Try to exploit a nonlinear prior relation between a state variable and an observation. Example: observation y ~ log(x). Also relevant for variables that are log transformed for boundedness (like concentrations).

Standard Ensemble Adjustment Filter (EAKF)
Same example: y ~ log(x).

Standard Rank Histogram Filter (RHF)
Same example: y ~ log(x).

Rank Regression
1. Convert bivariate ensemble to bivariate rank ensemble.
2. Do least squares on bivariate rank ensemble.
(Figures: standard versus rank regression for a noisy relation and for a monotonic relation.)

Rank Regression
1. Convert bivariate ensemble to bivariate rank ensemble.
2. Do least squares on bivariate rank ensemble.
3. Convert observation posteriors to rank.
4. Regress rank increments onto state ranks.
5. Convert posterior state ranks to state values.
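The five steps above can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm: ties are ignored, and posterior observation values (and state ranks) outside the prior range are clipped by the interpolation rather than handled with explicit tail distributions.

```python
import numpy as np

def ranks(v):
    """1-based ranks of ensemble members (ties ignored for simplicity)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(1, len(v) + 1)
    return r

def rank_regression_update(x_prior, y_prior, y_post):
    """Sketch of the rank-regression state update (steps 1-5 above)."""
    n = len(x_prior)
    # 1. convert bivariate ensemble to bivariate rank ensemble
    rx, ry = ranks(x_prior), ranks(y_prior)
    # 2. least squares on the rank ensemble
    beta = np.cov(rx, ry, ddof=1)[0, 1] / np.var(ry, ddof=1)
    # 3. fractional ranks of posterior y within the sorted prior y ensemble
    ry_post = np.interp(y_post, np.sort(y_prior), np.arange(1, n + 1))
    # 4. regress rank increments onto state ranks
    rx_post = rx + beta * (ry_post - ry)
    # 5. map posterior state ranks back to state values by interpolation
    #    among the sorted prior state ensemble
    return np.interp(rx_post, np.arange(1, n + 1), np.sort(x_prior))
```

With zero observation increments the update leaves the state ensemble unchanged, and for any monotonic prior relation the rank ensembles are perfectly correlated (β = 1), which is why the method follows such relations exactly.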

Rank Regression Example
Rank regression with the RHF for the observation marginal follows a monotonic ensemble prior exactly.

Compare Six Treatments
Each treatment pairs an observation-space method:
1. Deterministic Ensemble Adjustment Kalman Filter (EAKF),
2. Stochastic Ensemble Kalman Filter (EnKF),
3. Non-Gaussian Rank Histogram Filter (RHF)
with a regression method:
A. Standard linear regression,
B. Rank regression.

Empirically Tuned Localization and Inflation
Select the Gaspari-Cohn localization half-width (0.125, 0.15, 0.175, 0.2, 0.25, 0.4, infinite) and fixed multiplicative inflation (1.0, 1.02, 1.04, 1.08, 1.16, 1.32, 1.64) that give the minimum time-mean posterior RMSE (7x7 = 49 possibilities checked). Find the best localization/inflation pair for each of the six treatments.
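The 7x7 tuning search amounts to a brute-force grid minimization, sketched below; `run_rmse` is a hypothetical function that runs one assimilation experiment with the given pair and returns its time-mean posterior RMSE:

```python
import itertools

def tune(run_rmse, half_widths, inflations):
    """Return the (localization half-width, inflation) pair with the
    minimum time-mean posterior RMSE over the full grid."""
    return min(itertools.product(half_widths, inflations),
               key=lambda pair: run_rmse(*pair))
```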

Lorenz-96 Tests
40 state variables. 40 randomly located observing stations around the unit circle (x_s is the state interpolated to station location s). Explored 4 different forward operators:
Identity: y_s = x_s,
Square root: y_s = sqrt(|x_s|),
Square: y_s = x_s^2,
Cube: y_s = x_s^3.
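The four forward operators can be sketched in one helper. Taking |x_s| inside the square root is an assumption made here so the operator is defined for negative Lorenz-96 state values:

```python
import numpy as np

def forward_operator(x, kind):
    """The four forward operators tested on the Lorenz-96 states."""
    if kind == "identity":
        return x
    if kind == "sqrt":
        # |x| keeps the operator defined for negative states (assumption)
        return np.sqrt(np.abs(x))
    if kind == "square":
        return x ** 2
    if kind == "cube":
        return x ** 3
    raise ValueError(f"unknown forward operator: {kind}")
```

Note that the square operator discards the sign of the state, which is why the results slides flag it as not invertible.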

Lorenz-96 Tests: Statistical Significance
Select the rank regression treatment with lowest RMSE (this picks inflation, localization, and observation-space filter: EAKF, EnKF, or RHF). Select the linear regression treatment with lowest RMSE. Apply each to ten 5000-step assimilation cases. The result is 10 pairs of RMSE for rank versus linear regression. A paired t-test is used to establish the significance of the RMSE differences.
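The paired t statistic for the 10 matched RMSE samples can be computed directly (scipy.stats.ttest_rel would give the same statistic plus a p-value); this is an illustrative sketch, not the authors' code:

```python
import math

def paired_t(a, b):
    """Paired t statistic for matched samples a and b (e.g. RMSE from
    rank vs. linear regression over the same assimilation cases)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance of differences
    return mean / math.sqrt(var / n)
```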

Identity Forward Operator Results
Observation error variance 1.0. Triangle: RHF best; circle: EnKF best. Large marker: p-value < 0.00001; small marker: p-value > 0.01, not significant.

Identity Forward Operator Results
Observation error variance 2.0. Triangle: RHF best; circle: EnKF best.

Identity Forward Operator Results
Observation error variance 4.0. Triangle: RHF best; circle: EnKF best.

Identity Forward Operator Results
Observation error variance 16.0. Triangle: RHF best; circle: EnKF best.

Summary: Identity Forward Operator Results
RHF better for larger RMSE, EnKF for smaller. Linear regression always better than rank regression.

Square Root Forward Operator Results
Observation error variance 0.25. Triangle: RHF best; circle: EnKF best. Red: rank regression best; blue: linear regression best.

Square Root Forward Operator Results
Observation error variance 0.5.

Square Root Forward Operator Results
Observation error variance 1.0.

Square Root Forward Operator Results
Observation error variance 2.0.

Summary: Square Root Forward Operator Results
RHF with rank regression better for larger RMSE. EnKF with linear regression better for smaller RMSE. (Blue circle: EnKF with linear regression best; red triangle: RHF with rank regression best.)

Square Root Forward Operator: Ensemble Size
20 members: EAKF with linear regression always best. (Blue x: EAKF with linear regression best.)

Square Root Forward Operator: Ensemble Size
40 members: RHF with rank regression for large RMSE. EAKF/linear regression for intermediate RMSE. EnKF/linear for smaller RMSE.

Square Root Forward Operator: Ensemble Size
80 members: RHF with rank regression for large RMSE. EnKF/linear for smaller RMSE.

Square Root Forward Operator: Ensemble Size
160 members: RHF with rank regression for large RMSE. EnKF/linear for smaller RMSE.

Summary: Cube Forward Operator Results
RHF with rank regression better for larger RMSE. EnKF with linear regression better for smaller RMSE. (Blue circle: EnKF with linear regression best; red triangle: RHF with rank regression best.)

Summary: Square Forward Operator Results
RHF/linear better for larger RMSE. Mixed for smaller RMSE. Note: this observation operator is not invertible!

Computational Cost
Computational cost for the state variable update:
Base regression: O(m^2 n),
Rank regression: O(m^2 n log n),
where m is the state size plus the number of observations and n is the ensemble size. The average rank cost can be made O(m^2 n) with some work (smart sorting and searching). Good for GPUs (more computation per byte).

Conclusions
Non-Gaussian observation-space methods can be superior to Gaussian ones. Nonlinear regression can be superior in cases of strong nonlinearity. Other nonlinear regressions (e.g. polynomial) can be effective. Ensemble method details make significant differences in performance. NWP applications should be developed carefully.

All results produced with DART/DARTLAB tools, freely available in DART: www.image.ucar.edu/dares/dart
Anderson, J., Hoar, T., Raeder, K., Liu, H., Collins, N., Torn, R., and Arellano, A., 2009: The Data Assimilation Research Testbed: A community facility. BAMS, 90, 1283-1296, doi:10.1175/2009BAMS2618.1

Multi-valued, not smooth example
Similar in form to a wind speed observation with a state velocity component.

Multi-valued, not smooth example
Standard regression does not capture the bimodality of the state posterior.

Multi-valued, not smooth example
Rank regression is nearly identical in this case.