CHAPTER II EXPLORATORY FACTOR ANALYSIS

Section I - Introduction:
In maximum likelihood estimation and hypothesis testing, the true values of the model parameters are viewed as fixed but unknown, and the estimates of those parameters from a given sample are viewed as random but known. An alternative kind of statistical inference, called the Bayesian approach, views any quantity that is unknown as a random variable and assigns it a probability distribution. From a Bayesian standpoint, true model parameters are unknown and therefore considered to be random, and they are assigned a joint probability distribution. This distribution is not meant to suggest that the parameters are varying or changing in some fashion. Rather, the distribution is intended to summarize the state of knowledge, or what is currently known about the parameters. The distribution of the parameters before the data are seen is called a prior distribution. Once the data are observed, the evidence provided by the data is combined with the prior distribution by a well-known formula called Bayes' theorem. The result is an updated distribution for the parameters, called a posterior distribution, which reflects a combination of prior belief and empirical evidence.

This chapter on exploratory factor analysis covers Markov chain Monte Carlo (MCMC) simulation techniques, high-dimensional joint posterior distributions, maximum likelihood analysis, regression weights, intercepts, covariances, and variances, leading to the discussion of the first model of this thesis, namely Bayesian estimation.

Section II - Bayesian Estimation:
2.2.1 Bayesian Analysis:
Human beings tend to have difficulty visualizing and interpreting the joint posterior distribution for the parameters of a model. Therefore, when performing a Bayesian analysis, one needs summaries of the posterior distribution that are easy to interpret.

A good way to start is to plot the marginal posterior density for each parameter, one at a time. Often, especially with large data samples, the marginal posterior distributions for parameters tend to resemble normal distributions. The mean of a marginal posterior distribution, called a posterior mean, can be reported as a parameter estimate. The posterior standard deviation, the standard deviation of the distribution, is a useful measure of uncertainty similar to a conventional standard error. The analogue of a confidence interval may be computed from the percentiles of the marginal posterior distribution; the interval that runs from the 2.5th percentile to the 97.5th percentile forms a Bayesian 95% credible interval. If the marginal posterior distribution is approximately normal, the 95% credible interval will be approximately equal to the posterior mean ± 1.96 posterior standard deviations. In that case, the credible interval becomes essentially identical to an ordinary confidence interval that assumes a normal sampling distribution for the parameter estimate. If the posterior distribution is not normal, the interval will not be symmetric about the posterior mean. In that case, the Bayesian version often has better properties than the conventional one.

Although the idea of Bayesian inference dates back to the late 18th century, its use by statisticians has been rare until recently. For some, reluctance to apply Bayesian methods stems from a philosophical distaste for viewing probability as a state of belief and from the inherent subjectivity in choosing prior distributions. But for the most part, Bayesian analyses have been rare because computational methods for summarizing joint posterior distributions have been difficult or unavailable. Using a new class of simulation techniques called Markov chain Monte Carlo (MCMC), however, it is now possible to draw random values of parameters from high-dimensional joint posterior distributions, even in complex problems. With MCMC, obtaining posterior summaries becomes as simple as plotting histograms and computing sample means and percentiles.
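The posterior summaries described above can be illustrated with a minimal Python sketch. It is not Amos output: the draws are simulated from an assumed, roughly normal marginal posterior (mean 0.5, standard deviation 0.1), and the helper name `percentile` is hypothetical. The sketch computes the posterior mean, the posterior standard deviation, and the 2.5th and 97.5th percentiles, and compares the percentile interval with the mean ± 1.96 standard deviations approximation.

```python
import random
import statistics


def percentile(sorted_draws, q):
    """Return the q-th quantile (0 <= q <= 1) by nearest rank."""
    index = round(q * (len(sorted_draws) - 1))
    return sorted_draws[index]


# Stand-in for MCMC output (an assumption): 20,000 draws from a roughly
# normal marginal posterior with mean 0.5 and standard deviation 0.1.
rng = random.Random(1)
draws = [rng.gauss(0.5, 0.1) for _ in range(20000)]

post_mean = statistics.fmean(draws)   # Bayesian point estimate
post_sd = statistics.stdev(draws)     # analogue of a standard error

ordered = sorted(draws)
lower = percentile(ordered, 0.025)    # 95% credible interval, lower end
upper = percentile(ordered, 0.975)    # 95% credible interval, upper end

# Normal approximation: posterior mean +/- 1.96 posterior SDs.
normal_lower = post_mean - 1.96 * post_sd
normal_upper = post_mean + 1.96 * post_sd

print(f"posterior mean {post_mean:.3f}, posterior SD {post_sd:.3f}")
print(f"percentile interval  ({lower:.3f}, {upper:.3f})")
print(f"normal approximation ({normal_lower:.3f}, {normal_upper:.3f})")
```

Because the simulated posterior is normal, the two intervals nearly coincide; with a skewed posterior the percentile interval would no longer be symmetric about the mean.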

2.2.2 Selecting Priors:
A prior distribution quantifies the researcher's belief concerning where the unknown parameter may lie. Knowledge of how a variable is distributed in the population can sometimes be used to help researchers select reasonable priors for parameters of interest. For example, if a test normed to a population mean of 100 and a standard deviation of 15 is given to participants in a study who are fairly representative of the general population, then it would be reasonable to center the prior distributions for the mean and standard deviation of the test score at 100 and 15, respectively. Knowing that an observed variable is bounded may help one to place bounds on the parameters. Prior distributions for the mean and variance of such a variable can be specified to enforce these bounds.

In many cases, researchers would like to specify a prior distribution that introduces as little information as possible, so that the data may be allowed to speak for themselves. A prior distribution is said to be diffuse if it spreads its probability over a very wide range of parameter values. By default, Amos applies a uniform distribution from $-3.4 \times 10^{38}$ to $3.4 \times 10^{38}$ to each parameter. Diffuse prior distributions are often said to be non-informative, and we will use that term as well. In a strict sense, however, no prior distribution is ever completely non-informative, not even a uniform distribution over the entire range of allowable values, because it would cease to be uniform if the parameter were transformed (if the variance of a variable is uniformly distributed from 0 to ∞, then the standard deviation will not be uniformly distributed). Every prior distribution carries with it at least some information. As the size of a dataset grows, the evidence from the data eventually swamps this information, and the influence of the prior distribution diminishes.

Unless a sample is unusually small, or a model and/or prior distribution are strongly contradicted by the data, one will find that the answers from a Bayesian analysis tend to change very little if the prior is changed. Amos makes it easy to change the prior distribution for any parameter, so this kind of sensitivity check is easy to perform.
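The claim that the data eventually swamp the prior can be checked with a small sensitivity sketch. It assumes a much simpler setting than an Amos model: a normal mean with known data standard deviation and a conjugate normal prior, for which the posterior mean is a precision-weighted average. All numbers (observed mean 110, data SD 15, prior SDs 5 and 1000) and the function names are hypothetical.

```python
def posterior_mean(prior_mean, prior_sd, data_mean, data_sd, n):
    """Posterior mean of a normal mean under a conjugate normal prior.

    The result is a precision-weighted average of the prior mean and
    the observed sample mean (known data standard deviation assumed).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / data_sd ** 2
    weighted = prior_precision * prior_mean + data_precision * data_mean
    return weighted / (prior_precision + data_precision)


def prior_gap(n):
    """Difference between answers under an informative and a diffuse prior."""
    informative = posterior_mean(100.0, 5.0, 110.0, 15.0, n)
    diffuse = posterior_mean(100.0, 1000.0, 110.0, 15.0, n)
    return abs(informative - diffuse)


gap_small_sample = prior_gap(5)     # prior choice matters noticeably
gap_large_sample = prior_gap(500)   # prior choice barely matters

print(f"n = 5:   posterior means differ by {gap_small_sample:.2f}")
print(f"n = 500: posterior means differ by {gap_large_sample:.2f}")
```

With 5 observations the two priors give posterior means several points apart; with 500 observations the difference shrinks to a fraction of a point, which is the pattern a sensitivity check is meant to reveal.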

2.2.3 Gaussian graphical model:
Former and present researchers are concerned with datasets d in which a large number p (for example, tens of hundreds) of variables is recorded and the sample size n is relatively small (for example, tens or possibly hundreds of observations). Through suitable transformations, d can sometimes be assumed to roughly follow a multivariate Gaussian distribution $N_p(0, \Sigma)$. Directly attempting to fit this apparently simple model, with $p(p+1)/2$ parameters represented by the entries of Σ, raises challenging questions of structuring and dimensionality reduction in parameter space. Dempster (1972) introduced the idea of reducing the number of parameters that need to be estimated by setting to zero selected elements of the precision matrix $\Omega = \Sigma^{-1}$. This can, and generally will, lead to more robust estimates of Σ if Ω is required to have a substantial number of structural zeros. In addition, the dependency patterns among the variables in d can be visually summarized by means of an undirected independence graph G in which each variable is associated with a vertex and the edges that link the vertices are the off-diagonal elements of Ω that are not constrained to be zero. The resulting Gaussian distribution satisfies a set of conditional independence relations encoded by G. These relations are called the pairwise, local and global Markov properties, while the pair M = (Σ, G) is called a Gaussian graphical model (Lauritzen, 1996). This model is undirected since the edges in G are lines that represent symmetric associations.

2.2.4 Directed acyclic graph for data:
The interest lies in performing covariance selection (Dempster, 1972) with the objective of identifying a number of Gaussian graphical models that are best supported by the data and, in a Bayesian framework, by the prior information available. Inference on the parameters of Σ (equivalently, Ω) can subsequently be done by Bayesian model averaging across the pool of models selected.
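To make the link between structural zeros in Ω and missing edges in G concrete, here is a minimal sketch on a hypothetical three-variable example. The covariance matrix is chosen (as an assumption) so that variables 0 and 2 are related only through variable 1; the helper names `inverse_3x3`, `graph_edges` and `n_covariance_parameters` are illustrative and not part of any library.

```python
def n_covariance_parameters(p):
    """Number of free entries in a symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


def inverse_3x3(m):
    """Invert a 3 x 3 matrix with the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adjugate = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[entry / det for entry in row] for row in adjugate]


def graph_edges(precision, tolerance=1e-12):
    """Edges of G: pairs whose off-diagonal precision entry is non-zero."""
    p = len(precision)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(precision[i][j]) > tolerance]


# Hypothetical correlations: r01 = 0.6, r12 = 0.5, and r02 = r01 * r12,
# so variables 0 and 2 are conditionally independent given variable 1.
r01, r12 = 0.6, 0.5
sigma = [[1.0, r01, r01 * r12],
         [r01, 1.0, r12],
         [r01 * r12, r12, 1.0]]

omega = inverse_3x3(sigma)
edges = graph_edges(omega)

print("free parameters in Sigma:", n_covariance_parameters(3))
print("edges of G:", edges)   # the pair (0, 2) is absent
```

Every entry of Σ is non-zero here, yet the (0, 2) entry of Ω vanishes, so the independence graph is the chain 0 - 1 - 2. Covariance selection searches for such zero patterns rather than assuming them.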
Searching for graphical models with tens of thousands of nodes is an extremely difficult task, statistically and computationally, due to the vast space of possible graphs that needs to be explored. The majority of the structural learning methods developed so far involve exploring the target space by sequentially adding (deleting) one or more edges to (from) the current graph. In the special case when decomposable graphs G are the only graphs considered, the search space is considerably reduced and there exist conjugate prior distributions for the parameters of M = (Σ, G) (Lauritzen, 1996) that lead to exact formulas for the marginal likelihood of M, $p(d \mid M)$. Unfortunately, there are two major shortcomings that make decomposable graphs less desirable: (i) the learning procedure is slowed down by the need to determine which edges can be changed so that the resulting graph is still decomposable; and, much more importantly, (ii) the decomposability constraint is simply too severe to yield models that are representative of the complex dependency patterns that exist among the variables in d in other than rather small dimensional problems.

In the most general case, when all the possible graphs are considered, numerical or stochastic methods for approximately computing the posterior probability of M need to be employed (Roverato, 2002; Atay-Kayis and Massam, 2003; Dellaportas et al., 2004), which result in search procedures that cannot efficiently cover huge sets of graphs. For a comprehensive review of learning Gaussian graphical models in moderately large datasets, see Jones et al. (2004).

An alternative method of performing covariance selection is to exploit the connection between graphical models on undirected graphs and graphical models on directed acyclic graphs (DAGs, henceforth). The latter distributions follow the order Markov property relative to their underlying DAGs and further obey the Markov properties with respect to the moralized undirected versions of these DAGs (Lauritzen, 1996). A DAG is a convenient graphical structure that induces a recursive factorization of the joint density as a product of univariate regressions associated with each variable.
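The recursive factorization can be checked numerically on a small case. The sketch below assumes a hypothetical two-node linear Gaussian DAG, x1 -> x2 with x2 = 0.8 x1 + noise; it sums the log densities of the univariate regressions and compares the result with the bivariate normal log density implied by the same DAG. The function names and all coefficients are illustrative assumptions.

```python
import math


def normal_logpdf(x, mean, variance):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * variance)
                   + (x - mean) ** 2 / variance)


def dag_logpdf(x, parents, weights, variances):
    """Sum the log densities of the univariate regressions, one per node."""
    total = 0.0
    for node, value in x.items():
        mean = sum(weights[(parent, node)] * x[parent]
                   for parent in parents[node])
        total += normal_logpdf(value, mean, variances[node])
    return total


# Hypothetical DAG: x1 has no parents; x2 is regressed on x1.
parents = {"x1": [], "x2": ["x1"]}
weights = {("x1", "x2"): 0.8}
variances = {"x1": 1.0, "x2": 0.5}   # residual variances
point = {"x1": 0.3, "x2": -1.2}

factorized = dag_logpdf(point, parents, weights, variances)

# Joint bivariate normal with the covariance implied by the DAG:
# Var(x1) = 1, Cov(x1, x2) = 0.8, Var(x2) = 0.8^2 * 1 + 0.5.
var1, cov, var2 = 1.0, 0.8, 0.8 ** 2 * 1.0 + 0.5
det = var1 * var2 - cov ** 2
quad = (var2 * point["x1"] ** 2
        - 2.0 * cov * point["x1"] * point["x2"]
        + var1 * point["x2"] ** 2) / det
joint = -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

print(f"factorized: {factorized:.6f}, joint: {joint:.6f}")
```

The two numbers agree up to rounding, which is the chain rule at work: the joint density equals the product of each node's regression on its parents.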

Variables are linked in a DAG with arrows instead of lines. An arrow points from an explanatory variable (the parent) to the response (the child). The decomposition of a multivariate joint distribution induced by a DAG is a straightforward generalization of the usual chain rule and yields exact formulas for computing the marginal likelihood of the corresponding model (Heckerman and Geiger, 1995; Geiger and Heckerman, 2002). Thus DAG models have properties similar to those of graphical models on decomposable graphs. In fact, any decomposable graph can be transformed into a DAG using an ordering generated by the maximum cardinality search algorithm (Lauritzen, 1996), which implies that the class of decomposable graphs is included in the class S of undirected graphs that can be obtained from DAGs by moralizing (ensuring an edge exists between the parents of each child) and replacing the arrows with undirected edges.

Searching the space of DAGs can be done using local moves involving the addition, deletion or reversal of arrows. Unfortunately, methods based on local moves can spend much time traversing DAGs that are statistically equivalent (Heckerman et al., 1994). Two equivalent DAGs describe the same joint distribution and consequently the same Markov relations. Chickering (1995) presented characterizations of equivalent DAGs and introduced search algorithms that jump between Markov equivalence classes. These search methods, although proven to be better than simple algorithms based on local moves, are unfortunately not efficient enough to scale to datasets with tens of thousands of variables.

A novel framework for constructing high-dimensional Gaussian graphical models by searching for graphical models on DAGs was presented. This approach builds on the methods introduced in Dobra and West (2004a) and is guaranteed to eventually converge to local optima in the space of undirected graphs S, in the sense of identifying local modes in posterior distributions over S based on dataset d.
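The moralization step mentioned above (linking the parents of each child and then dropping the arrow directions) can be sketched in a few lines. The DAG below is a hypothetical four-node example, and `moralize` is an illustrative helper rather than a function from any particular package.

```python
from itertools import combinations


def moralize(parents):
    """Return the undirected edge set of the moral graph of a DAG.

    `parents` maps each node to the list of its parents. Every arrow
    becomes an undirected edge, and every pair of parents sharing a
    child is joined ("married") by an extra edge.
    """
    edges = set()
    for child, node_parents in parents.items():
        for parent in node_parents:
            edges.add(frozenset((parent, child)))
        for first, second in combinations(node_parents, 2):
            edges.add(frozenset((first, second)))
    return edges


# Hypothetical DAG: A -> C <- B, and C -> D.
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
moral_edges = moralize(dag)

for edge in sorted(tuple(sorted(e)) for e in moral_edges):
    print(edge)
```

The edge A - B appears in the moral graph although the DAG has no arrow between A and B, because both are parents of C.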

2.2.5 HdBCS:
Related work presented a novel structural learning method called HdBCS that performs covariance selection in a Bayesian framework for datasets with tens of thousands of variables. HdBCS is based on the intrinsic connection between graphical models on undirected graphs and graphical models on directed acyclic graphs (Bayesian networks). That work showed how to produce and explore the corresponding association networks by Bayesian model averaging across the models identified. The use of HdBCS has been illustrated with an example from a large-scale gene expression study of breast cancer.

Section III - Bayesian Estimation for Blood Cancer:
2.3.1 Introduction:
As noted in previous work, including the Bayesian analysis of zero-inflated count data with applications to dental caries by Dipankar Bandyopadhyay, experimental and observational studies in high-throughput genomics often generate multiple gene expression signatures, each signature being a list of genes with associated numerical measures of change in gene expression relative to an experimental condition or outcome. A biological or environmental design factor in a controlled experiment generates a signature of response to that factor (Huang et al., 2003c,b; Bild et al., 2006; Chen et al., 2007), while evaluation of gene expression related to a specific clinical outcome or state may generate a signature as a biomarker of the outcome in disease studies (West et al., 2001; Huang et al., 2002, 2003a; Pittman et al., 2004; Seo et al., 2004; Rich et al., 2005; Seo et al., 2007). Interpretation and, often, follow-on biological studies rely on the comparison of such signatures with multiple, annotated biological pathway databases that contain lists of putatively pathway-specific genes based on cumulated biological research. A core challenge is then to assess the candidate signature gene sets and numerical summaries against these databases to suggest potential pathway interpretations and connections. The focus here was a formal, novel statistical modeling approach to this problem.

2.3.2 Gene set enrichment analysis (GSEA):
The first statistical approach, and general identification of this problem area, led to the method of gene set enrichment analysis (GSEA) (Subramanian et al., 2005) and has generated some deeper statistical approaches more recently (Newton et al., 2007). GSEA aims to measure aggregate association between a full list of genes ranked by their association with an outcome, also referred to as a phenotype, and a set of genes in a predefined pathway gene set. The underlying idea is to assess whether or not the pathway gene set is enriched with genes that score highly in association with the experimental outcome, perhaps with a directional component that looks separately at genes positively versus negatively associated. GSEA was path-breaking and is now quite widely used.

Previous applied work was interested in broader questions and also in formal statistical inference on gene-pathway membership, and this motivated a formal probabilistic framework that extends the basic thinking into a broader statistical approach. The resulting probabilistic pathway annotation (PROPA) methodology also addresses a number of issues that GSEA methods were not designed for, including the abilities to:
(a) deliver formal probabilistic assessments of phenotype-pathway concordance, in terms of marginal likelihoods and posterior probabilities;
(b) formally assess concordance of experimental results with several or many biological pathways simultaneously and in comparison with each other;
(c) recognize that experimental inferences and established biological pathway databases are error prone, and allow for the identification and correction of errors of both kinds within the analysis;
(d) utilize a range of direct numerical measures of association between genes and an experimental outcome as inputs; and
(e) provide a more general framework that can be customized to apply to the outputs of gene expression, or other genomic studies of many forms.
In addition to within-analysis robustness, item (c) also leads to an ability to suggest refinements to the pathway gene lists in established biological databases.
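The basic enrichment idea behind GSEA can be illustrated with a deliberately simplified running-sum score. This sketch is an unweighted, Kolmogorov-Smirnov-style statistic and is not the exact weighted statistic of Subramanian et al. (2005), nor the PROPA model; the gene names and the function `enrichment_score` are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Simplified, unweighted running-sum enrichment score.

    Walk down the ranked list (assumed to contain no duplicates),
    stepping up by 1/n_hits at each gene in the set and down by
    1/n_misses otherwise. Return the running-sum value with the
    largest absolute deviation from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hits = len(members)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must overlap the list without covering it")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hits if gene in members else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list, most positively associated gene first.
ranked = [f"g{k}" for k in range(1, 11)]

es_top = enrichment_score(ranked, {"g1", "g2", "g4"})       # near the top
es_bottom = enrichment_score(ranked, {"g8", "g9", "g10"})   # at the bottom

print(f"set near the top:  ES = {es_top:+.3f}")
print(f"set at the bottom: ES = {es_bottom:+.3f}")
```

A gene set concentrated near the top of the ranking gives a large positive score, and one concentrated at the bottom gives a large negative score, which corresponds to the directional component mentioned above.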

The corresponding focus was on applications in cancer genomics. While the primary aim of the research is to highlight the area and applications, the statistical methodology has modeling and computational novelty. A core ingredient of biological pathway assessment is the evaluation of marginal likelihoods in Bayesian models fitted using MCMC methods. Marginal likelihood computations are common and often hard problems (see Raftery et al. (2007) for a recent approach with discussion and many references to other approaches), especially in cases, such as here, of high-dimensional parameter spaces. The favored approach involves a novel extension of variational methods that have been applied in other problems of marginal likelihood computation (for example, Jordan et al. (1999); Corduneanu and Bishop (2001); McGrory and Titterington (2007)); the extension of existing variational methods introduced to address this problem in that specific applied context will apply in many other model contexts. That work describes the overall MCMC strategy for posterior simulation in an analysis focused on a single biological pathway, and the development of computational methods for marginal likelihood computation to aid in comparisons of multiple biological pathways. This includes the innovations in variational methodology. It explores examples to highlight the specification and use of the model. The first cancer genomics application concerns a detailed study of two well-known hormonal pathways in blood cancer. The second application concerns novel experimental data arising in studies of micro-environmental influences on gene expression from in vitro experiments, and connects these experimental findings to in vivo observational blood cancer data. Among other things, this case study demonstrates an overall strategy for in vitro to in vivo projection of gene expression patterns within which PROPA analysis plays key roles.
Related to the above research, the final findings reported by researchers were as follows. The Gullah-speaking inhabitants of the Sea Islands of South Carolina are a unique population because of their minimal Caucasian genetic admixture and high propensity for diabetes. A clinical study was conducted to determine the dental health status of diabetic patients in this population. Dental caries was assessed using the total number of decayed, missing and filled surfaces, an index known as DMFS in the dental literature. Data resulted from examining 4 (for canines and incisors) or 5 (for premolars and molars) surfaces per tooth, for all (up to 32) teeth, for over 260 individuals. Also recorded were covariates including age, gender, smoking and brushing/flossing habits, etc., which may influence caries development. The tooth-level contributions to DMFS, which range from 0 to 5, were then modeled and their associations with covariates evaluated. Histograms suggest a zero-inflated binomial model for the tooth-level counts. As in a hurdle model, the process determining a healthy tooth (with a count of 0) is treated as a structural zero and hence separated from the remaining counts (1 to 5), which are modeled using a zero-truncated binomial distribution. A multivariate model was developed in which covariates enter through a random effects logistic regression on the logit of the probability of a carious surface. To preserve marginal logit structure for interpretability, a bridge density (Wang and Louis, 2003) for the subject-specific random effects was used. The tooth-specific zero-inflation probability is modeled as arising from a beta distribution whose shape/scale parameters are linked to the odds of a healthy tooth (Song et al., 2006). The model was compared with alternatives to assess improvements in prediction and interpretability.

Probabilistic pathway annotation:
The next discussion presented Bayesian models and computational methods for the problem of matching predictions from molecular studies with known biological pathway databases, and the problem of pathway annotation of summary results of an experiment or observational study.
In areas such as cancer genomics, linking quantified, experimentally defined gene expression signatures with known biological pathway gene sets is essential to improving the understanding of the complexity of molecular pathways related to outcome. The probabilistic pathway annotation (PROPA) analysis involves new models for formal assessment and rankings of pathways putatively linked to an experimental or observational phenotype. It integrates qualitative biological information into the analysis and generates coherent inferences on uncertainties about gene pathway membership that can inform the revision of pathway databases. The analysis in the final works mentioned above relied on simulation-based computation in high-dimensional models, and introduced a novel extension of variational methods for computation of model evidence, or marginal likelihood functions, that were central to the comparison of multiple biological pathways. Examples highlight the methodology using both simulated and real data, and detailed case studies were developed in breast cancer genomics, involving hormonal pathways and pathway activities underlying cellular responses to lactic acidosis in breast cancer. The second study demonstrated the application of the method in decomposing the complexity of gene expression-based predictions about interacting biological pathway activation from both experimental (in vitro) and observational (in vivo) human cancer data.

Introduction for first model:
The first statistical approach, and general identification of a problem, led to the method of Bayesian estimation, and has more recently generated some deeper statistical approaches in exploratory factor analysis in AMOS. It aims to measure the aggregate association between a full list of pairs of variables, ranked by their association with an outcome (also referred to as its effect), and the full set of statistical measures analyzed.

The underlying idea is to assess whether or not the pathway is enriched with all components of a cancer among blood cancer, breast cancer, and primary tumor. The approach considers components that score highly in association with the experimental outcome, perhaps with a directional component that looks separately at each component of the cancer, and it is now quite widely used. Our first model is concerned with datasets d in which a large number p (tens of hundreds) of variables is recorded and the sample size n is relatively small (tens). Our first model is Bayesian estimation from exploratory factor analysis. To illustrate Bayesian estimation using Amos Graphics, an example is explained that shows how to test the null hypothesis that the covariance between two variables is 0 by fixing the value of the covariance between age and vocabulary to 0. This is the resulting path diagram:

[Path diagram: Age and Vocabulary, with the covariance between them fixed to 0. Caption text macros: Chi-square = \cmin (\df df), P = \p]

Maximum Likelihood Analysis:
Before performing a Bayesian analysis of our model, a maximum likelihood analysis is performed for comparison purposes, using the AMOS software to calculate and display the following parameter estimates and standard errors. Our first model, Bayesian estimation, comprises Tables 1.1 to 1.9, including the F1-F2 diagram. Amos displays Estimates, Scalar Estimates, Maximum Likelihood Estimates, and the Regression Weights table.

Regression Weights: Table 1.1

Path                        Estimate   S.E.   C.R.   P
Class <--- F1               1.000
Age <--- F1
Lymphatics <--- F2          1.000
Affere <--- F2
Lymphc <--- F2
Lymphs <--- F2
Extravasates <--- F2
regeneration <--- F2
Earlyup <--- F2
ly.no.dim <--- F2
ly.no.en <--- F2
ch.in.lym <--- F2
Defect <--- F2
changeinnode <--- F2
changesinstru <--- F2
specialforms <--- F2
dislocationof <--- F2
exclusionofno <--- F2
no.ofnodesin <--- F2

Intercepts: Table 1.2

Variable                    Estimate   S.E.   C.R.   P
Class                                                ***
Age                                                  ***
Lymphatics                                           ***
Affere                                               ***
Lymphc                                               ***
Lymphs                                               ***
Extravasates                                         ***
Regeneration                                         ***
Earlyup                                              ***
ly.no.dim                                            ***
ly.no.en                                             ***
ch.in.lym                                            ***
Defect                                               ***
changeinnode                                         ***
changesinstru                                        ***
specialforms                                         ***
dislocationof                                        ***
exclusionofno                                        ***
no.ofnodesin                                         ***

Covariances: Table 1.3

Path                        Estimate   S.E.   C.R.   P      Label
F2 <--> F1                                                  C

Variances: Table 1.4

Parameter                   Estimate   S.E.   C.R.   P
F1
F2
e1
e2 to e19                                            ***

Bayesian Analysis:
A Bayesian analysis requires estimation of explicit means and intercepts. Before performing any Bayesian analysis in Amos, one must first tell Amos to estimate means and intercepts. Only then does the Bayesian SEM window appear, and the MCMC algorithm immediately begins generating samples. The F1-F2 diagram is then obtained after analyzing the tables in Amos. In it, F1 contains class and age, and F2 includes all other components for blood cancer. Here is the regression weight for all components. For e1, 0.2 is the mean in the variance table (1.9), and 2.45 is the regression coefficient (standard loading) with class in the intercepts table (1.7). 0.3 is the regression coefficient between age and F1. Similarly, the mean in the variance table (1.9) and the regression coefficient for every pair are observed from Table 1.5.

[Figure: F1-F2 diagram (Table 1.5). Path diagram with latent factors F1 and F2; observed variables class, age, lymphatics, affere, lymphc, lymphs, extravasates, regeneration, earlyup, ly.no.dim, ly.no.en, ch.in.lym, defect, changeinnode, changesinstru, specialforms, dislocationof, exclusionofno, no.ofnodesin; error terms e1 to e19.]

Regression weights: Table 1.6
Columns: Mean, S.E., S.D., C.S., Median, 95% Lower bound, 95% Upper bound, Skewness, Kurtosis, Min, Max
Rows: age<--f, affere<--f, lymphc<--f, lymphs<--f, extravasates<--f, regeneration<--f, earlyup<--f, ly.no.dim<--f, ly.no.en<--f, ch.in.lym<--f, defect<--f, changeinnode<--f, changesinstru<--f, specialforms<--f, dislocationof<--f, exclusionofno<--f, no.ofnodesin<--f

Intercepts: Table 1.7
Rows: Class, Age, Lymphatics, Affere, Lymphc, Lymphs, Extravasates, Regeneration, Earlyup, ly.no.dim, ly.no.en, ch.in.lym, Defect, Changeinnode, Changesinstru, Specialforms, Dislocationof, Exclusionofno, no.ofnodesin

Covariances: Table 1.8
Rows: F2<->F

Variances: Table 1.9
Rows: F1, F2, e1 to e19

F1-F2 diagram:
The Bayesian SEM window has a toolbar near the top of the window and a results summary table below. Each row of the summary table describes the marginal posterior distribution of a single model parameter. The first column, labeled Mean, contains the posterior mean, which is the center or average of the posterior distribution. This can be used as a Bayesian point estimate of the parameter, based on the data and the prior distribution. With a large dataset, the posterior mean will tend to be close to the maximum likelihood estimate. (In this case, the two are somewhat close; compare the posterior mean for the age-vocabulary covariance with the maximum likelihood estimate.)

Bayesian SEM window:
When Analyze > Bayesian Estimation is chosen in Amos, the MCMC algorithm begins sampling immediately, and it continues until the Pause Sampling button is clicked to halt the process. Sampling was halted after 40 completed samples. Amos generated and discarded 500 burn-in samples prior to drawing the first sample that was retained for the analysis. Amos draws burn-in samples to allow the MCMC procedure to converge to the true joint posterior distribution. After Amos draws and discards the burn-in samples, it draws additional samples to give a clear picture of what this joint posterior distribution looks like. Amos has drawn 5,000 of these analysis samples, and it is upon these analysis samples that the results in the summary table are based. Actually, the displayed results are for analyzing 725 samples. Because the sampling algorithm Amos uses is very fast, updating the summary table after each sample would lead to a rapid, incomprehensible blur of changing results in the Bayesian SEM window. It would also slow the analysis down. To avoid both problems, Amos refreshes the results after every 250 samples. The above tables represent the first model, provided that the F1-F2 diagram and the other tables are suitable for diagnosing a patient with blood cancer.

Conclusion:
The maximum likelihood estimates for all types of blood cancer are near zero. Further, the posterior means, which are the Bayesian point estimates for each parameter based on the blood cancer dataset and the prior distribution, are also near zero. The disease can be diagnosed through this model, and the expected life of a patient is also calculated from the patient means in the data. Our model strongly supports a diagnosis of blood cancer for a patient having the component affere or lymph. s with mean 0-1, or a component such as lymph. c, extravasates, regeneration of, early uptake in, lym. nodes dimin, changes in lym., number of nodes in, dislocation of, or exclusion of node with mean 2-4. It further supports a diagnosis for a patient having a component such as lym. nodes enlar, defect in node, changes in structure, or special forms with mean 4-8. It also supports a diagnosis for a patient having the remaining components with mean above 8.
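The burn-in and analysis-sample scheme described for the Bayesian SEM window can be illustrated with a generic random-walk Metropolis sampler. This is a sketch under stated assumptions, not the algorithm Amos uses internally: the target is the posterior of a single normal mean with a flat prior and unit data variance, the data values are hypothetical, and the counts of 500 burn-in and 5,000 retained samples simply mirror the text.

```python
import math
import random
import statistics


def log_posterior(mu, data):
    """Log posterior of a normal mean, up to a constant.

    Assumes a flat prior on mu and a unit-variance normal likelihood.
    """
    return -0.5 * sum((y - mu) ** 2 for y in data)


def metropolis(data, n_burn_in, n_keep, start, step_sd, seed):
    """Random-walk Metropolis: discard the burn-in draws, keep the rest."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, data)
    kept = []
    for iteration in range(n_burn_in + n_keep):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if iteration >= n_burn_in:
            kept.append(current)
    return kept


# Hypothetical data; the chain deliberately starts far from the posterior.
data = [1.8, 2.4, 1.1, 2.9, 2.2, 1.6, 2.7, 2.0, 1.4, 2.5,
        1.9, 2.3, 2.1, 1.7, 2.6, 2.0, 1.5, 2.8, 2.2, 1.3]
kept = metropolis(data, n_burn_in=500, n_keep=5000,
                  start=10.0, step_sd=0.3, seed=0)

print(f"sample mean of data: {statistics.fmean(data):.3f}")
print(f"posterior mean from retained draws: {statistics.fmean(kept):.3f}")
```

Because the early draws still reflect the arbitrary starting value, only the retained draws are summarized; with a flat prior the resulting posterior mean lands close to the sample mean of the data.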

The posterior mean will tend to be close to the maximum likelihood estimate. In this case, the two are somewhat close: compare the posterior mean for the age-vocabulary covariance with the maximum likelihood estimate. So the first model is suitable for diagnosing a patient with blood cancer. Only if the F1-F2 diagram, with the Bayesian estimate, standard error, and intercept mean for the independent and dependent variables, is obtained are the other models, from Chapter III to Chapter VII, a good fit for diagnosing blood cancer in a patient.

Convergence statistic for blood cancer:
The next two investigations of the first model for a patient with blood cancer are the standard error (S.E.) and the convergence statistic (C.S.). The second column of our first model in blood cancer, labeled S.E., reports an estimated standard error that suggests how far the Monte Carlo estimated posterior mean may lie from the true posterior mean. As the MCMC procedure continues to generate more samples, the estimate of the posterior mean becomes more precise, and the S.E. gradually drops. Note that this S.E. is not an estimate of how far the posterior mean may lie from the unknown true value of the parameter. One would not use ± 2 S.E. values as the width of a 95% interval for the parameter. Additional columns of the first model in blood cancer contain the convergence statistic (C.S.), the median value of each parameter, the lower and upper 50% boundaries of the distribution of each parameter, and the skewness, kurtosis, minimum value, and maximum value of each parameter. The lower and upper 50% boundaries are the endpoints of a 50% Bayesian credible set, which is the Bayesian analogue of a 50% confidence interval.

Conclusion:
For the information collected on blood cancer, and from Tables 1.1 to 1.9, there is no significant difference between the posterior mean and the true posterior mean; the difference is near zero except for ly.no.en, defect in node, and no. of nodes in. The likely distance between the posterior mean and the unknown true parameter is reported in the third column, labeled S.D., and that number is analogous to the standard error in maximum likelihood estimation.
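The difference between the S.E. and S.D. columns can be seen in a short simulation. The sketch assumes independent draws from a standard normal posterior, so the Monte Carlo standard error is simply S.D. divided by the square root of the number of draws; real MCMC output is autocorrelated, so its effective sample size is smaller and this formula is only an approximation there. The function name `summarize` is hypothetical.

```python
import math
import random
import statistics


def summarize(n_draws, seed):
    """Return (posterior S.D., Monte Carlo S.E. of the posterior mean).

    Assumes independent draws from a standard normal posterior.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    sd = statistics.stdev(draws)      # spread of the posterior itself
    se = sd / math.sqrt(n_draws)      # precision of the estimated mean
    return sd, se


sd_small, se_small = summarize(1000, seed=2)
sd_large, se_large = summarize(16000, seed=2)

print(f"1,000 draws:  S.D. = {sd_small:.3f}, S.E. = {se_small:.4f}")
print(f"16,000 draws: S.D. = {sd_large:.3f}, S.E. = {se_large:.4f}")
```

Drawing sixteen times as many samples cuts the S.E. to roughly a quarter, while the S.D. stays near 1. This is why the S.E. describes the simulation's precision and the S.D. describes uncertainty about the parameter.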

Most of us are accustomed to using a confidence level of 95%, so we will soon show how to change the level to 95% in the convergence analysis.

Section IV: Bayesian estimation for breast cancer

2.4. Introduction:

In research work, decision-analytical models are widely used in economic evaluations of health care interventions, with the objective of providing information that allows scarce health care resources to be allocated efficiently. Such models have a range of uses, including the synthesis of data from a variety of sources (often using meta-analysis) to produce the cost or cost-effectiveness results of interest. They are also often used to evaluate the complex processes usually associated with the implementation of health care interventions. Decision modeling techniques may be of value, for example, when primary data must be extrapolated beyond the endpoint of a trial, or when comparisons are needed between treatments for which no head-to-head trials exist. Decision trees provide a simple way to structure problems of decision making under uncertainty whilst describing the major factors involved. More complicated decision trees can be represented in the form of Markov models. Such models provide a technique for analyzing events that are repeatable (for example, relapses of a chronic disease such as multiple sclerosis, arthritis or asthma), or events that play out over an extended period of time (for example, the progression of cancer). To evaluate decision-analytical models, estimates need to be acquired for the costs and health outcomes of the various pathways through the model, together with the probability of their occurrence. It should not be forgotten that the usefulness of the results obtained from such models depends on the source and quality of the estimates input into the model.
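The idea of a Markov model for events that play out over time can be sketched in a few lines of Python. This is only an illustrative cohort model: the three states, the transition probabilities and the per-cycle costs are invented for the example and do not come from the thesis or from any data set.

```python
def run_cohort(transition, start, cycles):
    """Propagate a cohort distribution through a Markov chain, cycle by cycle."""
    trace = [list(start)]
    for _ in range(cycles):
        current = trace[-1]
        nxt = [sum(current[i] * transition[i][j] for i in range(len(current)))
               for j in range(len(current))]
        trace.append(nxt)
    return trace


# Hypothetical three-state model; all numbers are made up for illustration.
STATES = ["stable", "progressed", "dead"]
P = [[0.80, 0.15, 0.05],   # transitions from stable
     [0.00, 0.70, 0.30],   # transitions from progressed
     [0.00, 0.00, 1.00]]   # dead is absorbing
COST = [1000.0, 3000.0, 0.0]  # assumed cost per cycle spent in each state

trace = run_cohort(P, start=[1.0, 0.0, 0.0], cycles=10)
# Expected cost and expected number of cycles alive over the 10 cycles.
expected_cost = sum(sum(p * c for p, c in zip(row, COST)) for row in trace[1:])
expected_cycles_alive = sum(1.0 - row[2] for row in trace[1:])
```

Each row of `trace` gives the share of the cohort in each state after a cycle, so costs and health outcomes are obtained by weighting the per-state values by these shares. The quality of the result depends entirely on the transition probabilities and costs supplied, which is the point made in the paragraph above.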

2.4.2 Decision-analytical models:

Decision-analytical models are sometimes based on primary data collection, but more often rely on published or other secondary sources for cost and effectiveness information. Systematic review methods are a formal and replicable approach to identifying and summarizing existing evidence. When data permit, a quantitative synthesis of the evidence, often referred to as meta-analysis, can be conducted within a systematic review. The use of systematic methods for evidence synthesis is desirable if the evaluation of health care is to be truly evidence-based, and hence the information for decision models should be based on such rigorous methods. However, very little has been written on the methods of systematic review (including meta-analysis) to be used for the synthesis of evidence for an economic decision. It is currently unclear what sources of evidence should be included in systematic reviews informing decision models (for example, whether both RCTs and observational studies should be included in the same analysis). This is a particularly pertinent issue in economic decision modeling, since estimates of costs and probabilities are required in addition to clinical effectiveness.

Probabilistic decision models:

Probabilistic decision models used in economic evaluation are almost exclusively analyzed using classical statistical approaches (two of the rare exceptions are Parmigiani et al. (1997) and Fryback et al. (2001)). Such models place probability distributions on parameters where there is uncertainty in their true value. These can be derived from the results of individual studies or, more desirably, from the results of systematic reviews. Parametric distributional assumptions are necessary when specifying parameter uncertainty, but occasions do exist when such assumptions may be inappropriate, such as when events are rare. The Bayesian analyses described herein relax the need for some of these distributional assumptions.
Further, by combining the synthesis and decision process into one coherent model, the Bayesian approach described here incorporates uncertainty in incidental model parameters, which need estimating but are not of direct interest in the decision model and are often ignored in classical analyses (the between-study variance parameters in meta-analyses are examples of these, as will be shown later). An additional advantage of the Bayesian approach is that it automatically accounts for the correlation between parameters induced by the fact that the same data sources (for example, a systematic review) may be used to populate different parts of the model.

Probabilistic decision analytical models:

Fryback et al. (2001) outlined how these simple models may be evaluated using Bayesian methods. Previous work extended this by describing a method whereby the whole process (systematic review incorporating meta-analyses, estimation of transition probabilities, evaluation of the model and sensitivity analysis) may be combined into a single coherent Bayesian model. The ease of applying such a method is demonstrated through two illustrative examples: 1) the prophylactic use of antibiotics for caesarean section patients to reduce the incidence of wound infections; and 2) the use of taxanes for the second-line treatment of advanced breast cancer compared to conventional treatment. The examples were given to illustrate the process of inputting the pooled estimates obtained from a systematic review (meta-analyses), together with their associated uncertainty, directly into a probabilistic decision analytical model, and to illustrate the situation whereby the pooled estimates obtained from a systematic review, together with their associated uncertainty, are first converted to transition probabilities and then applied to a probabilistic Markov decision model. The incorporation of subjective/expert prior beliefs is also illustrated in the latter example.
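Feeding a pooled estimate and its uncertainty into a probabilistic model can be sketched with simple Monte Carlo sampling in Python. This is not the WinBUGS model of the cited work: the pooled log odds ratio, its standard error, the Beta distribution for the baseline risk and the helper `treated_risk` are all hypothetical values and names chosen for illustration.

```python
import math
import random
import statistics


def treated_risk(baseline_risk, log_odds_ratio):
    """Apply a log odds ratio to a baseline risk and return the new risk."""
    logit = math.log(baseline_risk / (1.0 - baseline_risk)) + log_odds_ratio
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(1)
N_DRAWS = 5000
POOLED_LOG_OR = -0.9   # made-up pooled effect from a meta-analysis
POOLED_SE = 0.2        # made-up standard error of the pooled effect

risk_reduction = []
for _ in range(N_DRAWS):
    p0 = rng.betavariate(8, 92)                 # assumed baseline event risk
    lor = rng.gauss(POOLED_LOG_OR, POOLED_SE)   # uncertainty in pooled effect
    risk_reduction.append(p0 - treated_risk(p0, lor))

ordered = sorted(risk_reduction)
mean_reduction = statistics.fmean(risk_reduction)
interval_95 = (ordered[int(0.025 * (N_DRAWS - 1))],
               ordered[int(0.975 * (N_DRAWS - 1))])
```

Because every draw carries both the baseline risk and the treatment effect through to the output, the resulting interval reflects the joint uncertainty of the inputs. A full Bayesian model would estimate these inputs and evaluate the decision in a single step, which is the advantage the text describes.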

2.4.5 Decision analytical economic modeling:

For modeling within a Bayesian framework, the main points of the final work are as follows. Economic evaluation of health care interventions based on decision analytic modeling can generate valuable information for health policy decision-makers. However, the usefulness of the results obtained depends on the quality of the data input into the model, and on the accuracy of the estimates for the costs, effectiveness and transition probabilities between the different health states of the model. Few models demonstrate how the individual components required for decision analytical modeling (systematic review incorporating meta-analyses, estimation of transition probabilities, evaluation of the model and sensitivity analysis) may be addressed simultaneously in one coherent Bayesian model evaluated using Markov chain Monte Carlo simulation, implemented in the specialist Bayesian statistics software WinBUGS. The approach described is applied to two illustrative examples: 1) the prophylactic use of antibiotics for caesarean section patients; and 2) the use of taxanes for the second-line treatment of advanced breast cancer.
The advantages of the Bayesian statistical approach outlined here, compared with the conventional classical approaches to decision analysis, include the ability to: (i) perform all necessary analyses, including all intermediate analyses (for example, meta-analyses) required to derive model parameters, in a single coherent model; (ii) incorporate expert opinion, either directly or regarding the relative credibility of different data sources; (iii) use the actual posterior distributions for parameters of interest (as opposed to making the distributional assumptions necessary for the classical formulation); and (iv) incorporate uncertainty for all model parameters.

Bayesian estimation for breast cancer

Following the same procedure as for blood cancer, Amos displays the Estimates, Scalar Estimates, Maximum Likelihood Estimates, and Regression Weights in Tables .0 to .8.

Table .0
Path                   Estimate   S.E.   C.R.   P
Class <--- F           .000
Age <--- F
me2pause <--- F2       .000
Tumorsize <--- F
inv2des <--- F
Decaps <--- F
Degmalig <--- F
Breast <--- F
breastquad <--- F
Irradiat <--- F

Intercepts: Table .
Variable               Estimate   S.E.   C.R.   P
Class                                           ***
Age                                             ***
me2pause                                        ***
Tumorsize                                       ***
inv2des                                         ***
Decaps                                          ***
Degmalig                                        ***
Breast                                          ***
breastquad                                      ***
Irradiat                                        ***

Covariances: Table .2
Pair                   Estimate   S.E.   C.R.   P
F2 <--> F

2.4.0 Variances: Table . 3
Term                   Estimate   S.E.   C.R.   P
F
F
e
e                                               ***
e                                               ***
e                                               ***
e                                               ***
e                                               ***
e                                               ***
e                                               ***
e                                               ***
e                                               ***

2.4. Bayesian Analysis:

Bayesian analysis requires estimation of explicit means and intercepts. Before performing any Bayesian analysis in Amos, one must first tell Amos to estimate means and intercepts. The F-F2 diagram is then obtained after analyzing the tables in Amos. In it, F contains class and age, and F2 includes all other components for breast cancer. The diagram shows the regression weight for every component. For e, 0.4 is the mean in the variance table (.8), and .30 is the regression coefficient (standard loading) with class in the intercepts table (.6); the regression coefficient between age and F is reported likewise. Similarly, the mean in the variance table (.8) and the regression coefficient for every pair are read from table (.4).

[Figure: F-F2 diagram (Table . 4). Path diagram with factor F (indicators: class, age) and factor F2 (indicators: me2pause, tumorsize, inv2des, decaps, degmalig, breast, breastquad, irradiat); each indicator has its own error term.]

Table . 5 Regression weights
Columns: Mean, S.E., S.D., C.S., Median, 95% Lower bound, 95% Upper bound, Skewness, Kurtosis, Min, Max
Rows:
age<--f
tumorsize<--f
inv2des<--f
decaps<--f
degmalig<--f
breast<--f
breastquad<--f
irradiat<--f

Intercepts - Table .6
Rows: Class, Age, me2pause, Tumorsize, inv2des, Decaps, Degmalig, Breast, Breastquad, Irradiat

Covariances - Table .7
Rows: F2<->F

Variances - Table .8
Rows: F, F, e, e, e, e, e, e, e, e, e, e

2.4.2: The Bayesian SEM window appears in the same way for patients with breast cancer. The tables above belong to the first model, provided that the F-F2 diagram and the other tables fit well enough to diagnose a patient with breast cancer.

Conclusion:

The maximum likelihood estimates for all types of breast cancer are near zero. The posterior means, which are the Bayesian point estimates for each parameter based on the breast cancer dataset and the prior distribution, are also near zero. The disease can be diagnosed through this model in breast cancer, and the expected life of a patient is also calculated from the means of the patients in the data. The first model of the thesis gives a strong diagnosis for a patient with breast cancer whose components nodecaps and irradiat have a negative mean that is not near zero. It also gives a diagnosis for a patient whose components breast and breastquad have a negative mean near zero, and for a patient whose remaining components, such as tumorsize, inv-nodes and deg-malig, have a positive mean. The posterior mean will tend to be close to the maximum likelihood estimate. In this case the two are somewhat close: compare the posterior mean for the age-vocabulary covariance with its maximum likelihood estimate. The first model is therefore suitable for diagnosing a patient with breast cancer. Only when the F-F2 diagram has been obtained, together with the Bayesian estimates, standard errors and intercept means for the independent and dependent variables, can the other models in Chapters III to VII be regarded as well fitted for diagnosing breast cancer in a patient.

Standard error and convergence statistic:

The same conclusions are obtained for breast cancer as for blood cancer, and the data fit well enough to diagnose a patient with breast cancer.
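The remark that the posterior mean tends to sit close to the maximum likelihood estimate can be checked on a toy case in Python. The sketch uses the textbook conjugate model for a normal mean with known variance, not the structural equation model fitted in Amos; the simulated data, the prior settings and the helper `posterior_mean` are assumptions made for illustration.

```python
import random
import statistics


def posterior_mean(data, sigma2, prior_mean, prior_var):
    """Posterior mean of a normal mean with known variance and a normal prior."""
    data_precision = len(data) / sigma2
    prior_precision = 1.0 / prior_var
    xbar = statistics.fmean(data)
    # Precision-weighted average of the prior mean and the sample mean.
    return ((prior_precision * prior_mean + data_precision * xbar)
            / (prior_precision + data_precision))


rng = random.Random(42)
data = [rng.gauss(2.0, 1.0) for _ in range(200)]   # made-up sample
mle = statistics.fmean(data)                        # ML estimate of the mean

# A diffuse prior leaves the posterior mean almost equal to the ML estimate;
# a tight prior centered at 0 pulls it toward 0.
diffuse = posterior_mean(data, sigma2=1.0, prior_mean=0.0, prior_var=1e6)
informative = posterior_mean(data, sigma2=1.0, prior_mean=0.0, prior_var=0.01)
```

With the diffuse prior the two estimates agree to many decimal places, while the informative prior shrinks the estimate to two thirds of the sample mean here. How close the posterior mean and the ML estimate are in practice therefore depends on the prior and on the amount of data.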

2.4.5 F-F2 diagram:

The same conclusions are reached for diagnosing a patient with breast cancer as for blood cancer. They show that (1) there is no significant difference between the posterior mean and the true posterior mean, the difference being near zero except for tumor size, inv-nodes, decaps, and degmalig; (2) for these components, and given the prior assumptions in S.E., the expected life of patients can be calculated; and (3) only then can the other models in Chapters III to VII be regarded as well fitted for diagnosing breast cancer in a patient.

Section V - Bayesian Estimation for Primary tumor cancer

2.5. Amos displays Estimates, Scalar Estimates, Maximum Likelihood Estimates, and Regression Weights: Tables .9 to .27

Table .9
Path                     Estimate   S.E.   C.R.   P
Class <--- F             .000
Age <--- F
Sex <--- F                                        ***
Type <--- F                                       ***
Bone <--- F2             .000
Difference <--- F
Bonemarrow <--- F
Lung <--- F
Pleura <--- F
Peritoneum <--- F
Liver <--- F
Brain <--- F
Skin <--- F
Neck <--- F
Supraclavicular <--- F
Axillar <--- F
Mediastinum <--- F
Abdominal <--- F

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

A Brief Introduction to Bayesian Statistics

A Brief Introduction to Bayesian Statistics A Brief Introduction to Statistics David Kaplan Department of Educational Psychology Methods for Social Policy Research and, Washington, DC 2017 1 / 37 The Reverend Thomas Bayes, 1701 1761 2 / 37 Pierre-Simon

More information

BAYESIAN HYPOTHESIS TESTING WITH SPSS AMOS

BAYESIAN HYPOTHESIS TESTING WITH SPSS AMOS Sara Garofalo Department of Psychiatry, University of Cambridge BAYESIAN HYPOTHESIS TESTING WITH SPSS AMOS Overview Bayesian VS classical (NHST or Frequentist) statistical approaches Theoretical issues

More information

Introduction to Bayesian Analysis 1

Introduction to Bayesian Analysis 1 Biostats VHM 801/802 Courses Fall 2005, Atlantic Veterinary College, PEI Henrik Stryhn Introduction to Bayesian Analysis 1 Little known outside the statistical science, there exist two different approaches

More information

Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm

Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm Journal of Social and Development Sciences Vol. 4, No. 4, pp. 93-97, Apr 203 (ISSN 222-52) Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm Henry De-Graft Acquah University

More information

Bayesian and Frequentist Approaches

Bayesian and Frequentist Approaches Bayesian and Frequentist Approaches G. Jogesh Babu Penn State University http://sites.stat.psu.edu/ babu http://astrostatistics.psu.edu All models are wrong But some are useful George E. P. Box (son-in-law

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!

More information

SUPPLEMENTAL MATERIAL

SUPPLEMENTAL MATERIAL 1 SUPPLEMENTAL MATERIAL Response time and signal detection time distributions SM Fig. 1. Correct response time (thick solid green curve) and error response time densities (dashed red curve), averaged across

More information

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy

An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy Number XX An Empirical Assessment of Bivariate Methods for Meta-analysis of Test Accuracy Prepared for: Agency for Healthcare Research and Quality U.S. Department of Health and Human Services 54 Gaither

More information

Bayes Linear Statistics. Theory and Methods

Bayes Linear Statistics. Theory and Methods Bayes Linear Statistics Theory and Methods Michael Goldstein and David Wooff Durham University, UK BICENTENNI AL BICENTENNIAL Contents r Preface xvii 1 The Bayes linear approach 1 1.1 Combining beliefs

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

A Case Study: Two-sample categorical data

A Case Study: Two-sample categorical data A Case Study: Two-sample categorical data Patrick Breheny January 31 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/43 Introduction Model specification Continuous vs. mixture priors Choice

More information

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination

Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Hierarchical Bayesian Modeling of Individual Differences in Texture Discrimination Timothy N. Rubin (trubin@uci.edu) Michael D. Lee (mdlee@uci.edu) Charles F. Chubb (cchubb@uci.edu) Department of Cognitive

More information

Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions

Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions J. Harvey a,b, & A.J. van der Merwe b a Centre for Statistical Consultation Department of Statistics

More information

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences. SPRING GROVE AREA SCHOOL DISTRICT PLANNED COURSE OVERVIEW Course Title: Basic Introductory Statistics Grade Level(s): 11-12 Units of Credit: 1 Classification: Elective Length of Course: 30 cycles Periods

More information

Bayesian (Belief) Network Models,

Bayesian (Belief) Network Models, Bayesian (Belief) Network Models, 2/10/03 & 2/12/03 Outline of This Lecture 1. Overview of the model 2. Bayes Probability and Rules of Inference Conditional Probabilities Priors and posteriors Joint distributions

More information

Combining Risks from Several Tumors Using Markov Chain Monte Carlo

Combining Risks from Several Tumors Using Markov Chain Monte Carlo University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln U.S. Environmental Protection Agency Papers U.S. Environmental Protection Agency 2009 Combining Risks from Several Tumors

More information

Methods Research Report. An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy

Methods Research Report. An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy Methods Research Report An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy Methods Research Report An Empirical Assessment of Bivariate Methods for Meta-Analysis of Test Accuracy

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

Understandable Statistics

Understandable Statistics Understandable Statistics correlated to the Advanced Placement Program Course Description for Statistics Prepared for Alabama CC2 6/2003 2003 Understandable Statistics 2003 correlated to the Advanced Placement

More information

bivariate analysis: The statistical analysis of the relationship between two variables.

bivariate analysis: The statistical analysis of the relationship between two variables. bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ 2 ) test for

More information

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

Lec 02: Estimation & Hypothesis Testing in Animal Ecology Lec 02: Estimation & Hypothesis Testing in Animal Ecology Parameter Estimation from Samples Samples We typically observe systems incompletely, i.e., we sample according to a designed protocol. We then

More information

Individual Differences in Attention During Category Learning

Individual Differences in Attention During Category Learning Individual Differences in Attention During Category Learning Michael D. Lee (mdlee@uci.edu) Department of Cognitive Sciences, 35 Social Sciences Plaza A University of California, Irvine, CA 92697-5 USA

More information

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you? WDHS Curriculum Map Probability and Statistics Time Interval/ Unit 1: Introduction to Statistics 1.1-1.3 2 weeks S-IC-1: Understand statistics as a process for making inferences about population parameters

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

How do we combine two treatment arm trials with multiple arms trials in IPD metaanalysis? An Illustration with College Drinking Interventions

How do we combine two treatment arm trials with multiple arms trials in IPD metaanalysis? An Illustration with College Drinking Interventions 1/29 How do we combine two treatment arm trials with multiple arms trials in IPD metaanalysis? An Illustration with College Drinking Interventions David Huh, PhD 1, Eun-Young Mun, PhD 2, & David C. Atkins,

More information

Bayesian Joint Modelling of Benefit and Risk in Drug Development

Bayesian Joint Modelling of Benefit and Risk in Drug Development Bayesian Joint Modelling of Benefit and Risk in Drug Development EFSPI/PSDM Safety Statistics Meeting Leiden 2017 Disclosure is an employee and shareholder of GSK Data presented is based on human research

More information

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to

CHAPTER - 6 STATISTICAL ANALYSIS. This chapter discusses inferential statistics, which use sample data to CHAPTER - 6 STATISTICAL ANALYSIS 6.1 Introduction This chapter discusses inferential statistics, which use sample data to make decisions or inferences about population. Populations are group of interest

More information

Chapter 1: Explaining Behavior

Chapter 1: Explaining Behavior Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring

More information

Graphical Modeling Approaches for Estimating Brain Networks

Graphical Modeling Approaches for Estimating Brain Networks Graphical Modeling Approaches for Estimating Brain Networks BIOS 516 Suprateek Kundu Department of Biostatistics Emory University. September 28, 2017 Introduction My research focuses on understanding how

More information

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Author's response to reviews Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Authors: Jestinah M Mahachie John

More information

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months?

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months? Medical Statistics 1 Basic Concepts Farhad Pishgar Defining the data Population and samples Except when a full census is taken, we collect data on a sample from a much larger group called the population.

More information

Bayesian Belief Network Based Fault Diagnosis in Automotive Electronic Systems

Bayesian Belief Network Based Fault Diagnosis in Automotive Electronic Systems Bayesian Belief Network Based Fault Diagnosis in Automotive Electronic Systems Yingping Huang *, David Antory, R. Peter Jones, Craig Groom, Ross McMurran, Peter Earp and Francis Mckinney International

More information

TRIPODS Workshop: Models & Machine Learning for Causal I. & Decision Making

TRIPODS Workshop: Models & Machine Learning for Causal I. & Decision Making TRIPODS Workshop: Models & Machine Learning for Causal Inference & Decision Making in Medical Decision Making : and Predictive Accuracy text Stavroula Chrysanthopoulou, PhD Department of Biostatistics

More information

THE INDIRECT EFFECT IN MULTIPLE MEDIATORS MODEL BY STRUCTURAL EQUATION MODELING ABSTRACT

THE INDIRECT EFFECT IN MULTIPLE MEDIATORS MODEL BY STRUCTURAL EQUATION MODELING ABSTRACT European Journal of Business, Economics and Accountancy Vol. 4, No. 3, 016 ISSN 056-6018 THE INDIRECT EFFECT IN MULTIPLE MEDIATORS MODEL BY STRUCTURAL EQUATION MODELING Li-Ju Chen Department of Business

More information

Ecological Statistics

Ecological Statistics A Primer of Ecological Statistics Second Edition Nicholas J. Gotelli University of Vermont Aaron M. Ellison Harvard Forest Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A. Brief Contents

More information

An Introduction to Bayesian Statistics

An Introduction to Bayesian Statistics An Introduction to Bayesian Statistics Robert Weiss Department of Biostatistics UCLA Fielding School of Public Health robweiss@ucla.edu Sept 2015 Robert Weiss (UCLA) An Introduction to Bayesian Statistics

More information

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method Biost 590: Statistical Consulting Statistical Classification of Scientific Studies; Approach to Consulting Lecture Outline Statistical Classification of Scientific Studies Statistical Tasks Approach to

More information

Bayesians methods in system identification: equivalences, differences, and misunderstandings

Bayesians methods in system identification: equivalences, differences, and misunderstandings Bayesians methods in system identification: equivalences, differences, and misunderstandings Johan Schoukens and Carl Edward Rasmussen ERNSI 217 Workshop on System Identification Lyon, September 24-27,

More information

Method Comparison for Interrater Reliability of an Image Processing Technique in Epilepsy Subjects

Method Comparison for Interrater Reliability of an Image Processing Technique in Epilepsy Subjects 22nd International Congress on Modelling and Simulation, Hobart, Tasmania, Australia, 3 to 8 December 2017 mssanz.org.au/modsim2017 Method Comparison for Interrater Reliability of an Image Processing Technique

More information

Examining differences between two sets of scores

Examining differences between two sets of scores 6 Examining differences between two sets of scores In this chapter you will learn about tests which tell us if there is a statistically significant difference between two sets of scores. In so doing you

More information

Ordinal Data Modeling

Ordinal Data Modeling Valen E. Johnson James H. Albert Ordinal Data Modeling With 73 illustrations I ". Springer Contents Preface v 1 Review of Classical and Bayesian Inference 1 1.1 Learning about a binomial proportion 1 1.1.1

More information

Tech Talk: Using the Lafayette ESS Report Generator

Tech Talk: Using the Lafayette ESS Report Generator Raymond Nelson Included in LXSoftware is a fully featured manual score sheet that can be used with any validated comparison question test format. Included in the manual score sheet utility of LXSoftware

More information

Progress in Risk Science and Causality

Progress in Risk Science and Causality Progress in Risk Science and Causality Tony Cox, tcoxdenver@aol.com AAPCA March 27, 2017 1 Vision for causal analytics Represent understanding of how the world works by an explicit causal model. Learn,

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

extraction can take place. Another problem is that the treatment for chronic diseases is sequential based upon the progression of the disease.

extraction can take place. Another problem is that the treatment for chronic diseases is sequential based upon the progression of the disease. ix Preface The purpose of this text is to show how the investigation of healthcare databases can be used to examine physician decisions to develop evidence-based treatment guidelines that optimize patient

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

6. Unusual and Influential Data

6. Unusual and Influential Data Sociology 740 John ox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John ox Unusual and Influential Data 1 1. Introduction I Linear statistical models make strong assumptions about the

More information

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data
