Subsampling for Efficient and Effective Unsupervised Outlier Detection Ensembles
Arthur Zimek, Matthew Gaudet, Ricardo J. G. B. Campello, Jörg Sander
Department of Computing Science, University of Alberta, Edmonton, AB, Canada

ABSTRACT
Outlier detection and ensemble learning are well established research directions in data mining, yet the application of ensemble techniques to outlier detection has been rarely studied. Here, we propose and study subsampling as a technique to induce diversity among individual outlier detectors. We show analytically and experimentally that an outlier detector based on a subsample per se, besides inducing diversity, can, under certain conditions, already improve upon the results of the same outlier detector on the complete dataset. Building an ensemble on top of several subsamples further improves the results. While in the literature so far the intuition that ensembles improve over single outlier detectors has just been transferred from the classification literature, here we also justify analytically why ensembles are also expected to work in the unsupervised area of outlier detection. As a side effect, running an ensemble of several outlier detectors on subsamples of the dataset is more efficient than ensembles based on other means of introducing diversity and, depending on the sample rate and the size of the ensemble, can be even more efficient than just the single outlier detector on the complete data.

Categories and Subject Descriptors
H.2.8 [Database Applications]: Data mining

Keywords
outlier detection; ensemble

This work was done while the author was on leave of absence from Ludwig-Maximilians-Universität München, Germany.
This work was done while the author was on sabbatical leave from University of São Paulo, São Carlos, Brazil.

KDD'13, August 11–14, 2013, Chicago, Illinois, USA. Copyright 2013 ACM.

1. INTRODUCTION
An outlier is an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data [6]. Detecting outliers is an important task in many practical applications. Some applications of outlier detection, such as detecting measurement errors, are mostly concerned with removing the outliers from the data as a form of noise. Other applications, such as credit card abuse detection or the identification of unusual measurements in scientific data, are concerned with finding outliers because their deviating behavior from the rest of the data may require specific actions or provide opportunities for new insights.

Various approaches to outlier detection have been proposed, based on different notions of outliers, or targeted towards specific applications that require the identification of outliers. Here, we are interested in unsupervised, nonparametric outlier detection methods that assign a score to each data object and thus allow a ranking of objects according to their degree of outlierness.

Parametric, statistical approaches [6, 35] fit certain distributions to the data by estimating the parameters of these distributions from the given data. A problem with these approaches is that distribution parameters such as mean, standard deviation, and covariances are rather sensitive to the presence of outliers. Possible effects of outliers on the parameter estimation have been termed masking and swamping.
Outliers can mask their own presence by influencing the values of the distribution parameters (resulting in false negatives), or swamp inliers to appear as outlying due to the influenced parameters (resulting in false positives) [6, 9]. Non-parametric approaches do not assume a specific distribution of the data, but estimate (explicitly or implicitly) certain aspects of the probability density. Non-parametric methods include the well-known distance-based and density-based methods. Both distance-based and density-based methods basically aim at providing a rather simple estimate of the density around points, which can be seen as an approximation of statistical kernel density estimates. Distance-based methods such as DB-outlier [25] and its variants are based on the k nearest neighbor (kNN) distances [5, 34]. They try to find so-called global outliers, i.e., points that are, roughly speaking, far away from the rest of the data. Density-based methods such as LOF [10] and its variants try to find so-called local outliers, i.e., points that are, roughly speaking, located in an area of relatively low density compared to their kNN (intended to indicate points that are outliers with respect to the nearest mode in the data distribution). The density around points in these methods is also estimated based on kNN distances. One problem with distance-based and density-based methods is that they can
also suffer from effects similar to masking and swamping, due to the simplicity of (and thus error in) the density estimates. Another problem is the typically high runtime of these approaches, due to the fact that their computation includes at least finding the kNN of each data point (resulting in an at least quadratic complexity w.r.t. the database size).

In this paper, we address both problems of distance-based and density-based methods. We propose and study a general approach to improve both the quality and the performance of such outlier detection methods by combining the results of a base method on subsamples of the data into an ensemble. Previous work on outlier ensembles is very limited and only shows empirically that ensembles of outlier detectors have the potential to improve the quality compared to that of their base methods [30, 36], at an increased runtime cost. Our work is novel and advances the area of outlier detection in the following respects:
- We argue theoretically and demonstrate empirically that it is possible to construct ensemble members for outlier detection methods which, in general, already perform better individually than the base method.
- Combining those outlier detectors into an ensemble not only renders the performance gain more robust but can improve the performance even further.
- At the same time, when using small sample sizes for the ensemble members, we can gain a considerable speed-up in runtime compared to running a standard ensemble and, for small ensemble sizes, even compared to running the base method on the whole data set.
- The proposed principle is fundamental and flexible. It does not rely on specific data types. It can be combined with various conventional outlier detection techniques.

The rest of the paper is organized as follows: We discuss related work on outlier detection and ensembles for outlier detection (Section 2). We provide theoretical reasoning to support outlier detection ensembles in general and the claimed properties of our method in particular (Section 3).
We provide experimental results to support our claims empirically (Section 4). We conclude the paper in Section 5.

2. RELATED WORK
The distance-based notion of outliers (DB-outlier) [25] was the first database-oriented approach in the area of unsupervised outlier detection, and it initiated a new line of research on this topic in the data mining community. Variants of DB-outliers consider the distances to the k nearest neighbors of each object and use these distances to rank the objects [34], or they use the sum of distances to all points within the set of kNN (called the "weight") as an outlier degree [5]. These methods are also called global methods, in that the computed outlier scores represent global density scores for each point. The so-called local methods, e.g. LOF [10], instead consider local density scores, which are ratios between the density around an object and the density around its neighboring objects. Variants of the local outlier model include LoOP [27] and LOCI [33]. The distance-based method LDOF [44] is also related in reasoning about local comparisons. It has been shown recently [37], however, that the differentiation between global and local methods is not strictly dichotomous but that there are degrees of locality.

Much research has aimed at improving the efficiency of unsupervised outlier detection by algorithmic techniques, for example based on approximations or improved pruning techniques for mining the top-n outliers [4, 7, 22, 23, 26, 42]. Orair et al. [32] provide an analysis of such efficiency-improving techniques for outlier detection algorithms. These techniques, however, do not aim at improving the approximations of the underlying statistical notion of outlierness. They only approximate a specific algorithmic model.
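The kNN-based global scores described above can be sketched in a few lines. This is a minimal brute-force illustration that assumes NumPy and Euclidean distance. The function names (`knn_distances`, `knn_score`, `knn_weight_score`) are hypothetical, and this is not the ELKI implementation used in the experiments. Building the full distance matrix also makes the quadratic cost mentioned in the introduction explicit.

```python
import numpy as np

def knn_distances(X, k):
    """Sorted distances from each row of X to its k nearest other rows.

    Brute force: builds the full n x n distance matrix, i.e. O(n^2) work.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.sort(dist, axis=1)[:, :k]

def knn_score(X, k):
    """Outlier score: distance to the k-th nearest neighbor (in the style of [34])."""
    return knn_distances(X, k)[:, -1]

def knn_weight_score(X, k):
    """Outlier score: sum of distances to the k nearest neighbors, the 'weight' (cf. [5])."""
    return knn_distances(X, k).sum(axis=1)

# Usage: one isolated point appended to a Gaussian blob of 200 points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), [[8.0, 8.0]]])
print(np.argmax(knn_score(X, k=5)), np.argmax(knn_weight_score(X, k=5)))
```

Both scores rank objects by how sparse their kNN neighborhood is. Neither normalizes by the density of the neighbors, which is what separates them from local methods such as LOF.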
Ensemble techniques, on the other hand, have the potential to improve the performance of their components in terms of the quality of the detected outliers, rather than in terms of runtime (but we will show in this paper that it is even possible to gain runtime improvements when constructing certain types of outlier ensembles). The first approach to improve outlier detection by ensemble techniques, based on feature bagging, was proposed by Lazarevic and Kumar [30]. It combines different results of the same algorithm (namely LOF [10]) applied to different, randomly selected feature subsets. Feature bagging is a common procedure to induce diversity of ensemble members in ensemble classification [] or ensemble clustering [8, 14, 40]. Subsequent research on outlier detection ensembles focused on the comparability of scores for score combinations. Approaches include sigmoid functions and mixture modeling to fit outlier scores provided by different detectors into comparable probability values [7], scaling by standard deviation [31], and statistical reasoning about score distributions [28]. These approaches enable the combination of different outlier detection methods into one ensemble. Schubert et al. [36] proposed a similarity measure to appropriately compare different outlier rankings (based on scores) and to allow for the assessment of the diversity of different outlier detectors. As an application, they proposed a greedy ensemble approach, demonstrating the importance of diversity for the performance of an ensemble. In all these papers, although outlier detection ensembles have been discussed and improved, no new method of inducing diversity has been pursued. Except for feature bagging [30], all other existing ensemble methods for outlier detection [7, 28, 31, 36] are meta-methods and could be used on top of our sample-based method (or on top of feature bagging, as in [28, 31, 36]). They do not propose original means to induce diversity when using a selected base outlier detection method.
In general, the motivation for ensemble methods for outlier detection is borrowed from the rich tradition in the literature on supervised ensemble learning [,2,2,4]. The theoretical foundation for ensemble learning in the unsupervised setting, however, is far less mature. This holds true not only for outlier detection ensembles but also for clustering ensembles, despite the far more abundant literature on practical approaches in that area [8]. Although the problem setting is considerably different, let us finally note that sampling has been used in ensemble clustering to induce diversity. Different subsamples of the data set have been clustered, and the resulting clusterings were combined into a consensus clustering [3, 6, 20, 39].

3. OUTLIER DETECTION ENSEMBLES BASED ON SUBSAMPLING
In this section, we discuss the potential benefits of using outlier detection ensembles based on subsampling. Previous approaches using ensemble learning for outlier detection [7, 28, 30, 31, 36] transferred techniques that have a clear theoretical background in supervised learning, without any theoretical justification of why they should also work in unsupervised outlier detection. Such a view can be loosely argued for when we consider outlier detection methods as classifiers. Assume that a threshold on outlier scores is used to distinguish between outliers and inliers. We can then view the outlier method as classifying all objects into one of two classes, outliers and inliers, even though no labels are used in the training phase when the model (ranking) is built. If we succeed in constructing sufficiently diverse outlier detectors for the same data set, we can hope to improve the overall performance over the individual members by combining them into an ensemble. The generic argument is that the ensemble members all commit errors, but on different cases, if the members are independent (i.e., diverse) or, in other words, if their errors are uncorrelated. Such a generic view may explain some of the performance gains. In the following subsections, however, we show that there are more specific reasons why, under some general assumptions, an ensemble of outlier detection methods can improve the performance over its individual members.

3.1 Benefits of Ensembles for Outlier Detection Based on Density Estimates
In this paper, we focus on distance-based and density-based outlier detection methods. As discussed in the introduction, these methods compute outlier scores that are based, implicitly or explicitly, on some form of density estimate. One can view them as trying to identify the outliers in a given data set X with respect to an unknown probability density f, which represents the process that has generated the majority of the data set (at least the inliers). The data set X itself can be viewed as a sample drawn from the true but unknown underlying density distribution. The methods try to estimate the density f(x) around points x using a more or less rough density estimate $\hat{f}_X(x)$, from which they compute outlier scores in some way.
Assuming the correctness of the underlying outlier model of the methods, the quality of a method's result clearly depends on the quality of the density estimate $\hat{f}_X(x)$, and the results will improve if the estimate can be improved. For this case, we can show formally that, under some general conditions, a diverse ensemble of such outlier detectors does in fact show an improved expected performance over the individual ensemble members.

Given a true, smooth p.d.f. $f(x)$ and a data set $X$, we can express an estimate $\hat{f}_X(x)$ of $f(x)$ based on $X$ as:
$$\hat{f}_X(x) = f(x) + v_X(x)$$
where $v_X(x)$ is a random variable describing the error of the estimate due to the finite sample. The quality of the estimate $\hat{f}$ of $f$ decides between success and failure of the outlier detection. However, the density estimates used by the considered outlier detection algorithms may not be reliable and stable in all regions of the data space, due to the natural intrinsic randomness associated with the single sample that the data set represents. If we are able to obtain multiple density estimates for each point x (e.g., via subsamples, as we propose), we can obtain more reliable and stable density estimates by averaging them for each point. The rationale is the following. The output of outlier methods is a ranking of all points x in terms of outlier scores that, in essence, depends on the ranking of the points according to $\hat{f}_X(x)$. Ideally, we want a ranking of the points x according to $f(x)$. If we average multiple density estimates for each point, we can consider the estimate itself as a random variable. Averaging these estimates for each point then gives us the expectation of this variable:
$$E\{\hat{f}_X(x)\} = E\{f(x)\} + E\{v_X(x)\} = f(x) + E\{v_X(x)\}$$
This formulation shows that the ranking of objects w.r.t. $E\{\hat{f}_X(x)\}$ is the same as the ranking w.r.t. the true density $f(x)$ (the "ideal ranking"), provided that the expectation of the error $v_X(x)$ in the individual estimates is the same for every point x.
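A small simulation can illustrate the averaging argument. This is a hypothetical sketch assuming NumPy, not an experiment from the paper. The true densities f, the common error mean of 0.1, and the noise level are invented for illustration. Every noisy estimate has the same expected error at every point, which is the case in which averaging should recover the ideal ranking.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical true densities f(x): 5 low-density outliers followed by 45 inliers.
f = np.concatenate([rng.uniform(0.01, 0.10, 5), rng.uniform(0.5, 1.0, 45)])

def noisy_estimate(f, rng, bias=0.1, sd=0.2):
    """One estimate f_hat(x) = f(x) + v(x); v(x) has the same mean (bias) at every x."""
    return f + bias + rng.normal(0.0, sd, size=f.shape)

def concordance(est, truth):
    """Fraction of point pairs that est orders the same way as truth."""
    i, j = np.triu_indices(truth.size, k=1)
    return np.mean(np.sign(est[i] - est[j]) == np.sign(truth[i] - truth[j]))

single = noisy_estimate(f, rng)
averaged = np.mean([noisy_estimate(f, rng) for _ in range(200)], axis=0)

print(f"pairwise agreement, single estimate: {concordance(single, f):.3f}")
print(f"pairwise agreement, average of 200:  {concordance(averaged, f):.3f}")
# Indices of the 5 lowest averaged densities (the outliers are indices 0-4).
print(sorted(np.argsort(averaged)[:5].tolist()))
```

The constant bias shifts every estimate but does not change the order. Averaging only shrinks the variance of the error, so the averaged ranking approaches the ranking by f.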
This is obviously the case when the random variable that describes the error does not depend on x, in which case $E\{v_X(x)\} = E\{v_X\} = \mu_{v_X}$. One would also obtain the ideal ranking when the error does depend on x. For instance, if the error varies between points but its expectation is the same for each point, we would also have the same ranking. We can even obtain the ideal ranking if the expectations $E\{v_X(x_1)\}$ and $E\{v_X(x_2)\}$ differ for two points $x_1$ and $x_2$, as long as the difference does not cause an inversion between the actual ranks of $E\{\hat{f}_X(x_1)\}$ and $E\{\hat{f}_X(x_2)\}$. Furthermore, successful outlier detection only requires the methods to distinguish between outliers and inliers. We can therefore even allow inversions between ranks, as long as rank inversions occur only within outliers or within inliers. Only a rank inversion between an outlier and an inlier would be problematic. In the next subsection, we argue that for the proposed ensemble technique using subsamples, the expectation of the error in the density estimate $E\{v_X(x)\}$ does depend on the location x and its surrounding density. The method nevertheless has the desirable property that it can increase the gap in ranks between the outliers and the inliers, making rank inversions between these groups of points even less likely.

3.2 Additional Benefits of Subsampling
Subsampling is theoretically well suited to introduce diversity into an ensemble of otherwise identical distance-based or density-based outlier detection methods. Every member of the ensemble determines the outlier score of every object in the database, but uses only a small subset of the data to estimate the density around points. Learning density estimates for outlier detection on smaller samples can actually improve the detection rate of outliers, compared to learning these estimates on the whole data set, which conceptually represents just a somewhat larger sample of an unknown distribution f.
We will see in the empirical evaluation that, in practice, surprisingly small sample sizes (such as 20%, or in many cases even just 10%) typically do not deteriorate but considerably improve the quality of the outlier detection for a sample-based ensemble of outlier detectors. One reason for the improved performance of an ensemble is, as expected, simply the combination of the results of multiple outlier detectors. The dataset would otherwise be the only sample drawn from f. Drawing multiple subsamples from it can minimize the effect of the randomness associated with a single sample. Note that averaging the scores to build an ensemble has been common heuristic practice [7, 28, 30, 31, 36], but it now also finds a theoretical justification.
Another, more interesting reason for the improved performance is that the base method applied to a smaller subsample of a given data set often shows an improved outlier detection rate compared to the same method applied to the whole data set. As we argue formally in the following, this is because distance-based and density-based methods essentially use simple (not volume-normalized) kNN distances to estimate density.

To understand the effects of sample-based kNN distances, consider a sphere of radius r in a d-dimensional Euclidean space, containing n data points uniformly distributed within the sphere. The expected Euclidean distance from a point to its k-th nearest neighbour (kNN) is given by [9]:
$$E\{\delta\} = r\left(\frac{k}{n}\right)^{1/d} \tag{1}$$
For a given data set, let r be a constant value small enough that, for two spheres having the same radius r but lying at different positions of the data space, the data points within both spheres are approximately uniformly distributed. Now suppose that the numbers of data points within these spheres differ, given by $n_1$ and $n_2$ ($n_1 \neq n_2$). Since their volumes are the same, the densities of the data in the respective regions of the space differ. For example, one sphere might be located inside a dense cluster, whereas the other might lie in a sparse area containing background noise. It then follows from (1) that the expected kNN distances in the corresponding regions of the space are:
$$E\{\delta_1\} = r\left(\frac{k}{n_1}\right)^{1/d}; \qquad E\{\delta_2\} = r\left(\frac{k}{n_2}\right)^{1/d} \tag{2}$$
If one randomly removes a fraction $1-m$ of the data objects with equal probability, the expected numbers of remaining objects within those two spheres are $n_1 m$ and $n_2 m$, respectively.
In this case, the expected kNN distances become:
$$E\{\delta_1^{(m)}\} = r\left(\frac{k}{n_1 m}\right)^{1/d}; \qquad E\{\delta_2^{(m)}\} = r\left(\frac{k}{n_2 m}\right)^{1/d} \tag{3}$$
The differences in the expected distances are therefore:
$$\Delta_1 = r\left(\frac{k}{n_1 m}\right)^{1/d} - r\left(\frac{k}{n_1}\right)^{1/d} = r\left(\frac{k}{n_1}\right)^{1/d}\left[\left(\frac{1}{m}\right)^{1/d} - 1\right] \tag{4}$$
$$\Delta_2 = r\left(\frac{k}{n_2 m}\right)^{1/d} - r\left(\frac{k}{n_2}\right)^{1/d} = r\left(\frac{k}{n_2}\right)^{1/d}\left[\left(\frac{1}{m}\right)^{1/d} - 1\right] \tag{5}$$
In relative terms, dividing $\Delta_1$ and $\Delta_2$ by the original expected distances (for the full dataset, i.e., before the subsampling) gives:
$$\frac{\Delta_1}{r\left(\frac{k}{n_1}\right)^{1/d}} = \frac{\Delta_2}{r\left(\frac{k}{n_2}\right)^{1/d}} = \left(\frac{1}{m}\right)^{1/d} - 1 \tag{6}$$
Result (6) says that, as the sampling rate m decreases, the expected kNN distances within both spheres increase by the same relative factor. This reflects the intuition that, in relative terms, the contrast between the densities of the spheres is kept constant, which justifies the use of a subsampling procedure with even sampling probabilities. In an ensemble setting, for instance, this means that one can get multiple (sub)samples that exhibit variability (diversity) in terms of their observations but keep the same expected density profile as the full dataset.

The above result is important, but it does not explain all implications of subsampling when using unnormalized kNN distances. In absolute terms, Equations (4) and (5) tell us that, for m < 1, the expected increase in the kNN distances is greater for a less dense sphere, i.e., $\Delta_1 > \Delta_2$ if $n_1 < n_2$. The expected kNN distances therefore diverge in absolute terms when the data are downsampled to a fraction m of their original size. In other words, the absolute differences between the expected kNN distances in areas of different densities tend to increase as the sampling rate m decreases. Figure 1 illustrates this effect for r = 1, d = 2, k = 5, $n_1 = 100$, $n_2 = 1000$, and m ranging from 0.1 to 1.

[Figure 1: Behaviour of the expected 5-NN distances for two spheres with radius r = 1, in a 2D Euclidean space, containing 1000m (circles) and 100m (triangles) objects uniformly distributed (m is a fraction of the data). Axes: fraction of data (m) vs. expected NN distances.]
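The setting of Figure 1 can be checked numerically. This is a minimal sketch assuming NumPy and assuming that Eq. (1) has the form $E\{\delta\} = r(k/n)^{1/d}$. The helper name `expected_knn_distance` is hypothetical. The sketch evaluates Eqs. (3) to (6) for the two spheres over a grid of sampling rates m.

```python
import numpy as np

def expected_knn_distance(r, n, d, k):
    """Expected k-NN distance r * (k / n) ** (1 / d) for n points uniform in a
    d-dimensional sphere of radius r (the assumed form of Eq. 1)."""
    return r * (k / n) ** (1.0 / d)

r, d, k = 1.0, 2, 5
n1, n2 = 100, 1000                   # sparse and dense sphere, as in Figure 1
m = np.linspace(0.1, 1.0, 10)        # fraction of the data that is kept

full1 = expected_knn_distance(r, n1, d, k)
full2 = expected_knn_distance(r, n2, d, k)

# Absolute increases Delta_1 and Delta_2 (Eqs. 4 and 5).
delta1 = expected_knn_distance(r, n1 * m, d, k) - full1
delta2 = expected_knn_distance(r, n2 * m, d, k) - full2

# Relative increases (Eq. 6): both should equal (1/m)**(1/d) - 1.
rel1, rel2 = delta1 / full1, delta2 / full2

for mi, a, b, rel in zip(m, delta1, delta2, rel1):
    print(f"m={mi:.1f}  delta1={a:.3f}  delta2={b:.3f}  relative={rel:.3f}")
```

The relative increase is identical for both spheres. The absolute increase is larger for the sparse sphere, and this widening absolute gap is what helps separate outliers from inliers.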
Such an effect can be beneficial for outlier detection, since it can make it easier to distinguish between outliers and inliers. Particularly when also using an ensemble as discussed above, the gap in the ranks between outliers and inliers can increase, making rank inversions between these two groups less likely.

3.3 Method and Complexity
Note that our proposal cannot be implemented simply by taking subsamples and then running the outlier detection algorithms on these subsamples. That way, we would very likely completely miss information on the outlierness of the many objects that are not contained in any subsample, and many objects would get scores only from some of the subsamples. Instead, for each ensemble member, we draw a subsample from the database and compute the neighborhood of each object in the database based on the subsample. Subsample-based ensembles can thereby also lead to a considerable speed-up compared to other types of ensembles and, for small subsamples and ensemble sizes, even compared to running the base method on the whole data set. We will demonstrate in the experimental evaluation that sample sizes small enough to achieve substantial runtime improvements are good choices in practice, leading to good outlier detection rates. In this subsection, we show the expected runtime improvements by studying the theoretical complexities.
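A minimal sketch of this procedure follows. It assumes NumPy, Euclidean distance, and a plain kNN-distance score as a stand-in base detector; the paper's experiments use LOF, LDOF, and LoOP. The function names are hypothetical. Each member scores every object in the database but searches for neighbors only among the rate · n sampled objects. One member therefore costs about n · (rate · n) distance computations.

```python
import numpy as np

def subsample_knn_scores(X, k, rate, rng):
    """One ensemble member: score every object of X by its k-NN distance,
    where neighbors are searched only within a random subsample of X."""
    n = X.shape[0]
    size = max(k + 1, int(round(rate * n)))   # keep at least k non-self neighbors
    sample_idx = rng.choice(n, size=size, replace=False)
    S = X[sample_idx]
    # Distances from every object (rows) to every subsample object (columns).
    dist = np.sqrt(((X[:, None, :] - S[None, :, :]) ** 2).sum(axis=-1))
    # An object that is itself in the subsample must not count as its own neighbor.
    dist[sample_idx, np.arange(size)] = np.inf
    return np.sort(dist, axis=1)[:, k - 1]

def subsampling_ensemble(X, k=5, rate=0.1, members=25, seed=0):
    """Combine the member scores by averaging them."""
    rng = np.random.default_rng(seed)
    scores = [subsample_knn_scores(X, k, rate, rng) for _ in range(members)]
    return np.mean(scores, axis=0)

# Usage: two planted outliers (indices 500 and 501) next to a Gaussian blob.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 2)), [[6.0, 6.0], [-6.0, 5.0]]])
scores = subsampling_ensemble(X, k=5, rate=0.1, members=25)
print(np.argsort(scores)[-2:])   # indices of the two highest ensemble scores
```

Averaging raw scores works here because all members share the same base method and hence the same score scale.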
While other ensemble methods require a multiple of the computing time of the base learner, a subsample-based ensemble is theoretically faster (and requires fewer resources) than other types of ensembles. The typical complexity of a base method is $O(n^2)$, due to the required kNN queries over a database of n objects. The runtime of a standard ensemble such as feature bagging is essentially s times the runtime of the base method, where s is the number of base learners used in the ensemble (i.e., the size of the ensemble). This factor is reduced somewhat in the case of feature bagging, since using only a subset of the dimensions makes individual distance computations faster by some constant factor. For sample-based ensembles, on the other hand, the complete ensemble can even be faster than the base method on the complete dataset, because the runtime of the base method is quadratic in n. The base method requires kNN queries for each object on the complete database (hence $O(n^2)$). Using a subsample of size $mn$, $0 < m < 1$, reduces this to $O(n^2 m)$. The runtime of a sample-based ensemble is essentially s times the runtime of the base method, but with a much smaller data set used for the neighborhood computation. For an ensemble of 10 base learners and a sample size of 10%, the sample-based ensemble would require roughly the same runtime as a single base method on the full dataset. This is 10 times less time than an ensemble with the same number s of members based on other means of diversity. For larger ensembles, the ensemble requires only a small multiple of the runtime of the base method, but still only 10% (or the equivalent of the sample size m) of the runtime of a standard ensemble. For example, with 25 ensemble members and a sample size of 10%, the ensemble will require roughly 2.5 times the runtime of the base method.

4. EVALUATION
4.1 Methods and Parameters
For the reasons discussed in Section 2, the canonical competitor is feature bagging (FB) [30]. As base methods we use LOF [10], LDOF [44], and LoOP [27].
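The cost comparison from Section 3.3 reduces to counting distance computations. This is a hypothetical back-of-the-envelope sketch, not a runtime measurement. It assumes that runtime is proportional to the number of pairwise distance evaluations. It also ignores the constant-factor saving that feature bagging gets from using fewer dimensions.

```python
def distance_computations(n, members=1, rate=1.0):
    """Pairwise distance evaluations for `members` detectors, each comparing all
    n objects against a subsample of rate * n objects (rate=1.0: full data)."""
    return members * n * int(round(rate * n))

n = 10_000
base = distance_computations(n)                         # single base method
sub10 = distance_computations(n, members=10, rate=0.1)  # 10 members, 10% samples
sub25 = distance_computations(n, members=25, rate=0.1)  # 25 members, 10% samples
fb25 = distance_computations(n, members=25)             # standard 25-member ensemble

print(sub10 / base, sub25 / base, sub25 / fb25)
```

The three ratios reproduce the figures quoted above: about the same cost as one base run for 10 members, about 2.5 times for 25 members, and one tenth of a standard ensemble of equal size.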
For the setup of the experiments, we have to consider various parameters. For both ensemble methods (feature bagging and subsampling), we choose a fixed number of 25 ensemble members. We follow the original setup of the feature bagging method and combine the scores of the ensemble members by computing their average. For subsampling, we consider various sample sizes. Each of the base methods requires a neighborhood size k. Hence we show experimental results (i) with a fixed choice of k and varying sample size; (ii) with a fixed sample size and varying k; and (iii) with fixed choices of k and sample size, comparing different base methods. When we fix k, we choose a value that gives a reasonable result quality (i.e., better than random) for the base method and compare that to the ensemble variants. Finally, (iv) for the synthetic dataset collections, where the individual datasets follow the same general characteristics, we show the average behaviour over all datasets of the collection. We report the area under the receiver operating characteristic curve (ROC AUC), which plots the true positive rate vs. the false positive rate. ROC AUC is a common measure for evaluating outlier detection methods [7, 28, 30, 31, 36]. The experiments are performed using ELKI [2, 3].

4.2 Datasets
For a statistical assessment, we generated two independent sets of 30 synthetic datasets (batch1 and batch2). For each dataset, we randomly choose values for the following parameters in the given ranges: dimensionality d ∈ [20, ..., 40], number of clusters c ∈ [2, ..., 10], and, independently for each cluster, the number of points $n_{c_i}$ ∈ [600, ..., 1000]. For each cluster, the points are generated following a Gaussian model. For each cluster $c_i$ and each attribute a, we choose a mean $\mu_{c_i,a}$ from a uniform distribution in [-10, 10] and a standard deviation $\sigma_{c_i,a}$ from a uniform distribution in [0.1, 1]. Then, for the cluster $c_i$, $n_{c_i}$ cluster objects (points) are generated attribute-wise by the Gaussians $N(\mu_{c_i,a}, \sigma_{c_i,a})$.
The resulting cluster is rotated by a series of random rotations, and the covariance matrix Σ corresponding to the theoretical model is computed by the corresponding matrix operations [38]. We then compute for each point the Mahalanobis distance to its cluster center, using the covariance matrix Σ of the cluster. For a dataset of dimensionality d, the squared Mahalanobis distances for each cluster follow a χ² distribution with d degrees of freedom. We label as outliers those points whose distance to their cluster center exceeds the theoretical 97.5% quantile, independently of the Mahalanobis distances that actually occur among the sample points. This results in an expected amount of 2.5% outliers per dataset.

As real datasets we use Satimage, Lymphography, and Segment, which were also used by Lazarevic and Kumar [30]. Additionally, we chose Wisconsin breast cancer (WBC) and Waveform Database Generator (waveform) from the UCI machine learning repository [5]. Lazarevic and Kumar consider outlier detection as equivalent to rare class detection. We argue instead that outliers are bound to be rare, but objects of a rare class are not necessarily outliers. Therefore, we use a different preprocessing for some of the datasets. For Satimage, we combined the train and test sets and transformed the dataset into an outlier task by taking a sample of 10% from class 2 and evaluating the downsampled class as outliers vs. the rest.² For Lymphography, we merged the small classes 1 & 4 as outliers vs. the rest. For Segment, we chose the classes GRASS, PATH, and SKY for downsampling, in turn, to 10%, which renders the remaining objects of these classes outliers (resulting in three different datasets). For the datasets WBC and waveform, we also selected a meaningful outlier class for downsampling ("malignant" and "0", respectively). This method of using classification data to evaluate outlier detection methods conforms with the literature [, 24, 29, 43, 44]. Overall, this results in 60 synthetic and 7 real data sets.
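The synthetic generation and labeling steps above can be sketched as follows. This is a hypothetical reconstruction assuming NumPy and SciPy (`scipy.stats.chi2.ppf`), not the authors' generator. In particular, a single random orthogonal matrix obtained from a QR decomposition stands in for the "series of random rotations" of [38]. The function names are illustrative. For Gaussian points, the squared Mahalanobis distance to the true center is χ²-distributed with d degrees of freedom. Thresholding at the theoretical 97.5% quantile should therefore label roughly 2.5% of the points.

```python
import numpy as np
from scipy.stats import chi2

def random_rotation(rng, d):
    """Random orthogonal matrix via QR of a Gaussian matrix (an assumption that
    stands in for the paper's series of random rotations)."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def make_cluster(rng, d, n):
    mu = rng.uniform(-10.0, 10.0, size=d)           # per-attribute means
    sigma = rng.uniform(0.1, 1.0, size=d)           # per-attribute standard deviations
    pts = mu + sigma * rng.standard_normal((n, d))  # attribute-wise Gaussians
    rot = random_rotation(rng, d)
    # Rotate points and center; the covariance follows by the same matrix operations.
    pts, center = pts @ rot.T, rot @ mu
    cov = rot @ np.diag(sigma ** 2) @ rot.T
    return pts, center, cov

def make_dataset(seed=0, quantile=0.975):
    rng = np.random.default_rng(seed)
    d = int(rng.integers(20, 41))                   # dimensionality in [20, 40]
    c = int(rng.integers(2, 11))                    # number of clusters in [2, 10]
    threshold = chi2.ppf(quantile, df=d)            # theoretical quantile of chi^2_d
    X, y = [], []
    for _ in range(c):
        n = int(rng.integers(600, 1001))            # cluster size in [600, 1000]
        pts, center, cov = make_cluster(rng, d, n)
        diff = pts - center
        # Squared Mahalanobis distance of every point to its cluster center.
        m2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
        X.append(pts)
        y.append(m2 > threshold)                    # label by theory, not by sample
    return np.vstack(X), np.concatenate(y)

X, y = make_dataset(seed=0)
print(X.shape, y.mean())
```

Labels depend only on the theoretical quantile, so the observed outlier fraction fluctuates around 2.5% from dataset to dataset.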
² Lazarevic and Kumar used the smallest class 4 as outlier vs. the rest. This is an example where the rare class does not constitute outliers, as the classes 3-7 are all very similar. Accordingly, they report performance very close to a random result on this dataset.

4.3 Efficiency
For a fair comparison, we preprocess the neighborhood computation for all methods on equal terms, as facilitated by the ELKI framework [2]. Since our experiments use 25 ensemble members, we study the runtime of a typical base method (LOF), the subsampling ensemble (10% sample size), and feature bagging when scaling the number of objects in the database. As demonstrated in Figure 2,
the runtime of the subsampling ensemble is close to that of the base method, while feature bagging requires a multiple of the runtime. As discussed in Section 3.3, the efficiency depends on the sample size and on the ensemble size. We do not evaluate the ensemble size further. We only consider an example on one of the synthetic datasets to study the behaviour when adding more ensemble members (Figure 3). We see a strong increase in quality between 2 and 10 ensemble members. Up to 25 ensemble members, the quality then increases further, steadily but slowly. This improved performance comes at a moderate runtime cost. Nevertheless, we fix the ensemble size to 25 in the following experiments.

[Figure 2: Runtime of LOF, subsampling ensemble, and feature bagging when increasing database size. Axes: instances in dataset vs. time (s); curves: feature bagging ensemble, subsampling ensemble, base method (LOF).]
[Figure 3: Quality with increasing ensemble size. Horizontal axis: no. ensemble members.]

4.4 Effectiveness
To illustrate results with their variances, we use box plots. The box extends from the lower to the upper quartile values of the data, with a line at the median. The whiskers extend from the box to the most extreme data point within 1.5 times the interquartile range (75% quartile minus 25% quartile). Occasional single data points beyond that range are plotted as flier points past the ends of the whiskers. Note, however, that the source of variance differs between the plots. For synthetic data, we give the distribution over the 30 datasets. For real data, we give the distribution over the individual ensemble members.

Synthetic Data. As a statistical assessment, we first show the results of the subsample-based ensemble over all the synthetic datasets of batch1. Here the box plots visualize the distribution of the results over all datasets in the batch, for the same sample size, the same base method, and the same parametrization of the base method. Results are shown for the subsampling ensemble, the base method (sample size 1), and the feature bagging ensemble (FB).
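The quality measure in these plots is ROC AUC (Section 4.1). As a minimal sketch assuming NumPy, with a hypothetical function name, it can be computed without drawing the curve. The area under the ROC curve equals the probability that a randomly chosen outlier receives a higher score than a randomly chosen inlier, with ties counted as one half.

```python
import numpy as np

def roc_auc(scores, is_outlier):
    """ROC AUC as the fraction of (outlier, inlier) pairs in which the outlier
    scores higher; ties count one half."""
    scores = np.asarray(scores, dtype=float)
    is_outlier = np.asarray(is_outlier, dtype=bool)
    pos, neg = scores[is_outlier], scores[~is_outlier]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Usage: a perfect ranking, a tied ranking, and a fully inverted ranking.
print(roc_auc([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]))
print(roc_auc([1.0, 1.0], [1, 0]))
print(roc_auc([0.1, 0.9], [1, 0]))
```

Only the ranking of scores matters. Any monotone rescaling of an ensemble's scores therefore leaves its ROC AUC unchanged.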
Figure 4 shows examples for a fixed k = 3 for the base methods LDOF, LOF, and LoOP. The behaviour on batch2 (not shown) follows the same general pattern. We varied k from 2 to 10 and obtained similar results. The smaller sample size leads to larger improvements.

Figure 4: ROC AUC for ensembles with different sample sizes as well as feature bagging (FB) and the base method (sample size = 1), on the 30 datasets of batch1; panels (a) LDOF, (b) LOF, (c) LoOP.

Real Data. Having shown the ensemble performance over a set of 30 datasets for the synthetic data, we now analyze the behaviour on individual real datasets. Here, the whisker plots show the variance in the ROC AUC achieved by the individual ensemble members based on subsamples of different sample sizes (zero variance for sample size 1, which reflects the performance of the deterministic base method on the complete data), and by feature bagging (FB). The ROC AUC of the ensembles (subsampling and feature bagging) is visualized by a diamond. Figures 5, 6, and 7 show the results for the three base methods on the datasets Lymphography, WBC, and Satimage-2, respectively. We choose the same k for all base methods such that at least some of the base methods get reasonable results. For the larger dataset Satimage-2, k needs to be larger as well. Comparing these plots, we see different behaviour of the base methods, as some datasets are easy for some base methods while other datasets are relatively hard. In particular, LDOF does not retrieve sensible results on all three datasets. In all cases, however, the subsampling ensemble improves over the base method. Feature bagging does not always perform that convincingly; in some cases it drops to (or below) random quality. Only for LDOF and LoOP on Lymphography (Figures 5(a), 5(c)) can feature bagging recover from the weak performance of the base learner.

Figure 5: ROC AUC for ensemble members of the subsampling ensemble for different sample sizes (boxes), the base method (sample size = 1), and ensembles (diamonds) on top of subsamples and feature bags (FB) on dataset Lymphography; panels (a) LDOF, (b) LOF, (c) LoOP.

Figure 6: ROC AUC for ensemble members of the subsampling ensemble for different sample sizes (boxes), the base method (sample size = 1), and ensembles (diamonds) on top of subsamples and feature bags (FB) on dataset WBC; panels (a) LDOF, (b) LOF, (c) LoOP.

As a general picture from these and other results, we see that the smaller sample size actually has the larger potential for improvement. Although the smaller sample does not keep as much information about the dataset (and the unknown underlying density distribution), these findings make sense from the point of view of ensemble learning, as the smaller samples provide the most diverse ensemble members; this also shows the practical applicability of the reasoning we provide in Section 3.2. In most cases, we find the 10% sample to work best. However, the break-even point between too much loss of information and too high similarity of ensemble members differs from dataset to dataset. We also have examples where the 10% sample is already too small, such as in Figure 5(a). That is possibly related to the fact that the Lymphography data are relatively small. Nevertheless, we fix the sample size to 0.1 for the following experiments and explore the behaviour of the base method, the subsampling ensemble, and the feature bagging ensemble over a range of k. As an example, Figure 8 shows a slight but steady increase of the ROC AUC with k for the base methods and the subsampling ensemble, while the feature bagging ensemble appears to be much more unstable.
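The comparison between individual ensemble members (boxes) and the ensemble (diamond) can be illustrated with a minimal sketch of a subsampling ensemble. This is written under our own assumptions, not as the authors' implementation: a simple k-NN distance score stands in for LOF, LoOP, or LDOF; each of the 25 members scores all objects against a 10% subsample drawn without replacement; and the ensemble averages the member scores. The names `knn_score`, `subsampling_ensemble`, and `roc_auc` and the toy data are hypothetical.

```python
import numpy as np


def knn_score(data, ref_idx, k):
    """Distance of every object to its k-th nearest neighbor among the objects
    in ref_idx (a stand-in base detector). An object that is itself part of
    the reference set is not counted as its own neighbor."""
    ref = data[ref_idx]
    d = np.linalg.norm(data[:, None, :] - ref[None, :, :], axis=2)
    d[ref_idx, np.arange(len(ref_idx))] = np.inf  # drop self-matches
    return np.partition(d, k - 1, axis=1)[:, k - 1]


def subsampling_ensemble(data, k=5, sample_rate=0.1, members=25, seed=0):
    """Score all objects against `members` random subsamples and average the
    member scores (assumed combination rule; all members share one sample
    size, so their raw distances are on a comparable scale)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    m = max(k + 1, int(round(sample_rate * n)))
    member_scores = np.array([
        knn_score(data, rng.choice(n, size=m, replace=False), k)
        for _ in range(members)
    ])
    return member_scores.mean(axis=0), member_scores


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random outlier receives a higher
    score than a random inlier (ties count one half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return ((pos[:, None] > neg[None, :]).mean()
            + 0.5 * (pos[:, None] == neg[None, :]).mean())


# Hypothetical toy data: 300 Gaussian inliers plus 10 far-away outliers.
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(size=(300, 2)), rng.uniform(5, 8, size=(10, 2))])
labels = np.r_[np.zeros(300, dtype=int), np.ones(10, dtype=int)]

k = 5
base_auc = roc_auc(knn_score(data, np.arange(len(data)), k), labels)
ensemble_scores, member_scores = subsampling_ensemble(data, k=k)
member_aucs = [roc_auc(s, labels) for s in member_scores]
ensemble_auc = roc_auc(ensemble_scores, labels)

print(f"base method (sample size 1): {base_auc:.3f}")
print(f"members: min {min(member_aucs):.3f}, median {np.median(member_aucs):.3f}")
print(f"ensemble: {ensemble_auc:.3f}")
```

Here `member_aucs` plays the role of the boxes and `ensemble_auc` the role of the diamond in Figures 5 to 7. Replacing `knn_score` with an LOF implementation would bring the sketch closer to the setting evaluated in this section.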
While increasing k does not, in general, increase the quality of the results, we observe the same pattern of stability of the base method and the subsampling ensemble, and of higher variance of the feature bagging ensemble, on other datasets as well. For the three datasets based on Segment, with k = 20 (again a selection that gives reasonable results for most of the base methods), we show results for all three base methods in Figure 9. Again, the subsampling ensemble compares favourably against the base method as well as against feature bagging.

Figure 7: ROC AUC for ensemble members of the subsampling ensemble for different sample sizes (boxes), the base method (sample size = 1), and ensembles (diamonds) on top of subsamples and feature bags (FB) on dataset Satimage-2; panels (a) LDOF, (b) LOF, (c) LoOP.

Figure 8: ROC AUC for base methods and corresponding ensembles when varying k on dataset Waveform; panels (a) LDOF, (b) LOF, (c) LoOP, each with sample size m = 0.1 (curves: subsampling ensemble, base method, feature bagging ensemble).

Figure 9: ROC AUC for all methods, k = 20, on different datasets (variants of Segment: segment-sky, segment-path, segment-grass; bars: Base, Subsampling, FB).

5. CONCLUSION

Although we compare the sample-based ensemble against feature bagging [30], let us finally note that these two approaches are not strictly competitors. Feature bagging is likely to be an interesting approach in the context of very high-dimensional data [45]. Sampling should be helpful when the datasets grow too large. On the other hand, feature bagging is not meaningful for low-dimensional data, as the ensemble members are bound to be too similar, and sampling on very small datasets is probably not promising either. However, these two problems (too small datasets with only a few dimensions) are not really problems of today's research. It might be an interesting question for future work to investigate the integration of both techniques, building ensembles on subsets of features and subsets of data objects simultaneously.

Acknowledgments

This work has been partially supported by NSERC (Canada), FAPESP (Brazil), and CNPq (Brazil).

6. REFERENCES

[1] N. Abe, B. Zadrozny, and J. Langford. Outlier detection by active learning. In Proc. KDD.
[2] E. Achtert, S. Goldhofer, H.-P. Kriegel, E. Schubert, and A. Zimek. Evaluation of clusterings – metrics and visual support. In Proc. ICDE, 2012.
[3] E. Achtert, H.-P. Kriegel, E. Schubert, and A. Zimek. Interactive data mining with 3D-parallel-coordinate-trees. In Proc. SIGMOD, 2013.
[4] F. Angiulli and F. Fassetti. DOLPHIN: an efficient algorithm for mining distance-based outliers in very large datasets. ACM TKDD, 3(1):4:1–57.
[5] F. Angiulli and C. Pizzuti. Fast outlier detection in high dimensional spaces. In Proc. PKDD, pages 15–26, 2002.
[6] V. Barnett and T. Lewis. Outliers in Statistical Data. John Wiley & Sons, 3rd edition, 1994.
[7] S. D. Bay and M. Schwabacher. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proc. KDD, pages 29–38.
[8] A. Bertoni and G. Valentini. Ensembles based on random projections to improve the accuracy of clustering algorithms. In WIRN/NAIS, pages 31–37.
[9] M. M. Breunig, H.-P. Kriegel, P. Kröger, and J. Sander. Data Bubbles: Quality preserving performance boosting for hierarchical clustering. In Proc. SIGMOD, pages 79–90, 2001.
[10] M. M. Breunig, H.-P. Kriegel, R. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proc. SIGMOD, pages 93–104.
[11] G. Brown, J. Wyatt, R. Harris, and X. Yao. Diversity creation methods: a survey and categorisation. Information Fusion, 6:5–20.
[12] T. G. Dietterich. Ensemble methods in machine learning. In Proc. MCS, pages 1–15.
[13] S. Dudoit and J. Fridlyand. Bagging to improve the accuracy of a clustering procedure. Bioinformatics, 19(9).
[14] X. Z. Fern and C. E. Brodley. Random projection for high dimensional data clustering: A cluster ensemble approach. In Proc. ICML, pages 186–193.
[15] A. Frank and A. Asuncion. UCI machine learning repository.
[16] A. L. N. Fred and A. K. Jain. Robust data clustering. In Proc. CVPR, pages 128–136.
[17] J. Gao and P.-N. Tan. Converting output scores from outlier detection algorithms into probability estimates. In Proc. ICDM, pages 212–221.
[18] J. Ghosh and A. Acharya. Cluster ensembles. WIREs DMKD, 1(4):305–315, 2011.
[19] A. S. Hadi, A. H. M. Rahmatullah Imon, and M. Werner. Detection of outliers. WIREs Comp. Stat., 1(1):57–70.
[20] S. T. Hadjitodorov, L. I. Kuncheva, and L. P. Todorova. Moderate diversity for better cluster ensembles. Information Fusion, 7(3).
[21] L. K. Hansen and P. Salamon. Neural network ensembles. IEEE TPAMI, 12(10):993–1001, 1990.
[22] W. Jin, A. Tung, and J. Han. Mining top-n local outliers in large databases. In Proc. KDD, 2001.
[23] W. Jin, A. K. H. Tung, J. Han, and W. Wang. Ranking outliers using symmetric neighborhood relationship. In Proc. PAKDD.
[24] F. Keller, E. Müller, and K. Böhm. HiCS: high contrast subspaces for density-based outlier ranking. In Proc. ICDE, 2012.
[25] E. M. Knorr and R. T. Ng. A unified notion of outliers: Properties and computation. In Proc. KDD, 1997.
[26] G. Kollios, D. Gunopulos, N. Koudas, and S. Berchtold. Efficient biased sampling for approximate clustering and outlier detection in large datasets. IEEE TKDE, 15(5):1170–1187.
[27] H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek. LoOP: local outlier probabilities. In Proc. CIKM.
[28] H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek. Interpreting and unifying outlier scores. In Proc. SDM, pages 13–24, 2011.
[29] H.-P. Kriegel, M. Schubert, and A. Zimek. Angle-based outlier detection in high-dimensional data. In Proc. KDD.
[30] A. Lazarevic and V. Kumar. Feature bagging for outlier detection. In Proc. KDD, pages 157–166.
[31] H. V. Nguyen, H. H. Ang, and V. Gopalkrishnan. Mining outliers with ensemble of heterogeneous detectors on random subspaces. In Proc. DASFAA, 2010.
[32] G. H. Orair, C. Teixeira, Y. Wang, W. Meira Jr., and S. Parthasarathy. Distance-based outlier detection: Consolidation and renewed bearing. PVLDB, 3(2), 2010.
[33] S. Papadimitriou, H. Kitagawa, P. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. In Proc. ICDE.
[34] S. Ramaswamy, R. Rastogi, and K. Shim. Efficient algorithms for mining outliers from large data sets. In Proc. SIGMOD.
[35] P. J. Rousseeuw and M. Hubert. Robust statistics for outlier detection. WIREs DMKD, 1(1):73–79, 2011.
[36] E. Schubert, R. Wojdanowski, A. Zimek, and H.-P. Kriegel. On evaluation of outlier rankings and outlier scores. In Proc. SDM, 2012.
[37] E. Schubert, A. Zimek, and H.-P. Kriegel. Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min. Knowl. Disc., 2012.
[38] T. Soler and M. Chin. On transformation of covariance matrices between local Cartesian coordinate systems and commutative diagrams. In ASP-ACSM Convention, 1985.
[39] A. Strehl and J. Ghosh. Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res., 3:583–617.
[40] A. Topchy, A. Jain, and W. Punch. Clustering ensembles: Models of consensus and weak partitions. IEEE TPAMI, 27(12):1866–1881.
[41] G. Valentini and F. Masulli. Ensembles of learning machines. In Proc. Neural Nets WIRN, pages 3–22.
[42] N. H. Vu and V. Gopalkrishnan. Efficient pruning schemes for distance-based outlier detection. In Proc. ECML PKDD, pages 160–175.
[43] J. Yang, N. Zhong, Y. Yao, and J. Wang. Local peculiarity factor and its application in outlier detection. In Proc. KDD.
[44] K. Zhang, M. Hutter, and H. Jin. A new local distance-based outlier detection approach for scattered real-world data. In Proc. PAKDD.
[45] A. Zimek, E. Schubert, and H.-P. Kriegel. A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Anal. Data Min., 5(5), 2012.
More informationThe incidence of treated end-stage renal disease in New Zealand Maori and Pacific Island people and in Indigenous Australians
Nephrol Dial Transplant (2004) 19: 678 685 DOI: 10.1093/nt/gfg592 Original Article The incience of treate en-stage renal isease in New Zealan Maori an Pacific Islan people an in Inigenous Australians John
More informationThree-Dimensional Analysis of Acute Scaphoid Fracture Displacement: Proximal Extension Deformity of the Scaphoid
141 COPYRIGHT Ó 2017 BY THE JOURNAL OF BONE AND JOINT SURGERY, INCORPORATED Three-Dimensional Analysis of Acute Scaphoi Fracture Displacement: Proximal Extension Deformity of the Scaphoi Yonatan Schwarcz,
More informationExperimental Study on Strength Evaluation Applied for Teeth Extraction: An In Vivo Study
Sen Orers of Reprints at reprints@benthamscience.net 2 The Open Dentistry Journal, 213, 7, 2-26 Open Access Experimental Stuy on Strength Evaluation Applie for Teeth Extraction: An In Vivo Stuy Marco Cicciù
More informationImproving genomics-based predictions for precision medicine through active elicitation of expert knowledge
Bioinformatics, 34, 2018, i395 i403 oi: 10.1093/bioinformatics/bty257 ISMB 2018 Improving genomics-base preictions for precision meicine through active elicitation of expert knowlege Iiris Sunin 1,, Tomi
More informationAs information technologies and applications
COMPUTING PRACTICES Using Coplink to Analyze Criminal-Justice Data The Coplink system applies a concept space a statistics-base, algorithmic technique that ientifies relationships between suspects, victims,
More informationA Vital Sign and Sleep Monitoring Using Millimeter Wave
A Vital Sign an Sleep Monitoring Using Millimeter Wave ZHICHENG YANG, University of California, Davis PARTH H. PATHAK, George Mason University YUNZE ZENG, University of California, Davis XIXI LIRAN, University
More informationCAN Tree Routing for Content-Addressable Network
Sensors & Transucers 2014 by IFSA Publishing, S. L. htt://www.sensorsortal.com CAN Tree Routing for Content-Aressable Network Zhongtao LI, Torben WEIS University Duisburg-Essen, Universität Duisburg-Essen
More informationImproving genomics-based predictions for precision medicine through active elicitation of expert knowledge
https://hela.helsinki.fi Improving genomics-base preictions for precision meicine through active elicitation of expert knowlege Sunin, Iiris 2018-07-01 Sunin, I, Peltola, T, Micallef, L, Afrabanpey, H,
More informationDistal extension of the direct anterior approach to the hip poses risk to neurovascular structures: an anatomical study
Zurich Open Repository an Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2015 Distal extension of the irect anterior approach to the hip poses risk to
More informationAsymmetric lateral distribution of melanoma and Merkel cell carcinoma in the United States
Asymmetric lateral istribution of melanoma an Merkel cell carcinoma in the Unite States KellyG.Paulson,PhD,JayasriG.Iyer,MD,anPaulNghiem,MD,PhD Seattle, Washington Backgroun: A recent report suggeste a
More informationAn Improved Algorithm To Predict Recurrence Of Breast Cancer
An Improved Algorithm To Predict Recurrence Of Breast Cancer Umang Agrawal 1, Ass. Prof. Ishan K Rajani 2 1 M.E Computer Engineer, Silver Oak College of Engineering & Technology, Gujarat, India. 2 Assistant
More informationA Propensity-Matched Cohort Study
380 COPYRIGHT Ó 2014 BY THE JOURNAL OF BONE AND JOINT SURGERY, INCORPORATED Delaye Woun Closure Increases Deep-Infection Rate Associate with Lower-Grae Open Fractures A Propensity-Matche Cohort Stuy Richar
More informationA reduced ODE model of the bovine estrous cycle
Konra-Zuse-Zentrum für Informationstechnik Berlin Takustraße 7 D-14195 Berlin-Dahlem Germany C. STÖTZEL, M. APRI, S. RÖBLITZ A reuce ODE moel of the bovine estrous cycle ZIB Report 14-33 (August 2014)
More informationVariable Features Selection for Classification of Medical Data using SVM
Variable Features Selection for Classification of Medical Data using SVM Monika Lamba USICT, GGSIPU, Delhi, India ABSTRACT: The parameters selection in support vector machines (SVM), with regards to accuracy
More informationDownloaded from:
Eames, KTD (2007) Contact tracing strategies in heterogeneous populations. Epiemiology an infection, 135 (3). pp. 443-454. ISSN 0950-2688 DOI: https://oi.org/10.1017/s0950268806006923 Downloae from: http://researchonline.lshtm.ac.uk/6930/
More informationAMERICAN THORACIC SOCIETY DOCUMENTS
AMERICAN THORACIC SOCIETY DOCUMENTS An Official American Thoracic Society Research Statement: Current Challenges Facing Research an Therapeutic Avances in Airway Remoeling Y. S. Prakash, Anrew J. Halayko,
More informationBy Edmund Lau, MS, Kevin Ong, PhD, Steven Kurtz, PhD, Jordana Schmier, MA, and Av Edidin, PhD
1479 COPYRIGHT Ó 2008 BY THE JOURNAL OF BONE AND JOINT SURGERY, INCORPORATED Mortality Following the Diagnosis of a Vertebral Compression Fracture in the Meicare Population By Emun Lau, MS, Kevin Ong,
More informationBy Jae Kwang Kim, MD, PhD, Young-Do Koh, MD, PhD, and Nam-Hoon Do, MD
1 COPYRIGHT Ó 2010 BY THE JOURNAL OF BONE AND JOINT SURGERY, INCORPORATED A commentary by Moheb S. Moneim, MD, is available at www.jbjs.org/commentary an as supplemental material to the online version
More informationHow to Design a Good Case Series
21 COPYRIGHT Ó 2009 BY THE JOURNAL OF BONE AND JOINT SURGERY, INCORPORATED How to Design a Goo Case Series By Bauke Kooistra, BSc, Bernaette Dijkman, BSc, Thomas A. Einhorn, MD, an Mohit Bhanari, MD, MSc,
More informationBackground. Aim. Design and setting. Method. Results. Conclusion. Keywords
Research Ebun A Abarshi, Michael A Echtel, Lieve Van en Block, Gé A Donker, Luc Deliens an Bregje D Onwuteaka-Philipsen Recognising patients who will ie in the near future: a nationwie stuy via the Dutch
More informationA scored AUC Metric for Classifier Evaluation and Selection
A scored AUC Metric for Classifier Evaluation and Selection Shaomin Wu SHAOMIN.WU@READING.AC.UK School of Construction Management and Engineering, The University of Reading, Reading RG6 6AW, UK Peter Flach
More informationAn Empirical and Formal Analysis of Decision Trees for Ranking
An Empirical and Formal Analysis of Decision Trees for Ranking Eyke Hüllermeier Department of Mathematics and Computer Science Marburg University 35032 Marburg, Germany eyke@mathematik.uni-marburg.de Stijn
More informationCorticosteroid injection in diabetic patients with trigger finger: A prospective, randomized, controlled double-blinded study
Washington University School of Meicine igital Commons@Becker Open Access Publications 12-1-2007 Corticosteroi injection in iabetic patients with trigger finger: A prospective, ranomize, controlleouble-bline
More information