Clustering & Classification of ERP Patterns: Methods & Results for TKDD09 Paper
Haishan Liu, Gwen Frishkoff, Robert Frank, & Dejing Dou
Created: 01/21/2009 by HL
Last edit: 01/27/2009 by GF
Dataset:

1. Summary

There are two parts to this report: 1) clustering results for the 4 LP1 and LP2 datasets, 3 target ERP patterns, and 14 pattern attributes; and 2) clustering and cluster-based classification results for the WL3a data using 6 target ERP patterns (8 originally; see Notes in Section XX) and XX pattern attributes.

[Summary of results -- maybe copy summary tables for best results & summarize in a few sentences. Note that a priori specification of #clusters is helpful. Discuss consistency of results, or lack thereof, across the 4 LP datasets. Talk about the basis for selection of metrics to use for clustering & classification. Note results for WL3a classification based on expert- vs. auto-labeled data.]

2. Clustering LP1 and LP2 data

For the LP1 and LP2 experiments, there are four datasets (LP1g1, LP1g2, LP2g1, LP2g2) and two clustering techniques: (a) manually specifying the number of clusters and (b) automatic determination of the number of clusters. These parameters resulted in 4x2 = 8 experiments.

2.1 Three (3) Pattern Rules used for Autolabeling for LP1 and LP2 data

The input to the clustering consists of labeled data for 3 early patterns (patterns with peak latencies between ~0-250 ms after stimulus onset). For these case studies, the LP1 and LP2 data (i.e., individual observations for each subject, condition, and tPCA factor) were automatically labeled by RMF using the rules that GF specified in ERP_Rules_09-02.doc.
The 3 rules are listed below:

Rule #1 (pattern PT1 = P100 visual component of the ERP)
Let ROI = occipital (average of left occipital, right occipital)
For any n, FAn = PT1 iff:
    80 ms < TI-max(FAn) < 150 ms    AND    temporal criterion #1
    |IN-mean(ROI)| >= 0.4 mV        AND    min variance criterion
    IN-mean(ROI) > 0                       spatial criterion #1

Rule #2a (pattern PT2 = N100 visual component of the ERP)
Let ROI = occipital (average of left occipital, right occipital)
For any n, FAn = PT2 iff:
    150 ms < TI-max(FAn) < 220 ms   AND    temporal criterion #2a
    |IN-mean(ROI)| >= 0.4 mV        AND    min variance criterion
    IN-mean(ROI) < 0                       spatial criterion #2a

Rule #2b (pattern PT2 = late N1/N2 visual component of the ERP)
Let ROI = occipital-posterior temporal (average of left occipital, left posterior temporal)
For any n, FAn = PT2 iff:
    220 ms < TI-max(FAn) < 300 ms   AND    temporal criterion #2b
    |IN-mean(ROI)| >= 0.4 mV        AND    min variance criterion
    IN-mean(ROI) < 0                       spatial criterion #2b

[RF: Rules, and the Rule 1, 2a, and 2b criteria, are consistent with ERP_Rules_09-02.doc]

2.2 Metrics used for Clustering of LP1 and LP2 data

We used 13 metrics to summarize the temporal and spatial attributes of the 3 ERP patterns in datasets LP1 and LP2, as shown in Table 1. Note that ROI, where ROI is a pre-defined scalp region, is not used in the clustering.

Table 1. Metrics used for clustering of LP1 and LP2

Metric Label      Brief Definition                          Temporal  Spatial
TI-max            Peak latency (in ms)                         x
IN-mean (LOCC)    Mean intensity over LOCC scalp region                  x
IN-mean (ROCC)    Mean intensity over ROCC scalp region                  x
IN-mean (LPAR)    Mean intensity over LPAR scalp region                  x
IN-mean (RPAR)    Mean intensity over RPAR scalp region                  x
IN-mean (LPTEM)   Mean intensity over LPTEM scalp region                 x
IN-mean (RPTEM)   Mean intensity over RPTEM scalp region                 x
IN-mean (LATEM)   Mean intensity over LATEM scalp region                 x
IN-mean (RATEM)   Mean intensity over RATEM scalp region                 x
IN-mean (LORB)    Mean intensity over LORB scalp region                  x
IN-mean (RORB)    Mean intensity over RORB scalp region                  x
IN-mean (LFRON)   Mean intensity over LFRON scalp region                 x
IN-mean (RFRON)   Mean intensity over RFRON scalp region                 x

2.3 LP1g1 Dataset

Data structure:
#Observations = 126 (#Subj*#Cond)
#Subjects = 21
#Conditions = 6
#PCA Factors Retained for Autolabeling (Patt/Fac Matching) = 15
Pattern rules as specified in Section 2.1.

Table 2: Autolabeling Results Summary for LP1group1
Columns: Factor 5, Factor 7, Factor 8, Factor 9, Factor 12, Factor 15; NObs

Rule 1 (P100)    #Match: 75 (Fac5), 24 (Fac12); 99 of 126    %Match: 60%, 20%; total *80%
Rule 2a (N100)   %Match: 74% (Fac7); total 74%

Haishan Liu Comment: As Dejing pointed out, this fraction is wrong. Can't add two fractions together. Should be 99/(99+153) = 39%.
GF: The reason for adding these percentages was to show that some observations must have belonged to multiple factors, which means that we need to be cautious about which observations are input to the clustering.
Robert Frank 1/26/09 2:45 PM Comment: You can add percentages if they are w.r.t. the same base.
75 = 60% of 126, and 24 = 20% of 126, so 99 = 80% of 126. The base is 126, the # of raw ERP observations in LP1g1. If the % in the last column is < 100%, then PCAautolabel is stating that for some raw ERP observations, the pattern of interest is not present in their tPCA factors. If the last column % is > 100%, then autolabel is stating that for some raw ERP observations, the pattern of interest is present in 2 or more factors. Perhaps we can show in the last column that the #Match is out of 126 (# of raw ERP observations). Also, I am not certain we should sum across the #Nonmatch columns, and suggest deleting that row.
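RF's point, that per-factor percentages are additive only when computed over the same base, can be checked directly. A minimal sketch using the Rule 1 counts quoted above (75 and 24 matches out of 126 raw observations); the variable names are illustrative only:

```python
# Rule 1 (P100) match counts quoted above, all relative to the same base:
base = 126                                    # raw ERP observations in LP1g1
matches = {"Factor 5": 75, "Factor 12": 24}   # per-factor #Match

pct = {fac: 100 * n / base for fac, n in matches.items()}
total_pct = 100 * sum(matches.values()) / base

# Because every percentage shares the base 126, the per-factor values sum
# exactly to the total; a total above 100% would mean some observations
# matched the pattern in two or more factors.
print(pct, total_pct)
```

HL's alternative, 99/(99+153), uses a different base (all match plus nonmatch decisions), which is why the two figures disagree.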
Rule 2b (N1/N2)  %Match: 87% (Fac8), 76% (Fac9), 60% (Fac15); total *223%
Modal Factor (Rule): X (P100: Fac5), X (N100: Fac7), X (N1/N2: Fac8); 474 (matches)

[[DD: Do any factors match more than one rule? It seems Factor 5 only matches P100, Factor 7 only matches N100, and Factors 9, 12, and 15 only match N1/N2.]]
[RF: I believe the phenomenon of a single factor matching more than one rule occurred in WL3a. Also, the P100 (Rule 1) was captured by Factors 5 and 12, while the N1/N2 (Rule 2b) was captured by Factors 8, 9, and 15.]

Case Study #1: Clustering LP1g1 data using expert specification of # target patterns

HL manually set the number of clusters to 3, since there are 3 target patterns (see Sec. 2.1). Only those observations that belonged to the 3 modal factors (Factors 5, 7, and 8; see Table 2) were used in the clustering. Hence, the total N = 278. The following run information gives the settings for clustering of these data in WEKA using the EM algorithm.

Scheme: weka.clusterers.EM -I 100 -N 3 -M 1.0E-6 -S 100
Relation: LP1group1_Subj21_NN_NW_WC_WN_WR_WU_pattern_factors_modalweka.filters.unsupervised.attribute.Remove-R1-4-weka.filters.unsupervised.attribute.Remove-R24-26
Instances: 278
Attributes: 24
[GF: PLEASE RERUN CLUSTERING WITH THE ATTRIBUTES SPECIFIED IN TABLE 1]
[[DD: We agreed to use 14 (13?) attributes to re-run the tests. I think Haishan can do it Monday]]
ROI
TI-max
IN-mean (LOCC)
IN-mean (ROCC)
IN-mean (LPAR)
IN-mean (RPAR)
IN-mean (LPTEM)
IN-mean (RPTEM)
IN-mean (LATEM)
IN-mean (RATEM)
IN-mean (LORB)
IN-mean (RORB)
IN-mean (LFRON)
IN-mean (RFRON)
SP-cor
IN-max
SP-max
SP-max ROI
IN-min
SP-min
SP-min ROI
Ignored: Pattern
Test mode: Classes to clusters evaluation on training data

=== Model and evaluation on training set ===
EM
==
Number of clusters: 3

Class attribute: Pattern
Classes to Clusters:
    <-- assigned to cluster
  P1  P2a  P2b
Cluster 0 <-- P1
Cluster 1 <-- P2b
Cluster 2 <-- P2a
Incorrectly clustered instances: %

[[DD: I think the result is similarly good (P1 and P2a) or bad (P2b) as the KDD'07 replications]]

Case Study #2: Clustering LP1g1 data without expert specification of # target patterns

For this next analysis, HL let WEKA discover the number of patterns/clusters automatically. Only those observations that belonged to the 3 modal factors (Factors 5, 7, and 8; see Table 2) were used in the clustering. Hence, the total N = 278. The following run information gives the settings for clustering of these data in WEKA using the EM algorithm.

=== Run information ===
Scheme: weka.clusterers.EM -I 100 -N -1 -M 1.0E-6 -S 100
Relation: LP1group1_Subj21_NN_NW_WC_WN_WR_WU_pattern_factors_modalweka.filters.unsupervised.attribute.Remove-R1-4,28-30
Instances: 278
Attributes: 24
[GF: PLEASE RERUN CLUSTERING WITH THE ATTRIBUTES SPECIFIED IN TABLE 1]
[[DD: Again, we agreed to do it]]
ROI
TI-max
IN-mean (LOCC)
IN-mean (ROCC)
IN-mean (LPAR)
IN-mean (RPAR)
IN-mean (LPTEM)
IN-mean (RPTEM)
IN-mean (LATEM)
IN-mean (RATEM)
IN-mean (LORB)
IN-mean (RORB)
IN-mean (LFRON)
IN-mean (RFRON)
SP-cor
IN-max
SP-max
SP-max ROI
IN-min
SP-min
SP-min ROI
Ignored: Pattern
Test mode: Classes to clusters evaluation on training data

=== Model and evaluation on training set ===
EM
==
Number of clusters selected by cross validation: 5

Class attribute: Pattern
Classes to Clusters:
    <-- assigned to cluster
  P1  P2a  P2b
Cluster 0 <-- No class
Cluster 1 <-- P1
Cluster 2 <-- No class
Cluster 3 <-- P2a
Cluster 4 <-- P2b
Incorrectly clustered instances: %

[[DD: It is hard to say whether clustering without a preset number of clusters is better or worse than the 3-cluster solution. One interesting question is how close the autolabeling is to the gold standard. We may discuss it on Thursday]]

2.4 LP1g2 Dataset

Data structure:
#Observations = 120 (#Subj*#Cond)
#Subjects = 20
#Conditions = 6
#PCA Factors Retained for Autolabeling (Patt/Fac Matching) = 15
Pattern rules as specified in Section 2.1.

Table 3: Autolabeling Results Summary for LP1group2
Columns: Factor 5, Factor 3, Factor, Factor 10, Factor; NObs

Rule 1 (P100)    %Match: 65%; total 65%
Rule 2a (N100)   %Match: 96%, 51%; total *147%
Rule 2b (N1/N2)  %Match: 75%, 34%; total *109%
Modal Factor (Rule): X (Fac5/P1), X (Fac3/N1), X (Fac10/N2); 385 (matches)

Case Study #3: Clustering LP1g2 data using expert specification of # target patterns

HL manually set the number of clusters to 3, since there are 3 target patterns (see Sec. 2.1). Only those observations that belonged to the 3 modal factors (Factors 3, 5, and 10; see Table 3) were used in the clustering. Hence, the total N = 283. The following run information gives the settings for clustering of these data in WEKA using the EM algorithm.

Scheme: weka.clusterers.EM -I 100 -N 3 -M 1.0E-6 -S 100
Relation: LP1group2_Subj21_NN_NW_WC_WN_WR_WU_pattern_factors_modalweka.filters.unsupervised.attribute.Remove-R1-4,28-30
Instances: 283
Attributes: 24
[GF: PLEASE RERUN CLUSTERING WITH THE ATTRIBUTES SPECIFIED IN TABLE 1]
[[DD: Agreed.]]
ROI
TI-max
IN-mean (LOCC)
IN-mean (ROCC)
IN-mean (LPAR)
IN-mean (RPAR)
IN-mean (LPTEM)
IN-mean (RPTEM)
IN-mean (LATEM)
IN-mean (RATEM)
IN-mean (LORB)
IN-mean (RORB)
IN-mean (LFRON)
IN-mean (RFRON)
SP-cor
IN-max
SP-max
SP-max ROI
IN-min
SP-min
SP-min ROI
Ignored: Pattern
Test mode: Classes to clusters evaluation on training data

=== Model and evaluation on training set ===
EM
==
Number of clusters: 3

Class attribute: Pattern
Classes to Clusters:
    <-- assigned to cluster
  P2a  P1  P2b
Cluster 0 <-- P2b
Cluster 1 <-- P2a
Cluster 2 <-- P1
Incorrectly clustered instances: %
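WEKA's "classes to clusters" test mode assigns each cluster its majority class and reports the remainder as incorrectly clustered. A rough scikit-learn analogue of the fixed-k EM runs above (GaussianMixture plays the role of weka.clusterers.EM; the data and labels are synthetic stand-ins, not the LP datasets):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-ins for the 13 clustering attributes (TI-max + 12 IN-means):
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(60, 13)) for m in (0.0, 4.0, 8.0)])
y = np.repeat(["P1", "P2a", "P2b"], 60)   # autolabels (the ignored class attribute)

# Analogue of: weka.clusterers.EM -I 100 -N 3
em = GaussianMixture(n_components=3, max_iter=100, random_state=100)
clusters = em.fit_predict(X)

# Classes-to-clusters evaluation: majority class per cluster, rest are errors.
errors = sum(
    (clusters == c).sum() - np.unique(y[clusters == c], return_counts=True)[1].max()
    for c in np.unique(clusters)
)
print(f"Incorrectly clustered instances: {errors} ({100 * errors / len(y):.1f} %)")
```

On well-separated synthetic clusters like these the error count is near zero; on the real LP data it reflects how well the EM mixture recovers the autolabeled patterns.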
[[DD: As in the KDD'07 replication, the LP1 group 2 data are much more distinguishable than the group 1 data. I understand we set 4 clusters in the KDD replication and used a different number of inputs]]

Case Study #4: Clustering LP1g2 data without expert specification of # target patterns

For this next analysis, HL let WEKA discover the number of patterns/clusters automatically. Only those observations that belonged to the 3 modal factors (Factors 3, 5, and 10; see Table 3) were used in the clustering. Hence, the total N = 283. The following run information gives the settings for clustering of these data in WEKA using the EM algorithm.

=== Run information ===
Scheme: weka.clusterers.EM -I 100 -N -1 -M 1.0E-6 -S 100
Relation: LP1group2_Subj21_NN_NW_WC_WN_WR_WU_pattern_factors_modalweka.filters.unsupervised.attribute.Remove-R1-4,28-30
Instances: 283
Attributes: 24
[GF: PLEASE RERUN CLUSTERING WITH THE ATTRIBUTES SPECIFIED IN TABLE 1]
[[DD: Agreed]]
ROI
TI-max
IN-mean (LOCC)
IN-mean (ROCC)
IN-mean (LPAR)
IN-mean (RPAR)
IN-mean (LPTEM)
IN-mean (RPTEM)
IN-mean (LATEM)
IN-mean (RATEM)
IN-mean (LORB)
IN-mean (RORB)
IN-mean (LFRON)
IN-mean (RFRON)
SP-cor
IN-max
SP-max
SP-max ROI
IN-min
SP-min
SP-min ROI
Ignored: Pattern
Test mode: Classes to clusters evaluation on training data
=== Model and evaluation on training set ===
EM
==
Number of clusters selected by cross validation: 13

Class attribute: Pattern
Classes to Clusters:
    <-- assigned to cluster
  P2a  P1  P2b
Cluster 0 <-- No class
Cluster 1 <-- No class
Cluster 2 <-- No class
Cluster 3 <-- P2a
Cluster 4 <-- P1
Cluster 5 <-- No class
Cluster 6 <-- No class
Cluster 7 <-- No class
Cluster 8 <-- No class
Cluster 9 <-- No class
Cluster 10 <-- No class
Cluster 11 <-- No class
Cluster 12 <-- P2b
Incorrectly clustered instances: %

[[DD: I think the result is worse than with a preset number of clusters. This actually shows that domain knowledge is helpful for data mining]]

2.5 LP2g1 Dataset

Data structure:
#Observations = 144 (#Subj*#Cond)
#Subjects = 24
#Conditions = 6
#PCA Factors Retained for Autolabeling (Patt/Fac Matching) = 15
Pattern rules as specified in Section 2.1.

HL produced the autolabeling results summary for the LP2 data following the LP1 examples [GF?? -- We should review this to be sure the results summary is correct]. [[DD: I have some doubts too. Haishan, can you explain it a little more? Why can the LP1 examples be used for the LP2 data?]] [RF: I have attached a copy of the GrandAverageStats_LP2-Gp1 spreadsheet, which I believe references this data, and the
results summary is a bit different.]

HL chose modal factors for the target ERP patterns based on the percentage of observations that matched a given rule for each of the latent factors. All the experiments were conducted using only observations that were captured by the modal factors. The cluster-to-class assignment tables in the results are highlighted.

Table 4: Autolabeling Results Summary for LP2group1
Columns: Factor 3, Factor 5, Factor 7, Factor 8; NObs

Rule 1 (P100)    %Match: 67%; total 67%
Rule 2a (N100)   %Match: 67%, 31%; 49%
Rule 2b (N1/N2)  %Match: 79%; total 79%
Modal Factor (Rule): X (P100), X (N100), X (N1/N2); 306 (matches)

Case Study #5: Clustering LP2g1 data using expert specification of # target patterns

HL manually set the number of clusters to 3, since there are 3 target patterns (see Sec. 2.1). Only those observations that belonged to the 3 modal factors (Factors 3, 5, and 8; see Table 4) were used in the clustering. Hence, the total N = 278. The following run information gives the settings for clustering of these data in WEKA using the EM algorithm.

Scheme: weka.clusterers.EM -I 100 -N 3 -M 1.0E-6 -S 100
Relation: LP2group1_Subj21_NN_NW_WC_WN_WR_WU_pattern_modalweka.filters.unsupervised.attribute.Remove-R1-4,28-30
Instances: 267
Attributes: 24
[GF: PLEASE RERUN CLUSTERING WITH THE ATTRIBUTES SPECIFIED IN TABLE 1]
[[DD: Agreed]]
ROI
TI-max
IN-mean (LOCC)
IN-mean (ROCC)
IN-mean (LPAR)
IN-mean (RPAR)
IN-mean (LPTEM)
IN-mean (RPTEM)
IN-mean (LATEM)
IN-mean (RATEM)
IN-mean (LORB)
IN-mean (RORB)
IN-mean (LFRON)
IN-mean (RFRON)
SP-cor
IN-max
SP-max
SP-max ROI
IN-min
SP-min
SP-min ROI
Ignored: Pattern
Test mode: Classes to clusters evaluation on training data

=== Model and evaluation on training set ===
EM
==
Number of clusters: 3

Class attribute: Pattern
Classes to Clusters:
    <-- assigned to cluster
  P1  P2a  P3
Cluster 0 <-- P1
Cluster 1 <-- P2a
Cluster 2 <-- P3
Incorrectly clustered instances: %

[[DD: I would say the result is similar to LP1g1: not bad, but not very good]]

Case Study #6: Clustering LP2g1 data without expert specification of # target patterns

For this next analysis, HL let WEKA discover the number of patterns/clusters automatically. Only those observations that belonged to the 3 modal factors (Factors 3, 5, and 8; see Table 4) were used in the clustering. Hence, the total N = 278. The following run information gives the settings for clustering of these data in WEKA using the EM algorithm.

Scheme: weka.clusterers.EM -I 100 -N -1 -M 1.0E-6 -S 100
Relation: LP2group1_Subj21_NN_NW_WC_WN_WR_WU_pattern_modalweka.filters.unsupervised.attribute.Remove-R1-4,28-30
Instances: 267
Attributes: 24
[GF: PLEASE RERUN CLUSTERING WITH THE ATTRIBUTES SPECIFIED IN TABLE 1]
[[DD: Agreed]]
ROI
TI-max
IN-mean (LOCC)
IN-mean (ROCC)
IN-mean (LPAR)
IN-mean (RPAR)
IN-mean (LPTEM)
IN-mean (RPTEM)
IN-mean (LATEM)
IN-mean (RATEM)
IN-mean (LORB)
IN-mean (RORB)
IN-mean (LFRON)
IN-mean (RFRON)
SP-cor
IN-max
SP-max
SP-max ROI
IN-min
SP-min
SP-min ROI
Ignored: Pattern
Test mode: Classes to clusters evaluation on training data

=== Model and evaluation on training set ===
EM
==
Number of clusters selected by cross validation: 3

Class attribute: Pattern
Classes to Clusters:
    <-- assigned to cluster
  P1  P2a  P3
Cluster 0 <-- P1
Cluster 1 <-- P2a
Cluster 2 <-- P3
Incorrectly clustered instances: %

[[DD: It is interesting that Weka chose the same number of clusters as the autolabeling results]]

2.6 LP2g2 Dataset

Data structure:
#Observations = 144 (#Subj*#Cond)
#Subjects = 24
#Conditions = 6
#PCA Factors Retained for Autolabeling (Patt/Fac Matching) = 15
Pattern rules as specified in Section 2.1.

Table 5: Autolabeling Results Summary for LP2group2

Rule 1 (P100)    %Match: 0%, 40%
Rule 2a (N100)   %Match: 36%
Rule 2b (N1/N2)  %Match: 65%
Modal Factor (Rule): X (P100), X (N100), X (N1/N2); 405 (matches)

Case Study #7: Clustering LP2g2 data using expert specification of # target patterns

HL manually set the number of clusters to 3, since there are 3 target patterns (see Sec. 2.1). Only those observations that belonged to the 3 modal factors (Factors 5, 6, and 7; see Table 5) were used in the clustering. Hence, the total N = 267.

[GF: I would use Factor 7 as the modal factor for the N2 pattern, not Factor 14. Factor 14 is noisier.]
[[DD: Either Factor 7 or Factor 14 is OK with me, because I do not have enough domain knowledge to choose. Gwen, could you please explain why Factor 14 is noisier, although it has a high percentage match to Rule 2b? On the other hand, if we know it is noisier, why list it in Table 5?]]
[RF: I need to double-check, but I believe the factors are in order of decreasing variance accounted for, so the higher-numbered factors tend to be noisier. However, regardless of a factor's SNR, we applied PCAautolabel to the first 15 factors: if any one of them was flagged as capturing a pattern of interest in one or more raw ERP observations, it would appear in the table.]
The following run information gives the settings for clustering of these data in WEKA using the EM algorithm.

Scheme: weka.clusterers.EM -I 100 -N -1 -M 1.0E-6 -S 100
Relation: LP2group1_Subj21_NN_NW_WC_WN_WR_WU_pattern_modalweka.filters.unsupervised.attribute.Remove-R1-4,28-30
Instances: 267
Attributes: 24
[GF: PLEASE RERUN CLUSTERING WITH THE ATTRIBUTES SPECIFIED IN TABLE 1]
[[DD: Agreed]]
ROI
TI-max
IN-mean (LOCC)
IN-mean (ROCC)
IN-mean (LPAR)
IN-mean (RPAR)
IN-mean (LPTEM)
IN-mean (RPTEM)
IN-mean (LATEM)
IN-mean (RATEM)
IN-mean (LORB)
IN-mean (RORB)
IN-mean (LFRON)
IN-mean (RFRON)
SP-cor
IN-max
SP-max
SP-max ROI
IN-min
SP-min
SP-min ROI
Ignored: Pattern
Test mode: Classes to clusters evaluation on training data

=== Model and evaluation on training set ===
EM
==
Number of clusters selected by cross validation: 3

Class attribute: Pattern
Classes to Clusters:
    <-- assigned to cluster
  P1  P2a
  P3
Cluster 0 <-- P1
Cluster 1 <-- P2a
Cluster 2 <-- P3
Incorrectly clustered instances: %

Case Study #8: Clustering LP2g2 data without expert specification of # target patterns

For this next analysis, HL let WEKA discover the number of patterns/clusters automatically. Only those observations that belonged to the 3 modal factors (Factors 5, 6, and 14; see Table 5) were used in the clustering. Hence, the total N = 267.

[GF: I would use Factor 7 as the modal factor for the N2 pattern, not Factor 14. Factor 14 is noisier.]
[[DD: Gwen may help us understand this a little more]]

The following run information gives the settings for clustering of these data in WEKA using the EM algorithm.

Scheme: weka.clusterers.EM -I 100 -N -1 -M 1.0E-6 -S 100
Relation: LP2group2_Subj21_NN_NW_WC_WN_WR_WU_pattern_moadalweka.filters.unsupervised.attribute.Remove-R1-4,28-30
Instances: 257
Attributes: 24
[GF: PLEASE RERUN CLUSTERING WITH THE ATTRIBUTES SPECIFIED IN TABLE 1]
[[DD: Agreed]]
ROI
TI-max
IN-mean (LOCC)
IN-mean (ROCC)
IN-mean (LPAR)
IN-mean (RPAR)
IN-mean (LPTEM)
IN-mean (RPTEM)
IN-mean (LATEM)
IN-mean (RATEM)
IN-mean (LORB)
IN-mean (RORB)
IN-mean (LFRON)
IN-mean (RFRON)
SP-cor
IN-max
SP-max
SP-max ROI
IN-min
SP-min
SP-min ROI
Ignored: Pattern
Test mode: Classes to clusters evaluation on training data

=== Model and evaluation on training set ===
EM
==
Number of clusters selected by cross validation: 5

Class attribute: Pattern
Classes to Clusters:
    <-- assigned to cluster
  P1  P2a  P2b
Cluster 0 <-- P2a
Cluster 1 <-- P1
Cluster 2 <-- No class
Cluster 3 <-- No class
Cluster 4 <-- P2b
Incorrectly clustered instances: %
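In the runs above with -N -1, WEKA's EM selects the number of clusters via 10-fold cross-validated log-likelihood, growing k until the held-out likelihood stops improving. A rough analogue of that model-selection step, sketched with scikit-learn's GaussianMixture and BIC substituted for cross-validated likelihood (synthetic two-cluster data, not the LP datasets):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
# Two well-separated synthetic "patterns" standing in for the pattern attributes:
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(10, 1, (200, 2))])

# Fit k = 1..6 and keep the k with the lowest BIC. (WEKA instead increases k
# while 10-fold cross-validated log-likelihood keeps improving.)
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print(best_k)
```

Both criteria penalize extra components; the contrast between Case Studies #2/#4 (k = 5 and 13 chosen automatically) and the 3-cluster runs is exactly this model-selection step going with or against the expert's prior.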
3. Clustering & Classification of WL3a data

3.1 Eight (8) Pattern Rules used for Autolabeling for WL3a data

The input to the clustering consists of labeled data for 8 ERP patterns (patterns with peak latencies between ~0-900 ms after stimulus onset). For these case studies, the WL3a data (i.e., individual observations for each subject, condition, and tPCA factor) were automatically labeled using the rules specified in Appendix B of Frishkoff, Frank, et al. (2007).

3.2 Metrics used for Clustering of WL3a data

We used 14 metrics to summarize the temporal and spatial attributes of the 8 ERP patterns in dataset WL3a, as shown in Table 6. Note that ROI, where ROI is a pre-defined scalp region, is not used in the clustering.

Table 6. Metrics used for clustering of WL3a data

Metric Label          Brief Definition                          Temporal  Spatial  Functional
TI-max                Peak latency (in ms)                         x
TI-duration           Duration (in ms)                             x
IN-mean (LOCC)        Mean intensity over LOCC scalp region                  x
IN-mean (ROCC)        Mean intensity over ROCC scalp region                  x
IN-mean (LPAR)        Mean intensity over LPAR scalp region                  x
IN-mean (RPAR)        Mean intensity over RPAR scalp region                  x
IN-mean (LPTEM)       Mean intensity over LPTEM scalp region                 x
IN-mean (RPTEM)       Mean intensity over RPTEM scalp region                 x
IN-mean (LATEM)       Mean intensity over LATEM scalp region                 x
IN-mean (RATEM)       Mean intensity over RATEM scalp region                 x
IN-mean (LORB)        Mean intensity over LORB scalp region                  x
IN-mean (RORB)        Mean intensity over RORB scalp region                  x
IN-mean (LFRON)       Mean intensity over LFRON scalp region                 x
IN-mean (RFRON)       Mean intensity over RFRON scalp region                 x
Pseudo-Known          Condition (Diffwave)                                             x
RareMisses-RareHits   Condition (Diffwave)                                             x
RareHits-Known        Condition (Diffwave)                                             x
Pseudo-RareMisses     Condition (Diffwave)                                             x

3.3 Clustering WL3a data using expert specification of # target patterns

Input: WL3a_PCAautolabel_2007Feb07v4.xls
Pattern factors are extracted according to the auto-labeling results (column N).

Preprocessing:
Combined all sheets except Attribute4Mining in the input file into a single sheet.
Add a new column called "pattern" at the end of the new sheet. Filter out non-pattern factors according to the value in the Pattern Present column.
[[GF: Please clarify whether observations were filtered using Pattern Present (expert labeling) or Fac=Patt (autolabeling).]]
[[DD: Haishan, can you explain this?]]
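The preprocessing steps above (merge all sheets except Attribute4Mining, append a "pattern" column, drop non-pattern factors) could be scripted. A hypothetical pandas sketch; the sheet and column names are taken from the description and may not match the real workbook, and keeping only rows with a non-empty Pattern Present value is an assumption:

```python
import pandas as pd

def merge_pattern_sheets(sheets: dict) -> pd.DataFrame:
    """Merge all sheets except Attribute4Mining, add a 'pattern' class column,
    and drop rows whose factor was not flagged as a pattern (assumption: a
    non-empty 'Pattern Present' value marks pattern factors)."""
    merged = pd.concat(
        [df for name, df in sheets.items() if name != "Attribute4Mining"],
        ignore_index=True,
    )
    merged["pattern"] = merged["Pattern Present"]
    return merged[merged["Pattern Present"].notna()].reset_index(drop=True)

# Tiny synthetic example (the real input would be read with pd.read_excel
# from WL3a_PCAautolabel_2007Feb07v4.xls, sheet_name=None):
sheets = {
    "Sheet1": pd.DataFrame({"Pattern Present": ["P100", None]}),
    "Attribute4Mining": pd.DataFrame({"Pattern Present": ["ignored"]}),
}
out = merge_pattern_sheets(sheets)
```

GF's open question (Pattern Present vs. Fac=Patt) corresponds to which column the filter line reads.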
Data structure:
#Observations = 144 (#Subj*#Cond)
#Subjects = 36
#Conditions = 4
#PCA Factors Retained for Autolabeling (Patt/Fac Matching) = 15
Pattern Rules used for Autolabeling: See Appendix B of Frishkoff, Frank, et al., 2007 (Computational Intelligence & Neuroscience).

Table 7. Autolabeling Results Summary (Grand Average Mean %Match, GrandAverageStats.xls_WL-3a.xls -> Column H)
Columns: Fac4, Fac3, Fac10, Fac7, Fac2, Fac8, Fac9, Fac11, Fac13, Fac15; NObs

Rule 1 (P100)    %Match: 83%, 35%; total *118%
Rule 2a (N100)   %Match: 83%; total 83%
Rule 2b (N1/N2)  %Match: 51%, 69%; total *120%
Rule 3 (N3)      %Match: 42%, 48%, 59%; total *149%
Rule 4 (P1r)     %Match: 65%, 26%, 63%; total *151%
Rule 5 (MFN)     %Match: 23%, 37%, 34%, 41%; total *135%
Rule 6 (N4)      #Match: 14, *51; 65    %Match: 10%, 35%; total 45%
Rule 7 (P300)    %Match: 60%, 57%, 10%; total *127%

[RF: Statistics are missing from the GrandAvgStat spreadsheet. Need to recompute.]

Case Study #9: Clustering WL3a data without expert specification of # target patterns

HL set the number of clusters to six (6), since there are 6 latent pattern-related factors (see Table 7). Only those observations that belonged to the 6 modal factors (Factors 2, 3, 4, 7, 9, and 10; see Table 7) were used in the clustering. Hence, the total N is 119 (P1/Fac4) + 119 (N1/Fac3) + 74 (N2/Fac10) + 61 (N3/Fac7) + 53 (MFN/Fac2) + 82 (P3/Fac9) = 508.
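As a quick check on the arithmetic above, the per-modal-factor counts sum to the quoted N; the dictionary below simply restates the counts from the text:

```python
# Observation counts per (pattern, modal factor) quoted above for WL3a:
modal = {"P1":  ("Fac4", 119), "N1":  ("Fac3", 119), "N2": ("Fac10", 74),
         "N3":  ("Fac7",  61), "MFN": ("Fac2",  53), "P3": ("Fac9",  82)}

# Only observations captured by a modal factor enter the clustering:
n_total = sum(count for _, count in modal.values())
print(n_total)  # 508
```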
[[DD: It seems some factors match more than one rule. I am a little confused. Why should the number of clusters be 6? Since we have 8 pattern rules, the ideal case is that we have 8 clusters. Otherwise, we have no way to generate classification rules based on the clustering result.]]
[RF: With respect to 1 factor matching more than 1 rule, take for instance Factor 2. Rules 5, 6, and 7 have overlapping temporal windows, so the factor's TI-max, which is subject and condition invariant, can meet all three rule criteria. Although the MFN (Rule 5), N4 (Rule 6), and P300 (Rule 7) have different spatial criteria, the spatial topography of a given factor in tPCA, such as Factor 2, is subject and observation specific: Factor 2 can satisfy a given rule's spatial criteria in one ERP observation (subject and condition) and still satisfy another rule's very different spatial criteria in some other ERP observation. It also depends on whether or not the spatial criteria of the rules of the patterns in question are mutually exclusive: Rules 6 and 7 have mutually exclusive spatial criteria, but Rules 5 and 6 as one pair, and Rules 5 and 7 as another, do not. Moreover, the extent to which a factor's multiple rule matches correspond to identical or distinct ERP observations can affect whether or not the factor is actually describing more than one pattern. I think that collapsing the number of clusters to 6, based on the PCAautolabel results, is a judgment call; there may be more than 6 patterns described by the 6 factors. Gwen, do my remarks seem reasonable?]
[GF: I'm in total agreement with Bob's comments. It is a judgment call. It may also be informative to note that for Factor 2, I ONLY selected observations meeting criteria for one rule (the MFN). Imagine that we didn't know that the N400 had been observed in other experiments as a distinct pattern. Then we would not know that some observations that meet the MFN criteria actually contain information that CAN be captured with more than one pattern rule. It's possible, in principle, that any of the latent temporal PCA factors could confound more than one pattern. Similarly for Factor 7 (I chose observations matching the N3 somewhat arbitrarily, and because the N3 is of greater interest to me than the P1r at the moment). So, if my reasoning is correct, I think it is possible to explain and justify the decision to summarize the patterns in the WL3a data using only 6 expert-defined pattern rules (because we only selected observations matching these 6 patterns). **IF CLUSTERING WITHOUT PRESPECIFICATION OF THE NUMBER OF CLUSTERS SUGGESTS THERE ARE MORE PATTERNS, AND IF THE RESULT IS BELIEVABLE, THEN I BELIEVE THIS WOULD SHOW HOW DATA MINING CAN ADD TO OUR CERTAINTY THAT MORE PATTERNS EXIST (WHICH WE ALREADY BELIEVE, BUT ARE HARD-PRESSED TO SHOW USING TPCA WITH THESE DATA).**]

The following run information gives the settings for clustering of these data in WEKA using the EM algorithm.

Scheme: weka.clusterers.EM -I 100 -N 8 -M 1.0E-6 -S 100
Relation: WL3a_PCAautolabel_2007Feb16_merged_with_pattern_auto_label-weka.filters.unsupervised.attribute.Remove-R1-4-weka.filters.unsupervised.attribute.Remove-R1-4-weka.filters.unsupervised.attribute.Remove-R2-8-weka.filters.unsupervised.attribute.Remove-R28-30,35-50
Instances: 615
[GF: PLEASE RERUN CLUSTERING WITH 508 OBSERVATIONS AS SPECIFIED ABOVE]
[[DD: Again, if we do not consider Factors 11, 13, and 15, whatever the matching percentage is, why do we list them in the table?]]
Attributes: 32
[GF: PLEASE RERUN CLUSTERING WITH THE ATTRIBUTES SPECIFIED IN TABLE 6]
[[DD: Agreed, and we believe it is the best domain knowledge so far]]
NGOODS
IN-LOCC
IN-ROCC
IN-LPAR
IN-RPAR
IN-LPTEM
IN-RPTEM
IN-LATEM
IN-RATEM
IN-LORB
IN-RORB
IN-LFRON
IN-RFRON
SP-cor
TI-max
TI-begin
TI-end
TI-duration
IN-max to Baseline
IN-min to Baseline
IN-max
SP-max
SP-max ROI
IN-min
SP-min
SP-min ROI
Pseudo-Known
RareMisses-RareHits
RareHits-Known
Pseudo-RareMisses
Ignored: Pattern
Test mode: Classes to clusters evaluation on training data

=== Model and evaluation on training set ===
EM
==
Classes to Clusters:
    <-- assigned to cluster
  P100  N100  N2  N3
  P1r  MFN  N4  P3
Cluster 0 <-- MFN
Cluster 1 <-- P100
Cluster 2 <-- P1r
Cluster 3 <-- N2
Cluster 4 <-- N100
Cluster 5 <-- P3
Cluster 6 <-- No class
Cluster 7 <-- N3
Incorrectly clustered instances: %

3.4 Cluster-based classification of WL3a data using expert specification of # target patterns

For the WL3a experiments, HL used the EM clustering algorithm and, on the basis of its result, conducted the classification process [GF?? -- I thought we were going to use 6 clusters, since there are only 6 modal factors? Can HL explain his procedure for deriving 8 clusters instead of 6?]. The rules derived are highlighted.
[[DD: I guess Haishan noted that there are 8 pattern rules, because we ultimately hope to compare the data mining rules with the expert rules. The number of classes (clusters) would better be eight. Even if we use the 6 modal factors, we can still set the number of clusters to 8, because the autolabeling already shows that one factor can match more than one pattern.]]
[GF: BUT note that I am suggesting only to select observations that match one of the 6 pattern rules -- see explanation above.]

Input: WL3a_PCAautolabel_2007Feb07v4.xls
Pattern factors are extracted according to the auto-labeling results (column N).

Preprocessing:
Generating class labels: Apply the AddCluster filter in the Preprocess tab of the Weka Explorer. In the parameter panel of the filter, choose EM as the clusterer and set the # of clusters to 8 in the EM parameters. This procedure attaches a new column to the end of the file with the cluster value assigned to each factor.

Result:

=== Run information ===
Scheme: weka.classifiers.trees.J48 -C -M 2
Relation: WL3a_PCAautolabel_2007Feb16_merged_with_pattern_auto_label-weka.filters.unsupervised.attribute.Remove-R1-4-weka.filters.unsupervised.attribute.Remove-R1-4-weka.filters.unsupervised.attribute.Remove-R2-8-weka.filters.unsupervised.attribute.Remove-R28-30,35-50-weka.filters.unsupervised.attribute.Remove-R32-
weka.filters.unsupervised.attribute.AddCluster-Wweka.clusterers.EM -I 100 -N 8 -M 1.0E-6 -S 100
Instances: 615
[GF: PLEASE RERUN CLUSTERING WITH 508 OBSERVATIONS AS SPECIFIED ABOVE]
Attributes: 32
[GF: PLEASE RERUN CLUSTERING WITH THE ATTRIBUTES SPECIFIED IN TABLE 6]
[[DD: Agreed]]
NGOODS
IN-LOCC
IN-ROCC
IN-LPAR
IN-RPAR
IN-LPTEM
IN-RPTEM
IN-LATEM
IN-RATEM
IN-LORB
IN-RORB
IN-LFRON
IN-RFRON
SP-cor
TI-max
TI-begin
TI-end
TI-duration
IN-max to Baseline
IN-min to Baseline
IN-max
SP-max
SP-max ROI
IN-min
SP-min
SP-min ROI
Pseudo-Known
RareMisses-RareHits
RareHits-Known
Pseudo-RareMisses
cluster
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
---------------
TI-max <= 276
|   TI-max <= 102
|   |   IN-RORB <= : cluster2 (71.0/1.0)
|   |   IN-RORB >
|   |   |   Pseudo-Known <= : cluster7 (43.0/3.0)
|   |   |   Pseudo-Known > : cluster2 (5.0/1.0)
|   TI-max > 102
|   |   TI-max <= 230
|   |   |   IN-min <=
|   |   |   |    <=
|   |   |   |   |   SP-min <= 20: cluster4 (3.0/1.0)
|   |   |   |   |   SP-min > 20: cluster5 (61.0/5.0)
|   |   |   |    > : cluster3 (9.0/1.0)
|   |   |   IN-min >
|   |   |   |   IN-LORB <=
|   |   |   |   |   SP-cor <= : cluster3 (3.0)
|   |   |   |   |   SP-cor > : cluster5 (2.0/1.0)
|   |   |   |   IN-LORB >
|   |   |   |   |   IN-max <= : cluster4 (104.0)
|   |   |   |   |   IN-max >
|   |   |   |   |   |   SP-min <= 92: cluster5 (7.0)
|   |   |   |   |   |   SP-min > 92: cluster4 (4.0/1.0)
|   |   TI-max > 230
|   |   |   IN-LOCC <=
|   |   |   |   IN-LPTEM <=
|   |   |   |   |   IN-LFRON <=
|   |   |   |   |   |   SP-cor <= : cluster3 (3.0)
|   |   |   |   |   |   SP-cor > : cluster5 (2.0)
|   |   |   |   |   IN-LFRON > : cluster5 (15.0)
|   |   |   |   IN-LPTEM >
|   |   |   |   |   Pseudo-Known <= : cluster5 (2.0)
|   |   |   |   |   Pseudo-Known > : cluster3 (116.0/1.0)
|   |   |   IN-LOCC >
|   |   |   |   SP-min <= 76: cluster8 (14.0)
|   |   |   |   SP-min > 76: cluster3 (2.0)
TI-max > 276
|   TI-max <= 408: cluster1 (67.0)
|   TI-max > 408: cluster6 (82.0)

Number of Leaves: 20
Size of the tree: 39

Time taken to build model: 0.16 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances: %
Incorrectly Classified Instances: %
Kappa statistic:
Mean absolute error:
Root mean squared error:
Relative absolute error %
Root relative squared error %
Total Number of Instances 615

=== Confusion Matrix ===

 a b c d e f g h  <-- classified as
   a = cluster1
   b = cluster2
   c = cluster3
   d = cluster4
   e = cluster5
   f = cluster6
   g = cluster7
   h = cluster8

Quick Reference: Cluster-to-class assignment:
Cluster 0 <-- MFN
Cluster 1 <-- P100
Cluster 2 <-- P1r
Cluster 3 <-- N2
Cluster 4 <-- N100
Cluster 5 <-- P3
Cluster 6 <-- No class
Cluster 7 <-- N3

Haishan Liu Comment: The cluster index generated by the Weka preprocessor starts from 1, whereas the cluster index in the cluster-to-class assignment, generated by the Weka clustering module, starts from 0. I think there is a one-to-one correspondence between these indices, i.e., 0 <-> 1, 1 <-> 2, etc. This can be further verified by comparing the data mining rules with the expert rules.

=== Rules Derived From the Tree ===

1. TI-max <= 276 & TI-max <= 102 & IN-RORB <= ===> cluster2 (71.0/1.0)
2. TI-max <= 276 & TI-max <= 102 & IN-RORB > & Pseudo-Known <= ===> cluster7 (43.0/3.0)
3. TI-max <= 276 & TI-max <= 102 & IN-RORB > & Pseudo-Known > ===> cluster2 (5.0/1.0)
4. TI-max <= 276 & TI-max > 102 & TI-max <= 230 & IN-min <= & <= & SP-min <= 20 ===> cluster4 (3.0/1.0)
5. TI-max <= 276 & TI-max > 102 & TI-max <= 230 & IN-min <= & <= & SP-min > 20 ===> cluster5 (61.0/5.0)
6. TI-max <= 276 & TI-max > 102 & TI-max <= 230 & IN-min <= & > ===> cluster3 (9.0/1.0)
7. TI-max <= 276 & TI-max > 102 & TI-max <= 230 & IN-min > & IN-LORB <= & SP-cor <= ===> cluster3 (3.0)
8. TI-max <= 276 & TI-max > 102 & TI-max <= 230 & IN-min > & IN-LORB <= & SP-cor > ===> cluster5 (2.0/1.0)
9. TI-max <= 276 & TI-max > 102 & TI-max <= 230 & IN-min > & IN-LORB > & IN-max <= ===> cluster4 (104.0)
10. TI-max <= 276 & TI-max > 102 & TI-max <= 230 & IN-min > & IN-LORB > & IN-max > & SP-min <= 92 ===> cluster5 (7.0)
11. TI-max <= 276 & TI-max > 102 & TI-max <= 230 & IN-min > & IN-LORB > & IN-max > & SP-min > 92 ===> cluster4 (4.0/1.0)
12. TI-max <= 276 & TI-max > 102 & TI-max > 230 & IN-LOCC <= & IN-LPTEM <= & IN-LFRON <= & SP-cor <= ===> cluster3 (3.0)
13. TI-max <= 276 & TI-max > 102 & TI-max > 230 & IN-LOCC <= & IN-LPTEM <= & IN-LFRON <= & SP-cor > ===> cluster5 (2.0)
14. TI-max <= 276 & TI-max > 102 & TI-max > 230 & IN-LOCC <= & IN-LPTEM <= & IN-LFRON > ===> cluster5 (15.0)
15. TI-max <= 276 & TI-max > 102 & TI-max > 230 & IN-LOCC <= & IN-LPTEM > & Pseudo-Known <= ===> cluster5 (2.0)
16. TI-max <= 276 & TI-max > 102 & TI-max > 230 & IN-LOCC <= & IN-LPTEM > & Pseudo-Known > ===> cluster3 (116.0/1.0)
17. TI-max <= 276 & TI-max > 102 & TI-max > 230 & IN-LOCC > & SP-min <= 76 ===> cluster8 (14.0)
18. TI-max <= 276 & TI-max > 102 & TI-max > 230 & IN-LOCC > & SP-min > 76 ===> cluster3 (2.0)
19. TI-max > 276 & TI-max <= 408 ===> cluster1 (67.0)
20. TI-max > 276 & TI-max > 408 ===> cluster6 (82.0)

[GF (1/24/2009): Dejing, please rewrite the rules so they can be aligned with the expert rules, as we discussed 8 days ago.]
[[DD (1/26/2009): It seems we need to ask Haishan to re-run the tests based on the new metrics in Table 6, and Gwen has also suggested that the number of clusters/patterns will be 6. If that is the case, how can we compare 6 cluster/classification rules with 8 expert rules? I can do the comparison for the KDD 07 replication report, because we will not change the metrics or the number of clusters there.]]
[GF (1/27/2009): We would compare the data mining results with the 6 pattern rules that match the 6 patterns that tPCA shows clearly can be separated in these data: P100, N100, N2, N3 (or P1r; I chose N3 for my own reasons), MFN, and P300.]
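The tree-to-rules step above turns each root-to-leaf path of the J48 tree into one conjunctive rule (rules 1-20). A minimal stdlib-only sketch of that conversion follows; tree_to_rules and the toy tree are hypothetical illustrations, and the toy tree mirrors only the top-level TI-max splits, not the full 20-leaf model.

```python
# Hypothetical sketch: derive conjunctive rules from a decision tree by
# enumerating root-to-leaf paths. Internal nodes are (attribute, threshold,
# left_subtree, right_subtree); leaves are cluster-label strings.

def tree_to_rules(node, path=()):
    """Return one 'cond & cond ... ===> label' rule per leaf."""
    if isinstance(node, str):                     # leaf: emit accumulated path
        return [" & ".join(path) + " ===> " + node]
    attr, thresh, left, right = node
    rules = []
    rules += tree_to_rules(left,  path + (f"{attr} <= {thresh}",))
    rules += tree_to_rules(right, path + (f"{attr} > {thresh}",))
    return rules

# Illustrative fragment shaped like the outermost splits of the J48 tree
toy_tree = ("TI-max", 276,
            ("TI-max", 102, "cluster2", "cluster7"),
            ("TI-max", 408, "cluster1", "cluster6"))

for r in tree_to_rules(toy_tree):
    print(r)
# e.g. the last path prints: TI-max > 276 & TI-max > 408 ===> cluster6
```

Because the enumeration visits the left (<=) branch before the right (>) branch at every split, the rules come out in the same depth-first order that J48's printed tree implies, which is why rules 1-20 above follow the tree top to bottom.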